Eliminating Redundant Shells Within Job Containers (K8s GitLab Runner) Utilizing FF_KUBERNETES_HONOR_ENTRYPOINT

Context:

  1. The job image sets ENTRYPOINT ["/sbin/tini", "--"], e.g.:
FROM node:18-alpine

#[redacted]

ENTRYPOINT ["/sbin/tini", "--"]

  2. The env var FF_KUBERNETES_HONOR_ENTRYPOINT=true is set for all jobs on that gitlab-runner
  3. Sample pipeline YAML:
default:
  image: <image-built-with-Dockerfile-from-point-1>
test:
  script:
    - node some-script.js

From the container’s perspective, my job process namespace appears as follows:

  PID    PPID CMD
      1       0 /sbin/tini -- sh -c if [ -x /usr/local/bin/bash ]; then ?exec /usr/local/bin/bash  elif [ -x /usr/bin/bash ]; then ?exec /usr/bin/bash  elif [ -x /bin/bash ]; then ?exec /bin/bash  elif [ -x /usr/local/bin/sh ]; then ?exec /usr/local/bin/sh  elif [ -x /usr/bin/sh ]; then ?exec /usr/bin/sh  elif [ -x /bin/sh ]; then ?exec /bin/sh  elif [ -x /busybox/sh ]; then ?exec
      6       1 /bin/bash
     24       1 sh -c (/scripts-1089-1632280/detect_shell_script /scripts-1089-1632280/step_script 2>&1 | tee -a /logs-1089-1632280/output.log) &
     25      24  \_ /bin/bash /scripts-1089-1632280/step_script
     30      25  |   \_ /bin/bash /scripts-1089-1632280/step_script
    143      30  |       \_ bash [redacted]
    146     143  |           \_ node some-script.js
    254     146  |               \_ node [redacted]
     26      24  \_ tee -a /logs-1089-1632280/output.log

I would like to understand why the bash process (pid=6) is there and what it is for. Is there a way to avoid running it, and if so, how?

(*) yaml fragment of the job POD:

  containers:
  - args:
    - sh
    - -c
    - "if [ -x /usr/local/bin/bash ]; then\n\texec /usr/local/bin/bash \nelif [ -x
      /usr/bin/bash ]; then\n\texec /usr/bin/bash \nelif [ -x /bin/bash ]; then\n\texec
      /bin/bash \nelif [ -x /usr/local/bin/sh ]; then\n\texec /usr/local/bin/sh \nelif
      [ -x /usr/bin/sh ]; then\n\texec /usr/bin/sh \nelif [ -x /bin/sh ]; then\n\texec
      /bin/sh \nelif [ -x /busybox/sh ]; then\n\texec /busybox/sh \nelse\n\techo shell
      not found\n\texit 1\nfi\n\n"
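
The args string in the pod fragment is easier to follow once its escaped \n and \t sequences are unfolded. Below is a minimal sketch of the same probe order. The function name detect_shell is hypothetical, and unlike the real args it prints the first match instead of exec'ing it. In the pod, the matching branch runs exec on that shell with no script argument, so the resulting bash reads its commands from standard input. That would be consistent with the idle /bin/bash child of tini in the process list, though it is an interpretation and not something confirmed by the runner's documentation here.

```shell
# Probe the same paths, in the same order, as the pod's "sh -c" argument.
# Assumption: printing the path stands in for the original "exec <shell>".
detect_shell() {
    for candidate in /usr/local/bin/bash /usr/bin/bash /bin/bash \
                     /usr/local/bin/sh /usr/bin/sh /bin/sh /busybox/sh; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Mirrors the final "else" branch of the original script.
    echo "shell not found" >&2
    return 1
}

detect_shell
```

On the node:18-alpine based image from the post, the process list suggests the /bin/bash branch is the one that matched.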

I’ve set FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY to false, but nothing changed.

Further investigation:

  1. Show the PID, PPID, state, and wchan of every process:
    ps -e afx -o pid,ppid,state,cmd,wchan

out:

    PID    PPID S CMD                         WCHAN
    388       0 S bash                        do_wait
    395     388 R  \_ ps -e afx -o pid,ppid,s -
      1       0 S /sbin/tini -- sh -c if [ -x do_sigtimedwait
      7       1 S /bin/bash                   pipe_read
     26       1 S sh -c (/scripts-1089-166072 do_wait
     27      26 S  \_ /bin/bash /scripts-1089 do_wait
     32      27 S  |   \_ /bin/bash /scripts- do_wait
    272      32 S  |       \_ node /usr/local do_epoll_wait
    376     272 S  |           \_ node /usr/l do_epoll_wait
     28      26 S  \_ tee -a /logs-1089-16607 pipe_read

The process of interest is PID 7. Its wchan column shows that it is blocked in kernel space in pipe_read, so the next step is to look at its pipes.
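
The same state and wchan values can be read for a single process straight from /proc, which helps when ps in the image is a trimmed-down build. This is a minimal sketch, assuming a Linux /proc and a POSIX shell. The helper name proc_wait_info is hypothetical.

```shell
# Print the scheduler state and kernel wait channel of one PID.
proc_wait_info() {
    pid="$1"
    [ -d "/proc/$pid" ] || { echo "no such pid: $pid" >&2; return 1; }
    # The "State:" line in /proc/<pid>/status looks like "State:  S (sleeping)".
    state="$(awk '/^State:/ {print $2}' "/proc/$pid/status")"
    # wchan may be unreadable or empty depending on kernel settings.
    wchan="$(cat "/proc/$pid/wchan" 2>/dev/null || true)"
    echo "pid=$pid state=$state wchan=${wchan:-unknown}"
}

# Example: inspect the current shell; inside the job container, pass 7 instead.
proc_wait_info "$$"
```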

  2. Find all open file descriptors of the process:
    ls -l /proc/7/fd

out:

lr-x------    1 root     root            64 Nov  3 09:37 0 -> pipe:[65275272]
l-wx------    1 root     root            64 Nov  3 09:37 1 -> pipe:[65275273]
l-wx------    1 root     root            64 Nov  3 09:37 2 -> pipe:[65275274]

  3. Look for PIDs connected to the above pipes:
    ls -l /proc/*/fd/* | grep -E "65275272|65275273|65275274"

out (I can’t see the writer for the read end of the pipe, and vice versa):

lr-x------    1 root     root            64 Nov  3 08:00 /proc/1/fd/0 -> pipe:[65275272]
l-wx------    1 root     root            64 Nov  3 08:00 /proc/1/fd/1 -> pipe:[65275273]
l-wx------    1 root     root            64 Nov  3 08:00 /proc/1/fd/2 -> pipe:[65275274]
l-wx------    1 root     root            64 Nov  3 09:39 /proc/26/fd/1 -> pipe:[65275273]
l-wx------    1 root     root            64 Nov  3 09:39 /proc/26/fd/2 -> pipe:[65275274]
l-wx------    1 root     root            64 Nov  3 09:39 /proc/28/fd/1 -> pipe:[65275273]
l-wx------    1 root     root            64 Nov  3 09:39 /proc/28/fd/2 -> pipe:[65275274]
lr-x------    1 root     root            64 Nov  3 09:37 /proc/7/fd/0 -> pipe:[65275272]
l-wx------    1 root     root            64 Nov  3 09:37 /proc/7/fd/1 -> pipe:[65275273]
l-wx------    1 root     root            64 Nov  3 09:37 /proc/7/fd/2 -> pipe:[65275274]

  4. Try to find them by thread (note: this was a different job, so the pipe IDs differ from those in the previous steps):
    ls -l /proc/*/task/*/fd/* | grep -E "65276114|65276115|65276116"

out (I can’t see the writer for the read end of the pipe, and vice versa):

lr-x------    1 root     root            64 Nov  3 10:10 /proc/1/task/1/fd/0 -> pipe:[65276114]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/1/task/1/fd/1 -> pipe:[65276115]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/1/task/1/fd/2 -> pipe:[65276116]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/25/task/25/fd/1 -> pipe:[65276115]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/25/task/25/fd/2 -> pipe:[65276116]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/27/task/27/fd/1 -> pipe:[65276115]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/27/task/27/fd/2 -> pipe:[65276116]
lr-x------    1 root     root            64 Nov  3 10:10 /proc/7/task/7/fd/0 -> pipe:[65276114]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/7/task/7/fd/1 -> pipe:[65276115]
l-wx------    1 root     root            64 Nov  3 10:10 /proc/7/task/7/fd/2 -> pipe:[65276116]
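
The lookups in the steps above can be combined into one helper that takes a PID, collects its pipes, and lists every visible descriptor pointing at the same pipe. This is a minimal sketch, assuming a Linux /proc and a POSIX shell. The name pipe_peers is hypothetical. Like the ls and grep approach, it only sees processes in the current PID namespace, so a pipe end held outside the container would still not appear.

```shell
# For each pipe held by the target PID, list all visible fds on the same pipe.
pipe_peers() {
    target="$1"
    for fd in /proc/"$target"/fd/*; do
        link="$(readlink "$fd" 2>/dev/null)" || continue
        case "$link" in
            pipe:*) ;;          # keep pipes only
            *) continue ;;
        esac
        echo "fd ${fd##*/} of pid $target -> $link"
        for other in /proc/[0-9]*/fd/*; do
            [ "$(readlink "$other" 2>/dev/null)" = "$link" ] || continue
            # The symlink mode tells a read end (lr-x) from a write end (l-wx).
            mode="$(ls -l "$other" 2>/dev/null | cut -c1-10)"
            echo "    $mode $other"
        done
    done
}

# Example inside the job container: pipe_peers 7
```

A pipe that lists only lr-x entries, or only l-wx entries, is one whose other end is not visible from where the command runs.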

Summary:

Bash holds three pipes (see item 2), and PID 1 (tini) holds the same three on fds 0, 1, and 2, so bash appears to have inherited them from tini. The processes on the other ends of these pipes are not visible from inside the container. The writers may have already finished and closed their file descriptors, they may never have started properly, or they may live outside the container's PID namespace. The same reasoning applies in the other direction, where bash has a pipe open for writing but no visible process has it open for reading.

OK, it ended up here: official issue