I have a CI/CD pipeline running on a GitLab server (common-lisp.net) for regression testing. The docker images used (to test the same code across different platforms) are constantly changing. Recently, when I have tried to update some of these docker images, I have been getting a “shell not found” error. This seems to be related to the use of docker images whose entry points are not shells.
The documentation suggests overriding this by specifying an entrypoint for these images, but it’s not clear how best to do this: there are 2 different recipes in the docs (entrypoint: [""] vs. entrypoint: ["/bin/bash", "-c"]), with no instructions on how to determine which to use, nor any recipe that will run correctly in both situations (e.g., using a conditional).
My attempts to control the entrypoint have not worked reliably. Sometimes when I add an entrypoint: [""] specification, I get this error:
/usr/bin/sh: /usr/bin/sh: cannot execute binary file
Sometimes when I use `entrypoint: ["/bin/bash", "-c"]` I get:
sh: 9: Syntax error: Unterminated quoted string
The description of how the runner interacts with the docker container is not at all complete, and I have not been able to determine which of the entrypoint specs, if either, is appropriate, or how to debug the above knock-on errors.
There’s a lot of discussion about this on the web, but no clear answer (or not one that I have been able to find).
Steps to reproduce
My attempts can be seen here: https://gitlab.common-lisp.net/asdf/asdf/-/pipelines
The included YAML file with the key bits: `https://gitlab.common-lisp.net/asdf/asdf/-/blob/fix-sbcl-internal/gitlab-pipelines/standard-pipeline.yml?ref_type=heads`
GitLab Runner, if self-hosted (Web /admin/runners or CLI gitlab-runner --version): As a user, I don’t know how to get this information.
Apology
I tried to post links for a bunch of things in the above, but the forum software first modified them, and then refused to allow me to post the links. I have tried to quote the links, but may have failed.
So, in full transparency, I do not know how this is implemented yet, but I had nightmares debugging similar entrypoint problems at my former company.
Docker images to reproduce the error:
I assume they are rpgoldman/sbcl:trixie and buildpack-deps:trixie and some more, which I cannot immediately extract from the matrix builds.
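Before picking an override, it can help to read what the image itself declares. `docker inspect --format '{{json .Config.Entrypoint}}' rpgoldman/sbcl:trixie` prints the configured entrypoint as JSON (the image has to be pulled locally first). The helper below is only a sketch: `classify_entrypoint` is a name made up here, the example inputs are illustrative, and the three categories are a rough heuristic of mine, not a GitLab or Docker rule.

```shell
# Hypothetical helper: classify the JSON printed by
#   docker inspect --format '{{json .Config.Entrypoint}}' IMAGE
# The categories are a heuristic, not an official rule.
classify_entrypoint() {
  case "$1" in
    null|'[]'|'')
      echo "none" ;;      # image declares no entrypoint
    '["/bin/sh"]'|'["/bin/bash"]'|'["sh"]'|'["bash"]')
      echo "shell" ;;     # entrypoint is already a plain shell
    *)
      echo "custom" ;;    # wrapper script or program: candidate for an override
  esac
}

# Example inputs (illustrative, not taken from a real inspect run)
classify_entrypoint 'null'
classify_entrypoint '["docker-entrypoint.sh"]'
```

Only the "custom" case is one where an entrypoint override in the CI/CD config seems worth testing at all.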
Navigating further, I overrode the entrypoint to get inside the container and inspect what is going on.
docker run -ti --entrypoint bash rpgoldman/sbcl:trixie
root@42d2e4c75393:~# which docker-entrypoint.sh
/usr/local/bin/docker-entrypoint.sh
root@42d2e4c75393:~# cat /usr/local/bin/docker-entrypoint.sh
#!/bin/sh
# If the first arg starts with a hyphen, prepend sbcl to arguments.
if [ "${1#-}" != "$1" ]; then
set -- sbcl "$@"
fi
exec "$@"
Don’t know what sbcl is, but from running the container normally it looks like a gdb-style debugger environment for Lisp (ctrl+d to exit).
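The wrapper's logic can be tried outside the container. This sketch copies the test from the script above into a function that prints the resulting command line instead of exec-ing it; `resolve_command` is a name introduced here purely for illustration.

```shell
# Mirrors docker-entrypoint.sh, but prints the final command instead of exec-ing it.
resolve_command() {
  # "${1#-}" strips one leading hyphen; if that changes $1, the first arg is an option.
  if [ "${1#-}" != "$1" ]; then
    set -- sbcl "$@"
  fi
  echo "$*"
}

resolve_command --version          # prints: sbcl --version
resolve_command sh -c 'echo hi'    # prints: sh -c echo hi
```

In other words, any command whose first word does not start with a hyphen is executed unchanged by this wrapper.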
I asked GitLab Duo Agentic Chat on the gitlab/gitlab-runner project, to inspect the source code. I would usually do that manually but AI is more efficient.
Running with gitlab-runner 17.6.0 (374d34fd)
on common-lisp.net 509b4988, system ID: s_6de6a7cec1b3
Preparing the "docker" executor
00:14
Using Docker executor with image rpgoldman/sbcl:trixie ...
Pulling docker image rpgoldman/sbcl:trixie ...
Using docker image sha256:e563a364588c035d89409ee7fa079aeea6b3a2b83da8712785541c6dcd2cde8b for rpgoldman/sbcl:trixie with digest rpgoldman/sbcl@sha256:27edae485ae5d2a9c507e7912e43f2a67243d71ac09181e8400c6c7bae810005 ...
Preparing environment
00:06
Running on runner-509b4988-project-46-concurrent-0 via legacy.common-lisp.net...
Getting source from Git repository
00:16
Fetching changes...
Reinitialized existing Git repository in /builds/asdf/asdf/.git/
Checking out 197ca7c8 as detached HEAD (ref is refs/merge-requests/244/head)...
Removing build/
Removing test/test-multiple-too.asd
Removing test/try-reloading-dependency.asd
Updating/initializing submodules...
Synchronizing submodule url for 'ext/alexandria'
Synchronizing submodule url for 'ext/asdf-encodings'
Synchronizing submodule url for 'ext/cl-launch'
Synchronizing submodule url for 'ext/cl-ppcre'
Synchronizing submodule url for 'ext/cl-scripting'
Synchronizing submodule url for 'ext/closer-closer-mop'
Synchronizing submodule url for 'ext/fare-mop'
Synchronizing submodule url for 'ext/fare-quasiquote'
Synchronizing submodule url for 'ext/fare-utils'
Synchronizing submodule url for 'ext/inferior-shell'
Synchronizing submodule url for 'ext/lisp-invocation'
Synchronizing submodule url for 'ext/named-readtables'
Synchronizing submodule url for 'ext/optima'
Entering 'ext/alexandria'
Entering 'ext/asdf-encodings'
Entering 'ext/cl-launch'
Entering 'ext/cl-ppcre'
Entering 'ext/cl-scripting'
Entering 'ext/closer-closer-mop'
Entering 'ext/fare-mop'
Entering 'ext/fare-quasiquote'
Entering 'ext/fare-utils'
Entering 'ext/inferior-shell'
Entering 'ext/lisp-invocation'
Entering 'ext/named-readtables'
Entering 'ext/optima'
Entering 'ext/alexandria'
HEAD is now at 3b849bc Avoid duplicate logic in WHEN-LET*
Entering 'ext/asdf-encodings'
HEAD is now at 40a8670 Add support for CLASP, courtesy of Karsten Poeck
Entering 'ext/cl-launch'
HEAD is now at bed79a8 Clarify license at MIT, not LLGPL
Entering 'ext/cl-ppcre'
HEAD is now at 1ca0cd9 2.1.1
Entering 'ext/cl-scripting'
HEAD is now at 60c357e Tweak printing of failure object
Entering 'ext/closer-closer-mop'
HEAD is now at e37cff6 Checked against SBCL 1.5.7 - no changes.
Entering 'ext/fare-mop'
HEAD is now at 538aa94 Update package: It's uiop, not asdf/driver anymore
Entering 'ext/fare-quasiquote'
HEAD is now at 640d39a Merge branch 'master' into 'master'
Entering 'ext/fare-utils'
HEAD is now at 66e9c6f Don't define, use or export style-warn anymore
Entering 'ext/inferior-shell'
HEAD is now at e1f6378 Use exec when running simple commands.
Entering 'ext/lisp-invocation'
HEAD is now at ebf543c 1.0.14: Add :console argument, be nicer to allegro
Entering 'ext/named-readtables'
HEAD is now at 985b162 fix tests
Entering 'ext/optima'
HEAD is now at 373b245 Merge pull request #116 from jasom/list-star-fix
Updated submodules
Synchronizing submodule url for 'ext/alexandria'
Synchronizing submodule url for 'ext/asdf-encodings'
Synchronizing submodule url for 'ext/cl-launch'
Synchronizing submodule url for 'ext/cl-ppcre'
Synchronizing submodule url for 'ext/cl-scripting'
Synchronizing submodule url for 'ext/closer-closer-mop'
Synchronizing submodule url for 'ext/fare-mop'
Synchronizing submodule url for 'ext/fare-quasiquote'
Synchronizing submodule url for 'ext/fare-utils'
Synchronizing submodule url for 'ext/inferior-shell'
Synchronizing submodule url for 'ext/lisp-invocation'
Synchronizing submodule url for 'ext/named-readtables'
Synchronizing submodule url for 'ext/optima'
Entering 'ext/alexandria'
Entering 'ext/asdf-encodings'
Entering 'ext/cl-launch'
Entering 'ext/cl-ppcre'
Entering 'ext/cl-scripting'
Entering 'ext/closer-closer-mop'
Entering 'ext/fare-mop'
Entering 'ext/fare-quasiquote'
Entering 'ext/fare-utils'
Entering 'ext/inferior-shell'
Entering 'ext/lisp-invocation'
Entering 'ext/named-readtables'
Entering 'ext/optima'
Entering 'ext/alexandria'
Entering 'ext/asdf-encodings'
Entering 'ext/cl-launch'
Entering 'ext/cl-ppcre'
Entering 'ext/cl-scripting'
Entering 'ext/closer-closer-mop'
Entering 'ext/fare-mop'
Entering 'ext/fare-quasiquote'
Entering 'ext/fare-utils'
Entering 'ext/inferior-shell'
Entering 'ext/lisp-invocation'
Entering 'ext/named-readtables'
Entering 'ext/optima'
Executing "step_script" stage of the job script
00:08
Using docker image sha256:e563a364588c035d89409ee7fa079aeea6b3a2b83da8712785541c6dcd2cde8b for rpgoldman/sbcl:trixie with digest rpgoldman/sbcl@sha256:27edae485ae5d2a9c507e7912e43f2a67243d71ac09181e8400c6c7bae810005 ...
shell not found
Uploading artifacts for failed job
00:04
Uploading artifacts...
WARNING: build/asdf.lisp: no matching files. Ensure that the artifact path is relative to the working directory (/builds/asdf/asdf)
ERROR: No files to upload
Cleaning up project directory and file based variables
00:04
ERROR: Job failed: exit code 1
Then it said something interesting about how GitLab Runner detects shells inside the Docker executor.
entrypoint: [""] - Docker might not interpret this correctly, can fail if something weird goes on in the image.
entrypoint: ["/bin/bash", "-c"] - Generated bash script might not align with the format the Docker cmd executor expects.
entrypoint: ["/bin/sh", "-c"] - Similar problem as with sh … which SHELL are we using here?
entrypoint: [] - Default shell is detected.
Asked Agentic Chat about this detail, no problems detected. Shared the CI/CD config, updated - and Agentic Chat summarized the problems with bash -c and sh -c again.
That matches with my Runner - also amd64. Your runner does not give that detail, runner-509b4988-project-46-concurrent-0 via legacy.common-lisp.net – maybe it is using arm64 or another platform architecture. (just a wild guess, don’t think this is the root cause)
Maybe a Docker version bug?
Me: Could this be a Docker bug in specific versions, where an empty array works, [""] does not, and none of the bash/sh variants do?
Runner injects set -o pipefail before running scripts
Dash (/bin/sh) doesn’t support pipefail - only bash does
When the entrypoint is overridden, the wrong shell might be selected
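Whether a particular shell accepts pipefail can be checked directly inside the image rather than assumed. A minimal sketch: `supports_pipefail` is a hypothetical helper, and it assumes the shell lists its options when `set -o` is called without arguments.

```shell
# Hypothetical probe: does the shell at $1 list pipefail among its options?
supports_pipefail() {
  "$1" -c 'set -o' 2>/dev/null | grep -q pipefail
}

for candidate in /bin/sh /bin/bash; do
  if [ -x "$candidate" ] && supports_pipefail "$candidate"; then
    echo "$candidate: pipefail available"
  else
    echo "$candidate: pipefail not available (or shell missing)"
  fi
done
```

Running this in the container started with `--entrypoint bash` would show which of the two shells in that image really knows the option.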
Option 1: Empty entrypoint (simplest):
image:
  name: rpgoldman/sbcl:trixie
  entrypoint: []
Option 2: The workaround (if you need bash specifically):
Enable debug for CI/CD with CI_DEBUG_TRACE: 1 variable
Test entrypoint: [] and share feedback, maybe we can update the docs if we find out why
Maybe it is a Runner <> Docker version thing.
Maybe it is the problem with dash vs sh vs bash pipefail - and entrypoint: [] solves that.
Resulting CI/CD pipeline, final.
Summary
spec:
  inputs:
    image:
      default: "rpgoldman/sbcl:trixie"
      description: "Docker image to test"
---
variables:
  CI_DEBUG_TRACE: 1
.debug-tmpl:
  before_script:
    - echo $SHELL
    - which $SHELL
  script:
    - echo "SUCCESS - Entrypoint override worked!"
    - which sbcl
    - sbcl --version
# Might work, but possibly buggy depending on how Docker handles an empty array
test_entrypoint_empty_str:
  extends: [ .debug-tmpl ]
  image:
    name: $[[ inputs.image ]]
    entrypoint: [""] # This clears the entrypoint
# WORKS
test_entrypoint_empty_null:
  extends: [ .debug-tmpl ]
  image:
    name: $[[ inputs.image ]]
    entrypoint: [] # ← Empty array, not [""]
# DOES NOT WORK
test_entrypoint_shc:
  extends: [ .debug-tmpl ]
  image:
    name: $[[ inputs.image ]]
    entrypoint: ["/bin/sh", "-c"]
test_entrypoint_bashc:
  extends: [ .debug-tmpl ]
  image:
    name: $[[ inputs.image ]]
    entrypoint: ["/bin/bash", "-c"]
# HM?
test_entrypoint_sh:
  extends: [ .debug-tmpl ]
  image:
    name: $[[ inputs.image ]]
    entrypoint: ["/bin/sh"]
test_entrypoint_bash:
  extends: [ .debug-tmpl ]
  image:
    name: $[[ inputs.image ]]
    entrypoint: ["/bin/bash"]
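One caveat on the `echo $SHELL` step in the template: `$SHELL` normally names the user's login shell as recorded in the environment, which is not necessarily the interpreter that is executing the script. A more direct check that could be pasted into before_script is sketched below. `current_interpreter` is a hypothetical name, the fallback branch cannot tell dash apart from other POSIX shells, and bash invoked under the name `sh` will still report itself as bash.

```shell
# Report which interpreter is actually running this script,
# based on version variables that bash and zsh set for themselves.
current_interpreter() {
  if [ -n "${BASH_VERSION:-}" ]; then
    echo "bash ${BASH_VERSION}"
  elif [ -n "${ZSH_VERSION:-}" ]; then
    echo "zsh ${ZSH_VERSION}"
  else
    echo "other POSIX sh (dash, busybox ash, ...)"
  fi
}

current_interpreter
```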
What else?
If entrypoint overrides continue to not work with entrypoint: [], consider creating a separate image tag which does not use a default entrypoint and can be run in CI/CD. This would be my recommended path unless it causes too much work. Debugging (and changing) Runner/Docker/Shell detection is hard … but at least I learned many new things today. Hope it helps you, too.
Thank you so much for your very comprehensive answer. I’m still groping towards an answer here.
I backed out many of my random attempts and most recently have the following confusing results (which can be inspected here: https://gitlab.common-lisp.net/asdf/asdf/-/pipelines/13459). There are three jobs here (not in order), two of which are of immediate interest:
The one for your test_entrypoint_override ( `https://gitlab.common-lisp.net/asdf/asdf/-/jobs/104732` ), which appears to run as expected:
Executing "step_script" stage of the job script 00:00
Using effective pull policy of [always] for container rpgoldman/sbcl:trixie
Using docker image sha256:e563a364588c035d89409ee7fa079aeea6b3a2b83da8712785541c6dcd2cde8b for rpgoldman/sbcl:trixie with digest rpgoldman/sbcl@sha256:27edae485ae5d2a9c507e7912e43f2a67243d71ac09181e8400c6c7bae810005 ...
Custom entrypoint worked
The one for ASDF build, which appears to show that overriding the entrypoint in the same way does not work ( https://gitlab.common-lisp.net/asdf/asdf/-/jobs/104730)
Executing "step_script" stage of the job script 00:01
Using effective pull policy of [always] for container rpgoldman/sbcl:trixie
Using docker image sha256:e563a364588c035d89409ee7fa079aeea6b3a2b83da8712785541c6dcd2cde8b for rpgoldman/sbcl:trixie with digest rpgoldman/sbcl@sha256:27edae485ae5d2a9c507e7912e43f2a67243d71ac09181e8400c6c7bae810005 ...
+ set -o
+ grep pipefail
+ set -o pipefail
+ set -o errexit
+ set +o noclobber
sh: 8: Syntax error: "do" unexpected
The second example appears to indicate that despite my entrypoint specification, the gitlab runner is using /bin/sh instead of /bin/bash.
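One possible reading of this log, offered as a hypothesis and not a confirmed diagnosis: if the runner starts the container with its own command of the form `sh -c DETECTION_SCRIPT` and feeds the job script on stdin (my understanding of the Docker executor, not verified against 17.6.0), then an entrypoint ending in `-c` receives that command as extra arguments. With `-c`, only the first following argument is the command string; the remaining ones become `$0`, `$1`, and so on. The sketch emulates this locally with plain `sh`; the function name and the strings are illustrative.

```shell
# Emulates: entrypoint ["/bin/sh", "-c"] followed by command ["sh", "-c", DETECTION_SCRIPT]
# The outer shell's command string is just "sh"; "-c" and the detection
# script become $0 and $1 of that command and are never executed.
simulate_dash_c_entrypoint() {
  # $1 stands in for the job script that arrives on stdin
  printf '%s\n' "$1" | sh -c sh -c 'echo detection-script-ran'
}

simulate_dash_c_entrypoint 'echo job-script-read-by-plain-sh'
# prints: job-script-read-by-plain-sh
```

If that is what happens in the job, a script generated for bash would end up being parsed by plain sh, which would be consistent with the `Syntax error: "do" unexpected` message.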
Here are the two stanzas from the yaml file
test_entrypoint_override:
  stage: build
  rules:
    - when: always
  image:
    name: rpgoldman/sbcl:trixie
    entrypoint: ["/bin/bash", "-c", "echo 'Custom entrypoint worked'"]
  script:
    - echo "If you see this, entrypoint override is allowed"
    - echo "Check above for 'Entrypoint override disabled' warning"
So there’s definitely something funny going on with the entrypoint.
I’ve spent hours on this, so I should probably just fix the docker images and give up on figuring out the runner, but it’s unsatisfactory not to know what’s going on here.
Ok, this aligns with what I saw - just ["/bin/bash"] confuses the Runner shell selector script, and it cannot detect the correct terminal. I did not look into the respective code / script parts but would assume we fall through a switch-case condition and hit a wrong default.
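For illustration, the kind of fall-through described here can be sketched as a chain of `[ -x ... ]` tests ending in a default that prints the error seen in the job log. This is a simplified picture based on my assumption of how such a detection works, not a copy of the Runner's script; `find_shell` and the candidate list are made up for the example.

```shell
# Print the first executable candidate, or the fallback error message.
find_shell() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "shell not found"
  return 1
}

find_shell /usr/local/bin/bash /usr/bin/bash /bin/bash /usr/bin/sh /bin/sh
```

Running a check like this inside the image shows which shell paths are present, which helps separate "no shell in the image" from "the detection never ran".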
I’ve spent hours on this, so I should probably just fix the docker images and give up on figuring out the runner, but it’s unsatisfactory not to know what’s going on here.
I’m sorry you had this experience, and would love to find something “solutionary” that maybe no one has found before. Or at least give others fighting this issue the sense that they are not alone.
Ask for you
I’m generally curious if [] works for your pipelines.
Thanks again. I will do that check. Right now I am completely bamboozled by what I’m seeing. I removed the entrypoint part of the configuration of “Build ASDF,” in preparation for using the updated docker files. I reran (inadvertently, before updating the required image). “Build ASDF” failed again, but that was to be expected.
What was not expected was that the following job – Build ASDF docs – also failed with the “shell not found” error.
It appears that configuration information from one yaml job can bleed into another. This is something I had seen before, but it seemed so odd that I just assumed that I had gotten confused, and I disregarded it. But here it is happening again. One job’s configuration somehow is affecting the following job.
This can be seen in the results here: https://gitlab.common-lisp.net/asdf/asdf/-/pipelines/13463 as contrasted with the results here: https://gitlab.common-lisp.net/asdf/asdf/-/pipelines/13461. In the first, “Build docs” failed, after other job configurations were modified. In the second, earlier run, “Build docs” succeeded. The “Build docs” job’s configuration did not change between the two runs, commit 490d6cdbc3118cfce8532f6f386fa5952d9c8118 succeeding and 7ed549f6ec1d7834a75365e16ce6a569ca9e8ff0 – its direct child – failing.
Note that the Build docs job not only is configured in a different stanza, it also uses a different Docker image!
Interesting find. This sounds like a problem with the Runner infrastructure - the parts we do not see as users.
Runner 17.6.0 is 1 year old, might have bugs.
We don’t know the config.toml, maybe the default shell is set, and other limitations apply
Which host system is used, caches, Docker versions, bugs in there.
If you can, involve an admin of that infrastructure to help investigate with settings, version, logs. And suggest upgrading to 18.5.x to match the same server major version.
Something has changed in that project: yesterday I was able to see these pipeline job logs without logging in, today access is limited (Pipeline #13463 · asdf / asdf · GitLab).