How does gitlab-runner create a docker container exactly?

I need to understand exactly how gitlab-runner creates the docker container when using the docker executor. I am asking because my CI image has systemd as its ENTRYPOINT, and starting it manually with docker run produces a correct initialization: the systemd “boot” process ends up at multi-user.target, also starting the units that manage the processes I’m interested in. Using the same image to run a CI job with gitlab-runner, however, ends with systemd hitting the rescue target and finally with a non-working system, since my units are not started. Here’s how I launch the container in the two cases:

sudo docker run -i --rm --name test --tmpfs /tmp:rw --tmpfs /run:rw -v /sys/fs/cgroup:/sys/fs/cgroup:ro --env container=docker  my-image:latest

and:

sudo gitlab-runner exec docker --docker-tmpfs /run:rw --docker-tmpfs /tmp:rw --docker-volumes "/sys/fs/cgroup:/sys/fs/cgroup:ro" --env container=docker my-build-job

where of course my-build-job uses my-image:latest.

I cannot understand what might differ in container creation between the two cases that makes systemd enter rescue mode when using gitlab-runner; no failed unit is reported, and inspecting the systemd logs and the docker containers gives no evidence of what might be wrong (I can post the logs if someone is interested in taking a look). So I decided to change approach and look for differences in the container creation commands, but I cannot find anything sufficiently detailed about gitlab-runner’s container instantiation.

Any help would be greatly appreciated. Using gitlab-runner 15.3.0 (bbcb5aba).

Hi @nicolamori

GitLab Runner with the Docker executor creates 2 linked containers, one helper container and one build container. The build container is where the (before_|after_)?script: part of your job is executed.
The runner expects that the image has no entrypoint, or that the entrypoint is prepared to start a shell command. The runner sends the script to the container’s stdin.
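For example, if a job did not need the entrypoint at all, it could be told to ignore it in .gitlab-ci.yml (standard GitLab CI syntax; the job and image names below are just placeholders):

my-build-job:
  image:
    name: my-image:latest
    entrypoint: [""]
  script:
    - echo "running without the image entrypoint"

That is of course not what you want here, but it illustrates what the runner expects from the entrypoint.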

You have not shared the content of your script: sections.


Hi @balonik, thanks for your answer. I think some clarification is needed. My container has this entrypoint:

ENTRYPOINT ["/usr/sbin/init"]

and an empty CMD. Set up this way, GitLab CI cannot work out of the box: its override of CMD has no effect (the entrypoint ignores it), so when the runner attaches to the container there is no shell to execute the CI job. So I created and enabled a systemd unit that starts a bash shell attached to the container’s stdio pipes:

ExecStart=/bin/bash -c "exec /bin/bash < /proc/1/fd/0 > /proc/1/fd/1 2>/proc/1/fd/2"

and this works: attaching to the container and sending commands results in those commands being routed to the stdin of the shell, which executes them. All of this works only because PID 1, i.e. systemd, does not read from stdin, so the whole stdin stream is captured by the shell.
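For reference, the whole unit looks roughly like this (the file name ci-shell.service is just a placeholder I use below; only the ExecStart line above is verbatim):

# /etc/systemd/system/ci-shell.service
[Unit]
Description=Shell attached to the container stdio for CI commands

[Service]
ExecStart=/bin/bash -c "exec /bin/bash < /proc/1/fd/0 > /proc/1/fd/1 2>/proc/1/fd/2"

[Install]
WantedBy=multi-user.target

and it is enabled at image build time with something like RUN systemctl enable ci-shell.service in the Dockerfile.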

Now, the issue with all of the above is that when gitlab-runner uses that image to create the build container, something goes wrong with the startup procedure, making systemd fall back to the rescue target, which does not start my shell unit since it is only wanted by the multi-user target. So when gitlab-runner attaches to the container and sends in the CI commands, there is no shell in the container to execute them. By the way, if at this point I enter the container and start the shell unit manually with systemctl, the CI commands start to be executed and the job terminates correctly. So everything is fine with the job itself, and as you now surely understand, the content of .gitlab-ci.yml is not relevant to the problem.
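To be concrete, the manual kick described above is just this, from the host (the container name is whatever the runner assigned, and ci-shell.service is the placeholder unit name from above):

sudo docker exec -it <runner-build-container> systemctl start ci-shell.service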

The image is also OK: starting it with docker run, the startup process proceeds regularly down to the multi-user target. At that point, if I docker attach to the container and send in commands from the keyboard, they are correctly executed by the shell spawned by systemd through my custom shell unit.

Instead, what might be relevant is how the container is started by gitlab-runner, i.e. the options it uses to start the container, since an option present there but not in my manual docker run invocation described above (or the other way round) might trigger the bad startup. I don’t know the internals of gitlab-runner, so I don’t know which mechanism it uses for creating a container (I’d guess the Docker API over the socket), but if I could obtain the docker CLI equivalent then I could compare it with my manual startup, find the differences, and then identify which one triggers the problem.
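One thing I might try is to put a logging proxy in front of the Docker socket and point the runner at it, so that the raw API requests become visible; a rough sketch, assuming socat is available and that gitlab-runner exec honours the --docker-host option:

# terminal 1: relay and log all Docker API traffic going through the proxy socket
sudo socat -v UNIX-LISTEN:/tmp/docker-proxy.sock,fork UNIX-CONNECT:/var/run/docker.sock

# terminal 2: run the job against the proxy socket instead of the real one
sudo gitlab-runner exec docker --docker-host unix:///tmp/docker-proxy.sock --docker-tmpfs /run:rw --docker-tmpfs /tmp:rw --docker-volumes "/sys/fs/cgroup:/sys/fs/cgroup:ro" --env container=docker my-build-job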

Sorry for the verbosity, I hope it’s a bit clearer now. For the time being I have worked around the problem by making my shell unit wanted also by the rescue target, and in this way I can run CI jobs without errors; but I would like to understand and possibly fix this problem.
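Concretely, the workaround is just to extend the [Install] section of the shell unit so that it is pulled in by the rescue target too (same placeholder unit name as above):

[Install]
WantedBy=multi-user.target rescue.target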

gitlab-runner makes API calls through the official Docker client Go library, so it’s not easy to translate those calls into CLI arguments. You can dig into its source code here.
But running docker inspect should give you enough details about how the container was started.
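For example, dumping both containers and diffing the normalized JSON should make any difference stand out (a sketch, assuming jq is installed; the runner’s build container name is whatever docker ps shows while the job is running):

sudo docker inspect test > manual.json
sudo docker inspect <runner-build-container> > runner.json
diff <(jq -S . manual.json) <(jq -S . runner.json)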

Ok, thanks a lot. As I wrote in my initial post, I already inspected the containers but found no relevant difference. But there must be something odd at play here, since when I launch the CI job by committing and pushing to the GitLab repository, the container is started and the systemd initialization sequence finishes with multi-user.target without falling back to rescue.target. So it seems to be exclusively an issue with launching gitlab-runner manually from the CLI, but it’s way too hard for me to debug, and since the regular CI seems to work correctly without workarounds, I guess I’ll live with this.