Describe your question in as much detail as possible:
TL;DR: CI/CD pipelines using GitLab Runner succeed only while user `gitlab-runner` is logged into a shell on the box. Otherwise, they all fail.
At work we are running GitLab 15.7.5-ee, but we don't have any GitLab Runners defined, and I wanted to leverage the CI/CD pipeline features. So I followed the instructions in the GitLab docs to install a GitLab Runner on an RHEL8 server we have. I added the GitLab repo, installed `gitlab-runner`, and registered it to a test project repository I set up to iron out any issues. All this went flawlessly. GitLab sees the runner, shows green, etc.
My intention is to use Podman instead of Docker on the RHEL8 box, so that is all installed and configured, and I've run some basic containers to verify things are working at a basic level. GitLab Runner is configured according to the Docker executor docs, which cover this. It is set to run under user `gitlab-runner` (which it set up when it installed). The `config.toml` file is adjusted to point at the Podman UNIX socket file. User `gitlab-runner` is added to `/etc/subuid` and `/etc/subgid`, etc.
For now all I have in the repo itself is a `README.md` and a `.gitlab-ci.yml` file. The contents of the latter are the default template: just a few stages that all pretty much just execute `echo` commands.
However, when I try to run a pipeline (for example, by editing the `.gitlab-ci.yml` and saving it to trigger a run), the stages fail with:
Running with gitlab-runner 15.8.2 (4d1ca121)
on ockness.net.unc.edu U8sxgA3a, system ID: s_4f3a2b84f82c
feature flags: FF_NETWORK_PER_BUILD:true
Preparing the "docker" executor 00:09
ERROR: Failed to remove network for build
ERROR: Preparation failed: Cannot connect to the Docker daemon at unix:///run/user/984/podman/podman.sock. Is the docker daemon running? (docker.go:753:0s)
Will be retried in 3s ...
failing after 3 tries.
Now I have Google-fu'd my way to finding multiple references to this error, most of which were folks trying to simply use Docker with GitLab Runner and not Podman. So the fixes described (mostly making sure Docker was installed) don't apply.
Now the oddest part is I have tracked this down to something rather specific. That is, if all I do is log in to the RHEL8 box as user `gitlab-runner` (which I had to do initially, following the instructions, in order to enable the socket feature, etc.) and just stay logged in and repeat the above to trigger a pipeline run, everything succeeds! If I log back out of `gitlab-runner` and repeat, it fails once again.
As best as I can figure, when GitLab Runner tickles the UNIX socket, which should trigger bringing up the Podman service, it doesn't. But again, if user `gitlab-runner` is logged into a shell (even doing nothing), then it all works.
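One way to test this hypothesis would be to check, while `gitlab-runner` is logged out, whether the socket file still exists and whether systemd is set to keep that user's services running without a login session (the `Linger` property reported by `loginctl`). The following is only a sketch: the function names are made up for illustration, and it assumes the socket lives under `/run/user/<uid>/podman/podman.sock` as in the error message above.

```shell
# Hypothetical helper: build the rootless Podman socket path for a given UID.
podman_socket_path() {
    printf '/run/user/%s/podman/podman.sock\n' "$1"
}

# Hypothetical helper: report whether the socket exists for a user and
# what systemd says about lingering for that user.
check_runner_socket() {
    local user="$1" uid sock
    uid="$(id -u "$user")" || return 1
    sock="$(podman_socket_path "$uid")"
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
    else
        echo "socket missing: $sock"
    fi
    # loginctl typically errors out when the user has no session and is not
    # lingering, so fall back to a plain message in that case.
    loginctl show-user "$user" --property=Linger 2>/dev/null \
        || echo "Linger status unavailable (no user manager running for $user?)"
}

# Example (run as root, once while logged in as gitlab-runner and once after logging out):
# check_runner_socket gitlab-runner
```

If the socket is reported missing only after logout, that would suggest the per-user systemd instance (and with it `podman.socket`) is being torn down together with the login session, rather than GitLab Runner itself misbehaving.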
I cannot find anything in the logs to indicate what the issue here is. But I'm hoping someone else has hit on this and can point me in the right direction.
What are you seeing, and how does that differ from what you expect to see?
I am seeing every pipeline fail when I expect to see them succeed.
Consider including screenshots, error messages, and/or other helpful visuals
See Above.
What version are you on? Are you using self-managed or GitLab.com?
- GitLab (Hint: `/help`): GitLab 15.7.5-ee
- Runner (Hint: `/admin/runners`): That doesn't work for me. I get a 404, but I'm only a user of the system, not an admin. The job log above reports gitlab-runner 15.8.2.
Add the CI configuration from `.gitlab-ci.yml` and other configuration if relevant (e.g. `docker-compose.yml`)
stages:          # List of stages for jobs, and their order of execution
  - build
  - test
  - deploy

build-job:       # This job runs in the build stage, which runs first.
  image: alpine:latest
  stage: build
  script:
    - echo "Compiling the code..."
    - echo "Compile complete."

unit-test-job:   # This job runs in the test stage.
  image: alpine:latest
  stage: test    # It only starts when the job in the build stage completes successfully.
  script:
    - echo "Running unit tests... This will take about 60 seconds."
    - sleep 60
    - echo "Code coverage is 90%"

lint-test-job:   # This job also runs in the test stage.
  image: alpine:latest
  stage: test    # It can run at the same time as unit-test-job (in parallel).
  script:
    - echo "Linting code... This will take about 10 seconds."
    - sleep 10
    - echo "No lint issues found."

deploy-job:      # This job runs in the deploy stage.
  image: alpine:latest
  stage: deploy  # It only runs when *both* jobs in the test stage complete successfully.
  environment: production
  script:
    - echo "Deploying application..."
    - echo "Application successfully deployed."
What troubleshooting steps have you already taken? Can you link to any docs or other resources so we know where you have been?
Oof. Where to begin? OK, I have tried everything from rebooting the RHEL8 box (no change) to adjusting the `config.toml` file with various parameter changes and more. I can consistently make the pipeline succeed if I simply log in to the `gitlab-runner` account and then go over to GitLab and re-run a pipeline job. If I log out and do it again, it fails. Again, it feels like GitLab Runner simply isn't starting up properly if GitLab triggers a CI/CD pipeline and that user isn't logged in.
As for docs/resources, for starters:
- Docker executor | GitLab
- Install GitLab Runner | GitLab
- Install GitLab Runner using the official GitLab repositories | GitLab (the RHEL bits)
Finally, Some (Sanitized) Notes I Took of the process
# Add official GitLab repository
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.rpm.sh" | sudo bash
# Install GitLab Runner
sudo dnf install gitlab-runner
# Register runner using info provided in GitLab under Settings | CICD, then expanding 'Runners', which includes a URL and token
sudo gitlab-runner register
# For CentOS/etc. you can simply do this
# yum install podman
# For RHEL8, to install Podman, do this which adds Podman and related tools
sudo dnf module enable -y container-tools:rhel8
sudo dnf module install -y container-tools:rhel8
sudo dnf install podman-docker # to get Docker CLI
sudo dnf install podman-plugins # for network aliasing
GitLab Runner adds a user `gitlab-runner` to the system. We need to make sure this user is configured properly to run GitLab Runner, so we do the following:
# As root, we set the password on this account so we can login
sudo passwd gitlab-runner
# Next we SSH into this account so we can set things up, using the password from the previous step
ssh gitlab-runner@localhost
# Next we enable and start the Podman socket (very important)
systemctl --user --now enable podman.socket
# And we verify things are working
systemctl status --user podman.socket
# The above command also provides us with that path to the socket we need (e.g., `/run/user/<#>/podman/podman.sock`) which we'll use in the next step
Modify `/etc/gitlab-runner/config.toml`:
...
executor = "docker"
# ADD THE FOLLOWING LINES TO CREATE NETWORK FOR EACH JOB
[runners.feature_flags]
FF_NETWORK_PER_BUILD = true
...
[runners.docker]
# ADD THE FOLLOWING LINE SO GitLab Runner knows where to find Podman
host = "unix:///run/user/<#>/podman/podman.sock"
...
sudo gitlab-runner restart
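
After the restart, a quick sanity check is to confirm that something actually answers on the socket that `config.toml` points at. The sketch below is illustrative only: the helper names are hypothetical, and it assumes `curl` is installed and that Podman's Docker-compatible API answers a `/_ping` request on that socket. Because `/run/user/<uid>` is normally readable only by its owner, it would need to run as `gitlab-runner` or root.

```shell
# Hypothetical helper: build the host URI used in config.toml from a UID.
runner_docker_host() {
    printf 'unix:///run/user/%s/podman/podman.sock\n' "$1"
}

# Hypothetical helper: strip the unix:// scheme to get the filesystem path.
socket_path_from_uri() {
    printf '%s\n' "${1#unix://}"
}

# Ping the Docker-compatible API over the socket and print the HTTP status
# code (curl prints 000 if it cannot connect at all).
ping_socket() {
    local path
    path="$(socket_path_from_uri "$1")"
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
         --unix-socket "$path" http://localhost/_ping
}

# Example:
# ping_socket "$(runner_docker_host "$(id -u gitlab-runner)")"
```

Running this once while `gitlab-runner` is logged in and once after logging out should show whether the socket itself stops answering, independent of any pipeline run.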
Thanks for taking the time to be thorough in your request, it really helps!