GitLab Runner fails unless I am logged into gitlab-runner account

Describe your question in as much detail as possible:

TL;DR: Only if user gitlab-runner is logged into a shell on the box, CI/CD pipelines using GitLab Runner succeed. Otherwise, they all fail.


At work we are running GitLab 15.7.5-ee. But we don’t have any GitLab Runners defined, and I wanted to leverage the CI/CD pipeline features. So I followed the instructions in the GitLab docs to install a GitLab Runner on an RHEL8 server we have. I added the GitLab repo, installed gitlab-runner, and registered it to a test project repository I set up to iron out any issues. All this went flawlessly. GitLab sees the runner, shows it green, etc.

My intention is to use Podman instead of Docker on the RHEL8 box, so that is all installed and configured, and I’ve run some basic containers to verify things are working at a basic level. GitLab Runner is configured according to the Docker executor docs, which cover this. It is set to run under the user gitlab-runner (which it set up during installation). The config.toml file is adjusted to point at the Podman UNIX socket file. User gitlab-runner is added to /etc/subuid and /etc/subgid, etc.

For now, all I have in the repo itself is a README.md and a .gitlab-ci.yml file. The contents of the latter are the default: just a few stages that pretty much only execute echo commands.

However, when I try to run a pipeline (for example, by editing .gitlab-ci.yml and saving it to trigger one), the stages fail with:

Running with gitlab-runner 15.8.2 (4d1ca121)
  on ockness.net.unc.edu U8sxgA3a, system ID: s_4f3a2b84f82c
  feature flags: FF_NETWORK_PER_BUILD:true
Preparing the "docker" executor 00:09
ERROR: Failed to remove network for build
ERROR: Preparation failed: Cannot connect to the Docker daemon at unix:///run/user/984/podman/podman.sock. Is the docker daemon running? (docker.go:753:0s)
Will be retried in 3s ...

The job fails after 3 tries.

Now, I have Google-fu’d my way to multiple references to this error, but most of them were from folks trying to use Docker with GitLab Runner, not Podman. So the fixes described (mostly making sure Docker was installed) don’t apply.

Now the oddest part is that I have tracked this down to something rather specific. If all I do is log in to the RHEL8 box as user gitlab-runner (which I had to do initially, following the instructions, in order to set up the socket, etc.), stay logged in, and repeat the above to trigger a pipeline run, everything succeeds! If I log back out of gitlab-runner and repeat, it fails once again.

As best I can figure, when GitLab Runner tickles the UNIX socket, which should bring up the Podman service, it doesn’t. But again, if user gitlab-runner is logged into a shell (even doing nothing), then it all works.

I cannot find anything in the logs to indicate what the issue here is. But I’m hoping someone else has hit on this and can point me in the right direction.

What are you seeing, and how does that differ from what you expect to see?

I am seeing every pipeline fail when I expect to see them succeed.

Consider including screenshots, error messages, and/or other helpful visuals

See Above.

What version are you on? Are you using self-managed or GitLab.com?

  • GitLab (Hint: /help):

GitLab 15.7.5-ee

  • Runner (Hint: /admin/runners):

That doesn’t work for me. I get a 404. But I’m only a user of the system, not an admin.

Add the CI configuration from .gitlab-ci.yml and other configuration if relevant (e.g. docker-compose.yml)

stages:          # List of stages for jobs, and their order of execution
  - build
  - test
  - deploy

build-job:       # This job runs in the build stage, which runs first.
  image: alpine:latest
  stage: build
  script:
    - echo "Compiling the code..."
    - echo "Compile complete."

unit-test-job:   # This job runs in the test stage.
  image: alpine:latest
  stage: test    # It only starts when the job in the build stage completes successfully.
  script:
    - echo "Running unit tests... This will take about 60 seconds."
    - sleep 60
    - echo "Code coverage is 90%"

lint-test-job:   # This job also runs in the test stage.
  image: alpine:latest
  stage: test    # It can run at the same time as unit-test-job (in parallel).
  script:
    - echo "Linting code... This will take about 10 seconds."
    - sleep 10
    - echo "No lint issues found."

deploy-job:      # This job runs in the deploy stage.
  image: alpine:latest
  stage: deploy  # It only runs when *both* jobs in the test stage complete successfully.
  environment: production
  script:
    - echo "Deploying application..."
    - echo "Application successfully deployed."

What troubleshooting steps have you already taken? Can you link to any docs or other resources so we know where you have been?

Oof. Where to begin? OK, I have tried everything from rebooting the RHEL8 box (no change) to adjusting the config.toml file with various parameter changes and more. I can consistently make the pipeline succeed if I simply log in to the gitlab-runner account and then go over to GitLab and re-run a pipeline job. If I log out and do it again, it fails. Again, it feels like GitLab Runner simply isn’t starting up properly when GitLab triggers a CI/CD pipeline while that user isn’t logged in.

As for docs/resources, for starters,


Finally, Some (Sanitized) Notes I Took During the Process

# Add official GitLab repository
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.rpm.sh" | sudo bash

# Install GitLab Runner
sudo dnf install gitlab-runner

# Register the runner using the info provided in GitLab under Settings | CI/CD, then expanding 'Runners', which includes a URL and token
sudo gitlab-runner register

# For CentOS/etc. you can simply do this
# yum install podman

# For RHEL8, to install Podman, do this which adds Podman and related tools
sudo dnf module enable -y container-tools:rhel8
sudo dnf module install -y container-tools:rhel8
sudo dnf install podman-docker # to get Docker CLI
sudo dnf install podman-plugins # for network aliasing

GitLab Runner adds a user gitlab-runner to the system. We need to make sure this user is configured properly to run GitLab Runner, so we do the following:

# As root, we set the password on this account so we can log in
sudo passwd gitlab-runner

# Next we SSH into this account so we can set things up, using the password from the previous step
ssh gitlab-runner@localhost

# Next we enable and start the Podman socket (very important)
systemctl --user --now enable podman.socket

# And we verify things are working
systemctl status --user podman.socket

# The above command also provides us with the path to the socket we need (e.g., `/run/user/<#>/podman/podman.sock`), which we'll use in the next step

Modify /etc/gitlab-runner/config.toml:

...
  executor = "docker"
  # ADD THE FOLLOWING LINES TO CREATE NETWORK FOR EACH JOB
  [runners.feature_flags]
    FF_NETWORK_PER_BUILD = true
...
  [runners.docker]
    # ADD THE FOLLOWING LINE SO GitLab Runner knows where to find Podman
    host = "unix:///run/user/<#>/podman/podman.sock"
...
sudo gitlab-runner restart
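The `<#>` placeholder in the host line above is the numeric UID of the gitlab-runner account. As a minimal sketch (the function name `podman_host_line` is hypothetical, not part of GitLab Runner or Podman, and it assumes the usual /run/user/<UID> runtime directory), that line can be generated with `id -u` from any account instead of being read off the `systemctl status` output:

```shell
#!/bin/sh
# Hypothetical helper: print the [runners.docker] host line for a user's
# rootless Podman socket, assuming the standard /run/user/<UID> runtime dir.
podman_host_line() {
  user="${1:-gitlab-runner}"
  uid=$(id -u "$user") || return 1   # fails if the user does not exist
  printf 'host = "unix:///run/user/%s/podman/podman.sock"\n' "$uid"
}
```

On the box in this thread, `podman_host_line gitlab-runner` would be expected to print the line with UID 984 in it.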


Just out of interest, wouldn’t it be easier to just start the podman systemd service and then use /var/run/podman/podman.sock?

I did, however, find this issue, which suggests some other changes that would need to be made to the runner config, e.g., the image URL and the privileged option: Podman as a drop-in replacement for Docker in GitLab Runner Docker Executor on Linux - Beta (#29108) · Issues · GitLab.org / gitlab-runner · GitLab

Hey @iwalker,

To be clear, I followed the instructions here: Docker executor | GitLab

which, in steps #5-6, only say to run

systemctl --user --now enable podman.socket
systemctl status --user podman.socket

which I did. So I followed those steps, installing GitLab Runner (#1). In step #2, where they tell you to sign into the gitlab-runner account using SSH, I first had to set a password on that account (created when I installed GitLab Runner in step #1). Then I made sure to adjust /etc/subuid and /etc/subgid for that user (#3). Podman was already installed, since I’d used the RHEL8-specific instructions (as can be seen in my notes) (#5). I followed steps #6-8 and adjusted the /etc/gitlab-runner/config.toml file with the appropriate socket string.

So I’m not exactly sure what you mean when you say

Just out of interest, wouldn’t it be easier to just start the podman systemd service and then use /var/run/podman/podman.sock

Did I miss something in the instructions? Or are there steps not documented that should be?

Since you are having problems trying to run it as a user, I suggested doing what would normally be done with Podman: enable it system-wide, like so:

su root
systemctl enable podman.socket
systemctl restart podman

Then you will see /var/run/podman/podman.sock, and I believe you could use it no matter what the user is. If that works, it will at least confirm that the problem is user-specific and not system-wide. Unless, of course, it doesn’t work with the system-wide method either.

TL;DR: Tried your steps, and that works. But doesn’t explain why the user level approach doesn’t.

Since you are having problems trying to run it as a user

But that’s just it. I am not having problems running Podman as a user. If I log into the RHEL8 box with my own account, I can spin up containers with ease. If I login as user gitlab-runner, same thing. I can spin up containers.

But if I create a GitLab CI/CD pipeline, as in the example given above, then it fails.

If I look in /home/gitlab-runner/, I see the .config subdirectory, which contains

[gitlab-runner@[server] .config]$ tree
.
├── cni
│   └── net.d
β”‚       └── cni.lock
└── systemd
    └── user
        └── sockets.target.wants
            └── podman.socket -> /usr/lib/systemd/user/podman.socket

So if I understand correctly, when I did step #5 (systemctl --user --now enable podman.socket), that likely created all this. And I’m guessing, based on my experience, that enabling the Podman socket with the --user option is what makes this a user-level service of sorts. So is the idea here (vs. your typical system-wide service that is always running) that this service only starts up when that particular user logs in, and stops when that user logs out? And IF that is the case, WHY exactly do GitLab’s own instructions say to do this if GitLab Runner never actually “logs in”, as it were? Because that is what it feels like is going on here.

I followed GitLab’s instructions to the letter, and I get the whole Docker daemon vs. daemonless approach of Podman and using a socket, etc. What I don’t get is why this isn’t working. And if I need to do as you say and enable this system-wide, why is that not in the documentation?

For completeness, here’s what I did following your instructions:

  1. Just to confirm nothing changed overnight, I re-ran the pipeline WITHOUT being logged in as gitlab-runner on the RHEL8 box. It failed.
  2. I simply logged into the RHEL8 box as gitlab-runner, then I re-ran the pipeline. It succeeded.
  3. I logged back out of the RHEL8 box, then I re-ran the pipeline. It failed. So thus far, all is behaving as before.
  4. I logged into my own account and did the steps you provided above:
su root
systemctl enable podman.socket
systemctl restart podman
  5. I re-ran the pipeline. It failed.
  6. I realized that with the socket enabled system-wide, the UNIX socket path is likely different from the user-level one. So I logged back into my account, switched to root, and ran systemctl status --user podman.socket. Got an error. Whoops, that’s not right. We need systemctl status podman.socket. So I ran that and got:
# systemctl status podman.socket
● podman.socket - Podman API Socket
   Loaded: loaded (/usr/lib/systemd/system/podman.socket; enabled; vendor preset: disabled)
   Active: active (listening) since Mon 2023-02-13 08:36:52 EST; 6min ago
     Docs: man:podman-system-service(1)
   Listen: /run/podman/podman.sock (Stream)
    Tasks: 0 (limit: 23653)
   Memory: 0B
   CGroup: /system.slice/podman.socket

Feb 13 08:36:52 ockness.net.unc.edu systemd[1]: Listening on Podman API Socket.

Ahh. So now I have to modify /etc/gitlab-runner/config.toml to point to that socket file instead.

  7. Edited /etc/gitlab-runner/config.toml to have:
...
  [runners.docker]
    host = "unix:///run/podman/podman.sock"
...
  8. Ran gitlab-runner restart as root to reload it, then exited root.
  9. Re-ran the pipeline. It succeeded. So yes, system-wide it seems to be working (which I had already figured would be the case, given that I could spin up containers as either my own user or as the gitlab-runner user). But…
  10. I switched back to root, an account I have never used with Podman for anything, and ran podman image ls. Sure enough, I now see the gitlab-runner images, along with the alpine image, in the root account.

Have I not, in essence, now simply made Podman run as root? I thought the whole point of Podman is that it can run rootless.

And again, I circle back to the GitLab docs. They clearly indicate setting things up to run GitLab Runner as a regular user. So why is that not working? I cannot for the life of me figure out what’s going on. This is an extremely stock build with next to nothing on it. We just built this VM recently, as our intention is to use it to host a few out-of-band apps for us.

Anyway, if you know of any logs/commands/steps I can do to track down where the trouble is, I’d appreciate it. Because if I run it like this, I’m really no better off than using Docker itself. And I can’t hammer this point home hard enough: the GitLab docs are written as if this should be working. So I’m left scratching my head. (I really don’t like it when I can’t figure out why something’s not working right. It just gnaws at me.)

OK, so I’ve kept digging, and your help makes me think there’s something off with the gitlab-runner user. Mind you, this user was created by the GitLab Runner package itself; I never created it. (Again, from Settings | CI/CD, after expanding the Runners section, I simply followed the instructions here: Install GitLab Runner | GitLab

specifically the link near the top of that page: Install GitLab Runner using the official GitLab repositories | GitLab

and did steps #1, #2, and #4, after which I saw the GitLab Runner show up green in GitLab’s settings, as they say it should.)

Anyway, back to the gitlab-runner user.

  • There is nothing unusual about the home directory /home/gitlab-runner.
  • Folder permissions match other users, etc.
  • Comparing the home directory files to my own, I do notice that gitlab-runner has no .bash_logout, .bash_profile, or .bashrc files, but otherwise looks identical.

Now one oddity I found is that when I look at /etc/passwd, I see that for pretty much every user, their UID and GID numbers match. For example, root is 0, my account is 1005, etc. The ONE exception is that gitlab-runner user, which apparently was created by the GitLab package installer. For that user, I see

gitlab-runner:x:984:979:GitLab Runner:/home/gitlab-runner:/bin/bash

Nothing wrong with it per se. If I check /etc/group, sure enough I find

gitlab-runner:x:979:

So that all lines up.

It’s just interesting that they don’t match, as it makes me wonder what process the GitLab Runner .rpm uses to create the gitlab-runner account during installation.

Earlier I had noticed that each of our user accounts (e.g., the account I normally log in with) had already been added to /etc/subuid and /etc/subgid with corresponding numbers. For example (sanitized, of course):

# cat /etc/subuid
user1:100000:65536
user2:165536:65536
user3:231072:65536
user4:296608:65536
user5:362144:65536
user6:427680:65536
gitlab-runner:493216:65536

But I had to add that gitlab-runner line to each file manually and work out the starting ID myself.
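Doing that math by hand is error-prone. A minimal sketch, assuming the ranges in the file were allocated sequentially and do not overlap, and that /etc/subuid and /etc/subgid are kept in sync as they are here (both function names are hypothetical): take the highest start + count in the file as the next starting ID. On the sanitized file above (before the gitlab-runner line), that gives 427680 + 65536 = 493216. Recent shadow-utils versions of `usermod` can also write these entries with `--add-subuids FIRST-LAST` and `--add-subgids FIRST-LAST`; check `man usermod` on your system before relying on that.

```shell
#!/bin/sh
# Hypothetical helper: print the first subordinate ID after the highest
# range in a subuid/subgid-style file (lines of name:start:count).
next_subid_start() {
  awk -F: 'NF == 3 { end = $2 + $3; if (end > max) max = end }
           END { print max + 0 }' "$1"
}

# Print (not run) a usermod command allocating 65536 IDs for a user, so the
# range can be reviewed first. Assumes subuid and subgid use the same ranges.
suggest_usermod() {
  user="$1"; file="${2:-/etc/subuid}"
  start=$(next_subid_start "$file") || return 1
  last=$((start + 65535))
  echo "usermod --add-subuids $start-$last --add-subgids $start-$last $user"
}
```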

Anyway, just what I’m finding so far.

Hi,

I’m assuming that the user-id provided in the config file matches the gitlab-runner user? Specifically:

    host = "unix:///run/user/<#>/podman/podman.sock"

the <#> part? If so, then I cannot think why it’s not working. If all the other config options match the docs you used, then it should just work.

Correct. The user-id matches.

Mind you, the only way I’ve found to verify this is to log in as gitlab-runner and run the command

systemctl status --user podman.socket

from there. If I run that command from my own account or as root, it doesn’t show up in the output, but I suspect that is to be expected for a user-level service.

But yes, the # matches in the config. When we did your test of running the Podman socket system-wide, I had to change that line in the config to just

host = "unix:///run/podman/podman.sock"

But after reverting things (as I’d really like to get this working properly at the user level), the behavior is the same as before. If user gitlab-runner is not logged in, pipelines fail. But if I just log in as that user and do nothing else, pipelines succeed. To me this seems like a bug in GitLab Runner, but I don’t know what to do here.

One last thing. When I went into Settings | CI/CD and expanded the Runners section to provide the proper URL in a previous post, I noticed this:

I was working off that first link. But then I realized they also had a button further below which seemed to be about the same thing. Odd I thought. So I clicked the button. And this is what pops up:

Now those latter instructions appear to be if you install the GitLab Runner’s binary manually and intend to manage/update it yourself. I had used the repo approach so that updates would be performed during usual maintenance automatically. But it’s odd they should have the same sort of info twice, once as a link to a doc page, then as a button with a popup. But that’s likely just a UI issue.

Digging more, I find there is no /usr/local/bin/gitlab-runner file; in my setup, it’s located in /usr/bin instead. I’m guessing that’s just a difference between the .rpm and a manual install. Otherwise, I can’t find anything to indicate something amiss. I was hoping there might be some permissions step that the .rpm didn’t do properly, but so far I can’t find anything. And to be clear, I can run the gitlab-runner binary from any account, so it’s not a PATH issue or anything. (I thought maybe the lack of Bash config files in the gitlab-runner account was causing something. But if I log in as gitlab-runner, I can run that command. I can run podman commands. Heck, I can spin up containers.)

And you verified this file exists?

unix:///run/user/984/podman/podman.sock

And that the path is correct with UID 984, including the socket file itself?

Correct.

The file /run/user/984/podman/podman.sock exists IF I am logged in manually as gitlab-runner.
It does NOT exist if that user is not logged in. If, for example, I am simply in my own account, the entire subtree /run/user/984/ does not exist.
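That pattern is consistent with how systemd-logind manages /run/user/<UID>: it is the user’s runtime directory, created at login and removed once the user’s last session ends (unless lingering is enabled), and the user-level podman.socket lives inside it. A minimal sketch for checking from a script whether the socket is actually there (the function name is hypothetical; `[ -S ]` is the standard shell test for a socket file). Since /run/user/<UID> is only readable by its owner, run it as root or as gitlab-runner, otherwise it will report the socket as missing.

```shell
#!/bin/sh
# Hypothetical helper: report whether a user's rootless Podman socket exists.
# Run as root or as the user itself; /run/user/<UID> is private to its owner.
# An optional second argument overrides the socket path (useful for testing).
podman_socket_status() {
  uid=$(id -u "${1:-gitlab-runner}") || return 2
  sock="${2:-/run/user/$uid/podman/podman.sock}"
  if [ -S "$sock" ]; then
    echo "present: $sock"
  else
    echo "missing: $sock"
    return 1
  fi
}
```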

Logged in as gitlab-runner, I see

[gitlab-runner@server ~]$ tree /run/user
/run/user
├── 0 [error opening dir]
├── 1005 [error opening dir]
└── 984
    ├── bus
    ├── podman
    │   └── podman.sock
    └── systemd
        ├── notify
        └── private

(1005 is my account’s ID)

I guess this is typical: you can see your own account’s socket files but not those of other users.

If I then log out of gitlab-runner and do the same immediately after from my own account I see

$ tree /run/user
/run/user
├── 0 [error opening dir]
├── 1005
│   ├── bus
│   ├── containers
│   │   ├── overlay
│   │   │   ├── metacopy()-false
│   │   │   ├── native-diff()-true
│   │   │   └── overlay-true
│   │   ├── overlay-containers
│   │   ├── overlay-layers
│   │   │   └── mountpoints.lock
│   │   └── overlay-locks
│   ├── dbus-1
│   │   └── services
│   ├── libpod
│   │   └── tmp
│   │       ├── alive
│   │       ├── alive.lck
│   │       ├── events
│   │       │   ├── events.log
│   │       │   └── events.log.lock
│   │       ├── exits
│   │       └── pause.pid
│   └── systemd
│       ├── notify
│       ├── private
│       └── transient
│           └── podman-pause-5a39cc6f.scope
└── 984 [error opening dir]

Notice 984 still exists, but I can’t open it from my account. Again, I assume this to be normal.

But when I do it again (likely more than 5 seconds after logging out of gitlab-runner), I see

$ tree /run/user
/run/user
├── 0 [error opening dir]
└── 1005
    ├── bus
    ├── containers
    │   ├── overlay
    │   │   ├── metacopy()-false
    │   │   ├── native-diff()-true
    │   │   └── overlay-true
    │   ├── overlay-containers
    │   ├── overlay-layers
    │   │   └── mountpoints.lock
    │   └── overlay-locks
    ├── dbus-1
    │   └── services
    ├── libpod
    │   └── tmp
    │       ├── alive
    │       ├── alive.lck
    │       ├── events
    │       │   ├── events.log
    │       │   └── events.log.lock
    │       ├── exits
    │       └── pause.pid
    └── systemd
        ├── notify
        ├── private
        └── transient
            └── podman-pause-5a39cc6f.scope

So it appears that ~5 seconds after logging out of gitlab-runner, the socket file is fully removed. Again, this seems like expected behavior to me in a Podman environment. But maybe I misunderstand things.

Aaargh. This freakin’ editor lost my last post. Let me try again.

Hi,

See this post: [Solved] How to Auto-starting rootless pods using systemd - Red Hat Customer Portal

Use the enable-linger option for the service to remain on logout.

So:

loginctl enable-linger gitlab-runner

This also means that when the server reboots, the user’s services start automatically even if the user is not logged in. I’ve just done this, and now the user-level Podman service/socket remains after logout.
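To confirm lingering is actually enabled for an account (for example, after a reboot), `loginctl show-user gitlab-runner --property=Linger` should report `Linger=yes`. `loginctl enable-linger` records the setting as a file named after the user in /var/lib/systemd/linger, so a script can also just test for that file, which works even when the user has no session. A minimal sketch (the function name is hypothetical; the directory argument exists only so it can be tested):

```shell
#!/bin/sh
# Hypothetical helper: succeed if lingering is enabled for a user, based on
# the flag file systemd-logind keeps in /var/lib/systemd/linger/<user>.
linger_enabled() {
  user="${1:-gitlab-runner}"
  dir="${2:-/var/lib/systemd/linger}"
  [ -e "$dir/$user" ]
}

# Example (commented out; changing the setting needs root):
# linger_enabled gitlab-runner || sudo loginctl enable-linger gitlab-runner
```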


YEEEESSSSSSS!!!

Thank you SO much, @iwalker ! You have no idea how happy this makes me.

By simply doing a

sudo loginctl enable-linger gitlab-runner

from my account (that’s where I was, so it was just easier) and then re-running the pipeline, all is working!!

Now just to get GitLab to update their docs (or modify their .rpm to handle this for users).

But seriously, THANK YOU. Your help is VERY much appreciated. (Yet again why I so love using FLOSS where I can. Statistically I find the devs and user communities tend to provide so much better support than dealing with purely proprietary offerings.)


And I have confirmed this resolves things. I rebooted the server on which GitLab Runner was installed, then, without touching that server, re-ran the pipeline, and everything is now working as expected.

So yes, this resolves it. Amazing. In the end, it was a single command line to fix this. (sigh) Ah, the joys of tech, right? 🙂

I definitely think this needs to be in the GitLab docs, though, as I can’t imagine others haven’t run into this. (Actually, my suspicion is that most who hit this just gave up on Podman and went and ran Docker. That was going to be my fallback. But I really wanted to stay on Podman for multiple reasons: avoiding potential future Docker license changes and, key, the security aspect of running rootless. Hence my persistence.)

Thank God I found your post; I have been stuck on this for three weeks.

The GitLab docs must be updated. What a waste of time; I’m sure tons of people are switching to Docker because of this problem.

Thanks


Hello, thank you for your contribution.
By “my account”, do you mean any privileged admin account?

For me, the problem was that I was entering the user ID of my administrator account instead of the UID of the automatically created gitlab-runner account in:

unix:///run/user/<###>/podman/podman.sock
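A quick way to catch that mistake is to compare the number in the host line of /etc/gitlab-runner/config.toml with the UID of the account the runner actually uses. A minimal sketch, assuming a single `host = "unix:///run/user/<UID>/..."` line in the file (the function name is hypothetical, and the parsing is a plain sed match, not a real TOML parser; a system-wide /run/podman/podman.sock host line is reported as "not found"):

```shell
#!/bin/sh
# Hypothetical helper: check that the rootless socket UID in config.toml
# matches the runner account's UID. Usage: check_host_uid [user] [file]
check_host_uid() {
  user="${1:-gitlab-runner}"
  conf="${2:-/etc/gitlab-runner/config.toml}"
  want=$(id -u "$user") || return 2
  # Extract the digits from a line like: host = "unix:///run/user/984/podman/podman.sock"
  got=$(sed -n 's|^[[:space:]]*host[[:space:]]*=[[:space:]]*"unix:///run/user/\([0-9][0-9]*\)/.*|\1|p' "$conf" | head -n 1)
  if [ -z "$got" ]; then
    echo "no rootless /run/user/<UID> host line found in $conf"
    return 1
  elif [ "$got" = "$want" ]; then
    echo "ok: host uses UID $got ($user)"
  else
    echo "mismatch: host uses UID $got, but $user is UID $want"
    return 1
  fi
}
```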

Thank you so much for this!