Gitlab-runner as non-root does not clone repo

I’m trying to use gitlab-runner, configured with “shell” executor, and started manually with “gitlab-runner run” as a non-root user. My job fails with an error:

Running with gitlab-runner 11.5.0 (3afdaba6)
  on my-ci-runner 0078f850
Using Shell executor...
Running on freia013...
bash: line 59: cd: /home/chah/builds/0078f850/0/chah/glr-test: No such file or directory
ERROR: Job failed: exit status 1

It created …/chah/glr-test.tmp and wrote the CI_SERVER_TLS_CA_FILE but the glr-test directory was never created. In fact, AFAICT using strace, there was no attempt to clone the repository.

Have I missed some configuration? Should I even expect gitlab-runner to work in this mode? Any help appreciated.

Would it be possible to see your .gitlab-ci.yml file?

Here is the .gitlab-ci.yml:

mystage:
  tags:
    - mytag
  script:
    - touch /tmp/chah-ci-run.flag
    - echo Done CI

My config.toml:

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "my-ci-runner"
  url = "https://git.(REDACTED)/"
  token = "(REDACTED)"
  executor = "shell"
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]

Other info:

  • The CI completes successfully for another commit where I changed the tags: so that one of the site’s shared docker runners was used.
  • Our site is running GitLab Community Edition 11.0.1

I don’t understand. The runner attempts to run a script, which at line 59 attempts to do

cd /home/chah/builds/0078f850/0/chah/glr-test

Who calls that script?

The script is generated internally by gitlab-runner and passed to bash. From unravelling my strace output, I can see that this script essentially performs two steps:

  1. Writes certificate data into …/glr-test.tmp/CI_SERVER_TLS_CA_FILE
  2. Tries to cd into …/glr-test (which does not exist)

Approaching the problem from a different angle, I tried running gitlab-runner on a different machine with the same config.toml and hey presto it works fine. This suggests that there is something in the environment of the original machine (maybe all the guff pulled in from /etc/profile by “bash --login”) that is causing gitlab-runner to misbehave.
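
One way to compare two machines is to list which of the files read by a bash login shell actually exist on each. The sketch below is a hypothetical helper (the name list_login_files is made up for illustration); the file names are the standard ones bash consults at login and logout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the startup and logout files that exist
# for a given home directory. Bash reads /etc/profile and then the first
# of ~/.bash_profile, ~/.bash_login, ~/.profile when a login shell starts,
# and ~/.bash_logout when a login shell exits.
list_login_files() {
  local home_dir=$1 f
  for f in /etc/profile \
           "$home_dir/.bash_profile" \
           "$home_dir/.bash_login" \
           "$home_dir/.profile" \
           "$home_dir/.bash_logout"; do
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}

list_login_files "$HOME"
```

Running this as the gitlab-runner user on both machines and comparing the output narrows down which files are worth reading.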

I found the problem! Nasty, bordering on evil, I’d say. So here goes:

  • The shell executor runs “bash --login”. Quite why it needs a login shell, I don’t know. Anyway…

  • The first such shell command it runs is in the Prepare stage, which looks pretty innocuous

    echo "Running on $(hostname)..."

  • That surely doesn’t fail, does it? Not in itself - but being a login shell, bash then executes a .bash_logout if it exists. Mine looked like this:

    pkill -f something_irrelevant
    true

  • The pkill doesn’t find any matching processes, so exits with status 1.

  • Because the generated script does “set -e” at the top, this causes bash to exit with non-zero status.

  • Although the Prepare stage failed, gitlab-runner seemingly carried on to the GetSources stage, but did not add any commands to the script that would fetch the sources. I looked at the latest source code for gitlab-runner; although I’m not familiar with Go, I don’t see how it can do that. Maybe the released binary was built from a significantly different source version.
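
The chain of events above can be reproduced outside gitlab-runner. This is a minimal sketch, not the runner's real generated script: it assumes a throwaway HOME directory, uses `false` as a stand-in for a pkill that matches nothing, and ends the piped script with an explicit `exit 0` so that the login shell runs .bash_logout on the way out.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; scratch_home and the piped script are
# illustrative stand-ins, not taken from gitlab-runner.
scratch_home=$(mktemp -d)

# Stand-in for the original .bash_logout: the first command fails,
# like a pkill that finds no matching process.
printf 'false\ntrue\n' > "$scratch_home/.bash_logout"

# Pipe a tiny script into a login shell. "set -e" is still in effect
# when the shell runs ~/.bash_logout during exit.
status=0
printf 'set -e\necho "Running on $(hostname)..."\nexit 0\n' \
  | HOME="$scratch_home" bash --login || status=$?

echo "login shell exit status: $status"
rm -rf "$scratch_home"
```

The piped script itself asks for `exit 0`, so any non-zero status reported here comes from the logout file rather than from the script.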

So my workaround was to comment out the pkill in my .bash_logout (it was historical cruft anyway). Putting “set +e” at the top of .bash_logout also works.
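
A third option, not tried in this thread, is to guard the failing command with `|| true`. The sketch below compares that variant with the “set +e” variant, again under assumptions: a throwaway HOME, a hypothetical helper named run_login_shell, and `false` standing in for the pkill so that running it cannot signal real processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write $1 to .bash_logout in a scratch HOME, run a
# small "set -e" script through a login shell, and return its exit status.
run_login_shell() {
  local scratch_home status=0
  scratch_home=$(mktemp -d)
  printf '%s\n' "$1" > "$scratch_home/.bash_logout"
  printf 'set -e\necho "job step"\nexit 0\n' \
    | HOME="$scratch_home" bash --login >/dev/null || status=$?
  rm -rf "$scratch_home"
  return "$status"
}

# Variant 1: guard the command that may fail (stand-in for pkill).
status_guarded=0
run_login_shell 'false || true' || status_guarded=$?

# Variant 2: switch errexit off at the top of .bash_logout.
status_set_plus_e=0
run_login_shell $'set +e\nfalse\ntrue' || status_set_plus_e=$?

echo "guarded: $status_guarded, set +e: $status_set_plus_e"
```

In a real .bash_logout the guarded line would read `pkill -f something_irrelevant || true`.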

Now reported as issue #3849.

OMG, I was fighting with this same issue for like 2 days. Except, my .bash_logout only had a clear in it (which I understand is not necessary). Obviously, I’m on corporate infrastructure that I did not provision and it has goofy stuff all over it.

I ran into the same issue, and set up a repo with runners to replicate it and work on a fix - see build here.
I also opened a merge request there to try to get the fix merged upstream; feel free to have a look and discuss it.

@colinhogben: Thank you! You saved me many hours tracking this down. I just removed .bash_logout and all is well!