Docker runner error when disabling umask

Problem to solve

I’ve set up two test GitLab Docker runners. Say I have a pipeline with jobs A → B → C. At first, job C was failing due to file permissions because GitLab clones everything as root, so I tried running job C as root, which made that job pass. After that one successful run, all jobs started getting this error:

Preparing environment
00:00
Running on runner-...
Getting source from Git repository
Fetching changes...
Reinitialized existing Git repository in ...
Checking out ... as detached HEAD (ref is ... file or directory
Cleaning up project directory and file based variables
env: can't execute 'python': No such file or directory
ERROR: Job failed: exit code 1

So I completely reinstalled gitlab-runner and registered a fresh runner. After this I also enabled the flag FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR, hoping that would help solve my problems. Now I had two fresh runners, so jobs A and B each started on one of them, and both passed. After this, job C gets the same error as above, as do A and B if I rerun them.

I have no idea why everything seems to work once, then stops. It is as if the first run leaves something behind that makes every later job fail. I am not able to reproduce this locally.

Running which python on the runners gives me /usr/bin/python, and python --version gives me Python 3.10.12

Steps to reproduce

Unfortunately I am not able to share this Dockerfile, and I do not know how to reproduce it outside of our pipeline. I have other pipelines using other Docker images set up, running on the same runners, and they do not have the same problem.

Configuration

I’ll dump some variables I have set in the Dockerfile in case something is messing it up.
FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR="true"
GIT_SUBMODULE_STRATEGY: recursive
GITLAB_ONLY_ISSUE_FROM_COMMIT_LINE: "true"
GIT_SUBMODULE_FORCE_HTTPS: "true"

Versions

  • [ x ] Self-managed
  • [ x ] Self-hosted Runners

  • GitLab Enterprise Edition v17.3.3-ee
  • GitLab Runner, CLI gitlab-runner 17.4.0
  • Ubuntu 22.04

Update:

If I set disable_cache = true, the jobs can run again. Of course, every job now has to create a fresh directory and clone the repo, adding ~3 min to every job, which is not great!
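
Since the jobs pass again once nothing is reused, one way to see what an earlier run left behind is to list the entries in the checkout that the current job user does not own. This is only a rough sketch: list_foreign is a made-up helper name, it assumes the job image has find, id and ls, and it falls back to the current directory when CI_PROJECT_DIR (GitLab's predefined variable for the checkout directory) is not set.

```shell
#!/bin/sh
# Read-only sketch: print entries in a directory tree that a given uid does not own.
# list_foreign is a hypothetical helper name, not a GitLab or gitlab-runner command.
list_foreign() {
    # $1 = directory to scan, $2 = numeric uid expected to own everything
    find "$1" -maxdepth 2 ! -user "$2" -exec ls -ld {} \;
}

# CI_PROJECT_DIR is predefined inside GitLab CI jobs; fall back to "." elsewhere.
list_foreign "${CI_PROJECT_DIR:-.}" "$(id -u)"
```

Empty output means everything within two levels of the checkout belongs to the user running the job; any listed entry shows its owner and mode.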

Here is the docker section of my config.toml, just the defaults I got when setting up the runner.

  [runners.docker]
    tls_verify = false
    image = "ruby:2.7"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = true
    volumes = ["/cache"]
    shm_size = 0
    network_mtu = 0

I would use the before_script to run some tests. Have you run any tests through the pipeline to confirm that the user, permissions, etc. are what you expect?

whoami
groups
ls -l /usr/bin/python
which python
python --version
Etc
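
Those checks could be bundled into one small script so that a missing tool does not abort the rest. Treat it as a sketch only: report is a made-up helper name, the script assumes the job image has a POSIX sh, and CI_PROJECT_DIR is GitLab's predefined variable for the checkout directory (the script falls back to the current directory when it is unset).

```shell
#!/bin/sh
# Diagnostic sketch for a job's before_script; it only prints, nothing is changed.
# report() is a hypothetical helper: it echoes the command, runs it, and never
# fails, so one missing tool does not hide the remaining output.
report() {
    printf '$ %s\n' "$*"
    "$@" 2>&1 || printf '(exit status %s)\n' "$?"
}

report whoami
report id
report umask
report sh -c 'command -v python'
report python --version
report ls -l /usr/bin/python
# CI_PROJECT_DIR is set by GitLab CI; use the current directory elsewhere.
report ls -ld "${CI_PROJECT_DIR:-.}"
```

Running the same script in the job image locally and in the pipeline gives two outputs that can be compared line by line.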
