Jobs seem to share their file system but should not


We have had a problem with our CI for some weeks: the file system seems to be shared across jobs. Here is a demonstration:

I created a .gitlab-ci.yml with simple jobs that fail if a file created by another job is present:

default:
  tags:
    - Linux
    - Docker

stages:
  - stress

# job names are illustrative; each job is named after the file it creates
stress0:
  stage: stress
  script:
    - pwd
    - touch stress0
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'

stress1:
  stage: stress
  script:
    - pwd
    - touch stress1
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'

stress11:
  stage: stress
  script:
    - pwd
    - touch stress11
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'

When I run them, some jobs fail. The log shows that a file from another job, stress2, is present:

$ pwd
$ touch stress3
$ ls -l stress*
-rw-r--r--    1 root     root             0 Oct 10 09:31 stress2
-rw-r--r--    1 root     root             0 Oct 10 09:31 stress3
$ [ "$(ls -l stress* | wc -l)" == "1" ]
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 1

Sometimes I also see this error (Git probably fails because of the shared file system):

Fetching changes...
Reinitialized existing Git repository in /builds/bt-lab-suite/bt-test-group/toruk/.git/
error: cannot lock ref 'refs/remotes/origin/stress-ci': is at 928071e153c9e69e6d97cecfc84758fa1756b854 but expected 3ce9bac8abd76f6c08bbd19e9d623784ad94f338
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1

The problem only appears when we use the following tags, which select our less powerful runners:

    - Linux
    - Docker

The configuration of these runners is as follows:

concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "srvrunnerlnx5"
  url = "http://HIDDEN"
  id = 227
  limit = 1
  token = "HIDDEN"
  token_obtained_at = 2023-09-04T11:37:39Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    image = "alpine"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0

I was convinced that each job’s file system was always isolated (inside its own Docker container), but maybe I’m wrong? Can you tell from my demonstration what could cause this file system sharing? Thanks!

PS: Note the result of the pwd command: /builds/my-company-name/my-group-name/my-project-name. The /builds folder looks like a shared folder.

The Docker executor caches the Git repository files so that it doesn’t have to do a full clone every time. The cache is a Docker volume that is mounted into the job containers. Since this volume is shared, you should not switch refs in your jobs.

Okay, thank you. That’s a weird default behavior, but okay. For now I can avoid the problem by manually specifying a build directory that contains the job ID. Example:
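
A minimal sketch of what such a setting could look like, not necessarily the exact configuration used here. It assumes the runner allows custom build directories (as far as I know this is enabled by default for the Docker executor) and relies on the predefined variables CI_BUILDS_DIR, CI_JOB_ID and CI_PROJECT_NAME. The path layout itself is an assumption:

```yaml
variables:
  # Assumption: the runner permits custom build directories
  # ([runners.custom_build_dir] enabled = true in config.toml).
  # The path must stay inside $CI_BUILDS_DIR. Putting the job ID in it
  # gives every job its own checkout, so concurrent jobs no longer
  # see each other's files or fight over the same .git directory.
  GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_JOB_ID/$CI_PROJECT_NAME
```

The trade-off is that a fresh directory per job gives up the benefit of the cached repository, and old per-job directories may pile up in the volume until it is cleaned.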