Hi,

We have had a problem with our CI for some weeks: the file system seems to be shared across jobs. Here is a demonstration. I created a `.gitlab-ci.yml` with simple jobs that fail whenever a file from another job is present:
```yaml
default:
  tags:
    - Linux
    - Docker

stages:
  - stress

stress0:
  stage: stress
  script:
    - pwd
    - touch stress0
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'

stress1:
  stage: stress
  script:
    - pwd
    - touch stress1
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'

[...]

stress11:
  stage: stress
  script:
    - pwd
    - touch stress11
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'
```
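(As an aside, the twelve near-identical jobs could also be generated with GitLab's `parallel:` keyword instead of being copy-pasted; `CI_NODE_INDEX` runs from 1 to 12. A sketch, equivalent to the jobs above:)

```yaml
stress:
  stage: stress
  parallel: 12
  script:
    - pwd
    - touch "stress${CI_NODE_INDEX}"
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'
```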
When I execute them, I see failures; the log below shows that a file from another job, `stress2`, is present:
```
$ pwd
/builds/my-company-name/my-group-name/my-project-name
$ touch stress3
$ ls -l stress*
-rw-r--r-- 1 root root 0 Oct 10 09:31 stress2
-rw-r--r-- 1 root root 0 Oct 10 09:31 stress3
$ [ "$(ls -l stress* | wc -l)" == "1" ]
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 1
```
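For comparison, the invariant that each job asserts does hold in a genuinely isolated directory. This minimal sketch is not GitLab-specific, it just replays the job's script in a fresh temporary directory:

```shell
#!/bin/sh
# Replay one job's steps in a private directory: create exactly one
# stressN file, then count the entries that `ls -l stress*` reports.
# In an isolated file system the count must be 1.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch stress3
count=$(ls -l stress* | wc -l)
if [ "$count" -eq 1 ]; then
  echo "isolated"
else
  echo "shared: $count files"
fi
```

Run in isolation this prints `isolated`; in our CI the equivalent check fails because files from sibling jobs are visible.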
Sometimes I see this error instead (git probably fails because of the shared file system):
```
Fetching changes...
Reinitialized existing Git repository in /builds/bt-lab-suite/bt-test-group/toruk/.git/
error: cannot lock ref 'refs/remotes/origin/stress-ci': is at 928071e153c9e69e6d97cecfc84758fa1756b854 but expected 3ce9bac8abd76f6c08bbd19e9d623784ad94f338
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1
```
The problem only appears when we use the following tags, which select our less powerful runners:
```yaml
default:
  tags:
    - Linux
    - Docker
```
The configuration of these runners is the following:
```toml
concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "srvrunnerlnx5"
  url = "http://HIDDEN"
  id = 227
  limit = 1
  token = "HIDDEN"
  token_obtained_at = 2023-09-04T11:37:39Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    image = "alpine"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
```
I was convinced that each job's file system was always unique (inside its own Docker container), but maybe I'm wrong? Can you see from my demonstration what could cause this file system sharing? Thanks!
PS: Note the `pwd` result, `/builds/my-company-name/my-group-name/my-project-name`. The `/builds` folder looks like a folder common to all jobs.