We have had a problem with our CI for some weeks: the file system seems to be shared across jobs. Here is a demonstration:
I created a `.gitlab-ci.yml` with simple jobs that fail if a file from another job is present:
```yaml
default:
  tags:
    - Linux
    - Docker

stages:
  - stress

stress0:
  stage: stress
  script:
    - pwd
    - touch stress0
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'

stress1:
  stage: stress
  script:
    - pwd
    - touch stress1
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'

[...]

stress11:
  stage: stress
  script:
    - pwd
    - touch stress11
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'
```
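As an aside, the twelve near-identical jobs could also be generated with GitLab's `parallel` keyword (a sketch; `parallel` and the predefined `CI_NODE_INDEX` variable are documented GitLab CI features, but I have not tested whether this changes the observed behavior):

```yaml
stress:
  stage: stress
  parallel: 12
  script:
    - pwd
    # CI_NODE_INDEX runs from 1 to the parallel count
    - touch "stress$CI_NODE_INDEX"
    - ls -l stress*
    - '[ "$(ls -l stress* | wc -l)" == "1" ]'
```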
When I run them, I can see failures (the log shows that a file from another job, stress2, is present):
```
$ pwd
/builds/my-company-name/my-group-name/my-project-name
$ touch stress3
$ ls -l stress*
-rw-r--r-- 1 root root 0 Oct 10 09:31 stress2
-rw-r--r-- 1 root root 0 Oct 10 09:31 stress3
$ [ "$(ls -l stress* | wc -l)" == "1" ]
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 1
```
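For what it's worth, the isolation check itself can be written without parsing `ls` output. A minimal local sketch (the temporary directory stands in for a job's build directory, and the `stress-$$` marker name is illustrative):

```shell
#!/bin/sh
# Stand-in for a job's build directory (hypothetical; a real job
# would run inside /builds/<group>/<project>).
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Each "job" drops its own marker file...
touch "stress-$$"

# ...then counts markers with find, which is safer than parsing `ls`.
count=$(find . -maxdepth 1 -name 'stress-*' | wc -l)
echo "count=$count"

# In a truly isolated workspace exactly one marker should exist.
if [ "$count" -eq 1 ]; then
  echo "isolated"
else
  echo "shared"
fi
```

In an isolated workspace this prints `count=1` then `isolated`; on our runners the equivalent check fails.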
Sometimes I see this error instead (git fails, probably because of the shared file system):
```
Fetching changes...
Reinitialized existing Git repository in /builds/bt-lab-suite/bt-test-group/toruk/.git/
error: cannot lock ref 'refs/remotes/origin/stress-ci': is at 928071e153c9e69e6d97cecfc84758fa1756b854 but expected 3ce9bac8abd76f6c08bbd19e9d623784ad94f338
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1
```
The problem only appears when we use the following tags, which select our less performant runners:
```yaml
default:
  tags:
    - Linux
    - Docker
```
The configuration of these runners is the following:
```toml
concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "srvrunnerlnx5"
  url = "http://HIDDEN"
  id = 227
  limit = 1
  token = "HIDDEN"
  token_obtained_at = 2023-09-04T11:37:39Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    image = "alpine"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
```
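One thing I considered while reading this config: only `/cache` is listed in `volumes`, but if several runners on the same host ended up reusing a build root, the `builds_dir` key (which does exist in GitLab Runner's `[[runners]]` section) might be relevant. A sketch of what a per-runner override would look like (the path here is made up, and I don't know if this is the right fix):

```toml
[[runners]]
  # Illustrative only: give this runner its own builds root so
  # jobs from different runners on the same host cannot collide.
  builds_dir = "/builds/srvrunnerlnx5"
```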
I was convinced that each job's file system was always unique (inside its Docker container), but maybe I'm wrong? Can you see from my demonstration what could cause this file system sharing? Thanks!
PS: Note the `pwd` command result: the `/builds` folder looks like a common folder.