Gitlab-Runner: "Failed to cleanup volume" causes reuse of volume

We’re seeing intermittent build failures because directories already exist in the runner’s file system.

After investigation, it seems to be related to https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/1989, which introduced a regression where disable_cache = true was not honored. This was fixed in https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2042

However, we’re seeing that the new error introduced in https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2042 is thrown in our environment, and we are wondering what could cause it.
When it happens, the volume is not removed, and the next pipeline run for the same project reuses it. This causes our build to fail because directories exist that we don’t expect.

We then remove the volumes manually via docker volume prune -f, which solves the immediate problem. But it is only a matter of time before the next volume sticks around.
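
Before pruning, it may help to see which containers still reference a leftover volume, because the Docker daemon refuses to remove a volume that any container, including a stopped one, still uses. The following is a minimal sketch, assuming the volume naming pattern visible in the error messages in this post and the standard `docker volume ls` and `docker ps --filter volume=` commands. The function names are made up for illustration and are not part of gitlab-runner.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of gitlab-runner.

# Keep only names that look like gitlab-runner cache volumes.
# The pattern is an assumption derived from the error messages, e.g.
# runner-anlw9ubl-project-56-concurrent-1-cache-c33bcaa1...
filter_runner_volumes() {
  grep -E '^runner-[a-z0-9_-]+-project-[0-9]+-concurrent-[0-9]+-cache-[0-9a-f]+$' || true
}

# Print each runner cache volume followed by the containers that reference it.
# "docker ps -a" includes stopped containers, which also keep a volume in use.
show_volume_holders() {
  docker volume ls --format '{{.Name}}' | filter_runner_volumes |
    while read -r vol; do
      echo "== ${vol}"
      docker ps -a --filter "volume=${vol}" \
        --format '{{.ID}}  {{.Names}}  {{.Status}}'
    done
}
```

Running `show_volume_holders` on the runner host should list, per volume, the containers that keep it in use. Those are the ones to inspect before deciding whether the volume can safely be removed.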

kern.log.1:Jul 21 11:27:07 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-56-concurrent-1-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [96d49732ae8c4e943f9a649fa2e736269c7a1a07b96839bed30928a3a00573a7, 6c6633b526593f62bce52057f1a9aa02874aa53810cf8e313a1dfdb872310d7c] (manager.go:220:0s) #033[31;1mjob#033[0;m=102380 #033[31;1mproject#033[0;m=56 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 13:46:47 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a, a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368] (manager.go:220:0s) #033[31;1mjob#033[0;m=102629 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 13:47:17 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-56-concurrent-1-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [96d49732ae8c4e943f9a649fa2e736269c7a1a07b96839bed30928a3a00573a7, 6c6633b526593f62bce52057f1a9aa02874aa53810cf8e313a1dfdb872310d7c] (manager.go:220:0s) #033[31;1mjob#033[0;m=102625 #033[31;1mproject#033[0;m=56 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 13:48:22 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a, a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368] (manager.go:220:0s) #033[31;1mjob#033[0;m=102632 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 13:49:25 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368, 596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a] (manager.go:220:0s) #033[31;1mjob#033[0;m=102633 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 13:50:30 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a, a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368] (manager.go:220:0s) #033[31;1mjob#033[0;m=102636 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 13:58:49 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a, a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368] (manager.go:220:0s) #033[31;1mjob#033[0;m=102637 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 13:59:47 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a, a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368] (manager.go:220:0s) #033[31;1mjob#033[0;m=102638 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 21 16:34:35 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a, a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368] (manager.go:220:0s) #033[31;1mjob#033[0;m=103054 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 22 00:20:40 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-53-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [596b7d4f8d3b5226e080baa4b3ffd90a5857df45870485b4268dd05c24d4b51a, a430438ff0efad91e639b1a4d2791e94a1d7f7f5f5bda39dd356e84a62737368] (manager.go:220:0s) #033[31;1mjob#033[0;m=103211 #033[31;1mproject#033[0;m=53 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 23 15:52:25 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-996-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [a6020c40feba66c90f45f5c10d363a97081b2f724e79ca22d1637e7d592cea53] (manager.go:220:0s) #033[31;1mjob#033[0;m=104005 #033[31;1mproject#033[0;m=996 #033[31;1mrunner#033[0;m=AnLw9ubL
kern.log.1:Jul 23 15:57:10 42-gitlab-runner01 gitlab-runner[15489]: #033[31;1mERROR: Failed to cleanup volumes                  #033[0;m  #033[31;1merror#033[0;m=remove temporary volumes: Error response from daemon: remove runner-anlw9ubl-project-996-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8: volume is in use - [a6020c40feba66c90f45f5c10d363a97081b2f724e79ca22d1637e7d592cea53] (manager.go:220:0s) #033[31;1mjob#033[0;m=104031 #033[31;1mproject#033[0;m=996 #033[31;1mrunner#033[0;m=AnLw9ubL

We are running GitLab Runner 13.1.1 connected to a privately hosted GitLab instance:

$ gitlab-runner -v
Version:      13.1.1
Git revision: 6fbc7474
Git branch:   13-1-stable
GO version:   go1.13.8
Built:        2020-07-01T06:49:55+0000
OS/Arch:      linux/amd64

Relevant gitlab-runner config.toml:

$ cat /etc/gitlab-runner/config.toml
listen_address = ":9252"
concurrent = 6
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "Shared docker runner, with Maven repository access"
  url = "https://gitlab.xxxxxx.xxxx/"
  token = "AnLw9ubLbqniW-5y-HwV"
  executor = "docker"
  [runners.cache]
  [runners.docker]
    tls_verify = false
    image = "maven:3.6.0-jdk-8"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = true
    volumes = ["/cache", "/home/gitlab-runner/.m2:/root/.m2"]
    extra_hosts = ["gitlab..xxxxxx.xxxx:xxx.xxx.xxx.xxx"]
    pull_policy = "if-not-present"
    shm_size = 0

Any clues on where we might start looking?

We encounter the same issue with an on-premises Omnibus deployment without cache.

Gitlab: 13.4.3-ee (fd96f779e9d)
Gitlab Runner: 13.4.1

The error message itself, “Failed to cleanup volume”, relates to the GitLab Runner not having sufficient OS privileges. This is not related to the config.toml setting “privileged = true”.

  • If you deploy the runner using sudo, the error is not reported, at least in my test.
  • The cache setting enable/disable does not dictate when the error is reported.
  • The stuck concurrent volumes are not related to the error reported.
  • Pipeline success/failure also does not seem to dictate when concurrent volumes are cleaned up.

Stuck concurrent volumes are due to the GitLab Runner itself: defects or the like. I’m not going to try to analyse someone else’s code, but the volumes simply do not clean themselves up. Some volumes get deleted, some do not. I run a script as a cron job to delete them.
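
For anyone wanting a similar workaround, here is a rough sketch of what such a cron cleanup could look like. It is not the actual script mentioned above. It assumes the volume naming pattern from the error messages earlier in the thread, and the function names are hypothetical. Unlike `docker volume prune -f`, it only touches volumes that look like runner cache volumes, and it only considers dangling volumes, so anything still referenced by a container is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical cron cleanup sketch; not the poster's actual script.

# Succeeds only for names that look like gitlab-runner cache volumes.
# The pattern is an assumption based on the logged volume names.
is_runner_cache_volume() {
  [[ "$1" =~ ^runner-[a-z0-9_-]+-project-[0-9]+-concurrent-[0-9]+-cache-[0-9a-f]+$ ]]
}

# Remove dangling runner cache volumes and leave every other volume alone.
# "dangling=true" lists only volumes that no container references.
cleanup_runner_volumes() {
  local vol
  docker volume ls --quiet --filter dangling=true |
    while read -r vol; do
      if is_runner_cache_volume "$vol"; then
        docker volume rm "$vol" || echo "could not remove ${vol}" >&2
      fi
    done
}

# When installing this as a cron job, call cleanup_runner_volumes
# as the last line of the script.
```

A volume that a job container is still using is never listed as dangling, so this should not interfere with running jobs. The trade-off is that volumes held by leftover containers stay in place until those containers are removed.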