We run several GitLab runners using the Docker executor on machines we own, and our GitLab instance is self-managed. For legacy reasons, the gitlab-runner service lives in a Docker container, but the host OS owns the Docker daemon, so we bind the Docker socket into the container that runs the service with:
-v /var/run/docker.sock:/var/run/docker.sock --network=host
I sporadically see failures like this:
ERROR: Preparation failed: adding cache volume: set volume permissions: create permission container for volume "runner-xqtsnmt4-project-21-concurrent-2-cache-3c3f060a0374fc8bc39395164f415a70": Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? (linux_set.go:90:120s) Will be retried in 3s ...
This error appears when a job is accepted.
I also occasionally see the following after git checkouts:
ERROR: Job failed (system failure): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? (docker.go:705:120s)
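To narrow down whether the daemon itself becomes unreachable or only the bind-mounted socket inside the runner container does, a small probe could be run both on the host and inside the container at the same time. Below is a minimal sketch in Python, assuming the Docker Engine API's `/_ping` endpoint is reachable on the socket path from the errors above. The helpers `docker_ping` and `watch` are hypothetical names for illustration, not part of Docker or GitLab Runner.

```python
import socket
import time


def docker_ping(sock_path="/var/run/docker.sock", timeout=5.0):
    """Return True if the daemon answers GET /_ping with HTTP 200, else False.

    Hypothetical helper: it speaks plain HTTP over the Unix socket, so it
    needs no Docker client library inside the container.
    """
    request = b"GET /_ping HTTP/1.1\r\nHost: docker\r\nConnection: close\r\n\r\n"
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            s.sendall(request)
            # Only the status line matters, e.g. b"HTTP/1.1 200 OK".
            status_line = s.recv(1024).split(b"\r\n", 1)[0]
    except OSError:
        # Covers a missing socket file, connection refused, and timeouts.
        return False
    parts = status_line.split()
    return len(parts) >= 2 and parts[1] == b"200"


def watch(sock_path="/var/run/docker.sock", interval=1.0, checks=60):
    """Probe the socket `checks` times and return a list of (timestamp, ok)."""
    results = []
    for _ in range(checks):
        results.append((time.time(), docker_ping(sock_path)))
        time.sleep(interval)
    return results


# Example: print only the moments when the daemon could not be reached.
# for ts, ok in watch(interval=1.0, checks=600):
#     if not ok:
#         print(time.strftime("%H:%M:%S", time.localtime(ts)), "unreachable")
```

If the probe fails only inside the container, the bind mount is the likelier suspect. If it fails in both places, the daemon itself was briefly unavailable. Comparing the failure timestamps with the runner's error log would show whether the two line up.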
GitLab Runner version is 13.1.0; Docker version is 19.03.11.
The runners themselves do not access the docker executor. For the config.toml
Has anyone seen this before? I see similar errors reported by people using the dind service, but I'm unsure whether that generalizes to my use case.