Our docker+machine autoscaling builds on GCP started failing several days ago. No changes whatsoever on our side to anything related to CI or GCP infrastructure; just some small C++ code changes that triggered new builds.
The failure appears to be related to the runner service launching the remote docker on the spawned instance using the –storage-driver aufs option, which leads to missing driver errors. Specifically the dockerd launch command remotely executed on the spawned VM includes –storage-driver aufs, and the syslog on that machine states “level=error msg=“AUFS was not found in /proc/filesystems” storage-driver=aufs” before failing. But I could be wrong - am not experienced debugging CI setups.
Has anyone else seen this? I’d be grateful for any ideas.
This is the high level GitLab log for the pipeline:
Running with gitlab-runner 13.9.0 (2ebc4dc4) on gitlab-runner-gce-root-2 9db142ae
Preparing the "docker+machine" executor
05:48
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Job failed (system failure): exit status 1
The parent runner instance’s log:
Apr 29 01:17:42 gitlab-runner-root-2 gitlab-runner[4282]: Copying certs to the remote machine... driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:44 gitlab-runner-root-2 gitlab-runner[4282]: Setting Docker configuration on the remote daemon... driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:45 gitlab-runner-root-2 gitlab-runner[4282]: Feeding runners to channel builds=1
Apr 29 01:17:45 gitlab-runner-root-2 gitlab-runner[4282]: Docker Machine Details creating=1 idle=0 maxMachineCreate=0 maxMachines=0 minIdleCount=0 removing=0 runner=9db142ae time=2023-04-29 01:17:45.076152166 +0000 UTC m=+474.234273866 total=1 used=0
Apr 29 01:17:45 gitlab-runner-root-2 gitlab-runner[4282]: Failed to request job: runner requestConcurrency meet builds=1 runner=9db142ae
Apr 29 01:17:46 gitlab-runner-root-2 gitlab-runner[4282]: Submitting job to coordinator... ok code=200 job=4200869459 job-status= runner=9db142ae update-interval=0s
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: Error creating machine: Error running provisioning: ssh command error: driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: command : sudo systemctl -f start docker driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: err : exit status 1 driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: output : Job for docker.service failed because the control process exited with error code. driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: See "systemctl status docker.service" and "journalctl -xe" for details. driver=google name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=create
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Problem while reading command output error=read |0: file already closed
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Problem while reading command output error=read |0: file already closed
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: Skipping provision retry on failed machine error=exit status 1 name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: ERROR: Machine creation failed error=exit status 1 name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 time=34.562191809s
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Requesting machine removal lifetime=34.562447574s name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 now=2023-04-29 01:17:47.184745963 +0000 UTC m=+476.342867672 reason=Failed to create used=34.562447918s usedCount=0
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: WARNING: Stopping machine lifetime=34.58787779s name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 reason=Failed to create used=24.841869ms usedCount=0
Apr 29 01:17:47 gitlab-runner-root-2 gitlab-runner[4282]: Stopping "runner-9db142ae-auto-scale-runner-1682731032-4ec325c9"... name=runner-9db142ae-auto-scale-runner-1682731032-4ec325c9 operation=stop
Here is the output of “systemctl status docker.service” on a spawned machines that failed to execute the task:
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─10-machine.conf
Active: failed (Result: exit-code) since Sat 2023-04-29 01:20:32 UTC; 13s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 2409 ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 -H unix:///var/run/docker.sock --storage-driver aufs --tlsverify --tlscacert /etc/docker/ca.pem --tlscert /etc/docker/server.pem --tlskey /etc/docker/server-key.pem --label provider=google (code=exited, status=1/FAILURE)
Main PID: 2409 (code=exited, status=1/FAILURE)
CPU: 88ms
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: Stopped Docker Application Container Engine.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: docker.service: Start request repeated too quickly.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 29 01:20:32 runner-9db142ae-auto-scale-runner-1682731191-4f4959bc systemd[1]: Failed to start Docker Application Container Engine.
And here’s “journalctl -xe” on the same machine:
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Starting Docker Application Container Engine...
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.618022956Z" level=info msg="Starting up"
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.622244404Z" level=info msg="[graphdriver] trying configured driver: aufs"
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.622410834Z" level=warning msg="[graphdriver] WARNING: the aufs storage-driver is deprecated and will be removed in a future release; visit https://docs.docker.com/go/storage-driver/ for more information"
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: time="2023-04-29T01:23:19.624813559Z" level=error msg="AUFS was not found in /proc/filesystems" storage-driver=aufs
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 dockerd[2409]: failed to start daemon: error initializing graphdriver: driver not supported
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 29 01:23:19 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Failed to start Docker Application Container Engine.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Stopped Docker Application Container Engine.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Start request repeated too quickly.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Failed to start Docker Application Container Engine.
Apr 29 01:23:21 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
Apr 29 01:23:25 runner-9db142ae-auto-scale-runner-1682731361-a5ffb6e4 systemd[1]: Stopping User Manager for UID 1004...
Our config.toml:
concurrent = 3
check_interval = 0
log_level = "debug"
[[runners]]
name = "gitlab-runner-gce-root-2"
url = "https://gitlab.com/"
token = "[REDACTED]"
executor = "docker+machine"
environment = ["DOCKER_DRIVER=overlay2"]
[runners.docker]
tls_verify = false
image = "[REDACTED]"
privileged = false
disable_cache = true
shm_size = 0
[runners.cache]
Type = "gcs"
Path = "cache"
Shared = false
[runners.cache.gcs]
CredentialsFile = "[REDACTED]"
BucketName = "[REDACTED]"
[runners.machine]
IdleCount = 0
IdleTime = 60
MachineDriver = "google"
MachineName = "auto-scale-runner-%s"
MachineOptions = [
"google-project=[REDACTED]",
"google-machine-type=e2-standard-32",
"google-machine-image=[REDACTED]",
"google-tags=gitlab-ci-slave",
"google-preemptible=true",
"google-zone=us-west2-c",
"google-use-internal-ip=true",
"google-disk-type=pd-ssd",
"google-disk-size=50"
]