Our current setup uses the docker+machine executor with on-demand runners on AWS. It has been working like a charm for more than a year, but suddenly in July our pipelines started failing randomly during the build with the following error message:
WARNING: Failed to pull image with policy "always": Cannot connect to the Docker daemon at tcp://172.31.38.228:2376. Is the docker daemon running? (manager.go:205:0s)
About 80% of jobs succeed, but the other 20% are interrupted at some point with this error. When that happens, I usually retry the job and it succeeds.
I’ve tried:
- disabling spot instances and using regular (non-spot) instances instead
- downgrading Docker to an older version
- restarting gitlab-runner and Docker
but the issue keeps occurring. Any ideas what could cause this, or how it could be debugged in more detail?