Docker-compose fails to create containers (started 3/1)

We are using dind to build and tag containers and then docker-compose to run tests. However, starting yesterday, the job stopped executing after creating the containers: the entrypoint code is never executed and the job hangs until the 1h timeout. This happens both for new pushes and for re-running build jobs that succeeded in the past, so it is probably an issue with the Runner or with one of the images we use. Is anyone else seeing something similar in the last few days?
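For reference, the container entrypoint is just a small dispatcher along these lines (a simplified sketch, not our exact script); in the failing jobs none of it runs, and the output below stops right after the containers are created:

    #!/bin/sh
    # Simplified entrypoint: "docker-compose run django test" should land in the "test" branch.
    set -e
    echo "entrypoint started"      # in the failing jobs this is never printed
    if [ "$1" = "test" ]; then
        exec python manage.py test
    else
        exec "$@"
    fi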

$ IS_TESTING=true docker-compose run django test
Creating network "soapbox-backend_default" with the default driver
Creating volume "soapbox-backend_postgres_data" with default driver
Creating volume "soapbox-backend_redis_data" with default driver
Creating volume "soapbox-backend_libpostal_data" with default driver
Pulling db (postgres:10)...
10: Pulling from library/postgres
Digest: sha256:e957014667e27c2fad2f48c818f2ba77443d60481e5d28b14e11d0b461c104aa
Status: Downloaded newer image for postgres:10
Pulling redis (redis:5)...
5: Pulling from library/redis
Digest: sha256:5d3de3bfca8f861cab461dd286671e7cadb89544118afdb06106169eb5d03d77
Status: Downloaded newer image for redis:5
Pulling rabbitmq (rabbitmq:latest)...
latest: Pulling from library/rabbitmq
Digest: sha256:04d42d0f6b3f9e53cc7f1f511340ad40fb0a0f0e7752807b6971a8e11525b80b
Status: Downloaded newer image for rabbitmq:latest
Pulling anansi (407622332743.dkr.ecr.us-east-2.amazonaws.com/anansi:master)...
master: Pulling from anansi
Digest: sha256:7dd725aa77610f7a34bcbaa5ea82c052c1dc18a29965049ca56dd2b87b9c2c26
Status: Downloaded newer image for 407622332743.dkr.ecr.us-east-2.amazonaws.com/anansi:master
Creating soapbox-backend_redis_1 ... 
Creating soapbox-backend_rabbitmq_1 ... 
Creating soapbox-backend_db_1       ... 
Creating soapbox-backend_anansi_1   ... 
Creating soapbox-backend_redis_1    ... done
Creating soapbox-backend_db_1       ... done
Creating soapbox-backend_rabbitmq_1 ... done
Creating soapbox-backend_celery_normal_worker_1 ... 
Creating soapbox-backend_celery_rated_worker_1  ... 
Creating soapbox-backend_celery_rated_worker_1  ... done
Creating soapbox-backend_celery_normal_worker_1 ... done
Creating soapbox-backend_flower_1               ... 
Creating soapbox-backend_flower_1               ... done
Creating soapbox-backend_anansi_1               ... done
Creating soapbox-backend_django_run             ... 
Creating soapbox-backend_django_run             ... done
ERROR: Job failed: execution took longer than 1h0m0s seconds

Build job script:

   script:
    - if [ "${CI_COMMIT_BRANCH}" == "develop" ] || [ "${CI_COMMIT_BRANCH}" == "release" ] || [ "${CI_COMMIT_BRANCH}" == "master" ]; then BRANCH="${CI_COMMIT_BRANCH}"; else BRANCH="develop"; fi
    - docker pull ${REPO_IMAGE_BASE}:${BRANCH}
    - docker build --cache-from ${REPO_IMAGE_BASE}:${BRANCH} --tag ${REPO_IMAGE_BASE}:${BRANCH} --tag ${LOCAL_IMAGE} .
    - IS_TESTING=true docker-compose run django test
    - if [ "${CI_COMMIT_BRANCH}" == "develop" ] || [ "${CI_COMMIT_BRANCH}" == "release" ] || [ "${CI_COMMIT_BRANCH}" == "master" ]; then docker push ${REPO_IMAGE_BASE}:${CI_COMMIT_BRANCH}; else echo "Skipping push to ECR"; fi
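The rest of the job definition is a standard dind setup, roughly like this (a sketch; the docker/dind versions and the DOCKER_HOST/TLS variables are placeholders, not necessarily what we run):

    test:
      stage: test
      image: docker:19.03
      services:
        - docker:19.03-dind
      variables:
        DOCKER_HOST: tcp://docker:2375
        DOCKER_TLS_CERTDIR: ""
      script:
        # same commands as above
        - ...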

We are having the same issue with both private and shared GitLab runners. It started happening roughly 6h ago.

I downgraded rabbitmq in my docker-compose file from latest to 3.8.6 and it started working again. Then I put it back to latest and it still worked. I believe it was a corrupted rabbitmq image cached by GitLab. I couldn't reproduce it with docker-compose locally, though, so it's hard to tell.
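In case it helps, the pin was just a tag change in docker-compose.yml (sketch; the rest of the service definition is omitted):

    services:
      rabbitmq:
        image: rabbitmq:3.8.6   # pinned instead of rabbitmq:latest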

It resolved automagically for us (maybe because we rebuild our CI images every night). I believe it may have been caused by this: Docker Engine release notes | Docker Documentation