Docker Login to AWS ECR Fails


Please select which options apply, and add the version information.

  • Self-managed
  • SaaS
  • Self-hosted Runners

Problem to solve

As of yesterday, a previously working CD job that pushes an image from the runner to AWS ECR has started to fail.

The job fails at one of these two steps, with no predictable pattern as to which:

  • At login (aws ecr get-login-password | docker login --username AWS --password-stdin $AWS_ECR_REGISTRY), with: Error response from daemon: Get "https://[MASKED]/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  • At push, after a successful login: some layers are pushed, others show "Retrying", and the job then fails with: Error: dial tcp [MASKED]:443: i/o timeout

Until now, the job had no trouble logging into the ECR repo or pushing images. It's been rock solid for a couple of years.

I am able to log into the ECR repo locally with docker, without any issue.

I'm just confused as to what could be causing this apparent network communication problem between the runner and the registry, given that the CI/CD configuration has not changed.

I read over the v17 breaking changes, and nothing in this job seems affected. My gut, unreliable though it may be, suggests a DNS problem on the runners, but I doubt that is in play, given that the SaaS status page shows no interruptions.
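To test the DNS hunch from inside a job, a small diagnostic step could separate a name-resolution failure from a connection timeout, which roughly mirrors the two error messages above. This is a minimal sketch, not something from the original job: probe_registry is a hypothetical helper name, and it assumes getent and curl are available in the job image.

```shell
# Hypothetical diagnostic helper (not part of the original job).
# Assumes getent (glibc) and curl are available in the job image.
# Returns 1 if DNS resolution fails, 2 if the HTTPS request fails or times out.
probe_registry() {
  host="$1"
  # Step 1: can the runner resolve the registry host name at all?
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS lookup failed for $host" >&2
    return 1
  fi
  echo "DNS OK: $(getent hosts "$host" | head -n 1)"
  # Step 2: can it reach port 443 and get any HTTP response?
  # No -f flag, so an HTTP 401 from an unauthenticated /v2/ still counts as reachable.
  # --connect-timeout and --max-time turn a hang into a fast, visible failure.
  if ! curl -sS -o /dev/null --connect-timeout 5 --max-time 10 "https://$host/v2/"; then
    echo "HTTPS request to $host failed or timed out" >&2
    return 2
  fi
  echo "HTTPS reachable: $host"
}

# Possible use as a script step (strips any repository path from the variable):
#   probe_registry "${AWS_ECR_REGISTRY%%/*}"
```

Running it just before the login step would show whether a failing job is stuck at name resolution or at the TCP/TLS connection.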


The job configuration:

    # Job name not shown in the original post; "push-to-ecr" is a placeholder.
    push-to-ecr:
      stage: build
      image:
        name: amazon/aws-cli:2.4.15
        entrypoint: [""]
      services:
        - docker:20.10.12-dind
      script:
        - amazon-linux-extras install docker
        - aws --version
        - docker --version
        - docker build . -f ./[MASKED].Dockerfile -t $AWS_ECR_REGISTRY
        - aws ecr get-login-password | docker login --username AWS --password-stdin $AWS_ECR_REGISTRY
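Because both failures are intermittent timeouts, one possible mitigation (not something the original job did) is to wrap the network-bound steps in a retry loop with backoff. This is a minimal POSIX sh sketch; retry and RETRY_DELAY are hypothetical names, and retrying only masks transient network problems rather than explaining them.

```shell
# Hypothetical helper (not part of the original job): run a command up to
# $1 times, doubling the wait between attempts. RETRY_DELAY is an assumed
# variable for the initial delay in seconds (default 5).
retry() {
  _max="$1"; shift
  _attempt=1
  _delay="${RETRY_DELAY:-5}"
  until "$@"; do
    if [ "$_attempt" -ge "$_max" ]; then
      echo "Giving up after $_attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $_attempt failed; retrying in ${_delay}s" >&2
    sleep "$_delay"
    _attempt=$((_attempt + 1))
    _delay=$((_delay * 2))
  done
  return 0
}

# Possible use in the script section (the pipe runs in a child shell, which
# reads AWS_ECR_REGISTRY from the job environment):
#   retry 3 sh -c 'aws ecr get-login-password | docker login --username AWS --password-stdin "$AWS_ECR_REGISTRY"'
```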

In any event, this apparently resolved itself overnight without any intervention.