Debug "credentials from /root/.docker/config.json" vs "credentials from payload" with GitLab Runner and Container Registry

Hi everyone

I have an issue where I have the same job that will sometimes pass, and sometimes fail with:

ERROR: Job failed: failed to pull image "my-registry/mygroup/base-containers/main:latest" with specified policies [always]: Error response from daemon: Head "https://my-registry/v2/mygroup/base-containers/main/manifests/latest": unauthorized: HTTP Basic: Access denied.

I have determined that every time the pipeline succeeds, the job log contains the following lines:

Using Docker executor with image my_registry/my_group/base-containers/main:latest ...
Authenticating with credentials from job payload (GitLab Registry)
Pulling docker image my_registry/my_group/base-containers/main:latest ...

and every time it fails it contains:

Using Docker executor with image my_registry/my_group/base-containers/main:latest ...
Authenticating with credentials from /root/.docker/config.json
Pulling docker image my_registry/my_group/base-containers/main:latest ...

This will happen on the same runner using the same tags.

Excerpt from my .gitlab-ci.yml:

image: my_registry/my_group/base-containers/main:latest

  - test_code
  - build-container
  - deploy

  stage: test_code
    - docker
    - yum install -y httpd
    - cp -f ${CI_PROJECT_DIR}/index.html /var/www/html/
    - cat /var/www/html/index.html
    - /usr/sbin/httpd
    - curl
    - main

As I have stated, this same job will either pass or fail, depending on that auth line.

Could anyone please explain to me what the heck is going on? I have been at this for 8 hours straight now, and nothing I do works.

So how does the runner decide between "Authenticating with credentials from job payload" and "Authenticating with credentials from /root/.docker/config.json", especially seeing as no dind service is defined here?

Thank you in advance for anyone willing to help!

I'm not sure what is happening in your case, but I found documentation on how the precedence for authorization is handled, which might help you dig deeper.

Maybe there is more than one runner in play here, and one of them does not have a local Docker auth file configured or readable.

To rule out this possibility, you can print the runner ID and version using predefined variables (see "Predefined variables reference" in the GitLab docs). I suggest adding something like this to the job's script section:

  - echo $CI_RUNNER_ID

Thank you for the response. I will apply the suggestions when behind my computer again and report back. I saw the Precedence docs, but did not make anything of it since I did not have any auth configured, either in successful or failed pipeline runs.

Might this be an issue because of a corporate proxy? Just trying to think of other possible causes.

I have already confirmed that it both succeeds and fails on the same runner. The only difference is the auth line in the output, as stated above.

Thanks again for your time

Is that the GitLab container registry, or an external registry?

Another question: Which versions of GitLab server and runner are involved?

Hi @dnsmichi

Thanks again for your responses.

Yes, my_registry refers to the GitLab container registry.

Versions currently installed are 16.0.1-ee for GitLab instance and the runner the job is executing on is also on 16.0.1.

This was indeed a runner issue. Someone had added runners with the same names, so we had duplicates etc. We deleted all the runners, and started up a single one, and so far it has used the Credentials from Payload every single time.

Thank you for the help @dnsmichi !


Found the actual problem, so posting here for anyone having the same issue in the future.

A dev logged into the runner host nodes and tested the registry with docker login. They never logged out, and they did not clear the credentials that this command saved into /root/.docker/config.json. So every time a job ran on one of those hosts, the runner tried to use the credentials from this file instead of the ones from the job payload, and the pull failed. Thanks to @dnsmichi and the link he posted about precedence for authorization, because in the end that was exactly the issue.
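For anyone trying to picture why a leftover file wins over the job payload, here is a toy model of the lookup as I understand it from the precedence docs. It is only a sketch: the function name, the source labels and the token strings are made up for illustration, and the real runner logic is more involved.

```python
# Toy model of "first source that knows the registry wins".
# The order of `sources` is my reading of the precedence docs (an assumption),
# not code taken from GitLab Runner.

def pick_credentials(registry, sources):
    """Return (source_name, credentials) from the first source that has
    an entry for `registry`, or None if no source has one."""
    for name, entries in sources:
        if registry in entries:
            return name, entries[registry]
    return None


# Host where someone ran `docker login` and never logged out (hypothetical values).
dirty_host = [
    ("DOCKER_AUTH_CONFIG variable", {}),
    ("/root/.docker/config.json", {"my_registry": "stale-login"}),
    ("job payload (GitLab Registry)", {"my_registry": "per-job-token"}),
]

# Same host after the saved entry has been removed.
clean_host = [
    ("DOCKER_AUTH_CONFIG variable", {}),
    ("/root/.docker/config.json", {}),
    ("job payload (GitLab Registry)", {"my_registry": "per-job-token"}),
]

print(pick_credentials("my_registry", dirty_host))
print(pick_credentials("my_registry", clean_host))
```

In this model the stale entry is found first on the "dirty" host, so the per-job credentials are never tried. That matches the two different "Authenticating with credentials from ..." lines in the job logs.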

So the fix: Ensure you have no auth entries in /root/.docker/config.json on your runner host.
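If you want to check a runner host quickly, something like the following lists the registries that have saved entries in a Docker config file. It is a minimal sketch: the function name is my own, it only looks at the `auths` and `credHelpers` keys, and it assumes the file lives at the path shown in the job log.

```python
import json
import os


def registries_with_saved_auth(config_path="/root/.docker/config.json"):
    """Return a sorted list of registries that have an entry under
    `auths` or `credHelpers` in the given Docker config file.
    Returns an empty list if the file does not exist."""
    if not os.path.isfile(config_path):
        return []
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    found = set(config.get("auths", {}))
    found.update(config.get("credHelpers", {}))
    return sorted(found)


if __name__ == "__main__":
    # Run this as the same user the runner process uses (root in this thread).
    saved = registries_with_saved_auth()
    if saved:
        for registry in saved:
            print(f"saved credentials found for: {registry}")
    else:
        print("no saved registry credentials found")
```

If your registry shows up in the output, running docker logout for that registry as the same user on the host should remove the saved entry.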
