Docker registry troubles

My self-hosted GitLab instance (running the gitlab-ce:17.0.1-ce.0 Docker image) has two problems with the Docker registry:

  1. When I delete an image from the web UI, a warning icon with the caption “Invalid tag: missing manifest digest” appears and the image is still listed.

    Looking at the registry content on the MinIO S3 backend I’m using, I still see the deleted image tags in the _manifests/tags/ subfolder of the project.

  2. I cannot push some layers to the registry: some are pushed successfully, but others are retried many times until the push fails:

    $ docker push git.herd.cloud.infn.it:5050/herd/web/herd-code-docker:24.04.3.1.1
    The push refers to repository [git.herd.cloud.infn.it:5050/herd/web/herd-code-docker]
    ffd84f0f1746: Preparing
    2588ffe8aaa1: Preparing
    ba50df3141f5: Preparing
    665162453ef6: Preparing
    ffd84f0f1746: Layer already exists
    2588ffe8aaa1: Layer already exists
    665162453ef6: Pushed
    ba50df3141f5: Retrying in 5 seconds
    ba50df3141f5: Retrying in 4 seconds
    ba50df3141f5: Retrying in 3 seconds
    ba50df3141f5: Retrying in 2 seconds
    ba50df3141f5: Retrying in 1 second
    ba50df3141f5: Retrying in 10 seconds
    ba50df3141f5: Retrying in 9 seconds
    ba50df3141f5: Retrying in 8 seconds
    ba50df3141f5: Retrying in 7 seconds
    ba50df3141f5: Retrying in 6 seconds
    ba50df3141f5: Retrying in 5 seconds
    ba50df3141f5: Retrying in 4 seconds
    ba50df3141f5: Retrying in 3 seconds
    ba50df3141f5: Retrying in 2 seconds
    ba50df3141f5: Retrying in 1 second
    ba50df3141f5: Retrying in 15 seconds
    . . . 
    ba50df3141f5: Retrying in 1 second
    unknown: Client Closed Request
    

    When the push fails the registry log shows this:

    2024-05-29_07:41:59.37230 time="2024-05-29T07:41:59.372Z" level=warning msg="client disconnected during blob PATCH" action="blob PATCH" auth_project_paths="[herd/web/herd-code-docker]" auth_user_name=mori auth_user_type=build content_length=-1 copied=323840810 correlation_id=01HZ1MQ0SS1KQQZMTPNR70TZGB error="unexpected EOF" go_version=go1.21.9 root_repo=herd vars_name=herd/web/herd-code-docker vars_uuid=e477896b-5462-4f55-a267-64b316785633 version=v4.0.0-gitlab
    2024-05-29_07:41:59.61241 time="2024-05-29T07:41:59.372Z" level=error msg="connection reset by peer" auth_project_paths="[herd/web/herd-code-docker]" auth_user_name=mori auth_user_type=build code=CONNECTIONRESET content_type=application/octet-stream correlation_id=01HZ1MQ0SS1KQQZMTPNR70TZGB detail="client disconnected" error="connectionreset: connection reset by peer" go_version=go1.21.9 host="git.herd.cloud.infn.it:5050" method=PATCH remote_addr=131.154.98.235 root_repo=herd uri="/v2/herd/web/herd-code-docker/blobs/uploads/e477896b-5462-4f55-a267-64b316785633?_state=nV5kOnAPhXVPIMrVmQ3WxbP2hlz20SgE3piZ_ZxI7Yx7Ik5hbWUiOiJoZXJkL3dlYi9oZXJkLWNvZGUtZG9ja2VyIiwiVVVJRCI6ImU0Nzc4OTZiLTU0NjItNGY1NS1hMjY3LTY0YjMxNjc4NTYzMyIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyNC0wNS0yOVQwNzo0MDo1Ni4xMTY4NDQyNDJaIn0%3D" user_agent="docker/26.1.1 go/go1.21.9 git-commit/ac2de55 kernel/5.15.0-106-generic os/linux arch/amd64 UpstreamClient(Docker-Client/26.1.1 \\(linux\\))" vars_name=herd/web/herd-code-docker vars_uuid=e477896b-5462-4f55-a267-64b316785633 version=v4.0.0-gitlab
    2024-05-29_07:41:59.61268 {"content_type":"","correlation_id":"01HZ1MQ0SS1KQQZMTPNR70TZGB","duration_ms":62354,"host":"git.herd.cloud.infn.it:5050","level":"info","method":"PATCH","msg":"access","proto":"HTTP/1.1","referrer":"","remote_addr":"127.0.0.1:57714","remote_ip":"131.154.98.235","status":499,"system":"http","time":"2024-05-29T07:41:59.372Z","ttfb_ms":62354,"uri":"/v2/herd/web/herd-code-docker/blobs/uploads/e477896b-5462-4f55-a267-64b316785633?_state=nV5kOnAPhXVPIMrVmQ3WxbP2hlz20SgE3piZ_ZxI7Yx7Ik5hbWUiOiJoZXJkL3dlYi9oZXJkLWNvZGUtZG9ja2VyIiwiVVVJRCI6ImU0Nzc4OTZiLTU0NjItNGY1NS1hMjY3LTY0YjMxNjc4NTYzMyIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyNC0wNS0yOVQwNzo0MDo1Ni4xMTY4NDQyNDJaIn0%3D","user_agent":"docker/26.1.1 go/go1.21.9 git-commit/ac2de55 kernel/5.15.0-106-generic os/linux arch/amd64 UpstreamClient(Docker-Client/26.1.1 \\(linux\\))","written_bytes":108}
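
The registry log entries above share one correlation_id, so they can be read together. As a minimal sketch (an illustration in Python, not something taken from the GitLab documentation), the following pulls a few fields out of abbreviated copies of the warning and access lines and estimates how much data was transferred before the connection dropped. The helper names are made up for this example, and treating copied and duration_ms as describing the same request is an assumption based on the shared correlation_id.

```python
import json
import shlex


def parse_logfmt(line):
    """Parse key=value pairs from a logfmt-style line; other tokens are skipped."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def summarize_upload(warning_line, access_line):
    """Combine the registry warning (logfmt) with the access entry (JSON)."""
    warning = parse_logfmt(warning_line)
    access = json.loads(access_line)
    copied = int(warning["copied"])
    seconds = access["duration_ms"] / 1000
    return {
        "status": access["status"],
        "copied_bytes": copied,
        "seconds": seconds,
        "bytes_per_second": copied / seconds,
    }


# Abbreviated copies of the two log lines quoted above (most fields omitted).
WARNING = ('time="2024-05-29T07:41:59.372Z" level=warning '
           'msg="client disconnected during blob PATCH" content_length=-1 '
           'copied=323840810 error="unexpected EOF"')
ACCESS = '{"duration_ms":62354,"method":"PATCH","status":499,"written_bytes":108}'

if __name__ == "__main__":
    summary = summarize_upload(WARNING, ACCESS)
    print(f"status {summary['status']}: {summary['copied_bytes']} bytes "
          f"in {summary['seconds']:.1f} s "
          f"({summary['bytes_per_second'] / 1e6:.1f} MB/s)")
```

With the values from the log, the upload was cut off with status 499 after about 62 seconds and roughly 324 MB copied; the 499 status corresponds to the “Client Closed Request” message printed by docker push.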
    

I don’t use the registry frequently, so I cannot say for sure when this started to happen. I don’t know how to handle it, so I’d appreciate some help. Thanks.

About point 1: I tried to remove another image from another project’s registry and ended up with the same result. How can this be fixed? I cannot believe this is a GitLab bug that nobody has hit before me…

  1. I was able to clear the situation by manually removing everything related to the project from the MinIO bucket. Obviously the registry is now empty, but at least no error is shown.

  2. I found that the push succeeds if I log in to the registry with a personal token with the write_registry scope and then run docker push manually, but it fails as above when the CI job logs in with CI_JOB_TOKEN. As far as I know, CI_JOB_TOKEN grants the same permissions as the user who started the job, so in my case registry write access should be granted, since I am the owner of the project (and also the instance admin). How can I troubleshoot this issue?
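
One way to check whether the two logins are really granted different rights is to look at the bearer token the registry receives. The sketch below (Python; a hedged illustration, not GitLab’s own tooling) decodes the payload of such a token without verifying its signature and lists the actions granted on a repository, following the access claim layout of the Docker registry token authentication scheme. Obtaining a real token is left out: it is assumed to be requested from the GitLab token endpoint, once with the personal token and once with the CI_JOB_TOKEN. The function names and the demo token are hypothetical.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the payload of a JWT as a dict. The signature is NOT verified."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore padding
    return json.loads(base64.urlsafe_b64decode(padded))


def granted_actions(payload, repository):
    """Collect the actions granted on one repository in the 'access' claim."""
    actions = set()
    for entry in payload.get("access", []):
        if entry.get("type") == "repository" and entry.get("name") == repository:
            actions.update(entry.get("actions", []))
    return actions


def make_demo_token(access):
    """Build an unsigned, made-up token so the sketch can run offline."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{encode({'alg': 'none'})}.{encode({'access': access})}."


if __name__ == "__main__":
    repo = "herd/web/herd-code-docker"
    # Hypothetical claim; a real token would come from the GitLab instance.
    demo = make_demo_token(
        [{"type": "repository", "name": repo, "actions": ["pull", "push"]}]
    )
    print(sorted(granted_actions(decode_jwt_payload(demo), repo)))
```

If both tokens list push for the repository, the difference between the manual push and the CI job is probably not a permission problem, which would be consistent with the registry log reporting a client disconnect rather than an authorization error.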

Did you find a solution? My CI/CD pipelines are now failing with this error too, but this must be a recent issue (perhaps since I upgraded GitLab?). I am the Owner, so I have all permissions.

Not yet. I have not had much time to investigate further, and the old image still works fine for me, so I temporarily gave up.

Shame, I was hoping you had the solution. I just can’t figure this out. The networking team does not see any blocked traffic or anything like that, but I just have to take their word for it. Locally, I was able to push some images but not others, and it seems to fail on the very last layer. The registry uses workload identity and has storage permissions, my GitLab user is the admin and owner, etc. This has worked for over a year, so I have no idea what has changed.

My issue was finally fixed.

python:3.12-slim-bookworm was being flagged as Virus/Linux.WGeneric.eizzgy, which caused the blob PATCH to fail for one of the layers. I’m not sure whether your issue is caused by something similar, but good luck.

Hi, this is definitely something I hadn’t thought about. How did you solve it? I’ll take a look ASAP (which very likely means in the far future…)

After some time I have been able to look at this issue again. After several updates I’m now on GitLab 17.2.2, and I discovered that:

  1. No images are present anymore in the registry of the problematic project.
  2. The push problem persists. The strange thing is that during the push Docker reports “Layer already exists” for some layers even though no image is present in the registry, so I suspect something went wrong under the hood in the GitLab registry. Unfortunately I still don’t know how to investigate further; I’ll have to do some research.

So problem 1 vanished by itself (I have no idea what did the job), while for problem 2 I’m still at sea.
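
One possible explanation for “Layer already exists” on an apparently empty registry (an assumption, not something confirmed in this thread) is that per-repository layer links are still present in the storage bucket even though no tags remain. As a rough sketch in Python, the function below takes a list of object keys, such as a recursive listing of the MinIO bucket would produce, and separates the remaining tags from the remaining layer links. It assumes the usual registry storage layout with _manifests/tags/ and _layers/ subfolders under each repository; the keys in the example are made up.

```python
import re

# Path fragments of the registry storage layout (assumed, see lead-in).
TAG_RE = re.compile(r"/_manifests/tags/([^/]+)/current/link$")
LAYER_RE = re.compile(r"/_layers/sha256/([0-9a-f]{64})/link$")


def summarize_repository_keys(keys):
    """Return (tags, layer_digests) found among the given object keys."""
    tags, layers = set(), set()
    for key in keys:
        tag_match = TAG_RE.search(key)
        if tag_match:
            tags.add(tag_match.group(1))
            continue
        layer_match = LAYER_RE.search(key)
        if layer_match:
            layers.add(layer_match.group(1))
    return sorted(tags), sorted(layers)


if __name__ == "__main__":
    # Hypothetical keys for illustration only; real ones come from the bucket.
    prefix = "docker/registry/v2/repositories/herd/web/herd-code-docker"
    fake_digest = "a" * 64
    keys = [
        f"{prefix}/_layers/sha256/{fake_digest}/link",
        f"{prefix}/_uploads/some-upload-id/startedat",
    ]
    tags, layers = summarize_repository_keys(keys)
    print(f"{len(tags)} tag(s), {len(layers)} layer link(s) still present")
```

If the listing shows layer links but no tags, the registry may still consider those layers present for the project, which would match what docker push reports.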