Docker registry troubles

My self-hosted GitLab instance (running the gitlab-ce:17.0.1-ce.0 Docker image) has two problems with the Docker registry:

  1. When I delete an image from the web UI, a warning icon with the caption "Invalid tag: missing manifest digest" appears and the image is still listed.

    Looking at the registry content on the MinIO S3 backend I’m using, I still see the deleted image tags in the _manifests/tags/ subfolder of the project.

  2. I cannot push some layers to the registry: some are pushed without problems, but others are retried many times until the push fails:

    $ docker push git.herd.cloud.infn.it:5050/herd/web/herd-code-docker:24.04.3.1.1
    The push refers to repository [git.herd.cloud.infn.it:5050/herd/web/herd-code-docker]
    ffd84f0f1746: Preparing
    2588ffe8aaa1: Preparing
    ba50df3141f5: Preparing
    665162453ef6: Preparing
    ffd84f0f1746: Layer already exists
    2588ffe8aaa1: Layer already exists
    665162453ef6: Pushed
    ba50df3141f5: Retrying in 5 seconds
    ba50df3141f5: Retrying in 4 seconds
    ba50df3141f5: Retrying in 3 seconds
    ba50df3141f5: Retrying in 2 seconds
    ba50df3141f5: Retrying in 1 second
    ba50df3141f5: Retrying in 10 seconds
    ba50df3141f5: Retrying in 9 seconds
    ba50df3141f5: Retrying in 8 seconds
    ba50df3141f5: Retrying in 7 seconds
    ba50df3141f5: Retrying in 6 seconds
    ba50df3141f5: Retrying in 5 seconds
    ba50df3141f5: Retrying in 4 seconds
    ba50df3141f5: Retrying in 3 seconds
    ba50df3141f5: Retrying in 2 seconds
    ba50df3141f5: Retrying in 1 second
    ba50df3141f5: Retrying in 15 seconds
    . . . 
    ba50df3141f5: Retrying in 1 second
    unknown: Client Closed Request
    

    When the push fails the registry log shows this:

    2024-05-29_07:41:59.37230 time="2024-05-29T07:41:59.372Z" level=warning msg="client disconnected during blob PATCH" action="blob PATCH" auth_project_paths="[herd/web/herd-code-docker]" auth_user_name=mori auth_user_type=build content_length=-1 copied=323840810 correlation_id=01HZ1MQ0SS1KQQZMTPNR70TZGB error="unexpected EOF" go_version=go1.21.9 root_repo=herd vars_name=herd/web/herd-code-docker vars_uuid=e477896b-5462-4f55-a267-64b316785633 version=v4.0.0-gitlab
    2024-05-29_07:41:59.61241 time="2024-05-29T07:41:59.372Z" level=error msg="connection reset by peer" auth_project_paths="[herd/web/herd-code-docker]" auth_user_name=mori auth_user_type=build code=CONNECTIONRESET content_type=application/octet-stream correlation_id=01HZ1MQ0SS1KQQZMTPNR70TZGB detail="client disconnected" error="connectionreset: connection reset by peer" go_version=go1.21.9 host="git.herd.cloud.infn.it:5050" method=PATCH remote_addr=131.154.98.235 root_repo=herd uri="/v2/herd/web/herd-code-docker/blobs/uploads/e477896b-5462-4f55-a267-64b316785633?_state=nV5kOnAPhXVPIMrVmQ3WxbP2hlz20SgE3piZ_ZxI7Yx7Ik5hbWUiOiJoZXJkL3dlYi9oZXJkLWNvZGUtZG9ja2VyIiwiVVVJRCI6ImU0Nzc4OTZiLTU0NjItNGY1NS1hMjY3LTY0YjMxNjc4NTYzMyIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyNC0wNS0yOVQwNzo0MDo1Ni4xMTY4NDQyNDJaIn0%3D" user_agent="docker/26.1.1 go/go1.21.9 git-commit/ac2de55 kernel/5.15.0-106-generic os/linux arch/amd64 UpstreamClient(Docker-Client/26.1.1 \\(linux\\))" vars_name=herd/web/herd-code-docker vars_uuid=e477896b-5462-4f55-a267-64b316785633 version=v4.0.0-gitlab
    2024-05-29_07:41:59.61268 {"content_type":"","correlation_id":"01HZ1MQ0SS1KQQZMTPNR70TZGB","duration_ms":62354,"host":"git.herd.cloud.infn.it:5050","level":"info","method":"PATCH","msg":"access","proto":"HTTP/1.1","referrer":"","remote_addr":"127.0.0.1:57714","remote_ip":"131.154.98.235","status":499,"system":"http","time":"2024-05-29T07:41:59.372Z","ttfb_ms":62354,"uri":"/v2/herd/web/herd-code-docker/blobs/uploads/e477896b-5462-4f55-a267-64b316785633?_state=nV5kOnAPhXVPIMrVmQ3WxbP2hlz20SgE3piZ_ZxI7Yx7Ik5hbWUiOiJoZXJkL3dlYi9oZXJkLWNvZGUtZG9ja2VyIiwiVVVJRCI6ImU0Nzc4OTZiLTU0NjItNGY1NS1hMjY3LTY0YjMxNjc4NTYzMyIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyNC0wNS0yOVQwNzo0MDo1Ni4xMTY4NDQyNDJaIn0%3D","user_agent":"docker/26.1.1 go/go1.21.9 git-commit/ac2de55 kernel/5.15.0-106-generic os/linux arch/amd64 UpstreamClient(Docker-Client/26.1.1 \\(linux\\))","written_bytes":108}
    
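To check whether other pushes fail the same way, the JSON access lines in the registry log can be filtered for unsuccessful blob uploads. The following is only a minimal sketch: the function names are made up for illustration, and the sample line is an abbreviated copy of the access entry quoted above (several fields and the `_state` query string are dropped).

```python
import json

# Abbreviated copy of the access-log line from the post (fields and query string trimmed).
SAMPLE_LINE = (
    '2024-05-29_07:41:59.61268 {"correlation_id":"01HZ1MQ0SS1KQQZMTPNR70TZGB",'
    '"duration_ms":62354,"method":"PATCH","msg":"access","status":499,'
    '"uri":"/v2/herd/web/herd-code-docker/blobs/uploads/'
    'e477896b-5462-4f55-a267-64b316785633"}'
)


def parse_json_line(line):
    """Return the JSON object embedded in a log line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None  # logfmt lines (time=... level=...) carry no JSON object
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def failed_uploads(lines):
    """Yield a summary for every blob-upload access entry with status >= 400."""
    for line in lines:
        entry = parse_json_line(line)
        if not entry or entry.get("msg") != "access":
            continue
        if "/blobs/uploads/" not in entry.get("uri", ""):
            continue
        if entry.get("status", 0) < 400:
            continue
        yield {
            "status": entry["status"],
            "method": entry.get("method"),
            "seconds": entry.get("duration_ms", 0) / 1000,
            "correlation_id": entry.get("correlation_id"),
        }


for row in failed_uploads([SAMPLE_LINE]):
    print(row)
```

Running this over the whole log file (for example `failed_uploads(open(path))`, with `path` pointing at wherever the registry log lives in your container) would show whether the failures cluster around a similar duration, which might help tell a timeout apart from a permission problem.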

I don’t use the registry often, so I cannot say for sure when this started. I don’t know how to handle it, so I’d appreciate any help. Thanks.

About point 1: I tried to remove another image from another project’s registry and got the same result. How can this be fixed? I cannot believe this is a GitLab bug that nobody has hit before me…
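For point 1, it can help to list exactly which tag links are still present in the bucket for a given project. The sketch below assumes the usual registry storage layout, where each tag has an object ending in `_manifests/tags/<tag>/current/link`; please verify that against your own bucket. The helper name and the example keys are made up for illustration, and the keys could come from any S3 listing of the bucket.

```python
import re

# Assumed layout: <root>/v2/repositories/<repo>/_manifests/tags/<tag>/current/link
# (verify against your bucket; a custom root directory prefix is tolerated by search()).
TAG_LINK = re.compile(
    r"v2/repositories/(?P<repo>.+)/_manifests/tags/(?P<tag>[^/]+)/current/link$"
)


def leftover_tags(object_keys, repository):
    """Return the sorted tag names that still have a 'current/link' object."""
    tags = set()
    for key in object_keys:
        match = TAG_LINK.search(key)
        if match and match.group("repo") == repository:
            tags.add(match.group("tag"))
    return sorted(tags)


# Hypothetical object keys, shaped like the assumed layout ("<digest>" is a placeholder).
EXAMPLE_KEYS = [
    "docker/registry/v2/repositories/herd/web/herd-code-docker/_manifests/tags/24.04.3.1.1/current/link",
    "docker/registry/v2/repositories/herd/web/herd-code-docker/_manifests/tags/24.04.3.1.1/index/sha256/<digest>/link",
    "docker/registry/v2/repositories/herd/web/other-project/_manifests/tags/latest/current/link",
]

print(leftover_tags(EXAMPLE_KEYS, "herd/web/herd-code-docker"))
```

Comparing this list with the tags shown in the web UI would tell which of the "deleted" tags are still stored on the backend.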

  1. I was able to clear the error by manually removing everything related to the project from the MinIO bucket. Obviously the registry is now empty, but at least no error is shown.

  2. I found that the push succeeds if I log in to the registry with a personal access token with the write_registry scope and then start docker push manually, but it fails as above when the CI job logs in with CI_JOB_TOKEN. As far as I know, CI_JOB_TOKEN grants the same permissions as the user who started the job, so in my case registry write access should be granted, since I am the owner of the project (and also the instance admin). How can I troubleshoot this issue?