Docker-in-Docker (dind): no space left on device when pulling image

Problem to solve

Describe your question in as much detail as possible:
For months we have used the Docker-in-Docker (dind) service in our .gitlab-ci.yml to build our Docker images. A separate manual publish job then pulls a specific version of an image and pushes it to AWS. Recently that publish job started failing. It pulls the image that the previous stage built and pushed to the GitLab container registry. Pushes to the registry still succeed, so we are not over our storage quota.

Now, when pulling the image, we first verify that the Docker server starts clean: docker images reports no images. Even so, we get a space error before the first image finishes downloading:

failed to register layer: write /home/python/.local/lib/python3.11/site-packages/triton/third_party/hip/llvm/bin/ld.lld: no space left on device

Before this, the job had worked as expected for several months. The last success was on 7/24/24, and the first failure was on 8/7/24.

There were no changes to our .gitlab-ci.yml file between those runs.
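Even with an unchanged .gitlab-ci.yml, the job's environment can change between runs: the docker:dind tag follows the latest Docker release, and the image being pulled may itself have grown. A minimal sketch that pins the Docker versions so runs are comparable. The version number is only an example, and the image: line is an assumption (the posted config does not show which image the job uses, but apk suggests the Alpine-based docker image):

```yaml
publish-ecr-common:
  # Assumption: the job runs on the Alpine-based docker CLI image
  image: docker:24.0.5
  services:
    # Pinned instead of the floating docker:dind tag (example version)
    - docker:24.0.5-dind
```

Only the keys that change are shown; the rest of the job stays the same.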

  • What are you seeing, and how does that differ from what you expect to see?
    Expected the build to succeed.
    Or, if storage were the problem, expected a quota error rather than running out of disk space.

Steps to reproduce

Which troubleshooting steps have you already taken?
Added docker images to the script before the pull, to check whether already loaded images were taking up space. Added df to the script to check whether the runner has enough disk space…
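One caveat with the df check: with the docker:dind service, docker pull writes layers into the dind service container's storage, not into the job container. df run in the job container reports the job container's own filesystem, which may or may not be the same disk that backs the daemon's storage. A minimal diagnostic sketch that reports storage from the daemon's side before the pull. These commands are suggestions, not part of the original job, and the throwaway alpine image is an assumption:

```yaml
publish-ecr-common:
  script:
    # Job container's own filesystem (what the existing df step measures)
    - df -h
    # Where the dind daemon stores images, and which storage driver it uses
    - docker info --format '{{.DockerRootDir}} {{.Driver}}'
    # Space the daemon attributes to images, containers, volumes and build cache
    - docker system df
    # Roughly the free space behind the daemon's storage, seen from a container it runs
    - docker run --rm alpine df -h /
    - docker pull ${CI_REGISTRY_IMAGE}/${IMAGE_NAME}-image:${VERSION}
```

If the last df shows far less free space than the image needs once extracted, the limit is the runner's disk rather than anything in the registry.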

Configuration

Add the CI/CD configuration from .gitlab-ci.yml and other configuration if relevant (e.g. docker-compose.yml). Alternatively, create a public GitLab.com example project that provides all necessary files to reproduce the question.

publish-ecr-common:
  stage: publish
  services:
    - docker:dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - apk add --no-cache python3 py3-pip helm
    - python3 -m venv /path/to/venv
    - . /path/to/venv/bin/activate
    - pip3 install --no-cache-dir awscli
    - helm repo add --username ${CI_REGISTRY_USER} --password ${CI_REGISTRY_PASSWORD} gitlab-ai-services *****
    - helm repo update
  script:
    - export AWS_ACCOUNT_ID AWS_REGION AWS_SECRET_ACCESS_KEY AWS_ACCESS_KEY_ID IMAGE_NAME
    - echo "check starting state of docker images"
    - df
    - echo "Pull images from gitlab image repo"
    - docker pull ${CI_REGISTRY_IMAGE}/${IMAGE_NAME}-image:${VERSION}
    - aws ecr get-login-password --region $AWS_REGION |
      docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
    - echo "Push images to AWS ECR"
    # Tag the same reference that was pulled above (note the -image suffix)
    - docker tag ${CI_REGISTRY_IMAGE}/${IMAGE_NAME}-image:${VERSION} $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/${IMAGE_NAME}:${VERSION}
    - docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${IMAGE_NAME}:${VERSION}
  when: manual
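The failure happens while registering (extracting) layers of a large image; the error path points into triton's bundled LLVM toolchain. One option is to run the publish job on a runner with more disk. A sketch, assuming the project uses GitLab.com hosted runners and that a larger machine type such as saas-linux-medium-amd64 is available on your plan. Check GitLab's hosted-runner documentation for the current tags and their disk sizes:

```yaml
publish-ecr-common:
  # Assumption: GitLab.com hosted Linux runner tag for a larger machine type;
  # verify the tag and its disk size in GitLab's hosted-runner docs
  tags:
    - saas-linux-medium-amd64
```

Only the added key is shown; stage, services, before_script, script and when stay as in the job above.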

Versions

Please select whether options apply, and add the version information.

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

Versions

  • GitLab (Web: /help or self-managed system information):
    17.3.0-pre 217d5244052
  • GitLab Runner, if self-hosted (Web /admin/runners or CLI gitlab-runner --version):
    n/a