Reproducible 'no space left on device' on a part of the CI build

Hello friends! :wink:

For a few weeks now, we have been suffering from failing CI builds. The project in question is Island of TeX / images / texlive · GitLab, the official TeX Live Docker images.

One of the failing jobs is build:latest: [2022, yes, yes] (#2369728101) · Jobs · Island of TeX / images / texlive · GitLab. Curiously, a job with nearly the same specs completes successfully: build:latest: [2022, no, no] (#2369299040) · Jobs · Island of TeX / images / texlive · GitLab.

We have had "no space left on device" errors in the past, but this time they don't go away over time and retrying the job does not resolve them.

As we are using GitLab's hosted runners, we do not have control over the actual runners. We are already using the overlay2 storage driver, and running df -h shows that there is enough disk space.

Our suspicion is that the error is related to the final Docker image being slightly larger than 5 GB, whereas the succeeding builds always end up slightly smaller.

Do you have any ideas how to solve the "no space left on device" error?

We are on GitLab's OSS plan if that helps. :wink:

Thanks a lot!

I'm not sure how much this would help you, but you can try pruning any Docker objects that you don't need:

 docker image prune --filter label=stage=intermediate

Thanks for the suggestion, Sarah! Where would we apply this? As we are on GitLab's hosted runners, we have no control over the runners, after all. :sweat_smile:

Right, but this is something you'd put in your .gitlab-ci.yml file. Where you have docker build ..., you can run docker image prune ... afterwards to remove temporary images that were created during your docker build.
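To make that concrete, here is a rough sketch of what such a job could look like. The job name, the image tag and the docker:dind setup are assumptions for illustration and not taken from the texlive project's actual pipeline. Note also that the label filter only matches images whose Dockerfile stage sets LABEL stage=intermediate, which is an assumption here as well.

```yaml
# Hypothetical job; names and tags are placeholders, not the project's real config.
build-example:
  image: docker:latest
  services:
    - docker:dind
  script:
    # Show free disk space on the runner before building.
    - df -h
    - docker build --tag "$CI_REGISTRY_IMAGE:example" .
    # --force skips the confirmation prompt, which a CI job cannot answer.
    # Assumes intermediate stages carry LABEL stage=intermediate.
    - docker image prune --force --filter label=stage=intermediate
    # Show how much space images, containers and build cache are using.
    - docker system df
```

The df -h and docker system df lines are only there for diagnosis: comparing their output in the failing and the succeeding job should show how close each one gets to the runner's disk limit.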

Whether this will actually help you will depend a bit on where in the process your runner is running out of disk space.

I also found this issue which may or may not be of use to you!

Hi @cereda

I would say you are actually using all the disk space on the Runner SaaS VM. Here are the specs: SaaS runners on Linux | GitLab

As @snim2 suggested, doing some clean-up mid-job might help here. Otherwise, I think your only options are to self-host a runner if you need more disk space for your jobs, or to optimize or split the jobs.