Debugging artifact retention -- High Disk Space Use

Hello,

We are administering a self-hosted GitLab instance for one of our clients.

GitLab version in use: v14.7.0-ce.0
Host OS: Debian 9 Stretch

Said instance is having issues with high disk use in the GitLab data folder (/var/opt/gitlab), caused primarily by artifact storage (/var/opt/gitlab/gitlab-rails/shared/artifacts). That directory currently takes up 190 GB and contains subdirectories created as far back as 2020.

I verified through the Admin section (/admin/jobs) that I can only download artifacts from jobs no older than the global expiration limit (2 weeks), that no project sets an artifact expiration longer than 2 weeks in its .gitlab-ci.yml, and that the artifact cleanup cron job is being scheduled and run correctly (via the Sidekiq log /var/log/gitlab/sidekiq/current).
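
A check along these lines in the rails console might also be informative here, since artifact records with no expiry set are never picked up by the expiration job (a rough sketch):

# Artifact records with no expiry set are kept indefinitely,
# while records whose expiry has already passed should normally be cleaned up soon.
puts "No expiry set:   #{Ci::JobArtifact.where(expire_at: nil).count}"
puts "Already expired: #{Ci::JobArtifact.where('expire_at < ?', Time.current).count}"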

Is there a way to pair a specific on-disk artifact file to a build job in GitLab, to find out why the file isn't being cleaned up? For example, the file /var/opt/gitlab/gitlab-rails/shared/artifacts/26/d2/26d228663f13a88592a12d16cf9587caab0388b262d6d9f126ed62f9333aca94/2020_04_20/56466/63844/artifacts.zip

Note: There are many more 2020_…_… directories under the 26d22… directory. That one file is just an example; it takes up about 26 MB.

I tried using the rails console to find more details but, alas, I am not very well versed in the GitLab API or the libraries that environment provides, and was thus unable to find anything useful to go on.
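
For completeness, the kind of lookup I had in mind is roughly the following (a very rough sketch; I am only guessing that the last numeric directory components in the example path map to database IDs such as the job ID, which may well be wrong):

# Assuming (unverified) that 63844 from the example path is a Ci::Build ID:
build = Ci::Build.find_by(id: 63844)

if build.nil?
  puts "No Ci::Build with that ID - the file on disk may be orphaned"
else
  puts "Project:  #{build.project.full_path}"
  puts "Finished: #{build.finished_at}"
  build.job_artifacts.each do |artifact|
    puts "#{artifact.file_type}: #{artifact.size} bytes, expires at #{artifact.expire_at.inspect}"
  end
end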

Edit: Running gitlab-rake gitlab:cleanup:orphan_job_artifact_files also doesn't find any orphaned files to clean up: Processed 171887 job artifact(s) to find and cleaned 0 orphan(s).

Update: I was finally able to resolve the issue today. All of what I used is taken from here – Jobs artifacts administration | GitLab

Here is what I did:

All of the steps below were run in the gitlab-rails console environment.

First, I listed projects with the most on-disk storage:

include ActionView::Helpers::NumberHelper
ProjectStatistics.order(build_artifacts_size: :desc).limit(20).each do |s|
  puts "#{number_to_human_size(s.build_artifacts_size)} \t #{s.project.full_path}"
end

Then I took the first project from that list and printed its URL, to compare what the console reported with what GitLab reports online (the ID is taken from the previous command's output):

puts Project.find_by_id(id).web_url

Online, GitLab reported that the project only had a few MB of data on disk. So I had (probably) found why none of the maintenance / cleanup tasks worked: GitLab had simply lost track of this project's artifacts (~25 GB worth).
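
For reference, one way to cross-check that mismatch from the console is to compare the stored statistics against the sum of the artifact records (a rough sketch; 408 is the ID of the project in question):

include ActionView::Helpers::NumberHelper

project = Project.find_by_id(408)

# What the statistics table believes the artifacts take up...
stats_size = project.statistics.build_artifacts_size
# ...versus what the artifact records actually add up to.
records_size = Ci::JobArtifact.where(project: project).sum(:size)

puts "ProjectStatistics:      #{number_to_human_size(stats_size)}"
puts "Sum of Ci::JobArtifact: #{number_to_human_size(records_size)}"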

I then tried to remove the artifacts of just that project:

Ci::JobArtifact.where(project: Project.find_by_id(408)).delete_all
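
A note on delete_all: it removes only the database rows and skips ActiveRecord callbacks, so depending on the setup the files themselves can be left behind on disk. A variant that destroys the records one by one, giving the uploader a chance to remove the files as well, would look roughly like this (a sketch):

# Destroy (rather than delete) the records so that model callbacks run;
# find_each processes them in batches to keep memory usage bounded.
Ci::JobArtifact.where(project: Project.find_by_id(408)).find_each do |artifact|
  artifact.destroy
end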

That freed up some space, but not nearly enough. I was getting pretty frustrated by this point, so I went for the nuclear option: erasing all builds older than 2 months (including build logs and artifacts):

admin_user = User.find_by(username: 'username')  # an admin account to attribute the erasure to
builds_with_artifacts = Ci::Build.with_downloadable_artifacts
builds_to_clear = builds_with_artifacts.where("finished_at < ?", 1.week.ago)
builds_to_clear.find_each do |build|
  print "Ci::Build ID #{build.id}... "

  if build.erasable?
    build.erase(erased_by: admin_user)
    puts "Erased"
  else
    puts "Skipped (Nothing to erase or not erasable)"
  end
end

This little snippet of code ran for several hours, and ended up erasing more than 200,000 builds, some of which contained the artifact archives that were taking up most of the space on disk!

After the code finished, disk space usage fell from 85% to 25%.
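
Since the numbers GitLab showed online were clearly out of sync with the disk, it might also be worth refreshing the affected project's statistics afterwards; a sketch, assuming ProjectStatistics#refresh! recalculates the stored sizes (I have not verified this on v14.7):

# Recalculate the cached size statistics for the affected project.
project = Project.find_by_id(408)
project.statistics.refresh!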

I have no idea how the artifact files became completely invisible to the GitLab instance, but I'm glad I was at least able to delete them in a somewhat automated way…


Thanks for sharing your detailed solution. I think you are affected by this problem: Improve expire_at behavior for self hosted GitLab instances so wanted artifacts are not deleted (&7097) · Epics · GitLab.org · GitLab. I suggest subscribing in case the problem returns.