Hello,
we are administering a self-hosted GitLab instance for one of our clients.
GitLab version in use: v14.7.0-ce.0
Host OS: Debian 9 Stretch
The instance is running into high disk usage in the GitLab data folder (/var/opt/gitlab), caused primarily by artifact storage (/var/opt/gitlab/gitlab-rails/shared/artifacts). That directory currently takes up 190 GB and contains subdirectories created as far back as 2020.
So far I have verified that:
- in the Admin area (/admin/jobs), I can only download artifacts from jobs no older than the global expiration limit (2 weeks);
- no project sets an artifact expiration of more than 2 weeks in its .gitlab-ci.yml;
- the artifact cleanup cron job is being scheduled and runs correctly (checked via the Sidekiq log, /var/log/gitlab/sidekiq/current).
Is there a way to map a specific on-disk artifact file back to a build job in GitLab, so I can find out why the file isn't being cleaned up? For example, the file /var/opt/gitlab/gitlab-rails/shared/artifacts/26/d2/26d228663f13a88592a12d16cf9587caab0388b262d6d9f126ed62f9333aca94/2020_04_20/56466/63844/artifacts.zip
Note: there are many more 2020_…_… directories under the 26d22… directory; that file is just one example, taking up about 26 MB.
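For what it's worth, the GitLab administration docs describe the hashed artifact layout as `<hash[0,2]>/<hash[2,2]>/<sha256_of_project_id>/<YYYY_MM_DD>/<job_id>/<job_artifact_id>/<file>`. Assuming that layout applies here (please correct me if it changed between versions), the path components can be split out with a small Ruby sketch; the function name is mine:

```ruby
# Sketch: split an artifact path into its components, assuming GitLab's
# hashed-storage layout (per the administration docs):
#   <hash[0,2]>/<hash[2,2]>/<sha256_of_project_id>/<YYYY_MM_DD>/<job_id>/<artifact_id>/<file>
ARTIFACTS_ROOT = "/var/opt/gitlab/gitlab-rails/shared/artifacts".freeze

def parse_artifact_path(path)
  parts = path.delete_prefix(ARTIFACTS_ROOT + "/").split("/")
  return nil unless parts.length == 7

  {
    project_id_hash: parts[2],  # SHA-256 hex digest of the project's numeric ID
    created_on:      parts[3],  # e.g. "2020_04_20"
    job_id:          parts[4].to_i,
    artifact_id:     parts[5].to_i,
    file_name:       parts[6],
  }
end

info = parse_artifact_path(
  "/var/opt/gitlab/gitlab-rails/shared/artifacts/26/d2/" \
  "26d228663f13a88592a12d16cf9587caab0388b262d6d9f126ed62f9333aca94/" \
  "2020_04_20/56466/63844/artifacts.zip"
)
# info[:job_id]      => 56466
# info[:artifact_id] => 63844
```

If that reading is right, the second-to-last directory should be a job ID, which could then be looked up in gitlab-rails console (e.g. Ci::Build.find(56466) and its artifacts_expire_at, as shown in GitLab's troubleshooting docs).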
I tried using the Rails console to find more details, but I am not well versed in the GitLab API and the libraries that environment provides, so I was unable to find anything useful to go on.
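For reference, going the other direction (project to directory) doesn't need the console at all: assuming hashed storage, the top-level artifact directory is the SHA-256 hex digest of the project's numeric ID, per the administration docs. A minimal sketch (function name is mine):

```ruby
require "digest"

# Sketch: compute a project's on-disk artifacts directory, assuming hashed
# storage (directory name = SHA-256 hex digest of the numeric project ID,
# as described in the GitLab administration docs).
def artifacts_dir_for(project_id)
  hash = Digest::SHA2.hexdigest(project_id.to_s) # SHA-256 by default
  File.join("/var/opt/gitlab/gitlab-rails/shared/artifacts",
            hash[0, 2], hash[2, 2], hash)
end

# Example: project ID 1 hashes to 6b86b273ff34fce1...
puts artifacts_dir_for(1)
```

That would at least let me check per-project disk usage directory by directory.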
Edit: running gitlab-rake gitlab:cleanup:orphan_job_artifact_files also doesn't find any orphaned files to clean up: "Processed 171887 job artifact(s) to find and cleaned 0 orphan(s)."