LFS files are not garbage collected

Scenario

I have a repository with a rich history of changes. To keep things clean, I decided to trim its history by removing old branches and truncating commit history. For testing purposes, I deleted all branches and every commit except the newest one, making it the new HEAD. I also removed all tags, merge requests, and pipelines using the API.

Problem

Despite this cleanup, gigabytes of files remain in Git LFS (Large File Storage). I ran housekeeping from the web UI and via a rake task, and I also ran a job to prune orphaned LFS files. I even checked the repository on the server itself to make sure no references remained:

$ cd /var/opt/gitlab/git-data/repositories/@hashed/aa/bb/xxxx.git/refs
$ ls heads/
# <empty>
$ ls tags/
# <empty>
$ ls merge-requests/
# <empty>
$ ls keep-around/
# <empty>
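One caveat about the check above: Git can also store refs in a single packed-refs file at the top of the repository, so empty directories under refs/ do not by themselves prove that no refs exist. The following is a minimal sketch that lists both loose and packed refs straight from disk. The function name list_refs_on_disk is made up for illustration, and it assumes the classic files-based ref storage; on a machine with git available, `git for-each-ref` run inside the repository is the more usual way to get the same list.

```shell
# list_refs_on_disk REPO_DIR
# Prints every ref name found in a bare repository, loose or packed.
# Hypothetical helper; assumes the files-based ref backend.
list_refs_on_disk() {
    repo=$1
    # Loose refs: one file per ref somewhere below refs/.
    if [ -d "$repo/refs" ]; then
        (cd "$repo" && find refs -type f)
    fi
    # Packed refs: lines of the form "<sha> <refname>".
    # Skip the "#" header line and "^" peeled-tag lines.
    if [ -f "$repo/packed-refs" ]; then
        awk '!/^#/ && !/^\^/ { print $2 }' "$repo/packed-refs"
    fi
}
```

Called as `list_refs_on_disk /var/opt/gitlab/git-data/repositories/@hashed/aa/bb/xxxx.git`, it should print nothing if the repository really has no refs left.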

Wacky workaround

I found the following workaround.

  1. export the project

  2. list all files that are still referenced in LFS: git lfs ls-files --all -l

  3. open the exported archive

  4. delete all files from tree/lfs-objects that were not listed in step 2

  5. compress the archive again

  6. delete the old repository

  7. let GitLab delete the orphaned files from LFS: sudo gitlab-rake gitlab:cleanup:orphan_lfs_files

  8. create a new project at the same place and import the modified export

  9. verify the project integrity: sudo gitlab-rake gitlab:git:fsck
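Steps 2 and 4 are the error-prone part, so here is a rough sketch of how they could be scripted. It assumes (I have not confirmed this for every GitLab version) that each file under tree/lfs-objects is named after its full SHA-256 OID, and that the OIDs from step 2 were saved one per line in a file. The function name list_unreferenced and the file name referenced-oids.txt are made up for illustration. The function only prints candidates and deletes nothing.

```shell
# list_unreferenced OIDS_FILE LFS_DIR
# Prints each file in LFS_DIR whose name is not listed in OIDS_FILE.
# Hypothetical helper; assumes files are named by their full LFS OID.
list_unreferenced() {
    oids_file=$1
    lfs_dir=$2
    for f in "$lfs_dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # -x: match the whole line, -F: fixed string, -q: no output.
        if ! grep -qxF "$name" "$oids_file"; then
            printf '%s\n' "$f"
        fi
    done
}
```

The OID list could be produced in a clone of the project with `git lfs ls-files --all -l | awk '{print $1}' | sort -u > referenced-oids.txt`; then `list_unreferenced referenced-oids.txt tree/lfs-objects` shows what step 4 would remove, which can be reviewed before deleting anything.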

After following these steps, the orphaned files are finally deleted. However, this process seems error-prone and somewhat risky.

What am I missing that prevents GitLab from automatically garbage collecting the orphaned files?
