Scenario
I have a repository with a rich history of changes. To keep things clean, I decided to trim that history: for testing purposes, I deleted all branches and every commit except the newest one, which is now the sole HEAD. I also removed all tags, merge requests, and pipelines via the API.
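For reference, the API-based removal was done with calls along these lines (a rough sketch; GITLAB_URL, TOKEN, the project ID 42, and the branch/tag names and IIDs are placeholders, not values from the actual project):
# delete a branch, a tag, a merge request, and a pipeline via the REST API
curl --request DELETE --header "PRIVATE-TOKEN: $TOKEN" \
     "$GITLAB_URL/api/v4/projects/42/repository/branches/old-branch"
curl --request DELETE --header "PRIVATE-TOKEN: $TOKEN" \
     "$GITLAB_URL/api/v4/projects/42/repository/tags/v1.0"
curl --request DELETE --header "PRIVATE-TOKEN: $TOKEN" \
     "$GITLAB_URL/api/v4/projects/42/merge_requests/7"
curl --request DELETE --header "PRIVATE-TOKEN: $TOKEN" \
     "$GITLAB_URL/api/v4/projects/42/pipelines/123"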
Problem
Despite these cleanup efforts, there are still gigabytes of files in Git LFS (Large File Storage). I ran housekeeping via the Web GUI and a rake task, and I also ran the job that prunes orphaned LFS files. I even checked the repository on the server itself to make sure no references remain:
$ cd /var/opt/gitlab/git-data/repositories/@hashed/aa/bb/xxxx.git/refs
$ ls heads/
# <empty>
$ ls tags/
# <empty>
$ ls merge-requests/
# <empty>
$ ls keep-around/
# <empty>
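For completeness, the housekeeping and pruning mentioned above correspond roughly to the following; the housekeeping call shown here is the API equivalent of the Web GUI button, and the project ID and token are placeholders:
# trigger housekeeping for the project (same effect as the Web GUI button)
curl --request POST --header "PRIVATE-TOKEN: $TOKEN" \
     "$GITLAB_URL/api/v4/projects/42/housekeeping"
# prune LFS files that no project references anymore
sudo gitlab-rake gitlab:cleanup:orphan_lfs_files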
Whacky workaround
I found the following workaround.
1. Export the project.
2. List all files that are still referenced in LFS:
   git lfs ls-files --all -l
3. Open the exported archive.
4. Delete all files from tree/lfs-objects that were not listed in step 2 (see the sketch after this list).
5. Compress the archive again.
6. Delete the old repository.
7. Let GitLab delete the orphaned files from LFS:
   sudo gitlab-rake gitlab:cleanup:orphan_lfs_files
8. Create a new project at the same place and import the modified export.
9. Verify the project integrity:
   sudo gitlab-rake gitlab:git:fsck
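The filtering in step 4 can be scripted. Here is a minimal sketch, assuming the referenced OIDs are collected from a local clone, the export archive is named project_export.tar.gz (a placeholder), and the LFS objects sit under tree/lfs-objects as described above:
# collect the OIDs that are still referenced (run inside a local clone)
git lfs ls-files --all -l | awk '{print $1}' | sort -u > referenced-oids.txt

# unpack the project export next to that list
mkdir -p export
tar -xzf project_export.tar.gz -C export

# remove every LFS object whose OID is not in the referenced list
find export/tree/lfs-objects -type f | while read -r f; do
  oid=$(basename "$f")
  grep -qx "$oid" referenced-oids.txt || rm -v "$f"
done

# repack the archive for re-import
tar -czf project_export_trimmed.tar.gz -C export .
Whether the repacked archive imports cleanly may depend on the GitLab version, which is why the fsck step at the end is important.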
After following these steps, the orphaned files are finally deleted. However, this process seems error-prone and somewhat risky.
What am I missing that prevents GitLab from automatically garbage collecting the orphaned files?