How to use git filter-repo's output files "commit-map" and "ref-map" to clean up the repository?

Hi folks, :raised_hand:

I am using git filter-repo to clean up some large binaries that I pushed to the repository by mistake. I don't quite understand how to use the two files it produces: commit-map and ref-map. I'm following the GitLab documentation page "Reduce repository size".

In the "Repository Cleanup" section, they suggest, for example, uploading a commit-map file to clean up the repository.

To clean up a repository:

Go to the project for the repository.
Navigate to Settings > Repository.
Upload a list of objects. For example, a commit-map file created by git filter-repo which is located in the filter-repo directory.

If your commit-map file is larger than about 250KB or 3000 lines, the file can be split and uploaded piece by piece:
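The split step amounts to cutting the commit-map into pieces of at most 3000 lines, the same job `split -l 3000` does on Unix. Below is a minimal Python sketch, assuming the file sits at the default `.git/filter-repo/commit-map` location and that line count is the limit you care about. The chunk names (`commit-map-000`, `commit-map-001`, ...) and the `split_commit_map` helper are hypothetical.

```python
from pathlib import Path


def split_commit_map(path, out_dir, max_lines=3000):
    """Split a filter-repo commit-map into files of at most max_lines lines.

    Chunk names (commit-map-000, commit-map-001, ...) are hypothetical;
    each chunk would be uploaded separately in Settings > Repository.
    Returns the list of chunk paths in order.
    """
    lines = Path(path).read_text().splitlines(keepends=True)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = []
    for start in range(0, len(lines), max_lines):
        chunk = out / f"commit-map-{start // max_lines:03d}"
        # Write this slice of lines unchanged, so concatenating the chunks
        # reproduces the original file exactly (header stays in chunk 000).
        chunk.write_text("".join(lines[start:start + max_lines]))
        chunks.append(chunk)
    return chunks
```

Usage, from the root of the filtered clone: `split_commit_map(".git/filter-repo/commit-map", "commit-map-chunks")`. Splitting on whole lines keeps every `old new` pair intact, so no mapping is cut in half between two uploads.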

I have some questions here:

  • What does "for example" mean here? What other files of this kind does filter-repo produce?
  • Should I do the same with the ref-map file?
  • Spoiler: I don't think so. I have tried, but its structure is different, so GitLab throws an error when running the cleanup with this file.
  • However, the filter-repo documentation suggests that they're the same kind of file, a mapping file maybe?
  • If I run the cleanup with the produced commit-map file, as GitLab's documentation instructs, it breaks my Merge Requests, since all the MRs still point to the old commit hashes. This is very undesirable because I can no longer look at old MRs and see their changes. Is this step optional? Can I skip it to keep my MRs?

  • Why does the GitLab documentation state that the maximum file size is 250KB or 3000 lines, while the UI says something different (a maximum of 40MB)?

Thank you very much in advance! :smile:

Hi @Maximetinu,

Interesting points. Did you manage to answer them yourself, or did you find a safe path for filtering your repo?

I want to purge some large files that were unintentionally added to our repo some time ago, but I don't feel confident about the implications…

I didn't answer them, tbh. I just followed the steps in the documentation and everything worked: the large files were gone and the repo became much smaller. However, all the commit hashes changed, so the MRs opened before the operation had their references broken. I had to live with that, but other than that, everything worked fine.

The safe path for me was practicing repository backups and restores a lot before doing the surgery, so I could be confident that in the worst-case scenario I could just restore the backup.

It’s also important that you synchronise with other people using the repository so they don’t push their previous history with the large files again!

My concern was about the MRs. I didn't want to lose any history. But I guess it is better to do it now, while the problematic commit is still fairly recent…

Thanks for sharing your experience!

If it's recent, go for it before it's too late. You'll only lose a few MRs: the ones opened between the commit with the large file and now. We had to clean up the history from the very first commit, so we lost absolutely all of them.

It's not that they are lost-lost; they are still there, and you can see the description and everything… But the Changes tab shows nothing. It's not impossible to trace them back and find the new commits that replaced them if needed, but it's not as handy as having the changes shown in the GitLab web UI.
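Tracing an old MR's commits back is essentially a lookup in the commit-map, which git filter-repo writes as a header line followed by one `old new` pair of full hashes per line. Below is a small Python sketch, assuming the default `.git/filter-repo/commit-map` path and a SHA-1 repository. It also assumes that commits pruned entirely are recorded with an all-zero new hash. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: filter-repo records commits it pruned entirely with this hash.
NULL_SHA = "0" * 40


def load_commit_map(path):
    """Parse a commit-map into {old_sha: new_sha}, skipping the header line."""
    mapping = {}
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        if len(parts) != 2 or parts == ["old", "new"]:
            continue  # header, blank, or malformed line
        old, new = parts
        mapping[old] = new
    return mapping


def find_new_commit(mapping, old_prefix):
    """Return the rewritten hash for an old (possibly abbreviated) hash.

    Returns None when the prefix matches nothing, is ambiguous, or the
    commit was pruned (mapped to NULL_SHA).
    """
    prefix = old_prefix.lower()
    matches = [old for old in mapping if old.startswith(prefix)]
    if len(matches) != 1:
        return None  # unknown or ambiguous abbreviation
    new = mapping[matches[0]]
    return None if new == NULL_SHA else new
```

For example, `find_new_commit(load_commit_map(".git/filter-repo/commit-map"), "abc1234")`, where `abc1234` stands in for the short hash shown on an old MR, gives the full hash to look up in the rewritten history. Prefix matching is there because MR pages usually show abbreviated hashes.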

That said, my experience was from 3 years ago, so maybe they have improved it since then. I always considered this a bug, because IIRC I read somewhere in their documentation that the mapping file produced by filter-repo can be submitted to the GitLab repo after the cleanup to map the old commit hashes to the new ones, so the MR references shouldn't break… But it didn't work for me. Maybe they have fixed it since.

I see the instructions say you should upload both the “commit-map” and “ref-map” files, but it looks like you only need the “commit-map” one. Trying to use the “ref-map” file might cause errors because GitLab expects something different.
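This matches how git filter-repo lays out the two files. commit-map has two columns (`old new`). ref-map has three (`old new ref`), the last being a full ref name such as `refs/heads/main`. That extra column is presumably what GitLab's cleanup rejects. Below is a Python sketch of a sanity check to run before uploading. It assumes only that column layout and a SHA-1 repository (40-character hashes), and `looks_like_commit_map` is a hypothetical helper name.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def looks_like_commit_map(path):
    """Return True if every line after the header is exactly two SHA-1 hashes.

    A ref-map fails this check because each of its lines carries a third
    column (the ref name). Assumes 40-character SHA-1 hashes.
    """
    lines = Path(path).read_text().splitlines()
    data = lines[1:]  # first line is the "old new" (or "old new ref") header
    if not data:
        return False
    for line in data:
        parts = line.split()
        if len(parts) != 2:
            return False
        if not all(len(p) == 40 and set(p) <= HEX_DIGITS for p in parts):
            return False
    return True
```

Running it on both files before uploading should flag ref-map as the wrong kind of input, rather than finding out from a failed cleanup.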