Reduce Binary File Storage Repo Size?

Hi, I have a GitLab Repo where I am storing some binary files. Over time, with multiple commits for each binary file, the size of the repo has grown substantially. In an attempt to reduce the size of the repo by getting rid of all the old versions of the files (I am only interested in the latest version of each) I followed this guide: https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html.

While the repo size appeared to be smaller, the indicated storage used did not change. I also thought that performing the actions in the guide would keep the latest version of each binary file; however, it removed them all, including the latest versions.

Am I doing something wrong?

P.S. I know that it is bad practice to store binary files in the first place.

Hi @rph, welcome to the GitLab Community Forum!

The instructions you linked to may not display expected results instantly because of a delay between Repository cleanup and git gc. There’s a relevant issue here: https://gitlab.com/gitlab-org/gitlab/-/issues/220104

Can you please manually trigger housekeeping on the repository? You can find this option under your project’s Settings > General > Advanced. You’ll receive an email once Housekeeping has completed with the updated repository size - hopefully minus the files you removed.

Let us know how it goes either way!

Hi @gitlab-greg, I forgot to mention that I did run housekeeping afterwards. It made no difference.

Hey @rph,

Could you show a screenshot?
For example, I’m wondering if you mean the right number here didn’t decrease:

The left number is the git repo size (small), the right is the total storage (rather large).

Reducing repo size and gc wouldn’t affect the right number which also includes artifacts in the pipelines among other things.
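To tell the two numbers apart, it can help to split the total by component. Below is a minimal sketch, assuming a statistics dictionary shaped like the one the GitLab Projects API returns when statistics are requested. The field names are my recollection of that API and the sample numbers are made up, so check both against your own instance.

```python
def storage_breakdown(statistics):
    """Return (name, bytes) pairs for each size component, largest first."""
    # Assumed component keys; any key missing from the input is skipped.
    components = [
        "repository_size",
        "lfs_objects_size",
        "job_artifacts_size",
        "packages_size",
        "wiki_size",
        "snippets_size",
        "uploads_size",
    ]
    sizes = [(name, statistics[name]) for name in components if name in statistics]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Hypothetical numbers, not taken from the project in this thread.
example = {
    "storage_size": 3_000_000_000,
    "repository_size": 200_000_000,
    "job_artifacts_size": 2_800_000_000,
}
for name, size in storage_breakdown(example):
    print(f"{name}: {size / 1e9:.2f} GB")
```

If `repository_size` dominates, the problem is in the git history itself; if something else dominates, rewriting history will not shrink the total by much.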

If you do find this is your issue, please be aware you need to delete the jobs before the pipelines to ensure statistics don’t end up wrong.

@n-hebert Hi, the size of my “Files” and “Storage” is the same. Both nearly 3GB.

@rph Did you make any commits or updates to the project following the Repository Cleanup process?

Can you try exporting the project and verifying how large the export archive is? I suspect that if repository cleanup worked, the export would be significantly smaller than the Files/Storage size you see in the UI.

@n-hebert thanks for helping out, welcome to the GitLab Community forum! In this specific case, the problem is related to repository file storage and not artifact storage. A good call to check this, as artifacts can also take up a lot of space. For problems or questions about deleting or cleaning up artifact storage, keep an eye on the issue here: https://gitlab.com/gitlab-org/gitlab/-/issues/224151

Thanks for the welcome, @gitlab-greg. Checking the export is a good idea that should be pursued.

@rph, one more thing that came to mind, having done this before: confirm that you did delete all the old branches on the repo.
The old branches (& tags) all need to be replaced by new (smaller) ones, or else the repository is still using all the old files. :slight_smile:
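One way to check for leftover branches and tags is to compare the refs in the rewritten local repo against what the server still advertises. This is only a sketch: it assumes you paste in the text output of `git show-ref` (local) and `git ls-remote origin` (remote), and the helper names `parse_refs` and `stale_remote_refs` are made up for illustration.

```python
def parse_refs(text):
    """Map ref name -> commit hash from `git show-ref` or `git ls-remote` output."""
    refs = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect "<hash> <refname>"; skip HEAD and peeled tag entries ("^{}").
        if len(parts) != 2:
            continue
        sha, name = parts
        if not name.startswith("refs/") or name.endswith("^{}"):
            continue
        refs[name] = sha
    return refs


def stale_remote_refs(local_text, remote_text):
    """Return remote refs that are absent locally or point at a different hash."""
    local = parse_refs(local_text)
    remote = parse_refs(remote_text)
    return sorted(name for name, sha in remote.items() if local.get(name) != sha)


# Hypothetical output; the hashes are shortened placeholders.
local_out = "aaa111 refs/heads/main\n"
remote_out = "bbb222\tHEAD\nbbb222\trefs/heads/main\nccc333\trefs/heads/old-feature\n"
print(stale_remote_refs(local_out, remote_out))
```

Any ref this reports still points at pre-rewrite history on the server, so the old objects behind it cannot be garbage collected until it is force-pushed or deleted.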

An easier path might be to push to an empty repo once you confirm the filter worked well locally to see it from afar without having to clobber any old work. If you like what you see you can proceed forwards towards the original project’s namespace in various fashions.

@gitlab-greg Yes, I exported the project right after performing all the actions on the post and its file size was very small (greatly reduced). I have since made new commits on the binary files and the indicated file size on GitLab has continued to increase…

@n-hebert I’m not an expert when it comes to this, so I only followed everything that was in the guide. :laughing: If there is anything else that needs to be done, I am happy to do it if I can find some instructions…

So is it possible to perform the cleanup tasks without removing the latest version of each binary file?

Various cleanup tasks will likely work, but probably not the ones you’re worried about.

If you’re looking to gain significant storage space back, you need every reference to any large object you want deleted completely obliterated from your git branches’ current commits and full history; otherwise those objects simply remain part of your git repo’s basic size.
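To see which objects in history are actually taking the space, a common approach is to list every object with its size and sort the blobs. A rough sketch, assuming the input is the text produced by `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`; the function name and the sample data are hypothetical.

```python
def largest_blobs(batch_check_output, top=10):
    """Return up to `top` (size_bytes, path, object_id) tuples, largest first."""
    blobs = []
    for line in batch_check_output.splitlines():
        # Fields: type, object id, size, path. The path may contain spaces,
        # so split at most three times.
        fields = line.split(" ", 3)
        if len(fields) < 4 or fields[0] != "blob":
            continue  # commits and trees are not file contents
        _, object_id, size, path = fields
        blobs.append((int(size), path, object_id))
    blobs.sort(reverse=True)
    return blobs[:top]


# Hypothetical sample: two historical versions of the same binary.
sample = (
    "commit c1 250 \n"
    "tree t1 40 \n"
    "blob b1 1048576 firmware.bin\n"
    "blob b2 2097152 firmware.bin\n"
    "blob b3 120 README.md\n"
)
for size, path, object_id in largest_blobs(sample, top=2):
    print(f"{size:>10}  {path}  {object_id}")
```

Each committed version of a binary shows up as its own blob under the same path, which is why many commits to a few binary files add up so quickly.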

When you clone the repo (full repository including all depth), is it the same size on disk? There will be some deviation due to compression and other factors, but you should be in the ballpark of how big it will be on GitLab.
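For the on-disk comparison, a small helper that sums file sizes under a directory is enough. This is a sketch under the assumption that you point it at the `.git` directory of a fresh full clone; the demo below uses a throwaway temporary directory instead so it runs anywhere.

```python
import os
import tempfile


def directory_size(path):
    """Sum the sizes in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Demo on a temporary directory; for a real check, pass the path to the
# clone's .git directory instead (that path depends on where you cloned).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "objects"))
    with open(os.path.join(tmp, "objects", "pack.bin"), "wb") as fh:
        fh.write(b"\0" * 1024)
    with open(os.path.join(tmp, "HEAD"), "wb") as fh:
        fh.write(b"ref: refs/heads/main\n")
    demo_size = directory_size(tmp)

print(demo_size)
```

If the clone comes out far smaller than the size GitLab reports, the extra space is probably held by objects that no branch or tag reaches any more and that have not been cleaned up on the server yet.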