What deduplication factor should I expect for an unchanged forked repo?

I need to maintain binaries in a directory tree, together with possibly heavily customized configs, for various purposes, and have chosen to try that in GitLab using dedicated deployment projects. One “template” project contains a default deployment based on those binaries; other projects are forked off that template and customized as needed. Those customizations mostly do NOT involve changing the large binaries, only configs, possibly a different directory tree layout, and the like. In theory, GitLab's fork deduplication should handle that pretty well.
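To make the "in theory" part concrete, here is a minimal sketch of why I expect an unchanged binary to be shareable at all: Git names a blob by a hash of its content, so the same EXE in the template and in a fork gets the same object ID. The sketch assumes the default SHA-1 object format, and the byte string is a hypothetical stand-in for my real EXE.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical stand-in for the EXE; the real file is about 10 MiB.
exe_in_template = b"MZ" + bytes(1024)
exe_in_fork = bytes(exe_in_template)  # unchanged copy pulled from upstream

# Identical content yields an identical object ID in both repositories.
print(git_blob_id(exe_in_template) == git_blob_id(exe_in_fork))
```

The IDs can be cross-checked against `git hash-object <file>` on a real file.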

But I’ve tested this with one binary so far and didn’t get the expected results: The template project contains one EXE of ~10 MiB, which results in an overall repo size of ~4.7 MiB shown in the GitLab web UI. I created one fork, but BEFORE the EXE had been uploaded to the template, so the fork was pretty much empty, simply because the EXE wasn’t available yet when I set things up. After the EXE was pushed to the template, I created an upstream remote in the fork, pulled the EXE from upstream and pushed it to origin.
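My guess for why the repo is reported at roughly half the size of the EXE: Git stores objects zlib-compressed, and I assume the size in the web UI reflects that on-disk data. A rough sketch of the effect, using made-up payloads rather than my actual binary:

```python
import os
import zlib


def stored_size(data: bytes) -> int:
    """Approximate the on-disk size of content after zlib compression."""
    return len(zlib.compress(data))


# Hypothetical payloads: highly repetitive data vs. incompressible data.
repetitive = b"\x00" * 1_000_000
random_like = os.urandom(1_000_000)

# Repetitive content shrinks a lot, random content barely at all;
# a typical EXE lands somewhere in between.
print(stored_size(repetitive), stored_size(random_like))
```

So the reported size alone doesn't say anything about deduplication yet, only about compression.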

That resulted in a size of ~4.7 MiB for the forked repo. The GitLab docs mention that some housekeeping is necessary to actually share data between the pool and the forked repo, so I triggered it using the web UI for the forked repo. Afterwards GitLab showed ~3.7 MiB for the forked repo, while I would have expected much less, since in theory all the data used by the fork is already available in the pool.
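This is the toy model behind my expectation, not GitLab's actual implementation: I assume housekeeping drops from the fork every object that already exists in the pool, so only fork-specific objects keep counting toward the fork's size. The object names and sizes are hypothetical.

```python
def size_after_housekeeping(fork_objects: dict[str, int], pool_ids: set[str]) -> int:
    """Sum the sizes of fork objects that are NOT already in the pool."""
    # Assumption: objects present in the pool are dropped from the fork.
    return sum(size for oid, size in fork_objects.items() if oid not in pool_ids)


MIB = 1024 * 1024
# Hypothetical fork content: the compressed EXE plus one small config file.
fork_objects = {"exe-blob": int(4.7 * MIB), "config-blob": 2048}

# If the pool already holds the EXE, only the config remains in the fork.
print(size_after_housekeeping(fork_objects, {"exe-blob"}))
# If the pool never received the EXE, nothing can be dropped.
print(size_after_housekeeping(fork_objects, set()))
```

Under this model I'd expect a few KiB at most if the pool contains the EXE, which is why ~3.7 MiB surprises me.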

So are these numbers as expected? Is there something wrong with my process, e.g. is data only shared at the time the fork is created? I didn’t understand the docs that way. Or do I simply need to give the background housekeeping more time?

Thanks for your thoughts!