Clone or pull of meta repositories sometimes gets stuck on the "git-upload-pack" step of a random subrepo

Hello gitlab community,

At my company, we have the following setup:

  • We’re using gitlab CE via the official docker image (self-hosted).
  • We were on versions 12.xx, but because of the problem described below we upgraded to version 13.5.3 yesterday, without improvement.
  • We mostly use meta repositories. We clone them with git clone --recursive [meta url]; then, to pull the latest changes, we have a simple bash script that iterates over all the subdirectories we cloned and runs git checkout master; git pull --rebase in each one. Our biggest meta repository contains 64 sub repositories.

When working from our internal network everything runs fine, but when working remotely through our VPN, it works for most of the subrepos and then, almost every time, gets stuck on some random sub-repository. We didn’t see any blocked connections or denied packets in the firewall logs (but that is a little complicated to check, as it’s a company-wide hardware firewall managed remotely by an off-site IT service and I don’t have direct access to it).

I extracted the last output of GIT_TRACE (as said before, it’s not always the same sub repository that gets stuck):

15:56:20.592209 run-command.c:663       trace: run_command: unset GIT_DIR; GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL -p [ourPort] git@[ourAddress] 'git-upload-pack '\''/measurement/soft.atr.logger.unsafe.git'\'''

and of GIT_TRACE_PACKET:

15:56:20.137410 pkt-line.c:80           packet:     sideband< PACK ...
15:56:20.243714 pkt-line.c:80           packet:     sideband< \2Total 1507 (delta 190), reused 182 (delta 133), pack-reused 1239
15:56:20.243760 pkt-line.c:80           packet:     sideband< 0000
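
For anyone who wants to reproduce these traces, here is a minimal sketch of how they can be captured for a single git command. The helper name trace_git and the log path are made up for illustration; GIT_TRACE and GIT_TRACE_PACKET set to 1 make git write its trace to stderr, which is redirected to a file here.

```shell
#!/usr/bin/env bash
# Sketch only: trace_git is an illustrative helper name, not part of git.
# Usage: trace_git <log-file> <git arguments...>
trace_git() {
    local log="$1"
    shift
    # With the value 1, both trace streams go to stderr; capture them in the log.
    GIT_TRACE=1 GIT_TRACE_PACKET=1 git "$@" 2>"$log"
}
```

For example, running trace_git /tmp/subrepo.trace pull --rebase inside the stuck subrepo leaves the last trace lines in /tmp/subrepo.trace.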

Then nothing. Sometimes it recovers and continues, but most of the time it stays stuck for hours without any progress. A workaround seems to be adding a 3-second sleep between the checkout + pull of each subrepo (but it’s really annoying, and there is no equivalent workaround for git clone --recursive, as there’s no way to add a sleep command there).
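
To make the workaround concrete, here is a minimal sketch of a pull loop with the sleep added. It is not our exact script: the function name pull_all, its arguments and the default delay are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: pull_all and its arguments are illustrative names.
# Usage: pull_all <meta-dir> [delay-seconds]
pull_all() {
    local root="$1" delay="${2:-3}" dir
    for dir in "$root"/*/; do
        # Skip subdirectories that are not git work trees
        # (in a submodule, .git is a file, so test with -e).
        [ -e "${dir}.git" ] || continue
        echo "==> ${dir}"
        # Run in a subshell so the cd does not leak into the next iteration.
        (cd "$dir" && git checkout master && git pull --rebase) \
            || echo "FAILED: ${dir}" >&2
        # Pause between subrepos; this is the workaround described above.
        sleep "$delay"
    done
}
```

Called as pull_all /path/to/meta 3, it visits each cloned subdirectory in turn and waits 3 seconds before moving on to the next one.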

So far, we have never encountered the issue when cloning a single (non-meta) repo.

Does anyone know how to pursue the investigation of this problem?