GitLab 15 High Gitaly CPU Usage

Hello,

I recently upgraded my self-managed GitLab instance from 14.10.5 to 15.0.2 and then to 15.3.3 (following the upgrade path given in the "Upgrading GitLab" documentation). I waited for the background migrations to finish after each upgrade, and all of them completed successfully.

While running GitLab 14, I would occasionally see server load spikes but those were infrequent and short-lived. Since upgrading to GitLab 15 earlier this week, I have noticed sustained load spikes multiple times a day. Checking htop, I can see that the load is coming from lots of Gitaly jobs that are all using 90-150% CPU.

The load spikes are correlated with kicking off a CI pipeline that has roughly 200 jobs that each check out the repo and perform some tests. Once the CI jobs have finished checking out the code, the load drops back down to normal levels, even though the jobs may still be running. The CI jobs are all running on GitLab 15.3.0 runners. The CI jobs are doing shallow checkouts (GIT_DEPTH = 1) and the CI pipeline is unchanged since before the GitLab 15 upgrade.

This leads me to think that the issue is caused by the GitLab 15 upgrade. I checked the GitLab 15 changelog but didn’t see anything that seemed like it could be related to these load spikes.

Any thoughts on what could be causing this or what to check? Thanks!

… what could be causing [load spikes] or what to check?

Our fast-stats tool helps analyse various GitLab logs. Its top command in particular helps identify which projects and users (or bots!) cause the most load. Our two tips on finding “user agents” in the Workhorse and API logs have also proven useful for tracking down the causes of CPU spikes.
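
If fast-stats is not at hand, the user-agent tally can be roughly approximated with a few lines of Python. This is only a sketch: it assumes the log is in JSON-lines format and that the user agent is stored under a key named `ua`, which you should check against your own log entries and change if it differs. The function name and the script name in the comment are made up for illustration.

```python
import json
import sys
from collections import Counter
from typing import Iterable, List, Tuple


def top_user_agents(lines: Iterable[str], field: str = "ua", n: int = 10) -> List[Tuple[str, int]]:
    """Count the values of `field` across JSON log lines; return the n most common."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        # Entries without the field are grouped under a placeholder label.
        counts[entry.get(field) or "(none)"] += 1
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python top_ua.py /path/to/api_json.log ua
    path = sys.argv[1]
    field = sys.argv[2] if len(sys.argv) > 2 else "ua"  # key name is an assumption
    with open(path, encoding="utf-8", errors="replace") as fh:
        for agent, count in top_user_agents(fh, field):
            print(f"{count:8d}  {agent}")
```

If one user agent (for example the runner's Git client) dominates the counts during a spike, that points at the CI checkouts rather than at interactive users.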


In this case, GIT_DEPTH = 1 is a prime suspect, because computing the shallow clones causes high server load. It may be better to optimise such large pipelines in other ways: