We have a very confusing issue with our GitLab server.
The server:
- runs GitLab 14.9.3 with mostly default settings.
- runs Ubuntu 22.04, kernel 5.13.0.
- is virtualized on VMware.
- is configured with 4 CPUs and 8 GB RAM.
- is not heavily used: ~10 users and a handful of active repos.
Since upgrading from 13.12 we have started getting “freezes” in network connectivity.
Essentially, all network traffic halts completely for around 60 s and then continues as normal.
This includes traffic to and from the machine, but also between services on the machine (e.g. between workhorse and gitlab-rails).
This happens when running a few specific CI jobs.
The job that triggers the issue just runs gradle clean.
It uploads/downloads a cache, but as far as I know that is handled by the runner, not the server.
The git repo is only around 1MB in size.
We have checked for memory/disk/cpu issues but can’t find any obvious problems with resource constraints.
The only clues we have are these messages from gitlab-workhorse:
badgateway: failed to receive response: context canceled
and
badgateway: failed to receive response: dial unix /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket: connect: no such file or directory
But we are fairly sure this is just a symptom of the network stack being “frozen”.
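To line the freezes up against CI job logs, the socket from the error above could be polled once a second and the failure windows recorded. The following is only a minimal sketch under assumptions: the function names (probe_unix_socket, find_outages, monitor) are made up for illustration, and the 5 s threshold and 1 s interval are arbitrary. A failed connect does not by itself prove the network stack is frozen, and if the whole VM stalls the probe stalls with it, so gaps between timestamps are worth looking at too.

```python
import socket
import time

# Socket path taken from the gitlab-workhorse error message above.
SOCKET_PATH = "/var/opt/gitlab/gitlab-rails/sockets/gitlab.socket"


def probe_unix_socket(path, timeout=1.0):
    """Return True if connecting to the Unix socket succeeds within timeout."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers timeouts, "no such file or directory", connection refused.
        return False
    finally:
        sock.close()


def find_outages(samples, min_duration=5.0):
    """Given (timestamp, ok) samples in time order, return (start, end) pairs
    for runs of failed probes that lasted at least min_duration seconds."""
    outages = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            if ts - start >= min_duration:
                outages.append((start, ts))
            start = None
    # A run of failures that is still open when sampling stops.
    if start is not None and last_fail - start >= min_duration:
        outages.append((start, last_fail))
    return outages


def monitor(path=SOCKET_PATH, interval=1.0, duration=600.0):
    """Probe the socket every `interval` seconds for `duration` seconds
    (hypothetical defaults) and return the outage windows found."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), probe_unix_socket(path)))
        time.sleep(interval)
    return find_outages(samples)
```

Calling monitor() on the server while the gradle clean job runs would return a list of (start, end) Unix timestamps that can be compared with the job's start time and the workhorse log entries.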
We think this is not directly GitLab's “fault”, but want to check here whether anyone else has had similar issues.