Since we moved from 15.6 to 15.7, git operations become slow and eventually time out
Strangely, the slowness does not appear immediately after upgrading to 15.7. It took a few hours (well after all background migration jobs had completed) before git operations became slow. Resource consumption looked fine throughout, but at some point git operation response times would increase sharply until the operations timed out.
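For anyone who wants to compare timings on their own instance, here is a rough sketch of how the onset of the slowness could be pinned down: it times `git ls-remote` against a repository at a fixed interval and prints the result. The function names, the 60-second timeout, and the 5-minute interval are our own assumptions, not anything GitLab ships.

```python
import subprocess
import time


def time_command(cmd, timeout_s=60.0):
    """Run cmd once and return (elapsed_seconds, status).

    status is "ok", "failed" (non-zero exit code), or "timeout".
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        status = "timeout"
    return time.monotonic() - start, status


def probe(repo_url, interval_s=300.0, rounds=3):
    """Time `git ls-remote` against repo_url once per interval_s seconds."""
    for _ in range(rounds):
        elapsed, status = time_command(["git", "ls-remote", repo_url])
        print(f"{time.strftime('%H:%M:%S')} {status} {elapsed:.2f}s")
        time.sleep(interval_s)
```

Calling `probe("https://gitlab.example.com/group/project.git", rounds=100)` (a placeholder URL) right after an upgrade and leaving it running should show roughly when response times start to climb.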
At first we suspected a memory leak, but the pods' memory consumption looked OK.
In the logs, it looks suspicious that a worker shuts down as soon as it starts:
15:16:49 gitlab-webservice-default-5b548f49b9-48qzd webservice unknown - Worker 18 (PID: 216) booted in 0.13s, phase: 0
15:16:49 gitlab-webservice-default-5b548f49b9-48qzd webservice unknown === puma shutdown: 20:16:49 +0000 ===
15:16:49 gitlab-webservice-default-5b548f49b9-48qzd webservice unknown - Goodbye!
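To check whether the same pattern shows up elsewhere, a minimal sketch like the following could scan webservice log lines for a worker boot that is followed within a few lines by a puma shutdown marker. The regular expression is based only on the excerpt above, and the function name and the three-line window are our own assumptions.

```python
import re

# Matches the worker boot line as it appears in the log excerpt above.
BOOT_RE = re.compile(r"Worker (\d+) \(PID: (\d+)\) booted in ([\d.]+)s")
SHUTDOWN_MARKER = "=== puma shutdown:"


def find_boot_then_shutdown(lines, window=3):
    """Return (worker, pid) pairs whose boot line is followed by a
    puma shutdown marker within the next `window` lines."""
    hits = []
    for i, line in enumerate(lines):
        match = BOOT_RE.search(line)
        if not match:
            continue
        following = lines[i + 1 : i + 1 + window]
        if any(SHUTDOWN_MARKER in nxt for nxt in following):
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


prefix = "15:16:49 gitlab-webservice-default-5b548f49b9-48qzd webservice unknown "
sample = [
    prefix + "- Worker 18 (PID: 216) booted in 0.13s, phase: 0",
    prefix + "=== puma shutdown: 20:16:49 +0000 ===",
    prefix + "- Goodbye!",
]
print(find_boot_then_shutdown(sample))  # [(18, 216)]
```

Note that this only flags adjacency in the log stream. With several pods interleaved in one stream, the boot and the shutdown may belong to different processes, so grouping lines by pod name first would be a sensible refinement.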
The change in !108112 (merged) also caught our attention. With it enabled by default, we saw "memory limit exceeded" in the logs even before any slowness occurred. We tried disabling it via an ops feature flag. The error message went away, but the slowness remained.
In our self-hosted k8s environment, we have enough memory allocated for the webservice deployment, and we have tried increasing the number of pods while reducing the number of puma workers per pod.
Has anyone else seen similar symptoms on 15.7 or later versions?