We run self-hosted GitLab on Kubernetes, deployed with the official GitLab Helm chart.
The problem is that when the webservice deployment scales down, we observe 502 errors on the ingress controller (Traefik).
We have the blackout period and terminationGracePeriodSeconds configured, but this doesn't help.
Here is what we have found so far.
The webservice pod has two containers: workhorse and the puma webservice.
When we or the HPA delete a webservice pod, the puma container correctly waits for the blackout period, but the workhorse container exits immediately after it receives SIGTERM from Kubernetes.
Since the ingress points to workhorse on port 8181, requests that Traefik still routes to the terminating pod fail once workhorse has exited, so the root cause of the 502 errors appears to be the workhorse container.
As an experiment, we added a preStop hook with sleep 60 directly to the workhorse container in the webservice deployment manifest, and the 502 errors went away.
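
For reference, the manual change looked roughly like the fragment below. This is only a sketch of the relevant part of the webservice Deployment, not the full manifest: the container name `gitlab-workhorse`, the presence of `/bin/sh` in the image, and the 90-second grace period are assumptions for illustration and may differ in your release.

```yaml
# Fragment of the webservice Deployment (other fields and containers omitted).
spec:
  template:
    spec:
      # Assumed value: the preStop sleep counts against the grace period,
      # so it has to be longer than the sleep below.
      terminationGracePeriodSeconds: 90
      containers:
        - name: gitlab-workhorse   # assumed container name
          lifecycle:
            preStop:
              exec:
                # Delay SIGTERM so the ingress can stop routing to this pod first.
                # Assumes the image provides /bin/sh and sleep.
                command: ["/bin/sh", "-c", "sleep 60"]
```

Kubernetes runs the preStop hook before sending SIGTERM to the container, so workhorse keeps serving on port 8181 for the duration of the sleep while the pod is being removed from the endpoints.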
The issue is that the Helm chart, for whatever reason, does not seem to provide a way to specify container lifecycle hooks.
Please advise whether we should ask the GitLab Helm chart maintainers to add an option to configure lifecycle hooks for the workhorse container.