Gitaly crashloop on install

Problem to solve

When installing GitLab via the Helm chart into my new Kubernetes cluster, I got everything to run except Gitaly. It fails in a crash loop, but there is no error in the Gitaly log and no event in the Kubernetes event list that might explain what the problem is. I am completely blocked here. I have made sure that there are entries for my GitLab, registry, and MinIO hosts not only in my local DNS but also in the cluster's CoreDNS. I am using an NFS-based PV provisioner; Gitaly creates a PVC, a PV is spawned to match it, and the PV is successfully bound to the PVC, so I don't know what else might be missing to make Gitaly happy.
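For reference, this is roughly how I confirmed the binding (the PVC name comes from my install's StatefulSet and may differ in yours):

$ kubectl get pvc,pv -n gitlab
$ kubectl describe pvc repo-data-gitlab-gitaly-0 -n gitlab

Both showed the claim Bound to its PV, with no warning events on the PVC.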

I see an earlier version of this question from 2018 that never got any answers. Surely people are running GitLab in a Kubernetes cluster with Gitaly - how did you get it working?

Steps to reproduce

  1. Install GitLab via the Helm chart (roughly as sketched below).
  2. Gitaly enters a crash loop.
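
For completeness, the install was along these lines (release and namespace names are from my setup; exact flags may differ in yours):

$ helm repo add gitlab https://charts.gitlab.io/
$ helm repo update
$ helm upgrade --install gitlab gitlab/gitlab -n gitlab --create-namespace -f values-override.yaml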

Configuration

My installation is configured with my own PostgreSQL and cert-manager, so I have them turned off in the values file (a sketch of the relevant overrides is below). I am using an NFS provisioner running on my TrueNAS server. I'm able to log into the GitLab instance and create a user and a group, so I'm sure the PostgreSQL setup is correct, and I am getting certs (based on my own self-signed CA) as expected, so cert-manager is working correctly as well.
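A minimal sketch of the relevant part of my values override (the hostname and secret name here are placeholders, not literal values from my cluster):

certmanager:
  install: false
postgresql:
  install: false
global:
  psql:
    host: postgres.example.internal    # placeholder for the external PostgreSQL host
    password:
      secret: gitlab-postgres-password # placeholder secret holding the DB password
      key: password
  ingress:
    configureCertmanager: false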

Versions

  • Self-managed
  • GitLab: Community Edition v17.5.1

I have checked both the forums here and elsewhere online and haven't seen any discussion of this issue.

Here is the log from the Gitaly pod:

$ k logs -f gitlab-gitaly-0  -n gitlab
Defaulted container "gitaly" out of: gitaly, certificates (init), configure (init)
Begin parsing .tpl templates from /etc/gitaly/templates
Writing /etc/gitaly/config.toml
Copying other config files found in /etc/gitaly/templates to /etc/gitaly
Starting Gitaly
{"component": "gitaly","subcomponent":"gitaly","level":"info","msg":"maxprocs: Updating GOMAXPROCS=1: determined from CPU quota","pid":16,"time":"2024-11-02T21:07:34.993Z"}
{"component": "gitaly","subcomponent":"gitaly","latencies":[0.001,0.005,0.025,0.1,0.5,1,10,30,60,300,1500],"level":"info","msg":"grpc prometheus histograms enabled","pid":16,"time":"2024-11-02T21:07:34.993Z"}
{"component": "gitaly","subcomponent":"gitaly","level":"info","msg":"Starting Gitaly","pid":16,"time":"2024-11-02T21:07:34.993Z","version":"17.5.1"}
{"component": "gitaly","subcomponent":"gitaly","duration_ms":0,"level":"info","msg":"finished initializing cgroups","pid":16,"time":"2024-11-02T21:07:34.993Z"}
{"component": "gitaly","subcomponent":"gitaly","duration_ms":189,"level":"info","msg":"finished unpacking auxiliary binaries","pid":16,"time":"2024-11-02T21:07:35.183Z"}
{"component": "gitaly","subcomponent":"gitaly","duration_ms":0,"level":"info","msg":"finished initializing bootstrap","pid":16,"time":"2024-11-02T21:07:35.183Z"}
{"component": "gitaly","subcomponent":"gitaly","duration_ms":0,"level":"info","msg":"finished initializing command factory","pid":16,"time":"2024-11-02T21:07:35.183Z"}
{"component": "gitaly","subcomponent":"gitaly","binary_path":"/tmp/gitaly-1036465299/git-exec-655555590.d/git","level":"info","msg":"using Git binary","pid":16,"time":"2024-11-02T21:07:35.183Z"}
{"component": "gitaly","subcomponent":"gitaly","duration_ms":1,"level":"info","msg":"finished detecting git version","pid":16,"time":"2024-11-02T21:07:35.185Z"}
{"component": "gitaly","subcomponent":"gitaly","level":"info","msg":"using Git version","pid":16,"time":"2024-11-02T21:07:35.185Z","version":"2.46.2"}
{"component": "gitaly","subcomponent":"gitaly","level":"info","msg":"clearing disk cache object folder","pid":16,"storage":"default","time":"2024-11-02T21:07:35.185Z"}

And if I kill the Gitaly pod and let it recreate, it goes into a crash loop within 2 seconds. Here is the event list:

gitlab      15s         Normal    SuccessfulCreate   statefulset/gitlab-gitaly           create Pod gitlab-gitaly-0 in StatefulSet gitlab-gitaly successful
gitlab      15s         Normal    NoPods             poddisruptionbudget/gitlab-gitaly   No matching pods found
gitlab      14s         Normal    Started            pod/gitlab-gitaly-0                 Started container certificates
gitlab      14s         Normal    Created            pod/gitlab-gitaly-0                 Created container certificates
gitlab      14s         Normal    Pulled             pod/gitlab-gitaly-0                 Container image "registry.gitlab.com/gitlab-org/build/cng/certificates:v17.5.1" already present on machine
gitlab      12s         Normal    Pulled             pod/gitlab-gitaly-0                 Container image "registry.gitlab.com/gitlab-org/build/cng/gitlab-base:v17.5.1" already present on machine
gitlab      12s         Normal    Created            pod/gitlab-gitaly-0                 Created container configure
gitlab      12s         Normal    Started            pod/gitlab-gitaly-0                 Started container configure
gitlab      10s         Normal    Pulled             pod/gitlab-gitaly-0                 Container image "registry.gitlab.com/gitlab-org/build/cng/gitaly:v17.5.1" already present on machine
gitlab      10s         Normal    Created            pod/gitlab-gitaly-0                 Created container gitaly
gitlab      10s         Normal    Started            pod/gitlab-gitaly-0                 Started container gitaly
gitlab      4s          Warning   BackOff            pod/gitlab-gitaly-0                 Back-off restarting failed container gitaly in pod gitlab-gitaly-0_gitlab(825a05d5-1b69-4f69-a9a7-7e507cb126e8)

I found the solution. In this case, the problem was PV disk permissions. I am using an NFS server, and the NFS dataset I'm using is owned by nobody:nogroup, i.e. UID/GID 65534:65534. I then had to set gitlab.gitaly.securityContext.fsGroup = 65534. I did it in a values override file (see below), but you could pass it on the command line as well.
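Roughly, in the values override file:

gitlab:
  gitaly:
    securityContext:
      fsGroup: 65534

or equivalently on the command line (the release name gitlab is from my install):

$ helm upgrade gitlab gitlab/gitlab -n gitlab --reuse-values \
    --set gitlab.gitaly.securityContext.fsGroup=65534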

I also set "Maproot User" = root and "Maproot Group" = wheel in the NFS share settings on the TrueNAS side.
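
After the change, Gitaly came up cleanly. To sanity-check the ownership the pod actually sees, you can list the repository mount (this path is the chart's default storage location; adjust if yours differs):

$ kubectl exec -n gitlab gitlab-gitaly-0 -c gitaly -- ls -ln /home/git/repositories

With fsGroup applied, the directory should show group 65534 and be group-writable.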