New installation, GitLab-CE, dies after a couple days with 502 errors

New to GitLab, and I can’t believe nobody else is having this problem but I’m not finding it anywhere. I installed GitLab v13.6.3 on RHEL 8 using Jeff Geerling’s Ansible playbook (which in turn uses the Omnibus package). It works fine for a couple of days, then goes into permanent 502 mode. Restarting GitLab doesn’t clear it, but rebooting the whole box does. What I’ve found online refers to ‘unicorn’, which doesn’t seem to be running or even part of CE. Help?

What CPU/RAM specs does the machine have? Puma is used instead of Unicorn in the 13.x versions.

Sorry; it’s an RHEL8 virtual machine under VMware ESXi 6.7, with 8 vCPU (increased from the original 4) and 8GB RAM, which ought to be enough for 1000 users. Right now there is only one user, me.

Yep that’s fine. Just wanted to check that in case the specs were low.
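One thing worth checking when it next dies, though: a 502 that survives gitlab-ctl restart and only clears on a reboot can be a sign the box is running out of memory and the kernel OOM killer is stepping in. Assuming the standard RHEL tools are available, a quick look at the time of the failure:

# current memory/swap headroom
free -h

# has the OOM killer fired since boot?
dmesg -T | grep -iE 'out of memory|killed process'

If the OOM killer has been killing puma or sidekiq workers, that would fit the pattern.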

Firstly, I would suggest running this:

gitlab-ctl reconfigure

This is usually a good thing to do when problems are occurring. Then when you’ve done this, do a

gitlab-ctl restart

and then see what happens. If it stabilises after this, great; if not, then we’d need to take a look at what is in the logs (see the tail commands below). Also, when it stops responding with the 502 error, do a:

gitlab-ctl status

so we can see whether all the processes are running. That way we can tell whether something has actually failed, or whether everything is up and one process has simply blocked for some reason.
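If you want to watch things live while you reproduce the 502, gitlab-ctl can also tail individual service logs, for example:

gitlab-ctl tail nginx
gitlab-ctl tail puma

(puma is the app server in 13.x; the nginx log should at least show the upstream error behind the 502).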

One other thing, are you using selinux? Try:

getenforce

if it shows enforcing, then we can try a:

setenforce 0

which will temporarily put SELinux in permissive mode, where violations are logged but nothing is actually blocked. That way we can rule out SELinux as the source of the problem.
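Even in permissive mode SELinux still logs what it would have blocked, so when the 502 happens you can check for denials. Assuming the audit tools are installed:

ausearch -m avc -ts recent

or, if ausearch isn’t there:

grep -i denied /var/log/audit/audit.log

If nothing shows up around the time of the failure, SELinux is probably not the culprit.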

Okay. I’ll have to wait for it to go back into 502, of course, so it’ll probably take a couple days. We are running SELinux, which is required (we’re a government site) so I can turn it to Permissive temporarily but it can’t stay that way.

I’ll post results when I have any. Thanks!

OK, one more thing to check with SELinux: make sure you have these packages installed (see this thread: Upgrade to 13.5.1 failed):

libsemanage-static libsemanage-devel policycoreutils

Hmm. No such package as libsemanage-static or libsemanage-devel. This is on RHEL 8. The policycoreutils package is there. I also have plain libsemanage and python3-libsemanage, so it could be that those packages have just been renamed.
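For reference, I checked with the usual RHEL 8 package queries:

rpm -qa | grep -E 'libsemanage|policycoreutils'
dnf list available 'libsemanage*'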

OK, let’s see what happens when the problem repeats. We can go through the steps I posted, and also see what happens with SELinux temporarily in permissive mode, to rule it out as a potential cause of the problem.

Well, that didn’t take long. It’s dead again today.

[root@HECATE gitlab]# gitlab-ctl status
down: alertmanager: 1s, normally up, want up; run: log: (pid 3257236) 775s
run: gitaly: (pid 3270643) 29s; run: log: (pid 3255981) 808s
run: gitlab-exporter: (pid 3270663) 29s; run: log: (pid 3256974) 782s
run: gitlab-workhorse: (pid 3270667) 29s; run: log: (pid 3256697) 792s
run: grafana: (pid 3270696) 28s; run: log: (pid 3257374) 770s
run: logrotate: (pid 3270709) 28s; run: log: (pid 3256854) 787s
run: nginx: (pid 3270724) 27s; run: log: (pid 3256785) 789s
run: node-exporter: (pid 3270741) 27s; run: log: (pid 3256882) 785s
run: postgres-exporter: (pid 3270752) 27s; run: log: (pid 3257349) 772s
run: postgresql: (pid 3270771) 26s; run: log: (pid 3256185) 806s
run: prometheus: (pid 3270781) 26s; run: log: (pid 3257165) 777s
run: puma: (pid 3270991) 9s; run: log: (pid 3256340) 803s
run: redis: (pid 3270816) 25s; run: log: (pid 3255816) 811s
run: redis-exporter: (pid 3270821) 24s; run: log: (pid 3257132) 779s
run: sidekiq: (pid 3270880) 20s; run: log: (pid 3256542) 795s

So alertmanager is down but nothing else. Restarting didn’t bring alertmanager back up or allow me to get to GitLab. The 502, by the way, comes immediately, not after a delay. So it’s not trying something and timing out.

Setenforce 0 also doesn’t help, even with a restart after.

Rebooting the system will probably work, but it will also likely destroy any evidence, so I won’t do that yet. It’s not in production, so I can keep it down.

Oh – I also tried gitlab-ctl reconfigure and gitlab-ctl restart; no luck.

So, this is still an issue. The server pretty much dies daily (meaning it returns 502 immediately) and only a reboot – not even gitlab-ctl restart – will bring it back. Any thoughts on how to proceed with this?

Check all the log files under /var/log/gitlab for errors and hope that something there will explain the problem.
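If it helps, these are usually the quickest places to start, assuming the default Omnibus log layout under /var/log/gitlab (adjust the paths if yours differ):

# nginx error log - should say which upstream it couldn't reach when it returned the 502
less /var/log/gitlab/nginx/gitlab_error.log

# follow the app server and workhorse logs live while you hit the site
gitlab-ctl tail puma
gitlab-ctl tail gitlab-workhorse

Since the 502 comes back immediately rather than after a timeout, the nginx error log entry for a failed request (for example a ‘connect() failed’ to the workhorse socket) should at least narrow down which piece nginx can’t talk to.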