Random 500 internal server errors

We have a self-hosted, single-server GitLab instance.

Our developers are seeing frequent (although intermittent) 500 (Internal Server Error) responses. These errors often occur when accessing the Docker repository, and they are often, but not always, associated with

“GET /jwt/auth?scope=repository”

requests in the nginx access logs.

Can someone help me start to unpick this?

Where should I start looking?
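One way to start is to quantify the 500s in the nginx access log: which paths they hit and when. Below is a minimal sketch, assuming the log lines follow nginx's "combined"-style layout (`[time] "METHOD PATH PROTOCOL" STATUS ...`) and that the Omnibus access log lives at the usual `/var/log/gitlab/nginx/gitlab_access.log`; both are assumptions to check on your server. The function name `count_500s` is hypothetical. The collected timestamps can then be matched against other logs such as `/var/log/gitlab/registry/current`.

```python
import re
from collections import Counter

# Assumed line layout (nginx "combined"-style):
# ... [time] "METHOD PATH PROTOCOL" STATUS ...
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def count_500s(lines, path_prefix="/jwt/auth"):
    """Count HTTP 500 responses per path and collect timestamps for path_prefix.

    Returns (Counter mapping path -> number of 500s, list of timestamps of
    500s whose path starts with path_prefix). Query strings are dropped.
    """
    per_path = Counter()
    times = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None or m.group("status") != "500":
            continue  # unparsable line, or not a 500
        path = m.group("path").split("?", 1)[0]
        per_path[path] += 1
        if path.startswith(path_prefix):
            times.append(m.group("time"))
    return per_path, times
```

For example, opening the access log and passing the file object to `count_500s` gives the per-path counts; `per_path.most_common(10)` then shows whether the 500s are concentrated on `/jwt/auth` or spread across other endpoints.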

What are the server's specs? How much CPU/RAM is allocated to the GitLab server?

CPU: 8 cores
RAM: 28 GB

vmstat shows 0 swap-in and 0 swap-out, with the CPU 96% idle, 0% I/O wait and 0% stolen.
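Those figures come from vmstat's `si`, `so`, `id`, `wa` and `st` columns. The first data row vmstat prints is an average since boot, so running something like `vmstat 5 3` and reading the last row gives a more current picture. The sketch below reads that last row by column name; the function names and the `wa`/`st` thresholds are illustrative assumptions, not established limits.

```python
def parse_vmstat(text):
    """Map each column name in vmstat's 'r b ...' header row to the last sample's value."""
    rows = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = next(cols for cols in rows if cols[:2] == ["r", "b"])
    sample = rows[-1]  # last row = most recent interval
    return dict(zip(header, (int(v) for v in sample)))


def looks_resource_bound(stats, wa_limit=10, st_limit=5):
    """Heuristic: any swapping, or I/O wait / stolen time above the (arbitrary) limits."""
    return (
        stats["si"] > 0
        or stats["so"] > 0
        or stats["wa"] > wa_limit
        or stats["st"] > st_limit
    )
```

Zipping by header name rather than by position keeps the parser working if a vmstat version adds extra columns.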

OK, so the CPU/RAM specs look fine.

Since you mentioned the registry/Docker repository, does anything show up in /var/log/gitlab/registry/current when these errors occur?
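To compare that file with the times of the 500s, it helps to pull out only the error-level entries. This is a minimal sketch, assuming each line is a runit/svlogd timestamp followed by logfmt-style `key=value` / `key="quoted value"` pairs with a `level` field; the exact format depends on the registry's log configuration, so check a few real lines first. Function names are hypothetical.

```python
import re

# key=value or key="quoted value" pairs, as in logfmt-style text logs.
PAIR_RE = re.compile(r'([\w.]+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return a dict of key -> value; surrounding quotes are stripped."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def registry_errors(lines, levels=("error", "fatal", "panic")):
    """Yield (leading timestamp token, fields) for entries at one of the given levels."""
    for line in lines:
        prefix = line.split(" ", 1)[0]  # assumed svlogd timestamp
        fields = parse_logfmt(line)
        if fields.get("level") in levels:
            yield prefix, fields
```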

You can also check:

gitlab-ctl status

to see whether any services have restarted recently.
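The check can also be scripted by parsing each `gitlab-ctl status` line (`run: <service>: (pid N) <seconds>s; ...`) and flagging services that are not in the `run` state or whose uptime is shorter than a chosen threshold. This is a sketch under those format assumptions; the one-hour default and the function name are arbitrary.

```python
import re

# Matches the main-process part of a line, e.g.
# "run: puma: (pid 1428) 486771s; run: log: ..." or "down: sidekiq: 3s, normally up; ..."
STATUS_RE = re.compile(
    r"^(?P<state>\w+): (?P<name>[\w-]+): (?:\(pid (?P<pid>\d+)\) )?(?P<secs>\d+)s"
)


def suspicious_services(status_text, max_uptime_s=3600):
    """Return {service: (state, seconds)} for services not running or younger than max_uptime_s."""
    flagged = {}
    for line in status_text.splitlines():
        m = STATUS_RE.match(line.strip())
        if m is None:
            continue  # not a service line
        secs = int(m.group("secs"))
        if m.group("state") != "run" or secs < max_uptime_s:
            flagged[m.group("name")] = (m.group("state"), secs)
    return flagged
```

Applied to the status output posted in this thread, only `logrotate` (734 s) would be flagged; every other service shows 486771 s, about 5.6 days.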

It doesn’t look like they have:

[root@devgitlab3 nginx]# gitlab-ctl status
run: alertmanager: (pid 1418) 486771s; run: log: (pid 1404) 486771s
run: gitaly: (pid 1399) 486771s; run: log: (pid 1397) 486771s
run: gitlab-exporter: (pid 1431) 486771s; run: log: (pid 1422) 486771s
run: gitlab-kas: (pid 1571) 486770s; run: log: (pid 1423) 486771s
run: gitlab-pages: (pid 1429) 486771s; run: log: (pid 1415) 486771s
run: gitlab-workhorse: (pid 1407) 486771s; run: log: (pid 1394) 486771s
run: logrotate: (pid 23576) 734s; run: log: (pid 1392) 486771s
run: mattermost: (pid 1414) 486771s; run: log: (pid 1411) 486771s
run: nginx: (pid 1417) 486771s; run: log: (pid 1403) 486771s
run: node-exporter: (pid 1420) 486771s; run: log: (pid 1400) 486771s
run: postgres-exporter: (pid 1430) 486771s; run: log: (pid 1421) 486771s
run: postgresql: (pid 1410) 486771s; run: log: (pid 1396) 486771s
run: prometheus: (pid 1412) 486771s; run: log: (pid 1395) 486771s
run: puma: (pid 1428) 486771s; run: log: (pid 1426) 486771s
run: redis: (pid 1398) 486771s; run: log: (pid 1391) 486771s
run: redis-exporter: (pid 1416) 486771s; run: log: (pid 1401) 486771s
run: registry: (pid 1409) 486771s; run: log: (pid 1408) 486771s
run: sidekiq: (pid 1413) 486771s; run: log: (pid 1402) 486771s

The server was rebooted 5 days ago, and the errors are still occurring.