First of all, I’ve been running this GitLab instance for many years now, and Sidekiq has been the number one problem for as long as I can remember.
But if I just look at the last six months, this is what happens.
I notice repos are not being updated, so I check the background jobs. Nowadays there’s a dashboard for this, but when I started using GitLab you had to check the process table.
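For reference, this is roughly how I check (the dashboard path and the gitlab-ctl service name are the stock Omnibus ones, so they may differ on other setups):

```bash
# Background jobs dashboard in the admin area:
#   https://<gitlab-host>/admin/background_jobs
# The old-school check via the process table:
ps -ef | grep '[s]idekiq'
# Or ask the Omnibus supervisor directly:
sudo gitlab-ctl status sidekiq
```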
Currently Sidekiq reports `sidekiq 5.0.5 gitlab-rails [10 of 25 busy]`.
And in the Busy view I see the following one-week-old job.
I know from previous incidents that trying to make it Stop does not help, so for troubleshooting purposes I send a TTIN signal to Sidekiq and save the log (commands below).
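Roughly what I run to capture that trace, assuming the default Omnibus process name and log location (both may differ on other installs):

```bash
# Find the bundled Sidekiq worker's PID
SIDEKIQ_PID=$(pgrep -f 'sidekiq.*gitlab-rails' | head -n1)

# TTIN makes Sidekiq dump a backtrace of every thread into its log
sudo kill -TTIN "$SIDEKIQ_PID"

# Save the log before it rotates (default Omnibus location)
sudo cp /var/log/gitlab/sidekiq/current "/tmp/sidekiq-ttin-$(date +%F).log"
```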
https://paste.fedoraproject.org/paste/2jsHS5oxnjmtw1Oq-cpIvw (it’s live until August 8th at this link)
I have no idea what it all means, but I see a lot of references to connections and blocking, so I tried restarting both Redis and Postgres. It did not help.
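For the record, the restarts were along these lines (Redis is the bundled one; the Postgres service name depends on how it was installed on its VM):

```bash
# Bundled Redis on the GitLab box
sudo gitlab-ctl restart redis

# On the separate Postgres VM (service name may be e.g. postgresql-9.6)
sudo systemctl restart postgresql
```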
Eventually, as with all the other incidents over the last six months, I’m forced to kill Sidekiq with a KILL signal. The bundled GitLab supervisor restarts it, and at that point jobs tend to go through and repos are updated again.
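The "fix", such as it is, boils down to this (again assuming the stock Omnibus layout; runit respawns the worker on its own, so the explicit restart is mostly for completeness):

```bash
# Stop / SIGTERM never works during these incidents, so SIGKILL it is
sudo pkill -KILL -f 'sidekiq.*gitlab-rails'

# The Omnibus runit supervisor brings Sidekiq back up automatically;
# the explicit equivalent would be:
sudo gitlab-ctl restart sidekiq
```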
And this repeats at least twice a month, depending on how much I use GitLab. I seem to discover the issue every time I create new repos.
The machine I run GitLab on has plenty of resources. It’s a vCenter 6 VM with no swap used, 255 MB of RAM free, 2.7 GB of RAM cached, and plenty of free disk on the GitLab volume. It has two CPU cores with a load average of 0.20, 0.14, 0.13 just before I forcibly kill Sidekiq.
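Those numbers come from the usual suspects, captured just before the kill (the volume path is the Omnibus default on my install):

```bash
free -m                  # swap unused, ~255 MB free, ~2.7 GB cached
uptime                   # load average: 0.20, 0.14, 0.13 on two cores
df -h /var/opt/gitlab    # the GitLab data volume, plenty of space left
```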
It runs CentOS 7.
I’m using the CE package, but I’ve separated out nginx and Postgres. Nginx is on the same system but is also used for other sites.
Postgres is on a separate VM in the same subnet and the same vCenter cluster.