Triple question about sidekiq background jobs

dunczzz · January 16, 2020, 1:54pm

Hi all,

We’re using Gitlab Enterprise Edition version 11.10.4 . Our server setup is 1x master and 2x worker nodes (physical dedicated servers each).

Our master server runs pretty much everything, including a standalone postgres instance (e.g. installed directly on to the server, not part of the Gitlab setup) and our worker servers almost exclusively run sidekiq processes.

The setup is as follows (I wasn’t involved with this, so not sure why):

master -> sidekiq['concurrency'] = 9, no cluster/queue groups enabled
worker1 -> sidekiq['concurrency'] = 50, no cluster/queue groups enabled
worker2 -> sidekiq['concurrency'] = 50, queue groups enabled for various pipeline_processing: queues (0/2 on about ~5 different pipeline queues)

My three questions are as follows.

On the worker boxes, the BuildQueueWorker jobs take ~190 seconds each. On our master server, these jobs take <20 seconds.

What could be causing this slowness? Is it because PostGres is on localhost on the master server perhaps?

Occasionally we’re seeing PostReceive and UpdateMergeRequest jobs becoming stuck (on all 3 servers). If I then investigate PostGres locks, there are loads of blocked locks for queries involving the relname of internal_ids. If I kill these locks (SELECT the pid from these queries, then kill them with xargs), the jobs start flowing again.

When this happens, all other jobs stop processing - regardless of whether they’re on worker or master.

Is there a way to stop this from happening?

The jobs that we see most often are BuildHooksWorker, ArchiveTraceWorker and ExpireJobCacheWorker (all quick - 0.1xx seconds), then the aforementioned UpdateMergeRequest, PostReceive (both very slow, as mentioned) and finally PipelineUpdateWorker and BuildQueueWorker (both reasonably slow, the latter as per my first question).

Given that we’ve got 3 separate workers, is there a way I should be tuning our setup for the purpose of improving the burndown on these jobs? Would it make sense for example to have 1 of each of those “categories” of queue on each server? E.g. BuildHooksWorker -> master, ArchiveTraceWorker -> worker1 etc?

Many thanks