We’re using GitLab Enterprise Edition version 11.10.4. Our server setup is 1x master and 2x worker nodes (each a physical dedicated server). The master server runs pretty much everything, including a standalone Postgres instance (i.e. installed directly onto the server, not part of the GitLab setup), and our worker servers almost exclusively run Sidekiq.
The setup is as follows (I wasn’t involved with this, so I’m not sure why):

- 9, no cluster/queue groups enabled
- 50, no cluster/queue groups enabled
- 50, queue groups enabled for various pipeline_processing: queues (0/2 on ~5 different pipeline queues)
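For reference, in an Omnibus install I believe this kind of split is driven from `/etc/gitlab/gitlab.rb` via sidekiq-cluster. A minimal sketch of what I understand that configuration to look like (the queue names here are illustrative placeholders, not our real config — please correct me if the keys are wrong for 11.x):

```ruby
# /etc/gitlab/gitlab.rb -- illustrative sidekiq-cluster sketch, not our
# actual configuration. Each entry in queue_groups becomes one extra
# Sidekiq process dedicated to that queue group.
sidekiq_cluster['enable'] = true
sidekiq_cluster['queue_groups'] = [
  'pipeline_processing:build_queue',      # hypothetical queue name
  'pipeline_processing:pipeline_default'  # hypothetical queue name
]
```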
My three questions are as follows.

1. On our worker boxes, the BuildQueueWorker jobs take ~190 seconds each. On our master server, these jobs take <20 seconds. What could be causing this slowness? Is it because Postgres is on localhost on the master server, perhaps?
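To frame that localhost theory: a rough check I could run (a sketch — the host and database names below are placeholders for our real values) is to time a batch of trivial round-trips to Postgres from the master and then from a worker, and compare:

```shell
# Time 100 trivial round-trips to Postgres. Run once on the master and
# once on a worker box; PGHOST/PGDATABASE are placeholder values.
export PGHOST=postgres.example.internal PGDATABASE=gitlabhq_production
time (
  for i in $(seq 1 100); do
    psql -Atc 'SELECT 1;' >/dev/null
  done
)
# A large gap between the two runs would point at per-query network
# latency rather than the queries themselves being slow.
```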
2. Occasionally we’re seeing UpdateMergeRequest jobs becoming stuck (on all 3 servers). If I then investigate Postgres locks, there are loads of blocked locks for queries involving the internal_ids table. If I kill these locks (taking the pid from these queries, then killing them with xargs), the jobs start flowing again. When this happens, all other jobs stop processing, regardless of whether they’re on a worker or the master. Is there a way to stop this from happening?
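For context, this is roughly the shape of what we do to clear it today (a sketch rather than our exact commands — the database name is an assumption, and `pg_blocking_pids()` requires PostgreSQL 9.6+):

```shell
# Inspect blocked queries and terminate their blockers server-side
# (instead of grabbing pids and killing them from the shell with xargs).
sudo -u postgres psql -d gitlabhq_production <<'SQL'
-- Show blocked queries and the pids blocking them
SELECT a.pid, a.state, pg_blocking_pids(a.pid) AS blocked_by,
       left(a.query, 60) AS query
FROM pg_stat_activity AS a
WHERE cardinality(pg_blocking_pids(a.pid)) > 0;

-- Terminate every backend that is blocking someone else
SELECT pg_terminate_backend(blocker)
FROM (SELECT DISTINCT unnest(pg_blocking_pids(a.pid)) AS blocker
      FROM pg_stat_activity AS a) AS b;
SQL
```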
3. The jobs that we see most often are ExpireJobCacheWorker (all quick, 0.1xx seconds), then the aforementioned UpdateMergeRequest and PostReceive (both very slow, as mentioned), and finally BuildQueueWorker (reasonably slow, as per my first question). Given that we’ve got 3 separate workers, is there a way I should be tuning our setup to improve the burndown of these jobs? Would it make sense, for example, to have 1 of each of those “categories” of queue on each server? E.g.