Sidekiq `MergeRequests::ProcessScheduledMergeWorker` jobs stuck at 100 % CPU on dedicated queue (Kubernetes deployment)

Hi everyone,

We first noticed that GitLab was no longer starting pipelines, neither on push nor when triggered manually.
Initially, pipelines started with a huge delay; later they stopped starting altogether.

When we analyzed the system, we found that Sidekiq was hanging exclusively on jobs of type MergeRequests::ProcessScheduledMergeWorker, which were consuming 100% CPU and blocking other jobs from being processed.

To mitigate this, we created a dedicated queue and routing rule for that worker and deployed Sidekiq pods that serve only the scheduled-merge queue.
Since then, GitLab itself runs normally again and pipelines start as expected, but those specific jobs keep accumulating in the dedicated queue, and the Sidekiq processes working on them remain stuck, each fully saturating a single CPU core.


Deployment

We’re running GitLab 18.1.1 in Kubernetes, using the official GitLab Helm chart.

Our Sidekiq configuration includes horizontally scalable pods dedicated to the scheduled-merge queue. The scheduled-merge shard is configured with concurrency: 1 (per pod) and the chart autoscaling allows up to 4 replicas:

sidekiq:
  resources:
    requests:
      cpu: "900m"
      memory: "2Gi"
    limits:
      memory: "4Gi"
  minReplicas: 1
  maxReplicas: 4
  concurrency: 20

  pods:
    - name: immediate
      queues: high-urgency,network-intensive,mailers
      concurrency: 10
      replicas: 1
    - name: catchall
      queues: default
      concurrency: 5
      replicas: 1
    - name: scheduled-merge
      queues: scheduled-merge
      concurrency: 1
      replicas: 1

Routing rules:

sidekiq:
  routingRules:
    - ['resource_boundary!=cpu&urgency=high', 'high-urgency']
    - ['has_external_dependencies=true|feature_category=hooks|tags=network', 'network-intensive']
    - ['worker_name=MergeRequests::ProcessScheduledMergeWorker', 'scheduled-merge']
    - ['*', 'default']

So we have 4 pods handling only that queue, each with concurrency = 1.
After setting up the routing and this dedicated queue, GitLab works normally again and pipelines start as expected; however, the scheduled-merge jobs still accumulate indefinitely, and those 4 Sidekiq pods remain stuck at 100% CPU.
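
For reference, the backlog could be quantified from the toolbox pod using the plain Sidekiq API (a sketch; Sidekiq::Queue#size and #latency are standard Sidekiq calls, and 'scheduled-merge' is the queue name from our routing rule):

gitlab-rails runner "require 'sidekiq/api'; q = Sidekiq::Queue.new('scheduled-merge'); puts q.size; puts q.latency"

Queue#latency reports how long the oldest job has been waiting (in seconds), so it should keep growing for as long as nothing is dequeued.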


What I observed

Inside one of the scheduled-merge Sidekiq pods:

kubectl exec -it -n cicd pods/gitlab-cicd-sidekiq-scheduled-merge-... -c sidekiq -- /bin/sh
$ ps -fauxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
git           20  0.0  0.1 182632 47744 ?        Sl   Oct15   0:03 ruby /srv/gitlab/bin/sidekiq-cluster -r /srv/gitlab -e production -c 1 -t 25 scheduled-merge
git           26 99.6  3.2 1467524 793412 ?      Sl   Oct15 1415:18  \_ sidekiq 7.3.9 queues:scheduled-merge [1 of 1 busy]

Each pod shows the same pattern: the sidekiq process handling the scheduled-merge queue constantly uses ~100% CPU, even with concurrency: 1, and jobs remain stuck.
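
To see which thread is actually burning the CPU, a per-thread view can be taken inside the pod (assuming a procps-style top is available in the sidekiq container; PID 26 is the worker process from the ps output above):

kubectl exec -it -n cicd pods/gitlab-cicd-sidekiq-scheduled-merge-... -c sidekiq -- top -H -p 26

With concurrency 1 we would expect exactly one processor thread pinned near 100% CPU while the heartbeat and scheduler-poller threads stay mostly idle.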

Recent Sidekiq logs from the pod also include repeated Redis latency warnings:

kubectl logs -n cicd pods/... -c sidekiq --since=5m
{"severity":"WARN","time":"2025-10-16T08:50:06.306Z","message":"Your Redis network connection is performing extremely poorly.\nLast RTT readings were [99950, 100054, 99928, 99894, 99969], ideally these should be < 1000.\nEnsure Redis is running in the same AZ or datacenter as Sidekiq.\nIf these values are close to 100,000, that means your Sidekiq process may be\nCPU-saturated; reduce your concurrency and/or see https://github.com/sidekiq/sidekiq/discussions/5039"}

And a later sample:

{"severity":"WARN","time":"2025-10-16T08:51:41.711Z","message":"Your Redis network connection is performing extremely poorly.\nLast RTT readings were [99955, 100745, 100303, 100026, 99881], ideally these should be < 1000."}

I ran kill -TTIN 20 on the Sidekiq process, as recommended for debugging. This produced thread backtrace logs showing many threads sleeping, along with Sidekiq internals related to dispatch, job scheduling, and connection pool waits.

{"severity":"WARN","time":"2025-10-16T08:55:26.080Z","message":"/srv/gitlab/vendor/gems/sidekiq/lib/sidekiq/cli.rb:206:in `backtrace' ... /srv/gitlab/vendor/gems/sidekiq/lib/sidekiq/processor.rb:131:in `dispatch' ... /srv/gitlab/vendor/gems/sidekiq/lib/sidekiq/processor.rb:86:in `process_one' ..."}
...
{"severity":"WARN","time":"2025-10-16T08:55:27.582Z","message":"/srv/gitlab/vendor/bundle/ruby/3.2.0/gems/connection_pool-2.5.3/lib/connection_pool/timed_stack.rb:77:in `sleep' ... /srv/gitlab/vendor/gems/sidekiq/lib/sidekiq/scheduled.rb:118:in `wait'"}

I also ran this diagnostic command from the toolbox pod:

gitlab-rails runner "require 'sidekiq/api'; Sidekiq::Workers.new.each {|proc, thread, work| puts work.inspect }"

It shows that all four pods each have one active job, all of the same class:

#<Sidekiq::Work ... "class"="MergeRequests::ProcessScheduledMergeWorker", "meta.caller_id"="Cronjob", "idempotency_key"="resque:gitlab:duplicate:scheduled-merge:a6d655d277495cf53e182e80822575dacc8a3389eea573549908a88faf373827" ... 

So the same job (identical idempotency_key) appears to be running on all four pods, each stuck “running” indefinitely.
This seems to confirm that the workers are hanging inside the scheduled-merge processing logic itself.
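
In case it helps others hitting the same thing: as a temporary mitigation while debugging, the accumulated jobs could be dropped from the dedicated queue via the Sidekiq API (a sketch; Queue#clear only removes enqueued jobs, it does not stop the four jobs already marked busy, which would need a pod restart, and since the worker is triggered by cron it will be enqueued again on the next run):

gitlab-rails runner "require 'sidekiq/api'; Sidekiq::Queue.new('scheduled-merge').clear"

This would only relieve queue pressure; it would not explain why the jobs hang.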


Questions:

  1. Is there any known issue or recent regression with MergeRequests::ProcessScheduledMergeWorker getting stuck?
  2. Any recommended way to debug these hanging jobs further?

Environment:

gitlab-rake gitlab:env:info
System information
System:
Current User:   git
Using RVM:      no
Ruby Version:   3.2.5
Gem Version:    3.6.9
Bundler Version:2.6.9
Rake Version:   13.0.6
Redis Version:  7.0.15
Sidekiq Version:7.3.9
Go Version:     unknown

GitLab information
Version:        18.1.1
Revision:       1587ac6c162
Directory:      /srv/gitlab
DB Adapter:     PostgreSQL
DB Version:     16.9
URL:            https://gitlab.intern.gipmbh.de
HTTP Clone URL: https://gitlab.intern.gipmbh.de/some-group/some-project.git
SSH Clone URL:  ssh://git@gitlab.intern.gipmbh.de:7999/some-group/some-project.git
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers: openid_connect

GitLab Shell
Version:        14.42.0
Repository storages:
- default:      tcp://gitaly-cicd-service:8075
GitLab Shell path:              /home/git/gitlab-shell

Gitaly
- default Address:      tcp://gitaly-cicd-service:8075
- default Version:      18.1.1
- default Git Version:  2.49.0.gl2