Background Migrations stuck after upgrade from 14.0.10 to 14.2.3


After upgrading from 14.0.10 to 14.2.3 via Omnibus, I ran into the issue described in the GitLab documentation on batched background migrations. I ran the gitlab-rake gitlab:background_migrations:finalize[CopyColumnUsingBackgroundMigrationJob,push_event_payloads,event_id,'[["event_id"]\, ["event_id_convert_to_bigint"]]'] command several times, and each time it finished with Done.

But the upgrade still failed again and again with the same message, although starting GitLab manually worked. So 14.2.3 is running now, but the following background migrations are queued and stuck at either 74% or 0% (see screenshot).

I read in some forum posts that they might take some time, but even after one day nothing has changed.

When I run sudo gitlab-rake db:migrate:status, all migrations are in the state up and none are down.
I also tried the following:

sudo gitlab-rails c
 Ruby:         ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
 GitLab:       14.2.3-ee (b5eea856eca) EE
 GitLab Shell: 13.19.1
 PostgreSQL:   12.7
Loading production environment (Rails
irb(main):001:0> scheduled_queue = Sidekiq::ScheduledSet.new
=> #<Sidekiq::ScheduledSet:0x00007fef49a69948 @name="schedule", @_size=3>
irb(main):002:0> pending_job_classes = scheduled_queue.select { |job| job["class"] == "BackgroundMigrationWorker" }.map { |job| job["args"].first }.uniq
=> []

This returns an empty list.

The command sudo gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining' takes some time and returns either 0 or 1.

Then I found the GitLab documentation on background migrations, which recommends running delete_queued_jobs('BackgroundMigrationClassName'), but trying this with the names from the web UI results in:

Traceback (most recent call last):
        2: from (irb):6
        1: from (irb):7:in `rescue in irb_binding'
NoMethodError (undefined method `delete_queued_jobs' for main:Object)

irb(main):008:0> delete_queued_jobs('BackgroundMigrationWorker')
Traceback (most recent call last):
        2: from (irb):6
        1: from (irb):7:in `rescue in irb_binding'
NoMethodError (undefined method `delete_queued_jobs' for main:Object)

I also pressed the pause and resume button several times in the UI, no change.

Is there any other way to get them to resume, reset, delete or fix this?


So it turns out that even if the state is up in sudo gitlab-rake db:migrate:status, it might be necessary to rerun some of those migrations.

I grepped in /opt/gitlab/embedded/service/gitlab-rails/db/post_migrate/ for the migrations matching the names of the queued but stuck jobs and ran sudo gitlab-rake db:migrate:redo VERSION=${i} for each of them. That slowly fixed the issue, and another sudo gitlab-ctl upgrade finished without issues.
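
For anyone who wants to repeat this, here is a rough sketch of the grep-and-redo loop. It assumes that migration file names start with the version timestamp followed by an underscore, and that the stuck job's name (or its table name) appears inside the migration file. The function names find_versions and redo_matching, the search pattern, and the DRY_RUN switch are made up for illustration; they are not part of GitLab. By default it only prints the commands, so the list can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, PATTERN and DRY_RUN are hypothetical.
# MIGRATION_DIR defaults to the directory mentioned in the post.
MIGRATION_DIR="${MIGRATION_DIR:-/opt/gitlab/embedded/service/gitlab-rails/db/post_migrate}"
DRY_RUN="${DRY_RUN:-1}"

# Print the version (leading timestamp of the file name) of every
# migration file in directory $1 whose content matches pattern $2.
find_versions() {
  grep -l -- "$2" "$1"/*.rb 2>/dev/null | while read -r file; do
    name="$(basename "$file")"
    echo "${name%%_*}"   # assumption: file name is <version>_<name>.rb
  done | sort -u
}

# Redo every matching migration. With DRY_RUN=1 (the default) the
# command is only printed, not executed.
redo_matching() {
  for i in $(find_versions "$1" "$2"); do
    if [ "$DRY_RUN" = "1" ]; then
      echo "sudo gitlab-rake db:migrate:redo VERSION=${i}"
    else
      sudo gitlab-rake db:migrate:redo VERSION="${i}"
    fi
  done
}
```

After sourcing the file, something like redo_matching "$MIGRATION_DIR" push_event_payloads prints the candidate commands; setting DRY_RUN=0 runs them. Check the printed list before running anything, since redo rolls a migration back and applies it again.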