Upgrading from Major Version 13 to 14, at 14.0.Z "Background Migrations" are stuck at 0% marked as Active

Similar issues and TL;DR

My issue is similar to “Background migrations does not start when upgrading from version 13.12.11 to version 14.0.12”; however, I am running FOSS. There was no reply to that issue.

My issue may also be similar to “Background Migrations stuck after upgrade from 14.0.10 to 14.2.3”, but the circumstances are slightly different. I haven’t tried their migrate:redo trick yet; I will do so after completing this post.

As a note, while writing this issue I discovered that I have a Sidekiq queue called "cronjob:database_batched_background_migration" but no corresponding cron job in Sidekiq… I wonder if that’s the problem. There is no button for me to add the cron job.

How to execute stuck background migrations?

Howdy Gitlab Forum goers. I seek your eternal wisdom.

I have inherited the astonishing technical debt of an aged Gitlab platform, that was running Debian 8 (Jessie) and Gitlab 12.5.X. Following the wonderful upgrade pathways and tooling provided by the documentation archives, I’ve been able to bring it into this decade; now running Debian 11 Bullseye and Gitlab 14-0-stable.

Previously, I have tried to upgrade following the recommended pathway:

  • Upgrade Path
  • 13.12.15 → 14.0.12 → 14.3.6

However, I run into problems running the db:migrate step of the upgrade when going to 14.3.6: it complains that there are unfinished background migrations, and there is no facility for me to resolve these. I believe this is because critical migrations are queued up in 14.0, 14.1 and 14.2 that the upgrade path recommended by the upgrade tool does not address.

The problem I have run into is that the upgrade pathway describes the next step as 14-3-stable, but elsewhere there are critical database migrations that need to be run, as some bigint/var limits change in some of the tables. I have determined that these jobs exist as “Background Migrations” (distinctly different from Batched Background Migrations, introduced circa version 15 or late 14, which are the newer, allegedly better, replacement for the background migrations that predate them).

When I upgrade to version 14-0-stable, I can see there are 9 jobs in the Background Migrations (Admin GUI > Monitoring > Background Migrations) that are active and sitting at 0%. I have left them over the weekend, ensuring that my sidekiq cron jobs are all enabled, but there’s been no change.

After a few days of trawling forums, issues, Stack Overflow and the rest of the internet, I have not found any facility that will allow me to run these database background migrations manually. Downtime is not a problem here, and I can do dangerous maneuvers if required because I have snapshots and backups to rely on if something goes south.

Gitlab Check is otherwise gravy, and the output of “database migrations outstanding” is 0, whilst queued jobs is at 9.

And here’s the output of the Background Migrations Remaining checks, showing the 9 queued jobs:

root@git:/home/git/gitlab# sudo -u git -H bundle exec rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'
0
root@git:/home/git/gitlab# sudo -u git -H bundle exec rails runner -e production 'puts Gitlab::Database::BackgroundMigration::BatchedMigration.queued.count'
9

There are no error messages in the logs that are obviously related to this issue, in fact the logs are very quiet, with few warnings or errors; I have pasted a few items that might be related below:

From application_json.log, this appears more-or-less on the hour

{"severity":"INFO","time":"2024-04-14T20:00:29.470Z","correlation_id":"3a11f55645c5e611ef2e30b17412513f","message":"StuckCiJobsWorker: Cleaning stuck builds"}

Steps to reproduce

  • I have already tried to forge ahead and upgrade to 14-3-stable but run into errors with the db:migrate. I am not comfortable pushing past these as I have determined that the schema change will likely be breaking, and I will end up locked at 14-3
  • I have tried a litany of rails console/rake commands to try and force the background jobs to complete, such as sudo -u git -H bundle exec rake "gitlab:background_migrations:finalize" RAILS_ENV=production
    Output of the finalize:

root@git:/home/git/gitlab# sudo -u git -H bundle exec rake "gitlab:background_migrations:finalize" RAILS_ENV=production
rake aborted!
Don't know how to build task 'gitlab:background_migrations:finalize' (See the list of available tasks with rake --tasks)
Did you mean? gitlab:packages:migrate
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/rake-13.0.3/exe/rake:27:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
(See full trace by running task with --trace)

When running “rake --tasks” to check for other likely candidates, I get:

root@git:/home/git/gitlab# sudo -u git -H bundle exec rake --tasks RAILS_ENV=production
rake aborted!
NameError: uninitialized constant RSpec
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/active_support.rb:80:in `block in load_missing_constant'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/active_support.rb:9:in `without_bootsnap_cache'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/active_support.rb:80:in `rescue in load_missing_constant'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/active_support.rb:59:in `load_missing_constant'
/home/git/gitlab/lib/tasks/benchmark.rake:7:in `block in <main>'
/home/git/gitlab/lib/tasks/benchmark.rake:5:in `<main>'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:55:in `load'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:55:in `load'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:675:in `block in run_tasks_blocks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:675:in `each'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:675:in `run_tasks_blocks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/application.rb:521:in `run_tasks_blocks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:464:in `load_tasks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/railtie.rb:207:in `public_send'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/railtie.rb:207:in `method_missing'
/home/git/gitlab/Rakefile:14:in `<top (required)>'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/rake-13.0.3/exe/rake:27:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'

Caused by:
NameError: uninitialized constant RSpec
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/active_support.rb:61:in `block in load_missing_constant'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/active_support.rb:17:in `allow_bootsnap_retry'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/active_support.rb:60:in `load_missing_constant'
/home/git/gitlab/lib/tasks/benchmark.rake:7:in `block in <main>'
/home/git/gitlab/lib/tasks/benchmark.rake:5:in `<main>'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:55:in `load'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.4.6/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:55:in `load'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:675:in `block in run_tasks_blocks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:675:in `each'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:675:in `run_tasks_blocks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/application.rb:521:in `run_tasks_blocks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/engine.rb:464:in `load_tasks'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/railtie.rb:207:in `public_send'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/railties-6.1.3.2/lib/rails/railtie.rb:207:in `method_missing'
/home/git/gitlab/Rakefile:14:in `<top (required)>'
/home/git/gitlab/vendor/bundle/ruby/2.7.0/gems/rake-13.0.3/exe/rake:27:in `<top (required)>'
/usr/local/bin/bundle:23:in `load'
/usr/local/bin/bundle:23:in `<main>'
(See full trace by running task with --trace)

  • I have tried waiting out the migrations as some posts indicate that it can take “up to a day”. I have left them all weekend.

I believe that this can be replicated by building a FOSS Gitlab at ~12.5 and following the documented manual FOSS upgrade steps to 14-0-stable; however, I only have my own environment to go on. The operating system or configuration could be a contributing factor.

Configuration

Here’s Gitlab Check:

root@git:/home/git/gitlab# sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production
Checking GitLab subtasks …
Checking GitLab Shell …
GitLab Shell: … GitLab Shell version >= 13.19.1 ? … OK (13.19.1)
Running /home/git/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell … Finished
Checking Gitaly …
Gitaly: … default … OK
Checking Gitaly … Finished
Checking Sidekiq …
Sidekiq: … Running? … yes
Number of Sidekiq processes (cluster/worker) … 1/1
Checking Sidekiq … Finished
Checking Incoming Email …
Incoming Email: … Reply by email is disabled in config/gitlab.yml
Checking Incoming Email … Finished
Checking LDAP …
LDAP: … LDAP is disabled in config/gitlab.yml
Checking LDAP … Finished
Checking GitLab App …
Git configured correctly? … yes
Database config exists? … yes
All migrations up? … yes
Database contains orphaned GroupMembers? … no
GitLab config exists? … yes
GitLab config up to date? … yes
Log directory writable? … yes
Tmp directory writable? … yes
Uploads directory exists? … yes
Uploads directory has correct permissions? … yes
Uploads directory tmp has correct permissions? … yes
Init script exists? … yes
Init script up-to-date? … yes
Projects have namespace: …
Redacted Namespaces … yes

Redacted Namespaces … yes
Redis version >= 5.0.0? … yes
Ruby version >= 2.7.2 ? … yes (2.7.2)
Git version >= 2.31.0 ? … yes (2.32.0)
Git user has default SSH configuration? … yes
Active users: … 20
Is authorized keys file accessible? … yes
GitLab configured to store new projects in hashed storage? … yes
All projects are in hashed storage? … yes
Checking GitLab App … Finished
Checking GitLab subtasks … Finished

I am running the default gitlab init script, and a gitlab.yml based on the 14-0-stable example, with my own instance’s configuration.

The main thing I can see that is “out of order” is that my environment is running Postgres13, and whilst the upgrade docs say “at least Postgres12” they were probably written at the time before 13 was released; I can see in another doc that 13 may not be supported, presumably because it hadn’t been tested at the time.

Versions

  • Self-managed Gitlab, from Source - 14.0.12 (git:14-0-stable)

Here’s Gitlab Env Info:

root@git:/home/git/gitlab# sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production

System information
System: Debian 11
Current User: git
Using RVM: no
Ruby Version: 2.7.2p137
Gem Version: 3.1.4
Bundler Version:2.1.4
Rake Version: 13.0.3
Redis Version: 6.0.16
Git Version: 2.32.0
Sidekiq Version:5.2.9
Go Version: go1.13.5 linux/amd64

GitLab information
Version: 14.0.12
Revision: def69dde9f1
Directory: /home/git/gitlab
DB Adapter: PostgreSQL
DB Version: 13.14
URL: https://git.micron21.com
HTTP Clone URL: https://git.micron21.com/some-group/some-project.git
SSH Clone URL: git@git.micron21.com:some-group/some-project.git
Using LDAP: no
Using Omniauth: no

GitLab Shell
Version: 13.19.1
Repository storage paths:

  • default: /home/git/repositories

GitLab Shell path: /home/git/gitlab-shell
Git: /usr/local/bin/git

Conclusion

I still have a few leads to follow up on:

  • bundle exec rake db:migrate:status might have leads for me, based on another issue
  • The fact that I don’t seem to have a cron that corresponds to the sidekiq queue doesn’t make sense to me; see if I can find out more about the relationship there.

but please, if there’s anything you can do to point me in the right direction, I will be grateful.

In what has become my classic Gitlab Forum Experience™, I will probably end up solving my own problem but there is a clue here:

It says that:

Due to an issue where BatchedBackgroundMigrationWorkers were not working for self-managed instances, a fix was created that requires an update to at least 14.0.5. The fix was also released in 14.1.0.

After you update to 14.0.5 or a later 14.0 patch version, batched background migrations must finish before you update to a later version.

If the migrations are not finished and you try to update to a later version, you see an error like:

Expected batched background migration for the given configuration to be marked as 'finished', but it is 'active':

See how to resolve this error.

and this pretty adequately describes exactly the issue I’m having:
Looks like we missed that the Database::BatchedBackgroundMigrationWorker is configured to run only on GitLab.com, which means these migrations are present on self-managed, but not being processed.

However, it looks like this only affected versions 14.0.0 to 14.0.3, and I’m on 14.0.12 (according to “Run batched migrations on self-managed instances (!65106)”, this was fixed).

Clicking “see how to resolve this error” it links to:

But this page doesn’t contain any commands that work in my 14.0 environment; most of the issues seem to be about jobs that are “pending” or “failed” but nothing about active jobs that just don’t seem to exist.

I figure that there should be some sidekiq jobs that exist for these migrations, but I can’t find them.

From the console

irb(main):008:0> puts Gitlab::Database::BackgroundMigrationJob.inspect
Gitlab::Database::BackgroundMigrationJob(id: integer, created_at: datetime_with_timezone, updated_at: datetime_with_timezone, status: integer, class_name: text, arguments: jsonb)
=> nil
irb(main):009:0> puts Gitlab::Database::Migrations.inspect
Gitlab::Database::Migrations
=> nil
irb(main):010:0> puts Gitlab::Database::BackgroundMigration.inspect
Gitlab::Database::BackgroundMigration
=> nil

There are no jobs.

I am not skilled enough at rails console to explore here without some really detailed documentation, which I don’t have or can’t find.

I can see that there is a pending migration:

irb(main):027:0> puts Gitlab::Database::BackgroundMigrationJob.pending.inspect
#<ActiveRecord::Relation [#<Gitlab::Database::BackgroundMigrationJob id: 2, created_at: "2024-04-10 20:40:35.017234000 +1000", updated_at: "2024-04-10 20:40:35.017234000 +1000", status: "pending", class_name: "BackfillJiraTrackerDeploymentType2", arguments: [1, 2]>]>
=> nil

but that’s not one of the migrations listed in the Active queue, so I’m at a bit of a loss as to what that’s about. It’s not even listed in the sidekiq queues:

Reverse engineering some of the commands in other documentation, I’ve managed to get Rails Console to show me the money shot:

irb(main):032:0> puts Gitlab::Database::BackgroundMigration::BatchedMigration.active.inspect
#<ActiveRecord::Relation [#<Gitlab::Database::BackgroundMigration::BatchedMigration id: 1, created_at: "2024-04-12 17:34:20.201170000 +1000", updated_at: "2024-04-12 17:34:20.298863000 +1000", min_value: 1, max_value: 21326, batch_size: 15000, sub_batch_size: 100, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "ci_builds_metadata", column_name: "id", job_arguments: [["build_id"], ["build_id_convert_to_bigint"]], total_tuple_count: 21316, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 2, created_at: "2024-04-12 17:34:20.400520000 +1000", updated_at: "2024-04-12 17:34:20.421145000 +1000", min_value: 1, max_value: 31225, batch_size: 20000, sub_batch_size: 500, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "events", column_name: "id", job_arguments: [["id"], ["id_convert_to_bigint"]], total_tuple_count: 11131, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 3, created_at: "2024-04-12 17:34:20.483079000 +1000", updated_at: "2024-04-12 17:34:20.505923000 +1000", min_value: 1, max_value: 31225, batch_size: 20000, sub_batch_size: 2500, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "push_event_payloads", column_name: "event_id", job_arguments: [["event_id"], ["event_id_convert_to_bigint"]], total_tuple_count: 5751, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 4, created_at: "2024-04-12 17:34:20.574158000 +1000", updated_at: "2024-04-12 17:34:20.595754000 +1000", min_value: 1, max_value: 24825, batch_size: 20000, sub_batch_size: 2000, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", 
table_name: "ci_job_artifacts", column_name: "id", job_arguments: [["id", "job_id"], ["id_convert_to_bigint", "job_id_convert_to_bigint"]], total_tuple_count: 24796, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 7, created_at: "2024-04-12 17:34:20.780212000 +1000", updated_at: "2024-04-12 17:34:20.800244000 +1000", min_value: 1, max_value: 27732, batch_size: 20000, sub_batch_size: 250, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "ci_builds", column_name: "id", job_arguments: [["id", "stage_id"], ["id_convert_to_bigint", "stage_id_convert_to_bigint"]], total_tuple_count: 27722, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 10, created_at: "2024-04-12 17:34:21.203480000 +1000", updated_at: "2024-04-12 17:34:21.224169000 +1000", min_value: 1, max_value: 174, batch_size: 15000, sub_batch_size: 100, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "taggings", column_name: "id", job_arguments: [["id", "taggable_id"], ["id_convert_to_bigint", "taggable_id_convert_to_bigint"]], total_tuple_count: 157, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 11, created_at: "2024-04-12 17:34:21.456566000 +1000", updated_at: "2024-04-12 17:34:21.477595000 +1000", min_value: 1, max_value: 4660, batch_size: 20000, sub_batch_size: 1000, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "deployments", column_name: "id", job_arguments: [["deployable_id"], ["deployable_id_convert_to_bigint"]], total_tuple_count: 4658, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 13, created_at: "2024-04-12 17:34:24.568487000 +1000", updated_at: "2024-04-12 17:34:24.594253000 
+1000", min_value: 1, max_value: 18841, batch_size: 20000, sub_batch_size: 1000, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "ci_stages", column_name: "id", job_arguments: [["id"], ["id_convert_to_bigint"]], total_tuple_count: 18131, pause_ms: 100>, #<Gitlab::Database::BackgroundMigration::BatchedMigration id: 14, created_at: "2024-04-12 17:34:25.562007000 +1000", updated_at: "2024-04-12 17:34:25.583007000 +1000", min_value: 1, max_value: 21326, batch_size: 15000, sub_batch_size: 100, interval: 120, status: "active", job_class_name: "CopyColumnUsingBackgroundMigrationJob", batch_class_name: "PrimaryKeyBatchingStrategy", table_name: "ci_builds_metadata", column_name: "id", job_arguments: [["id"], ["id_convert_to_bigint"]], total_tuple_count: 21316, pause_ms: 100>]>
=> nil
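
As a rough sanity check on the records above, the number of batches and a minimum run time can be estimated from min_value, max_value, batch_size and interval. This is only a sketch in plain Ruby: MigrationInfo is a hypothetical stand-in Struct, not the real model, and it assumes that each batch covers batch_size consecutive ids and that at most one batch runs per interval, which may not match GitLab's scheduler exactly.

```ruby
# Hypothetical stand-in for the attributes printed above; not the real ActiveRecord model.
MigrationInfo = Struct.new(:table_name, :min_value, :max_value, :batch_size, :interval, keyword_init: true)

# Number of batches, assuming each batch covers batch_size consecutive ids.
def estimated_batches(m)
  span = m.max_value - m.min_value + 1
  (span.to_f / m.batch_size).ceil
end

# Minimum run time in seconds, assuming one batch per interval.
def estimated_seconds(m)
  estimated_batches(m) * m.interval
end

# Values copied from three of the records in the console output above.
migrations = [
  MigrationInfo.new(table_name: "ci_builds_metadata", min_value: 1, max_value: 21326, batch_size: 15000, interval: 120),
  MigrationInfo.new(table_name: "events", min_value: 1, max_value: 31225, batch_size: 20000, interval: 120),
  MigrationInfo.new(table_name: "taggings", min_value: 1, max_value: 174, batch_size: 15000, interval: 120),
]

migrations.each do |m|
  puts format("%-20s ~%d batch(es), ~%d s", m.table_name, estimated_batches(m), estimated_seconds(m))
end
```

Under those assumptions each of these tables needs only one or two batches, so a migration that sits at 0% for a whole weekend points to nothing picking the work up at all, rather than the work being slow.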

One other MR listed a manual work around, linked as : Manual workaround is documented at Upgrade GitLab | GitLab
but that link is no longer current, and I’m not sure which version of the document archives contains that link. Parking that idea for now.

Here’s the pending job also

irb(main):041:0> puts Gitlab::Database::BackgroundMigrationJob.pending[0].pretty_inspect
#<Gitlab::Database::BackgroundMigrationJob:0x000055e0e48b0a90
 id: 2,
 created_at: Wed, 10 Apr 2024 20:40:35.017234000 AEST +10:00,
 updated_at: Wed, 10 Apr 2024 20:40:35.017234000 AEST +10:00,
 status: "pending",
 class_name: "BackfillJiraTrackerDeploymentType2",
 arguments: [1, 2]>
=> nil

So maybe the pending job is causing the background migrations to wait? But this pending job is not in the sidekiq queue. How can I clear this, presuming it’s holding everything up?

So, after seemingly exhausting all documented options, I just bullied on with the upgrade; ignoring the documented upgrade path of 14.3.8, I’ve instead upgraded to the next minor version 14-1-stable (14.1.8)

There were no issues (I expected db:migrate task to fail) and I successfully started the services.

Migrations are now seemingly running.

Here’s the finding: it was the missing cron all along.

14.0.Z doesn’t have this cron in sidekiq: batched_background_migrations_worker

14.1.8 does. This is why batched background migrations simply weren’t running - there was nothing to run them!
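
For anyone wanting to check for the same gap, the idea is simply to compare the cron jobs Sidekiq knows about against the one that should be there. The sketch below is plain Ruby over a list of names so it runs anywhere; the helper name and the sample lists are made up for illustration. On a real instance the list could presumably be fetched in the Rails console with something like Sidekiq::Cron::Job.all.map(&:name) from the sidekiq-cron gem, but treat that as an assumption to verify on your version.

```ruby
# Cron job name taken from the finding above.
REQUIRED_CRON = "batched_background_migrations_worker"

# Returns the required cron names that are absent from the given list.
def missing_crons(present_names, required = [REQUIRED_CRON])
  required - present_names
end

# Illustrative lists only, modelled on my from-source install. On a real
# instance, something like
#   Sidekiq::Cron::Job.all.map(&:name)
# in the Rails console should supply present_names (assumption).
names_14_0 = ["stuck_ci_jobs_worker"]
names_14_1 = ["stuck_ci_jobs_worker", "batched_background_migrations_worker"]

puts "14.0.Z missing: #{missing_crons(names_14_0).inspect}"
puts "14.1.8 missing: #{missing_crons(names_14_1).inspect}"
```

An empty result means the cron is registered, in which case the cause of stuck migrations lies elsewhere.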

Mystery solved. I hope this thread helps someone out there <3.

Interesting, since in my case the cron job is there

I’m using docker image gitlab/gitlab-ce:14.0.12-ce.0

Are your migrations stuck at 0%?

Admittedly a different architecture; a from-source installation might have been missing a back-port that was baked into the docker image, so your mileage may vary. Some of the troubleshooting in my first few posts might help you: the root cause for mine was definitely the missing cron, but the other commands didn’t help me at all because of that. If you have the cron, then those commands might sort you out.

It turns out that my problem was my lack of understanding of batched background migration. I wrote a script to automate the upgrade process from v12.6.2 to v17.2.1, but the script only waits for the “legacy” background migration to finish. When it’s done, the script immediately runs the next version, causing the error.
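
For anyone scripting a multi-version upgrade like this, the gate between versions needs to cover both kinds of migration. Here is a minimal sketch in plain Ruby, not a tested upgrade tool: the function name, timeout and poll interval are hypothetical, and the two counters are injected as callables so it runs without GitLab. In practice they might wrap the two rails runner one-liners from the first post (Gitlab::BackgroundMigration.remaining and Gitlab::Database::BackgroundMigration::BatchedMigration.queued.count).

```ruby
# Polls both counters until both reach zero or the timeout expires.
# legacy_count / batched_count: callables returning the outstanding count.
# Returns true when both are zero, false on timeout.
def wait_for_migrations(legacy_count:, batched_count:, timeout: 3600, poll: 30, sleeper: ->(s) { sleep(s) })
  waited = 0
  loop do
    legacy = legacy_count.call
    batched = batched_count.call
    return true if legacy.zero? && batched.zero?
    return false if waited >= timeout

    puts "waiting: legacy=#{legacy} batched=#{batched}"
    sleeper.call(poll)
    waited += poll
  end
end

# Demo with fake counters: legacy is already done, batched drains over time.
batched_left = [9, 4, 0]
done = wait_for_migrations(
  legacy_count: -> { 0 },
  batched_count: -> { batched_left.shift || 0 },
  poll: 1,
  sleeper: ->(_s) {} # no real sleeping in the demo
)
puts done ? "safe to move to the next version" : "timed out; do not upgrade yet"
```

Checking only the first counter is exactly the trap described above: it can read 0 while batched migrations are still queued.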

Yeah, tricky naming convention, I think batched background migrations only got introduced around this point in time. I’m assuming that since you’ve worked this out, you’re in the clear? If so, nice work!