Database background migration job stuck

Hello,

Yesterday, I was upgrading one of our GitLab instances.

The plan was to go from gitlab-ce v13.11.3-ce.0 through v13.12.15-ce.0 and 14.0.12-ce.0 to 14.8.2-ce.0.

After performing a routine pre-upgrade backup (carried out manually by backing up the database, the binaries in /opt/gitlab and the data in /var/opt/gitlab), I successfully upgraded to the newest 13.x release and then to 14.0.12. Up to this point everything worked as expected, but after upgrading to the latest 14.8.2, the instance would not start.

Having never faced such an issue before, I decided to roll back by removing and purging all data of the GitLab instance, reinstalling the original version v13.11.3-ce.0, and restoring the data, application and database from the backup taken beforehand.

When I next attempted the upgrade, I found that the instance fails one of the pre-upgrade checks:

According to the documentation, all background migration jobs must be completed before upgrading.

However, the command gitlab-rails runner -e production 'puts Gitlab::Database::BackgroundMigrationJob.pending.count' returns 1 instead of 0.

To check which background migration job is pending, I ran the following snippet:

# Print every background migration job that is still pending
Gitlab::Database::BackgroundMigrationJob.pending.find_each do |job|
  puts "Pending job: '#{job.class_name}' with arguments #{job.arguments}"
end

It prints:

Pending job: 'MoveContainerRegistryEnabledToProjectFeature' with arguments [1, 21]

I tried forcing the job to run by calling Gitlab::BackgroundMigration.perform(job.class_name, job.arguments) inside the loop, after the puts line, but the call returns nil and the job is not removed from the pending list.

It is worth noting that I did not carry out these pre-upgrade checks between the incremental upgrades I did first (prior to rolling back), but since I completely replaced the data directories after downgrading with the versions from before the upgrade process, this hopefully shouldn’t be the issue.

Does anyone know what the “MoveContainerRegistryEnabledToProjectFeature” job is, and whether it is safe to cancel it when the job continuously fails to run?

In the end, I managed to find out what the background job does (its source is at https://github.com/gitlabhq/gitlabhq/blob/master/lib/gitlab/background_migration/move_container_registry_enabled_to_project_feature.rb).

It really only runs a single SQL UPDATE query, which sets some columns from the values of other columns for the range of IDs passed as the job's arguments.

Next, I checked the production database to verify that all the new fields had the correct values, which they did.
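For anyone who wants to repeat this check, the idea can be sketched as a toy Ruby model with hypothetical in-memory rows. It is not GitLab code and not its real schema: the names registry_enabled and registry_access_level and the values 20/0 for enabled/disabled are illustrative assumptions, so check the linked source for the actual columns and mapping before querying a real database. It lists the IDs in the job's inclusive range whose target value differs from what the UPDATE would have written; an empty result means there is effectively nothing left for the job to do.

```ruby
# Toy in-memory model (hypothetical names and values, not GitLab's schema):
# before marking a stuck job as succeeded, confirm that its range-bounded
# "copy one column into another" UPDATE has effectively been applied already.

ENABLED  = 20 # assumed value written for "enabled"
DISABLED = 0  # assumed value written for "disabled"

# Hypothetical source rows (boolean flag per project ID).
projects = {
  1 => { registry_enabled: true },
  2 => { registry_enabled: false }
}

# Hypothetical target rows (access level per project ID).
project_features = {
  1 => { registry_access_level: ENABLED },
  2 => { registry_access_level: DISABLED }
}

# Value the UPDATE is assumed to write for a given source flag.
def expected_level(enabled)
  enabled ? ENABLED : DISABLED
end

# IDs in the inclusive range [from_id, to_id] whose target value differs from
# what the UPDATE would have written; IDs missing on either side are skipped.
def mismatched_ids(projects, features, from_id, to_id)
  (from_id..to_id).select do |id|
    next false unless projects.key?(id) && features.key?(id)

    features[id][:registry_access_level] != expected_level(projects[id][:registry_enabled])
  end
end

# The stuck job's arguments were [1, 21]; an empty list means nothing is left to update.
p mismatched_ids(projects, project_features, 1, 21) # prints []
```

On a real instance the same comparison would be a SQL query over the actual tables; the in-memory version only shows the logic of the check.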

Concluding that the job had in fact run correctly and had only failed to be marked as complete, I manually ran Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded('MoveContainerRegistryEnabledToProjectFeature', [from_id, to_id]) (substituting from_id and to_id with the arguments from my first post, i.e. [1, 21]) and voilà: the job is finished and I have no background migrations pending!
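To illustrate why the exact arguments matter, here is a toy Ruby model of a job table. It is hypothetical and not GitLab's implementation; it only assumes that pending rows are matched on both the class name and the exact argument list, which is consistent with having to pass [from_id, to_id] above. Marking with the wrong range updates nothing, while the exact pair [1, 21] flips the single pending row to succeeded.

```ruby
# Toy in-memory model (hypothetical, not GitLab's implementation) of a
# background migration job table: each row has a class name, an argument
# list and a status. "Marking as succeeded" flips the status of pending rows
# whose class name AND arguments match exactly.

Job = Struct.new(:class_name, :arguments, :status)

class JobTable
  def initialize(jobs)
    @jobs = jobs
  end

  # All rows that have not finished yet.
  def pending
    @jobs.select { |job| job.status == :pending }
  end

  # Marks matching pending rows as succeeded; returns how many were updated.
  def mark_all_as_succeeded(class_name, arguments)
    matching = pending.select do |job|
      job.class_name == class_name && job.arguments == arguments
    end
    matching.each { |job| job.status = :succeeded }
    matching.size
  end
end

name  = "MoveContainerRegistryEnabledToProjectFeature"
table = JobTable.new([Job.new(name, [1, 21], :pending)])

puts table.pending.size                           # 1
puts table.mark_all_as_succeeded(name, [1, 20])   # 0 (arguments differ)
puts table.mark_all_as_succeeded(name, [1, 21])   # 1
puts table.pending.size                           # 0
```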