We were upgrading our self-managed Omnibus GitLab EE instance from release 13.9.1 to the latest version. Following the upgrade docs, I selected the upgrade path 13.9.1 → 13.12.15 → 14.0.12 → 14.10.0.
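(For reference, the command-line check for background migrations from the upgrade docs is the one-liner below, which I also use further down. As far as I understand, it only counts the legacy Sidekiq-based background migrations, not the batched ones, which becomes relevant later in this post:)
sudo gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'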
The first two steps were seamless. After upgrading to 14.0.12, I waited for the background migrations to finish (14 migrations were displayed as finished in the admin console). Just in case, I also waited over the weekend before continuing, but still got the following error when upgrading to 14.10.0:
Recipe: gitlab::database_migrations
* ruby_block[check remote PG version] action nothing (skipped due to action :nothing)
* rails_migration[gitlab-rails] action run
* bash[migrate gitlab-rails database] action run
================================================================================
Error executing action `run` on resource 'bash[migrate gitlab-rails database]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Command execution failed. STDOUT/STDERR suppressed for sensitive resource
Resource Declaration:
---------------------
suppressed sensitive resource output
Compiled Resource:
------------------
suppressed sensitive resource output
System Info:
------------
chef_version=15.17.4
platform=ubuntu
platform_version=20.04
ruby=ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [x86_64-linux]
program_name=/opt/gitlab/embedded/bin/chef-client
executable=/opt/gitlab/embedded/bin/chef-client
================================================================================
Error executing action `run` on resource 'rails_migration[gitlab-rails]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
bash[migrate gitlab-rails database] (/opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/resources/rails_migration.rb line 16) had an error: Mixlib::ShellOut::ShellCommandFailed: Command execution failed.
STDOUT/STDERR suppressed for sensitive resource
Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb
51: rails_migration "gitlab-rails" do
52: rake_task 'gitlab:db:configure'
53: logfile_prefix 'gitlab-rails-db-migrate'
54: helper migration_helper
55:
56: environment env_variables
57: dependent_services dependent_services
58: notifies :run, "execute[clear the gitlab-rails cache]", :immediately
59: notifies :run, "ruby_block[check remote PG version]", :immediately
60:
61: only_if { migration_helper.attributes_node['auto_migrate'] }
62: end
Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb:51:in `from_file'
rails_migration("gitlab-rails") do
action [:run]
default_guard_interpreter :default
declared_type :rails_migration
cookbook_name "gitlab"
recipe_name "database_migrations"
rake_task "gitlab:db:configure"
logfile_prefix "gitlab-rails-db-migrate"
helper "*sensitive value suppressed*"
environment "*sensitive value suppressed*"
dependent_services []
only_if { #code block }
end
System Info:
------------
chef_version=15.17.4
platform=ubuntu
platform_version=20.04
ruby=ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [x86_64-linux]
program_name=/opt/gitlab/embedded/bin/chef-client
executable=/opt/gitlab/embedded/bin/chef-client
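Chef suppresses stdout/stderr for this sensitive resource, so the underlying failure is not visible in the reconfigure output above. As far as I know, the rails_migration resource writes the rake output to a log file named after the logfile_prefix shown in the resource declaration; the exact path below is an assumption on my part:
sudo less /var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-*.log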
After manually running the db:migrate task with gitlab-rake db:migrate --trace, I got the following error:
** Execute db:migrate
== 20220322071127 FinalizeProjectNamespacesBackfill: migrating ================
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:
Expected batched background migration for the given configuration to be marked as 'finished', but it is 'paused': {:job_class_name=>"ProjectNamespaces::BackfillProjectNamespaces", :table_name=>:projects, :column_name=>:id, :job_arguments=>[nil, "up"]}
Finalize it manualy by running
sudo gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
I’ve rushed to troubleshoot and tried to fix this using method, suggested in this post (which might be the cause?)
After that, even though gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'
is now returns 0, I can still see 13 background migrations that are still active in the admin console, and in the database as well:
job_class_name | table_name | column_name | job_arguments | status
--------------------------------------------------+---------------------------------+-------------+------------------------------------------------------------------------------------+--------
BackfillMemberNamespaceForGroupMembers | members | id | [] | 1
BackfillIssueSearchData | issues | id | [] | 1
BackfillNamespaceIdForNamespaceRoute | routes | id | [] | 1
NullifyOrphanRunnerIdOnCiBuilds | ci_builds | id | [] | 1
BackfillGroupFeatures | namespaces | id | [10000] | 1
BackfillWorkItemTypeIdForIssues | issues | id | [0, 1] | 1
BackfillWorkItemTypeIdForIssues | issues | id | [1, 2] | 1
BackfillWorkItemTypeIdForIssues | issues | id | [2, 3] | 1
BackfillWorkItemTypeIdForIssues | issues | id | [3, 4] | 1
BackfillWorkItemTypeIdForIssues | issues | id | [4, 5] | 1
BackfillUserNamespace | namespaces | id | [] | 1
MigratePersonalNamespaceProjectMaintainerToOwner | members | id | [] | 1
CopyColumnUsingBackgroundMigrationJob | ci_builds | id | [["id", "stage_id"], ["id_convert_to_bigint", "stage_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | taggings | id | [["id", "taggable_id"], ["id_convert_to_bigint", "taggable_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | ci_stages | id | [["id"], ["id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | ci_builds_metadata | id | [["id"], ["id_convert_to_bigint"]] | 3
ProjectNamespaces::BackfillProjectNamespaces | projects | id | [null, "up"] | 3
BackfillIntegrationsTypeNew | integrations | id | [] | 3
CopyColumnUsingBackgroundMigrationJob | ci_sources_pipelines | id | [["source_job_id"], ["source_job_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | ci_build_needs | id | [["build_id"], ["build_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | ci_builds_runner_session | id | [["build_id"], ["build_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | ci_build_trace_chunks | id | [["build_id"], ["build_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | deployments | id | [["deployable_id"], ["deployable_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | geo_job_artifact_deleted_events | id | [["job_artifact_id"], ["job_artifact_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | ci_builds_metadata | id | [["build_id"], ["build_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | events | id | [["id"], ["id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | push_event_payloads | event_id | [["event_id"], ["event_id_convert_to_bigint"]] | 3
CopyColumnUsingBackgroundMigrationJob | ci_job_artifacts | id | [["id", "job_id"], ["id_convert_to_bigint", "job_id_convert_to_bigint"]] | 3
(28 rows)
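(The table above comes from the batched_background_migrations table in the production database; a query along these lines via gitlab-psql reproduces it - as far as I can tell, status 1 means active and 3 means finished:)
sudo gitlab-psql -c "SELECT job_class_name, table_name, column_name, job_arguments, status FROM batched_background_migrations ORDER BY status;"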
I’ve waited a day, but all of those 13 background migrations are still stuck with the active
state and doesn’t seem to progress
Checked the Sidekiq production log - it contains a lot of similar messages about missing column “on_hold_until”:
{
"severity": "WARN",
"time": "2022-04-26T09:02:16.823Z",
"retry": 0,
"queue": "cronjob:database_batched_background_migration",
"version": 0,
"queue_namespace": "cronjob",
"args": [],
"class": "Database::BatchedBackgroundMigrationWorker",
"jid": "860fbeca81a9cc19fe23821e",
"created_at": "2022-04-26T09:02:16.787Z",
"meta.caller_id": "Cronjob",
"correlation_id": "12cb00086e5be14876e0dd50450f8ed2",
"meta.feature_category": "database",
"worker_data_consistency": "always",
"idempotency_key": "resque:gitlab:duplicate:cronjob:database_batched_background_migration:592d9619e1997b640b70ce6a22f6713bc7793bb7a4e342b7380d90b691fcd6ae",
"enqueued_at": "2022-04-26T09:02:16.788Z",
"job_size_bytes": 2,
"pid": 665690,
"message": "Database::BatchedBackgroundMigrationWorker JID-860fbeca81a9cc19fe23821e: fail: 0.024996 sec",
"job_status": "fail",
"scheduling_latency_s": 0.007553,
"redis_calls": 1,
"redis_duration_s": 0.001408,
"redis_read_bytes": 131,
"redis_write_bytes": 360,
"redis_cache_calls": 1,
"redis_cache_duration_s": 0.001408,
"redis_cache_read_bytes": 122,
"redis_cache_write_bytes": 35,
"redis_queues_read_bytes": 9,
"redis_queues_write_bytes": 325,
"db_count": 1,
"db_write_count": 0,
"db_cached_count": 0,
"db_replica_count": 0,
"db_primary_count": 1,
"db_main_count": 1,
"db_main_replica_count": 0,
"db_replica_cached_count": 0,
"db_primary_cached_count": 0,
"db_main_cached_count": 0,
"db_main_replica_cached_count": 0,
"db_replica_wal_count": 0,
"db_primary_wal_count": 0,
"db_main_wal_count": 0,
"db_main_replica_wal_count": 0,
"db_replica_wal_cached_count": 0,
"db_primary_wal_cached_count": 0,
"db_main_wal_cached_count": 0,
"db_main_replica_wal_cached_count": 0,
"db_replica_duration_s": 0,
"db_primary_duration_s": 0.01,
"db_main_duration_s": 0.01,
"db_main_replica_duration_s": 0,
"cpu_s": 0.008069,
"mem_objects": 1277,
"mem_bytes": 140160,
"mem_mallocs": 407,
"mem_total_bytes": 191240,
"duration_s": 0.024996,
"completed_at": "2022-04-26T09:02:16.821Z",
"load_balancing_strategy": "primary",
"error_message": "PG::UndefinedColumn: ERROR: column \"on_hold_until\" does not exist\nLINE 1: ...ched_background_migrations\".\"status\" IN (1)) AND (on_hold_un...\n ^\n",
"error_class": "ActiveRecord::StatementInvalid",
"error_backtrace": [
"lib/gitlab/database/load_balancing/connection_proxy.rb:99:in `block in read_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:112:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:172:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:110:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:98:in `read_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:47:in `select_all'",
"lib/gitlab/database/background_migration/batched_migration.rb:79:in `active_migration'",
"app/workers/database/batched_background_migration/single_database_worker.rb:66:in `active_migration'",
"app/workers/database/batched_background_migration/single_database_worker.rb:48:in `block in perform'",
"app/workers/database/batched_background_migration/single_database_worker.rb:47:in `perform'",
"lib/gitlab/sidekiq_middleware/duplicate_jobs/strategies/until_executing.rb:16:in `perform'",
"lib/gitlab/sidekiq_middleware/duplicate_jobs/server.rb:8:in `call'",
"lib/gitlab/application_context.rb:93:in `block in use'",
"lib/gitlab/application_context.rb:93:in `use'",
"lib/gitlab/sidekiq_middleware/worker_context/server.rb:17:in `block in call'",
"lib/gitlab/application_context.rb:93:in `use'",
"lib/gitlab/application_context.rb:44:in `with_context'",
"lib/gitlab/sidekiq_middleware/worker_context/server.rb:15:in `call'",
"lib/gitlab/sidekiq_status/server_middleware.rb:7:in `call'",
"lib/gitlab/sidekiq_versioning/middleware.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/query_analyzer.rb:7:in `block in call'",
"lib/gitlab/database/query_analyzer.rb:46:in `within'",
"lib/gitlab/sidekiq_middleware/query_analyzer.rb:7:in `call'",
"lib/gitlab/sidekiq_middleware/admin_mode/server.rb:14:in `call'",
"lib/gitlab/sidekiq_middleware/instrumentation_logger.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/batch_loader.rb:7:in `call'",
"lib/gitlab/sidekiq_middleware/extra_done_log_metadata.rb:7:in `call'",
"lib/gitlab/sidekiq_middleware/request_store_middleware.rb:10:in `block in call'",
"lib/gitlab/with_request_store.rb:17:in `enabling_request_store'",
"lib/gitlab/with_request_store.rb:10:in `with_request_store'",
"lib/gitlab/sidekiq_middleware/request_store_middleware.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:74:in `block in call'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:97:in `block in instrument'",
"lib/gitlab/metrics/background_transaction.rb:33:in `run'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:97:in `instrument'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:73:in `call'",
"lib/gitlab/sidekiq_middleware/monitor.rb:10:in `block in call'",
"lib/gitlab/sidekiq_daemon/monitor.rb:49:in `within_job'",
"lib/gitlab/sidekiq_middleware/monitor.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/size_limiter/server.rb:13:in `call'",
"lib/gitlab/sidekiq_logging/structured_logger.rb:21:in `call'"
],
"db_duration_s": 0.010023
}
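(The error above says the batched_background_migrations table has no on_hold_until column - presumably it was supposed to be added by a later schema migration that never ran because db:migrate aborted, though I have not verified that. A quick sanity check for whether the column exists is to describe the table via gitlab-psql:)
sudo gitlab-psql -c '\d batched_background_migrations'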
I was able to successfully finalize one of the broken migrations using the command suggested in the error log - gitlab-rake gitlab:background_migrations:finalize[...] - but I'm not sure whether it actually runs the migration manually or just force-sets its state to finished.
Should I finalize the rest of the stuck migrations using this command and then reconfigure, or should I rather roll back to 14.0.12 and use a different upgrade path? If so, what path should I use?