Failed to upgrade GitLab ee 14.0.12 -> 14.10.0

We were upgrading our self-managed Omnibus GitLab EE instance from release 13.9.1 to the latest version. Following the upgrade docs, I selected the upgrade path 13.9.1 -> 13.12.15 -> 14.0.12 -> 14.10.0
The first two steps were seamless. After upgrading to 14.0.12 I waited for background migrations to finish (14 migrations were displayed as finished in the admin console). Just in case, I waited over the weekend before continuing with the upgrade, but still got the following error:

Recipe: gitlab::database_migrations
  * ruby_block[check remote PG version] action nothing (skipped due to action :nothing)
  * rails_migration[gitlab-rails] action run
    * bash[migrate gitlab-rails database] action run
      
      ================================================================================
      Error executing action `run` on resource 'bash[migrate gitlab-rails database]'
      ================================================================================
      
      Mixlib::ShellOut::ShellCommandFailed
      ------------------------------------
      Command execution failed. STDOUT/STDERR suppressed for sensitive resource
      
      Resource Declaration:
      ---------------------
      suppressed sensitive resource output
      
      Compiled Resource:
      ------------------
      suppressed sensitive resource output
      
      System Info:
      ------------
      chef_version=15.17.4
      platform=ubuntu
      platform_version=20.04
      ruby=ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [x86_64-linux]
      program_name=/opt/gitlab/embedded/bin/chef-client
      executable=/opt/gitlab/embedded/bin/chef-client

    
    ================================================================================
    Error executing action `run` on resource 'rails_migration[gitlab-rails]'
    ================================================================================
    
    Mixlib::ShellOut::ShellCommandFailed
    ------------------------------------
    bash[migrate gitlab-rails database] (/opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/resources/rails_migration.rb line 16) had an error: Mixlib::ShellOut::ShellCommandFailed: Command execution failed. 
STDOUT/STDERR suppressed for sensitive resource
    
    Resource Declaration:
    ---------------------
    # In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb
    
     51: rails_migration "gitlab-rails" do
     52:   rake_task 'gitlab:db:configure'
     53:   logfile_prefix 'gitlab-rails-db-migrate'
     54:   helper migration_helper
     55: 
     56:   environment env_variables
     57:   dependent_services dependent_services
     58:   notifies :run, "execute[clear the gitlab-rails cache]", :immediately
     59:   notifies :run, "ruby_block[check remote PG version]", :immediately
     60: 
     61:   only_if { migration_helper.attributes_node['auto_migrate'] }
     62: end
    
    Compiled Resource:
    ------------------
    # Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb:51:in `from_file'
    
    rails_migration("gitlab-rails") do
      action [:run]
      default_guard_interpreter :default
      declared_type :rails_migration
      cookbook_name "gitlab"
      recipe_name "database_migrations"
      rake_task "gitlab:db:configure"
      logfile_prefix "gitlab-rails-db-migrate"
      helper "*sensitive value suppressed*"
      environment "*sensitive value suppressed*"
      dependent_services []
      only_if { #code block }
    end
    
    System Info:
    ------------
    chef_version=15.17.4
    platform=ubuntu
    platform_version=20.04
    ruby=ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [x86_64-linux]
    program_name=/opt/gitlab/embedded/bin/chef-client
    executable=/opt/gitlab/embedded/bin/chef-client

After manually running the db migrate task with gitlab-rake db:migrate --trace, I got the following error:

** Execute db:migrate
== 20220322071127 FinalizeProjectNamespacesBackfill: migrating ================
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

Expected batched background migration for the given configuration to be marked as 'finished', but it is 'paused':	{:job_class_name=>"ProjectNamespaces::BackfillProjectNamespaces", :table_name=>:projects, :column_name=>:id, :job_arguments=>[nil, "up"]}

Finalize it manualy by running

	sudo gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']

I rushed to troubleshoot and tried to fix this using the method suggested in this post (which might be the cause?).
After that, even though gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining' now returns 0, I can still see 13 background migrations that are active in the admin console, and in the database as well:

                  job_class_name                  |           table_name            | column_name |                                   job_arguments                                    | status 
--------------------------------------------------+---------------------------------+-------------+------------------------------------------------------------------------------------+--------
 BackfillMemberNamespaceForGroupMembers           | members                         | id          | []                                                                                 |      1
 BackfillIssueSearchData                          | issues                          | id          | []                                                                                 |      1
 BackfillNamespaceIdForNamespaceRoute             | routes                          | id          | []                                                                                 |      1
 NullifyOrphanRunnerIdOnCiBuilds                  | ci_builds                       | id          | []                                                                                 |      1
 BackfillGroupFeatures                            | namespaces                      | id          | [10000]                                                                            |      1
 BackfillWorkItemTypeIdForIssues                  | issues                          | id          | [0, 1]                                                                             |      1
 BackfillWorkItemTypeIdForIssues                  | issues                          | id          | [1, 2]                                                                             |      1
 BackfillWorkItemTypeIdForIssues                  | issues                          | id          | [2, 3]                                                                             |      1
 BackfillWorkItemTypeIdForIssues                  | issues                          | id          | [3, 4]                                                                             |      1
 BackfillWorkItemTypeIdForIssues                  | issues                          | id          | [4, 5]                                                                             |      1
 BackfillUserNamespace                            | namespaces                      | id          | []                                                                                 |      1
 MigratePersonalNamespaceProjectMaintainerToOwner | members                         | id          | []                                                                                 |      1
 CopyColumnUsingBackgroundMigrationJob            | ci_builds                       | id          | [["id", "stage_id"], ["id_convert_to_bigint", "stage_id_convert_to_bigint"]]       |      3
 CopyColumnUsingBackgroundMigrationJob            | taggings                        | id          | [["id", "taggable_id"], ["id_convert_to_bigint", "taggable_id_convert_to_bigint"]] |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_stages                       | id          | [["id"], ["id_convert_to_bigint"]]                                                 |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_builds_metadata              | id          | [["id"], ["id_convert_to_bigint"]]                                                 |      3
 ProjectNamespaces::BackfillProjectNamespaces     | projects                        | id          | [null, "up"]                                                                       |      3
 BackfillIntegrationsTypeNew                      | integrations                    | id          | []                                                                                 |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_sources_pipelines            | id          | [["source_job_id"], ["source_job_id_convert_to_bigint"]]                           |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_build_needs                  | id          | [["build_id"], ["build_id_convert_to_bigint"]]                                     |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_builds_runner_session        | id          | [["build_id"], ["build_id_convert_to_bigint"]]                                     |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_build_trace_chunks           | id          | [["build_id"], ["build_id_convert_to_bigint"]]                                     |      3
 CopyColumnUsingBackgroundMigrationJob            | deployments                     | id          | [["deployable_id"], ["deployable_id_convert_to_bigint"]]                           |      3
 CopyColumnUsingBackgroundMigrationJob            | geo_job_artifact_deleted_events | id          | [["job_artifact_id"], ["job_artifact_id_convert_to_bigint"]]                       |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_builds_metadata              | id          | [["build_id"], ["build_id_convert_to_bigint"]]                                     |      3
 CopyColumnUsingBackgroundMigrationJob            | events                          | id          | [["id"], ["id_convert_to_bigint"]]                                                 |      3
 CopyColumnUsingBackgroundMigrationJob            | push_event_payloads             | event_id    | [["event_id"], ["event_id_convert_to_bigint"]]                                     |      3
 CopyColumnUsingBackgroundMigrationJob            | ci_job_artifacts                | id          | [["id", "job_id"], ["id_convert_to_bigint", "job_id_convert_to_bigint"]]           |      3
(28 rows)
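
For reading the status column in the output above, a small helper can translate the numeric codes. This is only a sketch: the function name status_name is made up for illustration, and the mapping (0 paused, 1 active, 3 finished, 4 failed, 5 finalizing) is an assumption based on the status enum of the batched background migration model in GitLab 14.x, so it is worth checking against the installed version.

```shell
#!/usr/bin/env bash

# Hypothetical helper: translate a numeric status from the
# batched_background_migrations table into a readable name.
# The mapping is an assumption taken from the GitLab 14.x model enum.
status_name() {
  case "$1" in
    0) echo "paused" ;;
    1) echo "active" ;;
    3) echo "finished" ;;
    4) echo "failed" ;;
    5) echo "finalizing" ;;
    *) echo "unknown" ;;
  esac
}

status_name 1   # prints "active"
status_name 3   # prints "finished"
```

Under that assumption, the rows with status 1 are the migrations still shown as active, and the rows with status 3 have already finished.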

I’ve waited a day, but all 13 of those background migrations are still stuck in the active state and don’t seem to progress.
I checked the Sidekiq production log. It contains a lot of similar messages about a missing column “on_hold_until”:

{
  "severity": "WARN",
  "time": "2022-04-26T09:02:16.823Z",
  "retry": 0,
  "queue": "cronjob:database_batched_background_migration",
  "version": 0,
  "queue_namespace": "cronjob",
  "args": [],
  "class": "Database::BatchedBackgroundMigrationWorker",
  "jid": "860fbeca81a9cc19fe23821e",
  "created_at": "2022-04-26T09:02:16.787Z",
  "meta.caller_id": "Cronjob",
  "correlation_id": "12cb00086e5be14876e0dd50450f8ed2",
  "meta.feature_category": "database",
  "worker_data_consistency": "always",
  "idempotency_key": "resque:gitlab:duplicate:cronjob:database_batched_background_migration:592d9619e1997b640b70ce6a22f6713bc7793bb7a4e342b7380d90b691fcd6ae",
  "enqueued_at": "2022-04-26T09:02:16.788Z",
  "job_size_bytes": 2,
  "pid": 665690,
  "message": "Database::BatchedBackgroundMigrationWorker JID-860fbeca81a9cc19fe23821e: fail: 0.024996 sec",
  "job_status": "fail",
  "scheduling_latency_s": 0.007553,
  "redis_calls": 1,
  "redis_duration_s": 0.001408,
  "redis_read_bytes": 131,
  "redis_write_bytes": 360,
  "redis_cache_calls": 1,
  "redis_cache_duration_s": 0.001408,
  "redis_cache_read_bytes": 122,
  "redis_cache_write_bytes": 35,
  "redis_queues_read_bytes": 9,
  "redis_queues_write_bytes": 325,
  "db_count": 1,
  "db_write_count": 0,
  "db_cached_count": 0,
  "db_replica_count": 0,
  "db_primary_count": 1,
  "db_main_count": 1,
  "db_main_replica_count": 0,
  "db_replica_cached_count": 0,
  "db_primary_cached_count": 0,
  "db_main_cached_count": 0,
  "db_main_replica_cached_count": 0,
  "db_replica_wal_count": 0,
  "db_primary_wal_count": 0,
  "db_main_wal_count": 0,
  "db_main_replica_wal_count": 0,
  "db_replica_wal_cached_count": 0,
  "db_primary_wal_cached_count": 0,
  "db_main_wal_cached_count": 0,
  "db_main_replica_wal_cached_count": 0,
  "db_replica_duration_s": 0,
  "db_primary_duration_s": 0.01,
  "db_main_duration_s": 0.01,
  "db_main_replica_duration_s": 0,
  "cpu_s": 0.008069,
  "mem_objects": 1277,
  "mem_bytes": 140160,
  "mem_mallocs": 407,
  "mem_total_bytes": 191240,
  "duration_s": 0.024996,
  "completed_at": "2022-04-26T09:02:16.821Z",
  "load_balancing_strategy": "primary",
  "error_message": "PG::UndefinedColumn: ERROR:  column \"on_hold_until\" does not exist\nLINE 1: ...ched_background_migrations\".\"status\" IN (1)) AND (on_hold_un...\n                                                             ^\n",
  "error_class": "ActiveRecord::StatementInvalid",
  "error_backtrace": [
    "lib/gitlab/database/load_balancing/connection_proxy.rb:99:in `block in read_using_load_balancer'",
    "lib/gitlab/database/load_balancing/load_balancer.rb:112:in `block in read_write'",
    "lib/gitlab/database/load_balancing/load_balancer.rb:172:in `retry_with_backoff'",
    "lib/gitlab/database/load_balancing/load_balancer.rb:110:in `read_write'",
    "lib/gitlab/database/load_balancing/connection_proxy.rb:98:in `read_using_load_balancer'",
    "lib/gitlab/database/load_balancing/connection_proxy.rb:47:in `select_all'",
    "lib/gitlab/database/background_migration/batched_migration.rb:79:in `active_migration'",
    "app/workers/database/batched_background_migration/single_database_worker.rb:66:in `active_migration'",
    "app/workers/database/batched_background_migration/single_database_worker.rb:48:in `block in perform'",
    "app/workers/database/batched_background_migration/single_database_worker.rb:47:in `perform'",
    "lib/gitlab/sidekiq_middleware/duplicate_jobs/strategies/until_executing.rb:16:in `perform'",
    "lib/gitlab/sidekiq_middleware/duplicate_jobs/server.rb:8:in `call'",
    "lib/gitlab/application_context.rb:93:in `block in use'",
    "lib/gitlab/application_context.rb:93:in `use'",
    "lib/gitlab/sidekiq_middleware/worker_context/server.rb:17:in `block in call'",
    "lib/gitlab/application_context.rb:93:in `use'",
    "lib/gitlab/application_context.rb:44:in `with_context'",
    "lib/gitlab/sidekiq_middleware/worker_context/server.rb:15:in `call'",
    "lib/gitlab/sidekiq_status/server_middleware.rb:7:in `call'",
    "lib/gitlab/sidekiq_versioning/middleware.rb:9:in `call'",
    "lib/gitlab/sidekiq_middleware/query_analyzer.rb:7:in `block in call'",
    "lib/gitlab/database/query_analyzer.rb:46:in `within'",
    "lib/gitlab/sidekiq_middleware/query_analyzer.rb:7:in `call'",
    "lib/gitlab/sidekiq_middleware/admin_mode/server.rb:14:in `call'",
    "lib/gitlab/sidekiq_middleware/instrumentation_logger.rb:9:in `call'",
    "lib/gitlab/sidekiq_middleware/batch_loader.rb:7:in `call'",
    "lib/gitlab/sidekiq_middleware/extra_done_log_metadata.rb:7:in `call'",
    "lib/gitlab/sidekiq_middleware/request_store_middleware.rb:10:in `block in call'",
    "lib/gitlab/with_request_store.rb:17:in `enabling_request_store'",
    "lib/gitlab/with_request_store.rb:10:in `with_request_store'",
    "lib/gitlab/sidekiq_middleware/request_store_middleware.rb:9:in `call'",
    "lib/gitlab/sidekiq_middleware/server_metrics.rb:74:in `block in call'",
    "lib/gitlab/sidekiq_middleware/server_metrics.rb:97:in `block in instrument'",
    "lib/gitlab/metrics/background_transaction.rb:33:in `run'",
    "lib/gitlab/sidekiq_middleware/server_metrics.rb:97:in `instrument'",
    "lib/gitlab/sidekiq_middleware/server_metrics.rb:73:in `call'",
    "lib/gitlab/sidekiq_middleware/monitor.rb:10:in `block in call'",
    "lib/gitlab/sidekiq_daemon/monitor.rb:49:in `within_job'",
    "lib/gitlab/sidekiq_middleware/monitor.rb:9:in `call'",
    "lib/gitlab/sidekiq_middleware/size_limiter/server.rb:13:in `call'",
    "lib/gitlab/sidekiq_logging/structured_logger.rb:21:in `call'"
  ],
  "db_duration_s": 0.010023
}

I was able to successfully finalize one of the broken migrations using the command suggested in the error log, gitlab-rake gitlab:background_migrations:finalize[...], but I’m not sure whether it actually runs the migration manually or just force-sets its state to finished.

Should I finalize the rest of the stuck migrations using this command and reconfigure, or rather roll back to 14.0.12 and use a different upgrade path? If so, what path should I use?

Hey @DrCringe,

A bunch of people are running into this problem at the moment. I think this is the solution you are looking for. Hope it helps!

Cheers
Henrik

Thanks, @Th3Ph4nt0m

In my case manually running db migration with gitlab-rake db:migrate --trace again after finalizing background migration ProjectNamespaces::BackfillProjectNamespaces resolved the issue

Update / Solution:

I ran the db migration task again after finalizing the background migration ProjectNamespaces::BackfillProjectNamespaces that was causing the error. This time it finished successfully and triggered the rest of the stuck migrations. After all migrations were complete, I was able to reconfigure successfully.

To summarize, the issue was fixed with the following steps:

  • Finalize the migration that caused the error manually by running sudo gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]'], as suggested in the error log. After that, the ProjectNamespaces::BackfillProjectNamespaces migration should be finished.
  • Run the db migration task sudo gitlab-rake db:migrate. It should complete without further errors.
  • Wait for the rest of the background migrations to finish.
  • Reconfigure GitLab to finish the upgrade with sudo gitlab-ctl reconfigure.
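
The steps above can be sketched as a small script. This is a rough illustration, not an official tool: build_finalize_task and print_fix_plan are hypothetical helper names, and the script only prints the commands instead of running them, so each one can be reviewed and executed by hand. The one detail it automates is escaping commas inside the job arguments, since rake splits task arguments on commas.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of GitLab): build the rake task string for
# finalizing one batched background migration.
# Arguments: job class, table name, column name, JSON job arguments.
build_finalize_task() {
  local job_class="$1" table="$2" column="$3" job_args="$4"
  # Rake splits task arguments on commas, so every comma inside the JSON
  # argument list is escaped with a backslash.
  local escaped_args="${job_args//,/\\,}"
  printf 'gitlab:background_migrations:finalize[%s,%s,%s,%s]' \
    "$job_class" "$table" "$column" "$escaped_args"
}

# Print the command sequence from the summary without executing anything.
print_fix_plan() {
  local task
  task="$(build_finalize_task "$@")"
  printf "sudo gitlab-rake '%s'\n" "$task"
  printf 'sudo gitlab-rake db:migrate\n'
  printf '# wait for the remaining background migrations to finish\n'
  printf 'sudo gitlab-ctl reconfigure\n'
}

print_fix_plan 'ProjectNamespaces::BackfillProjectNamespaces' projects id '[null,"up"]'
```

The whole task string is wrapped in single quotes in the printed command, which passes the same argument to rake as the partially quoted form from the error log while keeping the shell from interpreting the brackets.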

Thank you, you saved me.

Thank you, you saved me time.

Thank you! What a gem of a thread.

How do I run these instructions in a containerized GitLab (14.3.6-ce)? Since the container exits after the error, there’s no opportunity to exec into it.

When I run
docker compose run gitlab /bin/bash
and
gitlab-rake gitlab:background_migrations:finalize[ProjectNamespaces::BackfillProjectNamespaces,projects,id,'[null\,"up"]']
it responds

/opt/gitlab/bin/gitlab-rake error: could not load /opt/gitlab/etc/gitlab-rails-rc
Either you are not allowed to read the file, or it does not exist yet.
You can generate it with:   sudo gitlab-ctl reconfigure

So I do that and get some output ending in

    * ruby_block[reload_log_service] action create
      * ruby_block[restart_service] action nothing (skipped due to action :nothing)
      * ruby_block[restart_log_service] action nothing (skipped due to action :nothing)
      * ruby_block[reload_log_service] action nothing (skipped due to action :nothing)
      * directory[/opt/gitlab/sv/logrotate] action create (up to date)
      * template[/opt/gitlab/sv/logrotate/run] action create (up to date)
      * directory[/opt/gitlab/sv/logrotate/log] action create (up to date)
      * directory[/opt/gitlab/sv/logrotate/log/main] action create (up to date)
      * template[/opt/gitlab/sv/logrotate/log/config] action create (up to date)
      * ruby_block[verify_chown_persisted_on_logrotate] action nothing (skipped due to action :nothing)
      * link[/var/log/gitlab/logrotate/config] action create (up to date)
      * template[/opt/gitlab/sv/logrotate/log/run] action create (up to date)
      * directory[/opt/gitlab/sv/logrotate/env] action create (up to date)
      * ruby_block[Delete unmanaged env files for logrotate service] action run (skipped due to only_if)
      * template[/opt/gitlab/sv/logrotate/check] action create (skipped due to only_if)
      * template[/opt/gitlab/sv/logrotate/finish] action create (skipped due to only_if)
      * directory[/opt/gitlab/sv/logrotate/control] action create (up to date)
      * template[/opt/gitlab/sv/logrotate/control/t] action create (up to date)
      * link[/opt/gitlab/init/logrotate] action create (up to date)
      * file[/opt/gitlab/sv/logrotate/down] action nothing (skipped due to action :nothing)
      * directory[/opt/gitlab/service] action create (up to date)
      * link[/opt/gitlab/service/logrotate] action create
        - create symlink at /opt/gitlab/service/logrotate to /opt/gitlab/sv/logrotate
      * ruby_block[wait for logrotate service socket] action run

It’s now stuck.