Problem with background migration

Hello everybody,

GitLab 14.2.7-ce (Docker)
PostgreSQL 12
I can try risky things on a test environment; none of this has been done on the production environment of our self-hosted GitLab.

I have a problem on our self-hosted GitLab instance. I tried various things (updating the database in psql, …) before asking for help here.

Last year I did something stupid: I stopped a background migration task, CopyColumnUsingBackgroundMigrationJob: ci_builds.

I did not notice anything at the time. But now I want to upgrade to 14.3.0-ce (Docker), and I get:

Summary
web_1  | Running handlers:
web_1  | There was an error running gitlab-ctl reconfigure:
web_1  | 
web_1  | Multiple failures occurred:
web_1  | * Mixlib::ShellOut::ShellCommandFailed occurred in Chef Infra Client run: rails_migration[gitlab-rails] (gitlab::database_migrations line 51) had an error: Mixlib::ShellOut::ShellCommandFailed: bash[migrate gitlab-rails database] (/opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/resources/rails_migration.rb line 16) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
web_1  | ---- Begin output of "bash"  "/tmp/chef-script20220809-28-skn5qu" ----
web_1  | STDOUT: rake aborted!
web_1  | StandardError: An error has occurred, all later migrations canceled:
web_1  | 
web_1  | Expected batched background migration for the given configuration to be marked as 'finished', but it is 'failed':	{:job_class_name=>"CopyColumnUsingBackgroundMigrationJob", :table_name=>"ci_builds", :column_name=>"id", :job_arguments=>[["id", "stage_id"], ["id_convert_to_bigint", "stage_id_convert_to_bigint"]]}
web_1  | 
web_1  | Finalize it manualy by running
web_1  | 
web_1  | 	sudo gitlab-rake gitlab:background_migrations:finalize[CopyColumnUsingBackgroundMigrationJob,ci_builds,id,'[["id"\, "stage_id"]\, ["id_convert_to_bigint"\, "stage_id_convert_to_bigint"]]']
web_1  | 
web_1  | For more information, check the documentation
web_1  | 
web_1  | 	https://docs.gitlab.com/ee/user/admin_area/monitoring/background_migrations.html#database-migrations-failing-because-of-batched-background-migration-not-finished
web_1  | /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers.rb:1109:in `ensure_batched_background_migration_is_finished'
web_1  | /opt/gitlab/embedded/service/gitlab-rails/db/post_migrate/20210701141346_finalize_ci_builds_stage_id_bigint_conversion.rb:11:in `up'
web_1  | /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:31:in `ddl_transaction'
web_1  | /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:61:in `block (3 levels) in <top (required)>'
web_1  | /opt/gitlab/embedded/bin/bundle:23:in `load'
web_1  | /opt/gitlab/embedded/bin/bundle:23:in `<main>'
web_1  | 
web_1  | Caused by:
web_1  | Expected batched background migration for the given configuration to be marked as 'finished', but it is 'failed':	{:job_class_name=>"CopyColumnUsingBackgroundMigrationJob", :table_name=>"ci_builds", :column_name=>"id", :job_arguments=>[["id", "stage_id"], ["id_convert_to_bigint", "stage_id_convert_to_bigint"]]}
web_1  | 
web_1  | Finalize it manualy by running
web_1  | 
web_1  | 	sudo gitlab-rake gitlab:background_migrations:finalize[CopyColumnUsingBackgroundMigrationJob,ci_builds,id,'[["id"\, "stage_id"]\, ["id_convert_to_bigint"\, "stage_id_convert_to_bigint"]]']
web_1  | 
web_1  | For more information, check the documentation
web_1  | 
web_1  | 	https://docs.gitlab.com/ee/user/admin_area/monitoring/background_migrations.html#database-migrations-failing-because-of-batched-background-migration-not-finished
web_1  | /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers.rb:1109:in `ensure_batched_background_migration_is_finished'
web_1  | /opt/gitlab/embedded/service/gitlab-rails/db/post_migrate/20210701141346_finalize_ci_builds_stage_id_bigint_conversion.rb:11:in `up'
web_1  | /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:31:in `ddl_transaction'
web_1  | /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:61:in `block (3 levels) in <top (required)>'
web_1  | /opt/gitlab/embedded/bin/bundle:23:in `load'
web_1  | /opt/gitlab/embedded/bin/bundle:23:in `<main>'
web_1  | Tasks: TOP => db:migrate
web_1  | (See full trace by running task with --trace)
web_1  | == 20210622045705 FinalizeEventsBigintConversion: migrating ===================
web_1  | -- transaction_open?()
web_1  |    -> 0.0000s
web_1  | -- index_exists?("events", :id_convert_to_bigint, {:unique=>true, :name=>"index_events_on_id_convert_to_bigint", :algorithm=>:concurrently})
web_1  |    -> 0.0066s
web_1  | -- execute("SET statement_timeout TO 0")
web_1  |    -> 0.0002s
web_1  | -- add_index("events", :id_convert_to_bigint, {:unique=>true, :name=>"index_events_on_id_convert_to_bigint", :algorithm=>:concurrently})
web_1  |    -> 0.0823s
web_1  | -- execute("RESET statement_timeout")
web_1  |    -> 0.0003s
web_1  | -- transaction_open?()
web_1  |    -> 0.0000s
web_1  | -- index_exists?("events", [:project_id, :id_convert_to_bigint], {:name=>"index_events_on_project_id_and_id_convert_to_bigint", :algorithm=>:concurrently})
web_1  |    -> 0.0049s
web_1  | -- execute("SET statement_timeout TO 0")
web_1  |    -> 0.0002s
web_1  | -- add_index("events", [:project_id, :id_convert_to_bigint], {:name=>"index_events_on_project_id_and_id_convert_to_bigint", :algorithm=>:concurrently})
web_1  |    -> 0.0722s
web_1  | -- execute("RESET statement_timeout")
web_1  |    -> 0.0002s
web_1  | -- transaction_open?()
web_1  |    -> 0.0000s
web_1  | -- index_exists?("events", [:project_id, :id_convert_to_bigint], {:order=>{:id_convert_to_bigint=>:desc}, :where=>"action = 7", :name=>"index_events_on_project_id_and_id_bigint_desc_on_merged_action", :algorithm=>:concurrently})
web_1  |    -> 0.0056s
web_1  | -- execute("SET statement_timeout TO 0")
web_1  |    -> 0.0002s
web_1  | -- add_index("events", [:project_id, :id_convert_to_bigint], {:order=>{:id_convert_to_bigint=>:desc}, :where=>"action = 7", :name=>"index_events_on_project_id_and_id_bigint_desc_on_merged_action", :algorithm=>:concurrently})
web_1  |    -> 0.0250s
web_1  | -- execute("RESET statement_timeout")
web_1  |    -> 0.0003s
web_1  | -- transaction_open?()
web_1  |    -> 0.0000s
web_1  | -- foreign_keys(:push_event_payloads)
web_1  |    -> 0.0079s
web_1  | -- execute("LOCK TABLE events, push_event_payloads IN SHARE ROW EXCLUSIVE MODE")
web_1  |    -> 0.0007s
web_1  | -- execute("ALTER TABLE push_event_payloads\nADD CONSTRAINT fk_36c74129da_tmp\nFOREIGN KEY (event_id)\nREFERENCES events (id_convert_to_bigint)\nON DELETE CASCADE\nNOT VALID;\n")
web_1  |    -> 0.0056s
web_1  | -- execute("SET statement_timeout TO 0")
web_1  |    -> 0.0002s
web_1  | -- execute("ALTER TABLE push_event_payloads VALIDATE CONSTRAINT fk_36c74129da_tmp;")
web_1  |    -> 0.0529s
web_1  | -- execute("RESET statement_timeout")
web_1  |    -> 0.0002s
web_1  | -- execute("LOCK TABLE events, push_event_payloads IN ACCESS EXCLUSIVE MODE")
web_1  |    -> 0.0002s
web_1  | -- quote_table_name("events")
web_1  |    -> 0.0000s
web_1  | -- quote_column_name(:id)
web_1  |    -> 0.0000s
web_1  | -- quote_column_name("id_tmp")
web_1  |    -> 0.0000s
web_1  | -- execute("ALTER TABLE \"events\" RENAME COLUMN \"id\" TO \"id_tmp\"")
web_1  |    -> 0.0003s
web_1  | -- quote_table_name("events")
web_1  |    -> 0.0000s
web_1  | -- quote_column_name(:id_convert_to_bigint)
web_1  |    -> 0.0000s
web_1  | -- quote_column_name(:id)
web_1  |    -> 0.0000s
web_1  | -- execute("ALTER TABLE \"events\" RENAME COLUMN \"id_convert_to_bigint\" TO \"id\"")
web_1  |    -> 0.0003s
web_1  | -- quote_table_name("events")
web_1  |    -> 0.0000s
web_1  | -- quote_column_name("id_tmp")
web_1  |    -> 0.0000s
web_1  | -- quote_column_name(:id_convert_to_bigint)
web_1  |    -> 0.0000s
web_1  | -- execute("ALTER TABLE \"events\" RENAME COLUMN \"id_tmp\" TO \"id_convert_to_bigint\"")
web_1  |    -> 0.0002s
web_1  | -- quote_table_name("trigger_69523443cc10")
web_1  |    -> 0.0000s
web_1  | -- execute("ALTER FUNCTION \"trigger_69523443cc10\" RESET ALL")
web_1  |    -> 0.0007s
web_1  | -- execute("ALTER SEQUENCE events_id_seq OWNED BY events.id")
web_1  |    -> 0.0014s
web_1  | -- change_column_default("events", :id, #<Proc:0x00007f418465b608 /opt/gitlab/embedded/service/gitlab-rails/db/post_migrate/20210622045705_finalize_events_bigint_conversion.rb:68 (lambda)>)
web_1  |    -> 0.0067s
web_1  | -- change_column_default("events", :id_convert_to_bigint, 0)
web_1  |    -> 0.0032s
web_1  | -- execute("ALTER TABLE events DROP CONSTRAINT events_pkey CASCADE")
web_1  |    -> 0.0043s
web_1  | -- rename_index("events", "index_events_on_id_convert_to_bigint", "events_pkey")
web_1  |    -> 0.0013s
web_1  | -- execute("ALTER TABLE events ADD CONSTRAINT events_pkey PRIMARY KEY USING INDEX events_pkey")
web_1  |    -> 0.0005s
web_1  | -- execute("DROP INDEX index_events_on_project_id_and_id")
web_1  |    -> 0.0007s
web_1  | -- rename_index("events", "index_events_on_project_id_and_id_convert_to_bigint", "index_events_on_project_id_and_id")
web_1  |    -> 0.0003s
web_1  | -- execute("DROP INDEX index_events_on_project_id_and_id_desc_on_merged_action")
web_1  |    -> 0.0003s
web_1  | -- rename_index("events", "index_events_on_project_id_and_id_bigint_desc_on_merged_action", "index_events_on_project_id_and_id_desc_on_merged_action")
web_1  |    -> 0.0003s
web_1  | -- quote_table_name(:push_event_payloads)
web_1  |    -> 0.0000s
web_1  | -- quote_column_name("fk_36c74129da_tmp")
web_1  |    -> 0.0000s
web_1  | -- quote_column_name("fk_36c74129da")
web_1  |    -> 0.0000s
web_1  | -- execute("ALTER TABLE \"push_event_payloads\"\nRENAME CONSTRAINT \"fk_36c74129da_tmp\" TO \"fk_36c74129da\"\n")
web_1  |    -> 0.0009s
web_1  | == 20210622045705 FinalizeEventsBigintConversion: migrated (0.3678s) ==========

Does anyone know how to reset the database to its state before that migration, or how to restart the migration without it failing? Or is there a way to patch things so I can get to 14.3.0?

I have tried these commands, but without success:

Summary
gitlab-rake db:migrate
gitlab-psql

This table seems OK (no _convert_to_bigint column):

	\d ci_sources_pipelines
	

The bigint column is present, but identical to id:

	SELECT id, id_convert_to_bigint FROM events WHERE id != id_convert_to_bigint;
	ALTER TABLE events DROP COLUMN id_convert_to_bigint;
	

The bigint column is present, but identical to event_id:

	SELECT event_id, event_id_convert_to_bigint FROM public.push_event_payloads WHERE event_id != event_id_convert_to_bigint;
	ALTER TABLE public.push_event_payloads DROP COLUMN event_id_convert_to_bigint;
	

The bigint columns are present, but identical to id and job_id:

	SELECT id, id_convert_to_bigint FROM ci_job_artifacts WHERE id != id_convert_to_bigint;
	SELECT job_id, job_id_convert_to_bigint FROM ci_job_artifacts WHERE job_id != job_id_convert_to_bigint;
	ALTER TABLE ci_job_artifacts DROP COLUMN job_id_convert_to_bigint;
	ALTER TABLE ci_job_artifacts DROP COLUMN id_convert_to_bigint;
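
Rather than checking the tables one by one as above, a single read-only query could list every column that still carries the conversion suffix. This is only a minimal sketch: it relies on PostgreSQL's standard information_schema.columns view, and it assumes gitlab-psql forwards psql's -c option. When gitlab-psql is not on the PATH, the script simply prints the query.

```shell
#!/bin/sh
# List every column in the public schema whose name ends in "convert_to_bigint".
# Read-only: nothing is altered or dropped.
SQL="SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'public'
  AND column_name LIKE '%convert_to_bigint'
ORDER BY table_name, column_name;"

if command -v gitlab-psql >/dev/null 2>&1; then
  # Inside the GitLab container: run the query directly.
  gitlab-psql -c "$SQL"
else
  # Elsewhere: print the query so it can be pasted into gitlab-psql.
  printf '%s\n' "$SQL"
fi
```

Running it before and after any ALTER TABLE would show exactly which conversion columns are left.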
	
	

gitlab-rake db:migrate

	
	

gitlabhq_production=# SELECT id, status FROM batched_background_migrations;
id | status
----+--------
4 | 3
5 | 3
7 | 3
6 | 3
10 | 3
13 | 3
2 | 3
3 | 3
9 | 3
11 | 3
12 | 3
14 | 3
15 | 3
16 | 3
8 | 4
(15 rows)

gitlabhq_production=# SELECT * FROM batched_background_migrations WHERE id=8;
id | created_at | updated_at | min_value | max_value | batch_size | sub_batch_size | interval | status | job_class_name | batch_class_name
| table_name | column_name | job_arguments | total_tuple_count | pause_ms
----+-------------------------------+-------------------------------+-----------+-----------+------------+----------------+----------+--------+---------------------------------------+----------------------------
+------------+-------------+------------------------------------------------------------------------------+-------------------+----------
8 | 2021-06-29 12:18:27.059852+00 | 2022-08-09 14:22:46.464668+00 | 1 | 40541 | 20000 | 250 | 120 | 4 | CopyColumnUsingBackgroundMigrationJob | PrimaryKeyBatchingStrategy
| ci_builds | id | [["id", "stage_id"], ["id_convert_to_bigint", "stage_id_convert_to_bigint"]] | 39960 | 100
(1 row)

Should I try to mark the job as finished by hand? (I'm lost.)

gitlabhq_production=# UPDATE batched_background_migrations SET status=3 WHERE id=8;
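
Before changing the status by hand, it may be worth listing only the migrations that are not finished. This is a sketch under one loud assumption: that status 3 means "finished" and 4 means "failed". That mapping is inferred from the error message (which reports the ci_builds migration as 'failed', and row 8 is the only one with status 4), not checked against the GitLab source. The column names are the ones shown in the SELECT * output above.

```shell
#!/bin/sh
# Show batched background migrations whose status is not 3.
# ASSUMPTION: 3 = finished, 4 = failed (inferred from the error message).
SQL="SELECT id, job_class_name, table_name, column_name, status
FROM batched_background_migrations
WHERE status <> 3
ORDER BY id;"

if command -v gitlab-psql >/dev/null 2>&1; then
  # Inside the GitLab container: run the query directly.
  gitlab-psql -c "$SQL"
else
  # Elsewhere: print the query so it can be pasted into gitlab-psql.
  printf '%s\n' "$SQL"
fi
```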

Thanks in advance,
Sorry for my poor English
Have a good day,
Cordially
Thomas S.

Hello,

Does anyone have an idea, please?
Perhaps I should post it as an issue on GitLab.com?

Have a good day
Thomas S.

Hello,

Does anyone have an idea, please? Do you need more information?

Have a good day
Thomas S.

Hi,

The docs cover this and, as it happens, use an example in a similar format to the migration you mention: Batched background migrations | GitLab

In particular, the error message in the summary of your first post already says what you need to run:

sudo gitlab-rake gitlab:background_migrations:finalize[CopyColumnUsingBackgroundMigrationJob,ci_builds,id,'[["id"\, "stage_id"]\, ["id_convert_to_bigint"\, "stage_id_convert_to_bigint"]]']

(as quoted from your own output).

If that command doesn’t work, you need to show the full output of what happens when you run it.
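
Since you run GitLab in Docker, the task has to be run inside the container. Below is a minimal sketch, not a tested recipe. It assumes the Compose service is called web (guessed from the web_1 prefix in your log) and that you use docker-compose; adjust both to your setup. The task string is the one from your error message, single-quoted as a whole so that the brackets, double quotes and backslash-escaped commas reach rake unchanged. By default the script only prints the command; set RUN=1 to execute it.

```shell
#!/bin/sh
# ASSUMPTION: the Compose service is named "web" (inferred from the "web_1" log prefix).
SERVICE="${SERVICE:-web}"

# Rake task and arguments copied from the error message. The "\," sequences
# keep rake from splitting the JSON array at its commas.
TASK='gitlab:background_migrations:finalize[CopyColumnUsingBackgroundMigrationJob,ci_builds,id,[["id"\, "stage_id"]\, ["id_convert_to_bigint"\, "stage_id_convert_to_bigint"]]]'

if [ "${RUN:-0}" = "1" ]; then
  # Execute the task inside the running GitLab container.
  docker-compose exec "$SERVICE" gitlab-rake "$TASK"
else
  # Dry run: print a copy-pasteable command instead of executing it.
  printf '%s\n' "docker-compose exec $SERVICE gitlab-rake '$TASK'"
fi
```

Try it on your test environment first, and keep the full output in case it fails.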

Hello :slight_smile:

Yes, that command failed too.

I have posted the logs at: Output from Gitlab after passing from 14.2.7 -> 14.3.0 ($2402455) · Snippets · GitLab

When I started GitLab in Docker as gitlab/gitlab-ce:14.3.0-ce.0, I got the first log (log_14.2.7_to_14.3.0.txt).

The second log (log_14.2.7_with_finalize_background_migration.txt) comes from running the finalize command for the background migration on gitlab/gitlab-ce:14.2.7-ce.0, because 14.3.0 doesn't start.

Thanks in advance :smiley:
Have a good day
Cordially
Thomas S.