Upgrade 16.7.5 -> 16.9.11 fails: Expected to find a valid foreign key between p_ci_builds and ci_stages

Problem to solve

Upgrading self-managed GitLab 16.7.10 with Postgres 14 to 16.9.11 (or 16.11.10) fails with:

StandardError: An error has occurred, all later migrations canceled:

Expected to find a valid foreign key between p_ci_builds and ci_stages
/opt/gitlab/embedded/service/gitlab-rails/db/post_migrate/20240205120110_add_synchronous_fk_validation_from_p_ci_builds_partitions_to_ci_stages.rb:25:in `up'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb:33:in `block in exec_migration'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/query_analyzer.rb:40:in `within'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb:30:in `exec_migration'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers/automatic_lock_writes_on_tables.rb:21:in `exec_migration'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/lock_retry_mixin.rb:36:in `ddl_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/runner_backoff/active_record_mixin.rb:21:in `execute_migration_in_transaction'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:15:in `block in with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/pg_backend_pid.rb:12:in `with_advisory_lock_connection'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:138:in `configure_database'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:107:in `configure_pg_databases'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:94:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'

I’ve found the MR containing the migration that seems to be failing: "Add FK from p_ci_builds to ci_stages & sync validations" (!143811) in GitLab.org / GitLab.

Unfortunately, it says nothing about how to fix a missing or invalid FK.
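
To see what the migration is complaining about, it may help to list the foreign keys that reference ci_stages and whether PostgreSQL marks them as validated. The following is only a sketch: the function name list_foreign_keys_to is made up for illustration, and it assumes a DB-API cursor that uses the %s placeholder style (as psycopg2 does) connected to the GitLab database. The pg_constraint catalog and its convalidated column are standard PostgreSQL.

```python
# Sketch: list foreign keys pointing at a given table, with their validation state.
# Assumption: `cursor` is a DB-API cursor on the GitLab PostgreSQL database
# whose driver uses the %s parameter style (for example psycopg2).

FK_STATUS_SQL = """
SELECT conname,
       conrelid::regclass::text AS child_table,
       convalidated
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = %s::regclass
ORDER BY child_table, conname
"""


def list_foreign_keys_to(cursor, referenced_table):
    """Return one dict per foreign key that references `referenced_table`."""
    cursor.execute(FK_STATUS_SQL, (referenced_table,))
    return [
        {"name": name, "child_table": child, "validated": validated}
        for name, child, validated in cursor.fetchall()
    ]
```

Calling list_foreign_keys_to(cursor, "ci_stages") and finding no row for p_ci_builds (or its partitions), or a row with validated set to false, would be consistent with the error above. The same SELECT can also be pasted into a psql session with the table name filled in.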

Configuration

  • Self-managed 16.3.5 > 16.3.9 > 16.7.10 > 16.9.11

16.9.11 is now installed but doesn’t function properly; even authorization is broken.

System:         Ubuntu 22.04
Proxy:          no_proxy: localhost
Current User:   git
Using RVM:      no
Ruby Version:   3.1.4p223
Gem Version:    3.5.5
Bundler Version:2.5.5
Rake Version:   13.0.6
Redis Version:  7.0.15
Sidekiq Version:7.1.6
Go Version:     unknown
Version:        16.9.11
DB Adapter:     PostgreSQL
DB Version:     14.11
Elasticsearch:  no
Geo:            no
GitLab Shell Version:        14.33.0
Gitaly default Version:      16.9.11
- default Git Version:  2.43.0

Hello @qrkot, have you managed to work around or otherwise solve this issue?
We have exactly the same issue on our upgrade path: Omnibus, self-managed, from 16.7.10 => 16.11.10.

So far we think it may be a data inconsistency that became critical during the update. Since we use the Docker image, GitLab is not working at all and the Docker container keeps restarting.

We found relevant errors in the logs, including a foreign key error of the same kind:

ERROR:  insert or update on table "ci_builds" violates foreign key constraint "fk_d3130c9a7f"
Key (commit_id)=(62717) is not present in table "ci_pipelines".
STATEMENT:  ALTER TABLE ONLY public.ci_builds
ADD CONSTRAINT fk_d3130c9a7f FOREIGN KEY (commit_id) REFERENCES public.ci_pipelines(id) ON DELETE CASCADE;

@qrkot We have found that our issue is related to a GitLab DB inconsistency. Using the error we were able to catch, we figured out that one entry
has a commit_id that is present in the table public.ci_builds but missing from the table ci_pipelines.
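
The check described above amounts to an anti-join: find ci_builds rows whose commit_id has no matching ci_pipelines.id. Here is a minimal sketch of that query, run against a toy in-memory SQLite schema so it is self-contained. The two tables contain only the columns named in the error message, the sample rows are invented except for commit_id 62717, and the helper names are hypothetical. On a real instance the same SELECT would be run against the GitLab PostgreSQL database instead.

```python
import sqlite3

# Anti-join: builds whose commit_id points at a pipeline that does not exist.
ORPHAN_BUILDS_SQL = """
SELECT b.id, b.commit_id
FROM ci_builds AS b
LEFT JOIN ci_pipelines AS p ON p.id = b.commit_id
WHERE b.commit_id IS NOT NULL
  AND p.id IS NULL
ORDER BY b.id
"""


def find_orphan_builds(conn):
    """Return (build id, commit_id) pairs that would violate the FK."""
    return conn.execute(ORPHAN_BUILDS_SQL).fetchall()


def demo():
    # Toy schema (assumption): only the columns mentioned in the error message.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ci_pipelines (id INTEGER PRIMARY KEY);
        CREATE TABLE ci_builds (id INTEGER PRIMARY KEY, commit_id INTEGER);
        INSERT INTO ci_pipelines (id) VALUES (1), (2);
        INSERT INTO ci_builds (id, commit_id) VALUES (10, 1), (11, 2), (12, 62717);
        """
    )
    return find_orphan_builds(conn)


if __name__ == "__main__":
    print(demo())  # [(12, 62717)]
```

Any row returned by this query is one that prevents the constraint fk_d3130c9a7f from being created. How to deal with such rows depends on the instance, so take a backup before changing anything.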

Have you ever restored your DB from a backup? In our case the backup was affected by this data inconsistency, and once we restored the data, the FK fk_d3130c9a7f was not created as it should have been.

This FK depends directly on the missing commit_id:

"fk_d3130c9a7f" FOREIGN KEY (commit_id) REFERENCES ci_pipelines(id) ON DELETE CASCADE