Can't upgrade 17.3.x (db migration fails, duplicate keys)

I’m currently running a self-hosted GitLab 17.3.7 (initially 17.3.3) Omnibus install on AlmaLinux 9.4.

Right now I can’t upgrade to anything higher than that, because some of the new DB migrations fail during the upgrade. So far I have tried from 17.3.3, 17.3.5 and 17.3.7; it always happens.

The basic error is a “duplicate key” violation, i.e. apparently a new unique database key already exists(?).

The relevant excerpt of the error message:


[…]
Caused by:
PG::UniqueViolation: ERROR: could not create unique index "idx_sbom_components_on_name_purl_type_component_type_and_org_id"
DETAIL: Key (name, purl_type, component_type, organization_id)=(camelcase, 6, 0, 1) is duplicated.
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers.rb:122:in `block in add_concurrent_index'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migrations/timeout_helpers.rb:31:in `disable_statement_timeout'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers.rb:121:in `add_concurrent_index'
/opt/gitlab/embedded/service/gitlab-rails/db/migrate/20240730203748_replace_sbom_components_unique_index.rb:11:in `up'
[…]


This is the .rb file, i.e. what the migration is trying to do:
https://gitlab-org.gitlab.io/gitlab/coverage-ruby/#930731778137cf20ccf50cbeb9ad0a0affbf8cb7

The only documentation on this that I have found so far is a general hint that this might happen when upgrading from 17.3 (GitLab 17 changes | GitLab), but the issue documented there concerns a different migration and database key, and the solution presented there doesn’t help with my specific migration and key.

So, why is this happening?
I haven’t found any reports of this specific migration being problematic, so what might make my installation special here?

And most importantly: how exactly can I fix this?
Is there a more general way to solve this (better than fiddling with specific keys)?

OK, FYI: we were able to fix this.
In the end it wasn’t an upgrade issue as such, but a DB migration problem, or rather some faulty(?) database entries.

The specific problem was an invalid index:

gitlabhq_production=# SELECT indexrelname, schemaname, relname, indisvalid FROM pg_stat_all_indexes allind JOIN pg_index ind ON ind.indexrelid = allind.indexrelid AND indisvalid = 'f';
                          indexrelname                           | schemaname |     relname     | indisvalid
-----------------------------------------------------------------+------------+-----------------+------------
 idx_sbom_components_on_name_purl_type_component_type_and_org_id | public     | sbom_components | f

And the basic reason for that was about a dozen duplicate entries in the sbom_components table, rows that should have been UNIQUE for the index. For example:

gitlabhq_production=# SELECT * FROM sbom_components WHERE name = 'is-typed-array';
   id   |          created_at           |          updated_at           | component_type |      name      | purl_type | organization_id 
--------+-------------------------------+-------------------------------+----------------+----------------+-----------+-----------------
 464053 | 2023-11-21 12:31:43.818607+00 | 2024-05-06 08:17:52.938053+00 |              0 | is-typed-array |         6 |               1
 583544 | 2024-06-27 12:51:37.722351+00 | 2024-08-14 06:57:15.374549+00 |              0 | is-typed-array |         6 |               1
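
For completeness, a query along these lines (grouping by the same columns the unique index covers, taken from the output above) should list every duplicate combination at once:

-- List all (name, purl_type, component_type, organization_id) combinations
-- that occur more than once and therefore block the unique index.
SELECT name, purl_type, component_type, organization_id, COUNT(*) AS dups
FROM sbom_components
GROUP BY name, purl_type, component_type, organization_id
HAVING COUNT(*) > 1;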

I am not sure how this could have happened.
Our DB guys said they had never seen an invalid index before, and that “this had better never happen when using the CONCURRENTLY strategy”. (For what it’s worth, the PostgreSQL docs do note that a failed CREATE INDEX CONCURRENTLY, e.g. one that hits a uniqueness violation, leaves behind an index marked as invalid, which would match what we saw here.)

My solution was to DROP the invalid index and then DELETE the lower-ID rows of the duplicate entries (rough sketch below).
One probably should never do this directly in the database, but it was our only option to move forward.
There were references from other tables to these entries, so maybe I have broken SBOM now. :man_shrugging:
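
For reference, a rough sketch of the kind of statements we ran; nothing here should be copied blindly, take a database backup first, and rows in other tables that reference the deleted IDs may need handling as well:

-- Drop the invalid index that the failed migration left behind.
DROP INDEX IF EXISTS idx_sbom_components_on_name_purl_type_component_type_and_org_id;

-- Remove the older duplicates, keeping only the newest row (highest id)
-- per (name, purl_type, component_type, organization_id) combination.
DELETE FROM sbom_components
WHERE id NOT IN (
    SELECT MAX(id)
    FROM sbom_components
    GROUP BY name, purl_type, component_type, organization_id
);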

Even though this was our solution, in general it’s probably better to “not try this at home”…