Can I recover from this upgrade path?

We have two simple self-managed, air-gapped GitLab servers: a test server and one for actual use.

  • Omnibus, using all the embedded components
  • update method is to bring over the RPMs and apply them per the online documentation for this environment
  • EE edition, Premium-level support
  • RHEL 8.10
  • GitLab and the OS are required to be updated for vulnerabilities at least every 30 days

The SA, in response to patching requests for the past 3 months, followed this upgrade path on both test and for-use servers:
16.11.10 => 17.3.5 => 17.4.2

He followed the online upgrade helper application, which he says recommended the 17.3.5 to 17.4.2 step in October.
Though he skipped what is now the latest .z (patch) version at each of the 17.x steps, both instances were working.

This month, he continued the path thusly for the test server:
16.11.10 => 17.3.5 => 17.4.2 => 17.5.2
Yes, he did skip the 17.4.4.

Though he said he got no errors from the installer, the GitLab instance failed to start.

I’d like some suggestions, ideas, advice, speculation, or experience :slightly_smiling_face: before I start rebuilding the test environment.

Is it more likely that skipping the zed version of 17.3 (17.3.7) or skipping the zed version of 17.4 (17.4.4) caused a problem or omission in the db tables, which then caused the 17.5.2 install to fail to start? Or could the cause be that cryptic “Potential PostgreSQL index corruption when upgrading OS - glibc locale data compatibility” warning posted on the upgrade path calculator, given that our monthly kernel updates usually update the glibc libraries as well?

The SA’s attempted rollback makes it impossible to determine much from the test system. I would appreciate any input that will allow me to save some time and determine how to avoid a problem with the “for use” system, which is working at 17.4.2. Thank you!

The current upgrade path suggests:

16.11.0 → 17.5.2 → latest

I would say if it isn’t working it’s most likely that they didn’t wait for background migrations to finish before starting the next upgrade on the upgrade path.
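For what it's worth, here is a rough sketch of how one could check for unfinished batched background migrations before starting the next hop. It assumes an Omnibus install where `gitlab-psql` is on the path and run with sufficient privileges, and that status codes 3 and 6 mean finished and finalized, as in the query given in the GitLab upgrade documentation; please verify both against the docs for your version. The helper names (`parse_rows`, `pending_migrations`, `safe_to_upgrade`) are my own and purely illustrative.

```python
import subprocess
from typing import List

# Query adapted from the GitLab upgrade documentation. The status codes
# (3 = finished, 6 = finalized) are an assumption to verify for your version.
PENDING_SQL = (
    "SELECT job_class_name, table_name, column_name "
    "FROM batched_background_migrations WHERE status NOT IN (3, 6);"
)


def parse_rows(psql_output: str) -> List[List[str]]:
    """Parse unaligned, tuples-only psql output: one row per line, '|' separated."""
    rows = []
    for line in psql_output.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append(line.split("|"))
    return rows


def pending_migrations() -> List[List[str]]:
    """Run the query through gitlab-psql (Omnibus only; typically needs root)."""
    result = subprocess.run(
        ["gitlab-psql", "-At", "-c", PENDING_SQL],  # -A unaligned, -t tuples only
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_rows(result.stdout)


def safe_to_upgrade(rows: List[List[str]]) -> bool:
    """Only proceed to the next hop when no migrations are still pending."""
    return len(rows) == 0
```

On the server you would call `safe_to_upgrade(pending_migrations())` after each hop and hold off on the next RPM until it returns `True`. An empty result on a broken instance does not prove migrations were never skipped, so treat this as a pre-upgrade check rather than a post-mortem tool.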

Skipping a patch release (where only the third part of the version number changes) never causes problems. I don’t remember if the documentation says so explicitly, but both the documentation and the advice given here do so implicitly by only mentioning one patch release for each minor version mentioned. So it’s not at all likely that skipping either 17.3.7 or 17.4.4 (if you were on - or going to - those minor versions) would cause any problems.
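To make the "third part of the version number" distinction concrete, here is a small sketch that labels each hop in a path as major, minor, or patch. It only compares version numbers; it knows nothing about GitLab's required upgrade stops, and the function names are just for illustration.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split 'X.Y.Z' into integers so 17.10.0 sorts after 17.9.0."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


def hop_kind(current: str, target: str) -> str:
    """Classify an upgrade hop by the first version component that changes."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        return "not an upgrade"
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    return "patch"


# The path from the original post.
path = ["16.11.10", "17.3.5", "17.4.2", "17.5.2"]
for current, target in zip(path, path[1:]):
    print(current, "=>", target, ":", hop_kind(current, target))
```

Going from 17.4.2 to 17.4.4 would come out as a patch hop, which is the kind that is fine to skip; every hop actually taken in the path above is a major or minor one.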

You haven’t given us much to go on to determine what is wrong with your instance, so I would have to guess the same reason as @iwalker, i.e. that they didn’t wait for background migrations.

And to address the subject: Of course you can recover from it, but it’s probably going to be slower (and in the text you mention “save some time”) than starting from a backup - and you won’t be able to get much help.

Many thanks! I would love to think that it was that easy, but we didn’t have any background migrations on this. We also do not have a runner instance associated with it.

Thanks! I am sorry not to be able to provide more info; I know it would help. If I come up with a firm reason after rebuilding from backup, I’ll post a follow-up.