Startup loop after 14.1.4 -> 14.2.0 upgrade (debian, omnibus)

Yes, yet another one of these posts.

I am using gitlab-ce on Debian buster via the GitLab omnibus repository. An apt upgrade took my installation from 14.0.x to the latest version (14.2.3), after which the WebUI only returned HTTP 502. I rolled back and manually installed all incremental upgrades, which narrowed the problem down to the upgrade 14.1.4 → 14.2.0.

After the upgrade I’ve got the following environment:

 Ruby:         ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
 GitLab:       14.2.0 (d678b7c987f) FOSS
 GitLab Shell: 13.19.1
 PostgreSQL:   13.3

(Postgres 12.x was manually updated in this test, 12.x showed the same error)

gitlab-ctl reconfigure, gitlab-ctl upgrade and gitlab-rake db:migrate all finish without errors, and gitlab-ctl status doesn’t show any errors either. gitlab-rake db:migrate:status confirms all migrations to be “up”. As said, the WebUI responds but is stuck at 502. I see puma and sidekiq both running at 100% CPU with changing PIDs, so they are probably crashing and respawning. I tail -f’d all *log files in /var/log/gitlab and only got some startup messages, but no errors.

Looking for background tasks using gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining' (or any other gitlab-rails-command like gitlab-rails console) yields:
[...]file /opt/gitlab/embedded/service/gitlab-rails/lib/migrate.rb to define constant Migrate, but didn't (Zeitwerk::NameError).

I’m not quite sure where to go from here. As far as I can tell all migrations should be completed, so why exactly is rake crashing at the migration stage? Any ideas or tips on where I can look next?

Thanks
Florian


We resolved a similar issue by going 14.1.5 → 14.2.0. Is there a reason you went from 14.1.4 straight to 14.2 when you were trying incremental upgrades?

As I’m having similar issues trying to upgrade 14.1.* → 14.2.* (I’ve tried all versions), I can confirm that 14.1.5 → 14.2.3 still leads to this last migration error, which prevents GitLab from being usable.
So I’m currently stuck at 14.1.5, as I haven’t been able to find a workaround anywhere.
I also faced other migration issues, which are indeed resolved in the latest 14.2 release.

==> /var/log/gitlab/puma/current <==
2021-09-13_14:57:55.66244 {"timestamp":"2021-09-13T14:57:55.662Z","pid":487464,"message":"! Unable to load application: Zeitwerk::NameError: expected file /opt/gitlab/embedded/service/gitlab-rails/lib/migration.rb to define constant Migration, but didn't"}
2021-09-13_14:57:55.66260 bundler: failed to load command: puma (/opt/gitlab/embedded/bin/puma)
2021-09-13_14:57:55.66268 Zeitwerk::NameError: expected file /opt/gitlab/embedded/service/gitlab-rails/lib/migration.rb to define constant Migration, but didn't
2021-09-13_14:57:55.66269   /opt/gitlab/embedded/lib/ruby/gems/2.7.0/gems/zeitwerk-2.4.2/lib/zeitwerk/loader/callbacks.rb:18:in `on_file_autoloaded'
2021-09-13_14:57:55.66269   /opt/gitlab/embedded/lib/ruby/gems/2.7.0/gems/zeitwerk-2.4.2/lib/zeitwerk/kernel.rb:27:in `block in require'
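
For anyone hitting the same restart loop, here is a minimal sketch for pulling the offending file out of a log like the one above. The function name zeitwerk_offenders is made up for illustration, and the default path is simply the puma log quoted in this thread; adjust it if your logs live elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: list each distinct "expected file ... to define
# constant ..." complaint that Zeitwerk wrote to a log file.
# The default path is the Omnibus puma log quoted in this thread.
zeitwerk_offenders() {
    log="${1:-/var/log/gitlab/puma/current}"
    # -o prints only the matching part of each line; sort -u removes repeats
    # caused by puma crashing and respawning over and over.
    grep -o 'expected file [^ ]* to define constant [A-Za-z_:]*' "$log" | sort -u
}

# Usage (on the GitLab host):
#   zeitwerk_offenders
#   zeitwerk_offenders /var/log/gitlab/sidekiq/current
```

The path printed after "expected file" is the file Rails fails to autoload, which is the first thing to inspect.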

Hm, didn’t see 14.1.5 in the Debian repo back then

I was able to solve the problem by completely purging the package and reinstalling GitLab 14.1.4. Since the data itself wasn’t touched, I didn’t lose any repositories or configuration. After this the upgrade to 14.2 worked as expected.

apt remove gitlab-ce
apt purge gitlab-ce
rm -Rv /opt/gitlab/
apt install gitlab-ce=14.1.4-ce.0
gitlab-ctl reconfigure
apt full-upgrade
gitlab-ctl reconfigure
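
Before running steps like the ones above, it may be worth keeping a copy of the configuration directory, since GitLab's own backup (gitlab-backup create) does not include gitlab.rb or gitlab-secrets.json. The following is only a sketch: the function name backup_gitlab_config and the default output location under /root are assumptions for illustration, not something from the posts above.

```shell
#!/bin/sh
# Hypothetical helper: archive a configuration directory (default: /etc/gitlab)
# into a timestamped tarball and print the tarball's path.
backup_gitlab_config() {
    conf_dir="${1:-/etc/gitlab}"
    out="${2:-/root/gitlab-config-$(date +%Y%m%d%H%M%S).tar.gz}"
    # -C changes into the parent directory so the archive holds relative paths.
    tar -czf "$out" -C "$(dirname "$conf_dir")" "$(basename "$conf_dir")" && echo "$out"
}

# Usage (on the GitLab host, as root):
#   gitlab-backup create     # repositories and database
#   backup_gitlab_config     # gitlab.rb and gitlab-secrets.json
```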

Thanks for your input, it really helped me solve it too.
I followed the same steps as you, except that I didn’t do “apt purge”, and it worked too.

The secret seems to be hidden in the above rm -Rv /opt/gitlab/.

I ran into a similar issue on CentOS 7 omnibus, upgrading from 14.0.11 to the latest 14.4.1 (/var/log/gitlab/puma/current: “Unable to load application: Zeitwerk::NameError: expected file /opt/gitlab/embedded/service/gitlab-rails/lib/migration.rb to define constant Migration, but didn’t”).

By comparison, a fresh install does not seem to have this /opt/gitlab/embedded/service/gitlab-rails/lib/migration.rb, so I simply removed that file and voilà: puma’s constant restarts stopped, the 502 vanished and I could access the UI again.

Looks like gitlab’s files are not cleaned up properly?
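
One way to test that suspicion is to ask the package manager whether the file belongs to any installed package at all. This is a rough sketch: owning_package is a made-up name, and it only relies on dpkg -S (Debian) or rpm -qf (CentOS), whichever is present.

```shell
#!/bin/sh
# Hypothetical helper: report which installed package owns a file, or say
# that no package does. Uses dpkg on Debian-like systems, rpm on RPM-based ones.
owning_package() {
    f="$1"
    if command -v dpkg >/dev/null 2>&1; then
        if out=$(dpkg -S "$f" 2>/dev/null); then
            echo "$out"
        else
            echo "not owned by any package: $f"
        fi
    elif command -v rpm >/dev/null 2>&1; then
        if out=$(rpm -qf "$f" 2>/dev/null); then
            echo "$out"
        else
            echo "not owned by any package: $f"
        fi
    else
        echo "no dpkg or rpm found" >&2
        return 1
    fi
}

# Usage:
#   owning_package /opt/gitlab/embedded/service/gitlab-rails/lib/migration.rb
```

If the file is reported as not owned, it was not shipped by the currently installed gitlab-ce/gitlab-ee package, so it is either a leftover or something placed there by hand.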

@schild mine doesn’t have that file, although my instance on Debian works perfectly fine and has also upgraded through all 14.x releases. I generally upgrade as soon as a new release is out, though. To me it looks more like the file didn’t get deleted, maybe because not all of the migrations in the file had finished and been applied to the database. I’m not sure what the end effect of that will be, as it is unclear which migrations didn’t finish, or whether you might have problems in the future if something expects a certain migration to be complete or relies on a database change that a previous migration should have applied.

If it wasn’t cleaned up properly, then I would also have the file on my system, so I can only come to the conclusion that it existed due to unfinished migrations, or some failed. The log files though should give a hint to this if that was the case, or the output on screen during the upgrade itself would hint at any failed migrations. If none, then all should be good.

I absolutely agree that removing that file is a harsh move and that there is a remaining risk that it is still needed for some kind of migration.
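
A gentler variant of deleting the file is to move it out of the Rails tree, so it can be inspected or put back later. The sketch below assumes a made-up helper name (quarantine_file) and an arbitrary default target directory under /root; neither comes from GitLab itself.

```shell
#!/bin/sh
# Hypothetical helper: move a suspicious file into a quarantine directory,
# appending a timestamp so repeated runs do not overwrite earlier copies.
quarantine_file() {
    src="$1"
    dest_dir="${2:-/root/gitlab-quarantine}"
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest_dir" && mv "$src" "$dest" && echo "$dest"
}

# Usage (on the GitLab host, as root):
#   quarantine_file /opt/gitlab/embedded/service/gitlab-rails/lib/migration.rb
#   gitlab-ctl restart puma
```

Moving the file back to its original path undoes the change if something turns out to depend on it.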

As I understand it, in 14.0 gitlab switched(?) migration processes to what is called “background migrations” which is much more visible for admins ( see /admin/background_migrations). So maybe that migration.rb became truly useless.
I did let those background migrations finish after upgrading to 14.0.

However, as mentioned above, when upgrading beyond 14.2 that file seems to become harmful, as it prevents puma from launching (see the error message above).
My migration.rb is dated 2019. I’m not really sure, but I think at that time GitLab still used unicorn and puma did not even exist there? That also points to it being an old, forgotten component that was never cleaned up.

This is not a common problem, right? It seems to affect only very old installations?
I am still worried that deleting that file damages my installation somehow.
And I can’t really believe that this slipped the GitLab team’s attention (I can’t find a suitable bug report either).

My system has been running since 2017, so can’t really say. Not sure if that file disappeared from 12.x or earlier. I’m unable to confirm if it existed on my system at some point and disappeared later when upgraded. The background migrations appeared in the webgui, but it was still monitorable via the console on earlier versions to see if background migrations needed to finish after an upgrade prior to continuing the next upgrade. I remember this from upgrading from 12.9.x to 13.x and later to 14.x and still use this command since I’m used to this on the console now, rather than flicking between console and the gui.

Unicorn was replaced by puma either in 13.x or 14.x. I didn’t have any unicorn-specific config in my gitlab.rb, so that didn’t cause me any issues, but it did for some people, since their gitlab.rb needed to be updated to use puma-related settings instead, or to have everything related to unicorn commented out (hashed) so that it would just use the GitLab defaults for puma without any additional tuning. But that only concerns serving the web application anyway.

I haven’t seen anyone post about it before, so sounds uncommon. Whether it wasn’t cleaned up after a prior upgrade or not, I can only guess. If more and more people were experiencing it, then yes a bug report would need to be filed. I’ve been on the forum for over a year, and not seen anyone post about it yet. You’re the first :)

The good thing is you are up and running anyway, so perhaps it won’t be anything serious. If you still have the content of the migration.rb file, it might be worth checking it anyway. Or perhaps raise an issue about it on GitLab so that the GitLab team can add some input on whether there is anything additional you might need to do.

Aah, geez, I remember now.

Do you guys who had problems with this migration.rb remember ever trying to do an LDAP migration?
Maybe following this forum article?:

I think this is where my offending file comes from.
It was just there to add a function to the console, placed by myself. I just forgot.

So simply removing it should be a perfectly fine solution.


@schild nice to know :)

I don’t use LDAP, so this would explain why, and also that I didn’t attempt to add functionality with a migration.rb file. Thx for the info, this could help someone who might experience this in the future as well.