Upgrade from 16.9.2 to 16.10.0 broke GitLab

Hello all,

I am currently using the Docker image from gitlab/gitlab-ee:latest. I had some “customization” (I’ve since forgotten exactly what I did to get it working) that allowed me to upgrade GitLab EE directly inside the container without deleting the container, pulling the new image, and redeploying. While I was attempting to upgrade from 16.9.2 to 16.10.0, the upgrade broke with an error message along the lines of:

NameError: undefined local variable or method `include_optional_metrics_in_service_ping' for #<ApplicationSetting id: 1

I don’t have the entire error log anymore because I stopped my container. What I ended up doing was the following (a rough sketch of the Docker commands is after the list):

  • Renamed the old container and then created a new one
  • Copied out the old container’s backup files from right before all hell broke loose, including everything in the /var/opt/gitlab/git-data folder
  • Copied the backup file (which apparently only contains the database and backup_information.yml) into the new container
  • Restored the database
  • Copied the git-data folder into the new container
  • Restarted the new container
  • Everything now runs OK and all services are up and running
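
For anyone trying to follow along, this was roughly the Docker sequence I used (container names, ports, and the backup timestamp are just placeholders for my setup):

# Rename the broken container and stand up a fresh one from the same image
docker rename gitlab gitlab-broken
docker run -d --name gitlab -p 80:80 -p 443:443 -p 2222:22 gitlab/gitlab-ee:latest

# Pull the backup archive and the git-data folder out of the old container
mkdir -p backups git-data
docker cp gitlab-broken:/var/opt/gitlab/backups/. ./backups/
docker cp gitlab-broken:/var/opt/gitlab/git-data/. ./git-data/

# Push them into the new container
docker cp ./backups/. gitlab:/var/opt/gitlab/backups/
docker cp ./git-data/. gitlab:/var/opt/gitlab/git-data/

# Once the new container has finished its initial reconfigure,
# restore the database backup inside it, then restart
docker exec -it gitlab gitlab-ctl stop puma
docker exec -it gitlab gitlab-ctl stop sidekiq
docker exec -it gitlab gitlab-backup restore BACKUP=<timestamp_of_backup>
docker restart gitlab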

The restore did give me back all of my projects, users, groups, commit history, GitLab settings, etc., but the issue I’m having now is that GitLab no longer recognizes the majority of the hashed commits, and I don’t know why. For instance, if I try to view the contents of a file within the GitLab UI, I get this error.

If I attempt to view a previous pipeline from before the upgrade, I get this error message.

Is there any way to recover from this? I don’t want to have to start completely fresh with no Git commit history for any of my projects. I believe I did apply the proper permissions to the git-data folder.

I did a

chown -R git:git /var/opt/gitlab/git-data

This is what I’m seeing.

root@gitlab:/var/opt/gitlab/git-data/repositories# ls -la
total 20
drwxr-sr-x  4 git git 4096 Mar 22 01:23 +gitaly
drwxrws---  4 git git 4096 Mar 11 11:50 .
drwx------  3 git git 4096 Dec 29 00:49 ..
-rw-------  1 git git   64 Dec 29 00:50 .gitaly-metadata
drwxr-s--- 19 git git 4096 Feb 28 18:04 @hashed
root@gitlab:/var/opt/gitlab/git-data/repositories# 
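
Side note in case it helps anyone hitting the same “some repos work, some don’t” symptom: the stock Omnibus check tasks can at least point at whether Gitaly can reach the repositories GitLab expects after a restore like this (these are standard rake tasks, nothing custom to my setup):

# run inside the container: checks Gitaly, repo storage, permissions, etc.
gitlab-rake gitlab:check SANITIZE=true

# optionally verify the integrity of the on-disk repositories
gitlab-rake gitlab:git:fsck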

UPDATE: Ok… Weirdly enough, I can view “some” files and I can’t view others??? This makes no sense.

Looking at the logs for the old container, this is the exact error message.

[2024-03-22T05:40:28-05:00] FATAL: Stacktrace dumped to /opt/gitlab/embedded/cookbooks/cache/cinc-stacktrace.out
[2024-03-22T05:40:28-05:00] FATAL: ---------------------------------------------------------------------------------------
[2024-03-22T05:40:28-05:00] FATAL: PLEASE PROVIDE THE CONTENTS OF THE stacktrace.out FILE (above) IF YOU FILE A BUG REPORT
[2024-03-22T05:40:28-05:00] FATAL: ---------------------------------------------------------------------------------------
[2024-03-22T05:40:28-05:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: rails_migration[gitlab-rails] (gitlab::database_migrations line 51) had an error: Mixlib::ShellOut::ShellCommandFailed: bash_hide_env[migrate gitlab-rails database] (gitlab::database_migrations line 20) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  ----
STDOUT: rake aborted!
NameError: undefined local variable or method `include_optional_metrics_in_service_ping' for #<ApplicationSetting id: 1, default_projects_limit: 100000, signup_enabled: false, gravatar_enabled: [REDACTED INFORMATION]

use_clickhouse_for_analytics: false, clickhouse: {"use_clickhouse_for_analytics"=>false}>
/opt/gitlab/embedded/service/gitlab-rails/app/models/application_setting_implementation.rb:522:in `usage_ping_features_enabled'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/current_settings.rb:31:in `method_missing'
/opt/gitlab/embedded/service/gitlab-rails/ee/app/models/gitlab_subscriptions/features.rb:364:in `features_with_usage_ping'
/opt/gitlab/embedded/service/gitlab-rails/ee/app/models/gitlab_subscriptions/features.rb:348:in `usage_ping_feature?'
/opt/gitlab/embedded/service/gitlab-rails/ee/app/models/license.rb:83:in `feature_available?'
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/ee/gitlab/auth/ldap/config.rb:19:in `_available_servers'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/auth/ldap/config.rb:37:in `available_servers'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/auth/ldap/config.rb:49:in `available_providers'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/8_devise.rb:241:in `block in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/8_devise.rb:7:in `<top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/config/environment.rb:7:in `<top (required)>'
<internal:/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
<internal:/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'
Tasks: TOP => gitlab:db:configure => environment
(See full trace by running task with --trace)
STDERR: 
---- End output of "bash"  ----
Ran "bash"  returned 1
===
There was an error running gitlab-ctl reconfigure. Please check the output above for more
details

OK, so I finally got my old GitLab container working. What I ended up doing was bypassing the whole “container crashing” issue. The reason the container kept crashing was that a lot of things were missing or broken because the upgrade messed up the entire system.

The first thing I saw was the error message from my 2nd post. It kept showing up whenever I ran apt-get -y update to do a regular Linux update. Because of that, GitLab couldn’t install properly; it kept erroring out with

NameError: undefined local variable or method 'include_optional_metrics_in_service_ping' for

I assume this was referencing the gitlab_rails['usage_ping_enabled'] setting I had enabled, but that makes no sense to me because the setting exists, so I don’t understand why it would be erroring out. After that, I restarted the container and cleared the logs. From then on, the container kept crashing over and over and never recovered.
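
For reference, that setting lives in /etc/gitlab/gitlab.rb; this is just how it looked in my config (flipping it to false and reconfiguring is what I describe further down):

# /etc/gitlab/gitlab.rb -- the Service Ping / usage ping toggle
gitlab_rails['usage_ping_enabled'] = true

# any gitlab.rb change is applied with:
gitlab-ctl reconfigure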

I attempted so many things, such as copying the GitLab binary files from the “working” container over to the broken one. That didn’t work at all, so I then tried to uninstall GitLab from the container during the brief moments when it would come up before crashing again. That didn’t work either, since those windows were too short before the next crash.

So here is my solution. I overwrote the gitlab-ctl file in both the /usr/bin/ and /opt/gitlab/embedded/bin/ locations with a “Hello World” replacement: I first made the stub executable and then copied it to both locations. I restarted the “broken” container and logged into it using the exec command. I waited 30 seconds, and this time it didn’t kick me out and crash. From what I remember, gitlab-ctl is what runs the reconfigure, which explains why the container kept crashing, since the files were broken from my multiple attempts at various things.
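
The stand-in itself was trivial; roughly, from the host (gitlab-broken is just what I renamed the container to):

# create a stub that prints something and exits cleanly instead of reconfiguring
printf '#!/bin/bash\necho "Hello World"\nexit 0\n' > gitlab-ctl
chmod +x gitlab-ctl

# overwrite the real gitlab-ctl in both locations inside the broken container
docker cp gitlab-ctl gitlab-broken:/usr/bin/gitlab-ctl
docker cp gitlab-ctl gitlab-broken:/opt/gitlab/embedded/bin/gitlab-ctl

# restart it and get a shell before anything can crash the container
docker restart gitlab-broken
docker exec -it gitlab-broken bash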

What I did next, while I had the chance, was run a

dpkg --configure -a

to correct any packages that may have been damaged in the process. As the packages were configured, the new GitLab version got configured as well. This time it kept breaking at the exact same spot as in my 2nd post. So next I disabled that feature by setting it to false, did another reconfigure, and this time it got past the spot it kept failing at. After that, it complained about a new error stating that some of the features in the development folder were already enabled. What I did there was move the whole folder somewhere else, did another reconfigure, and GitLab finally came up for the first time since it “broke”. Though this time it was stuck at the 502 error page because Puma kept shutting off.
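
Condensed, the main bits of that in-container recovery sequence were roughly (the gitlab.rb change being the usage ping toggle mentioned above):

# finish configuring the half-installed packages from the failed upgrade
dpkg --configure -a

# disable the setting the migration kept tripping over, then reconfigure
#   /etc/gitlab/gitlab.rb:  gitlab_rails['usage_ping_enabled'] = false
gitlab-ctl reconfigure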

What I ended up doing was uninstalling GitLab again and deleting the /opt/gitlab folder, then reinstalling GitLab, and bam, everything came back up.
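
That final reset, assuming the Omnibus gitlab-ee package from the official apt repository (the data in /var/opt/gitlab and the config in /etc/gitlab stay put; only the application under /opt/gitlab gets wiped), was roughly:

# remove the package and the application directory
apt-get remove -y gitlab-ee
rm -rf /opt/gitlab

# reinstall and reconfigure
apt-get update
apt-get install -y gitlab-ee
gitlab-ctl reconfigure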

Don’t worry, the gitlab-ctl file gets overwritten whenever you upgrade or reinstall GitLab, so the “Hello World” trick was just a way to inject a working entrypoint into the container and stop it from continuously crashing.


Nicely done. But it’s best to take a backup (including secrets) so that you can always recover and avoid the kind of angst you went through.

I maintain a GitLab environment used by many developers, so I normally do a backup, including all secrets, before I attempt any GitLab upgrade. The process is as follows (the backup command itself is sketched at the end):

  • deploy a new GitLab environment with the same GitLab version as the one I’m planning to upgrade
  • restore the backup from the old environment and verify all is well
  • downtime begins now, as I block access to the old environment from everyone but myself
  • do another backup (to capture last-minute changes)
  • restore it to the new environment
  • upgrade and verify all is well
  • change DNS records to point to the new environment

Obviously this is a much longer process, but it hasn’t failed me yet!
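
For reference, the backup-plus-secrets step on an Omnibus install boils down to something like this (the backup archive deliberately does not include gitlab.rb or gitlab-secrets.json, so those get copied separately; /safe/location is obviously a placeholder):

# application data: repositories, database, uploads, etc.
gitlab-backup create

# configuration and secrets are NOT part of the backup archive
cp /etc/gitlab/gitlab-secrets.json /safe/location/
cp /etc/gitlab/gitlab.rb /safe/location/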

Note: I use LXC containers instead of Docker.