Upgrade to GitLab 13.4.0 (b0481767fe4) killed all repositories

@brodock anything else I can help with lemme know.
not sure what to add to the issue there, I see that everything I could add there is already there

isn’t backup-etc doing just that ?!

if you copied what’s inside /etc/gitlab you are good, but the secrets are not part of the backup bundle, for security reasons

I do

gitlab-ctl backup-etc
gitlab-backup create

I understand the first line backs up the secrets and the second one backs up the repos, wikis, database…
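For anyone who wants to script the two steps, here is a minimal sketch that runs them in order and stops at the first failure. The two commands are the ones quoted above; the helper names (`run_command`, `run_backups`) are made up for illustration, and it assumes both commands return a non-zero exit code when they fail.

```python
import subprocess
from typing import Callable, Sequence

# Commands taken from the post above; on an Omnibus install they need root.
BACKUP_STEPS = (
    ("gitlab-ctl", "backup-etc"),   # configuration and secrets from /etc/gitlab
    ("gitlab-backup", "create"),    # repositories, wikis, database
)


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_backups(runner: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Run every backup step in order; stop at the first failing step.

    Assumption: a non-zero exit code means the step failed.
    """
    for argv in BACKUP_STEPS:
        if runner(argv) != 0:
            print("backup step failed:", " ".join(argv))
            return False
    return True
```

Calling `run_backups()` before an upgrade and aborting when it returns `False` keeps the secrets and the data bundle in step with each other.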

You are right, with backup-etc this should be covered. We have two separate issues here. One is that the storage migration was left in an inconsistent state when an exception occurred, which is not what we want to happen, so this should be fixed and I will investigate it on the original issue.

The other is how you got your system in a situation where OpenSSL::Cipher::CipherError triggered. I’ve created another issue to follow up on an idea on how we could prevent that: https://gitlab.com/gitlab-org/gitlab/-/issues/262040

@arhi did you see any errors in the logs regarding those two? During the hashed storage attempt on gitlab.com we saw a few cases where the repositories had wrong permissions, and because of that the migration scripts couldn’t move them from one folder to another.

the migration (in my case) failed with a cipher error, some others had it fail with some other error, the failed migration is a big fail that should never happen… to be honest, “automated migration without explicit question to migrate is a huge fail!!!”… just like gitlab-ci will not update if it cannot make a backup, I’d prevent it from doing anything if gitlab:doctor:secrets fails too. Also, “auto migration” … hm … just like you do not continue the upgrade when you can’t make a backup, you can prevent the update if there is still old style storage and request a manual upgrade to hashed … now the doc states hashed will be mandatory in 14.0, so there is no need to migrate in 13.x, why do you force it anyhow?! … in any case migration of the storage system is not something I expect in the upgrade procedure :frowning: … it’s something I’ll do manually, first I’ll backup the %$#^#^+ out of the system, test if the backup is good by restoring it to a staging system, only then I’ll try to migrate the storage … doing it like this … well not the first gray hair I got nor the last but I could really have skipped this stress :smiley: :smiley: :smiley:

now how this cipher error happened - no idea, we copied etc and restored the backup and everything worked ok for a few updates… no way to turn back time to see what got messed up so dunno what info I can add about it

nope, no errors, the migrate script was finishing without reporting any errors anywhere but the repos were not being migrated (file permissions were ok, I checked that before I decided to delete .wiki to see what would happen) … I’m not a ruby person so I don’t know how this all works, I tried to find out what the migration actually does but could not find the script where it is done, but it looked to me that when the crypto thing was solved the migration script returned the repo files from the hashed to the old location but did not move the .wiki files for some reason, so in the @hashed structure the wiki data was left alone and in the old style the repo was stored but there was a .wiki “excess” in the old style … this .wiki then prevented migration of the repo, so when I deleted the .wiki the repo was migrated and the old wiki data that was there from the first migration try was now joined with the repo … should be possible to track this through the script but … I don’t do ruby
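For anyone trying to match a project to its folder under @hashed: as far as I understand the hashed storage documentation, the directory name is derived from the SHA256 of the project ID, with the first two pairs of hex characters used as parent folders, and the wiki sits next to the repo with a `.wiki.git` suffix. A small sketch under that assumption (the function names are made up for illustration, this is not GitLab's own code):

```python
import hashlib


def hashed_disk_path(project_id: int) -> str:
    """Relative hashed-storage path of a project, without the .git suffix.

    Assumption: the layout is @hashed/<hash[0:2]>/<hash[2:4]>/<hash>,
    where <hash> is the SHA256 hex digest of the project ID as a string.
    """
    digest = hashlib.sha256(str(project_id).encode("ascii")).hexdigest()
    return f"@hashed/{digest[:2]}/{digest[2:4]}/{digest}"


def repository_paths(project_id: int) -> dict:
    """Expected locations of the repository and its wiki for one project."""
    base = hashed_disk_path(project_id)
    return {"repository": base + ".git", "wiki": base + ".wiki.git"}
```

With this you can check, for a given project ID, whether both the `.git` and the `.wiki.git` directory actually exist under @hashed, which is where the half-migrated state described above shows up.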

I understand your frustration, and I’m sorry this caused issues for you. Let me explain the decisions we made and why:

In the 13.0 release post we warned that legacy storage was discouraged from that point on, and that we would automatically migrate the remaining repositories two releases later: GitLab 13.0 released with Gitaly Clusters, Epic Hierarchy on Roadmaps, and Auto Deploy to ECS | GitLab (we actually allowed 2 extra releases to be sure).

From 13.0 onward, gitlab-rake gitlab:check would tell you that you have an issue that needs to be fixed.

The rationale for the auto migration being triggered 4 releases after the first warning was that everyone should have migrated already, or at least attempted to and reported any exception we didn’t cover so we could fix it.

The automatic migration was intended to get everyone who either missed the warning or had any unmigrated repository left onto hashed storage. So in a sense this was a mandatory one. (Please understand that this whole migration involved a ton of work, as we had to change and fix things that go back to probably the very first versions.) Having Hashed Storage allows us to fix other problems that may occur when running GitLab at scale.

for your curiosity this is what the migration does (after the multiple hops to schedule them at scale):

and related code in:

don’t get me wrong, I was soooooo pissed you cannot imagine, happened in the worst possible moment etc etc … but I do know I’m getting a mega turbo uber giga best system out there for free and I do appreciate that big time… and I do appreciate all the work you guys are putting in, and working for a big open source project myself I had my share of angry customers and crazy decisions :smiley: … so don’t get me wrong, I still think you guys are the best

what I’d do (not important any more but just for the sake of argument)

  • change that it’s not deprecated in 13 and removed in 14 but removed in 13.4 so ppl know they must migrate
  • change the upgrade to 13.4 so that it checks whether there is unmigrated data and fails the upgrade if there is, requiring the user to migrate manually before upgrading to 13.4

I’m pretty sure that would prevent a whole bunch of issues and gray hairs :smiley:
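The second suggestion (refuse to upgrade while unmigrated data exists) could look roughly like the sketch below. It is only a filesystem heuristic, not how GitLab itself detects legacy projects: it assumes that anything ending in `.git` outside the special top-level folders starting with `@` or `+` is a legacy-storage repository. The function names are hypothetical, and the repositories root you pass in (on Omnibus usually something under /var/opt/gitlab/git-data) is an assumption about your install. The gitlab:storage rake tasks, e.g. `gitlab:storage:list_legacy_projects` if your version has it, are the supported way to get this list.

```python
import os
from typing import List


def find_legacy_repos(root: str) -> List[str]:
    """List *.git directories that are not under @hashed or other special dirs.

    Heuristic assumption: top-level folders starting with '@' or '+' are
    managed by GitLab/Gitaly and never contain legacy-storage repositories.
    """
    found = []
    for entry in sorted(os.listdir(root)):
        if entry.startswith(("@", "+")):
            continue
        top = os.path.join(root, entry)
        if not os.path.isdir(top):
            continue
        for dirpath, dirnames, _files in os.walk(top):
            for name in list(dirnames):
                if name.endswith(".git"):
                    full = os.path.join(dirpath, name)
                    found.append(os.path.relpath(full, root))
                    dirnames.remove(name)  # do not descend into the bare repo
    return sorted(found)


def upgrade_allowed(root: str) -> bool:
    """Allow the upgrade only when no legacy-storage repository is left."""
    return not find_legacy_repos(root)
```

An upgrade wrapper would call `upgrade_allowed(...)` first and print the output of `find_legacy_repos(...)` when it refuses, so the admin knows exactly what to migrate by hand.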

gitlab-rake gitlab:check

is this something that’s run during the upgrade? I did not notice that. In any case, IMHO this is something I should be seeing on the /admin page right under the “update asap” warning :smiley:

thanks, I’ll take a look, reading rb is not a problem when I know where to look, the structure is still a bit strange to me :slight_smile:

keep up the good work :slight_smile: and thanks again

@bjelline can you provide any additional information, logs or any other error you can find and paste in: https://gitlab.com/gitlab-org/gitlab/-/issues/259605#note_423733035 ?

We still need to have you in hashed storage, so we need to figure out why you ended up in this half-migrated state.

Just checking in here. Has anyone found a workaround for the 2:NoMethodError: undefined method 'relative_path' variant of this error while we wait for an official fix?

Hi,
I tried the “sudo gitlab-rake gitlab:storage:rollback_to_legacy” and I do not get the repositories back from @hashed. I get the following message “Enqueuing rollback of 4 projects in batches of 200. Done!” . Yet I do not see any repositories in the legacy storage. I still see them in @hashed folder.

We still have the same case without a solution over here: Error on Gitlab CE version 13.6 No repository - #2 by vhristev

Nothing worked for me:

  • Clear cache
  • Migrate legacy storage to hashes
  • Clear registration tokens.

Still cannot see repos in UI but the data is on the server.