Upgrade to GitLab 13.4.0 (b0481767fe4) killed all repositories

@steep_suzuki: If you’re asking me: no. I had seen that point in the documentation and, except that I got the project IDs from the project home page rather than through the admin area, I had already done that for some of my own projects (where downtime was expected) and, due to a typo, for some other projects (nobody has complained so far). But I’ve measured timings from 1/16th of a second to 50 seconds per project. Scaled to 3600 projects, that gives estimates ranging from 3-4 minutes to 2-3 days. As that is quite a wide range, I would like more knowledge/measurements/…
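
To make that spread concrete, here is a back-of-the-envelope sketch in Ruby. It assumes projects are migrated strictly one after another, which may not match how Sidekiq actually schedules the jobs, and `estimate_total` is just an illustrative helper name.

```ruby
# Rough total-time estimate, assuming projects migrate one at a time
# (hypothetical helper; real Sidekiq scheduling may run jobs in parallel).
def estimate_total(projects, seconds_per_project)
  projects * seconds_per_project
end

PROJECTS = 3600
best  = estimate_total(PROJECTS, 1.0 / 16) # fastest timing observed
worst = estimate_total(PROJECTS, 50)       # slowest timing observed

puts format('best case:  %.2f minutes', best / 60.0)    # 3.75 minutes
puts format('worst case: %.2f days', worst / 86_400.0)  # 2.08 days
```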

I don’t mind doing it manually, as long as it works. It’s not a problem for me: I have under 50 repos.

Nope. That, again, fails with "OpenSSL::Cipher::CipherError: "

well you did answer but your answer does not work for me :frowning:

e.g.

[root@git backups]# gitlab-rake gitlab:storage:migrate_to_hashed ID_FROM=25 ID_TO=25
Enqueueing storage migration of elco/5kwtenk (ID=25)...
 Done!
[root@git backups]# cd /var/opt/gitlab/git-data/repositories/elco
[root@git elco]# pwd
/var/opt/gitlab/git-data/repositories/elco
[root@git elco]# ls -la
total 16
drwxr-sr-x.  4 git root 4096 Sep 23 12:14 .
drwxrws---. 10 git root 4096 Sep 24 13:56 ..
drwxr-sr-x.  6 git root 4096 Sep 23 03:27 knowledgeBase.git
drwxr-s---.  7 git root 4096 Sep 23 09:22 knowledgeBase.wiki.git
[root@git elco]#


what do you have in git-data/repositories for that id240 ?

[root@git arhi]# pwd
/var/opt/gitlab/git-data/repositories/arhi
[root@git arhi]# ls -la
total 8
drwxr-sr-x.  2 git root 4096 Sep 23 03:27 .
drwxrws---. 10 git root 4096 Sep 24 13:56 ..
[root@git arhi]#

if there is no data in the old storage dir … looks like this cipher fails 'cause dir is empty

looks like we solved it.

we deleted registration tokens https://docs.gitlab.com/ce/raketasks/backup_restore.html#reset-runner-registration-tokens

and then gitlab-rake gitlab:storage:migrate_to_hashed works :smiley: (so far so good)


Resolved according to https://docs.gitlab.com/ee/raketasks/backup_restore.html#when-the-secrets-file-is-lost

My repositories were half-migrated: the files on disk were in the new location, but in the database they were still marked as legacy projects.

I wrote a little script that goes through the hashed storage, reads the legacy path out of each repository’s config file, and moves the directory back to its old position.

THIS IS A VERY UNSAFE SCRIPT!

DO NOT USE THIS UNLESS YOU ARE SURE YOU HAVE THE SAME PROBLEM!

require 'fileutils'

# Run from the directory that contains data/git-data/repositories/.
Dir.chdir 'data/git-data/repositories/'

puts "=== find hashed projects and fix them ==="
moved = 0
Dir.glob("@hashed/*/*/*.git/config").each do |configfile|
  puts configfile
  next if configfile =~ /wiki/ # wikis are moved together with their project
  stem = configfile.sub(%r{\.git/config\z}, '')
  config = File.read(configfile)
  if config =~ /fullpath = (.*)/
    legacypath = $1
    puts "move #{stem} to #{legacypath}"
    if File.exist?("#{stem}.wiki.git")
      if File.exist?("#{legacypath}.wiki.git")
        # Destructive: removes any wiki already sitting at the legacy location.
        puts "rm -rf data/git-data/repositories/#{legacypath}.wiki.git"
        FileUtils.rm_rf("#{legacypath}.wiki.git")
      end
      puts "  mv #{stem}.wiki.git to #{legacypath}.wiki.git"
      File.rename("#{stem}.wiki.git", "#{legacypath}.wiki.git")
    end

    if File.exist?("#{stem}.git")
      moved += 1
      puts "  mv #{stem}.git to #{legacypath}.git"
      File.rename("#{stem}.git", "#{legacypath}.git")
    end
  else
    puts "no fullpath in config, skipping #{stem}"
  end
  # break if moved > 30 # uncomment to stop after a small trial batch
end
puts "=== done ==="
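
Before moving anything, it may help to check what the `fullpath` lookup would return for a given config. The following is a minimal, side-effect-free sketch of just that step; `legacy_path_from_config` is a hypothetical helper, and the `[gitlab]` section name in the sample text is an assumption about how the config file is laid out.

```ruby
# Extract the legacy path recorded in a repository's git config.
# Returns nil when no "fullpath = ..." line is present.
def legacy_path_from_config(config_text)
  match = config_text.match(/^\s*fullpath\s*=\s*(.+?)\s*$/)
  match && match[1]
end

# Sample config text; the [gitlab] section name is an assumption.
sample = "[core]\n\tbare = true\n[gitlab]\n\tfullpath = elco/knowledgeBase\n"

p legacy_path_from_config(sample)
p legacy_path_from_config("[core]\n\tbare = true\n")
```

The first call should yield the path string and the second `nil`, which corresponds to the "skipping" branch of the script.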

Most people in this topic had exactly the same problem. It was fixed by:

  1. delete registration tokens as explained on the backup_restore.html
  2. restart the migration (rerun migrate_to_hashed script)
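
For step 2 on a larger instance, one option is to rerun the migration in small ID ranges so a trial batch can be timed first. This is only a sketch: `batch_commands` is a hypothetical helper that prints the commands rather than running them, and the ID range and batch size are made-up values. Step 1 is deliberately left manual.

```ruby
# Build one migrate_to_hashed command per batch of project IDs.
# batch_commands is a hypothetical helper; it only returns command strings.
def batch_commands(first_id, last_id, batch_size)
  (first_id..last_id).step(batch_size).map do |from|
    to = [from + batch_size - 1, last_id].min
    "gitlab-rake gitlab:storage:migrate_to_hashed ID_FROM=#{from} ID_TO=#{to}"
  end
end

# Example values only: IDs 1..50 in batches of 20.
puts batch_commands(1, 50, 20)
```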

hm … wikis are not back :frowning:

repo with wiki:

gitlab-rake gitlab:storage:migrate_to_hashed ID_FROM=8 ID_TO=8

supposedly worked ok

but is still on the old storage format and wiki not available :frowning:
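
One way to check where a project ended up is to compute its expected hashed location and see whether that directory exists. The sketch below relies on my understanding of the hashed storage docs (SHA256 of the project ID, nested under the first two pairs of hex characters); `hashed_repo_path` is a hypothetical helper and the repository root is the Omnibus path seen in the transcripts earlier in this thread.

```ruby
require 'digest'

# Relative path of a project under hashed storage: the SHA256 hex digest of
# the project ID, nested under its first two pairs of hex characters.
def hashed_repo_path(project_id)
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  File.join('@hashed', hash[0, 2], hash[2, 2], "#{hash}.git")
end

# Assumed repository root (taken from the shell transcripts in this thread).
ROOT = '/var/opt/gitlab/git-data/repositories'
path = File.join(ROOT, hashed_repo_path(8))
puts path
puts Dir.exist?(path) ? 'present in hashed storage' : 'not found in hashed storage'
```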

Hi arhi,

thank you for the information. I didn’t notice it because after the upgrade (12.10.14 -> 13.0.12 -> 13.4.1) I only checked the web dashboard. Besides this, why did you do it (I’m asking to understand)? It seems that hashed storage has become “the rule”: https://docs.gitlab.com/ee/administration/repository_storage_types.html#hashed-storage

thank you
cheers
Stefano

solved the repo with wiki by deleting wiki dir
so

rm -rf /var/opt/gitlab/git-data/repositories/arhi/repo.wiki.git
gitlab-rake gitlab:storage:migrate_to_hashed ID_FROM=8 ID_TO=8

this migrated it to hashed and restored the wiki (from who knows where)

Not sure I understand. Why did I do what?

@bjelline were you able to resolve this problem on your server? My migration jobs are also failing with:

Gitlab::Git::CommandError: 2:NoMethodError: undefined method `relative_path' for nil:NilClass.

I can’t seem to figure out what’s causing it.

I created an issue for this bug on the gitlab issue tracker here.

I’m still looking for some way to get access to my repos again. I don’t use runners, so I can’t imagine that clearing secrets out of the database as suggested above will do me much good.

I have not used any runners either, but that solved the problem

Interesting. I went ahead and gave it a try. Unfortunately, I’m getting the same error. :frowning: I think since we saw different error messages in the Sidekiq logs, we’re experiencing different issues.

Hi all, I will look at all the information you provided here and track the problem in https://gitlab.com/gitlab-org/gitlab/-/issues/259605. This is a high-priority issue to get fixed, as it shouldn’t have caused problems in the first place. The way the migration was coded makes it very hard to lose data. For those of you who have it in an inconsistent state, it may be that the database has flagged the storage as migrated while the repositories are still located in the legacy storage format, or the opposite: the repositories were migrated but the database update failed for some reason. In any case, it’s possible to get it back to normal.

Please follow the issue to get notified of any solution (we will probably have a patch release for 13.4.x with a fix, but I will also provide instructions on how to fix it manually, so you don’t have to wait).

The OpenSSL::Cipher::CipherError means some encrypted data in the database couldn’t be read with the existing keys in /etc/gitlab/gitlab-secrets.json. The only reason I can think of for this happening is that you have GitLab installed in either HA or sort-of HA, where you are running on multiple nodes to spread the load. In that scenario, if you have Sidekiq on a different node and you have forgotten to copy the secrets there, you may end up with this kind of issue.
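
If you do run more than one node, a quick way to compare the secrets files without printing their contents is to fingerprint each copy and compare the digests by hand. This is a minimal sketch, not an official GitLab tool; `secrets_fingerprint` is a hypothetical helper and it needs to run as a user that can read the file.

```ruby
require 'digest'

# Fingerprint a secrets file so copies on different nodes can be compared
# without revealing the secrets. Returns nil if the file is missing or
# not readable by the current user.
def secrets_fingerprint(path = '/etc/gitlab/gitlab-secrets.json')
  return nil unless File.readable?(path)

  Digest::SHA256.file(path).hexdigest
end

fingerprint = secrets_fingerprint
puts fingerprint || 'secrets file not found or not readable on this node'
```

Run it on each node; differing digests would suggest the secrets were not copied over.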

@arhi Could you please provide additional insights here: https://gitlab.com/gitlab-org/gitlab/-/issues/259605#note_423592490 ?

For those of you having issues related to OpenSSL::Cipher::CipherError, please look at the documentation here: https://docs.gitlab.com/ee/administration/raketasks/doctor.html#verify-database-values-can-be-decrypted-using-the-current-secrets