Upgrade to GitLab 13.4.0 (b0481767fe4) killed all repositories

A yum update automatically upgraded gitlab-ci to 13.4.
After the upgrade finished, no repo is available any more.
I can see the projects listed on the web, but clicking on any of them shows “repository does not exist / create empty repository”.

/var/opt/gitlab/git-data/repositories

Here I can see that all my project directories are empty (I see the group dirs but nothing in them), and there’s a @hashed directory that is the same size as all the other ones were. So it looks like the 13.4 upgrade tried to migrate storage to hashed, but after it moved/renamed/reorganized all the files something died and the process did not finish.

How to get out of this huge trouble?!

[root@git ~]# find / -name repocheck.log -type f
[root@git ~]#

Same here with the Docker Container.

Seems they still exist in @hashed

sidekiq even logged the move, but dunno why it was done in the 1st place.

Seems the update automatically starts a storage migration to hashed storage.

I of course did not check the job list, since I needed to get it back to work. So I did a rollback of our VM.

I unfortunately have only a month-old backup :frowning: so dunno what to do now. Is there any way to restart the migration or to roll it back? All the files were moved to @hashed and all the repos now show as gone :frowning:

Maybe give it some time to complete, there is a counter in the monitoring dashboard thing.

See here: ‘You can monitor the progress in the Admin Area > Monitoring > Background Jobs page. On the Queues tab, you can watch the hashed_storage:hashed_storage_project_rollback queue to see how long the process will take to finish.’

I will try again, likely tomorrow.

Not sure I follow the monitoring pages.

Busy jobs are zero, so what should I be waiting for?

I’ve looked at the gitlab install from @archi - I know him IRL.

Jobs failed with Error Class OpenSSL::Cipher::CipherError

I’ve checked that /etc/gitlab/gitlab-secrets.json, gitlab.yml and /var/opt/gitlab/gitlab-rails/etc/secrets.yml are present. gitlab-rake check reports no problem. gitlab-ctl reinstall doesn’t alter the secrets. A restart doesn’t help. cache:clear doesn’t help. Running the hashed jobs again from rake restarts the tasks, but they fail again.

I’m out of ideas. I’m guessing that the certs fail to verify since the dirs are empty - and they skip the next step, since the data has been moved to @hashed. Still, there are no links in the database, so the data is just “hanging”…

I have the same problem. I didn’t move to hashed storage before upgrading,
and when I try to move now, the jobs fail with the Cipher Error.

@davodm when it crashed, what state was your system in? Here the files were all moved from …/repositories/groupname/projectname/* to …/repositories/@hashed/somegarblednames, and when you click on any repo on the web it shows that the repo does not exist.
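
For anyone trying to match the garbled names under @hashed back to projects: as far as I know from the hashed storage docs, the directory name is the SHA-256 of the project ID (as a decimal string), filed under its first two and next two hex characters. A minimal sketch, assuming `sha256sum` from coreutils is installed; the function name `hashed_repo_path` is made up for illustration and 240 is just an example ID:

```shell
#!/bin/sh
# hashed_repo_path ID: print the expected hashed-storage path for a project ID,
# relative to the repositories directory. Illustrative helper, not a GitLab tool.
hashed_repo_path() {
    # SHA-256 of the ID as text, without a trailing newline
    hash=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
    p1=$(printf '%s' "$hash" | cut -c1-2)
    p2=$(printf '%s' "$hash" | cut -c3-4)
    printf '@hashed/%s/%s/%s.git\n' "$p1" "$p2" "$hash"
}

hashed_repo_path 240
```

If the printed path exists under /var/opt/gitlab/git-data/repositories, that directory should be the repository for that project ID, which at least tells you which data belongs to which project.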

trying to revert this %#%^@# back to usable state but 24 hours later and we are still not operational :frowning: … I fear we lost a month of work (backup is 1month old)

For me the HashedStorage::MigratorWorker jobs fail with

"error_message":"2:NoMethodError: undefined method `relative_path' for nil:NilClass."

Is there a way to find the dead jobs and rerun them? (if we work out how to fix them?)

I tried everything that came to mind without success. I can’t revert back to the old storage, nor can I get this new hashed storage to work. Some 10+ repositories are offline and the backup is a month old :frowning: … so I’m trying to figure out how to move forward or backwards, but for now nothing :frowning:

Now’s a great time to back up what you have to Amazon AWS S3 storage.

I host 50 GB of git repos on it for about $1.50/month via the Infrequent Access tier. If you store it in Glacier, it’ll cost 1/2 that.

I use s3fs-fuse to mount it directly to /storage/s3 and then use rsync via an hourly cronjob.
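
A rough sketch of that hourly sync, assuming the bucket is already mounted at /storage/s3 via s3fs and that the data to copy lives in /var/opt/gitlab/backups (adjust if yours differs). The function name, the `gitlab-backups` subdirectory and the script path in the cron line are placeholders. It refuses to run when the mount is missing, so rsync doesn't quietly fill the local disk instead:

```shell
#!/bin/sh
# mirror_backups SRC MOUNT: copy SRC into MOUNT/gitlab-backups/, but only if
# MOUNT is really a mount point (i.e. the s3fs mount is up).
mirror_backups() {
    src=$1
    mount=$2
    if ! mountpoint -q "$mount"; then
        echo "$mount is not mounted, skipping sync" >&2
        return 1
    fi
    mkdir -p "$mount/gitlab-backups" &&
        rsync -a "$src"/ "$mount/gitlab-backups"/
}

# Example hourly crontab entry (hypothetical script path):
#   0 * * * * /usr/local/bin/gitlab-s3-sync.sh
# where the script ends with:
#   mirror_backups /var/opt/gitlab/backups /storage/s3
```

This only copies whatever is in the source directory, so it is worth checking now and then that the archives there actually contain the repositories.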

Update from me. This time I did the migration manually before updating and that went fine. So updating and letting it do the auto migration seems to be the problem.

yes, it was in the queue to setup daily backups offsite … but git backups were not working ok past month as all backups are without repositories :frowning: only DB was backed up :frowning: … and I set the VM to be backed up only once a month (that’s why I have a backup that’s a month old) …

anyhow I can’t believe there is no way to somehow get this hashed directory to revert back to old storage ?!?!?!?

Did you try the rollback command in the migration documentation?
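
In case it helps, the rollback in the docs is a rake task, `gitlab:storage:rollback_to_legacy`, which takes an ID_FROM / ID_TO range of project IDs. A small sketch that only prints the command unless RUN=1 is set, so you can try it on a single project first. The wrapper function is my own, not something GitLab ships, 42 is a placeholder project ID, and I can't promise the rollback jobs won't hit the same errors as the migration:

```shell
#!/bin/sh
# rollback_to_legacy FROM [TO]: dry run by default, prints the command.
# Set RUN=1 in the environment to actually execute it.
rollback_to_legacy() {
    from=$1
    to=${2:-$1}   # a single ID means a range of one project
    cmd="gitlab-rake gitlab:storage:rollback_to_legacy ID_FROM=$from ID_TO=$to"
    if [ "${RUN:-0}" = "1" ]; then
        sudo $cmd
    else
        echo "sudo $cmd"
    fi
}

rollback_to_legacy 42
```

The task only schedules background jobs, so the result still has to be checked in Admin Area > Monitoring > Background Jobs afterwards.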

We have quite a few projects (~3600; a few of them were created after hashed storage became the choice for new projects) that will need conversion.

I’m trying to get an idea of how long that will take, is there any good way to get such an idea?

Even though I know that there’s a lot of local factors, can one of you that have been through this share some numbers on how many projects they have, and how long it took?

@arhi I had the same issue as you, and I was able to solve it with the following steps.

Check the project ID.

You can get your project ID in the Admin Area > Project > [your project] > ID

Migrate to hashed storage

To migrate to hashed storage, type the gitlab-rake command as follows.

sudo gitlab-rake gitlab:storage:migrate_to_hashed ID_FROM=240 ID_TO=240

Ref: https://docs.gitlab.com/ee/administration/raketasks/storage.html#migrate-to-hashed-storage

Did I answer your question?

@steep_suzuki: If you’re asking me: no. I had seen that point in the documentation and, except that I got the project IDs from the project home page rather than through the admin area, I had already done that for some of my own projects (where downtime was expected, plus, due to a typo, some other projects that nobody has complained about). But I’ve measured timings from 1/16th of a second to 50 seconds per project. Scale that to 3600 projects and I get estimates ranging from 3-4 minutes to 2-3 days. As that is quite a wide range, I would like more knowledge/measurements/…
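
The arithmetic behind those two extremes, as a quick sketch. The function name is made up; it just multiplies the project count by the seconds per project and assumes the migrations run strictly one after another:

```shell
#!/bin/sh
# estimate_minutes PROJECTS SECONDS_PER_PROJECT: total sequential time in minutes.
estimate_minutes() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n * s / 60 }'
}

estimate_minutes 3600 0.0625   # fastest observed: 1/16 s per project
estimate_minutes 3600 50       # slowest observed: 50 s per project
```

That gives 3.75 minutes at the fast end and 3000 minutes (50 hours, a bit over 2 days) at the slow end, so the real number depends almost entirely on how the per-project times are distributed.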

I don’t mind; if it works, I will do it manually. It’s not a problem for me, I have under 50 repos.