Problem restoring backup

I need to migrate my GitLab Enterprise server to a new server. The old one is on CentOS 6 and we’re migrating to CentOS 8. Both servers are running gitlab-ee-13.3.5-ee.0.el6.x86_64 (installed with Omnibus yum/dnf). Both servers share a mounted SAN volume on /var/opt/gitlab/backups.

For the backup I used the following commands, and they ran without errors.
gitlab-ctl stop puma
gitlab-ctl stop sidekiq
gitlab-ctl status
gitlab-backup create SKIP=tar BACKUP=backup-$(date +"%Y%m%d-%H%M%S") GZIP_RSYNCABLE=yes

For the restore, it’s a fresh install and I’m using the following commands.
gitlab-ctl stop unicorn
gitlab-ctl stop nginx
gitlab-ctl stop puma
gitlab-ctl stop sidekiq
gitlab-ctl status
gitlab-backup restore BACKUP=backup-20200915-160212 SKIP=tar

No matter how long I let this run, this is the only output that ever appears. I have also tried running without SKIP=tar; that takes even longer and never gets any further.
Transfering ownership of /var/opt/gitlab/gitlab-rails/shared/registry to git

Hi @mattshields74, welcome to the GitLab Community forum!

It sounds like the restore always hangs and never gets past this part of the restore process: Transfering ownership of /var/opt/gitlab/gitlab-rails/shared/registry to git

Can you verify whether the GitLab Container Registry is enabled on the old CentOS 6 server and the new CentOS 8 machine?

gitlab-ctl status | grep registry

If you see gitlab-registry running on the new machine but not the old one, this could cause the restore process to fail.

If this is the case, I suggest one of the following options:

  • disable the container registry on the new CentOS8 box by setting registry['enabled'] = false in your /etc/gitlab/gitlab.rb file, run gitlab-ctl reconfigure to apply the changes, and retry the restore process
  • you can also try gitlab-backup create SKIP=registry to create a fresh backup, and use that to restore your data on the new machine.
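Putting the first option together end to end, the steps look roughly like this (a sketch; it assumes registry['enabled'] isn’t already set elsewhere in gitlab.rb, and reuses the BACKUP timestamp from your restore command as an example):

```shell
# Disable the Container Registry on the new CentOS 8 box
# (appending to gitlab.rb; assumes the key isn't already set elsewhere)
echo "registry['enabled'] = false" | sudo tee -a /etc/gitlab/gitlab.rb

# Apply the change
sudo gitlab-ctl reconfigure

# Confirm gitlab-registry is no longer listed as running
sudo gitlab-ctl status | grep registry

# Retry the restore
sudo gitlab-backup restore BACKUP=backup-20200915-160212 SKIP=tar
```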

If the registry is in sync (enabled on both or disabled on both), can you share the last ~10 lines of terminal output before you see the backup process hang or fail?


I was able to disable the Registry on the new server, since it’s not enabled on the old one. I then generated a new backup. The current problem is that when I kick off the restore, I can’t tell whether it’s doing anything: it has now been running for over 14 hours with no output so far. Here’s the size of the backup. Note, I’ve also tried with and without the “pigz” patch recommended here: https://gitlab.com/gitlab-org/gitlab/-/issues/17197#note_423356310

772G artifacts.tar.gz
4.0K backup_information.yml
3.6M builds.tar.gz
815M db
12K lfs.tar.gz
19M pages.tar.gz
2.6G repositories
30M uploads.tar.gz

@mattshields74 It appears you have over 772G of artifacts on your server; I suspect that is causing the holdup. The backup create command can take a while to back up 1TB+ of data.

To find what is taking up so much space, you can run du /var/opt/gitlab/gitlab-rails/shared/artifacts | sort -n. Just guessing, I suspect large build artifacts from old CI jobs are being saved every time the CI pipeline runs.

If you did not cancel the backup create command manually, it is most likely still running in the background.

You can verify whether the backup is still running with ps aux | grep gitlab-backup | grep -v grep or watch -d ls -al /var/opt/gitlab/backups/. If it’s still running, I suggest letting it complete.
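From a second terminal, a few ways to watch for progress when the command itself prints nothing (a sketch; the data paths are the Omnibus defaults):

```shell
# Is the backup/restore rake task still alive?
ps aux | grep gitlab-backup | grep -v grep

# Watch the backup directory for size/timestamp changes (highlights diffs)
watch -d ls -al /var/opt/gitlab/backups/

# During a restore, watch the data directories being repopulated
# (default Omnibus locations for repositories and shared data)
sudo watch -d du -sh /var/opt/gitlab/git-data /var/opt/gitlab/gitlab-rails/shared
```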

If the backup command is not running, I suggest splitting the backup into two commands that can run in parallel:

sudo gitlab-backup create SKIP=artifacts
and
sudo gitlab-backup create SKIP=db,uploads,repositories,builds,lfs,registry,pages

If you’d like to simplify the backup process and keep all 700GB+ of artifacts intact, you might also consider alternative backup strategies: https://docs.gitlab.com/ee/raketasks/backup_restore.html#alternative-backup-strategies
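As one example of such a strategy, you could skip the artifacts in the regular backup and copy that directory separately, e.g. with rsync (a rough sketch; newserver and the destination path are placeholders, and the source path is the default Omnibus artifacts location):

```shell
# Regular backup without the 772G of artifacts
sudo gitlab-backup create SKIP=artifacts

# Copy the artifacts directory to the new server separately,
# preserving permissions/ownership and showing overall progress
sudo rsync -aH --info=progress2 \
    /var/opt/gitlab/gitlab-rails/shared/artifacts/ \
    newserver:/var/opt/gitlab/gitlab-rails/shared/artifacts/
```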

The backup seems to run fine, it takes about 12 hours. It’s the restore that sits there and appears to do nothing. Let me try splitting the restore using the skip method.