How does GitLab ensure that a generated backup archive embodies a clean state of the application?

This issue was first published at serverfault.com/questions/976709/how-does-gitlab-ensure-that-a-generated-backup-archive-embodies-a-clean-state-of.

When you ask a running GitLab instance to generate a full backup archive with the gitlab-rake gitlab:backup:create command:

  • Does GitLab do anything to freeze the application state?
  • Is there any risk of getting a technically working backup that embodies an inconsistent state?

In detail:

  • What happens when new commits are pushed while the backup is being generated?
  • Generally speaking, if any modification is initiated during the backup what can happen?
  • Is there any cache that queues changes to apply to the database or to write to files/repositories?

At the moment I have no idea what happens when you archive a repository that is being modified, or when a backup is taken of a database that is running transactions.


I read through the GitLab backup code today (gitlab.com/gitlab-org/gitlab-ce/tree/master/lib/backup) but could not find any answer to my questions. I do not code in Ruby, so that doesn’t help me…

GitLab just runs the tar command on the files to back up.

The GitLab documentation (docs.gitlab.com/ee/raketasks/backup_restore.html#backup-strategy-option) states that:

When data changes while tar is reading it, the error "file changed as we read it" may occur, and will cause the backup process to fail. To combat this, 8.17 introduces a new backup strategy called copy. The strategy copies data files to a temporary location before calling tar and gzip, avoiding the error.

The STRATEGY=copy argument makes gitlab-rake gitlab:backup:create run an rsync -a command to copy all files before creating the archive with tar.
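
As a rough illustration of what the copy strategy amounts to, here is a minimal Python sketch of the copy-then-archive idea. It is not GitLab's code (GitLab's backup is written in Ruby and, as described above, calls rsync -a and tar); the function name archive_with_copy and its arguments are hypothetical. Note that the copy step in this sketch is not atomic either: it only keeps tar from seeing files change underneath it.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def archive_with_copy(source_dir, archive_path):
    """Copy source_dir to a temporary location, then archive the copy.

    Hypothetical helper for illustration only; not part of GitLab.
    """
    source = Path(source_dir).resolve()
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / source.name
        # Step 1: copy the live files (the role rsync -a plays in GitLab).
        # This copy is not atomic: files may still change while it runs.
        shutil.copytree(source, staged, symlinks=True)
        # Step 2: archive the static copy, so the archiver never reads
        # a file that is changing at that moment.
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(staged, arcname=source.name)
    return archive_path
```

The sketch shows why this approach avoids the tar error without, by itself, saying anything about whether the copied files form a consistent snapshot.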

As I understand the documentation, using the copy strategy means GitLab will never produce a technically corrupted archive and will never fail while creating it. I assume this strategy ensures that the generated archive is restorable, but what about the consistency of the data?

Can we make sure the backup archive embodies a consistent/clean snapshot state of the GitLab instance?

I cannot find any information in the documentation in this regard.


I do want to back up GitLab with no interruption.

I know I could stop GitLab for a few seconds and snapshot the LVM volume or filesystem instead of using the integrated backup mechanism, but I do not want to interrupt GitLab.

You can run a GitLab backup after stopping all services except PostgreSQL, so that no modification can occur while the integrated mechanism is backing up, but you still have to black out the service for your users for some time.


Bonus: My questions also apply to snapshotting the LVM volume or filesystem!

The backup rake task backs up your:

  • Database
  • Attachments
  • Git repositories data
  • CI/CD job output logs
  • CI/CD job artifacts
  • LFS objects
  • Container Registry images
  • GitLab Pages content

The backup process creates background jobs in Sidekiq. Sidekiq also queues migrations. This avoids having to “freeze” the application state, requiring downtime, or risking an inconsistent state in the backups.
If the commits/modifications/changes make it in before the queued backup job for that repository starts, they will be backed up. If the queued background job executes before the commits are made, those commits will not be part of the backup.

Can we make sure the backup archive embodies a consistent/clean snapshot state of the GitLab instance?

Yes. This is the default for all the items listed above. The only exception would be if the backup rake task hit an error and did not complete successfully. This would be clearly indicated in the terminal output.

Note that GitLab’s backup rake task does not back up any configuration files, secrets, SSL certificates, or system files. Your database contains encrypted information, and storing encrypted information in the same place as its key defeats the purpose of using encryption.

At the very minimum, you must separately back up:

  • /etc/gitlab/gitlab-secrets.json
  • /etc/gitlab/gitlab.rb

You can find the details in our documentation on storing configuration files in addition to rake task backups.
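
As a minimal sketch of that separate step, the Python snippet below archives the two files listed above. The helper name archive_config and the destination you pass to it are hypothetical; this is only an illustration, not a GitLab tool, and the resulting archive should be stored somewhere other than the main backup for the reason given above.

```python
import tarfile
from pathlib import Path

# The two files named in the list above.
CONFIG_FILES = ["/etc/gitlab/gitlab-secrets.json", "/etc/gitlab/gitlab.rb"]


def archive_config(archive_path, files=CONFIG_FILES):
    """Write the given configuration files into a gzipped tar archive.

    Hypothetical helper for illustration only; not part of GitLab.
    """
    missing = [f for f in files if not Path(f).is_file()]
    if missing:
        # Fail loudly rather than produce an archive without the secrets.
        raise FileNotFoundError(f"missing configuration files: {missing}")
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in files:
            tar.add(f)
    return archive_path
```

Reading these files normally requires root, so a script like this would have to run with the same privileges as the backup itself.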

Regarding filesystem and/or LVM-level snapshots: this is a viable alternative to the rake task backup which is used for backups by a number of GitLab users. I’m not sure of all the technical details regarding how these methods ensure a consistent/clean snapshot state, but I’d point you to the official documentation for the backup/snapshot method you’re considering as the answer will be the same whether or not GitLab is in the picture.
