This issue was first published at serverfault.com/questions/976709/how-does-gitlab-ensure-that-a-generated-backup-archive-embodies-a-clean-state-of.
When you ask a running GitLab instance to generate a full backup archive with the
gitlab-rake gitlab:backup:create command:
- Does GitLab do anything to freeze the application state?
- Is there any risk of getting a technically working backup that embodies an inconsistent state?
- What happens when new commits are pushed while the backup is being generated?
- Generally speaking, what can happen if a modification is initiated during the backup?
- Is there any cache that queues changes to apply to the database or to write to files/repositories?
At the moment I have no idea what happens when a repository is archived while it is being modified, or when a database is backed up while it is running transactions.
I read through GitLab's backup code today (gitlab.com/gitlab-org/gitlab-ce/tree/master/lib/backup) but could not find any answers to my questions. I do not code in Ruby, so that doesn’t help me…
GitLab just runs the
tar command on the files to back up.
In the GitLab documentation docs.gitlab.com/ee/raketasks/backup_restore.html#backup-strategy-option it is stated that:
When data changes while tar is reading it, the error "file changed as we read it" may occur, and will cause the backup process to fail. To combat this, 8.17 introduces a new backup strategy called copy. The strategy copies data files to a temporary location before calling tar and gzip, avoiding the error.
The STRATEGY=copy argument makes gitlab-rake gitlab:backup:create run an rsync -a command to copy all files before creating the archive with tar.
As I understand the documentation, using the
copy strategy means GitLab will never produce a technically corrupted archive and will never fail to create it. I assume this strategy ensures that the generated archive is restorable, but what about the consistency of the data?
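To make the difference between "restorable" and "consistent" concrete, here is a minimal sketch of the copy-then-archive idea described in the documentation quote. It is not GitLab's code: shutil.copytree stands in for rsync -a, and the function name and paths are hypothetical.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def backup_with_copy_strategy(data_dir: Path, archive_path: Path) -> None:
    """Copy data_dir to a temporary location, then archive the copy.

    Hypothetical illustration only; GitLab itself uses rsync, not copytree.
    """
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging"
        # Step 1: copy the live files. This step is not atomic: files are
        # read one after another, so a write that lands mid-copy can leave
        # the staged tree mixing older and newer content.
        shutil.copytree(data_dir, staging)
        # Step 2: archive the now-static copy, so the archiver never sees a
        # file change underneath it.
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(staging, arcname=data_dir.name)
```

In this sketch the second step can no longer hit a file that changes while it is being read, which is the failure the copy strategy is documented to avoid. The first step still reads a live directory, which is exactly where my consistency question comes from.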
Can we make sure the backup archive embodies a consistent/clean snapshot state of the GitLab instance?
I cannot find any information in the documentation in this regard.
I do want to back up GitLab with no interruption.
I know I could stop GitLab for a few seconds and snapshot the LVM volume or filesystem instead of using the integrated backup mechanism, but I do not want to interrupt GitLab.
You can run a GitLab backup after stopping all services except the
postgresql one, so that no modification can occur while the integrated mechanism is backing up, but you still have to black out the service for your users for some time.
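The stop-then-back-up approach can be sketched as an ordered list of commands. This is only an outline under assumptions: the service names passed in are placeholders (the real set depends on the GitLab version and installation, see gitlab-ctl status), and plan_offline_backup is a hypothetical helper that builds the commands without running them.

```python
from typing import List


def plan_offline_backup(services_to_stop: List[str]) -> List[List[str]]:
    """Return the commands for a backup taken while writers are stopped.

    The caller decides which services to stop; postgresql must stay up
    because the backup task dumps the database.
    """
    # Stop each service that could modify data during the backup.
    commands = [["gitlab-ctl", "stop", name] for name in services_to_stop]
    # Run the integrated backup while nothing is writing.
    commands.append(["gitlab-rake", "gitlab:backup:create"])
    # Bring every service back up again.
    commands.append(["gitlab-ctl", "start"])
    return commands


if __name__ == "__main__":
    # Placeholder service names, for illustration only.
    for command in plan_offline_backup(["unicorn", "sidekiq"]):
        print(" ".join(command))
```

A real script would also need to restart the services if the backup step fails. The downtime between the first stop and the final start is the blackout I would like to avoid.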
Bonus: My questions also apply to snapshotting the LVM volume or filesystem!