I run GitLab on a Kubernetes cluster with the backup cronjob enabled to run once a day.
A few days ago, the cronjob started failing on every run. I ran the backup command manually in the task-runner, and it fails while backing up the uploads with the following message:
CommandException: 1 files/objects could not be copied/removed.
The list of files being backed up is absolutely enormous and the error does not give any indication about WHICH file it’s failing to copy, or why.
I have now added --skip uploads to the backup job for the time being, but that is not a solution.
The uploads are stored in a Google Cloud Storage bucket.
This error started happening suddenly. One day everything was fine, the next - this. Nothing changed in any configuration.
How can I figure out which file is failing here? And does this mean there is a reference to a nonexistent file somewhere in GitLab, or that the file exists but cannot be read?