I run GitLab in a Kubernetes cluster, and every night a backup script creates archives of my data. I've noticed very strange behavior: in the final stage of the backup, after the tarball is created, the upload to storage times out. This triggers a retry, which eventually succeeds after 2-6 attempts.
In the end, all of the uploads go through, and I'm left with duplicate files in object storage.
Has anyone run into this and found a solution?
I'm assuming your backup script is something that doesn't ship with GitLab by default. Could you show the commands the backup script uses to upload the file to your storage, so we can take a look?
For example, if wget is being used, you need the -c parameter so that it resumes the partially transferred file it failed on instead of starting from scratch. Otherwise wget saves a new copy each time: filename.tar.1, filename.tar.2, and so on. But without seeing the backup script, I can't say whether that's the situation here.
curl's -C - allows you to continue a download/upload which has failed, at least according to the man page. It's worked for me when testing a download via an HTTP/HTTPS URL. Don't forget the space and the minus sign after -C, as the minus is what tells curl to work out the resume offset on its own.
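To illustrate, here's a minimal retry-wrapper sketch. The `upload_backup` function is a hypothetical stand-in for whatever command the backup script actually runs (curl, `aws s3 cp`, etc.), and the URL in the comment is made up. The key point for the duplicate-file problem is to retry the *same* object name every time, so a late-succeeding attempt overwrites rather than creating a new copy:

```shell
# Stand-in for the real upload command. If it's curl, something like:
#   curl --fail -C - -T "$1" "https://storage.example.com/backups/$1"
# -C - asks curl to continue a partial transfer (per the curl man page).
upload_backup() {
    echo "uploading $1"
}

attempts=0
max_attempts=5
# Retry the same file (same object key) until the upload succeeds.
until upload_backup gitlab_backup.tar; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$max_attempts" ]; then
        echo "giving up after $attempts retries" >&2
        exit 1
    fi
    sleep 10   # back off a little before retrying
done
echo "upload ok after $attempts retries"
```

Note that curl also has built-in `--retry N` / `--retry-delay SECS` options, which may be simpler than an external loop if curl is what the script uses.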
Were those two backup runs done manually, or also via cron? If manually, then perhaps something is happening during the night with the Kubernetes cluster or the network that is causing your retries. Worth checking out, as it seems strange otherwise.
Well, the problem still remains. You're probably right that something is going on with my servers at night that causes the slow or dropped connections.