Horrendously-Poor S3 Performance for Backup-to-S3

I have a GitLab server I’ve deployed into AWS. It’s eventually going to host a workload that’s currently running in an on-prem environment. I’ve done a test-migration by taking an export of the current, on-prem GitLab server and then reconfiguring the new instance (and running the gitlab-rake gitlab:<TYPE>:migrate tasks) to place all of my BLOB content in S3.

While the …:migrate tasks were pretty far from anything resembling “blazingly-fast”, they weren’t fatally-slow for the amount of data that needed to be moved.

That said, I’ve also got backups configured to go to S3. Unfortunately, it seems like the S3 upload of the resultant (backup) TAR-file is ungodly slow. The backup initially generates and stages a TAR-file that’s nearly 400GiB in size to local storage. After it finishes creating the locally-staged copy, it pauses at:

Creating backup archive: 1725035175_2024_08_30_17.2.2-ee_gitlab_backup.tar ...

For the first several minutes, I’d assumed it was simply doing a file-validation before bothering to upload. However, 90+ minutes into this pause, I decided to see what it was actually doing. Busted out netstat and found:

# netstat -4Wp | grep s3
tcp        0 3421044 ip-10-200-3-16.cirrus.aws.abinitio.com:40490 s3-1-w.amazonaws.com:https ESTABLISHED 238802/rake gitlab:

# ps -fp 238802
UID          PID    PPID  C STIME TTY          TIME CMD
git       238802  238801 31 16:25 pts/2    00:29:45 /opt/gitlab/embedded/bin/rake gitlab:backup:create STRATEGY=copy DIRECTORY=20240
#

When using the AWS CLI (with tuning in place), I would typically see multiple connections to S3. However, as you can see above, there’s only one. All of which is to say that, in spite of the documentation seeming to indicate that GitLab’s S3 client would attempt multi-part uploads, it appears to only be doing a single-stream upload.

Previously, when I was trying to do a data-migration to this instance via the AWS CLI, I’d also gotten miserable performance, even with my chunking-factor and concurrency tuned. That said, it was only an hour’s worth of transit time, and netstat showed the expected number of streams. It wasn’t until I reached out to AWS and they recommended setting the CLI’s preferred_transfer_client parameter to crt and its target_bandwidth parameter to 10GB/s that I got both the parallelism and the aggregate throughput I was expecting.
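
For anyone else chasing the same problem, those two CLI-side settings boil down to something like the following (<PROFILE> is a placeholder for whichever profile-section you're targeting, and the target_bandwidth value is simply what AWS suggested for my instance-type):

aws configure set <PROFILE>.s3.preferred_transfer_client crt
aws configure set <PROFILE>.s3.target_bandwidth 10GB/s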

At any rate… I’m not seeing it in the documentation, but are there any methods for tuning GitLab’s S3 upload configuration similar to how the AWS CLI can be tuned? I mean, even if leveraging the CRT uploader isn’t an option, it’d be nice if I could at least force it to max out the concurrency and set an appropriate chunking-factor (just for backups).
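
For reference, the backup-upload side of things is just the stock Omnibus settings in /etc/gitlab/gitlab.rb, along these lines (bucket and region shown here are placeholders rather than the real values):

# Roughly what the backup-upload stanza looks like (placeholder values)
gitlab_rails['backup_upload_connection'] = {
  'provider'        => 'AWS',
  'region'          => '<REGION>',
  'use_iam_profile' => true
}
gitlab_rails['backup_upload_remote_directory'] = '<BACKUP-BUCKET>'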

I mean, worst case, I can reconfigure S3-based backups out of GitLab and just execute a copy using the AWS CLI. But I was hoping to avoid that.
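
If I do end up going that route, it’d look something like the following, with the remote-upload settings removed from gitlab.rb so the backup only stages locally (bucket and profile names are placeholders; /var/opt/gitlab/backups is just the Omnibus default backup location):

# Stage the backup archive locally, then push it with the tuned/CRT-enabled AWS CLI
gitlab-backup create STRATEGY=copy
aws s3 cp /var/opt/gitlab/backups/<TIMESTAMP>_gitlab_backup.tar \
    s3://<BACKUP-BUCKET>/ --profile <PROFILE>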

Host details:

  • GitLab CE: 17.2.2 (to match the migration-source)
  • Host OS: Red Hat Enterprise Linux 9.4
  • Hosting Environment: AWS
  • VM-Type: m7i.4xlarge
    • CPU: 16 (/proc/cpuinfo shows: Intel(R) Xeon(R) Platinum 8488C @ 2.4GHz)
    • RAM: 64GiB
    • Disk: 1000GiB (gp3 @ 4000 IOPS and 1000MiB/s throughput)

Not long after posting this, it finally finished. But, yeah, two-ish hours to transfer a 365GiB tar-file to S3. For comparison, if I use the AWS CLI to do the same:

  • It takes about half an hour with a concurrency of 40 and a chunking-factor of 55MiB (settings sketched just after this list)
  • It takes less than ten minutes if I enable the AWS CLI’s Common Runtime (CRT) S3 client
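
For reference, the half-hour number came from the classic transfer client tuned roughly like so (<PROFILE> is, again, a placeholder):

aws configure set <PROFILE>.s3.max_concurrent_requests 40
aws configure set <PROFILE>.s3.multipart_chunksize 55MB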

Note that, while the link is to the Java SDK’s documentation, the CRT is also available to other implementations; for the AWS CLI, it’s enabled by running aws configure set <PROFILE>.s3.preferred_transfer_client crt