Issues with GitLab EE LFS Storage Backend Using MinIO: An Infinite Upload Loop

Introduction

In our pursuit of enhancing our project’s version control capabilities with GitLab EE, we chose MinIO as the backend storage for Git Large File Storage (LFS), attracted by its high-performance, S3-compatible object storage. This setup was intended to manage our large binaries and assets efficiently. However, an unexpected and rather perplexing issue arose, specifically during the handling of LFS objects.

Configuration Overview

Our integration configures GitLab EE to use MinIO as the LFS storage backend, motivated by MinIO's scalability and S3 compatibility, which promised smooth and efficient management of large files. Here are the specifics of our setup:

  • GitLab EE Version: [Specify version]
  • MinIO Version: [Specify version]
  • Storage Configuration: Configured to store LFS objects in MinIO, with bucket policies and IAM roles set up for access management (a minimal connectivity check is sketched after this list).
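
Since the exact connection settings matter here, a quick standalone check against the bucket can confirm that the endpoint, credentials, and addressing style GitLab is given actually work. The boto3 sketch below is only illustrative: the endpoint, credentials, region, and bucket name are placeholders, and path-style addressing is assumed because that is what MinIO deployments typically expect.

```python
import boto3
from botocore.config import Config

# Placeholder values -- the endpoint, credentials, region, and bucket name are
# assumptions and should match whatever GitLab's LFS object storage points at.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    region_name="us-east-1",
    config=Config(s3={"addressing_style": "path"}),  # path-style requests for MinIO
)

bucket = "gitlab-lfs"

# If either call fails, the problem is connectivity, credentials, or bucket
# naming rather than GitLab's LFS handling itself.
s3.head_bucket(Bucket=bucket)
resp = s3.list_objects_v2(Bucket=bucket, MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```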

Encountered Issue

The core issue emerges during the LFS object handling phase, specifically when GitLab performs CopyObjectPart operations against MinIO. Although the multipart uploads succeed and new multipart upload sessions are initiated without error, the CopyObjectPart operations consistently fail with 404 Not Found errors, indicating that the object parts targeted for copying apparently do not exist in the MinIO bucket, despite the earlier successful uploads.
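
To narrow this down, the failing server-side copy can be reproduced outside of GitLab. The sketch below (boto3, with placeholder endpoint, credentials, bucket, and keys) starts a multipart upload and issues a single UploadPartCopy, which is the S3 API name for what MinIO logs as CopyObjectPart, using an object that is known to exist as the source:

```python
import boto3
from botocore.config import Config

# Placeholder endpoint, credentials, bucket, and keys -- adjust to your deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),
)

bucket = "gitlab-lfs"
source_key = "path/to/an-object-known-to-exist"
target_key = "debug/copy-part-test"

# Start a multipart upload, then ask MinIO to copy the source object into part 1.
# This exercises the same server-side operation that returns 404 in our logs,
# but without GitLab in the middle.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=target_key)
try:
    part = s3.upload_part_copy(
        Bucket=bucket,
        Key=target_key,
        UploadId=mpu["UploadId"],
        PartNumber=1,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    print("UploadPartCopy succeeded:", part["CopyPartResult"]["ETag"])
finally:
    # Clean up the test session either way.
    s3.abort_multipart_upload(Bucket=bucket, Key=target_key, UploadId=mpu["UploadId"])
```

If this standalone copy also returns 404 for a key that a plain head_object can see, the problem is on the MinIO side rather than in the sequence of requests GitLab sends.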

Additionally, a more critical manifestation of this issue is observed on the client side: an infinite upload loop. A 7GB file, for instance, has accumulated over 100GB of data transfer due to continuous re-uploads, with no successful completion in sight. This behavior not only disrupts our storage management but also inflates our data transfer volumes unnecessarily.

Sequence of Events

  1. Infinite Upload Loop: The client repeatedly uploads parts of a large file, leading to over 100GB of data transferred for a 7GB file, without successful completion.
  2. Failed Copy Operations: Logs show multiple CopyObjectPart operations with 404 Not Found responses, suggesting the non-existence of the specified parts in MinIO.
  3. Session Abortion and Restart: Following these failures, an AbortMultipartUpload operation is executed to clean up, followed by a NewMultipartUpload that retries the upload process (a sketch for inspecting the in-progress sessions follows this list).
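
To see how many of these restarted sessions MinIO still tracks, the in-progress multipart uploads on the LFS bucket can be listed directly; again a boto3 sketch with placeholder endpoint, credentials, and bucket name:

```python
import boto3
from botocore.config import Config

# Placeholder endpoint, credentials, and bucket -- adjust to your deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),
)

bucket = "gitlab-lfs"

# Multipart sessions that were started but never completed or aborted stay
# listed here; a steadily growing list is consistent with the retry loop above.
resp = s3.list_multipart_uploads(Bucket=bucket)
for upload in resp.get("Uploads", []):
    print(upload["Initiated"], upload["Key"], upload["UploadId"])
```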

Troubleshooting Steps

To address these issues, we undertook the following steps:

  • Object Keys and Paths Verification: Checked that object keys and storage paths are configured correctly (a key-level check is sketched after this list).
  • Permissions Review: Ensured appropriate access permissions for GitLab instances on MinIO buckets.
  • Configuration Audit: Revisited GitLab and MinIO settings to identify any misconfigurations.
  • Logs Analysis: Investigated GitLab and MinIO logs for patterns that might explain the errors and the infinite upload loop.
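
As part of the key and permission verification, individual LFS objects can be spot-checked directly in the bucket. The boto3 sketch below assumes the bucket mirrors GitLab's on-disk LFS layout, with the SHA256 OID split as <oid[0:2]>/<oid[2:4]>/<oid[4:]>; that layout is an assumption, so it falls back to a prefix listing as well. Endpoint, credentials, bucket name, and OID are placeholders.

```python
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Placeholder endpoint, credentials, and bucket -- adjust to your deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),
)

bucket = "gitlab-lfs"
oid = "<sha256-oid-of-an-affected-lfs-object>"  # placeholder

# Assumed key layout, mirroring GitLab's local shared/lfs-objects structure.
candidate_key = f"{oid[0:2]}/{oid[2:4]}/{oid[4:]}"

try:
    meta = s3.head_object(Bucket=bucket, Key=candidate_key)
    print("Found:", candidate_key, meta["ContentLength"])
except ClientError as exc:
    print("head_object failed:", exc.response["Error"]["Code"])

# Fallback: list whatever is stored under the OID's two-character prefix.
resp = s3.list_objects_v2(Bucket=bucket, Prefix=oid[0:2] + "/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"], obj["LastModified"])
```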

Seeking Community Insights

Despite extensive troubleshooting, the reasons behind the CopyObjectPart failures and the infinite upload loop remain unclear. We speculate it might involve timing issues, complex permission settings, or specific aspects of the GitLab-MinIO integration, especially concerning LFS handling.

We appeal to the community for any insights, shared experiences, or suggestions that could help resolve these issues. Any advice on halting the infinite upload loop, correcting the copy operation failures, or optimizing our configuration would be highly valued.

Conclusion

Integrating MinIO with GitLab LFS appeared as a promising approach for efficient large asset management. Yet, the challenges we’ve encountered highlight the intricacies of such integrations. We are hopeful that with the community’s support, we can overcome these obstacles for a seamless and efficient workflow.

I encountered the same problem, and I can provide some potentially useful information. The file was actually uploaded successfully, but it was then deleted, which is why the 404 error was reported. However, I don’t understand why the file was deleted.
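
One thing that may be worth ruling out, and this is purely an assumption on my part, is whether a bucket lifecycle (ILM) rule or the versioning setting on the LFS bucket is expiring objects behind GitLab's back. A boto3 sketch with placeholder endpoint, credentials, and bucket name:

```python
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Placeholder endpoint, credentials, and bucket -- adjust to your deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),
)

bucket = "gitlab-lfs"

# Any lifecycle rule with an Expiration could explain objects disappearing
# after a successful upload.
try:
    rules = s3.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    for rule in rules:
        print(rule.get("ID"), rule.get("Status"), rule.get("Expiration"))
except ClientError as exc:
    # With no rules configured, MinIO typically answers NoSuchLifecycleConfiguration.
    print("No lifecycle configuration:", exc.response["Error"]["Code"])

# Versioning status is also worth noting: with versioning enabled, a "deleted"
# object may just be hidden behind a delete marker.
print(s3.get_bucket_versioning(Bucket=bucket))
```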