Slow checkout when many LFS files need to be copied to working directory

Problem to solve

In a self-managed GitLab instance, we have large projects (research data) with many files; the largest is ~800 GB with ~4500 files. Most of that volume comes from large binary files tracked with Git LFS.

We use our own runner (currently set to execute one job at a time) with the Docker executor and mounted /builds and /cache volumes, so that the projects do not have to be cloned again for every job.

The CI/CD pipeline runs data validation on every push (main branch and MRs). The validation itself is fast, but it needs all files, including the LFS files.

However, the pipeline sometimes takes a long time. I narrowed it down to the commit checkout step that runs before the actual job, for example:

Checking out 9dff2b0c as detached HEAD (ref is refs/merge-requests/8/head)...

The slow pipelines occur when the runner checks out a commit that contains many more large LFS files than the version currently in the working directory. The difference can be hundreds of gigabytes. Although all LFS objects are already in the local .git/lfs storage, the checkout is slow.

There may be no solution to my problem, because the files required by the commit being checked out have to be copied to the working directory, and copying a large volume simply takes time. However, the LFS files appear to be processed one at a time, so there may be room for improvement.
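To judge how much of the checkout time is unavoidable copying, a rough lower bound can be computed from the volume and the disk throughput. The sketch below is only illustrative: the 300 GB volume and the 500 MB/s sustained throughput are hypothetical numbers, not measurements from our setup, and `min_copy_seconds` is a name made up for this example.

```python
def min_copy_seconds(volume_bytes: float, throughput_bytes_per_s: float) -> float:
    """Lower bound for copying `volume_bytes` at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return volume_bytes / throughput_bytes_per_s


GB = 10**9
MB = 10**6

# Hypothetical inputs: 300 GB to materialize, 500 MB/s sustained throughput.
estimate = min_copy_seconds(300 * GB, 500 * MB)
print(f"{estimate / 60:.1f} min")  # prints "10.0 min"
```

If the observed checkout takes much longer than such an estimate for the real volume and disk, the extra time is probably not plain copying.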

I tried to enable more checkout workers with git config checkout.workers 5, but it made no difference. If anything, it seemed to increase the runtime.
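The comparison of worker counts can be made repeatable with a small timing helper. This is a minimal sketch, assuming a local clone in which `small_ref` and `large_ref` are two commits that differ by many LFS files; the function names and both ref parameters are hypothetical. It passes `checkout.workers` as a one-off `-c` override instead of changing the repository config.

```python
import subprocess
import time
from pathlib import Path


def checkout_command(ref: str, workers: int) -> list[str]:
    """Build a `git checkout` command with a one-off checkout.workers override."""
    return ["git", "-c", f"checkout.workers={workers}",
            "checkout", "--quiet", "--detach", ref]


def time_checkout(repo: Path, ref: str, workers: int) -> float:
    """Check out `ref` in `repo` and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(checkout_command(ref, workers), cwd=repo, check=True)
    return time.perf_counter() - start


def compare_workers(repo: Path, small_ref: str, large_ref: str,
                    worker_counts=(1, 5)) -> dict[int, float]:
    """Time the small -> large checkout once per worker count."""
    timings = {}
    for workers in worker_counts:
        # Return to the small commit first so every run starts from the same state.
        time_checkout(repo, small_ref, workers=1)
        timings[workers] = time_checkout(repo, large_ref, workers)
    return timings
```

Repeated runs can be skewed by the operating system's file cache, so the numbers are best treated as a rough comparison rather than a benchmark.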

I’d appreciate any help, pointers to relevant documentation, issues, etc.

Steps to reproduce

  1. Clone a repository with many large LFS files (unfortunately, I cannot share one of our projects here). I can try to generate such a project if someone would like to investigate.
  2. Check out a commit far back in the history that contains only a small fraction of the LFS files. This should be fast.
  3. Check out a more recent commit that contains most of the LFS files. This should be slow.
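The steps above could be reproduced with a generated repository along the following lines. This is a sketch under assumptions, not a verified reproduction of the slowdown: it requires `git` and `git-lfs` on the PATH, the file counts and sizes are small placeholders that would need to be scaled up considerably, and all function names are made up for this example.

```python
import os
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def write_random_files(directory: Path, count: int, size_bytes: int,
                       prefix: str = "data") -> list[Path]:
    """Create `count` files of random (incompressible) bytes in `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = directory / f"{prefix}_{i:04d}.bin"
        path.write_bytes(os.urandom(size_bytes))
        paths.append(path)
    return paths


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stripped stdout."""
    result = subprocess.run(["git", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def commit_all(repo: Path, message: str) -> str:
    """Stage everything, commit with a throwaway identity, return the commit hash."""
    git(repo, "add", ".")
    git(repo, "-c", "user.name=repro", "-c", "user.email=repro@example.invalid",
        "commit", "--quiet", "-m", message)
    return git(repo, "rev-parse", "HEAD")


def timed_checkout(repo: Path, ref: str) -> float:
    """Check out `ref` as detached HEAD and return the elapsed seconds."""
    start = time.perf_counter()
    git(repo, "checkout", "--quiet", "--detach", ref)
    return time.perf_counter() - start


def main(few: int = 2, many: int = 20, size_bytes: int = 1024 * 1024) -> None:
    if shutil.which("git") is None or shutil.which("git-lfs") is None:
        print("git and git-lfs are required for this sketch")
        return
    repo = Path(tempfile.mkdtemp(prefix="lfs-checkout-"))
    git(repo, "init", "--quiet")
    git(repo, "lfs", "install", "--local")
    git(repo, "lfs", "track", "*.bin")
    write_random_files(repo, few, size_bytes, prefix="old")
    small = commit_all(repo, "few LFS files")       # step 2 target
    write_random_files(repo, many, size_bytes, prefix="new")
    large = commit_all(repo, "many LFS files")      # step 3 target
    print(f"checkout small: {timed_checkout(repo, small):.2f} s")
    print(f"checkout large: {timed_checkout(repo, large):.2f} s")
    print(f"repository left in {repo}")


if __name__ == "__main__":
    main()
```

With the default placeholder sizes the difference between the two checkouts will likely be negligible; the `many` and `size_bytes` parameters are there so the volume can be raised toward the scale described above.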

Configuration

We use our own runner (currently set to execute one job at a time) with the Docker executor and mounted /builds and /cache volumes, so that the projects do not have to be cloned again for every job.

This configuration, however, is not relevant, because the problem can also be reproduced on my local machine by following the steps to reproduce.

Versions

Please select whether options apply, and add the version information.

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

  • GitLab: v17.3.1-ee
  • GitLab Runner: 17.3.1

Hi @matojsc,

We seem to be running into similar issues on our side, with the “Checking out” step of our CI/CD taking up to 20 minutes.

Did you find a solution for it by any chance?

Thank you.

Hi @simneu :slight_smile: I opened an issue in the LFS project, but unfortunately I haven’t had time to work on it :see_no_evil: The idea would be to create a minimal example and produce the output with the recommended flags.
