Git checkout hangs for more than 6 min

running on gitlab.com
Private runners - running on EKS
Large repo (> 8 GB)
Jobs run in parallel (20 jobs)
When a job runs without any other jobs, everything works well
When there is more than one job, the checkout phase gets stuck for more than 6 min
I enabled tracing; the relevant lines are:

07:30:43.744383 git.c:462               trace: built-in: git checkout -f -q 0bce1f38e88ab1f11330e7e8f181b1c01d4c4528
07:37:50.688964 git.c:462               trace: built-in: git clean -ffdx -e node_modules/ -e vendor/ -e last_commit_id.txt

As can be seen, the checkout takes about 7 minutes, during which nothing is happening.

When a single job is running, it takes less than 50 ms.
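To see where that time actually goes inside the checkout itself, it may help to also enable Git's performance and setup traces (GIT_TRACE_PERFORMANCE and GIT_TRACE_SETUP are standard Git environment variables, not specific to this runner setup); a minimal sketch of the extra CI variables:

variables:
  # Print per-command timing, so the slow part of the checkout/clean shows up in the job log
  GIT_TRACE_PERFORMANCE: 1
  # Print repository/working-tree discovery details for each git invocation
  GIT_TRACE_SETUP: 1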

runner configuration

rbac:
  create: true
  clusterWideAccess: true
  rules:
    - resources: ['configmaps', 'pods', 'pods/attach', 'secrets', 'services']
      verbs: ['get', 'list', 'watch', 'create', 'patch', 'delete', 'update']
    - apiGroups: ['']
      resources: ['pods/exec']
      verbs: ['create', 'patch', 'delete']

nodeSelector:
  cicd-type: runner
runners:
  config: |
    [[runners]]
      name = "EKS-RUNNER"
      executor = "kubernetes"
      builds_dir = "/builds"
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        service_account = "{{ include "gitlab-runner.fullname" . }}"
        namespace_overwrite_allowed = "ci-.*"
        privileged = true
        cpu_limit = "2"
        memory_limit = "6Gi"
        service_cpu_limit = "1"
        service_memory_limit = "1Gi"
        helper_cpu_limit = "1"
        helper_memory_limit = "2Gi"
        poll_interval = 5
        poll_timeout = 3600
        [runners.kubernetes.node_selector]
          cicd-type = "build"
        [[runners.kubernetes.volumes.host_path]]
          name = "gitlab-repo"
          mount_path = "RRRR"
          read_only = false
          host_path = "BBBBB"

gitlab-ci.yaml configuration

stages:
  # PREPARING JOB EXECUTORS
  - prepare executor markers
  # EXECUTION STAGES
  - init
  - gublisher
  - test
  - wrap vendor
  - deploy scripts
  - build
  - deploy
  - finalize

# VARS
variables:
  GIT_TRACE: 1
  GIT_TRANSFER_TRACE: 1
  GIT_CURL_VERBOSE: 1
  GIT_TRACE_SHALLOW: 1
  GIT_LFS_SKIP_SMUDGE: 1
  GIT_CLONE_PATH: $CI_BUILDS_DIR/YYY/$CI_CONCURRENT_ID
  GIT_DEPTH: 10
  S3_BUCKET_NAME: 'XXXX'
  ENV: 'DEV'
  BRANCH_NAME: $CI_COMMIT_BRANCH
  # LAST_COMMIT_FILE_NAME: ${CI_COMMIT_BRANCH/ZZZ\//}' CCCCC
  LAST_COMMIT_FILE_NAME: 'CCCCC'
  RUN_JOB: 'false'
  # GIT_CLEAN_FLAGS: -ffdx -e node_modules/ -e vendor/ -e ${CI_COMMIT_BRANCH/ZZZ\//}_CCCCC
  GIT_CLEAN_FLAGS: -ffdx -e node_modules/ -e vendor/ -e CCCCC
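For context, the GIT_CLONE_PATH above should already give every concurrent job on a runner its own working tree (with builds_dir = "/builds", it expands to something like /builds/YYY/0 through /builds/YYY/19). A hypothetical debug job to confirm which path each parallel job actually lands in:

check_clone_path:
  stage: init
  script:
    # CI_CONCURRENT_ID should differ between jobs running in parallel on the same runner,
    # and the working directory should resolve to $CI_BUILDS_DIR/YYY/$CI_CONCURRENT_ID
    - echo "CI_CONCURRENT_ID=$CI_CONCURRENT_ID"
    - pwd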

Hi @nehemial

Does it happen on only a single job out of the 20, or on all of them? The first thing that comes to mind is whether the jobs are all being scheduled onto a single node, causing CPU throttling. But that's just a shot in the dark since I don't know the size/setup of your EKS.

Another thing is the runners.kubernetes.volumes.host_path. Why do you mount the local EKS node disk into the containers? Are you by any chance mounting the git repo and running 20 parallel git checkouts on it?
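If that is the case and the shared host directories turn out to be the bottleneck, one alternative sketch would be to give each pod its own scratch space with an emptyDir volume instead of host_path (the volume name below is illustrative, and this trades the pre-populated repo for a fresh clone in every job):

        [[runners.kubernetes.volumes.empty_dir]]
          name = "repo-scratch"     # illustrative name
          mount_path = "/builds"    # matches builds_dir, so clones land on per-pod storage
          medium = ""               # default node-local disk; "Memory" would use tmpfs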

The mount is exactly what you said. I prepared 20 folders, 0-19, each of which holds the repo (so I can reach 20 parallel jobs).
Each pod is running on its own server…
I also verified CPU and memory usage with kubectl top pod and saw no issue.