K8s executor: build, helper, and svc-0 containers not using the full CPU/memory of the host

I deployed the gitlab/gitlab-runner Helm chart to a Standard cluster on GKE, with the following runner configuration:


  config: |
    [[runners]]
      environment = ["DOCKER_TLS_CERTDIR=/certs", "DOCKER_TLS_VERIFY=1", "DOCKER_CERT_PATH=/certs/client", "DOCKER_HOST=tcp://docker:2376", "DOCKER_DRIVER=overlay2"]
      executor = "kubernetes"
      [runners.feature_flags]
        FF_KUBERNETES_HONOR_ENTRYPOINT=false
        FF_USE_ADVANCED_POD_SPEC_CONFIGURATION=true
        FF_GITLAB_REGISTRY_HELPER_IMAGE=true
        FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=true
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}" 
        privileged = true
        allow_privilege_escalation = true
        pull_policy = "always"
        allowed_pull_policies = ["always", "if-not-present"]

        poll_interval = 5
        poll_timeout = 360
        cleanup_resources_timeout = "5m"
        cap_add = ["SYS_ADMIN", "SYS_TIME", "IPC_LOCK"]

        # build
        cpu_limit = "2000m"
        cpu_request = "1300m"
        memory_limit = "8024M"
        memory_request = "5024M"

        # helper; gcs storage
        helper_cpu_limit = "2000m"
        helper_cpu_request = "250m"
        helper_memory_limit = "8024M"
        helper_memory_request = "1024M"

        # service e.g. dind
        service_cpu_limit = "2000m"
        service_cpu_request = "1500m"
        service_memory_limit = "8024M"
        service_memory_request = "6024M"

        [[runners.kubernetes.volumes.empty_dir]]
          name = "docker-certs"
          mount_path = "/certs/client"
          medium = "Memory"
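
For sizing the node pool, it may help to add up what a single job pod asks for, since the build, helper, and service containers all run in one pod. Below is a minimal sketch in POSIX shell using the CPU values from the config above. The helper names `to_millicores` and `sum_millicores` are made up for illustration, and the sketch assumes one service container per job and whole-number or `m`-suffixed quantities only.

```shell
#!/bin/sh
# Convert a Kubernetes CPU quantity ("1300m" or "2") to millicores.
# Assumption: fractional values such as "0.5" are not handled here.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Sum any number of CPU quantities and print the total in millicores.
sum_millicores() {
  total=0
  for q in "$@"; do
    total=$(( total + $(to_millicores "$q") ))
  done
  echo "$total"
}

# Requests from the config above: build + helper + one service container.
echo "pod CPU request: $(sum_millicores 1300m 250m 1500m)m"
# Limits from the config above: build + helper + one service container.
echo "pod CPU limit:   $(sum_millicores 2000m 2000m 2000m)m"
```

With these values the pod requests 3050m CPU and its limits add up to 6000m. The same sum for memory gives 12072M requested and 24072M in limits. Comparing those totals with the Allocatable figures from `kubectl describe node` should show whether all three containers can reach their limits at the same time on one node.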

Jobs are getting pulled and executed as expected.

In some cases I get a false pass: the log shows the job did not actually complete, yet it is marked as successful (the artifact was not generated).
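
Until the cause is found, one way to stop such a run from being reported as green is to check for the artifact explicitly at the end of the script. This is only a sketch: `require_artifact` is a hypothetical helper, and the temporary file stands in for whatever file the job is supposed to produce.

```shell
#!/bin/sh
# Fail unless the given file exists and is non-empty.
require_artifact() {
  if [ -s "$1" ]; then
    return 0
  fi
  echo "missing or empty artifact: $1" >&2
  return 1
}

# Demonstration with a temporary file standing in for the real artifact.
demo=$(mktemp)
echo "report" > "$demo"
require_artifact "$demo" && echo "artifact present"
rm -f "$demo"
```

Called as the last script step with the real artifact path, a non-zero return would fail the job instead of letting it pass silently. It does not explain why the job ended early, it only makes the failure visible.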

This issue is not present in my non-K8s runners.

When using Docker with DinD, CPU consumption is low, and the numbers from docker stats do not match what K8s reports.
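
To compare the two views, it can help to read the CPU limit that the kernel actually enforces on a container. The sketch below assumes the nodes use cgroup v2, where the limit is exposed in /sys/fs/cgroup/cpu.max as "<quota> <period>" or "max <period>". The function name `cpu_max_to_millicores` is made up for illustration.

```shell
#!/bin/sh
# Turn the contents of a cgroup v2 cpu.max file into a CPU limit in
# millicores. Prints "unlimited" when the quota is "max".
cpu_max_to_millicores() {
  set -- $1            # split "<quota> <period>" into $1 and $2
  if [ "$1" = "max" ]; then
    echo "unlimited"
  else
    echo $(( $1 * 1000 / $2 ))
  fi
}

# Example: a 2000m limit with the default 100000us period.
cpu_max_to_millicores "200000 100000"

# Inside a job container, read the real value if the cgroup v2 file exists.
if [ -r /sys/fs/cgroup/cpu.max ]; then
  echo "effective CPU limit: $(cpu_max_to_millicores "$(cat /sys/fs/cgroup/cpu.max)")"
fi
```

Running this in both the build container and the dind service container shows which limit each one is under. Containers started with docker run are children of the dind daemon, so their usage should count against the service container rather than the build container. On the cluster side, `kubectl top pod <pod-name> --containers` gives per-container usage to set against docker stats, assuming the metrics API is available.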

In my .gitlab-ci.yml I have something similar to:

.docker:
  image: docker:24.0.6
  services:
    - name: docker:24.0.6-dind
      command: ["--mtu=1300"]
  interruptible: true

e2e-test:
  stage: test
  tags:
    - k8s-small
  when: on_success
  timeout: 30 minutes
  extends: .docker
  script:
    - *shared_shell_functions
    - |
      docker run -v ./:/builds -w /builds --entrypoint=/bin/bash --ipc=host --cap-add=SYS_ADMIN \
        -e FEATURE_NAME="e2e-4073-1c29e237e1" \
        -e FILE="tests/features/navigation/navigation.as.spec.ts=1/2" \
        -e PROJECT="chromium" \
        -e CI="true" \
        "mcr.microsoft.com/playwright:v1.38.1-jammy" -c "./gitlab.test.e2e.sh"

Natively, without Docker, on my M1 I get 41 passed (3.1 minutes).
With Docker on my M1 I get 41 passed (3.6 minutes).

On K8s I have seen the tests pass, but resource utilization seems to be nondeterministic, so they only pass when I get lucky.

Does anyone use the k8s runner, and what detail might I be missing in my setup?

Thanks,

It could be related to: