Runner restarts frequently with multiple replicas deployed via the agent

I followed the "Manage cluster applications" GitLab documentation to deploy the Kubernetes agent, and then deployed gitlab-runner with replicas: 2.

This is my applications/gitlab-runner/values.yaml.gotmpl:

## REQUIRED VALUES
gitlabUrl: {{ requiredEnv "CI_SERVER_URL" | quote }}
runnerRegistrationToken: {{ requiredEnv "GITLAB_RUNNER_REGISTRATION_TOKEN" | quote }}
replicas: 2

## Specify whether the runner should start the session server.
## Defaults to false
## ref: 
##
## When sessionServer is enabled, the user can either provide a publicIP
## or rely on the external IP auto discovery.
## When a serviceAccountName is used with automounting to the pod disabled,
## we recommend the usage of the publicIP
sessionServer:
  enabled: true
  timeout: 1800
  internalPort: 8093
  externalPort: 30013
  publicIP: "xxx.xxx.xxx.xxx"
  # loadBalancerSourceRanges:
  #   - 1.2.3.4/32


## Configure the maximum number of concurrent jobs
## - Documentation: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
## - Default value: 10
## - Auto-scaling is currently not supported.
concurrent: 50

## Defines in seconds how often to check GitLab for new builds
## - Documentation: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
## - Default value: 3
checkInterval: 3

## For RBAC support
rbac:
  create: true
  clusterWideAccess: false

## Configuration for the Pods that the runner launches for each new job
runners:
  image: docker:stable
  builds: {}
  services: {}
  helpers: {}

  ## Specify the tags associated with the runner. Comma-separated list of tags.
  ## - Documentation: https://docs.gitlab.com/ce/ci/runners/#using-tags
  tags: hanlin,k8s

  ## Determine whether the runner should also run jobs without tags.
  ## - Documentation: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-a-runner-to-run-untagged-jobs
  runUntagged: true

  ## Run all containers with the privileged flag enabled
  ## This will allow the docker:dind image to run if you need to run Docker
  ## commands. Please read the docs before turning this on:
  ## - Documentation: https://docs.gitlab.com/runner/executors/kubernetes.html#using-docker-dind
  privileged: true

  ## Kubernetes related options to control which nodes executors use
  ## - Documentation: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
  # nodeSelector:
  #   myLabel: myValue
  #
  ## Documentation: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
  # nodeTolerations:
  #   - key: myTaint
  #     operator: Equal
  #     value: myValue
  #     effect: NoSchedule

  ## If you can't find a setting you think should be here this may help:
  ##
  ## The gitlab-runner chart uses `templates/configmap.yaml` to configure runners
  ## `configmap.yaml`'s `data.register-the-runner` transforms this file into runner CLI options
  ## `configmap.yaml`'s `data.config.toml` and `data.config.template.toml` transform this file into the runner's config.toml
  ##
  ## - Source code for `configmap.yaml` https://gitlab.com/gitlab-org/charts/gitlab-runner/-/blob/main/templates/configmap.yaml
  ## - Documentation for `config.toml` https://docs.gitlab.com/runner/executors/kubernetes.html#the-available-configtoml-settings
  ## - Source code for runner CLI options (see `KubernetesConfig` struct) https://gitlab.com/gitlab-org/gitlab-runner/-/blob/main/common/config.go
  
  
  config: |
    [[runners]]
      environment = ["DOCKER_TLS_CERTDIR=''", "DOCKER_DRIVER=overlay2", "FF_GITLAB_REGISTRY_HELPER_IMAGE=1", "FF_USE_FASTZIP=1"]
      [runners.docker]
        tls_verify = false
        volumes = ["/cache"]
      [runners.machine]
        MachineOptions = ["engine-registry-mirror=https://ngxmmhkl.mirror.aliyuncs.com"]
      [runners.kubernetes]
        [runners.kubernetes.volumes]
          [[runners.kubernetes.volumes.host_path]]
            name = "maven-repository"
            mount_path = "/root/.m2"
            host_path = "/var/docker/gitlab-runner/maven/.m2"
          [runners.kubernetes.affinity]
            [runners.kubernetes.affinity.node_affinity]
              [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
                  [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                    key = "hanlin.arch/enviroment-scope.ci"
                    operator = "Exists"

resources: {}

At first, all runners were deployed and registered on GitLab successfully, and the CI/CD jobs ran normally. But then I found that gitlab-runner was being restarted frequently. The log is as follows:

Are you sure the Runner pod isn’t being terminated by Kubernetes itself?

How can I determine whether Kubernetes shut it down?

In general, go to the namespace where the Pod is running and check how long the Pod has been running with kubectl get pods, or look at the events with kubectl get events.
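
The two checks above can be wrapped in a small helper. This is only a sketch: the function name runner_restart_report and the default namespace gitlab-runner are placeholders, not something taken from this thread, so substitute the namespace the runner is actually deployed to.

```shell
#!/bin/sh
# Sketch: check whether Kubernetes itself is restarting the runner pods.
# "gitlab-runner" is a placeholder namespace; pass your own as the first argument.

runner_restart_report() {
  ns="${1:-gitlab-runner}"

  # The RESTARTS and AGE columns show how often and how recently pods restarted.
  kubectl get pods --namespace "$ns"

  # Events sorted oldest first; look for "Unhealthy" or "Killing" entries
  # that mention a failed liveness or readiness probe.
  kubectl get events --namespace "$ns" --sort-by=.lastTimestamp
}
```

Calling runner_restart_report my-namespace prints the pod list followed by the events. If a probe failure shows up there, kubectl describe pod on the affected pod usually gives the exact probe message.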

@balonik According to the event log, a health check is failing. What could be the reason for this?

There isn’t a single possible cause for this. It could be network issues, the container running out of memory, …

@balonik However, I have not encountered this problem when deploying directly to k8s with a StatefulSet manifest, as with gitlab-runner-0/gitlab-runner-1 in the image above.

Is there any way to further troubleshoot the specific issue?

Thanks!

After checking the runner’s chart, I found that the probe executes the following script from https://gitlab.com/gitlab-org/charts/gitlab-runner/-/blob/main/templates/configmap.yaml:

  check-live: |
    #!/bin/bash
    if /usr/bin/pgrep -f .*register-the-runner; then
      exit 0
    elif /usr/bin/pgrep gitlab.*runner; then
      exit 0
    else
      exit 1
    fi

The problem was solved when I increased the probe timeout to 5 seconds by setting probeTimeoutSeconds: 5 in the values file (the default value is 1).
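
To see how close the liveness script runs to the timeout, and to confirm that the new value reached the cluster, something like the following may help. It is a sketch under assumptions: the path /configmaps/check-live is where I believe the chart mounts the script (verify it with kubectl describe pod), the default namespace gitlab-runner is a placeholder, and both function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: measure the liveness script and read back the probe timeout.
# Assumption: the chart mounts the script at /configmaps/check-live.
# "gitlab-runner" is a placeholder namespace.

time_liveness_check() {
  pod="$1"
  ns="${2:-gitlab-runner}"
  # Run the same script the probe runs and print how long it takes.
  # A "real" time near or above the probe timeout explains the restarts.
  kubectl exec --namespace "$ns" "$pod" -- \
    /bin/bash -c 'time /bin/bash /configmaps/check-live'
}

show_probe_timeout() {
  deploy="$1"
  ns="${2:-gitlab-runner}"
  # Print the timeoutSeconds actually set on the first container's liveness probe.
  kubectl get deployment "$deploy" --namespace "$ns" \
    -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.timeoutSeconds}'
}
```

For example, time_liveness_check followed by a runner pod name and namespace times the script in that pod, and show_probe_timeout followed by the deployment name and namespace should print 5 after the change.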