[SOLVED] kubernetes executor runner job metrics: scrape pod labels/annotations with prometheus

lutz108 · October 4, 2024, 4:26pm

Hey there

I reworked our runner structure and need some metrics to optimize resource assignment. My goal is to have a dashboard where I can group different statistics and, for example, see the load of specific jobs.

We have several hosts, each with a k3s cluster. Our argoCD deploys a set of gitlab runners with different properties to each cluster, using the official helm chart. The basic difference between these runners is whether they use the docker or the kubernetes executor to run the actual jobs. Additionally, we have a prometheus in each cluster, also deployed via helm and argoCD.

The gitlab-runner service monitor is enabled. The runner config (template) contains:

...
service:
  enabled: true
  {{- if eq .executor "kubernetes" }}
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "{{ .cluster }}-{{ .executor }}.{{ .location }}.something.cloud"
  clusterIP: None
  {{- end }}
{{ .additionalValues -}}
runners:
...

My idea was to attach CI job variables to the job containers as labels and then have prometheus pick these up as labels on the metrics. Then I can combine them to get metrics for a certain pipeline stage or similar.

This works perfectly for docker executor jobs, using cadvisor (also deployed via the helm chart and argoCD), with the following in the docker executor config.toml

...
[runners.docker.container_labels]
  "com.gitlab.gitlab.runner.job.id" = "$CI_JOB_ID"
  "com.gitlab.gitlab.runner.job.stage" = "$CI_JOB_STAGE"
  "com.gitlab.gitlab.runner.job.name" = "$CI_JOB_NAME"
  "com.gitlab.gitlab.runner.pipeline.url" = "$CI_PIPELINE_URL"
  "com.gitlab.gitlab.runner.pipeline.name" = "$CI_PIPELINE_NAME"
  "com.gitlab.gitlab.runner.project.path" = "$CI_PROJECT_PATH"
...

and the scrape config

prometheus:
  server:
    ingress:
      enabled: true
      hosts:
        - prometheus.runnerhost.inhouse.something.cloud
  # scrape the cadvisor for docker jobs
  extraScrapeConfigs: |
    - job_name: cadvisor
      static_configs:
        - targets:
            - cadvisor.cadvisor.svc.cluster.local:8080

and an example promQL query

sum(
    rate(
        container_cpu_usage_seconds_total{
            container_label_com_gitlab_gitlab_runner_job_id!="",
            container_label_com_gitlab_gitlab_runner_job_name!=""
        }
        [$__rate_interval])
    )
by (
    container_label_com_gitlab_gitlab_runner_job_id,
    )
* 100

I get all the metadata I need.

I hoped to achieve the same with the kubernetes executor and the kubernetes-nodes-cadvisor job, so I tried pod_labels and pod_annotations:

...
[runners.kubernetes.pod_annotations]
  "job.runner.gitlab.com/stage" = "$CI_JOB_STAGE"
  ...
[runners.kubernetes.pod_labels]
  "com.gitlab.gitlab.runner.job.stage" = "$CI_JOB_STAGE" 
  ...
...

But metrics such as container_cpu_usage_seconds_total, which come from the kubernetes-nodes-cadvisor job, do not carry any labels derived from the annotations or labels attached to the executor pods. The pods themselves, however, do have the annotations/labels I defined.

I tried many different scrape configs. I tried some custom extraScrapeConfigs that should monitor pods, but I got 404s.

prometheus:
  server:
    ...
  serverFiles:
    prometheus.yml:
        scrape_configs:
          - job_name: 'kubernetes-nodes-cadvisor'
            relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_(.*)_label_(.+)
              - action: labelmap
                regex: __meta_kubernetes_(.*)_annotation_(.+)

I also got 404s when I added

[runners.kubernetes.pod_annotations]
  "prometheus.io/scrape" = "true"
  "prometheus.io/path" = "metrics"
  "prometheus.io/port" = "9252"

Any ideas or suggestions? I can provide more config details; I just didn’t want to bloat the first post.

lutz108 · October 10, 2024, 9:34am

I managed to solve it with the help of a k8s magician colleague :o)
In order to maybe help others, I’ll share some details:

There seems to be a bug that may have prevented the pod labels from being added to the metrics.
The new approach was to add kube-state-metrics to prometheus and tell it to export pod labels/annotations.
a. prometheus values.yaml

prometheus:
  server:
    ingress:
      enabled: true
      hosts:
        - prometheus.runnerfarm-0.inhouse.platform.reservix.cloud
  kube-state-metrics:
    enabled: true
#    metricAllowlist:
#      - kube_pod_annotations
    metricAnnotationsAllowList:
     - pods=[*]
#     - namespaces=[gitlab]
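
To check that the allowlist took effect, a query along these lines may help. This is only a sketch: it assumes the job pods are named runner-..., as in the queries in this post, and that the runner's job.runner.gitlab.com/id annotation shows up as the label annotation_job_runner_gitlab_com_id.

```promql
# Sketch: one result per GitLab job id seen on runner pods.
# An empty result suggests the annotations are not being exported.
count by (annotation_job_runner_gitlab_com_id) (
  kube_pod_annotations{pod=~"runner-.*", annotation_job_runner_gitlab_com_id!=""}
)
```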

Use promQL to join the labels of kubernetes-nodes-cadvisor (resource usage metrics) with those of kube-state-metrics (which contain the desired labels). A simple query would be:

container_cpu_usage_seconds_total{pod=~"runner-.*", container!=""}
*
on(pod) 
group_left(annotation_job_runner_gitlab_com_id)
(kube_pod_annotations)
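
A slightly more defensive variant of this join, as a sketch rather than a tested recommendation: it matches on namespace as well as pod, and it collapses kube_pod_annotations with max by() so each pod contributes exactly one series on the right-hand side. It assumes both metrics carry namespace and pod labels in your setup, which is worth verifying first.

```promql
# Sketch: attach the GitLab job id annotation to the cAdvisor CPU series.
# Assumption: both sides expose `namespace` and `pod` labels.
container_cpu_usage_seconds_total{pod=~"runner-.*", container!=""}
* on (namespace, pod) group_left (annotation_job_runner_gitlab_com_id)
max by (namespace, pod, annotation_job_runner_gitlab_com_id) (
  kube_pod_annotations{pod=~"runner-.*"}
)
```

Since kube_pod_annotations always has the value 1, the multiplication leaves the CPU values unchanged and only copies the label over.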

And for reference, this is how I calculate the CPU utilization (see related github discussion)

round(
  100 * 
  sum(
    rate(
        container_cpu_usage_seconds_total{pod=~"runner-.*", container!=""}[5m]
    )
  )
  by (
    pod,
    container,
    annotation_job_runner_gitlab_com_id,
    ...
  )
) 
/ 
sum by (
    pod,
    container,
    annotation_job_runner_gitlab_com_id,
    ...
    )
    (
        container_spec_cpu_quota{pod=~"runner-.*", container!=""}
        /
        container_spec_cpu_period{pod=~"runner-.*", container!=""}
    )
*
on(pod)
group_left(
    annotation_job_runner_gitlab_com_id,
    ...
)
(
    max by(
        pod,
        container,
        annotation_job_runner_gitlab_com_id,
        ...
    )
(kube_pod_annotations)
)

where I add the desired labels to the by() expressions so I can use them as graph labels in Grafana. The max by() was necessary to ensure uniqueness on the right-hand side of the join.
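
To see whether the uniqueness problem applies in a given cluster, a check like the following could be used. It is a sketch that assumes the same runner-... pod naming; it does not tell you why duplicates exist, only that a plain on(pod) join would be ambiguous for the listed pods.

```promql
# Sketch: list runner pods that have more than one kube_pod_annotations series.
# Any result here means a many-to-one join on `pod` alone would fail for that pod.
count by (pod) (kube_pod_annotations{pod=~"runner-.*"}) > 1
```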
