GitLab Prometheus Integration Fails

Hi, after gradually letting our GitLab Omnibus instance do more and more for us, we’re now at the stage where we want to integrate Kubernetes into it.

After some trial and error, we downgraded Kubernetes to version 1.15.4 to get Helm Tiller to work, but we’re having issues with Prometheus not installing from GitLab as it should.

I have already tested this on a GCP cluster, and it works there, so it is probably related to the fact that we’re running our own Kubernetes cluster on premises.

Disclaimer: We’re fairly new to Kubernetes. We do see the benefits of moving some of our services into containers and want to set it up for testing, but we’re having difficulty getting it to cooperate, which may be because we’re treading uncharted territory.

Our problem as it stands is that Prometheus, when installed from our GitLab, reports that it installed just fine, but this error is displayed on our website:

Unexpected metrics data response from prometheus endpoint

Upon further exploration in our Kubernetes cluster I see that:

gitlab-managed-apps pod/prometheus-kube-state-metrics-744949b679-5n8nm
has 1/1 Running

But:
gitlab-managed-apps pod/prometheus-prometheus-server-646888949c-xrk7k
has 0/2 Pending.

This pod consists of two containers and an init container, and none of them shows any logs:

helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k
Error from server (BadRequest): a container name must be specified for pod prometheus-prometheus-server-646888949c-xrk7k, choose one of: [prometheus-server-configmap-reload prometheus-server] or one of the init containers: [init-chown-data]
helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k prometheus-server
helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k prometheus-server-configmap-reload
helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k init-chown-data

Here’s a description of the Pod:

helge@master:~$ kubectl -n gitlab-managed-apps describe pod prometheus-prometheus-server-646888949c-xrk7k
Name:           prometheus-prometheus-server-646888949c-xrk7k
Namespace:      gitlab-managed-apps
Priority:       0
Node:           <none>
Labels:         app=prometheus
                component=server
                pod-template-hash=646888949c
                release=prometheus
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/prometheus-prometheus-server-646888949c
Init Containers:
  init-chown-data:
    Image:      busybox:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      chown
      -R
      65534:65534
      /data
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-prometheus-server-token-2bpdf (ro)
Containers:
  prometheus-server-configmap-reload:
    Image:      jimmidyson/configmap-reload:v0.1
    Port:       <none>
    Host Port:  <none>
    Args:
      --volume-dir=/etc/config
      --webhook-url=http://localhost:9090/-/reload
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-prometheus-server-token-2bpdf (ro)
  prometheus-server:
    Image:      prom/prometheus:v2.4.3
    Port:       9090/TCP
    Host Port:  0/TCP
    Args:
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    Liveness:     http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-prometheus-server-token-2bpdf (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-prometheus-server
    Optional:  false
  storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-prometheus-server
    ReadOnly:   false
  prometheus-prometheus-server-token-2bpdf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-prometheus-server-token-2bpdf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  69s (x30 over 41m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)

So I can see that the pod has an unbound PersistentVolumeClaim (PVC), and that it refers to a PVC in the same namespace.

Here’s a description of the Deployment of the pod:

helge@master:~$ kubectl -n gitlab-managed-apps describe deployment prometheus-prometheus-server
Name:                   prometheus-prometheus-server
Namespace:              gitlab-managed-apps
CreationTimestamp:      Wed, 06 Nov 2019 12:36:06 +0000
Labels:                 app=prometheus
                        chart=prometheus-6.7.3
                        component=server
                        heritage=Tiller
                        release=prometheus
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=prometheus,component=server,release=prometheus
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:           app=prometheus
                    component=server
                    release=prometheus
  Service Account:  prometheus-prometheus-server
  Init Containers:
   init-chown-data:
    Image:      busybox:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      chown
      -R
      65534:65534
      /data
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
  Containers:
   prometheus-server-configmap-reload:
    Image:      jimmidyson/configmap-reload:v0.1
    Port:       <none>
    Host Port:  <none>
    Args:
      --volume-dir=/etc/config
      --webhook-url=http://localhost:9090/-/reload
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (ro)
   prometheus-server:
    Image:      prom/prometheus:v2.4.3
    Port:       9090/TCP
    Host Port:  0/TCP
    Args:
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    Liveness:     http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-prometheus-server
    Optional:  false
   storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-prometheus-server
    ReadOnly:   false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  prometheus-prometheus-server-646888949c (1/1 replicas created)
NewReplicaSet:   <none>
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  46m   deployment-controller  Scaled up replica set prometheus-prometheus-server-646888949c to 1

It also refers to the PVC in the same namespace.

If I run kubectl get pvc -A, I get just the one PVC, which has status Pending.
There are no StorageClasses and no PersistentVolumes by default.
I’ve tried creating just a StorageClass to see if that unclogged it, but no.
I’ve tried creating just a 10Gi PV to see if that helped, but no.
I’ve also tried creating a StorageClass together with a 10Gi PersistentVolume that uses it, but that didn’t help either.
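
As far as the Kubernetes documentation describes it, a manually created PV only binds to a claim when the PV is still Available, both sides name the same storage class, the PV is at least as large as the request, and the PV offers every access mode the claim asks for. Below is a minimal Python sketch of that checklist, to make the comparison concrete. The class names, the `mismatches` helper and all field values are hypothetical placeholders rather than the actual manifests from this cluster, and the real controller also considers things like volumeMode and label selectors, which are ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Binary-suffix multipliers used by Kubernetes quantities (common ones only).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> int:
    """Convert a quantity such as '10Gi' to bytes (binary suffixes only)."""
    for suffix, factor in _UNITS.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain byte count


@dataclass
class Volume:  # simplified stand-in for a PersistentVolume
    capacity: str
    access_modes: List[str]
    storage_class: str = ""  # "" means the PV has no class
    phase: str = "Available"


@dataclass
class Claim:  # simplified stand-in for a PersistentVolumeClaim
    request: str
    access_modes: List[str]
    storage_class: Optional[str] = None  # None = field not set in the manifest


def mismatches(pv: Volume, pvc: Claim) -> List[str]:
    """Return the reasons this PV cannot satisfy this PVC (empty list = compatible)."""
    reasons = []
    if pv.phase != "Available":
        reasons.append(f"PV phase is {pv.phase}, not Available")
    # Assumption: the cluster has no default StorageClass, so an unset class acts like "".
    wanted_class = pvc.storage_class or ""
    if pv.storage_class != wanted_class:
        reasons.append(f"storage class {pv.storage_class!r} != {wanted_class!r}")
    if parse_quantity(pv.capacity) < parse_quantity(pvc.request):
        reasons.append(f"capacity {pv.capacity} < requested {pvc.request}")
    missing = set(pvc.access_modes) - set(pv.access_modes)
    if missing:
        reasons.append(f"missing access modes {sorted(missing)}")
    return reasons


if __name__ == "__main__":
    # Hypothetical values: a claim without a class versus a PV that names one.
    pvc = Claim(request="8Gi", access_modes=["ReadWriteOnce"])
    pv = Volume(capacity="10Gi", access_modes=["ReadWriteOnce"], storage_class="local-storage")
    print(mismatches(pv, pvc))
```

The real values to feed into such a comparison would come from kubectl describe on the pending PVC and from kubectl get pv; a single mismatch, for example a PV that names a storage class the claim does not request, would be enough to leave the claim Pending.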

I’ve been at this for a few days now, and I’m in need of a little push in the right direction.
I couldn’t find many people who have experienced the same issue, but the number of people running on-prem Kubernetes clusters might be low.

Is there anyone out there who could offer some assistance or guidance?