Hi, after gradually letting our GitLab Omnibus do more and more for us, we’re now at the stage where we want to integrate Kubernetes into it.
After some trial and error, we downgraded Kubernetes to version 1.15.4 to get Helm Tiller to work, but now we’re having issues with Prometheus not installing from GitLab as it should.
I have already tested this on a GCP cluster, where it works, so the problem is probably related to the fact that we’re running our own Kubernetes cluster on premises.
Disclaimer: we’re fairly new to Kubernetes. We do see the benefits of moving some of our services into containers and want to set things up so we can start testing, but we’re having some difficulty playing ball with it, which might be because we’re treading uncharted territory.
Our problem, as it stands, is that Prometheus, when installed from our GitLab, reports as installed just fine, but the site displays this error:
Unexpected metrics data response from prometheus endpoint
Upon further exploration in our Kubernetes cluster, I see that in the gitlab-managed-apps namespace
pod/prometheus-kube-state-metrics-744949b679-5n8nm is 1/1 Running,
but
pod/prometheus-prometheus-server-646888949c-xrk7k is 0/2 Pending.
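For completeness, that status comes from:
kubectl -n gitlab-managed-apps get pods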
This pod consists of two containers plus an init container, and none of them has any logs:
helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k
Error from server (BadRequest): a container name must be specified for pod prometheus-prometheus-server-646888949c-xrk7k, choose one of: [prometheus-server-configmap-reload prometheus-server] or one of the init containers: [init-chown-data]
helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k prometheus-server
helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k prometheus-server-configmap-reload
helge@master:~$ kubectl -n gitlab-managed-apps logs -p pod/prometheus-prometheus-server-646888949c-xrk7k init-chown-data
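Which makes sense, I suppose: the pod is Pending, so none of the containers have ever started, and there is nothing to log. As far as I understand it, scheduling failures only show up as events, e.g. via:
kubectl -n gitlab-managed-apps get events --sort-by=.lastTimestamp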
Here’s a description of the Pod:
helge@master:~$ kubectl -n gitlab-managed-apps describe pod prometheus-prometheus-server-646888949c-xrk7k
Name:           prometheus-prometheus-server-646888949c-xrk7k
Namespace:      gitlab-managed-apps
Priority:       0
Node:           <none>
Labels:         app=prometheus
                component=server
                pod-template-hash=646888949c
                release=prometheus
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/prometheus-prometheus-server-646888949c
Init Containers:
  init-chown-data:
    Image:      busybox:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      chown
      -R
      65534:65534
      /data
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-prometheus-server-token-2bpdf (ro)
Containers:
  prometheus-server-configmap-reload:
    Image:      jimmidyson/configmap-reload:v0.1
    Port:       <none>
    Host Port:  <none>
    Args:
      --volume-dir=/etc/config
      --webhook-url=http://localhost:9090/-/reload
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-prometheus-server-token-2bpdf (ro)
  prometheus-server:
    Image:      prom/prometheus:v2.4.3
    Port:       9090/TCP
    Host Port:  0/TCP
    Args:
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    Liveness:     http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-prometheus-server-token-2bpdf (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-prometheus-server
    Optional:  false
  storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-prometheus-server
    ReadOnly:   false
  prometheus-prometheus-server-token-2bpdf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-prometheus-server-token-2bpdf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  69s (x30 over 41m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
So the pod has an unbound PersistentVolumeClaim, and its storage-volume mount refers to a claim in the same namespace.
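In case it helps, I believe describing the claim itself should show its own events about why it won’t bind:
kubectl -n gitlab-managed-apps describe pvc prometheus-prometheus-server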
Here’s a description of the pod’s Deployment:
helge@master:~$ kubectl -n gitlab-managed-apps describe deployment prometheus-prometheus-server
Name:                   prometheus-prometheus-server
Namespace:              gitlab-managed-apps
CreationTimestamp:      Wed, 06 Nov 2019 12:36:06 +0000
Labels:                 app=prometheus
                        chart=prometheus-6.7.3
                        component=server
                        heritage=Tiller
                        release=prometheus
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=prometheus,component=server,release=prometheus
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:           app=prometheus
                    component=server
                    release=prometheus
  Service Account:  prometheus-prometheus-server
  Init Containers:
    init-chown-data:
      Image:      busybox:latest
      Port:       <none>
      Host Port:  <none>
      Command:
        chown
        -R
        65534:65534
        /data
      Environment:  <none>
      Mounts:
        /data from storage-volume (rw)
  Containers:
    prometheus-server-configmap-reload:
      Image:      jimmidyson/configmap-reload:v0.1
      Port:       <none>
      Host Port:  <none>
      Args:
        --volume-dir=/etc/config
        --webhook-url=http://localhost:9090/-/reload
      Environment:  <none>
      Mounts:
        /etc/config from config-volume (ro)
    prometheus-server:
      Image:      prom/prometheus:v2.4.3
      Port:       9090/TCP
      Host Port:  0/TCP
      Args:
        --config.file=/etc/config/prometheus.yml
        --storage.tsdb.path=/data
        --web.console.libraries=/etc/prometheus/console_libraries
        --web.console.templates=/etc/prometheus/consoles
        --web.enable-lifecycle
      Liveness:     http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
      Readiness:    http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
      Environment:  <none>
      Mounts:
        /data from storage-volume (rw)
        /etc/config from config-volume (rw)
  Volumes:
    config-volume:
      Type:      ConfigMap (a volume populated by a ConfigMap)
      Name:      prometheus-prometheus-server
      Optional:  false
    storage-volume:
      Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
      ClaimName:  prometheus-prometheus-server
      ReadOnly:   false
Conditions:
  Type       Status  Reason
  ----       ------  ------
  Available  True    MinimumReplicasAvailable
OldReplicaSets:  prometheus-prometheus-server-646888949c (1/1 replicas created)
NewReplicaSet:   <none>
Events:
  Type    Reason             Age  From                   Message
  ----    ------             ---- ----                   -------
  Normal  ScalingReplicaSet  46m  deployment-controller  Scaled up replica set prometheus-prometheus-server-646888949c to 1
It also refers to the PVC in the same namespace.
If I do a kubectl get pvc -A, I get just the one PVC, and it has status Pending.
By default there are no StorageClasses and no PersistentVolumes.
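I suspect this is also why it worked on GCP: GKE ships with a default StorageClass that provisions disks automatically, whereas our on-prem cluster has nothing that can provision a volume. A default class (marked “(default)”) would show up in:
kubectl get storageclass
and the claim’s exact request (size, access modes, storage class) in:
kubectl -n gitlab-managed-apps get pvc prometheus-prometheus-server -o yaml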
I’ve tried creating just a StorageClass, to see if that unclogged things, but no.
I’ve tried creating just a 10Gi PersistentVolume to see if that helped, but no (roughly what I used is sketched below).
I’ve also tried creating a StorageClass together with a 10Gi PersistentVolume that references it, but that didn’t help either.
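For reference, the bare PV I tried looked roughly like this (the name and hostPath are just examples, and hostPath is only meant for testing on a single node):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce          # has to cover what the claim requests
  persistentVolumeReclaimPolicy: Retain
  # storageClassName deliberately unset, so it should only bind claims
  # that also have no storage class
  hostPath:
    path: /mnt/data/prometheus

My understanding is that a claim only binds to a PV whose capacity and access modes cover the request and whose storageClassName matches the claim’s (empty matching empty), and that a default StorageClass is only applied to a claim when the claim is created, so adding one afterwards wouldn’t fix the existing claim unless it’s recreated (e.g. by reinstalling Prometheus from GitLab). Is that right?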
I’ve been at this for a few days now, and I could use a little push in the right direction.
I couldn’t find many people who have run into the same issue, but the number of people running on-prem Kubernetes clusters might just be low.
Is there anyone out there who could offer some assistance or guidance?