When deploying to a Kubernetes cluster from a CI/CD pipeline using the GitLab Agent for Kubernetes, we occasionally get this error: Error: Kubernetes cluster unreachable: an error on the server ("unknown") has prevented the request from succeeding
Steps to reproduce
The problem is intermittent; we have no reliable steps to reproduce it.
What is the current bug behavior?
The deployment job fails because it cannot interact with the cluster.
What is the expected correct behavior?
The deployment succeeds by interacting with the Kubernetes API through the GitLab Agent for Kubernetes.
Relevant logs and/or screenshots
From the last lines of the job log:
Error: Kubernetes cluster unreachable: an error on the server ("unknown") has prevented the request from succeeding
Cleaning up project directory and file based variables
ERROR: Job failed: command terminated with exit code 1
Thank you, interesting. Is there any pattern to this intermittency, such as a specific time window during the day? Does it coincide with any internal releases? Do you have monitoring or graphs (Grafana / Prometheus) where you can visualise what exactly is happening in your environment during this “unreachable” behaviour?
Intermittent issues can be hard to troubleshoot. The best approach is to use monitoring data to visualise what is happening across the whole environment and what each component is doing at the time.
We do have some monitoring in place, but the problem has been so intermittent and short-lived that we haven’t felt an immediate need to monitor it specifically. However, it happens often enough that our app devs have had to ping us multiple times when they were unable to deploy to the cluster on the first pipeline run. Our current workaround is a retry script in the pipeline that verifies the cluster is reachable before continuing with the deployment.
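For anyone wanting a similar workaround, here is a rough sketch of what such a retry check could look like. It is not the script from our pipeline; the function name wait_for_cluster, the attempt count, and the delays are made up for illustration. It simply reruns a check command with an increasing delay until the command exits with status 0.

```python
import subprocess
import sys
import time


def wait_for_cluster(check_cmd, attempts=5, delay_seconds=2.0, backoff=2.0):
    """Run check_cmd until it exits 0 or the attempts are exhausted.

    Hypothetical helper for illustration. Returns True if the command
    succeeded within the allowed attempts, False otherwise.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(
                check_cmd, capture_output=True, text=True, timeout=30
            )
            if result.returncode == 0:
                return True
            reason = result.stderr.strip() or f"exit code {result.returncode}"
        except (subprocess.TimeoutExpired, OSError) as exc:
            # Covers a hung command or a missing binary.
            reason = str(exc)
        print(f"attempt {attempt}/{attempts} failed: {reason}", file=sys.stderr)
        if attempt < attempts:
            # Wait before retrying, lengthening the delay each time.
            time.sleep(delay_seconds)
            delay_seconds *= backoff
    return False
```

In a deploy job this could be called as wait_for_cluster(["kubectl", "version"]) before the deploy step, with the job exiting non-zero if it returns False. Using kubectl version as the check is an assumption: it contacts the API server, so it should fail while the cluster is unreachable, but any command that makes a round trip to the cluster would do.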
I guess I’m wondering whether there would be logs in the GitLab Agent that could give insight into these problems, if it is in fact a GitLab Agent problem. Also, I didn’t mention that we are not self-hosting our GitLab instance but are using the cloud version.