I have an issue where I’m unable to register an agent with my self-hosted GitLab CE server. The agent logs the following error:
{"level":"error","time":"2024-01-01T07:06:14.950Z","msg":"Failed to register agent pod. Please make sure the agent version matches the server version","mod_name":"agent_registrar","error":"rpc error: code = Unavailable desc = unavailable"}
My GitLab version is v16.7 and GitLab KAS is v16.8.0-rc1. I have also tested this with v16.4, but unfortunately the result is the same. I installed the agent using the Helm command generated by GitLab, and I have triple-checked that the version is correct. I would appreciate any help with this.
Same issue here, in the same configuration. The rpc error code and description aren’t very verbose… Using wscat from the node hosting Kubernetes, the connection to GitLab KAS seems to succeed (even if it gets disconnected instantly). Hope we’ll get an answer.
Try v16.7.0 for the Kubernetes Agent. It’s supposed to match your GitLab instance’s version.
With the latest Helm chart, you probably need to add --set image.tag=v16.7.0 to override the version you are using now. Alternatively, you can search the Helm repo for a chart version that uses v16.7.0 as the default app/image version.
The upstream documentation has an example that illustrates overriding the image version that may be helpful as well.
Hi, thanks for your answer. In fact, I had already tried that without success (sorry for not mentioning it). Also, why would GitLab generate a wrong Helm command?
Does KAS work properly? By the way, I believe I have already configured GitLab and my Apache reverse proxy correctly: earlier I was getting 301 (wrong GitLab configuration) and 426 (wrong Apache configuration).
I don’t think it’s supposed to match my GitLab instance version, it’s supposed to match the GitLab-KAS version on my server, which is correctly v16.8.0-rc1. I’m not sure why v16.7 comes with KAS v16.8 but that’s how it is and the helm command generated by GitLab contains the right version.
I have had no problems upgrading an already registered GitLab agent from 16.5.1 to 16.6.0 to 16.7.0 on our self-hosted GitLab instance. The only odd thing is that the /-/clusters page doesn’t update (see this issue).
On a freshly deployed 16.7.0 GitLab test instance, I tried registering an agent with the GitLab-generated command. I can confirm that it uses version 16.8.0-rc1. The first thing I stumbled on was the self-signed certificate of the GitLab test instance. I added it by appending
to the GitLab generated command line.
That still fails.
In both cases, the msg part in the logs reads
Failed to register agent pod. Please make sure the agent version matches the server version
However, the error part changes from complaining about an “unknown authority” to complaining about a certificate that “doesn’t contain IP SANs”.
The test instance is exposed by IP address and its certificate does have an IP SAN entry.
In view of the above, I think the msg about version mismatches is a generic hint shown for any kind of error, and you should really be looking at the error field. Unfortunately, in @ahmed.ajwed’s case that is not particularly informative.
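To make the point about reading the error field rather than the msg field concrete, here is a minimal Python sketch that parses one agent log line and looks only at error. The hint table is hypothetical; it is built from the error strings quoted in this thread and is not an official GitLab mapping.

```python
import json

# Hypothetical hint table, derived only from errors quoted in this thread.
HINTS = [
    ("unknown authority", "the certificate's CA is not trusted on this side"),
    ("IP SANs", "the certificate is being checked against an IP address"),
]


def summarize_agent_error(log_line: str) -> str:
    """Return a short hint based on the `error` field of one JSON log line."""
    entry = json.loads(log_line)
    error = entry.get("error", "")
    for needle, hint in HINTS:
        if needle in error:
            return hint
    # The generic `msg` text is ignored on purpose; fall back to the raw error.
    return "no specific hint, raw error: " + error
```

For the log line from the first post, this falls through to the raw error ("rpc error: code = Unavailable desc = unavailable"), which matches the observation that it is not very informative on its own.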
BTW, I’m using the Docker image, i.e. GitLab Omnibus, so that nginx is the one that comes bundled with it. I don’t run an extra reverse-proxy in front of the container.
Sorry, that was caused by a bad WebSocket URL. It seems that GitLab drops the instance’s port number from the generated URL if it is not the default 443.
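If the generated URL loses the port, it can help to build the WebSocket address by hand and compare. The sketch below assumes KAS is reachable under /-/kubernetes-agent/ on the same host and port as GitLab itself (the path used elsewhere in this thread); the function name and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def kas_ws_url(external_url: str) -> str:
    """Derive a KAS WebSocket URL from a GitLab external URL, keeping the port."""
    parts = urlsplit(external_url)
    # Map the HTTP scheme to its WebSocket counterpart.
    scheme = {"https": "wss", "http": "ws"}[parts.scheme]
    # netloc still contains any non-default port, so it is preserved as-is.
    path = parts.path.rstrip("/") + "/-/kubernetes-agent/"
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

For example, kas_ws_url("https://192.0.2.10:8443") gives wss://192.0.2.10:8443/-/kubernetes-agent/, which can then be compared with the address in the generated Helm command.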
Anyway, getting all my test setups sorted out, I am now able to reproduce @ahmed.ajwed’s error message with 16.8.0-rc1 as well as 16.7.0 agents trying to register as separate clusters to a GitLab 16.7.0 instance. I’ve tried with versions 1.22.0 and 1.21.0 of the Helm chart.
I just noticed this in the gitlab-kas log (redacted for readability):
{
"level":"error",
"time":"2024-01-11T14:27:47.665+0900",
"msg":"AgentInfo()",
"grpc_service":"gitlab.agent.reverse_tunnel.rpc.ReverseTunnel",
"grpc_method":"Connect",
"error":"Get \"https://$HOSTNAME/gitlab/api/v4/internal/kubernetes/agent_info\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
}
To solve that, I put a copy of /path/to/self-signed-cert in the container’s /etc/gitlab/trusted-certs/
and in the container ran
gitlab-ctl reconfigure
sv restart gitlab-kas
Eventually, all the clusters from my previous registration attempts connected successfully.
So, if you are using self-signed certificates, make sure to establish trust on the agent and GitLab sides and your problem might/should go away.
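A quick way to check whether a given CA file is enough to trust the server is a small TLS handshake test. This is only a sketch: it checks trust from the machine where the script runs, so it would have to be run from a place that sees the network the same way the agent pod or the GitLab container does. The function name and parameters are hypothetical.

```python
import socket
import ssl


def check_tls_trust(host, port=443, cafile=None, timeout=5.0):
    """Try a verified TLS handshake; return (ok, detail)."""
    # With cafile set, only that file is trusted; otherwise the system store is used.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname is also checked against the certificate's SANs.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version() or "unknown TLS version"
    except ssl.SSLCertVerificationError as exc:
        return False, "certificate verification failed: " + str(exc.verify_message)
    except OSError as exc:
        return False, "connection failed: " + str(exc)
```

Calling it once without cafile and once with the path to the self-signed certificate shows whether adding that certificate to the trust store would resolve the verification error.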
I’m facing the same issue when trying to deploy a Kubernetes agent to my local k3s cluster.
The pod log showed the following error:
{"level":"error","time":"2024-01-15T04:55:29.991Z","msg":"Failed to register agent pod. Please make sure the agent version matches the server version","mod_name":"agent_registrar","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: failed to WebSocket dial: expected handshake response status code 101 but got 400\""}
I deployed a Kubernetes agent on the same cluster in the same way a few months ago and it worked fine. Is this an issue with the latest version?
I have also checked gitlab-kas log and I didn’t see any certificate error like above.
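Since the useful detail here is the HTTP status returned instead of 101, a small helper can pull it out of the error string. This is a sketch that assumes the error text has the shape quoted above; the regular expression and function name are illustrative.

```python
import re
from http import HTTPStatus

# Matches the wording of the WebSocket handshake error quoted in this thread.
HANDSHAKE_RE = re.compile(
    r"expected handshake response status code 101 but got (\d+)"
)


def handshake_status(error: str):
    """Return (code, reason phrase) for a failed handshake, or None."""
    match = HANDSHAKE_RE.search(error)
    if match is None:
        return None
    code = int(match.group(1))
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a status code known to the standard library.
        phrase = "unknown status"
    return code, phrase
```

A 400 here means something answered the request over HTTP but rejected it as a bad request, so the TLS connection itself was established; that is consistent with no certificate error appearing in the gitlab-kas log.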
I got the same issue on a k3s cluster.
Here are my override values:
# -- configure the used image
image:
  # -- Overrides the image tag whose default is the chart appVersion.
  tag: "v16.8.0-rc1"
# -- configure the agent
config:
  # -- The user-facing URL for the in-cluster `agentk`
  #kasAddress: 'wss://gitlab.homelab.dev/-/kubernetes-agent/'
  kasAddress: 'wss://kas.homelab.dev/'
  # -- put your agent token here
  token: glagent-glagent-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# -- list of hosts and IPs that will be injected into the pod's hosts file
# Example:
# `[{ "ip": "127.0.0.1", "hostnames": [ "foo.local", "bar.local" ]}, { "ip": "10.1.2.3", "hostnames": [ "foo.remote", "bar.remote" ]}]`
hostAliases: [
  { "ip": "10.43.24.120", "hostnames": [ "kas.homelab.dev" ] }
]
I installed GitLab EE with the package manager instead of Helm, and I have also set the KAS external URL to wss://gitlab.mydomain.com/-/kubernetes-agent/ in gitlab.rb, but it still fails to connect with the same error.
Is the issue you posted actually saying that if we deploy GitLab to a subdomain, we need to explicitly set the KAS external URL to kas.gitlab.mydomain.com, because otherwise it defaults to kas.mydomain.com?