Kubernetes Agent - Failed to register agent pod. Please make sure the agent version matches the server version

Hello,

I’m unable to register an agent with my self-hosted GitLab CE server. The agent logs the following error:

{"level":"error","time":"2024-01-01T07:06:14.950Z","msg":"Failed to register agent pod. Please make sure the agent version matches the server version","mod_name":"agent_registrar","error":"rpc error: code = Unavailable desc = unavailable"}

My GitLab version is v16.7 and GitLab-KAS is v16.8.0-rc1. I have also tested this with v16.4, but unfortunately the result is the same. I installed the agent using the Helm command generated by GitLab and I have triple-checked that the version is correct. I would appreciate any help with this.


Hi!

Same issue here, with the same configuration. The rpc error code and description aren’t very informative. Running wscat from the node hosting Kubernetes does seem to connect to GitLab KAS, although it gets disconnected instantly. Hope we’ll get an answer :slight_smile:

Try v16.7.0 for the Kubernetes Agent. It’s supposed to match your GitLab instance’s version.

Using the latest Helm chart, you probably need to --set image.tag=v16.7.0 to override the version you are using now. Alternatively, you can search the Helm repo for a chart version that uses v16.7.0 as the default app/image version.

The upstream documentation has an example that illustrates overriding the image version that may be helpful as well.
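
A minimal sketch of that override, assuming the chart is installed as gitlab/gitlab-agent from the https://charts.gitlab.io repo. The function name install_agent and all argument values are placeholders, not something GitLab generates. The values file quoted later in this thread uses image.tag as the key, so that is what the sketch sets.

```shell
# One-time setup, assuming the repo alias "gitlab":
#   helm repo add gitlab https://charts.gitlab.io && helm repo update
# List chart versions together with their default app versions:
#   helm search repo gitlab/gitlab-agent --versions

# Hypothetical wrapper: install or upgrade the agent with an explicit image tag.
# Arguments: release name, namespace, agent version, agent token, KAS address.
install_agent() {
  release="$1"; namespace="$2"; agent_version="$3"; token="$4"; kas_address="$5"
  helm upgrade --install "$release" gitlab/gitlab-agent \
    --namespace "$namespace" \
    --create-namespace \
    --set image.tag="$agent_version" \
    --set config.token="$token" \
    --set config.kasAddress="$kas_address"
}
```

It could be called as install_agent my-agent gitlab-agent v16.7.0 "$AGENT_TOKEN" "wss://GITLABURL/-/kubernetes-agent/", where every value is an example.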

Hope this helps,

Hi, thanks for your answer. In fact, I already tried that without success (sorry for not mentioning it). Also, why would GitLab generate a wrong Helm command?

For information, I don’t know whether I should be worried by these results. From the Kubernetes nodes, I get this:

  • When using wscat:
    wscat -c "wss://GITLABURL/-/kubernetes-agent/"
    Connected (press CTRL+C to quit)
    Disconnected (code: 1002, reason: "Expecting "ws-tunnel" subprotocol, got """)

  • When using curl:
    curl --header "Connection: Upgrade" --header "Upgrade: websocket" https:/GITLAB_URL/-/kubernetes-agent/ -i
    HTTP/1.1 400 Bad Request
    Date: Fri, 05 Jan 2024 06:17:39 GMT
    Server: nginx
    Strict-Transport-Security: max-age=15552000; includeSubDomains
    Content-Security-Policy: frame-ancestors 'self' INTERNAL
    X-Content-Type-Options: nosniff
    Referrer-Policy: same-origin
    Permissions-Policy: geolocation=(none), midi=(none), notifications=(INTERNAL), push=(INTERNAL), sync-xhr=(none), microphone=(INTERNAL), camera=(INTERNAL), magnetometer=(none), gyroscope=(none), speaker=(INTERNAL), vibrate=(INTERNAL), fullscreen(self), payment=(none)
    X-Frame-Options: SAMEORIGIN
    Content-Type: text/plain; charset=utf-8
    Content-Length: 66
    Sec-Websocket-Version: 13
    Connection: close

    unsupported WebSocket protocol version (only 13 is supported): ""

Is KAS working properly? By the way, I believe I have already configured GitLab and my Apache reverse proxy correctly: before that I was getting 301 (GitLab misconfigured) and 426 (Apache misconfigured).
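
A note for anyone repeating these checks: the wscat reply says a "ws-tunnel" subprotocol was expected, and the curl reply says WebSocket version 13 was not supplied. Both replies look like they come from the handshake itself, so they may only mean the test requests were incomplete. Below is a rough sketch of a fuller handshake request with curl. The function names are made up for illustration, and the Sec-WebSocket-Key is just the sample nonce from RFC 6455.

```shell
# Hypothetical helper: turn a ws:// or wss:// URL into the http(s):// form curl expects.
kas_https_url() {
  printf '%s\n' "$1" | sed -e 's#^wss://#https://#' -e 's#^ws://#http://#'
}

# Hypothetical helper: send a WebSocket upgrade request carrying the version
# and subprotocol headers that the two replies above asked for.
# --http1.1 keeps curl from negotiating HTTP/2 for the upgrade request.
kas_handshake() {
  curl --http1.1 --include --silent --max-time 10 \
    --header "Connection: Upgrade" \
    --header "Upgrade: websocket" \
    --header "Sec-WebSocket-Version: 13" \
    --header "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
    --header "Sec-WebSocket-Protocol: ws-tunnel" \
    "$(kas_https_url "$1")"
}
```

Calling kas_handshake "wss://GITLABURL/-/kubernetes-agent/" prints the response headers. A 101 Switching Protocols status would indicate the upgrade was accepted; anything else points at the proxy or KAS configuration rather than at the agent version.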

I don’t think it’s supposed to match my GitLab instance version, it’s supposed to match the GitLab-KAS version on my server, which is correctly v16.8.0-rc1. I’m not sure why v16.7 comes with KAS v16.8 but that’s how it is and the helm command generated by GitLab contains the right version.

I have had no problems upgrading an already registered GitLab agent from 16.5.1 to 16.6.0 to 16.7.0 on our self-hosted GitLab instance. The only odd thing is that the /-/clusters page doesn’t update (see this issue).

On a freshly deployed 16.7.0 GitLab test instance, I tried registering an agent with the GitLab-generated command. I can confirm that the command shows version 16.8.0-rc1. The first thing I stumbled on was the self-signed certificate of the GitLab test instance. I added it by appending

--set-file config.caCert=/path/to/self-signed-cert

to the GitLab generated command line.
That still fails.

In both cases, the msg part in the logs reads

Failed to register agent pod. Please make sure the agent version matches the server version

However, the error part changes from complaining about an “unknown authority” to complaining about a certificate that “doesn’t contain IP SANs”.
The test instance is exposed by IP address and its certificate does have an IP SAN entry.

In view of the above, I think the msg about version mismatches is a generic hint for any kind of error and you really should be looking at the error. Unfortunately, in @ahmed.ajwed’s case that is not particularly informative :unamused:
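
The point about reading error rather than msg can be sketched as a small shell helper that prints only the error field of a log line. The function name extract_error is made up for illustration. It uses sed rather than a JSON parser so that it runs without extra tools, which means escaped quotes inside the value are printed as-is.

```shell
# Hypothetical helper: print only the "error" field of one agent JSON log line.
# The pattern accepts escaped characters (such as \") inside the value.
extract_error() {
  printf '%s\n' "$1" | sed -nE 's/.*"error":"(([^"\\]|\\.)*)".*/\1/p'
}

# Log line from the first post in this thread.
line='{"level":"error","time":"2024-01-01T07:06:14.950Z","msg":"Failed to register agent pod. Please make sure the agent version matches the server version","mod_name":"agent_registrar","error":"rpc error: code = Unavailable desc = unavailable"}'

extract_error "$line"
# prints: rpc error: code = Unavailable desc = unavailable
```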

When I try your curl command (making sure to use https:// instead of https:/) and ignoring certificate issues for the moment, I get

$ curl --insecure --header "Connection: Upgrade" --header "Upgrade: websocket" https://$IP_ADDRESS/gitlab/-/kubernetes-agent/ -i
HTTP/2 426 
server: nginx
date: Thu, 11 Jan 2024 03:06:56 GMT
content-type: text/plain; charset=utf-8
content-length: 81
curl: (92) Invalid HTTP header field was received: frame type: 1, stream: 1, name: [upgrade], value: [websocket]

Not sure what to make of that :thinking:

BTW, I’m using the Docker image, i.e. GitLab Omnibus, so that nginx is the one that comes bundled with it. I don’t run an extra reverse-proxy in front of the container.

a certificate that “doesn’t contain IP SANs”.

Sorry, that was caused by a bad WebSocket URL. It seems that GitLab drops the instance’s port number if that is not the default 443 :unamused:

Anyway, getting all my test setups sorted out, I am now able to reproduce @ahmed.ajwed’s error message with 16.8.0-rc1 as well as 16.7.0 agents trying to register as separate clusters to a GitLab 16.7.0 instance. I’ve tried with versions 1.22.0 and 1.21.0 of the Helm chart.

Just noticed this in the gitlab-kas log (redacted for readability)

{
  "level":"error",
  "time":"2024-01-11T14:27:47.665+0900",
  "msg":"AgentInfo()",
  "grpc_service":"gitlab.agent.reverse_tunnel.rpc.ReverseTunnel",
  "grpc_method":"Connect",
  "error":"Get \"https://$HOSTNAME/gitlab/api/v4/internal/kubernetes/agent_info\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
}

To solve that, I put a copy of /path/to/self-signed-cert in the container’s /etc/gitlab/trusted-certs/
and in the container ran

gitlab-ctl reconfigure
sv restart gitlab-kas

Eventually, all the clusters from my previous registration attempts connected successfully :confetti_ball:

So, if you are using self-signed certificates, make sure to establish trust on the agent and GitLab sides and your problem might/should go away.

I face the same issue when trying to deploy a Kubernetes agent to my local k3s cluster.

The pod log showed the following error:

{"level":"error","time":"2024-01-15T04:55:29.991Z","msg":"Failed to register agent pod. Please make sure the agent version matches the server version","mod_name":"agent_registrar","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: failed to WebSocket dial: expected handshake response status code 101 but got 400\""}

I deployed the Kubernetes agent on the same cluster in the same way a few months ago and it worked fine. Is this an issue with the latest version?

I have also checked the gitlab-kas log and didn’t see any certificate error like the one above.

I got the same issue on a k3s cluster.
Here are my override values:

# -- configure the used image
image:
 # -- Overrides the image tag whose default is the chart appVersion.
 tag: "v16.8.0-rc1"

# -- configure the agent
config:
 # -- The user-facing URL for the in-cluster `agentk`
 #kasAddress: 'wss://gitlab.homelab.dev/-/kubernetes-agent/'
 kasAddress: 'wss://kas.homelab.dev/'
 # -- put your agent token here
 token: glagent-glagent-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# -- list of hosts and IPs that will be injected into the pod's hosts file
# Example:
# `[{ "ip": "127.0.0.1", "hostnames": [ "foo.local", "bar.local" ]}, { "ip": "10.1.2.3", "hostnames": [ "foo.remote", "bar.remote" ]}]`
hostAliases: [
 { "ip": "10.43.24.120", "hostnames": [ "kas.homelab.dev" ] }
]

Any thoughts?

I just fixed my issue. Please see “KAS should be reachable on a subdomain of Gitlab (frontend)” (#4920) in the GitLab.org / charts / GitLab Chart issue tracker. Thanks to Caleb Hansard.

I installed GitLab EE with the package manager rather than Helm, and I have also set the KAS external URL to wss://gitlab.mydomain.com/-/kubernetes-agent/ in gitlab.rb, but the agent still fails to connect with the same error.

Is the issue you posted actually saying that if we deploy GitLab to a subdomain, we need to explicitly set the KAS external URL to kas.gitlab.mydomain.com, because otherwise it defaults to kas.mydomain.com?


I still have the same issue as you. I am running GitLab 16.4.7 and I get the same error when trying to install the agent.

Any suggestions for a solution?

Has anyone solved this issue?

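I ran into this problem too.
The cause was an incorrectly configured gitlab_kas_external_url parameter.
I run GitLab with Docker and have no public domain name, so I simply replaced the example domain with the internal IP plus port 8929.
A wscat request to the gitlab_kas_external_url address then returned 301, and after re-registering KAS the problem was solved.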
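
For later readers, here is a rough lookup of the error strings reported in this thread and what resolved them for the posters. It only covers cases mentioned above, the function name hint_for_error is made up, and the hints are starting points rather than a diagnosis.

```shell
# Hypothetical helper: map the "error" text of a failed registration to the
# kind of fix that worked for someone in this thread.
hint_for_error() {
  case "$1" in
    *"unknown authority"*)
      echo "untrusted certificate: establish trust on both the agent and GitLab sides" ;;
    *"IP SANs"*)
      echo "check the WebSocket URL, including any non-default port" ;;
    *"101 but got"*)
      echo "check kasAddress against the KAS external URL and any reverse proxy" ;;
    *"desc = unavailable"*)
      echo "uninformative on the agent side: check the gitlab-kas log on the server" ;;
    *)
      echo "no hint: read the full error field" ;;
  esac
}
```

For example, hint_for_error "rpc error: code = Unavailable desc = unavailable" points to the gitlab-kas log, which is where the certificate problem showed up in the reproduction above.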