[SOLVED] KAS not working after upgrade to 15.10

Hi,

I’ve regular upgrade our GitLab installation (on bare metal using debian with gitlab package repository).
But during the upgrade to 15.10.1 the KAS stop to work. I’ve checked the upgrade notice before and saw, that they added a backward compatible change to the kas (using subdomain, but disabled default). I also upgraded my gitlab.rb to the latest template, but it didn’t helped.

my gitlab agents in k8s told me

{"level":"error","time":"2023-04-04T06:45:26.338Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"Connect(): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: failed to WebSocket dial: expected handshake response status code 101 but got 502\""}

while try to restart the kas lead to a timeout:

gitlab-ctl restart gitlab-kas
timeout: run: gitlab-kas: (pid 4090336) 1361087s, got TERM

The logs for gitlab-kas are empty:

gitlab-ctl tail gitlab-kas
==> /var/log/gitlab/gitlab-kas/state <==

==> /var/log/gitlab/gitlab-kas/current <==

i’m a little bit out ouf ideas…

In Addition: Gitlab shows in all “Kubernetes Clusters” pages the message:

An error occurred while loading your agents

I’ve also checked if the service is still running, and he does:

netstat -tulpn | grep 8151
tcp        0      0 127.0.0.1:8151          0.0.0.0:*               LISTEN      4090336/gitlab-kas  

if i kill the service, he will be respawned

but that also not worked…

Update: I restarted the whole server and it worked… It seems that a zombie process blocks the service from getting up again…

2 Likes