Hi,
I regularly upgrade our GitLab installation (bare metal, Debian, installed from the GitLab package repository).
But during the upgrade to 15.10.1, the KAS stopped working. I had checked the upgrade notes beforehand and saw that they added a backward-compatible change to the KAS (using a subdomain, but disabled by default). I also updated my gitlab.rb to the latest template, but that didn’t help.
My GitLab agents in k8s report:
{"level":"error","time":"2023-04-04T06:45:26.338Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"Connect(): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: failed to WebSocket dial: expected handshake response status code 101 but got 502\""}
Trying to restart the KAS leads to a timeout:
gitlab-ctl restart gitlab-kas
timeout: run: gitlab-kas: (pid 4090336) 1361087s, got TERM
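The timeout line itself carries two hints: the PID and how long the process has been in the "run" state. A minimal shell sketch for pulling them out, assuming the line format shown above (the helper names `kas_pid` and `kas_uptime_days` are made up for illustration and are not part of gitlab-ctl):

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text in the format shown above.

# Print the PID found in a "(pid N)" fragment read from stdin.
kas_pid() {
    sed -n 's/.*(pid \([0-9][0-9]*\)).*/\1/p'
}

# Print the uptime in whole days, taken from the "Ns" field after "(pid N)".
kas_uptime_days() {
    secs=$(sed -n 's/.*) \([0-9][0-9]*\)s.*/\1/p')
    if [ -n "$secs" ]; then
        echo $((secs / 86400))
    fi
}

line='timeout: run: gitlab-kas: (pid 4090336) 1361087s, got TERM'
echo "$line" | kas_pid            # 4090336
echo "$line" | kas_uptime_days    # 15
```

An uptime of about 15 days suggests the process was started well before the upgrade and did not exit when it was sent TERM.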
The logs for gitlab-kas are empty:
gitlab-ctl tail gitlab-kas
==> /var/log/gitlab/gitlab-kas/state <==
==> /var/log/gitlab/gitlab-kas/current <==
I’m a little bit out of ideas…
In addition, GitLab shows this message on all “Kubernetes Clusters” pages:
An error occurred while loading your agents
I’ve also checked whether the service is still running, and it is:
netstat -tulpn | grep 8151
tcp 0 0 127.0.0.1:8151 0.0.0.0:* LISTEN 4090336/gitlab-kas
If I kill the process, it gets respawned,
but that didn’t help either…
Update: I restarted the whole server and it worked… It seems that a zombie process was blocking the service from coming up again…
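For anyone hitting the same thing: before rebooting, it may be worth checking what state the stuck process is actually in, since a true zombie (state Z) and a process hanging in uninterruptible sleep (state D) behave differently. A rough sketch, assuming a Linux `ps` from procps; `describe_state` is a made-up helper name and the labels are only illustrative:

```shell
#!/bin/sh
# Hypothetical helper: map the first letter of a ps STAT value to a label.
describe_state() {
    case "$1" in
        Z*) echo "zombie" ;;
        D*) echo "uninterruptible sleep" ;;
        T*) echo "stopped" ;;
        R*|S*|I*) echo "running or sleeping" ;;
        *)  echo "unknown" ;;
    esac
}

# Usage against the PID reported by netstat (only meaningful while that
# process still exists):
#   describe_state "$(ps -o stat= -p 4090336)"
```

A zombie cannot be killed and is only cleaned up by its parent, while a process stuck in state D usually points at blocked I/O. Either case could explain why a restart timed out, but that is a guess and not something confirmed in this thread.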