GitLab Runner web terminal: Connection failure

Hi everyone,

I am trying to enable the gitlab-runner web terminal using the [session_server] settings.

While my CI job is running, when I click on “Debug”, I get a “Connection failure” in the black web terminal.

If I open the web inspector, I see a 500 error on terminal.ws without much detail: “Firefox cannot establish the connection with wss://…/terminal.ws”.

When I curl https://runner.dev.home:8093, I get “SSL certificate problem: unable to get local issuer certificate”. But according to the doc, a self-signed x509 certificate is created when the runner starts, so this seems normal.

I also ran openssl s_client -connect runner.dev.home:8093 and it says “unable to verify the first certificate”, but since the certificate is self-signed by gitlab-runner, again it seems OK.
Each time I restart gitlab-runner, the certificate changes (as the doc says).
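
For anyone who wants to confirm that the certificate really is regenerated on each restart, here is a minimal Python sketch. It fetches whatever certificate the session server presents, without verifying it, and computes its SHA-256 fingerprint. The function names are my own and not part of gitlab-runner; the host and port are the ones used in this post.

```python
import hashlib
import ssl


def fingerprint_from_pem(pem: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def session_server_fingerprint(host: str, port: int) -> str:
    """Fetch the certificate presented at host:port and fingerprint it.

    ssl.get_server_certificate() does not validate the certificate unless
    ca_certs is given, so a self-signed certificate is accepted here.
    """
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_pem(pem)
```

Calling session_server_fingerprint("runner.dev.home", 8093) once before and once after restarting the service should return two different values if the certificate is indeed regenerated.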

journalctl -f -u gitlab-runner says :

If anyone has an idea about where I should start looking, I’d be glad to hear it :pray:

Context

  • gitlab-runner 15.1.0 with the docker executor on bare-metal Ubuntu 20.04: runner.dev.home;
  • GitLab 15.1.2 self-hosted on a Debian 10 VM: gitlab.dev.home (https://gitlab.dev.home);
  • Windows PKI (ADCS): pki.dev.home;
  • Client: Ubuntu focal, Firefox 98;
  • I’m not using a reverse proxy for gitlab.dev.home;
  • I followed this doc: Interactive Web Terminals | GitLab;
  • Every certificate is signed by my PKI and each OS has the CA certificate in its trusted CA store (I can curl or browse https://gitlab.dev.home without warnings from both the client and the runner);
  • gitlab-runner register works with my certificate, without any CA bypass.

Config excerpts

My gitlab-runner config.toml contains the following lines:

# /etc/gitlab-runner/config.toml
[session_server]
  listen_address = "0.0.0.0:8093"
  advertise_address = "runner.dev.home:8093"

So… after I restarted the gitlab-runner service several times, it now seems to work, but I don’t know why.

Here is my final config.toml in case someone finds it useful.

concurrent = 40
check_interval = 0
#log_level = "debug"

[session_server]
  session_timeout = 1800
  listen_address = "0.0.0.0:8093"
  advertise_address = "runner.dev.home:8093"

[[runners]]
  name = "runner.dev.home"
  url = "https://gitlab.dev.home"
  token = "THATS_NOT_MY_TOKEN"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    gpus = "all"
    tls_verify = false
    image = "debian"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/etc/ssl/certs:/etc/ssl/certs:ro"]
    pull_policy = "if-not-present"
    shm_size = 0

I ran the same test with gitlab.com and a runner on a VM in a DMZ. I used my public IP as advertise_address and 0.0.0.0:8093 as listen_address.

And I have the same problem: Connection failure…

Does anyone have an idea what is behind the “Connection failure” message?

I can provide more details on this setup, but it’s pretty much the same as on my air-gapped network.

The NAT rule for port 8093 on my ISP box did not work, so that was the cause of my problem…

The public IP on the WAN side (PUB.LIC.IPA.DDR) must be reachable from gitlab.com.
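
A quick way to test the port forward is to try opening a TCP connection to the advertised address. Below is a small Python sketch of that check; tcp_reachable is a name I made up for illustration. It only shows that something accepts connections on the port, not that the session server itself is healthy. It is best run from a machine outside your network, since a test from inside the LAN can be misleading depending on how the box handles NAT loopback.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, tcp_reachable("PUB.LIC.IPA.DDR", 8093), with the placeholder replaced by your real public IP, should return True once the NAT rule works.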

Here is my working configuration for gitlab runner 15.1.0 on Debian 10.

listen_address = ":9252" # gives access to http://*:9252/metrics for prometheus
concurrent = 4
check_interval = 0

[session_server]
  listen_address = ":8093"
  advertise_address = "PUB.LIC.IPA.DDR:8093" # must be a public IP address reachable from gitlab.com; it does not need to be reachable from the local computer you use to open the gitlab.com web terminal
  session_timeout = 1800

[[runners]]
  name = "buster"
  url = "https://gitlab.com/"
  token = "NoT-MyTok3n"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "debian"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0

Had the same issue. The solution was to use 0.0.0.0:8093 as the listen_address, not 127.0.0.1. It is also important to set the advertise_address to the hostname of the machine where the runner is running (my-machine-name:8093).
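
To make the 0.0.0.0 versus 127.0.0.1 point concrete, here is a rough Python sketch that flags a listen_address bound only to loopback. The helper name is hypothetical and this is not how gitlab-runner parses its config; it just splits a "host:port" string and checks the host part.

```python
import ipaddress


def listens_on_loopback_only(listen_address: str) -> bool:
    """Return True if a "host:port" listen address binds only to loopback.

    An empty host (":8093") or 0.0.0.0 means all interfaces.
    """
    host, _, _port = listen_address.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::1]:8093
    if host == "":
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (some other hostname): cannot tell without resolving it.
        return False
```

With this, "127.0.0.1:8093" is reported as loopback-only, while "0.0.0.0:8093" and ":8093" are not.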