Task fails to start with "remote error: tls: unknown certificate authority"

Problem to solve

Today I started getting the following errors in the logs when running tasks on my local gitlab-runner instance connected to gitlab.com:

Jun 27 15:17:07 vesho docker[379590]: ERROR: Job failed (system failure):
   error during connect: Get "https://184.72.163.13:2376/v1.24/info":
   remote error: tls: unknown certificate authority (docker.go:958:1s)
   duration_s=109.616854734 job=7202449094 project=10176871 runner=KkF5hGxd

Then in the GitLab task log I have this:

Running with gitlab-runner 17.1.0 (fe451d5a)
  on vesho-autoscaler-public KkF5hGxd, system ID: r_Hn18yAGIixfO
Preparing the "docker+machine" executor
01:14
ERROR: Failed to remove network for build
ERROR: Preparation failed: error during connect: Get "https://184.72.163.13:2376/v1.24/info": remote error: tls: unknown certificate authority (docker.go:958:1s)
Will be retried in 3s ...
ERROR: Failed to remove network for build
ERROR: Preparation failed: error during connect: Get "https://54.224.151.177:2376/v1.24/info": remote error: tls: unknown certificate authority (docker.go:958:1s)
Will be retried in 3s ...
ERROR: Failed to remove network for build
ERROR: Preparation failed: error during connect: Get "https://54.224.151.177:2376/v1.24/info": remote error: tls: unknown certificate authority (docker.go:958:1s)
Will be retried in 3s ...
ERROR: Job failed (system failure): error during connect: Get "https://54.224.151.177:2376/v1.24/info": remote error: tls: unknown certificate authority (docker.go:958:1s)

The runner configuration is as follows:

[[runners]]
  name = "vesho-autoscaler-public"
  url = "https://gitlab.com/"
  id = XXXXXXX
  token = "XXXXXXX"
  token_obtained_at = 2022-11-10T15:40:28Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker+machine"
  limit = 3
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    host = "tcp://docker:2375"
    tls_verify = false
    image = "docker:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    cache_dir = "/cache"
    extra_hosts = ["docker:172.17.0.2"]
    shm_size = 0
    services_limit = 1
  [runners.machine]
    IdleCount = 0
    IdleScaleFactor = 0.0
    IdleCountMin = 0
    IdleTime = 1800
    MaxBuilds = 10
    MachineDriver = "amazonec2"
    MachineName = "cimachine-%s"
    MachineOptions = [
      "engine-opt=bip=172.17.0.1/24",
      "amazonec2-ami=ami-0b9a603c10937a61b",
      "amazonec2-access-key=XXXXXXX",
      "amazonec2-secret-key=XXXXXXX",
      "amazonec2-region=us-east-1",
      "amazonec2-vpc-id=vpc-XXXXXXX",
      "amazonec2-subnet-id=subnet-XXXXXXX",
      "amazonec2-zone=a",
      "amazonec2-use-private-address=false",
      "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
      "amazonec2-security-group=gitlab-ci-runner",
      "amazonec2-instance-type=c6a.2xlarge",
      "amazonec2-root-size=40",
      "amazonec2-request-spot-instance=false",
      "amazonec2-spot-price="
    ]

This worked well earlier this week (I’m not sure when the previous build was, but it was this week).

Versions

Please select whether options apply, and add the version information.

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

Versions

  • GitLab: gitlab.com
  • GitLab Runner, if self-hosted: I’m running the official docker images, which I pulled today while trying to work around the problem, and it reports itself in the logs like so:
Runtime platform arch=amd64 os=linux pid=7 revision=fe451d5a version=17.1.0

The addresses that are reported to have an unknown certificate authority are the public IP addresses of the EC2 instances used by the runner’s docker-machine. 2376 is the Docker TLS port, IIRC, and the certificates set there are self-signed certificates. So this looks like a docker-machine misconfiguration?

I have tls_verify set to false and also have the host set to a non-TLS connection, so I don’t understand why I get this error.
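
For what it’s worth, you can check which certificate the EC2 machine is actually serving on 2376 with openssl (the IP is just the one from the log above, substitute your own):

openssl s_client -connect 184.72.163.13:2376 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates

If the issuer turns out to be the per-runner CA that docker-machine generates, then the TLS settings under [runners.docker] aren’t what’s being used for this connection in the first place.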

There was an update to the docker dind image a couple of days back (version 27.0.2), so I tried to revert by setting the runners.docker image to a previous version. Reverting to 26 (i.e. 26.1.4) did not fix it, and neither did 27.0.1, but with docker:27.0.0-dind I did get the old working behavior back.
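
In case it helps anyone, pinning the image is just an edit to the [runners.docker] section; something like this, assuming a package install with the config at /etc/gitlab-runner/config.toml (if you run the official runner container, edit the config.toml you mounted into it instead):

sudo sed -i 's|image = "docker:latest"|image = "docker:27.0.0-dind"|' /etc/gitlab-runner/config.toml
sudo gitlab-runner restart    # probably not even needed, the runner picks up config.toml changes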

Interestingly, while the configuration has host = "tcp://docker:2375", the log has this:

Jun 27 19:02:47 vesho docker[923766]: Using existing docker-machine created=2024-06-27 15:54:29.28651096 +0000 UTC m=+4.582039409 docker=tcp://3.92.55.129:2376 job=7205050194 name=runner-kkf5hgxd-gitlab-XXXXX-1719503669-4b1d2ca7 now=2024-06-27 16:02:47.944534527 +0000 UTC m=+503.240062947 project=10176871 runner=KkF5hGxd usedcount=6

So where does the docker=tcp://3.92.55.129:2376 part come from, and can I change it in the configuration?
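
As far as I can tell, with the docker+machine executor the host under [runners.docker] isn’t used for the build connection at all - the runner talks to the Docker endpoint of the EC2 machine that docker-machine created, and that endpoint is always TLS on 2376. You can see it from wherever docker-machine runs (the machine name is the one from the log above):

docker-machine ls
docker-machine env runner-kkf5hgxd-gitlab-XXXXX-1719503669-4b1d2ca7
# prints something like:
#   export DOCKER_TLS_VERIFY="1"
#   export DOCKER_HOST="tcp://3.92.55.129:2376"
#   export DOCKER_CERT_PATH="/root/.docker/machine/machines/runner-kkf5hgxd-gitlab-XXXXX-1719503669-4b1d2ca7"

So it doesn’t look like something you can change from the [runners.docker] settings.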

I was apparently too hasty. Tasks are now failing even with the runners.docker image set to the older DIND docker image.

I have had the same problem since yesterday. My setup is a GitLab runner manager in AWS creating on-demand EC2 instances for jobs via GitLab’s docker-machine fork.

@odeda : Did you find a solution?

I had TLS activated until now. Docker-machine installs Docker version 27.0.3 on the runner machines, and according to cli/docs/deprecated.md at v27.0.2 · docker/cli · GitHub, tlsverify=false no longer works.
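
If you want to confirm which engine version the provisioner actually installed on a machine, something like this should work (the machine name is just a placeholder):

docker-machine ssh runner-xxxxxxxx-machine-name "docker version --format '{{.Server.Version}}'"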

It looks like DOCKER_TLS_CERTDIR is ignored. At least when I SSH into the runner machines, Docker is running and the only certificates I can find are located in /etc/docker.
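
The daemon flags on those machines should come from the systemd drop-in that docker-machine writes during provisioning, so that is probably also worth checking (same placeholder machine name as above; the path is what I’d expect on an Ubuntu AMI and may differ elsewhere):

docker-machine ssh runner-xxxxxxxx-machine-name \
  "sudo cat /etc/systemd/system/docker.service.d/10-machine.conf"
# the ExecStart line should show -H tcp://0.0.0.0:2376 --tlsverify --tlscacert ... --tlscert ... --tlskey ...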

Annoyingly, stopping and starting the gitlab-runner service sometimes works: it will run some jobs, then lose the connection to gitlab.com (I’m running the service on my laptop, which sometimes sleeps, and after waking the runner gets 502 errors from gitlab.com), and when I restart it I continue to get TLS verification errors. A couple of stop/starts later, it starts running things again.
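
For anyone following along, a stop/start here just means restarting the runner service or container, whichever matches your install (the container name is an assumption):

sudo systemctl restart gitlab-runner    # package install
docker restart gitlab-runner            # official container install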

I initially thought that I could select an image that would work, but changing the image doesn’t seem to have an effect on this otherwise completely arbitrary behavior.

The tls_verify parameter is still documented in the GitLab Runner docker executor configuration even though it obviously doesn’t work, and neither does the host parameter that I have in my config (and that one isn’t actually listed in the docs at all). :person_shrugging:

The runner log says things like:

Creating CA: /root/.docker/machine/certs/ca.pem
Creating client certificate: /root/.docker/machine/certs/cert.pem

So I added tls_cert_path = "/root/.docker/machine/certs" to the [runners.docker] section, and right now jobs complete - but as I’ve noted above, this may not mean anything, because GitLab Runner is arbitrary and capricious. :face_exhaling:
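
If anyone wants to try the same, it’s probably worth checking first that the certs really are at that path (inside the runner container, if you run the official image):

ls -l /root/.docker/machine/certs/
# expecting ca.pem, cert.pem and key.pem, matching the "Creating CA / client certificate" lines above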


I got the same issue again today, no configuration change on my side.

I’ve noticed that I’m running a hard-coded GitLab Runner docker image, ubuntu-v17.0.0, while the current version (released last week) is 17.2.0. I updated my runner docker image, restarted, and now it works fine again.
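
The update itself is just the usual pull-and-recreate of the official runner container, roughly like below (the tag and the volume path are illustrative; use whatever your install has):

docker pull gitlab/gitlab-runner:ubuntu-v17.2.0
docker stop gitlab-runner && docker rm gitlab-runner
docker run -d --name gitlab-runner --restart always \
  -v /srv/gitlab-runner/config:/etc/gitlab-runner \
  gitlab/gitlab-runner:ubuntu-v17.2.0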


Having issues with this also; Docker stopped working and now hangs with an infinite timeout. So the solution is to downgrade the GitLab runner to 17.0.0, or downgrade the Docker version to 26.x on the host machine? In my case I was using semantic-release-docker and it just hangs the pipeline forever, maybe because it is unable to connect to the Docker TCP service.

We started to get the same error today. We were on Docker 26.1.3 and GitLab Runner 17.0.0 and getting the error. We have updated to Docker 27.3.1 and GitLab Runner 17.5.3, but still see the same issue. Has anyone resolved this or raised it with GitLab support?

I’m currently running GitLab Runner 17.2.0 on Podman 4.9.3 with an EC2 auto-scale executor, and it has been working well for a while now. I’m loath to update the runner image, as there’s always the fear that this will break everything. My impression, though, is that this problem isn’t about specific versions being problematic or incompatible with something; it’s something that recurs from time to time and could be some kind of race condition or misalignment.

I have updated to runner 17.6.0, and with the EC2 autoscale executor (docker+machine; MachineDriver: amazonec2) and the image docker:27.0.3-dind it has so far worked well. This is my runner configuration:

[[runners]]
  name = "name-of-runner"
  url = "https://gitlab.com/"
  id = 0
  token = "MYTOKEN"
  token_obtained_at = 0001-01-01T00:00:00Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker+machine"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    host = "tcp://docker:2375"
    tls_verify = false
    tls_cert_path = "/root/.docker/machine/certs"
    image = "docker:27.0.3-dind"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    cache_dir = "/cache"
    extra_hosts = ["docker:172.17.0.2"]
    shm_size = 0
    services_limit = 1
  [runners.machine]
    IdleCount = 0
    IdleScaleFactor = 0.0
    IdleCountMin = 0
    IdleTime = 1800
    MaxBuilds = 10
    MachineDriver = "amazonec2"
    MachineName = "name-of-runner-%s"
    MachineOptions = [
      "engine-opt=bip=172.17.0.1/24",
      "amazonec2-ami=ami-04a81a99f5ec58529",
      "amazonec2-access-key=AKIABCDEFGHIJKLMNOP",
      "amazonec2-secret-key=deadbeefdeadbeefdeadbeef",
      "amazonec2-region=us-east-1",
      "amazonec2-vpc-id=vpc-abcd1234",
      "amazonec2-subnet-id=subnet-abcd1234",
      "amazonec2-zone=a",
      "amazonec2-use-private-address=false",
      "amazonec2-tags=tag-a,val-a,tag-b,val-b,tag-c,val-c",
      "amazonec2-security-group=gitlab-ci-runner",
      "amazonec2-instance-type=c6a.2xlarge",
      "amazonec2-root-size=40",
      "amazonec2-request-spot-instance=false",
      "amazonec2-spot-price="
    ]
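
After changing config.toml, a quick sanity check before pushing a job through is to restart the runner and let it verify its registration against GitLab (run the equivalent inside the runner container if that’s how it’s deployed):

sudo gitlab-runner restart
sudo gitlab-runner verify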