DinD in docker-machine autoscale runners won't connect to port 2376

I’ve been stuck for the last few days on a TLS problem. I’ve read all the related dind TLS threads on this forum, but nothing works completely.

My runner is hosted on an Ubuntu 18.04 server and uses Docker’s official docker-machine (starting VMs on a VMware infrastructure).

Here is my current config.toml file:

check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "AutoScale Runner #1"
  limit = 10
  url = "https://GITLAB_SERVER/"
  token = "TOKEN"
  executor = "docker+machine"
  environment = ["DOCKER_TLS_CERTDIR=/certs","DOCKER_DRIVER=overlay2","DOCKER_HOST=tcp://docker:2376/"]
  [runners.custom_build_dir]
  [runners.docker]
    tls_verify = false
    image = "alpine:stable"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    cache_dir = "/cache"
    volumes = ["/certs/client", "/cache"]
    shm_size = 0
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ...
  [runners.machine]
    ...

I’ve tried a lot of combinations (with or without DOCKER_HOST, with or without quotes around /certs, …) but nothing works. The only approach that works is to disable TLS with an empty DOCKER_TLS_CERTDIR, but then every job has to wait for the service timeout…

If I enable TLS, I always get the same error:

$ docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning...

Login Succeeded
$ docker build -t ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG} .
    time="2019-12-19T10:46:44Z" level=error msg="failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial tcp 172.17.0.2:2375: connect: connection refused"
    error during connect: Post http://docker:2375/v1.40/build?buildargs=%7B%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&session=rqv94svkg50k5nqk8ae404d7c&shmsize=0&t=GITLABSERVER%3A5555%2FGRP%2FSUBGROUP%2FPROJECT%3Amaster&target=&ulimits=null&version=1: context canceled
    ERROR: Job failed: exit code 1

Why does docker never try to connect to port 2376 when using TLS?
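
One way to narrow this down would be to print what the Docker client actually sees inside the job, just before `docker build`. The log above shows port 2375 even though config.toml sets DOCKER_HOST to 2376, which suggests the value may be overridden somewhere (for example by a `variables:` entry in .gitlab-ci.yml; that is only a guess). Below is a minimal sketch, assuming the client certificates end up in $DOCKER_TLS_CERTDIR/client as the volumes entry implies. The function name `docker_tls_report` is made up for illustration; DOCKER_HOST, DOCKER_TLS_VERIFY and DOCKER_CERT_PATH are the standard variables read by the docker CLI.

```shell
# Hypothetical helper: show the connection settings the docker CLI will pick up
# and check that the three client certificate files exist.
docker_tls_report() {
  echo "DOCKER_HOST=${DOCKER_HOST:-<unset>}"
  echo "DOCKER_TLS_VERIFY=${DOCKER_TLS_VERIFY:-<unset>}"
  echo "DOCKER_CERT_PATH=${DOCKER_CERT_PATH:-<unset>}"

  # Assumption: certs live in $DOCKER_TLS_CERTDIR/client unless
  # DOCKER_CERT_PATH points somewhere else.
  cert_dir="${DOCKER_CERT_PATH:-${DOCKER_TLS_CERTDIR:-/certs}/client}"
  missing=0
  for f in ca.pem cert.pem key.pem; do
    if [ -f "$cert_dir/$f" ]; then
      echo "found   $cert_dir/$f"
    else
      echo "missing $cert_dir/$f"
      missing=1
    fi
  done
  # Non-zero exit status when at least one certificate file is absent.
  return "$missing"
}
```

Calling `docker_tls_report` as the first script line of the job should show whether DOCKER_HOST still points at 2376 at that point and whether the certificates are mounted where the client would look for them.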

If anybody has any clues…

Thanks,
BM

Kinda surfing the same wave.

The only difference is that I’m setting network_mode to a custom network rather than the default docker bridge. Also, the containers are not privileged. I tried with docker 18.x and 19.x too, but in vain.

The other difference is that your docker tries to connect to the right port. Mine never tries port 2376…

Did you ever make any progress on this? I’m having the same issue, but it’s random: it doesn’t happen every time a new docker+machine instance spins up.
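
If the random failures come from the job starting before the dind service has finished writing its certificates (an assumption, nothing in this thread confirms it), a short wait at the start of the job might be worth trying. A rough sketch follows. The function name `wait_for_docker_certs`, the default path /certs/client/ca.pem and the 30 second default are all illustrative choices, not something taken from the runner or from Docker.

```shell
# Hypothetical helper: block until the client CA file shows up, or give up.
wait_for_docker_certs() {
  cert_file="${1:-/certs/client/ca.pem}"  # path assumed from the volumes entry
  timeout="${2:-30}"                      # seconds; arbitrary default
  waited=0
  while [ ! -f "$cert_file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up waiting for $cert_file after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$cert_file present after ${waited}s"
}
```

Used as `wait_for_docker_certs && docker build ...`, the build would only start once the file exists, and the job would fail with a clear message if it never appears.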