[SOLVED] Autoscale (docker+machine) throwing "bad certificate"

Currently, we’re using GitLab CI with GitLab Runner configured to autoscale on DigitalOcean.

This week we decided to migrate to AWS.
To do this, we followed this tutorial:
https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/

The runner seems to pick up jobs correctly and is even able to launch the instances and SSH into them. However, when it tries to talk to the remote Docker daemon, a TLS error occurs.

Here is what is happening:

Waiting for SSH to be available...    
Detecting the provisioner...
Provisioning with coreOS...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker... 

ERROR: Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "<remote-machine-ip>": remote error: tls: bad certificate

It’s worth noting that I can use docker-machine create to launch machines manually and run docker commands on them without any trouble.

config.toml contents

concurrent = 5
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "bastiao-aws"
  url = "https://gitlab.com"
  token = "<token>"
  executor = "docker+machine"
  [runners.custom_build_dir]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = true
    disable_cache = false
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      AccessKey = "<access-key>"
      SecretKey = "<secret-key>"
      BucketName = "genebra-gitlab-ci"
      BucketLocation = "us-east-2"
      Insecure = true
  [runners.machine]
    IdleCount = 0
    IdleTime = 180
    MachineDriver = "amazonec2"
    MachineName = "gitlab-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=<access-key>",
      "amazonec2-secret-key=<secret-key>",
      "amazonec2-region=us-east-2",
      "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
      "amazonec2-vpc-id=<vpc-id>",
      "amazonec2-subnet-id=<subnet-id>",
      "amazonec2-use-private-address",
      "amazonec2-zone=b",
      "amazonec2-security-group=default",
      "amazonec2-instance-type=t3a.large",
      "amazonec2-ami=ami-0c6f750453c5ea69b",
      "amazonec2-ssh-user=core",
      "amazonec2-device-name=/dev/xvda"
    ]

It’s solved.
The problem was that gitlab-runner runs as root, while the docker-machine certificates had previously been generated by a non-root user.

To fix it, I opened a shell as root, removed the certs in ~/.docker/machine/certs, and ran docker-machine create to create a machine manually, which regenerated the removed certs as root. That solved the issue.
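Since the fix came down to which user generated the docker-machine certificates, one possible pre-check before deleting anything is to compare the owner of each cert file with the user gitlab-runner runs as. This is a hypothetical helper, not part of GitLab Runner or docker-machine; it assumes a POSIX system and the default file names (ca.pem, ca-key.pem, cert.pem, key.pem) in ~/.docker/machine/certs.

```python
import os
from pathlib import Path
from typing import List, Optional

# Assumed default file names in docker-machine's certs directory.
CERT_FILES = ("ca.pem", "ca-key.pem", "cert.pem", "key.pem")


def foreign_owned_certs(certs_dir: Path, uid: Optional[int] = None) -> List[str]:
    """List the cert files in certs_dir whose owner differs from uid.

    uid defaults to the current user, so running this as root checks whether
    root owns the certs it will use. Missing files are skipped.
    """
    if uid is None:
        uid = os.getuid()  # POSIX only
    mismatched = []
    for name in CERT_FILES:
        path = certs_dir / name
        if path.exists() and path.stat().st_uid != uid:
            mismatched.append(name)
    return mismatched
```

Run as root, a non-empty result from foreign_owned_certs(Path.home() / ".docker/machine/certs") points to the kind of mismatch described above. An empty result only rules out file ownership, not every certificate problem.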


Worked for me with a very similar issue. Thanks for posting.