Failing to start Docker runners

Hi,

We have an issue where the GitLab Runner manager is failing to start runners with an “Error running provisioning ssh command error”. It then prints the certificate in the log and also reports “no such file or directory” for /etc/docker/ca.pem. This started happening suddenly, after the runners had been working fine for a long time.

What could be the reason and how could it be resolved?

Same here! Our runners have been up for months, then they collapsed a few hours ago with that same failure.

I’ll let you know if I figure it out, but for now: you’re not alone.

Hi, I have the same issue. Our runners were fine until a few hours ago, when they started failing with the same error you described. For now I have no idea what happened.

It’s an issue between docker-machine and Docker 23: Docker 23 entirely breaks docker+machine executor (#29593) · Issues · GitLab.org / gitlab-runner · GitLab

This fix worked for me (from this issue: [Docker runner] Machine creation failed ssh command error (#29594) · Issues · GitLab.org / gitlab-runner · GitLab).
Workaround: create a file named startup-script.sh with the following content:

#!/bin/bash
mkdir -p /etc/docker

Then add the path of that file to config.toml as the startup script, using the amazonec2-userdata machine option:

[[runners]]
  [runners.machine]
    MachineOptions = [... , "amazonec2-userdata=<path>/startup-script.sh"]
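
For anyone scripting this step, here is a minimal sketch of creating the startup script and printing the option string to paste into MachineOptions. The SCRIPT_DIR variable and its temporary-directory default are assumptions for illustration only; use whatever directory the runner manager can actually read.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical location (assumption): defaults to a temp directory.
# Set SCRIPT_DIR to a permanent path the runner manager can read.
SCRIPT_DIR="${SCRIPT_DIR:-$(mktemp -d)}"
SCRIPT_PATH="$SCRIPT_DIR/startup-script.sh"

# Write the two-line startup script from the workaround.
cat > "$SCRIPT_PATH" <<'EOF'
#!/bin/bash
mkdir -p /etc/docker
EOF

# Syntax-check the script without executing it.
bash -n "$SCRIPT_PATH"

# Print the option string to add to MachineOptions in config.toml.
echo "amazonec2-userdata=$SCRIPT_PATH"
```

The printed string replaces the "amazonec2-userdata=<path>/startup-script.sh" entry, with <path> resolved to a real directory.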

Thank you, it worked for me as well :slightly_smiling_face:

Same issue here. Started about 20 hrs ago. The workaround is effective.

I believe this merge request (Fix provisioning for Docker 23+ for some of the provisioners (!102) · Merge requests · GitLab.org / Ops Sub-Department / docker-machine · GitLab) and, hopefully, this release (v0.16.2-gitlab.19 · GitLab.org / Ops Sub-Department / docker-machine · GitLab) resolve the issue.

The new release from @carrchr’s comment above resolved the issue for me.

Hi, is anyone else still facing this issue? I tried the /etc/docker workaround above, but it didn’t work for me. Can anyone help me with this?
The gitlab-runner status and Docker status are fine, and the config file looks good. I also installed another instance and still have the same issue.

We are now facing a different issue, similar to what is described here: Docker Machine failed to create EC2 spot instance: InvalidParameter: 'instance' is not a valid taggable resource type for this operation (#29213) · Issues · GitLab.org / gitlab-runner · GitLab
We have to use the workaround provided by Justin Sleep in a comment thread on that issue (#29213).
I have a ticket open with GitLab Support.

@vijay.prathipati1 Did you add this line to MachineOptions: amazonec2-userdata=<path>/startup-script.sh?

I tried that as well, but no luck.

Did you change the <path> placeholder to the actual file path of the script? Did you restart the runner after changing the config? Basically, you need to create the file, reference it in config.toml, and then restart the runner.
If that still doesn’t work, you can log on to the runner, if you have access to it, and see whether it’s failing with the same error or with something else.
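
As a rough way to check the first of those questions, the sketch below pulls the amazonec2-userdata value out of a config.toml and confirms that it points at an existing file. The function name check_userdata is made up for illustration, and /etc/gitlab-runner/config.toml is assumed to be the config location (the usual one for a system-wide install); adjust both to your setup.

```shell
#!/bin/bash
# Check that the amazonec2-userdata path in a config.toml points at a real file.
# check_userdata is a hypothetical helper name, not part of gitlab-runner.
check_userdata() {
  local config="$1"
  local path
  # Take the text after "amazonec2-userdata=" up to the closing quote.
  path=$(grep -o 'amazonec2-userdata=[^"]*' "$config" | head -n 1 | cut -d= -f2-)
  if [ -z "$path" ]; then
    echo "no amazonec2-userdata option found in $config"
    return 1
  fi
  if [ ! -f "$path" ]; then
    echo "startup script not found: $path"
    return 1
  fi
  echo "ok: $path"
}

# Example usage (assumed default config location; adjust as needed):
#   check_userdata /etc/gitlab-runner/config.toml
# Then restart the runner so the change is picked up:
#   sudo gitlab-runner restart
```

A leftover literal <path> in the option shows up as "startup script not found", which is the most likely slip when copying the workaround.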