Docker Machine gives up on creating new CloudStack runners

Why does GitLab give up on creating new runners?

We use an autoscaler setup with:

  • Docker Machine 0.16.1
  • Apache CloudStack driver

One autoscaler works fine. In the other region, however, things work more or less okay with up to 4 runners, but above that we run into a “give-up” situation:

  • One or two runners come up; in our setup, we allow each runner to execute only one job in a row.
  • What we then observe are either timeouts or problems with the TLS certificate. On the CloudStack side, VMs come and go, but our impression is that the synchronization is quite poor, far beyond normal network lag. The autoscaler VM itself has enough free resources.
  • For some time, GitLab seems to tell Docker Machine to re-spawn runners.
  • Eventually, it seems that no new runners are created at all.

If we restart the master process on the autoscaler machine, the whole cycle repeats from the start.
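
To compare what Docker Machine believes exists with what CloudStack shows, we could tally the machine states on the autoscaler host. The following is only a minimal sketch, not a verified diagnostic: it assumes `docker-machine` is on the PATH and is run as the same user (and with the same machine storage path) as the runner process, and the helper names `parse_machine_list`, `summarize`, and `list_machines` are made up for illustration.

```python
import subprocess
from collections import Counter

# Go-template fields accepted by `docker-machine ls --format`.
LS_FORMAT = "{{.Name}}\t{{.State}}\t{{.Error}}"


def parse_machine_list(output):
    """Turn tab-separated `docker-machine ls` output into (name, state, error) rows."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split into at most three fields so tabs inside the error text survive.
        parts = line.split("\t", 2)
        parts += [""] * (3 - len(parts))  # pad missing trailing fields
        rows.append(tuple(p.strip() for p in parts))
    return rows


def summarize(rows):
    """Count machines per reported state."""
    return Counter(state or "Unknown" for _, state, _ in rows)


def list_machines(timeout_s=10):
    """Run `docker-machine ls` with a per-machine timeout and parse the result."""
    result = subprocess.run(
        ["docker-machine", "ls", "--timeout", str(timeout_s), "--format", LS_FORMAT],
        capture_output=True,
        text=True,
        check=False,  # keep whatever output we got, even on a non-zero exit
    )
    return parse_machine_list(result.stdout)


if __name__ == "__main__":
    try:
        rows = list_machines()
    except FileNotFoundError:
        print("docker-machine not found on PATH")
    else:
        for state, count in summarize(rows).most_common():
            print(f"{state}: {count}")
        # Show the error text for every machine that reported one.
        for name, state, error in rows:
            if error:
                print(f"{name} [{state}]: {error}")
```

If the count of machines that are not in the Running state keeps growing while CloudStack shows fewer VMs, that would at least confirm that the two sides are out of sync.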

Which algorithm is behind this behavior, and how can we assess the situation? What fine-tuning is recommended for Docker Machine with the Apache CloudStack driver?