Using amazonec2 causes 'RequestLimitExceeded' in logs and results in many dangling EC2 runners

When a sudden influx of simultaneous jobs arrives, the GitLab Runner log fills with many entries like the following:

# cat /var/log/messages | grep runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de
Mar 22 02:26:50 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:26:50Z" level=error msg="Error creating machine: Error in driver during machine creation: Error fulfilling spot request: RequestLimitExceeded: Request limit exceeded." driver=amazonec2 name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de operation=create 
Mar 22 02:26:50 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:26:50Z" level=error msg="\tstatus code: 503, request id: 7e961753-27f6-4e74-ad09-275a90965437" driver=amazonec2 name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de operation=create 
Mar 22 02:26:50 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:26:50Z" level=warning msg="Machine creation failed, trying to provision" error="exit status 1" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de 
Mar 22 02:34:59 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:34:59Z" level=error msg="Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de operation=provision 
Mar 22 02:34:59 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:34:59Z" level=warning msg="Machine creation failed, trying to provision" error="exit status 1" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de 
Mar 22 02:42:33 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:42:33Z" level=error msg="Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de operation=provision 
Mar 22 02:42:33 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:42:33Z" level=warning msg="Machine creation failed, trying to provision" error="exit status 1" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de 
Mar 22 02:49:40 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:49:40Z" level=error msg="Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de operation=provision 
Mar 22 02:49:40 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:49:40Z" level=error msg="Machine creation failed" error="exit status 1" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de time=23m38.690378944s 
Mar 22 02:49:40 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:49:40Z" level=warning msg="Requesting machine removal" created=23m38.697092175s name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de now=2018-03-22 02:49:40.764547245 +0000 UTC m=+33390.022668205 reason="Failed to create" used=23m38.697092305s usedCount=0 
Mar 22 02:49:40 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:49:40Z" level=warning msg="Stopping machine" created=23m38.721113948s name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de reason="Failed to create" used=23.967887ms usedCount=0 
Mar 22 02:50:09 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:50:09Z" level=error msg="InvalidInstanceID.Malformed: Invalid id: \"\" (expecting \"i-...\")" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de operation=stop 
Mar 22 02:50:09 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:50:09Z" level=error msg="\tstatus code: 400, request id: 5bf77278-4c1c-468a-baed-719d67a29d2f" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de operation=stop 
Mar 22 02:50:09 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:50:09Z" level=warning msg="Error while stopping machine" created=24m7.499980435s error="exit status 1" name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de reason="Failed to create" used=28.802834351s usedCount=0 
Mar 22 02:50:09 ip-XX-XX-XX-XX gitlab-runner[8892]: time="2018-03-22T02:50:09Z" level=warning msg="Removing machine" created=24m7.500049734s name=runner-e38880da-GENERIC-GITLAB-RUNNER-1521685562-1e5ae6de reason="Failed to create" used=28.802903507s usedCount=0 

When this happens we are left with many orphaned EC2 runner instances that never terminate unless removed manually. The orphaned instances are identifiable by the absence of the tags that config.toml is configured to apply.
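Because the orphans are precisely the instances missing our configured tags, they can be found programmatically. A minimal sketch of the filtering step, operating on the standard "Reservations" structure returned by ec2 describe-instances / boto3 (the function name and the required tag key are illustrative, not anything from Docker Machine or the runner):

```python
# Sketch: find EC2 instances missing the tags our config.toml should apply.
# Input is the "Reservations" list from ec2 describe-instances / boto3.
def find_untagged_instances(reservations, required_tag_keys):
    """Return IDs of instances lacking one or more of the required tag keys."""
    orphans = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tag_keys = {t["Key"] for t in instance.get("Tags", [])}
            if not required_tag_keys <= tag_keys:  # some required key missing
                orphans.append(instance["InstanceId"])
    return orphans

# Example with a mocked describe-instances response:
sample = [{"Instances": [
    {"InstanceId": "i-aaa", "Tags": [{"Key": "Team", "Value": "ci"}]},
    {"InstanceId": "i-bbb"},  # no Tags at all -- a dangling runner
]}]
print(find_untagged_instances(sample, {"Team"}))  # -> ['i-bbb']
```

In practice the reservations list would come from a describe-instances call and the matching IDs could be fed to terminate-instances, but the tag-based selection above is the part that distinguishes orphans from healthy runners.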

I suspect that GitLab Runner initiates the creation of many EC2 instances simultaneously, at which point the EC2 API hits its request limit. Docker Machine then cannot complete the creation process, and the instances are left dangling.
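AWS's general guidance for RequestLimitExceeded (HTTP 503) is to retry the throttled call with exponential backoff and jitter, which the amazonec2 driver apparently does not do when fulfilling the spot request. A hedged sketch of that retry pattern — all names here are illustrative, not Docker Machine internals:

```python
import random
import time

class RequestLimitExceeded(Exception):
    """Stand-in for the throttling error the EC2 API returns (HTTP 503)."""

def call_with_backoff(request, max_attempts=5, base_delay=1.0):
    """Retry a throttled API call with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RequestLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # full jitter: sleep a random interval in [0, base * 2^attempt]
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))

# Example: a request that is throttled twice, then succeeds.
attempts = {"n": 0}
def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RequestLimitExceeded()
    return "spot request fulfilled"

print(call_with_backoff(flaky_request, base_delay=0.01))  # -> spot request fulfilled
```

If the driver retried like this instead of failing outright, the subsequent provision/SSH retries and the malformed-instance-ID cleanup failure seen in the log above would presumably never be reached.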

# gitlab-runner --version
Version:      10.6.0-rc1
Git revision: 0a9d5de9
Git branch:   10-6-stable
GO version:   go1.9.4
Built:        2018-03-08T09:39:52+00:00
OS/Arch:      linux/386
# docker-machine version
docker-machine version 0.14.0, build 89b8332
# docker --version
Docker version 17.12.0-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3

The release notes for docker-machine 0.14.0 state:

amazonec2
Upon failure, the create command now ensures dangling resources are cleaned up before exiting

See

I don’t know whether this has any bearing on the issue.