Spawning AWS EC2 instances with a custom AMI

GitLab version: 14.3.1-ee
GitLab runner version: 14.2.0

Hi,

I’m running GitLab on AWS. The GitLab Runner is currently configured to spawn new AWS EC2 instances to run CI/CD jobs, and this works well. Here’s the working /etc/gitlab-runner/config.toml file:

concurrent = 5
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "<redacted>"
  limit = 1
  output_limit = 51200
  url = "<redacted>"
  token = "<redacted>"
  executor = "docker+machine"
  environment = ["DOCKER_TLS_CERTDIR="]
  [runners.custom_build_dir]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      AccessKey = "<redacted>"
      SecretKey = "<redacted>"
      BucketName = "<redacted>"
      BucketLocation = "eu-central-1"
    [runners.cache.gcs]
  [runners.machine]
    IdleCount = 0
    IdleTime = 1800
    MaxBuilds = 10
    MachineDriver = "amazonec2"
    MachineName = "gitlab-docker-machine-%s"
    MachineOptions = ["amazonec2-access-key=<redacted>", "amazonec2-secret-key=<redacted>", "amazonec2-iam-instance-profile=<redacted>", "amazonec2-region=eu-central-1", "amazonec2-zone=b", "amazonec2-vpc-id=<redacted>", "amazonec2-subnet-id=<redacted>", "amazonec2-use-private-address=true", "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true", "amazonec2-instance-type=t3.xlarge"]
    OffPeakPeriods = ["* * 0-7,18-23 * * mon-fri *", "* * * * * sat,sun *"]
    OffPeakTimezone = ""
    OffPeakIdleCount = 0
    OffPeakIdleTime = 1200
  [runners.custom]
    run_exec = ""

Now I would like to use a custom AMI to spawn our EC2 instances. By default, GitLab uses Ubuntu 16.04, but my custom AMI is based on Ubuntu 20.04. The AMI has Docker installed, and amazon-ecr-credential-helper is available on its $PATH. In addition, the Docker config file ~/.docker/config.json has been modified to use amazon-ecr-credential-helper. No other modifications have been made.

To run my custom AMI, I have modified /etc/gitlab-runner/config.toml by adapting the MachineOptions like so:

MachineOptions = ["amazonec2-ami=ami-0caXXXXXXXXXXXXXX"]
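
For context, here is a rough sketch of how that entry sits in the [runners.machine] section, assuming it is simply appended to the existing option list and nothing else changes. The AMI ID is the masked placeholder from above, and all other values are the redacted ones from the working config:

```toml
[[runners]]
  executor = "docker+machine"
  [runners.machine]
    MachineDriver = "amazonec2"
    MachineName = "gitlab-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=<redacted>",
      "amazonec2-secret-key=<redacted>",
      "amazonec2-iam-instance-profile=<redacted>",
      "amazonec2-region=eu-central-1",
      "amazonec2-zone=b",
      "amazonec2-vpc-id=<redacted>",
      "amazonec2-subnet-id=<redacted>",
      "amazonec2-use-private-address=true",
      "amazonec2-instance-type=t3.xlarge",
      # Assumption: the only new entry; it selects the custom Ubuntu 20.04 AMI.
      "amazonec2-ami=ami-0caXXXXXXXXXXXXXX",
    ]
```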

When I run a new CI job, I see in the AWS console that a new EC2 instance with the given AMI attempts to start. I have also verified that the machine is reachable via SSH for a brief moment. After a few moments, however, the instance is no longer reachable via SSH, and AWS shows it stuck in the “initializing” state. On the GitLab Runner instance, I see the following in journalctl:

Oct 15 15:07:17 my-hostname gitlab-runner[584]: Running pre-create checks...                        driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:17 my-hostname gitlab-runner[584]: Creating machine...                                 driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:17 my-hostname gitlab-runner[584]: (runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx) Launching instance...  driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:25 my-hostname gitlab-runner[584]: Waiting for machine to be running, this may take a few minutes...  driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:25 my-hostname gitlab-runner[584]: Detecting operating system of created instance...   driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:25 my-hostname gitlab-runner[584]: Waiting for SSH to be available...                  driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:48 my-hostname gitlab-runner[584]: Detecting the provisioner...                        driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:49 my-hostname gitlab-runner[584]: Provisioning with ubuntu(systemd)...                driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:07:59 my-hostname gitlab-runner[584]: Installing Docker...                                driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:08:02 my-hostname gitlab-runner[584]: Copying certs to the local machine directory...     driver=amazonec2 name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=create
Oct 15 15:08:02 my-hostname gitlab-runner[584]: Machine "runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx" was stopped.  name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=stop
Oct 15 15:08:02 my-hostname gitlab-runner[584]: WARNING: Problem while reading command output       error=read |0: file already closed
Oct 15 15:08:02 my-hostname gitlab-runner[584]: WARNING: Removing machine                           lifetime=2m24.566758926s name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx reason=Failed to create used=46.305707048s usedCount=0
Oct 15 15:08:02 my-hostname gitlab-runner[584]: About to remove runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx  name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=remove
Oct 15 15:08:02 my-hostname gitlab-runner[584]: WARNING: This action will delete both local reference and remote instance.  name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=remove
Oct 15 15:08:02 my-hostname gitlab-runner[584]: Successfully removed runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx  name=runner-xxxxxxxx-gitlab-docker-machine-xxxxxxxxxx-xxxxxxxx operation=remove

I haven’t seen anything useful in the AWS console; the system log just remains blank. I’m not sure how to investigate this further, and I haven’t found much documentation on how to use custom AMIs with GitLab. Any advice is appreciated.

In case you’re wondering why I want to use a custom AMI in the first place: I need the amazon-ecr-credential-helper available on the machine, and I haven’t found any other way to achieve this. I want to use docker services in my CI and I want to fetch the images for those services from AWS ECR instead of GitLab’s registry.