Getting GitLab CI to act as pilot for provisioning within my own AWS VPC

I have found that when compiling CPU- or RAM-intensive things, the CI tools I have used in the past have often died because of resource constraints.

First off, is there an existing tool that does what I describe in the next paragraph? If so, I would like to know about it and explore its use.

Last year I needed to get something done quickly, so I put together some reusable Terraform scripts to provision from within my own VPC. Find the repo with more detailed docs here. This way my AWS IAM role could govern which services I was entitled to use, and so on. My first go-to was Packer, but it was a poor fit because it has no option NOT to create an AMI at the end. All I want is for an instance to spin up in my VPC, compile some things, and push artifacts to S3 and ECR, doing the processing on the instance type I choose rather than on some underpowered CI instance.

This did work for me at the time using another CI tool, and it still works using the docker-compose file inside the repo. But it does not work with GitLab CI. Is there some constraint where GitLab is not allowed to establish an SSH connection to an outside host?

I will detail the process in case anyone would like to try to reproduce it.

First of all, this is an example .gitlab-ci.yml:

image:
  name: hashicorp/terraform:0.12.5
  entrypoint: [""]
build:
  script:
    - wget https://raw.githubusercontent.com/dnk8n/remote-provisioner/master/src/terraform/terraform.aws.main.tf
    - terraform init
    - terraform apply -auto-approve=true -var-file=${TERRAFORM_VAR_FILE} || JOB_STATUS=$?
    - terraform destroy -auto-approve=true -var-file=${TERRAFORM_VAR_FILE}
    - exit ${JOB_STATUS:-0}
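The apply/destroy/exit dance in the script above can be sketched in isolation. The point is that terraform destroy always runs even when apply fails, while the job still exits with the original failure code. A minimal sketch, with `false` standing in for a failing terraform apply and echoes standing in for the real commands:

```shell
# "false" stands in for a failing `terraform apply`; the first echo stands
# in for `terraform destroy`. The apply's exit code is stashed in JOB_STATUS
# so cleanup always runs before the job exits with the original code.
false || JOB_STATUS=$?
echo "destroy runs regardless of the apply result"
echo "job would exit with status ${JOB_STATUS:-0}"
```

In the real job, the last line is `exit ${JOB_STATUS:-0}`, which is what makes GitLab mark the job failed even though the destroy step succeeded.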

Here are the example CI/CD variables to set:

Type: Variable AWS_ACCESS_KEY_ID = changeme # Mask this value
Type: Variable AWS_SECRET_ACCESS_KEY = changeme # Mask this value
Type: File TERRAFORM_VAR_FILE =

file_or_dir_source = "docker/"  # This depends on your repo, change to a folder that exists
file_or_dir_dest = "/tmp/"
remote_command = [  # build.sh exists in $file_or_dir_source above
    "chmod +x /tmp/build.sh",
    "/tmp/build.sh"
]
ami_owners = ["self"]
ami_name_regex = "base-ubuntu.*"  # You will need to change this
timeout_minutes = "10"
region = "eu-west-1"

So this setup is intended to spin up some temporary infrastructure in your own AWS VPC, send a directory of artifacts to it, run some commands on it, and then destroy all the infrastructure once the resulting artifacts have been saved.
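For readers who don't want to open the repo, the Terraform behind this could look roughly like the following. This is a hedged sketch, not the actual terraform.aws.main.tf: the resource names come from the log further down, the variable names from the example var file above, and anything else (such as instance_type) is an assumption.

```hcl
# Hypothetical sketch of the provisioning flow described above; resource
# and variable names are taken from the log and var file, the rest is assumed.
resource "aws_instance" "provisioner" {
  ami           = data.aws_ami.provisioner.id
  instance_type = var.instance_type # assumed variable: pick the compute you need, not the CI runner's

  key_name               = aws_key_pair.provisioner.key_name
  vpc_security_group_ids = [aws_security_group.provisioner.id]

  connection {
    type        = "ssh"
    host        = self.public_ip
    user        = var.ssh_user # defaults matter here: ec2-user vs ubuntu
    private_key = tls_private_key.provisioner.private_key_pem
  }

  # Copy the artifact directory up to the instance...
  provisioner "file" {
    source      = var.file_or_dir_source
    destination = var.file_or_dir_dest
  }

  # ...then run the build commands on it.
  provisioner "remote-exec" {
    inline = var.remote_command
  }
}
```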

Here is an example CI log. It seems the SSH connection does not work:

Running with gitlab-runner 12.1.0 (***)
  on docker-auto-scale ***
Using Docker executor with image hashicorp/terraform:0.12.5 ...
Pulling docker image hashicorp/terraform:0.12.5 ...
Using docker image sha256:8c0a09d4cd38cf1f71e5c7cba3165eafbe3c7c78038d850f90e5b20b3b9fcf4b for hashicorp/terraform:0.12.5 ...
Running on runner-***-concurrent-0 via runner-***...
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/***/***/.git/
Created fresh repository.
From https://gitlab.com/***
 * [new branch]      poc_ci     -> origin/poc_ci
Checking out *** as poc_ci...

Skipping Git submodules setup
$ wget https://raw.githubusercontent.com/dnk8n/remote-provisioner/master/src/terraform/terraform.aws.main.tf
Connecting to raw.githubusercontent.com (151.101.0.133:443)
terraform.aws.main.t 100% |********************************|  3797  0:00:00 ETA
$ terraform init

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "aws" (terraform-providers/aws) 2.21.1...
- Downloading plugin for provider "http" (terraform-providers/http) 1.1.1...
- Downloading plugin for provider "tls" (terraform-providers/tls) 2.0.1...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
$ terraform apply -auto-approve=true -var-file=${TERRAFORM_VAR_FILE} || JOB_STATUS=$?
data.http.my_public_ip: Refreshing state...
data.aws_vpc.provisioner: Refreshing state...
data.aws_ami.provisioner: Refreshing state...
tls_private_key.provisioner: Creating...
aws_security_group.provisioner: Creating...
aws_security_group.provisioner: Creation complete after 3s [id=***]
aws_security_group_rule.allow_ssh: Creating...
tls_private_key.provisioner: Creation complete after 6s [id=***]
aws_key_pair.provisioner: Creating...
aws_security_group_rule.allow_ssh: Creation complete after 2s [id=***]
aws_key_pair.provisioner: Creation complete after 1s [id=***]
aws_instance.provisioner: Creating...
aws_instance.provisioner: Still creating... [10s elapsed]
aws_instance.provisioner: Provisioning with 'file'...
aws_instance.provisioner: Still creating... [20s elapsed]
aws_instance.provisioner: Still creating... [30s elapsed]
aws_instance.provisioner: Still creating... [40s elapsed]
aws_instance.provisioner: Still creating... [50s elapsed]
aws_instance.provisioner: Still creating... [1m0s elapsed]
aws_instance.provisioner: Still creating... [1m10s elapsed]
aws_instance.provisioner: Still creating... [1m20s elapsed]
aws_instance.provisioner: Still creating... [1m30s elapsed]
aws_instance.provisioner: Still creating... [1m40s elapsed]
aws_instance.provisioner: Still creating... [1m50s elapsed]
aws_instance.provisioner: Still creating... [2m0s elapsed]
aws_instance.provisioner: Still creating... [2m10s elapsed]
aws_instance.provisioner: Still creating... [2m20s elapsed]
aws_instance.provisioner: Still creating... [2m30s elapsed]
aws_instance.provisioner: Still creating... [2m40s elapsed]
aws_instance.provisioner: Still creating... [2m50s elapsed]
aws_instance.provisioner: Still creating... [3m0s elapsed]
aws_instance.provisioner: Still creating... [3m10s elapsed]
aws_instance.provisioner: Still creating... [3m20s elapsed]
aws_instance.provisioner: Still creating... [3m30s elapsed]
aws_instance.provisioner: Still creating... [3m40s elapsed]
aws_instance.provisioner: Still creating... [3m50s elapsed]
aws_instance.provisioner: Still creating... [4m0s elapsed]
aws_instance.provisioner: Still creating... [4m10s elapsed]
aws_instance.provisioner: Still creating... [4m20s elapsed]
aws_instance.provisioner: Still creating... [4m30s elapsed]
aws_instance.provisioner: Still creating... [4m40s elapsed]
aws_instance.provisioner: Still creating... [4m50s elapsed]
aws_instance.provisioner: Still creating... [5m0s elapsed]
aws_instance.provisioner: Still creating... [5m10s elapsed]


Error: timeout - last error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain


$ terraform destroy -auto-approve=true -var-file=${TERRAFORM_VAR_FILE}
data.http.my_public_ip: Refreshing state...
tls_private_key.provisioner: Refreshing state... [id=***]
data.aws_vpc.provisioner: Refreshing state...
data.aws_ami.provisioner: Refreshing state...
aws_key_pair.provisioner: Refreshing state... [id=***]
aws_security_group.provisioner: Refreshing state... [id=***]
aws_instance.provisioner: Refreshing state... [id=***]
aws_security_group_rule.allow_ssh: Refreshing state... [id=***]
aws_instance.provisioner: Destroying... [id=***]
aws_security_group_rule.allow_ssh: Destroying... [id=***]
aws_security_group_rule.allow_ssh: Destruction complete after 1s
aws_instance.provisioner: Still destroying... [id=***, 10s elapsed]
aws_instance.provisioner: Still destroying... [id=***, 20s elapsed]
aws_instance.provisioner: Destruction complete after 21s
aws_key_pair.provisioner: Destroying... [id=***]
aws_security_group.provisioner: Destroying... [id=***]
aws_key_pair.provisioner: Destruction complete after 1s
tls_private_key.provisioner: Destroying... [id=***]
tls_private_key.provisioner: Destruction complete after 0s
aws_security_group.provisioner: Destruction complete after 1s

Destroy complete! Resources: 5 destroyed.
$ exit ${JOB_STATUS:-0}
ERROR: Job failed: exit code 1

It was my own mistake. I forgot that I had to update the default ssh_user variable from ec2-user to ubuntu in my case!
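For anyone hitting the same "ssh: unable to authenticate" handshake error: the fix was a single line in the TERRAFORM_VAR_FILE, overriding the script's default SSH user to match the AMI's login user (Ubuntu AMIs log in as ubuntu, Amazon Linux as ec2-user):

```hcl
# Added to TERRAFORM_VAR_FILE: my base AMI is Ubuntu, so the default
# ec2-user login user was wrong and every publickey attempt was rejected.
ssh_user = "ubuntu"
```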