Configuring autoscaling for on-demand runners

hudsterboy · June 3, 2024, 9:37pm

Problem to solve

What are the best practices for configuring the ASG ‘scale in’ rules for on-demand runners? Our runners scale out fine when they hit the CPU threshold. The problem is that sometimes one of those instances picks up a job near the end of its timeout threshold, and the job is then terminated. We’re using CPU as the scale-in metric and have it set to something ridiculously low, like 2% CPU utilization, and it still terminates the job. Is there a better way to do this so that we don’t get terminated jobs, or do we need to go to static instances?

What are you seeing, and how does that differ from what you expect to see?
We’re seeing jobs terminated when they run on instances that were created via scale out and have since timed out. I’d expect the instance to persist until the job has stopped running.

Which troubleshooting steps have you already taken? Can you link to any docs or other resources so we know where you have been?

Other than extending the instance timeout and lowering the CPU scale in threshold, nothing much.

Add the infrastructure-as-code or cloud-native configuration relevant to the question.
Not really relevant, as this is more about configuring the ASG, but here’s what we’re using for the scale in/scale out parameters:

5ASGAutoScalingMetricTypeToMonitor	CPU	-
5ASGAutoScalingSetScaleInUtilizationThreshold	2	-
5ASGAutoScalingSetScaleInUtilizationThresholdSeconds	600	-
5ASGAutoScalingSetScaleOutUtilizationThreshold	50	-
5ASGAutoScalingSetScaleOutUtilizationThresholdSeconds	300	-
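
To make the interaction between these thresholds concrete, here is a rough Python sketch of the decision they describe. It is an illustration only, not the ASG's actual implementation. It assumes the metric is the group-average CPU sampled once per period, and that every sample in the window has to breach the threshold before anything happens. The names `ScalingRule`, `decide` and `period_seconds` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    threshold_pct: float   # CPU utilization threshold, in percent
    window_seconds: int    # how long the threshold must be breached


# Values copied from the parameters listed above.
SCALE_IN = ScalingRule(threshold_pct=2, window_seconds=600)
SCALE_OUT = ScalingRule(threshold_pct=50, window_seconds=300)


def decide(samples, period_seconds=60):
    """Return "scale_out", "scale_in" or "hold".

    samples: group-average CPU percentages, oldest first, one per period.
    Assumption: a rule fires only when every sample in its window breaches.
    """
    def window(rule):
        count = rule.window_seconds // period_seconds
        # Not enough history yet means the rule cannot fire.
        return samples[-count:] if len(samples) >= count else None

    recent = window(SCALE_OUT)
    if recent is not None and all(s >= SCALE_OUT.threshold_pct for s in recent):
        return "scale_out"

    recent = window(SCALE_IN)
    if recent is not None and all(s <= SCALE_IN.threshold_pct for s in recent):
        return "scale_in"

    return "hold"


if __name__ == "__main__":
    # Ten quiet minutes: the scale-in rule fires, whether or not a job
    # started on one of the instances during that time.
    print(decide([1.0] * 10))
```

Under these assumptions the decision never looks at whether a given instance is running a job. It only sees CPU numbers, so a job that is mostly waiting (on the network, for example) would not hold the metric above the threshold.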

Versions

Self-managed
GitLab.com SaaS
Self-hosted Runners

Versions
gitlab-runner 17.0.0
