Bug ? idle runners are still active after the IdleTimeout time

benoitc · May 10, 2020, 7:53pm

I have set up autoscaling with OpenStack on OVH, with different idle times depending on the period and a maximum of 1800 seconds. But today the runners are still active after 60 seconds. Any idea what’s wrong in the current config? Is IdleTime (the idle timeout) given in seconds?

I am using the following configuration:

concurrent = 20
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "shared runners"
  limit = 10
  url = "https://gitlab.enki-multimedia.eu/"
  token = "token"
  executor = "docker+machine"
  environment = ["OS_PROJECT_NAME=infrastructure", "OS_IDENTITY_API_VERSION=3", "OS_USER_DOMAIN_NAME=Default"]
  [runners.cache]
    Type = "s3"
    Path = "/runners/cache"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "https://s3.gra.cloud.ovh.net"
      AccessKey = "accesskey"
      SecretKey = "secretkey"
      BucketName = "gitlab-runners-cache"
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    MaxBuilds = 1
    MachineDriver = "openstack"
    MachineName = "shared-runner-%s"
    MachineOptions = [
      "openstack-auth-url=https://auth.cloud.ovh.net/v3",
      "openstack-domain-name=Default",
      "openstack-username=username",
      "openstack-password=password",
      "openstack-tenant-id=mytenant",
      "openstack-flavor-name=s1-4",
      "openstack-image-name=Debian 10",
      "openstack-net-name=Ext-Net",
      "openstack-ssh-user=debian",
      "openstack-region=GRA1"
    ]
    [[runners.machine.autoscaling]]
      Periods = ["* * 9-17 * * mon-fri *"]
      IdleCount = 2
      IdleTime = 3600
      Timezone = "UTC"
    [[runners.machine.autoscaling]]
      Periods = ["* * * * * sat,sun *"]
      IdleCount = 2
      IdleTime = 60
      Timezone = "UTC"

dnsmichi · May 12, 2020, 6:15pm

Hi,

IdleTime is in seconds, according to the docs example. I’m wondering, though, whether 1 minute is too short to allow for shutdown/boot.

At my previous employer we often saw problems with OpenStack not bringing up the machines in time. docker+machine is known for hanging and laggy responses.

Does the GitLab Runner log reveal any further issues? Often the runner itself tries to shut down the machines, but then there is an OpenStack API failure or the like.

Cheers,
Michael

benoitc · May 13, 2020, 5:51am

Thanks for your answer. That may indeed be it. I opened a ticket with my provider to get more useful information.

Where could I find such logs?

Is there another way to have good autoscaling?

dnsmichi · May 13, 2020, 11:35am

Hi,

Depending on the installation, either in syslog or in a file. IIRC the Debian packages log to syslog by default.
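If the default output turns out to be too terse to show failed machine removals, the log level can be raised in the runner's config.toml. A small sketch, assuming the global `log_level` setting and reusing the two global values from the config above; debug output is verbose, so this is probably only useful while investigating:

```toml
# Global section at the top of config.toml (sketch only).
# concurrent and check_interval are copied from the original config;
# log_level is the only line added here.
concurrent = 20
check_interval = 0
log_level = "debug"
```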

One immediate thing to test: set IdleTime to 5 minutes (300 seconds) and see if that changes anything.
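As a rough sketch of that test, assuming the weekend period is the one currently in effect, the corresponding block from the config above could look like this. Only IdleTime differs from the original, and 300 is simply 5 minutes expressed in seconds:

```toml
# Sketch only: the weekend period from the original config, with IdleTime
# raised from 60 to 300 seconds. All other values are unchanged.
[[runners.machine.autoscaling]]
  Periods = ["* * * * * sat,sun *"]
  IdleCount = 2
  IdleTime = 300
  Timezone = "UTC"
```

The runner needs to pick up the changed config.toml before the new value applies.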

The docker+machine driver is known to be troublesome and, to my knowledge, is no longer developed. In combination with OpenStack, it is still likely your best bet, I’d say. A friend of mine implemented a similar setup at NETWAYS, and I blogged about it.

You might have read that GitLab 12.10 now supports autoscaling on AWS Fargate natively. Previously this setup also used docker+machine, which has now been replaced by a custom driver built on the custom executor.

I’m not sure whether that’s possible with OpenStack too, or whether there are plans for it. Since I see OVH in your config, it may be worth asking them too; maybe they have some experience in this area.

Peeking into the LXD driver example, it invokes some scripts to run the driver stages. That may be a first approach to look into for OpenStack. For implementing a real driver in Golang, you would probably need to mimic the AWS Fargate logic, which uses the AWS SDK for Golang. I’d love to dive deeper, but I have too many open tasks.
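For orientation, a script-based custom executor is wired into config.toml roughly as below. This is a minimal sketch, not an existing OpenStack driver: the keys under `[runners.custom]` are the stage hooks of the custom executor, while the runner name, the `builds_dir`/`cache_dir` values and the `/opt/openstack-driver/*.sh` script paths are hypothetical placeholders.

```toml
# Sketch only: a runner using the custom executor with one script per stage.
# All paths and the runner name are hypothetical.
[[runners]]
  name = "openstack-custom"
  url = "https://gitlab.enki-multimedia.eu/"
  token = "token"
  executor = "custom"
  builds_dir = "/builds"
  cache_dir = "/cache"
  [runners.custom]
    # Optional stage: can print JSON to override settings such as builds_dir.
    config_exec = "/opt/openstack-driver/config.sh"
    # Would create and boot the OpenStack instance for the job.
    prepare_exec = "/opt/openstack-driver/prepare.sh"
    # Called once per job stage with the script to execute on the instance.
    run_exec = "/opt/openstack-driver/run.sh"
    # Would delete the instance once the job has finished.
    cleanup_exec = "/opt/openstack-driver/cleanup.sh"
```

Each script would then call the OpenStack API or CLI itself, in the same way the LXD example shells out to its own tooling.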

Cheers,
Michael

benoitc · May 13, 2020, 9:41pm

Thanks! I didn’t know about this custom executor. That’s a really useful feature. I will have a look at whether I can eventually provide a driver for OVH for it.
