Bug? Idle runners are still active after the IdleTime has elapsed

I have set up autoscaling with OpenStack on OVH, with different idle times depending on the period and a maximum of 1800 seconds. But today the runners are still active after 60s. Any idea what’s wrong in the current config? Is IdleTime expressed in seconds?

I am using the following configuration:

concurrent = 20
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "shared runners"
  limit = 10
  url = "https://gitlab.enki-multimedia.eu/"
  token = "token"
  executor = "docker+machine"
  environment = ["OS_PROJECT_NAME=infrastructure", "OS_IDENTITY_API_VERSION=3", "OS_USER_DOMAIN_NAME=Default"]
  [runners.cache]
    Type = "s3"
    Path = "/runners/cache"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "https://s3.gra.cloud.ovh.net"
      AccessKey = "accesskey"
      SecretKey = "secretkey"
      BucketName = "gitlab-runners-cache"
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    MaxBuilds = 1
    MachineDriver = "openstack"
    MachineName = "shared-runner-%s"
    MachineOptions = [
      "openstack-auth-url=https://auth.cloud.ovh.net/v3",
      "openstack-domain-name=Default",
      "openstack-username=username",
      "openstack-password=password",
      "openstack-tenant-id=mytenant",
      "openstack-flavor-name=s1-4",
      "openstack-image-name=Debian 10",
      "openstack-net-name=Ext-Net",
      "openstack-ssh-user=debian",
      "openstack-region=GRA1"
    ]
    [[runners.machine.autoscaling]]
      Periods = ["* * 9-17 * * mon-fri *"]
      IdleCount = 2
      IdleTime = 3600
      Timezone = "UTC"
    [[runners.machine.autoscaling]]
      Periods = ["* * * * * sat,sun *"]
      IdleCount = 2
      IdleTime = 60
      Timezone = "UTC"

Hi,

IdleTime is in seconds, according to the docs example. I’m wondering, though, whether 1 minute is too low to allow for shutdown and boot.

At my past employer we often saw problems with OpenStack bringing up the machines in time. docker+machine is known for hanging and laggy responses.

Does the GitLab Runner log reveal any more issues? Often the runner itself tries to shut down the machines, but then there is an OpenStack API failure or the like.

Cheers,
Michael

Thanks for your answer :slight_smile: . That may be it indeed. I opened a ticket with my provider about it to get more useful information.

Where could I find such logs?

Is there another way to have good autoscaling?

Hi,

Depending on the installation, either in syslog or in a file. IIRC the default in Debian packages uses the syslog.
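
If the log turns out to be too terse to show what happens when a machine should be removed, raising the runner’s verbosity may help. A minimal sketch, assuming the global `log_level` setting is available in your runner version; the other two lines are copied from your config:

```toml
# Global section of config.toml (top of the file, before [session_server]).
# "debug" is verbose; switch back once the machine removal problem is understood.
log_level = "debug"
concurrent = 20
check_interval = 0
```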

One immediate thing to test: set IdleTime to 5 minutes (300 seconds) and see if that changes anything.
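
As a rough sketch of that test, assuming the weekend period is the one that was active when you saw this (my guess, based on the 60-second value), only the IdleTime line changes:

```toml
    [[runners.machine.autoscaling]]
      Periods = ["* * * * * sat,sun *"]
      IdleCount = 2
      # Test value: 5 minutes instead of 60 seconds, to leave room for shutdown/boot.
      IdleTime = 300
      Timezone = "UTC"
```

As far as I understand the autoscaling docs, IdleCount = 2 also asks the runner to keep two idle machines available, so two machines staying up may be expected regardless of IdleTime.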

The docker+machine driver is known to be troublesome and, to my knowledge, is no longer developed. In combination with OpenStack, though, it is likely your best bet, I’d say. A friend of mine implemented a similar setup at NETWAYS, and I blogged about it there.

You might have read that GitLab 12.10 now supports AWS autoscaling natively with Fargate. Previously this setup also used docker+machine, which has now been replaced by a custom driver built on the custom executor.

I’m not sure if that’s possible with OpenStack too, or if there are plans in that direction. Since I see OVH in your config, it may be an idea to ask them too; maybe they have some experience in this area :slight_smile:

Peeking into the LXD driver example, it invokes a few scripts to run the driver stages. That may be a first approach to try for OpenStack. For implementing a real driver in Golang, you probably need to mimic the AWS Fargate logic, which uses the AWS SDK for Golang. I’d love to dive deeper, but I have too many open tasks :slight_smile:
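
To give an idea of the script-based approach, here is a minimal sketch of a runner entry using the custom executor. The runner name, the `/opt/openstack-driver/` paths and the scripts themselves are hypothetical; nothing like this ships with the runner, and the scripts would have to create, use and delete the OpenStack instance on their own:

```toml
[[runners]]
  name = "openstack-custom"                # hypothetical runner name
  url = "https://gitlab.enki-multimedia.eu/"
  token = "token"
  executor = "custom"
  builds_dir = "/builds"
  cache_dir = "/cache"
  [runners.custom]
    # Hypothetical scripts, one per driver stage.
    prepare_exec = "/opt/openstack-driver/prepare.sh"   # create the instance for the job
    run_exec = "/opt/openstack-driver/run.sh"           # run each job script on the instance
    cleanup_exec = "/opt/openstack-driver/cleanup.sh"   # delete the instance afterwards
```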

Cheers,
Michael


Thanks! I didn’t know about this custom executor. That’s a really useful feature. I will look into whether I can eventually provide a driver for OVH for it.
