When does a runner get displayed as offline?

Problem to solve

After setting up my first runner in a container, I set out to turn it into a managed service, which involved stopping the runner process. After it had been stopped as a standalone process but before it was started as a service (i.e. it was not running at all), I checked the GitLab UI, which still reported the runner as online, with the last contact shown as up to an hour in the past. I would have expected the GitLab instance to notice the lost connection to the runner process and mark it as offline within a reasonable time frame (say 5-10 minutes at most).
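
For context, turning a podman container into a managed service typically looks something like the following sketch (this uses podman's systemd unit generation; the container and unit names are placeholders and may differ from my actual setup):

    # stop the standalone runner container
    podman stop gitlab-runner

    # generate a systemd unit from the existing container and install it for the current user
    podman generate systemd --name gitlab-runner --files
    mv container-gitlab-runner.service ~/.config/systemd/user/

    # between the stop above and the start below, the runner is not running at all
    systemctl --user daemon-reload
    systemctl --user enable --now container-gitlab-runner.service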

Searching the web for this only turned up the inverse case, where a runner was supposed to be online but was shown as offline, so I'm asking here: why didn't GitLab mark the runner as offline? Was the elapsed time too short for the lost connection to register? Was it because of a mistake I made when shutting down the runner? What or where should I check to learn more about this topic?
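
One thing that might help narrow this down is asking the REST API what the server itself has recorded for the runner; a sketch (host, token, and runner ID are placeholders, and the token needs sufficient permissions):

    # fetch the runner's details, including contacted_at and status
    curl --header "PRIVATE-TOKEN: <your_access_token>" \
      "https://gitlab.example.com/api/v4/runners/<runner_id>"

The response should include fields like contacted_at and status, which is presumably what the online/offline display is based on.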

Steps to reproduce

  1. Set up a runner
  2. Register the runner and start it in a podman container (roughly podman run gitlab/gitlab-runner register --url https://some.local.url --token glrt-secret-token, then podman run gitlab/gitlab-runner; see the fuller sketch after this list)
  3. Verify that GitLab correctly displays the runner as online while the container is running
  4. Stop the container (podman stop gitlab-runner)
  5. Check the runner in the GitLab interface
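
For completeness, a fuller version of the step-2 commands would look roughly like this (the volume name, URL, and token are placeholders; the shared config volume is what lets the second container pick up the registration):

    # one-off registration, writing config.toml into a named volume
    podman run --rm -it -v gitlab-runner-config:/etc/gitlab-runner \
      gitlab/gitlab-runner register \
      --url https://some.local.url --token glrt-secret-token

    # start the runner itself with the registered configuration
    podman run -d --name gitlab-runner \
      -v gitlab-runner-config:/etc/gitlab-runner \
      gitlab/gitlab-runner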

Expected result: GitLab reports the runner as offline after at most a few minutes.
Observed result: GitLab still reports the runner as online, even after an hour.

Versions

  • Self-managed, GitLab Enterprise Edition v17.0.2-ee

Same issue here, on Self-managed GitLab CE v17.6.1

We noticed that one of our servers running gitlab-runner has been gone for more than an hour now (a physical Linux server that appears completely frozen and can't even be pinged anymore), but the runner still appears online in GitLab - and the job that was running on that host is also still in 'running' state, with GitLab seemingly still tailing its log.

I'm also very interested in knowing what knobs there are to control or tune this, so that the GitLab server recognizes and handles such situations within a defined time frame of a few minutes - and especially so that it then also treats such "hanging" jobs as failed.

Our runners are configured with a very high maximum job timeout (more than a day) because of long-running jobs. I suppose that exceeding this threshold would eventually make GitLab recognize the failed runner, but surely there should be some other mechanism to detect that a runner is gone outside of those bounds…?
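
As a side note, one way to at least see the server-side view is to list the jobs GitLab still believes that runner is processing, e.g. via the runners API (placeholders for host, token, and runner ID; requires sufficient permissions):

    # list jobs GitLab currently considers to be running on this runner
    curl --header "PRIVATE-TOKEN: <your_access_token>" \
      "https://gitlab.example.com/api/v4/runners/<runner_id>/jobs?status=running"

That only reflects the server-side state, though; it does not by itself fail or cancel a hanging job.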