We use GitLab (Ultimate licences) at work, connected to on-premises runners. As part of our risk and compliance requirements, the runners have been identified as single points of failure.
We use Podman as the container engine, and recently had runners go offline due to a bug that filled the tmpfs filesystem of the Podman user (non-root execution). Red Hat has since resolved this. However, because of this issue we are looking at ways to ensure scheduled tasks do not fail when a single runner is offline.
If we were to implement HA, at a high level I believe we would:
- Deploy an additional Linux VM
- Ensure that Podman is installed
- Register this new VM as a runner with GitLab
Note: we do not have any dedicated container platform in place (e.g. Kubernetes or Docker Swarm) at this time, just traditional VM infrastructure.
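To make the registration step concrete, here is a rough sketch of what I think the command for the second VM would look like. It assumes we keep using the Docker executor pointed at the rootless Podman socket and register with a runner authentication token; the URL, token, socket path and default image are placeholders, and `build_register_command` is just a helper name I made up. It only builds and prints the command, it does not run it.

```python
import shlex


def build_register_command(gitlab_url, runner_token, podman_socket,
                           default_image="alpine:latest"):
    """Build the argv for a non-interactive `gitlab-runner register` call.

    Assumption: the runner talks to Podman through the Docker executor,
    using the rootless Podman socket of the runner's user.
    """
    return [
        "gitlab-runner", "register",
        "--non-interactive",
        "--url", gitlab_url,              # placeholder GitLab instance URL
        "--token", runner_token,          # placeholder runner authentication token
        "--executor", "docker",
        "--docker-host", f"unix://{podman_socket}",
        "--docker-image", default_image,  # placeholder default job image
    ]


# Print the command for review instead of executing it.
example_cmd = build_register_command(
    "https://gitlab.example.com",
    "glrt-REPLACE_ME",
    "/run/user/1000/podman/podman.sock",
)
print(shlex.join(example_cmd))
```

The idea would be to run the printed command as the non-root Podman user on the new VM, with the same tags configured on both runners so either one can take the same jobs.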
Questions:
- Can GitLab runners operate in an HA model? At a high level I believe they can through load balancing, though many posts suggest it doesn't work that well, e.g. the second runner doesn't do anything until the primary is 100% utilised. Possibly this depends on the method used (load balancing vs round robin)?
- Are there any better ways? I don't believe we have a huge number of pipelines or jobs, so autoscaling doesn't seem required. We simply want to add some redundancy to what we currently do.
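To illustrate what I mean in the first question, here is a toy model of how I understand it: runners poll GitLab for pending jobs rather than having jobs pushed to them through a load balancer. The `Runner` class, the `dispatch` function and the one-job-per-tick timing are all my own simplifications, not GitLab's actual scheduling logic.

```python
from collections import deque


class Runner:
    """Toy runner that pulls jobs from a shared queue (hypothetical model)."""

    def __init__(self, name, online=True, concurrent=1):
        self.name = name
        self.online = online
        self.concurrent = concurrent  # max jobs this runner handles at once
        self.running = []

    def poll(self, queue):
        """Take one job if online and below the concurrency limit."""
        if self.online and queue and len(self.running) < self.concurrent:
            job = queue.popleft()
            self.running.append(job)
            return job
        return None


def dispatch(jobs, runners):
    """Let runners poll in turn until the queue drains; return {job: runner}."""
    queue = deque(jobs)
    assignment = {}
    while queue:
        progressed = False
        for runner in runners:
            job = runner.poll(queue)
            if job is not None:
                assignment[job] = runner.name
                progressed = True
        if not progressed:
            break  # no runner can take work, remaining jobs stay pending
        for runner in runners:
            runner.running.clear()  # pretend every job finishes within the tick
    return assignment


jobs = ["job1", "job2", "job3", "job4"]
both_up = dispatch(jobs, [Runner("vm-a"), Runner("vm-b")])
one_down = dispatch(jobs, [Runner("vm-a", online=False), Runner("vm-b")])
print(both_up)
print(one_down)
```

In this model the jobs are shared between both runners when both are up, and all of them land on the remaining runner when one is offline, which is the redundancy we are after. Whether real runners behave like this, or whether one runner tends to take everything until it is saturated, is exactly what I am trying to confirm.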