I’m using AWS EKS to run GitLab CI jobs. The GitLab Kubernetes runner is running on the core k8s nodes and I’m using Karpenter to launch EC2 instances on demand for each CI job and then kill the instances off afterwards.
Since each job runs on its own isolated EC2 instance, I've set the `privileged` flag in the runner configuration so CI jobs can use docker-in-docker when they need it. For example, here is a snippet:
```yaml
runners:
  name: "arm64-runner"
  privileged: true
  tags: "arm64-runner,aarch64-runner"
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "gitlab-karpenter-space"
        image = "ubuntu:20.04"
```
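For what it's worth, one thing I've been wondering is whether the flag needs to live inside the TOML template itself rather than as a top-level Helm chart value. A minimal sketch of that variant (assuming the Kubernetes executor; `privileged` is a documented key under `[runners.kubernetes]`):

```toml
[[runners]]
  name = "arm64-runner"
  executor = "kubernetes"
  [runners.kubernetes]
    namespace = "gitlab-karpenter-space"
    image = "ubuntu:20.04"
    # Set privileged on the job pods directly in the executor config,
    # instead of relying on the chart-level `runners.privileged` value.
    privileged = true
```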
On October 21, we upgraded GitLab from 15.11.13 to 16.3.5. I've recently discovered that, after that upgrade (and the corresponding runner upgrade to 16.3.3), docker-in-docker no longer works as it used to. We get errors like this:
```
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.9 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
```
The solution I’ve found is to use Kyverno to reconfigure the CI job pod on the fly so that this clause:
```yaml
securityContext:
  privileged: true
```
is added to the pod’s definition.
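In case it helps anyone hitting the same error, here is a sketch of the kind of Kyverno mutation I'm using. The policy name, namespace match, and wildcard container anchor are my own choices, not anything prescribed by GitLab:

```yaml
# Sketch of a Kyverno ClusterPolicy that forces privileged on CI job pods.
# The policy name and namespace are assumptions; adjust to your setup.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: privileged-gitlab-ci-jobs
spec:
  rules:
    - name: add-privileged-security-context
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - gitlab-karpenter-space
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # "(name): *" is a Kyverno conditional anchor: the
              # securityContext is merged into every container in the pod.
              - (name): "*"
                securityContext:
                  privileged: true
```

This works, but it feels like a workaround for something the runner used to handle itself, hence my question below.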
Is this an expected change in behaviour for the Kubernetes runner? Or have I been misconfiguring my runners and the upgrade changed something that has exposed that misconfiguration?