Kubernetes GitLab Runner ignores node affinity specified in runner-chart-values.yaml, assigns job to a node in the wrong availability zone

As part of a GitLab CI/CD pipeline, a job should be scheduled on a Kubernetes pod whose node has an EBS volume attached. The EBS volume is in us-east-1c. I have the following affinity section specified in my runner-chart-values.yaml:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east-1c

which, as described here, means that the node must have an availability zone of us-east-1c. I know for a fact that there is exactly one node in the relevant EKS cluster that has an availability zone of us-east-1c.

When the GitLab Runner executes the job, it fails:

  Warning  FailedAttachVolume  1s  attachdetach-controller  AttachVolume.Attach failed for volume "<volume-name>" : rpc error: code = Internal desc = Could not attach volume "<volume-id>" to node "<instance-id>": could not attach volume "<volume-id>" to node "<instance-id>": InvalidVolume.ZoneMismatch: The volume '<volume-id>' is not in the same availability zone as instance '<instance-id>'

How could this be happening? I have verified that the node is available when the job runs, but even if that were not the case, wouldn’t the scheduler wait until it was available (and possibly time out) rather than use a node in the wrong availability zone?

It looks as if the EBS volume and the EC2 instance are in different zones. AWS requires them to be in the same zone. Can you double-check that the two are in the same zone?

Also, I assume that you’re using EBS for persistent volumes, is that correct?

Yes, I can confirm that both the EBS volume and the EC2 instance are in the same zone, us-east-1c.

There are six nodes in the EKS cluster that the runner could potentially choose from. Exactly one of them is in the same AZ as my EBS volume, which is us-east-1c. The other five nodes are in different AZs.
The instance ID referred to in the error message is indeed in a different AZ, which explains the error. However, my question is why the scheduler is trying to attach to any node besides the one in the valid AZ, since I explicitly require that AZ in the node affinity.

And you are correct, I am using EBS for persistent volumes.

Take a look at the node labels - there may be more than one EC2 that has the label it is looking for

Labels:
 ...
            topology.kubernetes.io/zone=us-east-1c
 ...

I checked using kubectl get nodes --show-labels and can confirm that there is only one EC2 with the label topology.kubernetes.io/zone=us-east-1c (which is the expected node).

A GitLab Runner deployment in Kubernetes consists of two types of pods:

  • the controller pod, which runs the GitLab Runner process all the time; it is responsible for processing jobs and creating job pods
  • job pods, which are short-lived pods where the jobs are executed

I am afraid that you have configured affinity only for the GitLab Runner controller pod. Affinity for job pods must be defined in the runner config.

(example using node_selector, this is not a complete config)

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        [runners.kubernetes.node_selector]
          "topology.kubernetes.io/zone" = "us-east-1c"

All options are listed in the docs.


Thank you! This seems to have done the trick.

That exact configuration didn’t work for me, but the underlying point, that my affinity specification was in the wrong place, was correct. For completeness, this is the relevant configuration I needed to add:

config: |
    [[runners]]
      [runners.kubernetes]
        [runners.kubernetes.affinity]
          [runners.kubernetes.affinity.node_affinity]
            [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                  key = "topology.kubernetes.io/zone"
                  operator = "In"
                  values = [
                    "us-east-1c"
                  ]
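
For anyone landing here later, here is a rough sketch of how the two settings sit side by side in the values file. It assumes the layout implied by the snippets in this thread: a top-level affinity key that only affects the controller pod, and the job-pod affinity nested under runners.config. The empty top-level affinity is just a placeholder and not something I have confirmed is required.

```yaml
# Sketch of runner-chart-values.yaml (layout assumed from the snippets in this thread).

# Top-level affinity: applies to the GitLab Runner controller pod only.
# It does not influence where job pods are scheduled.
affinity: {}

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        # Affinity defined here is applied to the job pods.
        [runners.kubernetes.affinity]
          [runners.kubernetes.affinity.node_affinity]
            [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                  key = "topology.kubernetes.io/zone"
                  operator = "In"
                  values = [
                    "us-east-1c"
                  ]
```

The point of the split is that only the block under runners.config constrains the job pods to the zone where the EBS volume lives.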

Thank you very much! This was exactly the level of detail I needed to get this working! :clap: