As part of a GitLab CI/CD pipeline, a job should be scheduled on a Kubernetes pod whose node has an EBS volume attached. The EBS volume is in us-east-1c. I have the following affinity section specified in my runner-chart-values.yaml:
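A minimal sketch of such a block, assuming the chart's standard top-level affinity key and the standard topology.kubernetes.io/zone node label (the original values file is not reproduced verbatim here):

```yaml
# Sketch only: a hard node-affinity requirement on the availability zone.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1c
```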
which, as described here, means that the pod must be scheduled on a node in the us-east-1c availability zone. I know for a fact that there is exactly one node in the relevant EKS cluster in that availability zone.
When the GitLab runner executes, the job fails:
```
Warning  FailedAttachVolume  1s  attachdetach-controller  AttachVolume.Attach failed
for volume "<volume-name>" : rpc error: code = Internal desc = Could not attach volume
"<volume-id>" to node "<instance-id>": could not attach volume "<volume-id>" to node
"<instance-id>": InvalidVolume.ZoneMismatch: The volume '<volume-id>' is not in the
same availability zone as instance '<instance-id>'
```
How could this be happening? I have verified that the node is available when the job runs, but even if it were not, wouldn't the scheduler wait until it became available (and possibly time out) rather than use a node in the wrong availability zone?
It looks as if the EBS volume and the EC2 instance are in separate availability zones. AWS requires them to be in the same zone. Can you double-check that the two are in the same zone?
Also, I assume you're using an EBS volume to back a persistent volume, is that correct?
Yes, I can confirm that both the EBS volume and the EC2 instance are in the same zone, us-east-1c.
There are six nodes in the EKS cluster that the runner could potentially choose from. Exactly one of them is in the same AZ as my EBS volume (us-east-1c); the other five are in different AZs.
The instance ID referred to in the error message is indeed in a different AZ, which explains the error. However, my question is why the job is being scheduled (and the volume attached) on any node besides the one in the valid AZ, since I explicitly require that AZ in the node affinity.
And you are correct, I am using an EBS volume to back a persistent volume.
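For reference, a statically provisioned PersistentVolume backed by an EBS volume via the EBS CSI driver looks roughly like the sketch below; the name and volume ID are placeholders, not taken from my actual setup:

```yaml
# Rough sketch of a statically provisioned, EBS-backed PersistentVolume.
# All names and IDs here are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: runner-cache-pv                   # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # the EBS volume ID (placeholder)
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1c
```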
I checked using `kubectl get nodes --show-labels` and can confirm that there is only one EC2 instance with the label `topology.kubernetes.io/zone=us-east-1c` (which is the expected node).
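An equivalent spot check is to filter on that label directly; if exactly one node comes back, the label is set as expected:

```sh
kubectl get nodes -l topology.kubernetes.io/zone=us-east-1c
```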
That configuration specifically didn't work, but the underlying idea, that my affinity specification was in the wrong location, was correct. Just for completeness, this is the relevant configuration I needed to add:
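The exact block is not reproduced above. As a sketch, assuming the standard gitlab-runner Helm chart, the zone constraint has to go into the config.toml template under runners.config, because that is what the Kubernetes executor applies to the CI job pods; the chart's top-level affinity key only affects the runner manager pod:

```yaml
# Illustrative sketch only; not the verbatim block from the original thread.
# runners.config holds a config.toml template for the Kubernetes executor,
# which creates the CI job pods, so scheduling constraints for jobs go here.
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        # Hard requirement: schedule job pods only in the volume's AZ.
        [runners.kubernetes.node_selector]
          "topology.kubernetes.io/zone" = "us-east-1c"
```

The Kubernetes executor also supports a full node-affinity block under [runners.kubernetes.affinity] that mirrors the Kubernetes nodeAffinity spec; the node selector above is shown in place of the exact affinity block from the thread simply because it is the most compact way to express a hard zone requirement.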