What is the CPU limit overwrite feature?

The GitLab Runner seems to support the environment variables KUBERNETES_CPU_LIMIT_OVERWRITE_MAX_ALLOWED, KUBERNETES_MEMORY_LIMIT_OVERWRITE_MAX_ALLOWED, KUBERNETES_CPU_REQUEST_OVERWRITE_MAX_ALLOWED and KUBERNETES_MEMORY_REQUEST_OVERWRITE_MAX_ALLOWED, with corresponding settings in config.toml. But documentation about what these features are actually good for appears to be missing.

Is there some web page I missed? Any helpful link is highly appreciated.

Harri

Hi,

the environment variables are described at the bottom of the page, and the config.toml settings are a little above that in a table, with details on their meaning.

Cheers,
Michael

Sorry to say, but this is not really helpful. There are already variables for the limits of the build containers, e.g. KUBERNETES_CPU_LIMIT. So what would KUBERNETES_CPU_LIMIT_OVERWRITE_MAX_ALLOWED accomplish that could not be done by setting KUBERNETES_CPU_LIMIT and KUBERNETES_CPU_REQUEST?

Your link to config.toml mentions a “cpu limit overwrite feature”, but it doesn’t explain it. If you google for this string, you find nothing but the phrase “When empty, it disables the cpu limit overwrite feature”. Obviously that is not sufficient.

Hi,

sorry that my response was not helpful for you. I googled the variables myself, landed in the docs, and thought the table above would be a good pointer since you did not mention it in your post.

Let’s dive a little deeper into the context from the docs.

KUBERNETES_CPU_LIMIT_OVERWRITE_MAX_ALLOWED
cpu_limit_overwrite_max_allowed

The max amount the CPU allocation can be written to for build containers. When empty, it disables the cpu limit overwrite feature.

Taking a step back, and following the Kubernetes docs on assigning CPU resources Assign CPU Resources to Containers and Pods | Kubernetes

If you do not specify a CPU limit

If you do not specify a CPU limit for a Container, then one of these situations applies:

  • The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.
  • The Container is running in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
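
For illustration, in plain Kubernetes these values live in the container’s resource spec of the pod manifest (names and numbers below are just an example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: build-example        # hypothetical pod name
spec:
  containers:
    - name: build
      image: alpine:latest
      resources:
        requests:
          cpu: "500m"        # share the scheduler reserves on the node
          memory: "1Gi"
        limits:
          cpu: "1"           # hard ceiling; CPU is throttled above this
          memory: "2Gi"      # exceeding this gets the container OOM-killed
```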

With a CPU limit in place, the container still sees all the resources available on the node, and an application may be designed in a way that it tries to consume everything it sees. That’s where the requests value comes into action.

The GitLab Runner and its Kubernetes Executor allow you to specify these limits in .gitlab-ci.yml as CI/CD variables, giving the user a way of defining the pod resources.
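
For example, a job could overwrite its resources like this (job name and values are illustrative):

```yaml
build:
  stage: build
  variables:
    KUBERNETES_CPU_REQUEST: "1"
    KUBERNETES_CPU_LIMIT: "2"
    KUBERNETES_MEMORY_REQUEST: "1Gi"
    KUBERNETES_MEMORY_LIMIT: "2Gi"
  script:
    - make build
```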

In order to prevent abuse, or overcommitment leading to killed pods, the max allowed setting in the runner’s config.toml provides an administrative limit. This cannot be overridden by the user in .gitlab-ci.yml.

The values for these variables are restricted to the max overwrite setting for that resource.
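
Sketched in config.toml, that could look like this (all values illustrative):

```toml
[runners.kubernetes]
  # Defaults applied when the job does not overwrite anything.
  cpu_limit = "1"
  memory_limit = "1Gi"
  # Administrative ceilings for the overwrite variables. An empty value
  # disables the overwrite feature for that resource.
  cpu_limit_overwrite_max_allowed = "2"
  memory_limit_overwrite_max_allowed = "4Gi"
```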

If you are interested in how it is implemented, the source code of gitlab-org/gitlab-runner provides more insights. There are comparison functions which ensure that the configured cpu, memory and ephemeral storage values are not bigger than the max overwrite values in config.toml.

The implementation reminds me of how setrlimit() works in libc: the hard limit is a max value which cannot be overridden from the user side. Limits on Resources (The GNU C Library)
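
As a tiny POSIX illustration of that analogy, using Python’s resource module (just an analogy, not runner code): the soft limit is what the process may adjust for itself, the hard limit is the ceiling it cannot exceed.

```python
import resource

# Soft limit: the value the process (the "user") currently runs with.
# Hard limit: the ceiling (the "admin" value) the soft limit may not exceed.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# A process may move its soft limit anywhere up to the hard limit --
# comparable to a job overwriting resources up to *_overwrite_max_allowed.
new_soft = 1024 if hard == resource.RLIM_INFINITY else min(1024, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```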

The documentation for the executor is at docs/executors/kubernetes.md · main · GitLab.org / gitlab-runner · GitLab - if you see things from this topic to update or make clearer, please go ahead with a merge request :slight_smile:

Cheers,
Michael

4 Likes

Hi Michael,

thank you very much for your detailed response. I should have been able to find Overwriting Container Resources on my own.

If I got this correctly, SOMETHING_LIMIT_OVERWRITE_MAX_ALLOWED and SOMETHING_REQUEST_OVERWRITE_MAX_ALLOWED tell by how much the user (writing .gitlab-ci.yml) is allowed to override the default container limit/request settings defined either in the namespace, or in the config.toml file for the Kubernetes executor. Is this correct?

Regards
Harri

Hi,

no worries. I tend to read the source code if there is something I do not understand in full, or want to provide more insight into. I have started to be as verbose as possible so anyone can learn how my brain works when doing research.

Given the initial question, the docs need clarification, and the research and discussion here help determine what to change or add, e.g. describing the relationship between the limit value and max allowed in a more descriptive way.

SOMETHING_LIMIT_OVERWRITE_MAX_ALLOWED is the “administrator” setting for the runner: the maximum resources the user setting can reach. If max allowed goes beyond what the Kubernetes namespace has configured, the value may still be capped by Kubernetes itself.

The Kubernetes executor determines the limit as follows:

  • User specified a limit in .gitlab-ci.yml?
    • No. Let Kubernetes determine the default values.
    • Yes: Compare it with the max allowed value.
      • value > max_allowed => warning logged, cut the value to the limit defined in config.toml
      • value <= max_allowed => use the limit to assign resources
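
Expressed as a small Python sketch (the runner itself is written in Go, and the real code also parses Kubernetes quantities like “500m” or “2Gi” - this only shows the decision logic):

```python
def effective_limit(user_value, max_allowed, log=print):
    """Sketch of the overwrite decision above -- NOT the runner's
    actual implementation; values are plain numbers here."""
    if max_allowed is None:
        # Empty setting in config.toml: overwrite feature disabled.
        return None
    if user_value is None:
        # Nothing set in .gitlab-ci.yml: let the defaults apply.
        return None
    if user_value > max_allowed:
        log(f"requested {user_value} exceeds max allowed "
            f"{max_allowed}, cutting to {max_allowed}")
        return max_allowed
    return user_value
```

For example, effective_limit(2.0, 1.5) logs a warning and returns 1.5, while effective_limit(1.0, 1.5) simply returns 1.0.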

I’m not yet sure how to explain it better. Maybe something like this:

  • Kubernetes: Default resource values.
  • User: Can overwrite resource values in .gitlab-ci.yml
  • Admin: Specifies max allowed overwrite values in the runner configuration. If the user configures an exceeding value, it is cut to max_allowed.

Can you maybe think of an addition to the docs as an MR? That would be awesome :slight_smile:

Cheers,
Michael

2 Likes

Hey @dnsmichi ! Thank you for your detailed answer! It really helped me better understand the *_LIMIT_OVERWRITE_MAX_ALLOWED settings.
However, I still have not fully grasped what values I should set. I think what would really help me is an example.

I will just describe my use case:
I have 4 Kubernetes nodes with m5.2xlarge machines, each with 8 vCPU and 32 GiB RAM.
I mostly have jobs running docker-compose with DinD via the Kubernetes executor.
The problem I currently face is that jobs cannot be scheduled: GitLab tells Kubernetes to allocate the resources, but Kubernetes cannot spawn the pod and I run into this error:

```
Unschedulable: "0/14 nodes are available: 2 Insufficient memory, 4 Insufficient cpu, 4 node(s) had taint {dedicated: gitlabrunner-generic}, that the pod didn't tolerate, 6 node(s) didn't match Pod's node affinity."
```

(4 nodes are dedicated to the higher memory runners)

I am suspecting my values need to be tuned:

```toml
cpu_request = "1"
cpu_request_overwrite_max_allowed = ""

cpu_limit = "7"
cpu_limit_overwrite_max_allowed = ""

helper_cpu_request = ""
helper_cpu_request_overwrite_max_allowed = ""

helper_cpu_limit = "500m"
helper_cpu_limit_overwrite_max_allowed = ""

service_cpu_request = "1"
service_cpu_request_overwrite_max_allowed = ""

service_cpu_limit = "4"
service_cpu_limit_overwrite_max_allowed = ""

memory_request = "2Gi"
memory_request_overwrite_max_allowed = ""

memory_limit = "30Gi"
memory_limit_overwrite_max_allowed = ""

helper_memory_limit = "2Gi"
helper_memory_limit_overwrite_max_allowed = ""

helper_memory_request = ""
helper_memory_request_overwrite_max_allowed = ""

service_memory_request = "4Gi"
service_memory_request_overwrite_max_allowed = ""

service_memory_limit = "20Gi"
service_memory_limit_overwrite_max_allowed = ""
```

(I think service should be way higher, since I am mostly running docker-compose jobs there.)
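
My rough back-of-the-envelope math, assuming the scheduler counts requests rather than limits:

```python
# Per-job pod requests from my config above (helper request is empty,
# so I ignore it here -- that may be wrong).
cpu_request = 1 + 1          # build + service containers
mem_request_gib = 2 + 4      # build + service containers

# One m5.2xlarge node (ignoring system/kubelet reserved capacity).
node_cpu, node_mem_gib = 8, 32

jobs_by_cpu = node_cpu // cpu_request
jobs_by_mem = node_mem_gib // mem_request_gib
print(min(jobs_by_cpu, jobs_by_mem))  # concurrent jobs one node can take
```

So by requests alone a node should fit about 4 jobs, which makes the “Insufficient cpu/memory” errors even more confusing to me.
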
What sane values would you suggest for my use case? Thank you so much!