Config changes in config.toml are not applied even after GitLab Runner restart
I have created a new GitLab runner and it works fine.
However, it cannot access the GPU on the host machine. Therefore I added the following entry to the config.toml file:
[runners.docker]
gpus = "all"
I restarted my GitLab runner, but it still cannot access the GPUs.
However, when I run the Docker image directly with the runner and pass the CLI argument --docker-gpus all, the runner works as expected.
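For reference, --docker-gpus is the command-line counterpart of the gpus option in [runners.docker]; it maps one-to-one to the config.toml setting, shown here with the register subcommand only as an example (the exact subcommand is not the point, the flag is):

gitlab-runner register --executor docker --docker-gpus all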
I hope to receive a solution soon. Many thanks in advance.
Please post your complete config.toml in a code block.
I am running the runner as the root user.
Here is my /etc/gitlab-runner/config.toml:
concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = <runner-name>
  url = <url>
  id = 972
  token = "XXXXXXXXXXXXXXXX"
  token_obtained_at = 2023-08-08T13:59:34Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    image = <image-path>
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    gpus = "all"
I have masked out some of the confidential parts of the config.
My .gitlab-ci.yml:
stages:
  - build

build-test:
  stage: build
  image: nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04
  script:
    - nvidia-smi
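For completeness, the plain-Docker equivalent of what the job should be doing; running this by hand on the host (outside the runner) separates Docker/NVIDIA toolkit problems from runner configuration problems:

docker run --rm --gpus all nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi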
Here are the relevant screenshots:
Error message when GPU enabled via config.toml:
Job success when GPU enabled via CLI:
Check that nvidia-container-runtime-hook is accessible and executable on the host by the user under which GitLab Runner is running (the default user is gitlab-runner).
runuser -u gitlab-runner -- which nvidia-container-runtime-hook
You could also run a command like sleep 3600 before the failing nvidia-smi, then run docker inspect on the container in both cases to see whether GPU support is actually enabled or not.
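A minimal sketch of what I mean, reusing the job from your pipeline (the added sleep just keeps the container alive long enough to inspect it):

build-test:
  stage: build
  image: nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04
  script:
    - sleep 3600
    - nvidia-smi

Then, on the runner host while the job is sleeping:

docker ps                        # find the ID of the job container
docker inspect <container-id>    # look at HostConfig.DeviceRequests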
I tried your suggestions and here are the outputs:
runuser -u gitlab-runner -- which nvidia-container-runtime-hook
/usr/bin/nvidia-container-runtime-hook
docker inspect <container-id>
GitLab runner with GPU access enabled via config.toml:

"HostConfig": {
    "DeviceRequests": null,
}

GitLab runner with GPU access enabled via CLI:

"HostConfig": {
    "DeviceRequests": [
        {
            "Driver": "",
            "Count": -1,
            "DeviceIDs": null,
            "Capabilities": [
                [
                    "gpu"
                ]
            ],
            "Options": {}
        }
    ],
}
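(The output above is trimmed to the relevant HostConfig part; assuming jq is available on the host, that field can also be pulled out directly:)

docker inspect <container-id> | jq '.[0].HostConfig.DeviceRequests'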
What could be the next step to resolve these differences?
At this point I suggest raising a bug in the official issue tracker: Issues · GitLab.org / gitlab-runner · GitLab
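In the meantime, a possible workaround to try, only as a sketch and assuming the NVIDIA container runtime is registered with Docker on the host (e.g. via nvidia-ctk runtime configure or the nvidia-docker2 package): select that runtime explicitly in [runners.docker] instead of relying on the gpus option.

[runners.docker]
  # Workaround sketch: use the NVIDIA runtime directly. The CUDA base images
  # already set NVIDIA_VISIBLE_DEVICES, so the GPUs should become visible.
  runtime = "nvidia"

This does not fix the gpus option being ignored; it might just unblock the pipeline while the bug is investigated.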