Config changes in config.toml are not applied even after GitLab Runner restart

I have created a new GitLab runner, which works fine.
However, it cannot access the GPU on the host machine, so I added the following entry to the config.toml file:
[runners.docker]
gpus = "all"

I restarted the GitLab runner, but it still cannot access the GPUs.

However, when I run the Docker image directly with the runner and pass the CLI argument "--docker-gpus all", the runner works as expected.

I hope to receive a solution soon. Many thanks in advance.

Please post your complete config.toml in a code block.


I am running the runner as the root user.
Here is my /etc/gitlab-runner/config.toml:

concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = <runner-name>
  url = <url>
  id = 972
  token = "XXXXXXXXXXXXXXXX"
  token_obtained_at = 2023-08-08T13:59:34Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    image = <image-path>
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    gpus = "all"

I have masked out some of the confidential parts of the config.

My .gitlab-ci.yml:

stages:
  - build

build-test:
  stage: build
  image: nvcr.io/nvidia/cuda:12.1.1-base-ubuntu22.04
  script:
    - nvidia-smi

Here are the relevant screenshots:
[Screenshot: error message when GPU is enabled via config.toml]

[Screenshot: job success when GPU is enabled via CLI]

Check that nvidia-container-runtime-hook is accessible and executable on the host by the user under which GitLab Runner is running (default user is gitlab-runner).

runuser -u gitlab-runner -- which nvidia-container-runtime-hook

You could also add a command such as sleep 3600 before the failing nvidia-smi and run docker inspect on the container in both cases to see whether GPU support is actually enabled.


I tried your suggestions, and here are the outputs:

runuser -u gitlab-runner -- which nvidia-container-runtime-hook

/usr/bin/nvidia-container-runtime-hook

docker inspect <container-id>

GitLab runner with GPU access enabled via config.toml:
"HostConfig": {
  "DeviceRequests": null,
}

GitLab runner with GPU access enabled via CLI:
"HostConfig": {
  "DeviceRequests": [
    {
      "Driver": "",
      "Count": -1,
      "DeviceIDs": null,
      "Capabilities": [
        [
          "gpu"
        ]
      ],
      "Options": {}
    }
  ],
}
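The difference between the two outputs can also be checked in a script, which may be handy when comparing several job containers. The following is a rough sketch, not an official tool: the function name gpu_device_requests is hypothetical, and the two sample strings are trimmed, well-formed versions of the excerpts above, wrapped in the JSON array that docker inspect prints.

```python
import json


def gpu_device_requests(inspect_output: str) -> list:
    """Return the device requests with a "gpu" capability from `docker inspect` JSON output."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    found = []
    for container in containers:
        host_config = container.get("HostConfig") or {}
        # DeviceRequests is null when no devices were requested.
        for request in host_config.get("DeviceRequests") or []:
            # Capabilities is a list of lists, e.g. [["gpu"]].
            capabilities = request.get("Capabilities") or []
            if any("gpu" in group for group in capabilities):
                found.append(request)
    return found


# Trimmed samples based on the two outputs posted above.
VIA_CONFIG = '[{"HostConfig": {"DeviceRequests": null}}]'
VIA_CLI = (
    '[{"HostConfig": {"DeviceRequests": [{"Driver": "", "Count": -1, '
    '"DeviceIDs": null, "Capabilities": [["gpu"]], "Options": {}}]}}]'
)

print(len(gpu_device_requests(VIA_CONFIG)))  # 0
print(len(gpu_device_requests(VIA_CLI)))     # 1
```

An empty result for a job container started through the runner would confirm that the gpus setting never reached Docker as a device request.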

What could be the next step to resolve these differences?

At this point I suggest raising a bug in the official issue tracker here: Issues · GitLab.org / gitlab-runner · GitLab