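I am trying to run GPU jobs with gitlab-runner using the Docker executor.
Docker and the NVIDIA runtime appear to be installed correctly, because nvidia-smi works when I start the container directly on the host: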
$ sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Fri Oct 15 15:28:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 25%   29C    P8     8W / 250W |     16MiB / 11176MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
My config.toml looks like this (token removed):
[[runners]]
  name = "nvidia_test"
  url = "https://gitlab.com/"
  token = ""
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    gpus = "all"
    tls_verify = false
    runtime = "nvidia"
    image = "nvidia/cuda:9.0-base"
    devices = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia-uvm-tools", "/dev/nvidia0"]
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    pull_policy = ["if-not-present", "always"]
    shm_size = 0
I have also tried the gpus and runtime options separately, one at a time.
In every case, the CI job run by gitlab-runner fails with this error:
$ nvidia-smi
Error relocating /usr/bin/nvidia-smi: __strtok_r: symbol not found
Error relocating /usr/bin/nvidia-smi: __strdup: symbol not found
Cleaning up file based variables
ERROR: Job failed: exit code 127
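In case it helps with diagnosis: the "Error relocating ... symbol not found" wording looks like output from the musl dynamic loader, which made me wonder whether the job really runs in nvidia/cuda:9.0-base (a glibc-based image, as far as I know) or in some other image. Below is a rough sketch of a check I could put at the top of the job's script to see which distribution and libc the job container has. The function name describe_job_image is made up for illustration, and the musl check assumes the loader lives under /lib, as it does on Alpine.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (the name is illustrative only).
# Prints the distribution and a best guess at the libc of the current container.
describe_job_image() {
    # Read the distribution name in a subshell so sourcing os-release
    # does not overwrite variables in the calling script.
    if [ -r /etc/os-release ]; then
        ( . /etc/os-release; echo "distro: ${PRETTY_NAME:-unknown}" )
    else
        echo "distro: unknown"
    fi

    # Assumption: a musl loader under /lib indicates an Alpine-style image,
    # where binaries built against glibc (such as nvidia-smi) cannot resolve
    # their symbols.
    if ls /lib/ld-musl-*.so.1 >/dev/null 2>&1; then
        echo "libc: musl"
    else
        echo "libc: glibc or other"
    fi
}

describe_job_image
```

If this prints "libc: musl" inside the job, the job image would differ from the one set in config.toml, for example because an image: entry in .gitlab-ci.yml takes precedence over the runner default.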
Can anybody help me?