Unable to use Nvidia GPU with GitLab runner but works when running containers manually - same executor

alatan · October 30, 2025, 4:04am

Problem to solve

Self-managed GitLab runner always reporting that no CUDA is available. Sometimes even reporting that no Nvidia Driver and Nvidia Container Toolkit detected.

Steps to reproduce

Install and configure Podman
Install Nvidia Container Toolkit
If not automatically manually generate CDI specification and make sure that the correct driver is selected
Verify and validate that Podman has access to Nvidia GPUs and can execute CUDA code (e.g. the official PyTorch image or the base Nvidia CUDA image)
Install and configure GitLab runner with Docker as executor
Create a simple CI pipeline that runs the same code that was used to verify and validate the GPU access in Podman
Based on the image and CUDA version selected observe
- (always) no CUDA detected even if Nvidia driver is
```
import torch`
print(torch.cuda.is_available())
```
- (sometimes) no Nvidia driver detected
  
  Using effective pull policy of [if-not-present] for container nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04
  Using docker image sha256:92a047cf48371393d2d27c9a696f3afd7548b1b39e27d0696e2ec18c22e41ccc for nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04 with digest docker.io/nvidia/cuda@sha256:5a2d3b02eb7412847d051d0f2b0f0a5031057a0172d9ca78743cc41cfc5d037f …
  ==========
  == CUDA ==
  ==========
  CUDA Version 13.0.1
  Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  This container image and its contents are governed by the NVIDIA Deep Learning Container License.
  By pulling and using the container, you accept the terms and conditions of this license:
  https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
  A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
  WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
  Use the NVIDIA Container Toolkit to start this container with GPU support; see
  NVIDIA Cloud Native Technologies - NVIDIA Docs .

Configuration

Output from nvidia-ctk cdi list (upon request full CDI YAML can be provided)

INFO[0000] Found 5 CDI devices                          
nvidia.com/gpu=0
nvidia.com/gpu=1
nvidia.com/gpu=GPU-2aff26da-3664-9eeb-13ba-b78397cace6f
nvidia.com/gpu=GPU-66878602-8286-6421-1ec4-8d097b71be4e
nvidia.com/gpu=all

GitLab runner’s configuration TOML file

concurrent = 8
check_interval = 0
connection_max_age = "15m0s"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gpu-runner"
  url = "https://XXXXXXXXXXXx"
  id = 45874
  token = "YYYYYYYYYYYYYYYYYYYYYYYY"
  token_obtained_at = 2025-10-29T15:29:11Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    host = "unix:///run/podman/podman.sock"
    tls_verify = false
    image = "ubuntu24.04"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    network_mtu = 0
#    devices = [
#      "nvidia.com/gpu=0",
#      "nvidia.com/gpu=1",
#    ]
#    environment = [
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html#gpu- enumeration
#      "NVIDIA_VISIBLE_DEVICES=all",
#      "NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all",
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html#driver-capabilities
#      "NVIDIA_DRIVER_CAPABILITIES=all"
#    ]
    gpus = "all"
    service_gpus = "all"
    allowed_pull_policies = ["always", "if-not-present"]

Here you can also see my futile attempts to adjust the environment variables of the executor (based on Docker’s documentation, which is references by the gpus option for the GitLab runner even though I am using Podman

.gitlab-ci.yml

stages:
 - check
check_cuda:
  stage: check
  tags:
    - gpu
    - ml
    - linux
  image: 
    name: docker.io/pytorch/pytorch:2.9.0-cuda12.8-cudnn9-runtime
    pull_policy: if-not-present
  script:
    - python pytorch_check.py

Here I also tried adding the environment variables NVIDIA_VISIBLE_DEVICES: "nvidia.com/gpu=all" (also tried with just all) and NVIDIA_DRIVER_CAPABILITIES: "all".

Python script pytorch_check.py used for checking whether CUDA is detected or not
```
import torch
print(torch.cuda.is_available())
```

Versions

Please select whether options apply, and add the version information.

Self-managed
GitLab.com SaaS
Dedicated
Self-hosted Runners

Versions

Ubuntu Server 24.04 with kernel 6.8.0-1024-oracle (based on output from uname -a)
Nvidia driver 580.82.07 (based on output from nvidia-smi)
CUDA 13.0 (based on output from nvidia-smi)
Podman 4.9.3 (official Ubuntu package, unable to upgrade to 5.x)

Nvidia Container Toolkit

NVIDIA Container Toolkit CLI version 1.18.0
commit: f8daa5e26de9fd7eb79259040b6dd5a52060048c

GitLab 18.4

GitLab Runner, if self-hosted

Version:      18.5.0
Git revision: bda84871
Git branch:   18-5-stable
GO version:   go1.24.6 X:cacheprog
Built:        2025-10-13T19:20:30Z
OS/Arch:      linux/amd64

Local check (no runner)

Use
```
podman run -it --device nvidia.com/gpu=0 --security-opt=label=disable docker.io/pytorch/pytorch:2.9.0-cuda12.8-cudnn9-runtime /bin/bash
```
to launch container with PyTorch image (note that currently there is no CUDA 13.0 image for PyTorch but based on how Nvidia drivers and CUDA runtime work, executing code for 12.8 works on 13.0 as well)

Obtain container ID

CONTAINER ID  IMAGE                                                    COMMAND     CREATED       STATUS       PORTS       NAMES
115c92bb4d3a  docker.io/pytorch/pytorch:2.9.0-cuda12.8-cudnn9-runtime  /bin/bash   12 hours ago  Up 12 hours              jolly_solomon

and copy PyTorch test script to container

podman cp pytorch_cuda.py 115c92bb4d3a:/workspace/

Execute PyTorch test script inside container and check output

podman attach 115c92bb4d3a

root@115c92bb4d3a:/workspace# ls
pytorch_cuda.py
root@115c92bb4d3a:/workspace# python pytorch_cuda.py 
CUDA available: True

Documentation

Nvidia Container Toolkit (Overview — NVIDIA Container Toolkit)
Podman with GPU (GPU container access | Podman Desktop)
Podman and Nvidia Container Toolkit (Installing Podman and the NVIDIA Container Toolkit — NVIDIA AI Enterprise: Red Hat Enterprise Linux With KVM Deployment Guide)
GitLab runner with GPU (Using Graphical Processing Units (GPUs) | GitLab Docs)

ase2356 · October 31, 2025, 10:34am

looks like gitlab runner isn’t passing gpu devices from podman. try setting runner to privileged mode and add nvidia env vars like nvidia_visible_devices=all. if it still fails, switch to docker socket instead of podman, that usually fixes cuda detection.

alatan · October 31, 2025, 6:07pm

Thank you for the reply, @ase2356. I thought of that too but then I checked with any normal user (beside gitlab-runner) and I could use Podman with GPU passthrough with no issues. I did what you suggested though and changed priveleged to true but it made no difference. I tried also both versions of the env var NVIDIA_VISIBLE_DEVICES - all (compatible with Docker CLI) and nvidia.com/gpu=all (compatible with the actual Podman call I would do in the terminal when launching a container manually and not through the runner) before and after the change in the priveleged state. You can see my attempts as comments inside the TOML code snippet in my original question.

As for switching to a Docker socket I don’t know how to do that so if you can lend a hand, I would appreciate it.

I have also seen numerous times people using the Podman socket service in user mode, that is systemctl enable --user --now podman.socket, instead of system mode:

systemctl status podman.socket

● podman.socket - Podman API Socket
     Loaded: loaded (/usr/lib/systemd/system/podman.socket; enabled; preset: enabled)
     Active: active (listening) since Tue 2025-10-28 10:05:57 CET; 3 days ago
   Triggers: ● podman.service
       Docs: man:podman-system-service(1)
     Listen: /run/podman/podman.sock (Stream)
     CGroup: /system.slice/podman.socket

Oct 28 10:05:57 gin-vm-gpu systemd[1]: Listening on podman.socket - Podman API Socket.

which should make the socket available to all users. Accordingly people, who use the user mode for the socket put a different path for it, namely unix:///run/user/<gitlab-runner-uid>/podman.socket. The documentation on GitLab runners is rather confusing in this regard.

alatan · November 4, 2025, 1:44pm

I have conducted further investigation. A colleague of mine gave an advice that I should put CUDA + cuDNN image as the base image for the runner. Since the GitLab runner GPU support article simply mentions to run nvidia-smi in the script block, it makes sense. This would mean that the CI job is using the underlying Nvidia container to run the command. At least in theory.

I used nvidia/cuda:13.0.1-cudnn-runtime-ubuntu24.04 and nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 in two different ways:

Manually started local container with GPU passthrough using
```
podman run --rm --device nvidia.com/gpu=0 --device nvidia.com/gpu=1 --security-opt=label=disable -it <NVIDIA-IMAGE> /bin/bash
```
with <NVIDIA-IMAGE> being nvidia/cuda:13.0.1-cudnn-runtime-ubuntu24.04 and then (just to try with a CUDA version lower than the one coming with the toolkit and driver I have on the system) nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04
Automatically started local container with (theoretically) GPU passthrough through the GitLab runner docker/podman executor

For both Nvidia images I received the following results when trying to execute nvidia-smi

manually started container - command worked as expected and provided the information it is supposed to provide
automatically started container (used podman exec -it <gitlab-runner-started-container> /bin/bash to log into it with a bash shell to run my test with) - I receive command not found error

It appears that even though the same image is used, the container environment is different. I used podman ps --no-trunc reveals the command that the container (GitLab runner one) is executing which is a simple bash session.

CONTAINER ID                                                      IMAGE                                                 COMMAND     CREATED     STATUS      PORTS       NAMES
a6b3ae83249c23eff18af6855516eaa12163601df15390186c74bd47a3556074  nvcr.io/nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04  sh -c if [ -x /usr/local/bin/bash ]; then
                                                                  exec /usr/local/bin/bash 
elif [ -x /usr/bin/bash ]; then
            exec /usr/bin/bash 
elif [ -x /bin/bash ]; then
            exec /bin/bash 
elif [ -x /usr/local/bin/sh ]; then
            exec /usr/local/bin/sh 
elif [ -x /usr/bin/sh ]; then
            exec /usr/bin/sh 
elif [ -x /bin/sh ]; then
            exec /bin/sh 
elif [ -x /busybox/sh ]; then
            exec /busybox/sh 
else
            echo shell not found
            exit 1
fi

            About a minute ago  Up About a minute              runner-5j-fmjokx-project-87457-concurrent-1-4e3dc2992f3a7b23-build

I am able to find the libraries inside the container that are related to CUDA and cuDNN but I cannot find nvidia-smi. I tried both recursively listing everything in / and greping as well as which:

root@runner-5j-fmjokx-project-87457-concurrent-1:/# which nvidia-smi
root@runner-5j-fmjokx-project-87457-concurrent-1:/# ls -R | grep nvidia-smi

For reference inside the manually started container I get

root@77237af1ac7f:/# which nvidia-smi
/usr/bin/nvidia-smi

which is the same output that I get on the host.

alatan · November 4, 2025, 2:14pm

The environment is indeed different. Inside /usr/lib/x86_64-linux-gnu/ I can see different libraries, with the ones on the automatically launched container being considerably less and, what’s a definite no go, the libnvidia and libcuda ones are definitely missing.

on the automatically (via GitLab runner) started container

e2fsprogs                libcap-ng.so.0                               libdb-5.3.so               libhogweed.so.6     libncursesw.so.6       libproc2.so.0.0.2      libsystemd.so.0.38.0
engines-3                libcap-ng.so.0.0.0                           libdebconfclient.so.0      libhogweed.so.6.8   libncursesw.so.6.4     libpsx.so.2            libtasn1.so.6
gconv                    libcap.so.2                                  libdebconfclient.so.0.0.0  libidn2.so.0        libnettle.so.8         libpsx.so.2.66         libtasn1.so.6.6.3
ld-linux-x86-64.so.2     libcap.so.2.66                               libdl.so.2                 libidn2.so.0.4.0    libnettle.so.8.8       libpthread.so.0        libthread_db.so.1
libBrokenLocale.so.1     libcom_err.so.2                              libdrop_ambient.so.0       libksba.so.8        libnpth.so.0           libreadline.so.8       libtic.so.6
libacl.so.1              libcom_err.so.2.1                            libdrop_ambient.so.0.0.0   libksba.so.8.14.6   libnpth.so.0.1.2       libreadline.so.8.2     libtic.so.6.4
libacl.so.1.1.2302       libcrypt.so.1                                libe2p.so.2                liblber.so.2        libnsl.so.1            libresolv.so.2         libtinfo.so.6
libanl.so.1              libcrypt.so.1.1.0                            libe2p.so.2.3              liblber.so.2.0.200  libnss_compat.so.2     librt.so.1             libtinfo.so.6.4
libapt-pkg.so.6.0        libcrypto.so.3                               libext2fs.so.2             libldap.so.2        libnss_dns.so.2        libsasl2.so.2          libudev.so.1
libapt-pkg.so.6.0.0      libcudnn.so.9                                libext2fs.so.2.4           libldap.so.2.0.200  libnss_files.so.2      libsasl2.so.2.0.25     libudev.so.1.7.8
libapt-private.so.0.0    libcudnn.so.9.10.2                           libffi.so.8                liblz4.so.1         libnss_hesiod.so.2     libseccomp.so.2        libunistring.so.5
libapt-private.so.0.0.0  libcudnn_adv.so.9                            libffi.so.8.1.4            liblz4.so.1.9.4     libp11-kit.so.0        libseccomp.so.2.5.5    libunistring.so.5.0.0
libassuan.so.0           libcudnn_adv.so.9.10.2                       libformw.so.6              liblzma.so.5        libp11-kit.so.0.3.1    libselinux.so.1        libutil.so.1
libassuan.so.0.8.6       libcudnn_cnn.so.9                            libformw.so.6.4            liblzma.so.5.4.5    libpam.so.0            libsemanage.so.2       libuuid.so.1
libattr.so.1             libcudnn_cnn.so.9.10.2                       libgcc_s.so.1              libm.so.6           libpam.so.0.85.1       libsepol.so.2          libuuid.so.1.3.0
libattr.so.1.1.2502      libcudnn_engines_precompiled.so.9            libgcrypt.so.20            libmd.so.0          libpam_misc.so.0       libsmartcols.so.1      libxxhash.so.0
libaudit.so.1            libcudnn_engines_precompiled.so.9.10.2       libgcrypt.so.20.4.3        libmd.so.0.1.0      libpam_misc.so.0.82.1  libsmartcols.so.1.1.0  libxxhash.so.0.8.2
libaudit.so.1.0.0        libcudnn_engines_runtime_compiled.so.9       libgmp.so.10               libmemusage.so      libpamc.so.0           libsqlite3.so.0        libz.so.1
libblkid.so.1            libcudnn_engines_runtime_compiled.so.9.10.2  libgmp.so.10.5.0           libmenuw.so.6       libpamc.so.0.82.1      libsqlite3.so.0.8.6    libz.so.1.3
libblkid.so.1.1.0        libcudnn_graph.so.9                          libgnutls.so.30            libmenuw.so.6.4     libpanelw.so.6         libss.so.2             libzstd.so.1
libbz2.so.1              libcudnn_graph.so.9.10.2                     libgnutls.so.30.37.1       libmount.so.1       libpanelw.so.6.4       libss.so.2.0           libzstd.so.1.5.5
libbz2.so.1.0            libcudnn_heuristic.so.9                      libgpg-error.so.0          libmount.so.1.1.0   libpcprofile.so        libssl.so.3            ossl-modules
libbz2.so.1.0.4          libcudnn_heuristic.so.9.10.2                 libgpg-error.so.0.34.0     libmvec.so.1        libpcre2-8.so.0        libstdc++.so.6         perl-base
libc.so.6                libcudnn_ops.so.9                            libhistory.so.8            libnccl.so.2        libpcre2-8.so.0.11.2   libstdc++.so.6.0.33    sasl2
libc_malloc_debug.so.0   libcudnn_ops.so.9.10.2                       libhistory.so.8.2          libnccl.so.2.27.3   libproc2.so.0          libsystemd.so.0        security

on the manually started container (same image!)

  e2fsprogs                         libcudadebugger.so.580.82.07                 liblber.so.2                      libnvidia-glsi.so.580.82.07             libreadline.so.8.2
  engines-3                         libcudnn.so.9                                liblber.so.2.0.200                libnvidia-glvkspirv.so.580.82.07        libresolv.so.2
  gbm                               libcudnn.so.9.10.2                           libldap.so.2                      libnvidia-gpucomp.so.580.82.07          librt.so.1
  gconv                             libcudnn_adv.so.9                            libldap.so.2.0.200                libnvidia-gtk2.so.580.82.07             libsasl2.so.2
  ld-linux-x86-64.so.2              libcudnn_adv.so.9.10.2                       liblz4.so.1                       libnvidia-gtk3.so.580.82.07             libsasl2.so.2.0.25
  libBrokenLocale.so.1              libcudnn_cnn.so.9                            liblz4.so.1.9.4                   libnvidia-ml.so.1                       libseccomp.so.2
  libEGL_nvidia.so.0                libcudnn_cnn.so.9.10.2                       liblzma.so.5                      libnvidia-ml.so.580.82.07               libseccomp.so.2.5.5
  libEGL_nvidia.so.580.82.07        libcudnn_engines_precompiled.so.9            liblzma.so.5.4.5                  libnvidia-ngx.so.1                      libselinux.so.1
  libGLESv1_CM_nvidia.so.1          libcudnn_engines_precompiled.so.9.10.2       libm.so.6                         libnvidia-ngx.so.580.82.07              libsemanage.so.2
  libGLESv1_CM_nvidia.so.580.82.07  libcudnn_engines_runtime_compiled.so.9       libmd.so.0                        libnvidia-nvvm.so.4                     libsepol.so.2
  libGLESv2_nvidia.so.2             libcudnn_engines_runtime_compiled.so.9.10.2  libmd.so.0.1.0                    libnvidia-nvvm.so.580.82.07             libsmartcols.so.1
  libGLESv2_nvidia.so.580.82.07     libcudnn_graph.so.9                          libmemusage.so                    libnvidia-opencl.so.1                   libsmartcols.so.1.1.0
  libGLX_indirect.so.0              libcudnn_graph.so.9.10.2                     libmenuw.so.6                     libnvidia-opencl.so.580.82.07           libsqlite3.so.0
  libGLX_nvidia.so.0                libcudnn_heuristic.so.9                      libmenuw.so.6.4                   libnvidia-opticalflow.so                libsqlite3.so.0.8.6
  libGLX_nvidia.so.580.82.07        libcudnn_heuristic.so.9.10.2                 libmount.so.1                     libnvidia-opticalflow.so.1              libss.so.2
  libacl.so.1                       libcudnn_ops.so.9                            libmount.so.1.1.0                 libnvidia-opticalflow.so.580.82.07      libss.so.2.0
  libacl.so.1.1.2302                libcudnn_ops.so.9.10.2                       libmvec.so.1                      libnvidia-pkcs11-openssl3.so.580.82.07  libssl.so.3
  libanl.so.1                       libdb-5.3.so                                 libnccl.so.2                      libnvidia-present.so.580.82.07          libstdc++.so.6
  libapt-pkg.so.6.0                 libdebconfclient.so.0                        libnccl.so.2.27.3                 libnvidia-ptxjitcompiler.so.1           libstdc++.so.6.0.33
  libapt-pkg.so.6.0.0               libdebconfclient.so.0.0.0                    libncursesw.so.6                  libnvidia-ptxjitcompiler.so.580.82.07   libsystemd.so.0
  libapt-private.so.0.0             libdl.so.2                                   libncursesw.so.6.4                libnvidia-rtcore.so.580.82.07           libsystemd.so.0.38.0
  libapt-private.so.0.0.0           libdrop_ambient.so.0                         libnettle.so.8                    libnvidia-sandboxutils.so.1             libtasn1.so.6
  libassuan.so.0                    libdrop_ambient.so.0.0.0                     libnettle.so.8.8                  libnvidia-sandboxutils.so.580.82.07     libtasn1.so.6.6.3
  libassuan.so.0.8.6                libe2p.so.2                                  libnpth.so.0                      libnvidia-tls.so.580.82.07              libthread_db.so.1
  libattr.so.1                      libe2p.so.2.3                                libnpth.so.0.1.2                  libnvidia-vksc-core.so.1                libtic.so.6
  libattr.so.1.1.2502               libext2fs.so.2                               libnsl.so.1                       libnvidia-vksc-core.so.580.82.07        libtic.so.6.4
  libaudit.so.1                     libext2fs.so.2.4                             libnss_compat.so.2                libnvidia-wayland-client.so.580.82.07   libtinfo.so.6
  libaudit.so.1.0.0                 libffi.so.8                                  libnss_dns.so.2                   libnvoptix.so.1                         libtinfo.so.6.4
  libblkid.so.1                     libffi.so.8.1.4                              libnss_files.so.2                 libnvoptix.so.580.82.07                 libudev.so.1
  libblkid.so.1.1.0                 libformw.so.6                                libnss_hesiod.so.2                libp11-kit.so.0                         libudev.so.1.7.8
  libbz2.so.1                       libformw.so.6.4                              libnvcuvid.so                     libp11-kit.so.0.3.1                     libunistring.so.5
  libbz2.so.1.0                     libgcc_s.so.1                                libnvcuvid.so.1                   libpam.so.0                             libunistring.so.5.0.0
  libbz2.so.1.0.4                   libgcrypt.so.20                              libnvcuvid.so.580.82.07           libpam.so.0.85.1                        libutil.so.1
  libc.so.6                         libgcrypt.so.20.4.3                          libnvidia-allocator.so.1          libpam_misc.so.0                        libuuid.so.1
  libc_malloc_debug.so.0            libgmp.so.10                                 libnvidia-allocator.so.580.82.07  libpam_misc.so.0.82.1                   libuuid.so.1.3.0
  libcap-ng.so.0                    libgmp.so.10.5.0                             libnvidia-cfg.so.1                libpamc.so.0                            libxxhash.so.0
  libcap-ng.so.0.0.0                libgnutls.so.30                              libnvidia-cfg.so.580.82.07        libpamc.so.0.82.1                       libxxhash.so.0.8.2
  libcap.so.2                       libgnutls.so.30.37.1                         libnvidia-egl-gbm.so.1            libpanelw.so.6                          libz.so.1
  libcap.so.2.66                    libgpg-error.so.0                            libnvidia-egl-gbm.so.1.1.2        libpanelw.so.6.4                        libz.so.1.3
  libcom_err.so.2                   libgpg-error.so.0.34.0                       libnvidia-egl-wayland.so.1        libpcprofile.so                         libzstd.so.1
  libcom_err.so.2.1                 libhistory.so.8                              libnvidia-egl-wayland.so.1.1.19   libpcre2-8.so.0                         libzstd.so.1.5.5
  libcrypt.so.1                     libhistory.so.8.2                            libnvidia-eglcore.so.580.82.07    libpcre2-8.so.0.11.2                    nvidia
  libcrypt.so.1.1.0                 libhogweed.so.6                              libnvidia-encode.so               libproc2.so.0                           ossl-modules
  libcrypto.so.3                    libhogweed.so.6.8                            libnvidia-encode.so.1             libproc2.so.0.0.2                       perl-base
  libcuda.so                        libidn2.so.0                                 libnvidia-encode.so.580.82.07     libpsx.so.2                             sasl2
  libcuda.so.1                      libidn2.so.0.4.0                             libnvidia-fbc.so.1                libpsx.so.2.66                          security
  libcuda.so.580.82.07              libksba.so.8                                 libnvidia-fbc.so.580.82.07        libpthread.so.0                         vdpau
  libcudadebugger.so.1              libksba.so.8.14.6                            libnvidia-glcore.so.580.82.07     libreadline.so.8

Same applies to /usr/bin with the nvidia-smi clearly missing from the container started uisng the GitLab runner.

dnsmichi · November 4, 2025, 2:28pm

Quick thought without validation - Check if the image has entrypoints configured that are run and install specific packages on container runtime. I suspect that the manual run does this, whereas the CI runner docker executor does not.

alatan · November 4, 2025, 3:31pm

I will check that. I need to find the Dockerfile for the Nvidia CUDA cuDNN images first.

After all

GitLab runner CI job’s log clearly states that The NVIDIA Driver was not detected
after inspecting the libraries it is clear that the CUDA related ones are missing but not the cuDNN ones. While the cuDNN libraries do not come with an installer but are a simple download from Nvidia’s website and a copy-paste to the /usr/lib/..., the CUDA ones come from installing the toolkit. And if the installer fails (this is my experience setting up CUDA on a normal host) fails to detect an Nvidia GPU, the toolkit will not be installed, hence the missing libraries.

So we are back to square one - why is the runner not detecting the GPU and/or has some sort of a conflict with the Nvidia container toolkit, which - as seen previously - works perfectly fine with podman.

UPDATE: Hah, didn’t know Nvidia was using GitLab and not GitHub for hosting.

Dockerfile · master · nvidia / container-images / cuda · GitLab - this is probably what I am looking for but there might be other images’ Dockerfiles to inspect)
entrypoint.d/50-gpu-driver-check.sh · master · nvidia / container-images / cuda · GitLab - the GPU check entrypoint script that is run (one of multiple)

Topic		Replies	Views
Gitlab-runner with a gpu GitLab CI/CD	0	2180	October 15, 2021
Nvidia runtime in a gitlab-ci service GitLab CI/CD	0	656	December 16, 2019
Private Gitlab Runner not working with GPU GitLab CI/CD ci , runner , pipelines	0	771	July 14, 2023
Run docker container with the `--gpu` option in jobs fails GitLab CI/CD	1	356	December 9, 2024
Config changes in config.toml file do not reflect even after gitlab runner restart GitLab CI/CD runner , docker	5	4450	August 10, 2023

Unable to use Nvidia GPU with GitLab runner but works when running containers manually - same executor

Problem to solve

Steps to reproduce

Configuration

Versions

Local check (no runner)

Documentation

Related topics