Unable to Increase shm_size for GitLab CI/CD Container (Docker Executor)

Schmiedsimmal · November 18, 2024, 12:11pm

Problem to solve:

I have a GitLab pipeline that is supposed to train an AI model. The pipeline is executed with a Docker executor. The training should run in the image gitlab.lrz.de:5005/messtechnik-labor/barcs/docker/mmdetection3d-training/tmp:0.5.1. The problem is that during the training, I only have a shm_size of 64 MB in this container, which is not sufficient. I have adjusted the shm_size in both the docker-compose.yml file and the GitLab Runner’s config.toml file. When I run the docker inspect command, I can confirm that the shm_size has been successfully increased to 20 GB. However, when I start the pipeline, these 20 GB are not available in the training image. What else do I need to adjust or what can I do? I would really appreciate any help.

Steps to reproduce:

I have modified the shm_size in the docker-compose.yml file (which starts the runner) and in the GitLab Runner’s config.toml file.
I have verified the changes by running docker inspect and confirmed that the shm_size is set to 20 GB.
I have restarted the GitLab Runner and the associated containers multiple times, but the issue persists.
I have reviewed the GitLab Runner documentation regarding shared memory allocation and Docker configuration.

Configuration:

config.toml:

[[runners]]
  name = "my-runner"
  url = "https://gitlab.com/"
  token = "YOUR_TOKEN"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "docker:latest"
    privileged = true
    gpus = "all"  # Ensure only one occurrence
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/data:/data"]
    shm_size = 20g

docker-compose.yml:

volumes:
  gitlab-runner-config:
    external: true

services:
  gitlab-runner:
    container_name: gitlab-runner
    restart: always
    image: gitlab/gitlab-runner:latest
    shm_size: 20g  # Set shared memory to 20 GB
    volumes:
      - ${HOME}/data:/data
      - /etc/mysql:/etc/mysql
      - /var/run/docker.sock:/var/run/docker.sock
      - gitlab-runner-config:/etc/gitlab-runner
    hostname: "$(hostname)"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Pipeline:

stages:
  - run_training

run_training:
  image: gitlab.lrz.de:5005/messtechnik-labor/barcs/docker/mmdetection3d-training/tmp:0.5.1
  tags:
    - ai-worker-3
  stage: run_training
  script:
    - echo "Running training job"
    - echo "Check if GPU is available"
    - nvidia-smi
    - echo "Shared memory size:"
    - df -h /dev/shm

dnsmichi · November 18, 2024, 6:59pm

Schmiedsimmal:

  [runners.docker]
    tls_verify = false
    image = "docker:latest"
    privileged = true
    gpus = "all"  # Ensure only one occurrence
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/data:/data"]
    shm_size = 20g

I have not used this setting before, but the documentation says it expects the size in bytes. Not sure if 20g are interpreted correctly. 21474836480 as raw bytes value works, or fails, too?

Schmiedsimmal · November 19, 2024, 8:08am

Many thanks for the quick reply!
I have already tried the specification in bytes (just now again) but the problem still persists…

Docker compose file for the runner:

volumes:
  gitlab-runner-config:
    external: true

services:
  gitlab-runner:
    container_name: gitlab-runner
    restart: always
    image: gitlab/gitlab-runner:latest
    shm_size: 21474836480
    volumes:
      - ${HOME}/data:/data
      - /etc/mysql:/etc/mysql
      - /var/run/docker.sock:/var/run/docker.sock
      - gitlab-runner-config:/etc/gitlab-runner
    hostname: "$(hostname)"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

runner config file:

[[runners]]
  name = "my-runner"
  url = "https://gitlab.com/"
  token = "YOUR_TOKEN"
  executor = "docker"
  environment = ["DOCKER_SHM_SIZE=256m"]
  [runners.docker]
    tls_verify = false
    image = "docker:latest"
    privileged = true
    gpus = "all"  # Ensure only one occurrence
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/data:/data"]
    shm_size = 21474836480

When the gitlab-runner container is running and I check the shm_size in it, I get the output:

docker exec -it 7c0ecba224be /bin/bash
root@$(hostname):/# df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
shm              20G     0   20G   0% /dev/shm

so my runner has enough shm. But when i run my pipeline i still get:

Topic		Replies	Views
Gitlab runner required help to set shm-size for docker GitLab CI/CD docker	1	3456	November 16, 2018
Increasing memory and cpus for windows docker build GitLab CI/CD ci , runner , windows , docker	2	2310	June 6, 2024
How to increase disk space of a gitlab runner? GitLab CI/CD	0	1833	July 21, 2021
GitLab Runner with docker-autoscaler not reusing available cache volumes GitLab CI/CD ci , runner , docker , pipelines	6	226	July 31, 2024
Gitlab CI job report no space error How to Use GitLab	1	438	April 14, 2020

Unable to Increase shm_size for GitLab CI/CD Container (Docker Executor)

Problem to solve:

Steps to reproduce:

Configuration:

Related topics