GitLab runner tmpfs/ramdisk extremely slow

I already posted an issue in the GitLab runner repo (GitLab runner - tmpfs/ramdisk extremely slow (no speed-up compared to HDD) (#29651) · Issues · GitLab.org / gitlab-runner · GitLab), but then realised that not much is going on there, so I’ll “double post it” here, sorry for that. Once it’s resolved, I’ll also update the issue.

TL;DR: using tmpfs in GitLab CI is very slow, basically HDD speed (around 60 MB/s write), while a similar manual setup of a Docker container with a ramdisk mounted performs well, close to what you get on bare metal (around 2 GB/s).

Long story:

I need to speed up the I/O massively for a particular type of job and wanted to utilise tmpfs for that. I found an SO question whose accepted answer suggests manually mounting a tmpfs inside the CI configuration (via before_script), which requires running the GitLab runner in privileged mode, but it was not faster at all. It wasn't a nice solution anyway, so I kept looking.
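For reference, that manual approach boils down to something like this in the before_script (a sketch from memory; the size and mount point are just examples):

# inside the job container, only possible in privileged mode
mkdir -p /ramdisk
mount -t tmpfs -o size=8G,rw,exec tmpfs /ramdisk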

I also found the [runners.docker.tmpfs] option for the runner config, mentioned in the official GitLab Runner docs and in a blog post by Major Hayden, but I don't see any speed-up with that approach either.
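As far as I understand, that option should give the job container its own tmpfs mount, roughly what you'd get with Docker's --tmpfs flag (my assumption based on the docs):

# roughly what I expect [runners.docker.tmpfs] to translate to
docker run --tmpfs /ramdisk:rw,exec -it debian:buster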

Another thing I’ve tried is to create a tmpfs mount point on the host machine, mount it into the GitLab runner’s container, and then configure that volume to be passed on to the job containers, but again, no speed-up at all. I can’t get more than about 60 MB/s of write speed for a 1 GB file.
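On the host that looked roughly like this (a sketch; the size and paths are just examples), plus "/ramdisk:/ramdisk" added to volumes in [runners.docker]:

# on the host: create the ramdisk and hand it to the runner container
mkdir -p /ramdisk
mount -t tmpfs -o size=64G tmpfs /ramdisk
docker run -d --name gitlab-runner --restart always \
    -v /ramdisk:/ramdisk \
    -v /srv/gitlab-runner/config:/etc/gitlab-runner \
    -v /var/run/docker.sock:/var/run/docker.sock \
    gitlab/gitlab-runner:latest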

Based on the official docs, this is the configuration of my GitLab runner, which itself runs via Docker (I also tried a bare-metal runner deployment without Docker, but got the same 60 MB/s):

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "Pre-calibration"
  output_limit = 16384
  url = "https://..."
  id = 39
  token = "..."
  token_obtained_at = 2023-02-27T10:21:22Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
# The goal is to have the builds and cache dir also in RAM, once it's working
#  builds_dir = "/ramdisk"
#  cache_dir = "/ramdisk/cache"
  environment = ["DOCKER_DRIVER=overlay2"]
  [runners.custom_build_dir]
    enabled = true
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "docker:latest"
    memory = "128m"
    cpus = "120"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
    shm_size = 0
    [runners.docker.tmpfs]
        "/ramdisk" = "rw,exec"

Here is the .gitlab-ci.yml:

variables:
  DOCKER_DRIVER: overlay2

pre-calibration:
  script:
    - pwd
    - df -h
    - ls -al /ramdisk
    - time dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync
    - time cp /ramdisk/1G.dat /ramdisk/1G_copy.dat
  tags:
    - pre-calibration

Below you can see in the job output that the write speed from /dev/zero to /ramdisk (which df shows as a tmpfs mount), and also the copy from /ramdisk to /ramdisk via cp, is around 60 MB/s. That is more or less hard disk speed, nowhere near what RAM should deliver.

$ pwd
/builds/tgal/pre-calibration-runner

$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
overlay                            394G  301G   76G  80% /
tmpfs                               64M     0   64M   0% /dev
tmpfs                              252G     0  252G   0% /sys/fs/cgroup
shm                                 64M     0   64M   0% /dev/shm
/dev/mapper/ubuntu--vg-ubuntu--lv  394G  301G   76G  80% /builds
tmpfs                              252G     0  252G   0% /ramdisk
tmpfs                               51G  3.5M   51G   1% /run/docker.sock

$ ls -al /ramdisk
total 4
drwxrwxrwt 2 root root   40 Feb 27 12:41 .
drwxr-xr-x 1 root root 4096 Feb 27 12:41 ..

$ time dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 17.512 s, 61.3 MB/s
real	0m17.559s
user	0m0.000s
sys	0m8.726s

$ time cp /ramdisk/1G.dat /ramdisk/1G_copy.dat
real	0m16.593s
user	0m0.021s
sys	0m7.147s



Writing smaller files is faster, but it’s still about five times slower than outside of the Docker context. Is this a Docker issue?

This is what I get on the machine itself (no Docker):

# dd if=/dev/zero of=/ramdisk/test.img bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 1.90168 s, 2.3 GB/s

and here within the CI:

$ time dd if=/dev/zero of=/ramdisk/test.img bs=1M count=4096
real	0m16.376s
user	0m0.017s
sys	0m7.580s
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 9.17098 s, 468 MB/s

I thought it might be a Docker issue, but manually spinning up a container with the /ramdisk tmpfs bind-mounted shows that the speed is fine:

root:/ramdisk# docker run -v /ramdisk:/ramdisk -it debian:buster
Unable to find image 'debian:buster' locally
buster: Pulling from library/debian
b2404786f3fe: Pull complete
Digest: sha256:233c3bbc892229c82da7231980d50adceba4db56a08c0b7053a4852782703459
Status: Downloaded newer image for debian:buster

root@90833333ba7f:/# time dd if=/dev/zero of=/ramdisk/test.img bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.03867 s, 2.1 GB/s

real	0m2.543s
user	0m0.000s
sys	0m2.539s

root@90833333ba7f:/# time dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.21548 s, 883 MB/s

Does anyone have any idea what’s going on? As far as I understand, the GitLab runner does more or less the same thing as I did in the last example, so I don’t understand why there is such a huge difference in performance.

Can anyone who has successfully set up a tmpfs-speed-up check if they get better numbers?
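For comparison, these are the two variants I mean, run directly on a host that has a tmpfs mounted at /ramdisk (same dd invocation as in the job above):

# variant 1: host tmpfs bind-mounted into the container
docker run --rm -v /ramdisk:/ramdisk debian:buster \
    dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync

# variant 2: a tmpfs created by Docker itself, which I assume is what [runners.docker.tmpfs] uses
docker run --rm --tmpfs /ramdisk:rw,exec debian:buster \
    dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync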