I already opened an issue in the GitLab Runner repo (gitlab-runner#29651: “tmpfs/ramdisk extremely slow (no speed-up compared to HDD)”), but then realised that not much is going on there, so I’ll “double post” it here; sorry for that. Once it’s resolved, I’ll update the issue as well.
TL;DR: using tmpfs in GitLab CI is very slow, basically HDD speed (writes around 60 MB/s), although the equivalent manual setup (a Docker container started by hand with a ramdisk mounted) performs well, close to bare metal (around 2 GB/s).
Long story:
I need to speed up I/O massively for a particular type of job and wanted to utilise tmpfs for that. I found an SO question whose accepted answer mounts tmpfs manually inside the CI configuration (via before_script), which requires running the runner in privileged mode, but it was not faster at all. It wasn’t a nice solution anyway, so I kept looking.
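For reference, that before_script variant boils down to something like the following sketch (the size value is just an example, not from my actual setup):

# Only works if the job container runs privileged:
mkdir -p /ramdisk
mount -t tmpfs -o size=2G tmpfs /ramdisk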
I also found the [runners.docker.tmpfs] option mentioned in the official GitLab Runner docs and in a blog post by Major Hayden, but I don’t see any speed-up with that approach either (full config below).
Another thing I tried was creating a tmpfs mount point on the host machine, mounting that into the GitLab Runner’s container, and then configuring it to be used in the job containers, but again, no speed-up at all. I can’t get more than about 60 MB/s of write speed for a 1 GB file.
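Concretely, that attempt looked roughly like this (paths and sizes are illustrative, not my exact values):

# On the host:
mount -t tmpfs -o size=8G tmpfs /mnt/ramdisk
# Hand the host tmpfs to the runner container:
docker run -d --name gitlab-runner \
    -v /mnt/ramdisk:/ramdisk \
    -v /srv/gitlab-runner/config:/etc/gitlab-runner \
    -v /var/run/docker.sock:/var/run/docker.sock \
    gitlab/gitlab-runner:latest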
Based on the official docs, this is the configuration of my GitLab Runner, which itself runs via Docker (I also tried a bare-metal runner deployment without Docker, but got the same 60 MB/s):
concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "Pre-calibration"
  output_limit = 16384
  url = "https://..."
  id = 39
  token = "..."
  token_obtained_at = 2023-02-27T10:21:22Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  # The goal is to have the builds and cache dir also in RAM, once it's working
  # builds_dir = "/ramdisk"
  # cache_dir = "/ramdisk/cache"
  environment = ["DOCKER_DRIVER=overlay2"]
  [runners.custom_build_dir]
    enabled = true
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "docker:latest"
    memory = "128m"
    cpus = "120"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
    shm_size = 0
    [runners.docker.tmpfs]
      "/ramdisk" = "rw,exec"
Here is the .gitlab-ci.yml:
variables:
  DOCKER_DRIVER: overlay2

pre-calibration:
  script:
    - pwd
    - df -h
    - ls -al /ramdisk
    - time dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync
    - time cp /ramdisk/1G.dat /ramdisk/1G_copy.dat
  tags:
    - pre-calibration
Below you can see in the job output that the write speed from /dev/zero to /ramdisk (which df does list as a tmpfs mount), and also the copy speed from /ramdisk to /ramdisk via cp, is around 60 MB/s. That is more or less hard disk speed, nowhere near what RAM should deliver.
$ pwd
/builds/tgal/pre-calibration-runner
$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
overlay                            394G  301G   76G  80% /
tmpfs                               64M     0   64M   0% /dev
tmpfs                              252G     0  252G   0% /sys/fs/cgroup
shm                                 64M     0   64M   0% /dev/shm
/dev/mapper/ubuntu--vg-ubuntu--lv  394G  301G   76G  80% /builds
tmpfs                              252G     0  252G   0% /ramdisk
tmpfs                               51G  3.5M   51G   1% /run/docker.sock
$ ls -al /ramdisk
total 4
drwxrwxrwt 2 root root 40 Feb 27 12:41 .
drwxr-xr-x 1 root root 4096 Feb 27 12:41 ..
$ time dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 17.512 s, 61.3 MB/s
real 0m17.559s
user 0m0.000s
sys 0m8.726s
$ time cp /ramdisk/1G.dat /ramdisk/1G_copy.dat
real 0m16.593s
user 0m0.021s
sys 0m7.147s
Cleaning up project directory and file based variables
00:01
Job succeeded
Writing smaller files is faster, but it’s still about five times slower than outside of the Docker context. Is this a Docker issue?
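To quantify the “smaller is faster” observation, a sweep like this could be dropped into the job script (a sketch; the file name is arbitrary, and each pair is block size plus count, totalling 256 MiB per run):

for spec in "64k 4096" "1M 256" "16M 16" "256M 1"; do
    set -- $spec
    dd if=/dev/zero of=/ramdisk/bs_test.dat bs=$1 count=$2 2>&1 | tail -1
    rm -f /ramdisk/bs_test.dat
done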
This is what I get on the machine itself (no Docker):
# dd if=/dev/zero of=/ramdisk/test.img bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 1.90168 s, 2.3 GB/s
And here is the same test within the CI:
$ time dd if=/dev/zero of=/ramdisk/test.img bs=1M count=4096
real    0m16.376s
user    0m0.017s
sys     0m7.580s
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 9.17098 s, 468 MB/s
I thought it might be a Docker issue, but spinning up a container by hand with /ramdisk (tmpfs) mounted shows that the speed is fine:
root:/ramdisk# docker run -v /ramdisk:/ramdisk -it debian:buster
Unable to find image 'debian:buster' locally
buster: Pulling from library/debian
b2404786f3fe: Pull complete
Digest: sha256:233c3bbc892229c82da7231980d50adceba4db56a08c0b7053a4852782703459
Status: Downloaded newer image for debian:buster
root@90833333ba7f:/# time dd if=/dev/zero of=/ramdisk/test.img bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.03867 s, 2.1 GB/s
real 0m2.543s
user 0m0.000s
sys 0m2.539s
root@90833333ba7f:/# time dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.21548 s, 883 MB/s
Does anyone have any idea what’s going on? As far as I understand, the GitLab Runner does more or less the same as I did in the last example, so I don’t understand why there is such a huge difference in performance.
Can anyone who has successfully set up a tmpfs speed-up check if they get better numbers?
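For comparison, I believe the closest manual equivalent to the [runners.docker.tmpfs] option (as far as I understand what the runner hands to Docker) is a plain docker run --tmpfs, so something like this should be directly comparable:

# dd prints its own throughput line to stderr:
docker run --rm --tmpfs /ramdisk:rw,exec debian:buster \
    sh -c 'dd if=/dev/zero of=/ramdisk/1G.dat bs=1G count=1 oflag=dsync'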