Possible gitlab-runner regression between 16.5.0 and 16.6.x

Problem

I upgraded a local runner machine to v16.6.0 a few weeks ago and started seeing the following error in my CI jobs:

Running with gitlab-runner 16.6.0 (3046fee8)
  on <my-machine> Podman/Docker runner <...>, system ID: <...>
  feature flags: FF_NETWORK_PER_BUILD:true, FF_USE_FASTZIP:true, FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR:true
Resolving secrets 00:00
Preparing the "docker" executor 00:07
Using Docker executor with image <local gitlab registry>image:latest ...
ERROR: Job failed: adding cache volume: set volume permissions: running permission container "35e3042f0071e22a1b4d9b3ab73d541c9e0941b8dc60101b37d0dabd127ddb1e" for volume "runner-uyekh3z-project-225-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": waiting for permission container to finish: exit code 127

I also see the same error after upgrading to v16.6.1, the latest gitlab-runner release at this time.
This machine has been set up to run Podman in rootless mode under the gitlab-runner user.

Two ways I can work around the issue:

Workaround Option 1

Downgrade from 16.6.x to v16.5.0. Running…

sudo dnf install gitlab-runner-16.5.0-1.x86_64

…then retrying the previously failed job will pass.

Workaround Option 2

Override the GitLab helper image to the SAME version the runner would use by default. With gitlab-runner 16.6.1 installed, that’s f5da3c5a (see the gitlab-runner repo tags).
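That tag can be derived from the revision the runner reports; a quick sketch (the `gitlab-runner --version` output format is assumed here, captured as sample text so the parsing is self-contained):

```shell
# Derive the default helper image tag from the runner's reported git
# revision. The version output below is a sample of the assumed format.
version_output='Version:      16.6.1
Git revision: f5da3c5a
Git branch:   16-6-stable'

rev=$(printf '%s\n' "$version_output" | awk '/^Git revision:/{print $3}')
helper="registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-${rev}"
echo "$helper"
```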

/etc/gitlab-runner/config.toml

  [runners.docker]
    helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a"

Retrying a previously failed job then looks like this. Note that the “default would be” tag and the overridden helper image tag are identical!

Preparing the "docker" executor 00:02
Using Docker executor with image  <local gitlab registry>image:latest ...
Using helper image:  registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a  (overridden, default would be  registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a )
Pulling docker image registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a ...
Using docker image sha256:561dca7a33f86bf3c2bf1112bbc1d3d12c6962e202e1e985185cc61177a4fdc1 for registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a with digest registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper@sha256:8c79152aed93973ee94ff532e32dab167ef5ce34ec0aef072f07097d587821a8 ...
Using helper image:  registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a  (overridden, default would be  registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a )
Using docker image sha256:561dca7a33f86bf3c2bf1112bbc1d3d12c6962e202e1e985185cc61177a4fdc1 for registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a with digest registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper@sha256:8c79152aed93973ee94ff532e32dab167ef5ce34ec0aef072f07097d587821a8 ...
Authenticating with credentials from job payload (GitLab Registry)
Pulling docker image  <local gitlab registry>image:latest ...
Using docker image sha256:<...> for  <local gitlab registry>image:latest with digest <...>
Not using umask - FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR is set!

[...]

Job succeeded

Question

Am I crazy? Anyone else seen this too? Thoughts/ideas?

Runner System Details

All system packages up-to-date as of 2023-11-27

[gitlab-runner@my-machine ~]$ podman info
host:
  arch: amd64
  buildahVersion: 1.31.3
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.8-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: 879ca989e09d731947cd8d9cbb41038549bf669d'
  cpuUtilization:
    idlePercent: 98.74
    systemPercent: 0.24
    userPercent: 1.02
  cpus: 16
  databaseBackend: boltdb
  distribution:
    distribution: '"almalinux"'
    version: "9.3"
  eventLogger: file
  freeLocks: 2048
  hostname: epyc-rhino
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.14.0-362.8.1.el9_3.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 30848843776
  memTotal: 33241456640
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.7-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.7
      commit: 53a9996ce82d1ee818349bdcc64797a1fa0433c4
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20230818.g0af928e-4.el9.x86_64
    version: |
      pasta 0^20230818.g0af928e-4.el9.x86_64
      Copyright Red Hat
      GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 16827543552
  swapTotal: 16827543552
  uptime: 1h 7m 26.00s (Approximately 0.04 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /mnt/ci-data/gitlab-runner-home/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /mnt/ci-data/gitlab-runner-home/.local/share/containers/storage
  graphRootAllocated: 502921392128
  graphRootUsed: 35369885696
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1001/containers
  transientStore: false
  volumePath: /mnt/ci-data/gitlab-runner-home/.local/share/containers/storage/volumes
version:
  APIVersion: 4.6.1
  Built: 1695842412
  BuiltTime: Wed Sep 27 12:20:12 2023
  GitCommit: ""
  GoVersion: go1.19.10
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.1

/etc/gitlab-runner/config.toml

concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "<my-machine> Podman/Docker runner"
  url = "<...>"
  id = 292
  token = "<...>"
  token_obtained_at = <...>
  token_expires_at = <...>
  executor = "docker"
  environment = ["FF_NETWORK_PER_BUILD=1", "FF_USE_FASTZIP=1", "ARTIFACT_COMPRESSION_LEVEL=fast", "CACHE_COMPRESSION_LEVEL=fast", "FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR=1"]
  [runners.docker]
    # Explicit helper image to workaround "waiting for permission container to finish exit 127" error encountered 2023-11-20
    helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-f5da3c5a"
    host = "unix:///run/user/1001/podman/podman.sock"
    tls_verify = false
    image = "quay.io/podman/stable"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    # 1 GByte
    shm_size = 1073741824
    [runners.docker.tmpfs]
      "/mnt/ramdisk" = "rw,exec,size=2G"

When running v16.6.1, if I

  • Run podman system prune --all --volumes to clean up any old cruft
  • Retry a CI job

…then the CI job log shows the same error as before:

Running with gitlab-runner 16.6.1 (f5da3c5a)
  on <my-machine> Podman/Docker runner <...>, system ID: <...>
  feature flags: FF_NETWORK_PER_BUILD:true, FF_USE_FASTZIP:true, FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR:true
Resolving secrets 00:00
Preparing the "docker" executor 00:06
Using Docker executor with image <local gitlab registry>image:latest ...
ERROR: Job failed: adding cache volume: set volume permissions: running permission container "f404c36e60fca30ce81978bb9d92247812394ffafb3dadfaca52c1c537e03ada" for volume "runner-<...>-project-225-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": waiting for permission container to finish: exit code 127

And the runner machine shows the helper image has been imported to Podman.

$ podman images
REPOSITORY                                                         TAG              IMAGE ID      CREATED         SIZE
registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper  x86_64-f5da3c5a  53b08cbd609b  16 seconds ago  69.1 MB

… so I don’t think it’s issue gitlab-org/gitlab-runner#29576

However, if I instead:

  • run podman system prune --all --volumes
  • set config.toml
    • helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-${CI_RUNNER_REVISION}"
  • sudo gitlab-runner restart
  • Retry a job

Then note the helper image that gets pulled into Podman:

$ podman images
REPOSITORY                                                         TAG              IMAGE ID      CREATED     SIZE
registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper  x86_64-f5da3c5a  561dca7a33f8  3 days ago  69.1 MB

Why are the image IDs not the same? Isn’t “IMAGE ID” supposed to be a unique hash that we can verify against upstream registries? Why would gitlab-runner default to an image with the same tag but a different hash?
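For what it’s worth, my understanding (hedged) is that the IMAGE ID podman reports is the sha256 digest of the image’s local config blob, not the registry manifest digest, so the same tag can legitimately carry different IDs if the image was rebuilt and re-pushed (e.g. with a new build timestamp). A toy illustration with sample config JSON:

```shell
# IMAGE ID is computed locally as sha256 over the image config JSON.
# Two configs differing only in a build timestamp therefore hash to
# different IDs, even under the same repository:tag (sample data).
config_a='{"architecture":"amd64","created":"2023-11-21T10:00:00Z"}'
config_b='{"architecture":"amd64","created":"2023-11-24T10:00:00Z"}'
id_a=$(printf '%s' "$config_a" | sha256sum | awk '{print $1}')
id_b=$(printf '%s' "$config_b" | sha256sum | awk '{print $1}')
[ "$id_a" != "$id_b" ] && echo "same tag, different image IDs"
```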

I’ve been trying to fix this on my two systems off and on for the last few months. I think I finally have a fix! A little bit of a recap first:

Recap / More Background

As I said above, I’m running Podman in rootless mode under the gitlab-runner user.

I am now running GitLab Runner v17.1.0, where this permissions error occurs all the time.
My previous workaround of pinning an explicit gitlab-runner helper tag no longer works.

Looking in journalctl logs I saw messages like the following:

Jul 10 12:18:45 myhostname kernel: podman0: port 1(veth0) entered forwarding state
Jul 10 12:18:45 myhostname podman[4790]: time="2024-07-10T12:18:45-07:00" level=warning msg="Requested oom_score_adj=0 is lower than the current one, changing to 200"
Jul 10 12:18:45 myhostname podman[4790]: time="2024-07-10T12:18:45-07:00" level=info msg="Running conmon under slice user.slice and unitName libpod-conmon-6fbf38946024e48b89db9e081e355f88c3e4f8730ef66f06b88547d71cf80e3d.scope"
Jul 10 12:18:45 myhostname systemd[1368]: Started libpod-conmon-6fbf38946024e48b89db9e081e355f88c3e4f8730ef66f06b88547d71cf80e3d.scope.
Jul 10 12:18:45 myhostname systemd[1368]: Started libcrun container.
Jul 10 12:18:45 myhostname podman[4790]: time="2024-07-10T12:18:45-07:00" level=info msg="Got Conmon PID as 4900"
Jul 10 12:18:45 myhostname conmon[4900]: conmon 6fbf38946024e48b89db <ninfo>: container 4902 exited with status 127
Jul 10 12:18:45 myhostname conmon[4900]: conmon 6fbf38946024e48b89db <nwarn>: Failed to open cgroups file: /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-6fbf38946024e48b89db9e081e355f88c3e4f8730ef66f06b88547d71cf80e3d.scope/container/memory.events
Jul 10 12:18:45 myhostname podman[4790]: @ - - [10/Jul/2024:12:18:45 -0700] "POST /v1.41/containers/6fbf38946024e48b89db9e081e355f88c3e4f8730ef66f06b88547d71cf80e3d/start HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Jul 10 12:18:45 myhostname kernel: podman0: port 1(veth0) entered disabled state
Jul 10 12:18:45 myhostname kernel: device veth0 left promiscuous mode
Jul 10 12:18:45 myhostname kernel: podman0: port 1(veth0) entered disabled state
Jul 10 12:18:46 myhostname podman[4790]: @ - - [10/Jul/2024:12:18:45 -0700] "POST /v1.41/containers/6fbf38946024e48b89db9e081e355f88c3e4f8730ef66f06b88547d71cf80e3d/wait?condition=not-running HTTP/1.1" 200 32 "" "Go-http-client/1.1"
Jul 10 12:18:46 myhostname podman[4790]: @ - - [10/Jul/2024:12:18:46 -0700] "DELETE /v1.41/containers/6fbf38946024e48b89db9e081e355f88c3e4f8730ef66f06b88547d71cf80e3d?force=1 HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Jul 10 12:18:46 myhostname gitlab-runner[2261]: WARNING: Job failed: adding cache volume: set volume permissions: running permission container "6fbf38946024e48b89db9e081e355f88c3e4f8730ef66f06b88547d71cf80e3d" for volume "runner-uyekh3z-project-225-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": waiting for permission container to finish: exit code 127

I’m not a DevOps expert, but this looked to me like some sort of cgroups issue.
So I tried enabling various kernel command-line arguments in GRUB, such as cgroup_enable=memory or user_namespace.enable=1, but none of that helped on this AlmaLinux 9.4 (kernel 5.14.0-427) OS.

Solution

I eventually found there was no storage configuration file for the gitlab-runner user (${HOME}/.config/containers/storage.conf).
There is also no ~/.config/containers/libpod.conf, but that did not seem to matter.
I created a new storage config by copying the default system-wide one from /etc/containers/storage.conf into that path.

Then I made three changes:

  • comment out runroot
  • comment out graphroot
  • un-comment mount_program

The diff looks like this

$ diff /etc/containers/storage.conf ~/.config/containers/storage.conf

- runroot = "/run/containers/storage"
+ # runroot = "/run/containers/storage"
---
- graphroot = "/var/lib/containers/storage"
+ # graphroot = "/var/lib/containers/storage"
---
- # mount_program = "/usr/bin/fuse-overlayfs"
+ mount_program = "/usr/bin/fuse-overlayfs"
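The same three edits can be scripted with sed; a sketch against a trimmed sample of the stock file (the real /etc/containers/storage.conf has many more lines, and a real run would first copy it to ~/.config/containers/storage.conf):

```shell
# Apply the three storage.conf edits to a trimmed sample copy:
# comment out runroot and graphroot, un-comment mount_program.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"

[storage.options.overlay]
# mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"
EOF
sed -i \
  -e 's|^runroot =|# runroot =|' \
  -e 's|^graphroot =|# graphroot =|' \
  -e 's|^# mount_program = |mount_program = |' \
  "$conf"
grep '^mount_program' "$conf"
```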

For posterity, here’s the complete config file with comments removed:

[storage]
driver = "overlay"
# runroot = "/run/containers/storage"
# graphroot = "/var/lib/containers/storage"
# rootless_storage_path = "$HOME/.local/share/containers/storage"
# transient_store = true

[storage.options]
additionalimagestores = []
pull_options = {enable_partial_images = "false", use_hard_links = "false", ostree_repos=""}
# remap-uids = "0:1668442479:65536"
# remap-gids = "0:1668442479:65536"
# remap-user = "containers"
# remap-group = "containers"
# root-auto-userns-user = "storage"
# auto-userns-min-size=1024
# auto-userns-max-size=65536

[storage.options.overlay]
# ignore_chown_errors = "false"
# inodes = ""
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"
# skip_mount_home = "false"
# size = ""
# force_mask = ""

[storage.options.thinpool]
# autoextend_percent = "20"
# autoextend_threshold = "80"
# basesize = "10G"
# blocksize="64k"
# directlvm_device = ""
# directlvm_device_force = "True"
# fs="xfs"
# log_level = "7"
# min_free_space = "10%"
# mkfsarg = ""
# metadata_size = ""
# size = ""
# use_deferred_removal = "True"
# use_deferred_deletion = "True"
# xfs_nospace_max_retries = "0"

I don’t know if this was necessary, but I cleared out all images and containers after making this change.
I had been trying a bunch of things and needed to reset the Podman system for the gitlab-runner user many times during those failed experiments:

podman system reset

My podman info output now looks like this

host:
  arch: amd64
  buildahVersion: 1.33.7
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 574ce145d4fde456322f648afc2cb9dc2141ee16'
  cpuUtilization:
    idlePercent: 99.66
    systemPercent: 0.12
    userPercent: 0.22
  cpus: 16
  databaseBackend: sqlite
  distribution:
    distribution: almalinux
    version: "9.4"
  eventLogger: file
  freeLocks: 2045
  hostname: epyc-hippo
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.0-427.22.1.el9_4.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 28474130432
  memTotal: 33244123136
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-3.el9_4.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.3-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.3
      commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 16827543552
  swapTotal: 16827543552
  uptime: 1h 43m 49.00s (Approximately 0.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /mnt/ci-data/gitlab-runner-home/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.13-1.el9.x86_64
      Version: |-
        fusermount3 version: 3.10.2
        fuse-overlayfs: version 1.13-dev
        FUSE library version 3.10.2
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /mnt/ci-data/gitlab-runner-home/.local/share/containers/storage
  graphRootAllocated: 502921392128
  graphRootUsed: 34324008960
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/user/1001/containers
  transientStore: false
  volumePath: /mnt/ci-data/gitlab-runner-home/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.4-rhel
  Built: 1720519277
  BuiltTime: Tue Jul  9 03:01:17 2024
  GitCommit: ""
  GoVersion: go1.21.11 (Red Hat 1.21.11-1.el9_4)
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4-rhel

Finally, after this change, I was able to remove the helper_image override from /etc/gitlab-runner/config.toml

[[runners]]
  name = "Podman/Docker runner"
  url = "<redacted>"
  id = <redacted>
  token = "<redacted>"
  token_obtained_at = <redacted>
  token_expires_at = <redacted>
  executor = "docker"
  # Set 'FF_NETWORK_PER_BUILD=1' if we start using services with containers.
  environment = ["FF_USE_FASTZIP=1", "ARTIFACT_COMPRESSION_LEVEL=fast", "CACHE_COMPRESSION_LEVEL=fast", "FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR=1"]
  [runners.docker]
    runtime = "podman"
    host = "unix:///run/user/1001/podman/podman.sock"
    tls_verify = false
    image = "quay.io/podman/stable"
    dns = ["<internal dns ip>"]
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 1073741824
-   helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-${CI_RUNNER_REVISION}"
    network_mtu = 0
    [runners.docker.tmpfs]
      "/mnt/ramdisk" = "rw,exec,size=2G,mode=1777"

Another follow-on: I found a GitLab repo where we are using Google’s Kaniko to build container images. The cache flag is enabled, so Kaniko caches build layers and uploads them to our GitLab project as a separate image registry.

On this rootless Podman runner, I started seeing this error during a Kaniko image build:

INFO[0027] Found cached layer, extracting to filesystem 
INFO[0044] RUN echo "Misc. cleanup" &&   apt-get purge -y wget software-properties-common &&   apt-get autoremove -y &&   apt-get autoclean -y 
INFO[0044] Found cached layer, extracting to filesystem 
error building image: error building stage: failed to execute command: extracting fs from image: open /etc/dbus-1/system.d/.wh.com.ubuntu.SoftwareProperties.conf: invalid argument

This looks to me like Kaniko is finding and using the previously-built cached layers, but is for some reason having trouble using files inside them.
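The path in that error looks like an OCI whiteout marker: when a layer deletes a file from a lower layer, the layer tarball carries an empty `.wh.<name>` entry in the same directory, which the storage driver must translate into an actual deletion. My (hedged) read is that fuse-overlayfs uses that prefix internally and so rejects Kaniko creating such a name directly. A small sketch of just the naming convention:

```shell
# OCI layer tars mark a deleted file by shipping an empty marker named
# ".wh." + the original basename, in the same directory (sketch only;
# the translation into a deletion is done by the graph driver).
deleted="/etc/dbus-1/system.d/com.ubuntu.SoftwareProperties.conf"
marker="$(dirname "$deleted")/.wh.$(basename "$deleted")"
echo "$marker"
```

Note that the marker this produces matches the exact path in the error message above.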

I can’t for the life of me figure out why. But I can see that going back to the ${HOME_GITLAB_RUNNER}/.config/containers/storage.conf file and changing…

- driver = "overlay"
+ driver = "vfs"
---
- mount_program = "/usr/bin/fuse-overlayfs"
- mountopt = "nodev,metacopy=on"
+ # mount_program = "/usr/bin/fuse-overlayfs"
+ # mountopt = "nodev,metacopy=on"
…then running

podman system reset -f && \
  podman system migrate && \
  rm -rf ~/.local/share/containers

…has helped.
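For reference, the resulting driver selection in the storage config ends up looking roughly like this (my sketch; only the changed lines are shown). My understanding is that vfs sidesteps the whiteout handling entirely by storing each layer as a full copy, at the cost of disk space and slower image operations:

```toml
# ~/.config/containers/storage.conf (sketch; changed lines only)
[storage]
driver = "vfs"

[storage.options.overlay]
# mount_program = "/usr/bin/fuse-overlayfs"
# mountopt = "nodev,metacopy=on"
```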