How to prevent GitLab CI from needlessly recreating cache?

Problem to solve

I use the following .gitlab-ci.yml to lint a Python poetry project:

image: python

stages:
  - lint

cache:
  paths:
    - .poetry
    - .venv
  key: "${CI_COMMIT_REF_SLUG}"

before_script:
  - export POETRY_HOME=.poetry
  - python3 -m venv $POETRY_HOME
  - $POETRY_HOME/bin/pip install poetry
  - $POETRY_HOME/bin/poetry config virtualenvs.in-project true
  - $POETRY_HOME/bin/poetry install
  - source $($POETRY_HOME/bin/poetry env info --path)/bin/activate

pre-commit:
  stage: lint
  script:
    - pre-commit install
    - pre-commit run --all-files

Note that I use pip to install poetry into ./.poetry/ and poetry to install the project dependencies into ./.venv/. This is a time-consuming process, so I cache both directories. With key: "${CI_COMMIT_REF_SLUG}", poetry and the dependencies are installed only the first time I push a new branch.

When I push additional commits, pip doesn’t install anything and just displays:

$ $POETRY_HOME/bin/pip install poetry
Requirement already satisfied: poetry in ./.poetry/lib/python3.12/site-packages (1.8.3)
Requirement already satisfied: build<2.0.0,>=1.0.3 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.2.1)
...

Also poetry doesn’t install anything and displays:

$ $POETRY_HOME/bin/poetry install
Installing dependencies from lock file
No dependencies to install or update
Installing the current project: myproject (0.0.1)

This saves time, which is good. However, after the CI job completes successfully, it recreates the cache:

Saving cache for successful job 02:41
Creating cache ci-cd-non_protected...
.poetry: found 4165 matching artifact files and directories 
.venv: found 40214 matching artifact files and directories 
No URL provided, cache will not be uploaded to shared cache server. Cache will be stored only locally. 
Created cache

This step takes a lot of time and nullifies the time savings that caching is supposed to provide in the first place. I know that recaching ./.poetry is completely unnecessary because pip changes nothing about its contents. Recaching ./.venv is also unnecessary: at most the editable installation of myproject changed, and that gets overwritten in the next CI run anyway.

Steps to reproduce

Use the above .gitlab-ci.yml in any Python poetry project that depends on pre-commit.

Versions

Please select whether options apply, and add the version information.

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

Versions

Version:      17.0.0
Git revision: 44feccdf
Git branch:   17-0-stable
GO version:   go1.21.9
Built:        2024-05-16T13:46:14+0000
OS/Arch:      linux/amd64

Searching the web for this message together with "gitlab runner" led me to the gitlab-runner issue Pipeline shared cache "No URL provided, cache will be not uploaded to shared cache server" on one specific project (#16097), which indicates that the cache might be on an S3 bucket that has authentication issues. The runner log should provide more insight.

The troubleshooting docs provide more help: Troubleshooting GitLab Runner | GitLab

If I understand this documentation section correctly, I don’t think this applies to my case. I have a single host server that runs a single GitLab runner. So since “Cache is stored where GitLab Runner is installed”, there should be no need for a remote cache server setup, right?

Also, apparently the cache is there and can be accessed, otherwise pip and poetry would have to reinstall dependencies every time the pipeline is run.

Again, what I don’t get is why a cache is being created at the end of a successful run, if for that run a cache was available originally.

FWIW, here is a complete job log

Running with gitlab-runner 17.0.0 (44feccdf)
  on ci-cd D8rh4HFdx, system ID: s_9092089c3878
Resolving secrets
Preparing the "docker" executor 00:03
Using Docker executor with image python ...
Pulling docker image python ...
Using docker image sha256:12e5ab9d51c883bedc4db6c7cbc49f3fa97aef4e98dc205bf1c67e21fd9cb6f4 for python with digest python@sha256:3966b81808d864099f802080d897cef36c01550472ab3955fdd716d1c665acd6 ...
Preparing environment 00:00
Running on runner-d8rh4hfdx-project-168381-concurrent-0 via ci-cd...
Getting source from Git repository 00:03
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /builds/niklas.kappel/humf/.git/
Checking out e2c411c6 as detached HEAD (ref is main)...
Removing .poetry/
Removing .pytest_cache/
Removing .venv/
Removing src/humf/__pycache__/
Removing src/humf/data/__pycache__/
Removing src/humf/lammps/__pycache__/
Removing src/humf/models/__pycache__/
Removing tests/data/__pycache__/
Removing tests/inputs/bonds_dataset/data/processed/
Removing tests/inputs/lammps_dataset/data/processed/
Removing tests/models/__pycache__/
Skipping Git submodules setup
Restoring cache 01:19
Checking cache for main-protected...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted. 
Successfully extracted cache
Executing "step_script" stage of the job script 00:31
Using docker image sha256:12e5ab9d51c883bedc4db6c7cbc49f3fa97aef4e98dc205bf1c67e21fd9cb6f4 for python with digest python@sha256:3966b81808d864099f802080d897cef36c01550472ab3955fdd716d1c665acd6 ...
$ export POETRY_HOME=.poetry
$ python3 -m venv $POETRY_HOME
$ $POETRY_HOME/bin/pip install poetry
Requirement already satisfied: poetry in ./.poetry/lib/python3.12/site-packages (1.8.3)
Requirement already satisfied: build<2.0.0,>=1.0.3 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.2.1)
Requirement already satisfied: cachecontrol<0.15.0,>=0.14.0 in ./.poetry/lib/python3.12/site-packages (from cachecontrol[filecache]<0.15.0,>=0.14.0->poetry) (0.14.0)
Requirement already satisfied: cleo<3.0.0,>=2.1.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (2.1.0)
Requirement already satisfied: crashtest<0.5.0,>=0.4.1 in ./.poetry/lib/python3.12/site-packages (from poetry) (0.4.1)
Requirement already satisfied: dulwich<0.22.0,>=0.21.2 in ./.poetry/lib/python3.12/site-packages (from poetry) (0.21.7)
Requirement already satisfied: fastjsonschema<3.0.0,>=2.18.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (2.19.1)
Requirement already satisfied: installer<0.8.0,>=0.7.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (0.7.0)
Requirement already satisfied: keyring<25.0.0,>=24.0.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (24.3.1)
Requirement already satisfied: packaging>=23.1 in ./.poetry/lib/python3.12/site-packages (from poetry) (24.0)
Requirement already satisfied: pexpect<5.0.0,>=4.7.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (4.9.0)
Requirement already satisfied: pkginfo<2.0,>=1.10 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.10.0)
Requirement already satisfied: platformdirs<5,>=3.0.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (4.2.2)
Requirement already satisfied: poetry-core==1.9.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.9.0)
Requirement already satisfied: poetry-plugin-export<2.0.0,>=1.6.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.8.0)
Requirement already satisfied: pyproject-hooks<2.0.0,>=1.0.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.1.0)
Requirement already satisfied: requests<3.0,>=2.26 in ./.poetry/lib/python3.12/site-packages (from poetry) (2.32.2)
Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.0.0)
Requirement already satisfied: shellingham<2.0,>=1.5 in ./.poetry/lib/python3.12/site-packages (from poetry) (1.5.4)
Requirement already satisfied: tomlkit<1.0.0,>=0.11.4 in ./.poetry/lib/python3.12/site-packages (from poetry) (0.12.5)
Requirement already satisfied: trove-classifiers>=2022.5.19 in ./.poetry/lib/python3.12/site-packages (from poetry) (2024.5.22)
Requirement already satisfied: virtualenv<21.0.0,>=20.23.0 in ./.poetry/lib/python3.12/site-packages (from poetry) (20.26.2)
Requirement already satisfied: msgpack<2.0.0,>=0.5.2 in ./.poetry/lib/python3.12/site-packages (from cachecontrol<0.15.0,>=0.14.0->cachecontrol[filecache]<0.15.0,>=0.14.0->poetry) (1.0.8)
Requirement already satisfied: filelock>=3.8.0 in ./.poetry/lib/python3.12/site-packages (from cachecontrol[filecache]<0.15.0,>=0.14.0->poetry) (3.14.0)
Requirement already satisfied: rapidfuzz<4.0.0,>=3.0.0 in ./.poetry/lib/python3.12/site-packages (from cleo<3.0.0,>=2.1.0->poetry) (3.9.1)
Requirement already satisfied: urllib3>=1.25 in ./.poetry/lib/python3.12/site-packages (from dulwich<0.22.0,>=0.21.2->poetry) (2.2.1)
Requirement already satisfied: jaraco.classes in ./.poetry/lib/python3.12/site-packages (from keyring<25.0.0,>=24.0.0->poetry) (3.4.0)
Requirement already satisfied: SecretStorage>=3.2 in ./.poetry/lib/python3.12/site-packages (from keyring<25.0.0,>=24.0.0->poetry) (3.3.3)
Requirement already satisfied: jeepney>=0.4.2 in ./.poetry/lib/python3.12/site-packages (from keyring<25.0.0,>=24.0.0->poetry) (0.8.0)
Requirement already satisfied: ptyprocess>=0.5 in ./.poetry/lib/python3.12/site-packages (from pexpect<5.0.0,>=4.7.0->poetry) (0.7.0)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.poetry/lib/python3.12/site-packages (from requests<3.0,>=2.26->poetry) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./.poetry/lib/python3.12/site-packages (from requests<3.0,>=2.26->poetry) (3.7)
Requirement already satisfied: certifi>=2017.4.17 in ./.poetry/lib/python3.12/site-packages (from requests<3.0,>=2.26->poetry) (2024.2.2)
Requirement already satisfied: distlib<1,>=0.3.7 in ./.poetry/lib/python3.12/site-packages (from virtualenv<21.0.0,>=20.23.0->poetry) (0.3.8)
Requirement already satisfied: cryptography>=2.0 in ./.poetry/lib/python3.12/site-packages (from SecretStorage>=3.2->keyring<25.0.0,>=24.0.0->poetry) (42.0.7)
Requirement already satisfied: more-itertools in ./.poetry/lib/python3.12/site-packages (from jaraco.classes->keyring<25.0.0,>=24.0.0->poetry) (10.2.0)
Requirement already satisfied: cffi>=1.12 in ./.poetry/lib/python3.12/site-packages (from cryptography>=2.0->SecretStorage>=3.2->keyring<25.0.0,>=24.0.0->poetry) (1.16.0)
Requirement already satisfied: pycparser in ./.poetry/lib/python3.12/site-packages (from cffi>=1.12->cryptography>=2.0->SecretStorage>=3.2->keyring<25.0.0,>=24.0.0->poetry) (2.22)
$ $POETRY_HOME/bin/poetry config virtualenvs.in-project true
$ $POETRY_HOME/bin/poetry install
Installing dependencies from lock file
No dependencies to install or update
Installing the current project: humf (0.0.1)
$ source $($POETRY_HOME/bin/poetry env info --path)/bin/activate
$ source .gitlab-ci/pre-commit.sh
pre-commit installed at .git/hooks/pre-commit
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...............................................................Passed
Check for added large files..............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
All pre-commit checks passed.
Saving cache for successful job 04:02
Creating cache main-protected...
.poetry: found 4165 matching artifact files and directories 
.venv: found 40214 matching artifact files and directories 
No URL provided, cache will not be uploaded to shared cache server. Cache will be stored only locally. 
Created cache
Cleaning up project directory and file based variables 00:01
Job succeeded
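
The "Removing .poetry/" and "Removing .venv/" lines in this log come from the git clean step that the runner performs after checkout. A minimal sketch of excluding those directories from cleaning, assuming the runner keeps reusing the same build directory between jobs (as the "Reinitialized existing Git repository" line suggests) and using the documented GIT_CLEAN_FLAGS variable:

```yaml
variables:
  # The runner's default is -ffdx; each -e excludes a path from git clean.
  GIT_CLEAN_FLAGS: -ffdx -e .poetry/ -e .venv/
```

This only helps as long as the build directory persists on the runner host, which is not guaranteed, so it is a complement to the cache rather than a replacement for it.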

Can you share the runner configuration toml file? (redact all sensitive information) I’m trying to understand the setup in more detail.


Thanks, I can share the file on Monday (don’t have access to the server right now). However, I did not touch the file or change any settings, so it should be the default configuration for a GitLab runner (with docker executor, on Ubuntu 22.04, installed from the repo).

Additional thought: maybe the "No URL provided" message is not an error message, but an indicator that the cache is stored locally.

The reason for the job to take a long time to store the cache could be analyzed through these steps:

  1. Resource bottleneck - you mentioned that GitLab Runner is installed on the same host as the GitLab server. Recommended setup is to create a dedicated VM/host for the Runner.
  2. I/O performance - shared filesystem, slow disks, etc. A dedicated SSD device might provide faster caching.
  3. Monitoring tools like Prometheus and its node exporter can help shed light here.
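
If the archiving step itself turns out to be the bottleneck, the runner has settings that trade compression ratio for speed. A minimal sketch, assuming the documented FF_USE_FASTZIP feature flag and CACHE_COMPRESSION_LEVEL variable behave as described for runner 17.0.0; whether they help in this particular setup has not been measured:

```yaml
variables:
  # Use the faster zip implementation for cache and artifact archives.
  FF_USE_FASTZIP: "true"
  # Accepted values include fastest, fast, default, slow, slowest.
  CACHE_COMPRESSION_LEVEL: "fastest"
```

With roughly 44,000 files in .poetry and .venv, the per-file overhead may dominate, so the gain from lower compression could be small.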

Here is the config file:

concurrent = 1
check_interval = 0
connection_max_age = "15m0s"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "ci-cd"
  url = ...
  id = 949
  token = ...
  token_obtained_at = 2024-05-29T12:27:55Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "python"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    network_mtu = 0

Sorry for the confusion, but GitLab runner and GitLab server are not on the same host. GitLab server itself is managed by my university. I only manage a single host with GitLab runner installed.

If this cache recreation behavior is to be expected and the cache is saved where GitLab runner is installed, I may try to profile I/O performance on the runner host.

If this cache recreation behavior is to be expected and the cache is saved where GitLab server is installed, I may ask my uni to profile I/O performance on the server host.

More realistically, I may try a docker setup like this one instead of the GitLab cache, to make sure installation and caching steps only run when the requirements of my project actually change.
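
Staying with the GitLab cache, a related way to tie the cache to the project's requirements is to derive the cache key from the lock file instead of the branch name. A minimal sketch, assuming poetry.lock is committed at the repository root:

```yaml
cache:
  key:
    # A new cache is started whenever poetry.lock changes.
    files:
      - poetry.lock
  paths:
    - .poetry
    - .venv
```

This changes when a fresh cache is started, not whether a job uploads the cache at the end; the upload is controlled separately by cache:policy.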

You could use GIT_CLEAN_FLAGS to prevent your virtual environment (.venv) from being deleted between jobs, and then move the cache definition into a job that only runs conditionally, for example:

image: python

stages:
  - build
  - lint

cache:
  paths:
    - .poetry
    - .venv
  key: "${CI_COMMIT_REF_SLUG}"

before_script:
  - export POETRY_HOME=.poetry
  - python3 -m venv $POETRY_HOME
  - $POETRY_HOME/bin/pip install poetry
  - $POETRY_HOME/bin/poetry config virtualenvs.in-project true
  - $POETRY_HOME/bin/poetry install
  - source $($POETRY_HOME/bin/poetry env info --path)/bin/activate

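# Run build only when pyproject.toml or poetry.lock changes.
# Note: rules:exists is evaluated against files in the repository, so the
# .venv/ rule below only matches if .venv/ is committed; it does not detect
# a missing virtual environment on the runner.
build:
  stage: build
  script:
    # poetry is installed into $POETRY_HOME by before_script, so call it from there
    - $POETRY_HOME/bin/poetry install --no-interaction --no-root
  rules:
    - changes:
        - pyproject.toml
        - poetry.lock
    - exists:
        - .venv/
  cache:
    paths:
      - .venv/
      - .poetry/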

pre-commit:
  stage: lint
  script:
    - pre-commit install
    - pre-commit run --all-files
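
The job layout above can be combined with cache:policy. By default a job both downloads the cache at the start and uploads it again at the end (pull-push), which would explain the "Saving cache for successful job" step after every run. The following is a minimal sketch, not a tested configuration: it keeps the key and paths from the original file and only adds a policy to the lint job.

```yaml
pre-commit:
  stage: lint
  cache:
    key: "${CI_COMMIT_REF_SLUG}"
    paths:
      - .poetry
      - .venv
    # The default is pull-push; pull restores the cache but skips the upload.
    policy: pull
  script:
    - pre-commit install
    - pre-commit run --all-files
```

Some job still has to populate the cache with pull-push or push, for example the conditional build job. Until that job has run on a branch, a pull-only job finds no cache and reinstalls everything on each run.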