Using cache for pip/npm dependencies in GitLab CI

Hi @mlorant,
I had the very same problem until a couple of weeks ago, when I finally found a webpage that sets it up the right way.
First of all, keep in mind that when a job creates a cache, even with the broadest matching key, it stays local to the runner it was created on; with a bit of luck a future job lands on the same runner and pulls it. Watch out: the default policy is pull-push, so every job can rewrite the cache and potentially wipe content you want to preserve. In the case of “pip-and-npm-global-cache”, the npm and Python jobs are trashing each other's content.
The most important thing I did was set up a MinIO container to provide a shared cache. Next I changed the policy to pull and had the first job take care of filling the pip cache and installing the venv. The whole thing works best, though, when the venv is saved as an artifact!
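
To illustrate the point about jobs overwriting each other's cache, here is a minimal sketch that gives the pip and npm jobs separate cache keys. The job names, the key names, the node image and the install commands are my own assumptions, not taken from your pipeline:

```yaml
# Hypothetical jobs: each one owns its cache key, so neither can
# overwrite what the other has stored.
pip-deps:
  image: python:3.8-slim-buster
  variables:
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  cache:
    key: pip-cache          # assumed key name
    paths:
      - .cache/pip
    policy: pull-push       # this job is allowed to refill the pip cache
  script:
    - pip install -r requirements.txt

npm-deps:
  image: node:lts
  cache:
    key: npm-cache          # assumed key name, separate from pip
    paths:
      - .npm
    policy: pull-push       # this job is allowed to refill the npm cache
  script:
    - npm ci --cache .npm --prefer-offline
```

Any other job that only consumes one of these caches can reuse the same key with policy: pull, so it never uploads a modified copy.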

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  XDG_CACHE_HOME: "$CI_PROJECT_DIR/.cache"

image: python:3.8-slim-buster

## These paths are cached between pipeline runs to save download time.
cache: &package_cache
  key:
    files:
      - poetry.lock
    prefix: poetry
  paths:
    - .cache
  policy: pull

stages:
  - Prepare
  - Static Analysis

before_script:
  - apt-get update &&
      apt-get install -qqy --no-install-recommends --no-install-suggests
        make
  - pip install poetry

dependencies:
  stage: Prepare
  cache:
    <<: *package_cache
    policy: pull-push
  script:
    # Create the venv as .venv so it matches the artifacts paths below
    - python -m venv --copies .venv
    - source .venv/bin/activate
    - python -m pip install --upgrade pip
    - poetry export --without-hashes -n |
        tee requirements.txt
    - poetry export --dev --without-hashes -n |
        tee requirements-dev.txt |
        pip install -r /dev/stdin
    # pip cache filled with all package downloads
    - poetry build
  artifacts:
    paths:
      - requirements.txt
      - requirements-dev.txt
      - dist/
      - .venv/
    exclude:
      - .venv/**/__pycache__/*
    when: on_success
    # venv gets extracted for each job and is immutable

# The uncompromising Python code formatter
black:
  stage: Static Analysis
  script:
    - make lint-test

# Simple and scalable tests for Python code
pytest:
  stage: Static Analysis
  script:
    - poetry install
    - make test
  artifacts:
    when: on_success
    reports:
      # On GitLab 15.0 and later, use coverage_report (coverage_format: cobertura) instead
      cobertura: coverage.xml
      junit: report.xml
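
For completeness, this is roughly how a later job could reuse the venv from the artifact instead of reinstalling everything. It is only a sketch: the job name and the pip check command are made up for illustration, and it assumes the venv lives in .venv as declared in the artifacts paths of the dependencies job:

```yaml
# Hypothetical follow-up job; the name and the check command are illustrative.
check-venv:
  stage: Static Analysis
  needs:
    - job: dependencies
      artifacts: true       # download .venv/, dist/ and the requirements files
  # No cache section here: the job inherits the global cache with policy: pull,
  # so it can read the pip cache but never overwrites it.
  script:
    - source .venv/bin/activate
    - python -m pip check   # verify the extracted venv has consistent dependencies
```

Because the job declares needs, it can start as soon as dependencies finishes, and it only reads the cache.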

I found the webpage that gave me the idea again: “GitLab CI: Cache and Artifacts explained by example”

I hope that helps
