Using a cache for pip/npm dependencies in GitLab CI

Hi,

We’ve been using GitLab.com (not self-managed) for the last few weeks. We want to use the shared runners to execute our CI, and I succeeded in setting up a config that runs our existing test suite.

The main stage passes, but it takes about 22 minutes compared to 10-12 minutes on our legacy CI, for one main reason: PyPI and npm packages are downloaded and re-installed/compiled on every pipeline, which takes minutes (certainly most of the 10 extra minutes, maybe all of them).

Our .gitlab-ci.yml currently looks like this. Sorry for the long paste, but I’d rather give as much context as possible:

image: "python:3.7-alpine"

variables:
  [... some db/tokens variables...]
  # Set pip's cache inside the project directory since we can only cache local items
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

stages:
  - test
  - coverage

cache:
  key: pip-and-npm-global-cache
  paths:
    - $CI_PROJECT_DIR/.cache/pip
    - $CI_PROJECT_DIR/.cache/npm

before_script:
  - mkdir -p $CI_PROJECT_DIR/.cache/pip $CI_PROJECT_DIR/.cache/npm

django_tests:
  stage: test

  services:
    - postgres:9.6-alpine
    - mongo:3.6-xenial

  cache:
    key: "coverage-$CI_COMMIT_REF_SLUG"
    paths:
      - .coverage

  script:
    # Various packages required to run dependencies below
    - apk add [...]
    - pip install pip --upgrade
    - pip install -r requirements.txt
    - coverage ...  # execute tests here

js_tests:
  stage: test
  image: "node:alpine"
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/

  script:
    - npm ci --cache $CI_PROJECT_DIR/.cache/npm --prefer-offline
    - npm run build  # note: `npm ci` above already installed dependencies
    - npm run test

coverage:
  stage: coverage
  script:
    - pip install coverage==4.5.3 django_coverage_plugin==1.6.0
    - coverage report -i -m [...]

First, the “test” stage always re-installs the packages, even between two builds on the same branch with no new commits. That stage still passes (as said before), but the coverage one doesn’t, because some of the pip requirements installed earlier are no longer available.
I have the same problem with a local runner on my machine and with GitLab.com’s shared runners.

I tried adding some `ls` calls in the scripts, and it seems $CI_PROJECT_DIR/.cache is always empty at the start of a job (both django_tests and coverage). Did I miss something? Does one of my cache declarations overlap with another?


Allow me to resurrect my question. Has anyone encountered this problem before? Does anyone have a working config example for pip?


Same thing here. I’ve searched every corner of the internet for a working example. I have a successful setup with Maven, but Python uses a different template, and I’ve tried everything within my knowledge without any success.

Hi @mlorant and @luciojb,

the Python example from the official documentation has worked pretty well for me in the past:
https://docs.gitlab.com/ee/ci/caching/#caching-python-dependencies

My understanding of the cache configuration is that a job-level `cache:` block completely replaces the global one, so each of your jobs is using a different cache, and only the last job (coverage, which declares no cache of its own) is using the global one. If all jobs are supposed to share the same cache, try using only a global cache configuration and see if that helps.
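As a minimal sketch (the key name and paths are illustrative, adapt them to your project), a single shared cache could look like this, with the per-job `cache:` blocks removed so nothing overrides it:

```yaml
# One global cache shared by every job in the pipeline. Since no job
# declares its own `cache:` block, this configuration is never replaced.
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - .cache/pip
    - .cache/npm
    - node_modules/
```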

One thing I would like to point out about the Python example above is that it caches the venv directory it installs packages to in addition to the Pip package cache. This should prevent jobs from re-installing the same packages every time.

Kind regards,
Alexander
