Hi,
We’ve been using GitLab.com (not self-managed) for the last few weeks. We want to use the shared runners to execute our CI, and I managed to set up a config with our existing test suite.
The main stage passes, but it takes about 22 minutes compared to 10-12 minutes on our legacy CI, for one main reason: PyPI and npm packages are downloaded and re-installed/compiled on every pipeline, which takes several minutes (certainly most of the 10 extra minutes, maybe all of them).
Our .gitlab-ci.yml looks like this right now. Sorry for the long paste, but I prefer to give as much context as possible:
image: "python:3.7-alpine"
variables:
[... some db/tokens variables...]
# Set pip's cache inside the project directory since we can only cache local items
PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
stages:
- test
- coverage
cache:
key: pip-and-npm-global-cache
paths:
- $CI_PROJECT_DIR/.cache/pip
- $CI_PROJECT_DIR/.cache/npm
before_script:
- mkdir -p $CI_PROJECT_DIR/.cache/pip $CI_PROJECT_DIR/.cache/npm
django_tests:
stage: test
services:
- postgres:9.6-alpine
- mongo:3.6-xenial
cache:
key: "coverage-$CI_COMMIT_REF_SLUG"
paths:
- .coverage
script:
# Various packages required to run dependencies below
- apk add [...]
- pip install pip --upgrade
- pip install -r requirements.txt
- coverage ... # execute tests here
js_tests:
stage: test
image: "node:alpine"
cache:
key: "$CI_COMMIT_REF_SLUG"
paths:
- node_modules/
script:
- npm ci --cache $CI_PROJECT_DIR/.cache/npm --prefer-offline
- npm install && npm run build
- npm run test
coverage:
stage: coverage
script:
- pip install coverage==4.5.3 django_coverage_plugin==1.6.0
- coverage report -i -m [...]
First, the “test” stage always re-installs the packages, even between two builds on the same branch with no new commits. That stage still passes (as mentioned above), but the coverage one doesn’t, because the pip requirements installed earlier are no longer available. I see the same problem with a local runner on my machine and with GitLab.com’s shared runners.
I added a few ls calls to the scripts, and it seems $CI_PROJECT_DIR/.cache is always empty at the start of a job (both django_tests and coverage). Did I miss something? Do any of my cache declarations overlap one another?
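For reference, the check was nothing fancy, just an extra line at the top of a job’s script, something like:

django_tests:
  script:
    - ls -la $CI_PROJECT_DIR/.cache  # comes back empty at the start of every job
    - apk add [...]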
Allow me to bump my own question: has anyone encountered this problem before? Does anyone have a working cache config example for pip?
Same thing here. I have searched every corner of the internet for a working example. I have a successful setup with Maven, but Python needs a different template, and I have tried everything within my knowledge without any success.
Hi @mlorant and @luciojb,
the Python example from the official documentation has worked pretty well for me in the past:
https://docs.gitlab.com/ee/ci/caching/#caching-python-dependencies
My understanding of the cache configuration is that job-level directives override global ones, so each of your jobs seems to be using a different cache, and only the last job is using the global one. If all jobs are supposed to share the same cache, try using only a global cache configuration and see if that helps.
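As a sketch (untested, based on the config you posted), that would mean dropping the job-level cache: blocks and keeping only something like this at the top level:

# Single top-level cache shared by every job; no job-level
# `cache:` blocks that would override it.
cache:
  key: pip-and-npm-global-cache
  paths:
    - .cache/pip
    - .cache/npm
    - .coverage       # was in django_tests' job-level cache
    - node_modules/   # was in js_tests' job-level cache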
One thing I would like to point out about the Python example above is that it caches the venv directory it installs packages into, in addition to the pip package cache. This should prevent jobs from re-installing the same packages every time.
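From memory, the linked example boils down to roughly this shape (pin the image to your Python version; the test command is a placeholder):

image: python:3.7
variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
cache:
  paths:
    - .cache/pip
    - venv/
before_script:
  - python -V
  - pip install virtualenv
  - virtualenv venv
  - source venv/bin/activate

test:
  script:
    - pip install -r requirements.txt
    - coverage run [...]  # your test command here

Because venv/ is cached alongside .cache/pip, the installed packages themselves survive between jobs, not just the downloaded wheels.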
Kind regards,
Alexander
Hi @mlorant,
I used to have the very same problem up until a couple of weeks ago, when I finally found a webpage that sets it up the right way.
First of all, you have to keep in mind that when a job creates a cache, even with the broadest matching key, it only stays local to the runner it was created on; with a bit of luck, a future job might land on the same runner and pull it. Also watch out: the default policy is pull-push, so every job can rewrite the cache and potentially wipe content you want to preserve. In the case of “pip-and-npm-global-cache”, the npm and Python jobs are trashing each other’s content.

The most important thing I did was set up a MinIO container to provide a shared cache. Next, I changed the policy to pull and had the first job take care of filling the pip cache and installing the venv. The whole thing works best, though, when the venv is saved as an artifact!
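For context, the shared cache lives in the runner configuration (config.toml), not in .gitlab-ci.yml. A minimal sketch, assuming a MinIO instance at minio.example.com:9000 and a bucket named runner-cache (all placeholders):

[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "minio.example.com:9000"
    AccessKey = "MINIO_ACCESS_KEY"   # placeholder
    SecretKey = "MINIO_SECRET_KEY"   # placeholder
    BucketName = "runner-cache"
    Insecure = true                  # MinIO over plain HTTP; drop if using TLS

With that in place, my .gitlab-ci.yml looks like this: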
variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  XDG_CACHE_HOME: "$CI_PROJECT_DIR/.cache"

image: python:3.8-slim-buster

## These paths are cached between test runs, saving the download time.
cache: &package_cache
  key:
    files:
      - poetry.lock
    prefix: poetry
  paths:
    - .cache
  policy: pull

stages:
  - Prepare
  - Static Analysis

before_script:
  - apt-get update &&
    apt-get install -qqy --no-install-recommends --no-install-suggests
    make
  - pip install poetry

dependencies:
  stage: Prepare
  cache:
    <<: *package_cache
    policy: pull-push
  script:
    - python -m venv --copies .venv
    - source .venv/bin/activate
    - python -m pip install --upgrade pip
    - poetry export --without-hashes -n |
      tee requirements.txt
    - poetry export --dev --without-hashes -n |
      tee requirements-dev.txt |
      pip install -r /dev/stdin
    # pip cache is now filled with all package downloads
    - poetry build
  artifacts:
    paths:
      - requirements.txt
      - requirements-dev.txt
      - dist/
      - .venv/
    exclude:
      - .venv/**/__pycache__/*
    when: on_success
    # .venv gets extracted for each job and is treated as immutable

# The uncompromising Python code formatter
black:
  stage: Static Analysis
  script:
    - make lint-test

# Simple and scalable tests for Python code
pytest:
  stage: Static Analysis
  script:
    - poetry install
    - make test
  artifacts:
    when: on_success
    reports:
      cobertura: coverage.xml
      junit: report.xml
I found the webpage that gave me the idea again: GitLab CI: Cache and Artifacts explained by example
I hope that helps