How to use GitLab cache

Hi there.

I’m trying to figure out how to use the caching feature of GitLab pipelines.

The setup we are using is very simple.
We use the Docker executor.
The default image is just a node:alpine image.
Every stage needs to install some npm packages, so initially we started with yarn install in the before_script.

The file was something like this:

default:
  image: node:alpine

  before_script:
    - yarn install

stages:
  - audit
  - lint
  - test
  - build

audit:
  stage: audit
  script:
    - yarn audit --level moderate

lint:
  stage: lint
  script:
    - yarn lint

test:
  stage: test
  script:
    - yarn coverage

build:
  stage: build
  script:
    - yarn build

Then, once we saw that yarn install takes 4-5 minutes each time, we wanted to cache the result and not run it at every step.
yarn.lock stays the same, so the output of yarn install is the same as well.
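Conceptually, what I was aiming for is a global cache keyed on yarn.lock, so it only has to be rebuilt when the lockfile changes. Something like this sketch (just the idea, not one of my exact attempts):

default:
  image: node:alpine
  cache:
    key:
      files:
        - yarn.lock
    paths:
      - node_modules/
  before_script:
    - yarn install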

I tried every single cache combination I found on the internet.
None of them worked.

I tried caching node_modules, I tried using a .yarn directory as the cache, etc.
I tried untracked: true, I tried a global cache as well as adding it to every job/stage, etc.

For example, here is one of the attempts I made (I have plenty of variations, I won’t post all of them):

default:
  image: node:alpine

  before_script:
    - yarn install

stages:
  - audit
  - install-packages
  - lint
  - test
  - build

audit:
  stage: audit
  script:
    - yarn audit --level moderate

install-packages:
  stage: install-packages
  script:
    - yarn install
  cache:
    paths:
      - node_modules
    untracked: true
    policy: pull-push

lint:
  stage: lint
  script:
    - yarn lint
  cache:
    paths:
      - node_modules
    policy: pull

test:
  stage: test
  script:
    - yarn coverage
  cache:
    paths:
      - node_modules
    policy: pull

build:
  stage: build
  script:
    - yarn build
  cache:
    paths:
      - node_modules
    policy: pull

Every time, the result was more or less the same.
The install-packages job I added would end with a message like this:

Saving cache for successful job 00:20
Creating cache default...
node_modules: found 70214 matching files and directories 
untracked: found 59929 files                       
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally. 
Created cache
Job succeeded

but the very next stage would say:

Checking out bf6cbf54 as tech-debt/test-pipeline-fix...
Removing node_modules/
Skipping Git submodules setup
Restoring cache 00:00
Checking cache for default...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted. 
Successfully extracted cache
Executing "step_script" stage of the job script 00:01
Using docker image sha256:9d..4 for docker.ko...f3 ...
$ yarn lint
yarn run v1.22.5
$ eslint --ext .ts,.tsx ./src/
/bin/sh: eslint: not found
error Command failed with exit code 127.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
ERROR: Job failed: exit code 127

and would go on to fail because node_modules was not there.

My questions:

  • Where does that cache go exactly? Do we have to set something in the runner config?
  • Why do I see Removing node_modules/ in the next stage/job? Isn’t the whole point of the cache to add/mount that directory? Why is it removing it?

I looked at the docs and I cannot find any answer, unless I’m missing something.

On paper, it looks so simple.
Mark the cache directory path, push or pull, done.
But it’s not working, so I must be doing something wrong.
Any help would be appreciated.

Thank you.

So, a cache is for things like dependencies that you install for the pipeline. For example, you might need to install Node, so that your pipeline jobs can run npm.

What you need is to pass artifacts between the pipeline stages, e.g.:

install-packages:
  stage: install-packages
  script:
    - yarn install
  artifacts:
    paths:
      - node_modules
    expire_in: 2 weeks
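Jobs in later stages download artifacts from earlier stages automatically, so the later jobs don’t need anything special. If you want to be explicit about which artifacts a job pulls in, you can list the producing job under dependencies, e.g. (a sketch reusing your job names):

lint:
  stage: lint
  script:
    - yarn lint
  dependencies:
    - install-packages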

You will probably want to read about when artifacts are deleted and the related sections on how to keep artifacts from the latest pipeline.


The GitLab documentation specifically mentions caching Node.js modules with cache: in its examples:

https://docs.gitlab.com/ee/ci/caching/

We’ve tried several incarnations of this setup, but none of them seems able to restore the cache for the current or subsequent runs of the pipeline. Any advice on how to debug this would be welcome, for example: how can we inspect the cache after a pipeline completes, or at the beginning of a new one?

Hi,

With the Docker executor, the cache from jobs is stored in the /cache directory inside the container itself. Since the container is ephemeral, this is not persisted on the Docker host out of the box. You need to configure your GitLab Runner for a local cache. Make sure you have something like this in your config.toml (other options not included):

[[runners]]
  cache_dir = "/cache"
  [runners.docker]
    disable_cache = false
    cache_dir = ""
    volumes = ['/cache']

There is also an option to use a distributed cache.
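With the distributed cache, the runner uploads the cache to an object storage bucket instead of keeping it on the host. A minimal sketch for config.toml, assuming an S3-compatible bucket (the bucket name, region and credentials are placeholders):

[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "s3.amazonaws.com"
    AccessKey = "ACCESS_KEY"
    SecretKey = "SECRET_KEY"
    BucketName = "runner-cache"
    BucketLocation = "eu-west-1"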


That was the issue!
It works great now.

Thank you @balonik