GitLab CI rules:exists fails to see cache

I am trying to build a multi-step pipeline that avoids running jobs that are not needed on every run, speeding up the regular build in the process.

The build pipeline looks like:

  • build a data snapshot
  • install node packages
  • build code
  • run e2e-tests

I’m expecting that the prepare job should only run if the ci-build cache is not populated, and that the npm-install job should only run if package-lock.json changes or there is no node_modules cache. This would let me reset the node_modules and the generated data snapshot for the next build run just by clearing the runner’s caches from the project’s pipelines page.

What I’m seeing is that the initial tasks don’t run when the cache is empty. I can work around this by manually starting the pipeline after clearing the cache, which works but is obviously not the best solution.

The node_modules rule check in npm-install works as expected for code pushes and edits to the package-lock.json file, but it doesn’t rebuild the node modules if you clear the cache and then check in code. You need to manually start a build pipeline for it to rebuild the node_modules cache.

  rules:
    - changes:
      - package-lock.json
    - exists:
      - node_modules
      when: never

With the following rule, the prepare task always runs.

  rules:
    - exists:
      - ci-build
      when: never
    - when: always 

I am worried that I am using the rules in this build file incorrectly. Is there another way to accomplish the same thing? Any help would be appreciated!

The project is hosted on gitlab.com and uses a self-hosted runner, but I broke it out into the sample below for testing on the shared runners, to make sure no configuration differences were causing the issue.

----------------------------------------------------------- .gitlab-ci.yml ---------------------------------------------------------

variables:
  global_cache_key: ${CI_COMMIT_REF_SLUG}


default:
  image: alpine 
  after_script: &default_after
    - echo [.gitlab-ci.yml] saving root cache to local cache - in default after_script 
    - ls -lah
  before_script: &default_before
    - echo [.gitlab-ci.yml] saving local cache firebase to root cache - in default before_script 
    - export
    - ls -lah


stages:
  - prepare
  - install
  - build 
  - test


prepare:
  stage: prepare
  cache:
    - key: ${global_cache_key}
      paths:
        - ci-build
    - key: ${global_cache_key}-NODE_MODULES
      paths:
        - node_modules
  script: |
      #!/bin/bash
      echo "-------------------- install-firebase --------------------"
      if [  ! -d ci-build/firebase ];  then 
        echo [.gitlab-ci.yml] Installing Firestore - directory does not exist
        mkdir -p ci-build/firebase
        echo testing > ci-build/firebase/rules.txt
      else
        echo ![.gitlab-ci.yml] Using Firestore from cache
      fi

      echo "-------------------- create-extract --------------------"
      if [  ! -d ci-build/firebase/seed-data ];  then 
        echo [.gitlab-ci.yml] No seed data - creating data extract
        cd $CI_PROJECT_DIR/ci-build/firebase

        mkdir seed-data
        echo testing > seed-data/data-file.txt

        ls -lah ./seed-data/
      else
        echo ![.gitlab-ci.yml] Using seed-data from cache, directory exists
      fi
  rules:
    - exists:
      - ci-build
      when: never
    - when: always 


npm-install:
  stage: install
  cache:
    key: ${global_cache_key}-NODE_MODULES
    paths:
      - node_modules
  script:
    - mkdir -p node_modules
    - echo $(date) >> node_modules/test.txt
  rules:
    - changes:
      - package-lock.json
    - exists:
      - node_modules
      when: never


build-distribution:
  stage: build
  cache:
    - key: ${global_cache_key}
      paths:
        - ci-build
    - key: ${global_cache_key}-NODE_MODULES
      paths:
        - node_modules
  script: 
    - echo "-------------------- build-ember-distribution --------------------"
    - '[ ! -d ci-build/firebase/ ] && echo firebase missing && exit 1'
    - '[ ! -d ci-build/firebase/seed-data ] && echo seed-data missing && exit 1'
    - '[ ! -d node_modules ] && echo node_modules missing && exit 1'
    - mkdir -p ./dist/
    - echo "<h1>Home Page</h1>" > ./dist/index.html
  artifacts:
    expire_in: 1 week
    when: always
    paths:
      - dist


run-e2e-tests:
  stage: test
  cache:
    - key: ${global_cache_key}
      paths:
        - ci-build
    - key: ${global_cache_key}-NODE_MODULES
      paths:
        - node_modules
  script:
    - echo "-------------------- run-e2e-tests --------------------"
    - '[ ! -d ci-build/firebase/ ] && echo firebase missing && exit 1'
    - '[ ! -d ci-build/firebase/seed-data ] && echo seed-data missing && exit 1'
    - '[ ! -d node_modules ] && echo node_modules missing && exit 1'
    - '[ ! -d dist ] && echo dist missing && exit 1'
    - mkdir -p e2e-tests/cypress/screenshots
    - echo testing > e2e-tests/cypress/screenshots/results.txt
    - mkdir -p e2e-tests/cypress/videos
    - echo testing > e2e-tests/cypress/videos/results.txt
    - cat node_modules/test.txt
    - cat dist/index.html
  artifacts:
    expire_in: 1 week
    when: always
    paths:
      - e2e-tests/cypress/screenshots
      - e2e-tests/cypress/videos

I just let the cache do its job without interfering or trying to skip work when the cache is already there. In other words, if there’s a cache to populate, I’ve accepted that time per job as a general cost. That lets all the jobs be more independent and move around with more flexibility, the runners are more resilient, etc.

For example, if I killed your runner in between jobs with your setup, there’s enough inter-job dependency (unless I’m confused about the setup you’ve described, which might be true) that I’d probably take down the whole pipeline. Whereas, by letting the jobs handle their cache themselves, my jobs will continue on a new/another runner host without interruption.
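
To illustrate (a minimal sketch with hypothetical job names, not your exact setup): each job can share the same cache and simply declare whether it writes it or only reads it, so a retried job on a fresh runner repopulates what it needs by itself.

npm-install:
  stage: install
  cache:
    key: ${CI_COMMIT_REF_SLUG}-NODE_MODULES
    paths:
      - node_modules
    policy: pull-push   # restore the cache if present, upload it again afterwards
  script:
    - npm ci            # safe to re-run; it simply repopulates node_modules

run-e2e-tests:
  stage: test
  cache:
    key: ${CI_COMMIT_REF_SLUG}-NODE_MODULES
    paths:
      - node_modules
    policy: pull        # read-only, so a failed or retried job never clobbers the cache
  script:
    - npm run e2e       # hypothetical script name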

I hope that helps, good luck.

Hi @alan.flaherty
I got a little lost in your explanation, so I apologize if some of this info is unrelated.

According to the GitLab documentation, the cache is not guaranteed. It might be extracted or it might not, and your jobs should not rely on the cache working. You have to use artifacts as the guaranteed method.

You also cannot use rules:exists with dynamically created directories because, as mentioned in the docs, exists can only be used for paths that exist in the repository. I assume the ci-build directory is not in the repository and is only created by the prepare job while the pipeline is running. That’s why the prepare job always runs and also why your node_modules rule is not working in the npm-install job.
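
To make that concrete, here is a minimal sketch (assuming the check simply moves into the job script): since node_modules never exists in the repository, the “is the cache populated?” decision has to happen at runtime rather than in rules.

npm-install:
  stage: install
  cache:
    key: ${CI_COMMIT_REF_SLUG}-NODE_MODULES
    paths:
      - node_modules
  rules:
    - changes:
        - package-lock.json   # a repository file, so rules can evaluate it
    - when: always            # otherwise run anyway and let the script decide
  script:
    - |
      if [ ! -d node_modules ]; then
        echo "node_modules cache is empty - installing"
        npm ci
      else
        echo "node_modules restored from cache - nothing to do"
      fi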

EDIT: if the prepare or npm-install jobs take a long time to run and you are not changing them often, what I usually do is prepare a custom container image with the dependencies installed and use that image for the other jobs. This way you can just build another container image instead of flushing the cache.
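
For illustration (the image name and script are hypothetical), the heavier jobs would then just point at that prebuilt image and skip the install step entirely:

run-e2e-tests:
  stage: test
  image: registry.gitlab.com/my-group/my-project/ci-deps:latest   # hypothetical prebuilt image
  script:
    - npm run e2e   # dependencies ship inside the image, so no npm-install job is needed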

Apologies for the delay in replying, there was a lot to reconsider!

We are using a private runner because the testing process itself is fairly intensive, and everything is sequential for the same reason. There is just about enough memory to run one testing task at a time; if two check-ins on different branches happened within a few minutes of each other and the test tasks ran in parallel, the task on one of the branches would most likely fail with memory errors.

But I do take your point about killing a runner in the middle of populating the cache: the build should be designed to accommodate a failure like that. I think that by merging the tasks, populating the cache in the before_script and writing “check” files during the process, I should be able to get what I was looking for out of the build system. When I was initially writing the CI script it seemed natural to separate these parts into individual jobs, but in hindsight that may have been a mistake.

In short, what I was originally trying to achieve was to cache all node_modules directories and rebuild them when the associated package-lock.json changes or the cache has been deleted. I think that by writing the package-lock.json checksum to a file in the cache after each npm install I should be able to make it work: compare the stored checksum against the current file and call npm ci if they differ. Finally, adding a check file to the cache once the process has completed should prevent cache corruption when a runner is aborted mid-task.
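
A rough sketch of that idea (the file names are placeholders, not a tested implementation): the before_script compares a stored checksum of package-lock.json against the current one and re-runs npm ci when they differ or when the completion marker is missing.

before_script:
  - |
    CURRENT=$(sha256sum package-lock.json | cut -d' ' -f1)
    STORED=$(cat node_modules/.package-lock.sha256 2>/dev/null || true)
    if [ ! -f node_modules/.cache-complete ] || [ "$STORED" != "$CURRENT" ]; then
      echo "cache missing, incomplete or package-lock.json changed - running npm ci"
      npm ci
      echo "$CURRENT" > node_modules/.package-lock.sha256
      touch node_modules/.cache-complete   # written last, so an aborted job leaves no marker
    else
      echo "node_modules cache is up to date"
    fi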

The reasoning for all this is that a full CI rebuild can take 20-30 minutes right now, and that is likely to increase as more tests are added. However, only about 5 minutes of that is the tests themselves. By only running npm install when the relevant package-lock.json changes, it should be possible to avoid the majority of the build process and get the test results for that check-in back to the developer within a few minutes.

Thanks for your help on this, it is much appreciated!

Hi @balonik,

Apologies, the question was a little terse, but yes, it did all hinge on incorrect use of rules, which I managed to get to the bottom of a day or so after posting the question. You can only use rules against files in the repository, and they must be in place at the start of the entire pipeline unless you use child pipelines.

There was also a lot of conflicting advice on the internet about using artifacts instead of the cache. In my case I wanted the data persisted, not just passed between stages, and the cache should always be there since it lives on our own gitlab-runner, so I felt the cache was the better solution.

I have a custom Docker image for the build, but it does not contain the dependencies; they change reasonably regularly, too regularly to rebuild and re-deploy a Docker image every time it happens.

I think the solution I mentioned in the previous post (maintaining a checksum of each package-lock.json in the cache, merging the prepare and npm-install tasks into the before_script of the testing task, and using a check file to indicate that the initial cache was written successfully) should meet the original requirements without the rules and the additional jobs.

Thank you for your assistance, it is much appreciated!

If you do wish to separate them, you would want different stages. Multiple jobs in a stage run in parallel and cannot run sequentially within that stage; stages run sequentially, not in parallel.
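
A tiny sketch of that relationship (job names are only illustrative):

stages:
  - install
  - test

npm-install:
  stage: install          # runs on its own in the first stage
  script:
    - npm ci

lint:
  stage: test             # these two start together once install has finished
  script:
    - npm run lint

e2e:
  stage: test
  script:
    - npm run e2e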

You can then use a combination of caching and artifacts to prime jobs and pass along outputs. For example, it’s reasonable to make node_modules an artifact between jobs (and have it cached by the job that “builds” the initial artifact for the pipeline).
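
Something along these lines (a sketch, not your exact jobs): the install job caches node_modules for future pipelines and also exposes it as an artifact, so later jobs in the same pipeline are guaranteed to receive it.

npm-install:
  stage: install
  cache:
    key: ${CI_COMMIT_REF_SLUG}-NODE_MODULES
    paths:
      - node_modules
  script:
    - npm ci
  artifacts:
    expire_in: 1 hour     # only needs to outlive the pipeline
    paths:
      - node_modules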

rules is a bit more complicated than the original facility for this; however, you should be able to accomplish what you are looking for. I would recommend keeping it simple and building it out iteratively if you run into issues.

It sounds like you are “doing too much”, and “making it more complicated than it needs to be” - go back to “keeping it simple”. Let GitLab manage the cache. Do less work. Maybe you do need the if/else conditionals, but I would first challenge myself to find a way without the complication.

Also, for caching, you might be clobbering your cache, and you might want to ensure the cache keys are more unique (this can be challenging to get right, but I tend to start with “more unique” and make the key “more generic” if the cache is not being used enough). Per-branch caches, falling back to master, are a good default.
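
For example (a sketch of the per-branch-with-fallback pattern, assuming your default branch is master): GitLab’s CACHE_FALLBACK_KEY variable lets a branch that has no cache of its own fall back to the master cache.

variables:
  CACHE_FALLBACK_KEY: master        # used when no cache exists for the current key

cache:
  key: ${CI_COMMIT_REF_SLUG}        # one cache per branch
  paths:
    - node_modules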