Job with when: always gets skipped

I am trying to run a job that uploads rspec test results to Datadog after the tests either pass or fail. The tests themselves are in one job and the upload task is in another job. The tests are in the stage “test” and the upload is in the stage “monitor,” which takes place right after the test stage.
I have the job to upload the test reports set with when: always, yet the job gets skipped if the unit tests fail.
I can work around this somewhat by setting allow_failure: true on the unit tests, but I want the pipeline to fail when the tests fail while still uploading the test reports. Is this possible?

I am using GitLab.com.

Here is my configuration:

include:
  - project: templates/kubernetes
    ref: master
    file: /.kube-api-version-checks.yaml
  - local: .choose-runner.yaml
    ref: master
.run_specs_script: &run_specs_script |
  ./kubernetes/integration/specs/run-specs.sh $CI_COMMIT_SHA $TEST_NAMESPACE $ECR_BASE_URL/test/$IMAGE_NAME $PROCESSES ${UNIT_TEST_INSTANCE_TYPE:-c5d.12xlarge}

.base_unit_tests:
  image: XXX
  stage: test
  coverage: '/TOTAL\sCOVERAGE:\s\d+\.\d+%/'
  variables:
    GIT_DEPTH: 1
  script:
    - *run_specs_script
  after_script:
    - kubectl delete ns $TEST_NAMESPACE
  artifacts:
    when: always
    reports:
      junit: tmp/*.xml
    paths:
      - tmp/*.xml
      - artifact.tar.gz

unit_tests:
  allow_failure: true
  extends:
    - .base_unit_tests
    - .integration

unit_tests_dependency_update:
  extends:
    - .base_unit_tests
    - .low-priority

unit_tests_dependencies_next:
  image: XXX
  stage: test
  allow_failure: true
  except:
    - web
    - triggers
  tags:
    - integration-green-kube-runner
  only:
    refs:
      - merge_requests
    variables:
      - $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME =~ /^hint\/upgrade/
  variables:
    GIT_DEPTH: 1
    DEPENDENCIES_NEXT: 1
    IMAGE_NAME: next
  script:
    - *run_specs_script
  after_script:
    - kubectl delete ns $TEST_NAMESPACE
  artifacts:
    when: always
    reports:
      junit: tmp/*.xml
    paths:
      - tmp/*.xml
      - artifact.tar.gz

unit_tests_datadog:
  extends:
    - .integration
  stage: monitor
  image: node
  variables:
    DD_API_KEY: XXX
  before_script:
    - npm install -g @datadog/datadog-ci
  script:
    - DD_ENV=ci DATADOG_API_KEY="$DD_API_KEY" DATADOG_SITE=datadoghq.com datadog-ci junit upload --service <service> ./tmp
  dependencies:
    - unit_tests
  when: always

Hi @Nicholas.Calaway,

interesting challenge :slight_smile: Let’s break down the requirements from your description and configuration:

  • unit_tests exits with status 0 (success) or non-zero (failure), and you always want to store the report data.
  • unit_tests_datadog always runs and needs access to the report data.

For simplicity, I am using unit_tests and upload in my tests below.

Parent-child pipelines for async upload?

I was thinking about parent-child pipelines, though I am not sure that really works for “always trigger the upload”.
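
For reference, a minimal parent-child setup would hand the upload off to a child pipeline, roughly like the sketch below (the child config path ci/upload-child.yml is just a placeholder), though on its own this still does not solve the “run even after a failure” requirement:

trigger_upload:
  stage: monitor
  trigger:
    include: ci/upload-child.yml # placeholder path for the child pipeline config
    strategy: depend # the parent waits for the child pipeline and mirrors its status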

While trying to implement the steps, I thought about using caches instead of artifacts. You can keep a cache even when the job fails by setting when: always on the cache. I remembered this because I created a Pipeline Efficiency workshop a while ago :slight_smile:

More in the Developer Evangelism handbook:

Always cache job reports to upload in always-run later jobs

I’m suggesting the following steps:

  1. Introduce a cache with the branch name as key.
cache:
  key: $CI_COMMIT_REF_SLUG  
  2. Define global variables that work as parameters for the jobs
variables:
  STATUS: 1 # control the `unit_tests`  job exit status. 0..ok, 1..failure
  CACHE_FILE: upload/date.data 
  3. Define the unit_tests job to use the cache, with upload/ as the path. Additionally, specify that the cache is always saved, no matter the job status (a sketch mapping this to your .base_unit_tests job follows after these steps).

The script section then purges earlier data, creates the upload directory, and simulates data creation by generating a file. The last step calls exit $STATUS based on the CI/CD variable value.

unit_tests:
  stage: build
  cache:
    paths:
      - upload/
    when: always # Keep the cache even when the job fails,  https://docs.gitlab.com/ee/ci/yaml/#cachewhen   
  script: 
    - rm -f $CACHE_FILE
    - mkdir -p upload && echo "`date +%s`" > $CACHE_FILE
    - echo "Will return success or failure based on $STATUS"
    - exit $STATUS
  4. Create the upload job, which always runs, in a stage after build (test, for example). This is the same as in your question.

The upload job also uses the upload/ directory from the cache. Its script section uses an inline Bash condition to check whether the cached data is available.

  • Yes? Upload data, and delete the data from the cache for the next run.
  • No? That must not happen; it indicates a pipeline error, i.e. a configuration problem where unit_tests did not populate the cache. Therefore the job uses a hard exit 1 to signal the error.
upload:
  stage: test
  when: always # Always run the job, no matter if previous jobs/stages failed 
  cache:
    paths:
      - upload/
  script:
    - |
      if [ -f $CACHE_FILE ]; then
        echo "Cache exists, starting upload"
        rm -f $CACHE_FILE
        echo "Upload finished, removed data from cache"
      else
        echo "Cache empty. Pipeline problem?"
        exit 1
      fi
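
In your pipeline, the equivalent change in .base_unit_tests is to add the cache and copy the generated JUnit XML reports into the cached directory before the job finishes, for example in after_script, which also runs when the rspec run fails. A sketch, assuming the reports land in tmp/ as in your configuration:

.base_unit_tests:
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - upload/
    when: always # save the cache even when the job fails
  after_script:
    - mkdir -p upload/tmp && cp tmp/*.xml upload/tmp/ || true # copy the reports into the cache
    - kubectl delete ns $TEST_NAMESPACE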

Tests

Unit tests OK

(screenshots: unit_tests job and upload job)

Unit tests FAIL

Pipeline status is FAIL too.

(screenshots: unit_tests job and upload job)

The MR shows the CI/CD results.

Break the pipeline

Don’t generate any cache data: comment out the script step in unit_tests.

Results in a broken pipeline

upload job is not happy.


Tips

The Pipeline Editor lets you quickly test the scenario. I’ve made the exit status for unit_tests a CI/CD variable, so you can control the pipeline state directly from the editor. :slight_smile:

Full example

Located at .gitlab-ci.yml · 07a8ad83bec95c241b62ee6b9f2547ba1a4ec120 · Michael Friedrich / ci-cd-playground · GitLab

# Problem: https://forum.gitlab.com/t/job-with-when-always-gets-skipped/64390
#
# - Run `unit_tests` job which generates reports for later upload
# - `unit_tests` job can return success or failure
# - `upload` job needs access to generated report to upload 
#
# Solution:
# 
# - Introduce cache based on the branch as key 
# - Generate cache data into upload/ in `unit_tests`
# - Always run `upload` job and check if cache data exists, to start upload 

variables:
  STATUS: 1 # control the `unit_tests`  job exit status. 0..ok, 1..failure
  CACHE_FILE: upload/date.data 

cache:
  key: $CI_COMMIT_REF_SLUG  

unit_tests:
  stage: build
  cache:
    paths:
      - upload/
    when: always # Keep the cache even when the job fails,  https://docs.gitlab.com/ee/ci/yaml/#cachewhen   
  script: 
    - rm -f $CACHE_FILE
    - mkdir -p upload && echo "`date +%s`" > $CACHE_FILE
    - echo "Will return success or failure based on $STATUS"
    - exit $STATUS

upload:
  stage: test
  when: always # Always run the job, no matter if previous jobs/stages failed 
  cache:
    paths:
      - upload/
  script:
    - |
      if [ -f $CACHE_FILE ]; then
        echo "Cache exists, starting upload"
        rm -f $CACHE_FILE
        echo "Upload finished, removed data from cache"
      else
        echo "Cache empty. Pipeline problem?"
        exit 1
      fi
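
Applied back to your configuration, the Datadog upload job would then consume the cache instead of job artifacts, so the dependencies keyword is no longer needed. A rough sketch, reusing the datadog-ci command from your original job and assuming unit_tests copies its JUnit XML files into upload/tmp/ as sketched above:

unit_tests_datadog:
  extends:
    - .integration
  stage: monitor
  image: node
  when: always # always run, even if the test stage failed
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - upload/
  variables:
    DD_API_KEY: XXX
  before_script:
    - npm install -g @datadog/datadog-ci
  script:
    - DD_ENV=ci DATADOG_API_KEY="$DD_API_KEY" DATADOG_SITE=datadoghq.com datadog-ci junit upload --service <service> ./upload/tmp
    - rm -rf upload/tmp # remove the uploaded reports from the cache for the next run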

Hope the detailed example helps you implement this in your pipeline workflow :slight_smile:

In case you are wondering: I don’t think you can solve this with job artifacts; caches are a must for my solution, and they reduce complexity and blockers.

Cheers,
Michael