Environments are sometimes not stopped after merging/closing the merge request

Hi Community,

we are experiencing a strange problem with environments in our self-hosted GitLab where the on_stop action is not always triggered after closing/merging a merge request.

We have been banging our heads against this for nearly a month now and are still searching for the needle in the haystack, so we’d really appreciate some help or small hints on what to look for, and where, to analyze this further.

E.g.: In which log can we find information about the jobs that should trigger the on_stop action after a merge request is merged or closed, and what would such a log entry look like?

Problem description and backstory

Some of the development teams using our GitLab platform make heavy use of automatic dependency management with Renovate, with its automerge feature enabled and pipelines for merge requests. Every time Renovate runs, it checks the project's dependencies for updates and creates merge requests that update the dependency versions according to the configuration provided.

Each of these merge requests triggers the team's merge request pipeline, whose definition contains a total of 9 environments. Only 2 of these environments are relevant in the context of a merge request.

For exactly these 2 environments the on_stop action is not always executed if a merge request is closed/merged. Sometimes it works and sometimes it doesn’t.

We have not yet found a cause, or even a hint, in the logs. All we can see from the pipeline and the merge request is that the on_stop action was not executed. We also cannot reproduce this reliably; it only happens intermittently.
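To at least quantify how often this happens, environment states could be cross-checked against merge request states. The following is only a minimal sketch: it assumes the two lists were fetched from the GitLab REST API (`GET /projects/:id/environments` and `GET /projects/:id/merge_requests`), and that `CI_COMMIT_REF_NAME` resolves to the source branch name in merge request pipelines. The function name `find_leftover_environments` is hypothetical.

```python
def find_leftover_environments(environments, merge_requests, project_id):
    """Return review environments that are still 'available' although every
    merge request for their source branch is already merged or closed.

    Assumption: environment names follow the pattern used in our pipeline,
    review/<source branch>-<project id>.
    """
    # A branch with an open merge request may legitimately keep its environment.
    open_branches = {
        mr["source_branch"] for mr in merge_requests if mr["state"] == "opened"
    }
    finished_branches = {
        mr["source_branch"]
        for mr in merge_requests
        if mr["state"] in ("merged", "closed")
    } - open_branches

    expected_names = {
        f"review/{branch}-{project_id}" for branch in finished_branches
    }
    return sorted(
        env["name"]
        for env in environments
        if env["state"] == "available" and env["name"] in expected_names
    )
```

Running this periodically and comparing the results over time might show whether the affected merge requests have anything in common.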

The projects where we see this behavior are rather old (created in 2020 or earlier), and we saw the first occurrence of this problem after we updated GitLab to 15.4.

The automerge feature of Renovate basically just sets the “merge when pipeline succeeds” flag, so we have excluded Renovate as the cause. Additionally, we have also seen this happen for merge requests that someone created manually within GitLab and then set to “merge when pipeline succeeds”.

According to the documentation, we also think the pipeline definition is correct. I could share the full pipeline definition here, but it is rather large (~10,000 lines). I did my best to reduce it to the relevant parts.

Stripped pipeline with environment definition
stages:
- post_image_analysis
- review_app_setup
- cleanup

workflow:
  rules:
  - if: "$CI_MERGE_REQUEST_ID"
  - if: $CI_COMMIT_TAG && $CI_COMMIT_REF_PROTECTED == "true"
  - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_COMMIT_REF_PROTECTED == "true"

deploy_k8_review:
  rules:
  - if: "$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH"
    when: never
  - if: "$CI_COMMIT_TAG"
    when: never
  - when: on_success
  needs:
  - job: deploy_k8_at
  environment:
    name: review/${CI_COMMIT_REF_NAME}-${CI_PROJECT_ID}
    url: http://${CI_ENVIRONMENT_SLUG}${DK8R_APPLICATION_URL_SUFFIX}
    on_stop: undeploy_k8_review
  stage: review_app_setup
  resource_group: review/${CI_COMMIT_REF_NAME}-${CI_PROJECT_ID}

undeploy_k8_review:
  rules:
  - if: "$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH"
    when: never
  - if: "$CI_COMMIT_TAG"
    when: never
  - when: manual
    allow_failure: true
  needs:
  - job: deploy_k8_review
    optional: true
  environment:
    name: review/${CI_COMMIT_REF_NAME}-${CI_PROJECT_ID}
    action: stop
  stage: cleanup

deploy_trivy_report:
  stage: post_image_analysis
  rules:
  - when: always
  allow_failure: true
  environment:
    name: trivy-report/$CI_COMMIT_REF_SLUG
    url: "$DYNAMIC_ENVIRONMENT_URL"
    on_stop: undeploy_trivy_report

undeploy_trivy_report:
  stage: cleanup
  rules:
  - when: manual
    allow_failure: true
  allow_failure: true
  needs:
  - job: deploy_trivy_report
  environment:
    name: trivy-report/$CI_COMMIT_REF_SLUG
    action: stop

kind regards
Markus
