Failed to Stop Pipeline, Draining All Compute Time

Problem to solve

I could not stop the pipeline even though the job had already failed, and the job duration kept increasing.

Last night, I saw that my job had failed, but the three-dot animation at the bottom of the job log kept moving and the duration kept increasing.

So I tried to cancel the pipeline, and the pipeline status changed to "canceling".

I could not do anything about the job.

Today I noticed the job duration was 203 minutes, even though my project timeout was 60 minutes and no job-level timeout was set.

Steps to reproduce

Could not reproduce

Configuration

My .gitlab-ci.yml looked like this:

variables:
  DOCKER_HOST: tcp://docker:2375
  DOCKER_TLS_CERTDIR: ""
  DOCKER_DRIVER: overlay2
  DOCKER_BUILDKIT: 0
  BUILDKIT_PROGRESS: plain

before_script:
  - env
  - |
    echo "Starting pipeline for branch: ${CI_COMMIT_REF_NAME}"
    prefix="dev-"
    case "$CI_COMMIT_REF_NAME" in
      "testing") prefix="test-" ;;
      "uat") prefix="stg-" ;;
      "staging") prefix="stg-" ;;
      "live") prefix="prod-" ;;
      "production") prefix="prod-" ;;
    esac
    export image=registry.gitlab.com/keypair1/alpm:${prefix}frontend-${CI_COMMIT_SHORT_SHA}
  - mkdir -p .docker
  - ': "${DOCKER_CONFIG:?DOCKER_CONFIG is required}"'
  - cp "$DOCKER_CONFIG" ".docker/config.json"
  - export DOCKER_CONFIG=.docker


build:
  stage: build
  image: docker:24.0.5
  services:
    - name: docker:24.0.5-dind
      alias: docker
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
    - if: $DEPLOY
      when: manual
    - if: $CI_COMMIT_REF_NAME =~ /^(staging|uat)$/
      when: on_success
      variables:
        VAR_FILENAME: STG_PUBLIC_VARS
    - if: $CI_COMMIT_REF_NAME =~ /^(production|live)$/
      when: on_success
      variables:
        VAR_FILENAME: PROD_PUBLIC_VARS
    - if: $CI_COMMIT_REF_NAME =~ /^(development|develop|testing)$/
      when: on_success
      variables:
        VAR_FILENAME: DEV_PUBLIC_VARS
  script:
  - |
    echo "Building docker image $image"
    # check if already exist
    exist=$(docker manifest inspect $image && echo 1 || echo "")
    echo "using ${VAR_FILENAME}"
    eval "varfile=\$$VAR_FILENAME"
    cp $varfile .env.next
    cat .env.next
    source .env.next
    if [ -n "$exist" ] && [ -z "$REBUILD" ]; then
      echo "Image $image already exist"
    else
      [ -n "$REBUILD" ] && echo "REBUILD is on"
      echo "Building: $image"
      docker build \
        --build-arg NEXT_PUBLIC_API_KEY=$NEXT_PUBLIC_API_KEY \
        --build-arg NEXT_PUBLIC_CONTRACT_ADDRESS=$NEXT_PUBLIC_CONTRACT_ADDRESS \
        -t $image .
      echo "Pushing"
      docker push $image
    fi

The job failed because of an error in the docker build command.
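A possible guard, sketched under assumptions and not confirmed to prevent this particular hang: GitLab CI supports a `timeout` keyword both on individual jobs and under `default`, which sets a limit separate from the project-level timeout. The durations below are hypothetical. A job-level timeout can be longer than the project timeout but not longer than the runner's own maximum timeout, if one is configured. If GitLab failed to enforce the 60-minute project timeout in this case, a job-level limit may not be enforced either.

```yaml
# Hypothetical limits: choose values that comfortably exceed a normal run.
default:
  timeout: 1h            # applies to every job that does not set its own timeout

build:
  stage: build
  timeout: 30 minutes    # job-level override for the build job only
  # image, services, rules and script stay as in the config above
```

Setting the limit under `default` covers jobs added later without repeating the keyword, while the per-job value lets a known-short job fail earlier than the project-wide limit.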



Same issue here:

Ours ran for almost 6 hours. Also, while the job “failed”, CI also claims it passed. My only guess is that it somehow stalled after succeeding.

The job should take about 1 minute to run: it just polls an API and updates a static website a couple of times a day. It has run fine since then, but we now have only 5% of our pipeline minutes left 🙁