Problem to solve
Hey everyone,
We’ve recently run into a perplexing issue with our GitLab CI pipeline that we haven’t been able to resolve despite several rounds of troubleshooting. Here’s the situation:
We have a GitLab CI pipeline that ran smoothly for some time. Recently, however, it started failing in the buildAndTest stage (jobs buildAndTestBackend and testBackendWithLatest) with no apparent cause. We haven’t made any changes to the configuration file or the project settings that would explain this sudden breakdown.
While troubleshooting, we tried the following:
- Comparing Configurations: we compared the GitLab CI configuration of the last successful pipeline with the current one; they are identical.
- Downgrading Versions: suspecting a compatibility issue with a recent update of Keycloak (new minor release), Postgres (new minor release) or GitLab Runner (new major release), we downgraded each of them to the version used where the pipeline previously succeeded (for GitLab Runner, on our self-hosted runner on Kubernetes). The problem persists even with the older versions.
- Checking Environment Variables: we reviewed every environment variable used in the pipeline to make sure each one is set correctly and hasn’t been changed inadvertently.
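For completeness, this is roughly what pinning looks like in the backend job. Most images in our config use floating tags (node:lts, node:latest, postgres:latest, and possibly the Keycloak image behind Quay), and pnpm@latest-9 floats as well, so checking out an old commit still pulls whatever those tags resolve to today. A sketch, assuming Quay stands for quay.io/keycloak/keycloak; every version below is a placeholder, not the one from our last successful run (the exact images of that run, including their sha256 digests, are printed in its job log):

```yaml
# Sketch only: every version below is a placeholder, not taken from our last green run.
buildAndTestBackend:
  image: node:20.12.2                        # placeholder: exact Node version instead of node:lts
  services:
    - name: postgres:16.2                    # placeholder: exact tag (or postgres@sha256:<digest>) instead of latest
      alias: postgres
      variables:
        POSTGRES_USER: root
        POSTGRES_PASSWORD: root
    - name: quay.io/keycloak/keycloak:24.0.4 # assumption: the image behind "Quay", pinned to a placeholder tag
      alias: keycloak
      variables:
        DB_VENDOR: POSTGRES
        DB_ADDR: postgres
        DB_DATABASE: keycloak
        DB_USER: root
        DB_PASSWORD: root
        KEYCLOAK_ADMIN: admin
        KEYCLOAK_ADMIN_PASSWORD: admin
      command: ["start-dev"]
  before_script:
    - corepack enable
    - corepack prepare pnpm@9.1.0 --activate # placeholder: exact pnpm version instead of latest-9
    - pnpm config set store-dir .pnpm-store
  # stage, variables, script, coverage and artifacts stay as in the configuration below
```

Pinning by digest is stricter than pinning by tag, since a tag such as 16.2 can be re-pushed when the base image is rebuilt.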
Steps to reproduce
Honestly, we can’t think of any, since to our knowledge we didn’t change anything.
Configuration
gitlab-ci.yaml
variables:
  APP_IMAGE_NAME: ${ACR_USER}/xxx:${CI_COMMIT_REF_SLUG}

workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == 'dev'
    - if: $CI_COMMIT_BRANCH == 'master'

stages:
  - analyze
  - buildAndTest
  - package
  - deployment

eslint:
  stage: analyze
  image: node:lts
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - ls -l
    - cd Backend
    - ls -l
    - pnpm i
    - pnpm eslint:ci
  artifacts:
    when: always
    reports:
      codequality: Backend/gl-codequality.json

buildAndTestBackend:
  stage: buildAndTest
  image: node:lts
  services:
    - name: postgres:latest
      alias: postgres
      variables:
        POSTGRES_USER: root
        POSTGRES_PASSWORD: root
    - name: Quay
      alias: keycloak
      variables:
        DB_VENDOR: POSTGRES
        DB_ADDR: postgres
        DB_DATABASE: keycloak
        DB_USER: root
        DB_PASSWORD: root
        KEYCLOAK_ADMIN: admin
        KEYCLOAK_ADMIN_PASSWORD: admin
      command: ["start-dev"]
  variables:
    HOST: "0.0.0.0"
    PORT: 2700
    LOG_LEVEL: info
    DATABASE_URL: postgres://root:root@postgres/prisma
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - node --version
    - cd Backend
    - pnpm i
    - pnpm prisma:dbpush
    - pnpm test:server
    - pnpm build:server
  coverage: '/^Statements\s*:\s*([^%]+)/'
  artifacts:
    when: always
    reports:
      junit: Backend/test-results.xml
    paths:
      - Backend/dist/
    expire_in: 1 hour

testBackendWithLatest:
  stage: buildAndTest
  image: node:latest
  services:
    - name: postgres:latest
      alias: postgres
      variables:
        POSTGRES_USER: root
        POSTGRES_PASSWORD: root
    - name: Quay
      alias: keycloak
      variables:
        DB_VENDOR: POSTGRES
        DB_ADDR: postgres
        DB_DATABASE: keycloak
        DB_USER: root
        DB_PASSWORD: root
        KEYCLOAK_ADMIN: admin
        KEYCLOAK_ADMIN_PASSWORD: admin
      command: ["start-dev"]
  variables:
    HOST: "0.0.0.0"
    PORT: 2700
    LOG_LEVEL: info
    DATABASE_URL: postgres://root:root@postgres/prisma
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - node --version
    - cd Backend
    - pnpm i
    - pnpm prisma:dbpush
    - pnpm test:server
  coverage: '/^Statements\s*:\s*([^%]+)/'
  artifacts:
    when: always
    reports:
      junit: Backend/test-results.xml

buildFrontend:
  stage: buildAndTest
  image: node:lts
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - cd Backend/ClientApp
    - pnpm install
    - pnpm build
  artifacts:
    paths:
      - ClientApp/build
    expire_in: 1 hour
Versions
- GitLab Runner: 17.0.0 on the GitLab-hosted runners, 16.11.0 on our self-hosted runner; the last successful run used 16.11.0 (it still ran on a GitLab-hosted runner)
Summary and other interesting things
- The posted config worked for a month and has now stopped working.
- We haven’t changed anything in the config.
- On the self-hosted runner, the jobs fail differently and sometimes succeed (a separate topic).
- We know the CI fails because Keycloak returns a 403, but we have not made any changes to our files: even re-running the exact commit on which the pipeline succeeded three weeks ago now fails.
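To narrow down where the 403 comes from, a possible next step is to capture the service logs and call Keycloak directly before the tests run. A sketch for buildAndTestBackend, assuming the Quarkus-based Keycloak image (implied by start-dev) listening on its default HTTP port 8080 without an /auth path prefix, and a job image that ships curl (the full node images do). CI_DEBUG_SERVICES is GitLab’s variable for streaming service container logs into the job log; those logs may contain credentials, so it is best enabled only temporarily:

```yaml
# Sketch only: diagnostic additions for buildAndTestBackend, not part of our current config.
buildAndTestBackend:
  variables:
    # keep HOST, PORT, LOG_LEVEL and DATABASE_URL as they are and add:
    CI_DEBUG_SERVICES: "true"   # stream postgres/keycloak container logs into the job log
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
    # Poll the master realm endpoint (up to 60 attempts, 5 s apart) until Keycloak answers with HTTP 2xx
    - |
      for i in $(seq 1 60); do
        if curl -sf --max-time 5 http://keycloak:8080/realms/master > /dev/null; then
          echo "Keycloak is reachable"
          break
        fi
        echo "Waiting for Keycloak ($i/60)"
        sleep 5
      done
    # Request an admin token with the service's credentials; print the response body and HTTP status code
    - >
      curl -s -w "\nHTTP %{http_code}\n"
      -d "client_id=admin-cli" -d "grant_type=password"
      -d "username=admin" -d "password=admin"
      http://keycloak:8080/realms/master/protocol/openid-connect/token
```

If this token request already fails, the JSON body (for example its error_description) should indicate the cause on Keycloak’s side; if it succeeds, the 403 more likely comes from how the backend talks to Keycloak during pnpm test:server.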