GitLab CI Not Functioning Despite Identical Configurations: Seeking Troubleshooting Assistance

Problem to solve

Hey everyone,

We’ve been encountering a perplexing issue with our GitLab CI pipeline lately, and despite multiple attempts at troubleshooting, we haven’t been able to resolve it. Here’s the situation:

We have a GitLab CI pipeline that has been running smoothly for some time. Recently, however, it started failing in the buildAndTest stage (jobs buildAndTestBackend and testBackendWithLatest), seemingly out of nowhere. We haven't made any changes to the configuration file or the project settings that would explain this sudden breakdown.

In our troubleshooting efforts, we’ve tried a few different approaches:

  1. Comparing Configurations: We compared the GitLab CI configuration of the last successful run with the current one; they are identical.
  2. Downgrading Versions: Suspecting a compatibility issue with a recent update of Keycloak (new minor), Postgres (new minor) or gitlab-runner (new major), we downgraded to the versions from the environment where the pipeline last succeeded (gitlab-runner as a self-hosted runner on Kubernetes). Even with the older versions, the problem persists.
  3. Checking Environment Variables: We reviewed all environment variables used within the CI pipeline to make sure they are set correctly and haven't been inadvertently altered.
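
For anyone repeating step 2, this is a minimal sketch of pinning image tags instead of relying on `latest` and `lts`, so a new upstream release cannot change the job silently. The tags shown are illustrative assumptions, not the versions we actually tested; the real values would have to come from the log of the last passing job.

```yaml
# Sketch only: the tags below are assumed examples, not the versions from the thread.
buildAndTestBackend:
  stage: buildAndTest
  image: node:20               # pinned instead of node:lts (assumed tag)
  services:
    - name: postgres:16.2      # pinned instead of postgres:latest (assumed tag)
      alias: postgres
      variables:
        POSTGRES_USER: root
        POSTGRES_PASSWORD: root
```

Pinning does not rule out changes on the runner side, which is why the runner version is worth recording as well.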

Steps to reproduce

Honestly, we can't think of any steps, since to our knowledge we did not change anything.

Configuration

gitlab-ci.yaml

variables:
  APP_IMAGE_NAME: ${ACR_USER}/xxx:${CI_COMMIT_REF_SLUG}

workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
    - if: $CI_COMMIT_TAG
    - if: $CI_COMMIT_BRANCH == 'dev'
    - if: $CI_COMMIT_BRANCH == 'master'

stages:
  - analyze
  - buildAndTest
  - package
  - deployment

eslint:
  stage: analyze
  image: node:lts
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - ls -l
    - cd Backend
    - ls -l
    - pnpm i
    - pnpm eslint:ci
  artifacts:
    when: always
    reports:
      codequality: Backend/gl-codequality.json

buildAndTestBackend:
  stage: buildAndTest
  image: node:lts
  services:
    - name: postgres:latest
      alias: postgres
      variables:
        POSTGRES_USER: root
        POSTGRES_PASSWORD: root
    - name: Quay
      alias: keycloak
      variables:
        DB_VENDOR: POSTGRES
        DB_ADDR: postgres
        DB_DATABASE: keycloak
        DB_USER: root
        DB_PASSWORD: root
        KEYCLOAK_ADMIN: admin
        KEYCLOAK_ADMIN_PASSWORD: admin
      command: ["start-dev"]
  variables:
    HOST: "0.0.0.0"
    PORT: 2700
    LOG_LEVEL: info
    DATABASE_URL: postgres://root:root@postgres/prisma
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - node --version
    - cd Backend
    - pnpm i
    - pnpm prisma:dbpush
    - pnpm test:server
    - pnpm build:server
  coverage: '/^Statements\s*:\s*([^%]+)/'
  artifacts:
    when: always
    reports:
      junit: Backend/test-results.xml
    paths:
      - Backend/dist/
    expire_in: 1 hour

testBackendWithLatest:
  stage: buildAndTest
  image: node:latest
  services:
    - name: postgres:latest
      alias: postgres
      variables:
        POSTGRES_USER: root
        POSTGRES_PASSWORD: root
    - name: Quay
      alias: keycloak
      variables:
        DB_VENDOR: POSTGRES
        DB_ADDR: postgres
        DB_DATABASE: keycloak
        DB_USER: root
        DB_PASSWORD: root
        KEYCLOAK_ADMIN: admin
        KEYCLOAK_ADMIN_PASSWORD: admin
      command: ["start-dev"]
  variables:
    HOST: "0.0.0.0"
    PORT: 2700
    LOG_LEVEL: info
    DATABASE_URL: postgres://root:root@postgres/prisma
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - node --version
    - cd Backend
    - pnpm i
    - pnpm prisma:dbpush
    - pnpm test:server
  coverage: '/^Statements\s*:\s*([^%]+)/'
  artifacts:
    when: always
    reports:
      junit: Backend/test-results.xml

buildFrontend:
  stage: buildAndTest
  image: node:lts
  before_script:
    - corepack enable
    - corepack prepare pnpm@latest-9 --activate
    - pnpm config set store-dir .pnpm-store
  script:
    - cd Backend/ClientApp
    - pnpm install
    - pnpm build
  artifacts:
    paths:
      - ClientApp/build
    expire_in: 1 hour

Versions

  • GitLab Runner: 17.0.0 on the runner provided by GitLab; 16.11.0 on the self-hosted runner and in the last successful run (which was still using the cloud runner)
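
To compare runner versions between a passing and a failing run, a small job along these lines could print them into the job log. This is a sketch: the job name `printRunnerInfo` is hypothetical, while `CI_RUNNER_VERSION`, `CI_RUNNER_DESCRIPTION` and `CI_SERVER_VERSION` are GitLab's predefined CI/CD variables.

```yaml
# Hypothetical helper job; only the three CI_* variables come from GitLab itself.
printRunnerInfo:
  stage: analyze
  image: node:lts
  script:
    - echo "runner version $CI_RUNNER_VERSION"
    - echo "runner description $CI_RUNNER_DESCRIPTION"
    - echo "GitLab version $CI_SERVER_VERSION"
```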

Summary and other interesting things

  • the posted config worked for a month and has now stopped working
  • we haven't changed anything in the config
  • on the self-hosted runner the jobs fail differently and sometimes succeed (a different topic)
  • of course we know the CI fails because of a 403 from Keycloak, but we have not made any changes to our files (if we check out the exact git commit where the pipeline worked three weeks ago, it no longer works either)

Hi @tzebastian,

I would recommend reviewing our documentation on Expired Access Tokens and rotating your access token.

GitLab removed support for non-expiring access tokens as part of a security best practice to ensure that leaked tokens are not usable. While we did send several notifications via blog and email beginning in September 2022, we recognize that not every customer saw these notifications, and some have experienced a service disruption as a result.

Hi,

I do not see why this should be affecting us, since

  1. we only have one access token, used for some automations towards Jira
  2. the errors we described also occurred before 2024-05-14 (precisely, since 2024-05-12)
  3. the 403 error we get comes from a fresh Keycloak instance started in the pipeline

But thanks for the input.

Updated: I FIXED IT

After some more troubleshooting, we were able to narrow the error down: Keycloak was treating the requests from the job as external requests, and in its default configuration Keycloak requires HTTPS for external requests.
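
For anyone who wants to check the same thing, a debug job roughly like the following shows which addresses the job container and the Keycloak service see for each other, and what status a plain HTTP request gets. It is a sketch under assumptions: the job name `debugKeycloak` is hypothetical, port 8080 is Keycloak's default HTTP port in `start-dev`, and the `/realms/master` path assumes a Quarkus-based Keycloak image. The image reference is kept exactly as it appears in the posted config.

```yaml
# Hypothetical debug job; job name, port and path are assumptions.
debugKeycloak:
  stage: buildAndTest
  image: node:lts
  services:
    - name: Quay               # image reference as it appears in the posted config
      alias: keycloak
      variables:
        KEYCLOAK_ADMIN: admin
        KEYCLOAK_ADMIN_PASSWORD: admin
      command: ["start-dev"]
  script:
    # Address the job container resolves for the service alias
    - getent hosts keycloak
    # Address(es) of the job container itself
    - hostname -i
    # Print only the HTTP status of a plain HTTP request, retrying while Keycloak starts
    - 'curl -s -o /dev/null --retry 10 --retry-delay 5 --retry-connrefused -w "%{http_code}\n" http://keycloak:8080/realms/master'
```

If the two containers do not share a private network, Keycloak may classify the request as external, which would match the HTTPS requirement described above.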

It seems GitLab has either changed the networking between services or changed the default value of FF_NETWORK_PER_BUILD. Either way, setting
variables:
  FF_NETWORK_PER_BUILD: "true"
fixed the error.
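
For reference, here is a sketch of how that flag sits in the top-level `variables:` block of the config posted above, next to the existing `APP_IMAGE_NAME` entry. Placing it at the top level is an assumption on my side; it could also be set only on the two failing jobs.

```yaml
variables:
  APP_IMAGE_NAME: ${ACR_USER}/xxx:${CI_COMMIT_REF_SLUG}
  # Runner feature flag: gives each job its own network shared by the
  # job container and its services, so they reach each other by alias.
  FF_NETWORK_PER_BUILD: "true"
```

The value is quoted so that it is passed to the runner as a string rather than parsed as a YAML boolean.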