All Pipelines Fail after Upgrade from 15.4.1 to 15.4.6, runner not triggered

The runner is never triggered.

Installed with docker-compose on a self-hosted server.
docker-compose.yml:

version: "3.5"
services:
  gitlab:
    image: 'gitlab/gitlab-ee:${VERSION:-latest}'
    restart: always
    hostname: "${DOMAIN}"
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url "https://${DOMAIN}"
        letsencrypt['enable'] = false
        # the smtp config has been added alongside the update:
        gitlab_rails['smtp_enable'] = true
        gitlab_rails['gitlab_email_from'] = 'gitlab@example.com'
        gitlab_rails['smtp_address'] = "mail.example.com"
        gitlab_rails['smtp_port'] = 25
        # end of added config
        nginx['listen_https'] = true
        nginx['ssl_certificate'] = "/etc/gitlab/ssl/${DOMAIN}.crt"
        nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/${DOMAIN}.key"
        nginx['http2_enabled'] = false
        nginx['proxy_set_headers'] = {
          "Host" => "${DOMAIN}",
        }
        gitlab_rails['omniauth_providers'] = [
          {
            "name" => "github",
            "app_id" => "(id)",
            "app_secret" => "(secret)",
            "args" => { "scope" => "user:email" }
          }
        ]
        gitlab_rails['packages_enabled'] = true
        gitlab_rails['lfs_enabled'] = true
        gitlab_rails['registry_port'] = ${REGISTRYPORT}
        gitlab_rails['registry_host'] = "${DOMAIN}"
        gitlab_rails['backup_keep_time'] = 86400
        registry['enable'] = true
        registry_external_url "https://${DOMAIN}:${REGISTRYPORT}"
        registry_nginx['ssl_certificate'] = "/etc/gitlab/ssl/${DOMAIN}.crt"
        registry_nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/${DOMAIN}.key"
        puma['worker_processes'] = 2
        puma['per_worker_max_memory_mb'] = 2024
        sidekiq['max_concurrency'] = 10
        postgresql['shared_buffers'] = "500MB"
        prometheus['listen_address'] = '0.0.0.0:${PROMETHEUSPORT}'
    ports:
      - '${HTTPPORT}:80'
      - '${SSHPORT}:22'
      - '${HTTPSPORT}:443'
      - '${REGISTRYPORT}:5050'
      - '127.0.0.1:${PROMETHEUSPORT}:${PROMETHEUSPORT}'
    volumes:
      - '$GITLAB_HOME/config:/etc/gitlab'
      - '$GITLAB_HOME/logs:/var/log/gitlab'
      - '$GITLAB_HOME/data:/var/opt/gitlab'
    networks:
      - gitlab_network
    shm_size: '256m'
  gitlab-runner:
    image: gitlab/gitlab-runner:alpine
    restart: unless-stopped
    depends_on:
      - gitlab
    volumes:
      - /srv/gitlab-runner/config:/etc/gitlab-runner
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - gitlab_network

networks:
  gitlab_network:
    driver: 'bridge'

runner config.toml:

concurrent = 2
check_interval = 0
log_level = "debug"

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gitlab-ce-ruby-2.6"
  url = "https://gitlab.example.com"
  token = "(not shown)"
  executor = "docker"
  environment = ["DOCKER_DRIVER=overlay2", "DOCKER_TLS_CERTDIR="]
  pre_build_script = "export DOCKER_HOST=tcp://localhost:2375"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "ruby:2.6"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
    network_mode = "gitlab_gitlab_network"
    shm_size = 0

[[runners]]
  name = "Deployment Runner"
  url = "https://gitlab.example.com/"
  token = "(not shown)"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "docker:stable"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = true
    volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
    shm_size = 0

After upgrading to 15.4.6, all pipelines return immediately with the error message:
“The scheduler failed to assign job to the runner, please try again or contact system administrator”

Checking the runner shows it does not see any jobs to pick up.
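With log_level = "debug" already set in config.toml, one way to confirm that the runner is polling but receiving no jobs is to watch its log output for the periodic job requests (the service name is taken from the compose file above; depending on your Docker version the command is `docker compose` or `docker-compose`):

```shell
# Follow the runner container's logs and filter for the job polling lines.
# With log_level = "debug", each poll against /api/v4/jobs/request is logged.
docker compose logs -f --tail=100 gitlab-runner | grep -i "checking for jobs"
```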

In the logs the error message looks like this:

{
  "severity": "ERROR",
  "time": "2022-12-04T23:35:18.745Z",
  "correlation_id": "01GKFQWYAXAE4011R5JATZKR89",
  "exception.class": "ActiveModel::MissingAttributeError",
  "exception.message": "can't write unknown attribute ``",
  "exception.backtrace": [
    "lib/gitlab/database/load_balancing/connection_proxy.rb:120:in `block in write_using_load_balancer'",
    "lib/gitlab/database/load_balancing/load_balancer.rb:115:in `block in read_write'",
    "lib/gitlab/database/load_balancing/load_balancer.rb:184:in `retry_with_backoff'",
    "lib/gitlab/database/load_balancing/load_balancer.rb:111:in `read_write'",
    "lib/gitlab/database/load_balancing/connection_proxy.rb:119:in `write_using_load_balancer'",
    "lib/gitlab/database/load_balancing/connection_proxy.rb:71:in `transaction'",
    "lib/gitlab/database.rb:332:in `block in transaction'",
    "lib/gitlab/database.rb:331:in `transaction'",
    "app/models/concerns/cross_database_modification.rb:83:in `transaction'",
    "app/services/ci/register_job_service.rb:258:in `assign_runner!'",
    "app/services/ci/register_job_service.rb:184:in `process_build'",
    "app/services/ci/register_job_service.rb:84:in `block in process_queue'",
    "app/services/ci/register_job_service.rb:142:in `block in each_build'",
    "app/services/ci/register_job_service.rb:142:in `each'",
    "app/services/ci/register_job_service.rb:142:in `each_build'",
    "app/services/ci/register_job_service.rb:57:in `process_queue'",
    "app/services/ci/register_job_service.rb:33:in `block in execute'",
    "lib/gitlab/ci/queue/metrics.rb:97:in `observe_queue_time'",
    "app/services/ci/register_job_service.rb:32:in `execute'",
    "lib/api/ci/runner.rb:153:in `block (2 levels) in <class:Runner>'",
    "ee/lib/gitlab/middleware/ip_restrictor.rb:14:in `block in call'",
    "ee/lib/gitlab/ip_address_state.rb:10:in `with'",
    "ee/lib/gitlab/middleware/ip_restrictor.rb:13:in `call'",
    "lib/api/api_guard.rb:215:in `call'",
    "lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'",
    "lib/gitlab/middleware/memory_report.rb:13:in `call'",
    "lib/gitlab/middleware/speedscope.rb:13:in `call'",
    "lib/gitlab/database/load_balancing/rack_middleware.rb:23:in `call'",
    "lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'",
    "lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'",
    "lib/gitlab/metrics/web_transaction.rb:46:in `run'",
    "lib/gitlab/metrics/rack_middleware.rb:16:in `call'",
    "lib/gitlab/jira/middleware.rb:19:in `call'",
    "lib/gitlab/middleware/go.rb:20:in `call'",
    "lib/gitlab/etag_caching/middleware.rb:21:in `call'",
    "lib/gitlab/middleware/query_analyzer.rb:11:in `block in call'",
    "lib/gitlab/database/query_analyzer.rb:37:in `within'",
    "lib/gitlab/middleware/query_analyzer.rb:11:in `call'",
    "lib/gitlab/middleware/multipart.rb:173:in `call'",
    "lib/gitlab/middleware/read_only/controller.rb:50:in `call'",
    "lib/gitlab/middleware/read_only.rb:18:in `call'",
    "lib/gitlab/middleware/same_site_cookies.rb:27:in `call'",
    "lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'",
    "lib/gitlab/middleware/basic_health_check.rb:25:in `call'",
    "lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'",
    "lib/gitlab/middleware/request_context.rb:21:in `call'",
    "lib/gitlab/middleware/webhook_recursion_detection.rb:15:in `call'",
    "config/initializers/fix_local_cache_middleware.rb:11:in `call'",
    "lib/gitlab/middleware/compressed_json.rb:26:in `call'",
    "lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'",
    "lib/gitlab/middleware/sidekiq_web_static.rb:20:in `call'",
    "lib/gitlab/metrics/requests_rack_middleware.rb:77:in `call'",
    "lib/gitlab/middleware/release_env.rb:13:in `call'"
  ],
  "exception.sql": "/*application:web,correlation_id:01GKFQWYAXAE4011R5JATZKR89,endpoint_id:POST /api/:version/jobs/request,db_config_name:main*/ UPDATE \"ci_builds_metadata\" SET \"config_options\" = '{\"cache\":[{\"key\":\"default\",\"when\":\"on_success\",\"paths\":[\".cache/pip\",\"venv/\"],\"policy\":\"pull-push\"}],\"image\":{\"name\":\"python:3.10.4\"},\"script\":[\"virtualenv venv\",\"cat $PYPIRC \\u003e /tmp/.pypirc\",\"pip install twine\",\"python setup.py  bdist_wheel\",\"python -m twine upload --config-file /tmp/.pypirc --repository gitlab dist/${CI_PROJECT_NAME}-${CI_COMMIT_TAG}-py3-none-any.whl\"],\"before_script\":[\"python -V\",\"pip install virtualenv\",\"virtualenv venv\"]}' WHERE \"ci_builds_metadata\".\"\" = 1487",
  "user.username": null,
  "tags.program": "web",
  "tags.locale": "en",
  "tags.feature_category": "continuous_integration",
  "tags.correlation_id": "01GKFQWYAXAE4011R5JATZKR89",
  "extra.build_id": 1487,
  "extra.build_name": "build",
  "extra.build_stage": "build",
  "extra.pipeline_id": 856,
  "extra.project_id": 46
}

The example here is from a Python job; a completely different JavaScript job returns the same error.

What I tried:

  • restarting the services
  • stopping, deleting the containers and recreating them from scratch
  • removing the added smtp config
  • from the user interface: clearing the cache
  • reinstalling version 15.4.1.

I have not yet tried restoring everything from the backup and going back to 15.4.1.

I also looked at the changelog from 15.4.1 to 15.4.6, but none of the listed changes appear related to this.

Looking at the SQL that throws the error, the WHERE clause appears to be malformed, as it states:

WHERE "ci_builds_metadata"."" = 1487

I suppose it should be ci_builds_metadata.id = 1487.
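To illustrate the reading above: PostgreSQL rejects a zero-length delimited identifier ("") and reports its 1-based character offset, which matches the "at character N" in the logs below. This is only a diagnostic sketch against a shortened version of the logged statement, not GitLab code; the helper name is made up:

```python
import re

def find_empty_identifier(sql: str):
    """Return the 1-based character offset of the first zero-length
    delimited identifier ("") in a SQL statement, or None."""
    m = re.search(r'""', sql)
    return m.start() + 1 if m else None

# Shortened version of the UPDATE statement from the error log.
stmt = ('UPDATE "ci_builds_metadata" SET "config_options" = \'{}\' '
        'WHERE "ci_builds_metadata"."" = 1487')

pos = find_empty_identifier(stmt)
# pos points exactly at the spot where the column name (presumably "id")
# should have been, right after 'ci_builds_metadata.'.
print(pos, repr(stmt[pos - 1:pos + 1]))
```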

I also tried keeping version 15.4.6 and running gitlab-ctl pg-upgrade from inside the container.
On startup the output showed something like “database version not ok”, but even though the command finished without errors, I still get the same error.

Note that there is no custom configuration concerning Postgres. The Postgres logs (from a different job than the one above) show this:

2022-12-05_16:14:39.34055 LOG:  no match in usermap "gitlab" for user "gitlab" authenticated as "root"
2022-12-05_16:14:39.34058 FATAL:  Peer authentication failed for user "gitlab"
2022-12-05_16:14:39.34058 DETAIL:  Connection matched pg_hba.conf line 70: "local   all         all                               peer map=gitlab"
2022-12-05_16:15:26.25307 ERROR:  zero-length delimited identifier at or near """" at character 837
2022-12-05_16:15:26.25312 STATEMENT:  /*application:web,correlation_id:01GKHH47FE07HV3WBMDVP82WZN,endpoint_id:POST /api/:version/jobs/request,db_config_name:main*/ UPDATE "ci_builds_metadata" SET "config_options" = '{"cache":[{"key":"build-cache","when":"on_success","paths":["node_modules/"],"policy":"pull-push"}],"image":{"name":"node:16-alpine"},"script":["apk update \u0026\u0026 apk upgrade \u0026\u0026 apk --no-cache add jq curl","export VERSION=`jq -r \".version\" \u003c ./package.json`","npm install","npm run generate","tar -C ./dist/ -cvzf - . \u003e ${CI_PROJECT_NAME}.tar.gz","curl -u \"${CI_DEPLOY_USER}:${CI_DEPLOY_PASSWORD}\" --upload-file ${CI_PROJECT_NAME}.tar.gz  \"${CI_API_V4_URL}/projects/86/packages/generic/${CI_PROJECT_NAME}/${VERSION}/${CI_PROJECT_NAME}.tar.gz\""],"artifacts":{"paths":["dist/"],"expire_in":"1 day"}}' WHERE "ci_builds_metadata"."" = 1506

“character 837” of the statement is exactly where the missing identifier should be (right after “ci_builds_metadata.”).

Update: I upgraded to GitLab 15.6.2 and created a new runner.

The upgrade itself did not help, but stopping the old runner and registering a new one did. I cannot tell whether that would also have worked on 15.4.6.
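For anyone hitting the same wall, this is roughly what replacing the runner registration looks like, assuming the compose service names above and a fresh registration token from the GitLab admin area (the token value is a placeholder):

```shell
# Remove the stale runner registrations from the runner's config.toml.
docker compose exec gitlab-runner gitlab-runner unregister --all-runners

# Register a fresh runner; <registration-token> comes from
# Admin Area > CI/CD > Runners in the GitLab UI.
docker compose exec gitlab-runner gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com" \
  --registration-token "<registration-token>" \
  --executor "docker" \
  --docker-image "ruby:2.6" \
  --docker-network-mode "gitlab_gitlab_network"
```

After registering, the new token is written to /srv/gitlab-runner/config/config.toml (the volume mounted in the compose file), and the executor settings from the old [[runners]] sections may need to be copied over by hand.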
