Custom Postgres Service Unavailable in Job

Custom Postgres Service connection fails in test step

.gitlab-ci.yml

.ci-test:
  stage: Core
  image: docker.mycompany.com/my-repo:${IMAGE_TAG}-ci
  tags:
    - k8s-runner
  variables:
    GIT_STRATEGY: none
    IS_CI_TEST: 1
    TESTING: 1
    DB_USER: 'myusername'
    DB_PASS: 'mypassword'
    TEST_DB_HOST: 'test-db-host'
    TEST_DB_NAME: 'test-db-name'
    TEST_DB: 'postgresql://$DB_USER:$DB_PASS@$TEST_DB_HOST/$TEST_DB_NAME'
  before_script:
    - cd /usr/src/my-repo

Run Misc Tests:
  extends:
    - .ci-test
  services:
    - name: docker.mycompany.com/testdb:current
      alias: test-db-host
  script:
    - env | grep DB | sort
    - pg_isready -d $TEST_DB

Error Case

The environment variables print as I expect; however, the final script step results in:

$ pg_isready -d $TEST_DB
test-db-host:5432 - no response

I expect the service to be accessible at this point.

Versioning

Running with gitlab-runner 15.1.0 (76984217)
  on gitlab-runner-5b47cd79d-xtrmd Cx6KC7sf
Preparing the "kubernetes" executor

Troubleshooting So Far

Some of the lines of inquiry I’ve pursued up to this point…

Image Issues?

I have run this image locally and confirmed that it is accessible and properly configured/migrated:

docker run -d --rm docker.mycompany.com/myrepo:current && pg_isready -d postgresql://myusername:mypassword@localhost/aledade
localhost:5432 - accepting connections

Running psql -c '\l+' against this local container shows the structure I expect, so I am reasonably confident that the image itself is correct.

I have also tried setting the postgres connection variables POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB in the service definition, though this is demonstrably unnecessary in the local environment, since they are set in the image itself.

Connection String + Service Alias

  • I have echoed the runtime value of TEST_DB and found it to match my expectations. The printed result of pg_isready is clearly respecting the value in the connection string.
  • I have gotten a successful connection to a base image, postgres:11, using the same pattern of access.
  • I’ve gotten the same result from setting TEST_DB_HOST to both the alias and name of the service.

It appears that this is no more than a timing issue: my custom container starts up more slowly than the base postgres image, and adding a sleep 60 before the connection check “resolved” the problem.
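For anyone diagnosing the same thing: the "no response" line corresponds to a specific pg_isready exit status. The PostgreSQL documentation lists four: 0 (accepting connections), 1 (rejecting connections, for example during startup), 2 (no response), and 3 (no attempt made, for example because of invalid parameters). A small helper along these lines can make the job log easier to read while the service is still coming up. The function name is my own invention, not part of any tool.

```shell
# Map a pg_isready exit status to a human-readable description.
# describe_pg_isready_status is a hypothetical helper name.
describe_pg_isready_status() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (for example, still starting up)" ;;
    2) echo "no response" ;;
    3) echo "no attempt made (for example, invalid parameters)" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Possible use in a job script; the || keeps a failed check from
# aborting the job immediately:
#   pg_isready -d "$TEST_DB" || describe_pg_isready_status $?
```

A status of 2 that turns into 0 after a delay is consistent with a slow-starting service rather than a wrong alias or connection string.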

In the absence of a specific dependency directive, the best solution I’ve come up with is a script that tries to connect, then sleeps for a short time before retrying.
That looks like this:

script:
    - ./gitlab_wait_for_testdb.sh "$TEST_DB" "$WAIT_FOR_DB_TIMEOUT"
    - python tests-run.py --testset misc

with the script

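#!/bin/bash
# Usage: gitlab_wait_for_testdb.sh CONNECTION_STRING NUM_ATTEMPTS
# Polls the database roughly once per second until pg_isready succeeds
# or the attempts run out.
# Note: $1 contains the password, so echoing it exposes it in the job log.
echo "Testing if testdb is up and running"
echo "Connecting to $1 ..."
# Abort with a usage message if the attempt count is missing.
NUM_ATTEMPTS=${2:?usage: $0 CONNECTION_STRING NUM_ATTEMPTS}
echo "Will try to connect $NUM_ATTEMPTS times."
COUNTER=0
while [ "$COUNTER" -lt "$NUM_ATTEMPTS" ]; do
  echo "Waiting for testdb to finish starting up... (attempt $((COUNTER + 1)) of $NUM_ATTEMPTS)"
  if pg_isready -d "$1"
  then
    echo "Connection Successful!"
    exit 0
  fi
  sleep 1
  COUNTER=$((COUNTER + 1))
done
echo "Unable to connect to $1 after $NUM_ATTEMPTS attempts"
exit 1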
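
A possible variation, sketched under assumptions rather than tested against the runner: bound the wait by wall-clock time instead of counting attempts, and keep the retry logic generic so it can wrap any readiness command. The function name wait_for_cmd is made up for illustration, and the sketch relies on date +%s, which GNU and BSD date both support.

```shell
# Retry a command about once per second until it succeeds or the
# deadline passes. wait_for_cmd is a hypothetical helper name.
# Usage: wait_for_cmd TIMEOUT_SECONDS COMMAND [ARGS...]
wait_for_cmd() {
  # Absolute deadline in seconds since the epoch.
  wait_deadline=$(( $(date +%s) + $1 ))
  shift
  until "$@"; do
    if [ "$(date +%s)" -ge "$wait_deadline" ]; then
      echo "Gave up waiting for: $1" >&2
      return 1
    fi
    sleep 1
  done
}
```

After sourcing the file that defines it, the job step would be something like wait_for_cmd "$WAIT_FOR_DB_TIMEOUT" pg_isready -d "$TEST_DB", which makes WAIT_FOR_DB_TIMEOUT an actual number of seconds rather than a number of attempts. The failure message prints only the command name, so the connection string and its password stay out of the log.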