Custom Postgres Service Unavailable in Job

Custom Postgres Service connection fails in test step


.ci-test:
  stage: Core
  tags:
    - k8s-runner
  variables:
    GIT_STRATEGY: none
    IS_CI_TEST: 1
    TESTING: 1
    DB_USER: 'myusername'
    DB_PASS: 'mypassword'
    TEST_DB_HOST: 'test-db-host'
    TEST_DB_NAME: 'test-db-name'
  before_script:
    - cd /usr/src/my-repo

Run Misc Tests:
  extends:
    - .ci-test
  services:
    - name:
      alias: test-db-host
  script:
    - env | grep DB | sort
    - pg_isready -d $TEST_DB

Error Case

The environment variables that are printed are what I expect; however, the final script step results in

$ pg_isready -d $TEST_DB
test-db-host:5432 - no response

I expect the service to be accessible at this point.
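For reference, pg_isready encodes these outcomes in its exit status (0 accepting, 1 rejecting, 2 no response, 3 no attempt made), so the "no response" above corresponds to exit status 2. A small helper mapping statuses to the printed labels (illustrative, not part of the job config):

```shell
# Map pg_isready exit statuses to their meanings (per the PostgreSQL docs).
# status_name is an illustrative helper, not part of the CI config.
status_name() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections" ;;   # e.g. server still starting up
    2) echo "no response" ;;             # the failure mode seen above
    3) echo "no attempt made" ;;         # invalid parameters
    *) echo "unknown status" ;;
  esac
}

status_name 2   # prints "no response"
```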


Running with gitlab-runner 15.1.0 (76984217)
  on gitlab-runner-5b47cd79d-xtrmd Cx6KC7sf
Preparing the "kubernetes" executor

Troubleshooting So Far

Some of the lines of inquiry I’ve pursued up to this point…

Image Issues?

I have run this image locally and found it to behave as expected, including being accessible and properly configured/migrated:

docker run -d --rm -p 5432:5432 <image> && pg_isready -d postgresql://myusername:mypassword@localhost/aledade
localhost:5432 - accepting connections

Running psql -c '\l+' against this local container shows me the structure I expect, so I am reasonably confident that the image itself is correct.

I have also tried setting the Postgres connection variables POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB in the service definition (though this is demonstrably unnecessary in the local environment, since they are set in the image itself).
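That attempt looked roughly like the sketch below (the image name is a placeholder, and `variables` under a service entry assumes a reasonably recent GitLab version):

```yaml
  services:
    - name: <custom postgres image>   # placeholder; actual image name omitted
      alias: test-db-host
      variables:
        POSTGRES_USER: 'myusername'
        POSTGRES_PASSWORD: 'mypassword'
        POSTGRES_DB: 'test-db-name'
```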

Connection String + Service Alias

  • I have echoed the runtime value of TEST_DB and found it to match my expectations; the printed result of pg_isready is clearly using the host from the connection string.
  • I have gotten a successful connection to a base image, postgres:11, using the same pattern of access.
  • I’ve gotten the same result from setting TEST_DB_HOST to both the alias and the name of the service.
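For context, TEST_DB is presumably assembled from the job variables above; the config doesn't show that step, so the following is an illustrative sketch of the expected connection string:

```shell
# Illustrative only: the job config above does not show how TEST_DB is built.
DB_USER='myusername'
DB_PASS='mypassword'
TEST_DB_HOST='test-db-host'
TEST_DB_NAME='test-db-name'
TEST_DB="postgresql://${DB_USER}:${DB_PASS}@${TEST_DB_HOST}/${TEST_DB_NAME}"
echo "$TEST_DB"   # postgresql://myusername:mypassword@test-db-host/test-db-name
```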

It appears that this is no more than a timing issue: my custom container starts up more slowly than the base postgres image, so adding a sleep 60 “resolved” the problem.

In the absence of a specific dependency directive, the best solution I’ve come up with is a script that tries to connect, then sleeps for a short time before retrying. The job’s script step then looks like this (wait-for-db.sh is an illustrative name for the script below):

    - ./wait-for-db.sh "$TEST_DB"
    - python --testset misc

with the script

#!/bin/sh
# $1 is the connection string to test.
# The NUM_ATTEMPTS default of 30 is an assumption; tune as needed.
NUM_ATTEMPTS="${NUM_ATTEMPTS:-30}"
COUNTER=0

echo "Testing if testdb is up and running"
echo "Connecting to $1 ..."
echo "will try to connect $NUM_ATTEMPTS times."
while [ "$COUNTER" -lt "$NUM_ATTEMPTS" ]; do
  echo "Waiting for testdb to finish starting up... (attempt $COUNTER)"
  if pg_isready -d "$1"; then
    echo "Connection Successful!"
    exit 0
  fi
  COUNTER=$((COUNTER + 1))
  sleep 1
done
echo "Unable to connect to $1 after $NUM_ATTEMPTS attempts"
exit 1
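The same wait-and-retry pattern can be factored into a generic helper, which keeps the CI script readable if more than one dependency needs polling. This is a sketch; the retry name and its one-second delay are illustrative, not part of the CI config:

```shell
# Sketch of a generic retry helper; name and defaults are illustrative.
# Usage: retry <attempts> <command...>
retry() {
  attempts="$1"
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# e.g.: retry 30 pg_isready -d "$TEST_DB"
```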