Bad hostname resolution in Job with two services

In a Job that requires two different postgres services, only one of the connections works.

When running tests, only one of the two required database connections behaves correctly. Connections to both hosts can be made, but the results do not match the expected containers.

In the following config, the two services' aliases/hostnames should resolve to separate containers, but they appear to resolve to the same one.
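
One way to confirm the conflation from inside the job is a diagnostic like the sketch below, using the job's own connection URLs; inet_server_addr() and inet_server_port() are standard postgres functions that report the address on which the server accepted the connection.

# If the two URLs really reach the same backend, both commands print the
# same server address and port.
psql -d "$ALFA_DB_URL" -t -c 'SELECT inet_server_addr(), inet_server_port();'
psql -d "$TEST_DB" -t -c 'SELECT inet_server_addr(), inet_server_port();'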

Config

This is the merged YAML for the failing job:

Misc Tests:
  stage: Static
  image: docker.myorg.com/myrepo:${IMAGE_TAG}-ci
  tags:
  - k8s-runner
  variables:
    FF_NETWORK_PER_BUILD: 1
    GIT_STRATEGY: none
    IS_CI_TEST: 1
    TESTING: 1
    DB_USER: youzer
    DB_PASS: passwrd
    ALFA_DB_NAME: alfa
    ALFA_DB_HOST: alfa-test-db
    ALFA_DB_URL: postgresql://$DB_USER:$DB_PASS@$ALFA_DB_HOST/$ALFA_DB_NAME
    TEST_DB_HOST: test-db
    TEST_DB_NAME: bravo
    TEST_DB: postgresql://$DB_USER:$DB_PASS@$TEST_DB_HOST/$TEST_DB_NAME
  services:
  - name: postgres:11
    alias: alfa-test-db
    variables:
      POSTGRES_USER: "$DB_USER"
      POSTGRES_PASSWORD: "$DB_PASS"
      POSTGRES_DB: "$ALFA_DB_NAME"
  - name: docker.myorg.com/testdb:current
    alias: test-db
  before_script:
  - cd /usr/src/app
  extends:
  - ".ci-test"
  script:
  - "./gitlab_wait_for_testdb.sh $ALFA_DB_URL $DB_CONNECTION_TIMEOUT"
  - "./gitlab_wait_for_testdb.sh $TEST_DB $DB_CONNECTION_TIMEOUT"
  - cat /etc/hosts
  - env | grep -v GITLAB | grep -v CI | sort
  - psql -d $ALFA_DB_URL -c '\l+'
  - psql -d $TEST_DB -c '\l+'  # <--- fails at this point: the database it expects does not exist
  - python tests-run.py --testset misc
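
The gitlab_wait_for_testdb.sh helper isn't shown here; it is just a retry loop. A minimal sketch of what it does, assuming pg_isready is available in the image (the real script may differ):

#!/bin/sh
# Sketch of a wait helper like gitlab_wait_for_testdb.sh (the real script
# is not shown in this post). $1 is a postgres URL, $2 a timeout in seconds.
url="$1"
timeout="${2:-60}"
elapsed=0
# pg_isready accepts a full connection string via -d; -q suppresses output.
until pg_isready -q -d "$url"; do
  elapsed=$((elapsed + 1))
  if [ "$elapsed" -ge "$timeout" ]; then
    echo "timed out waiting for $url" >&2
    exit 1
  fi
  sleep 1
done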

Debugging

  • Each connection works in other jobs where it is the only service.
  • pg_isready succeeds for both services, but that really only indicates that some postgres instance is reachable at the hostname (see the snippet after this list).
  • Examining the custom container locally, or in a single-service Job, shows that it is configured as expected, i.e. it has the required database, bravo.
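
To illustrate the second point: both readiness checks pass even though the hostnames land on the same container (the trailing comments show the kind of output pg_isready prints; exact output is illustrative).

pg_isready -h "$ALFA_DB_HOST" -U "$DB_USER"   # alfa-test-db:5432 - accepting connections
pg_isready -h "$TEST_DB_HOST" -U "$DB_USER"   # test-db:5432 - accepting connections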

Ultimately, the output of $ cat /etc/hosts shows both services aliased to the same IP address:

# Kubernetes-managed hosts file.
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
fe00::0	ip6-mcastprefix
fe00::1	ip6-allnodes
fe00::2	ip6-allrouters
10.70.2.5	runner-cx6kc7sf-project-21-concurrent-0z7dbc
# Entries added by HostAliases.
127.0.0.1	docker.myorg.com-testdb	test-db	postgres	alfa-test-db

I’m pretty baffled as to why both of these services seem to be getting conflated…

Turns out the Kubernetes executor does this on purpose: all of a job's services run as containers in the same pod, so they share a single network namespace, and every alias is just another name for 127.0.0.1. It's not noticeable except in a case like mine, where both services use the same port!

This needs to be called out more explicitly in the services documentation around aliases, but the upshot is that every service is mapped to 127.0.0.1 by simply appending its aliases to /etc/hosts, so if two services listen on the same port, one will silently shadow the other.
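
For anyone hitting the same collision, one workaround (my sketch, not an officially documented fix) is to run one of the services on a non-default port so the two don't contend for 5432 in the shared pod network. With the stock postgres image you can override the container command and carry the port in the connection URL:

services:
- name: postgres:11
  alias: alfa-test-db
  # Listen on 5433 instead of the default 5432 so this instance doesn't
  # collide with the other postgres service in the shared pod network.
  command: ["postgres", "-p", "5433"]
  variables:
    POSTGRES_USER: "$DB_USER"
    POSTGRES_PASSWORD: "$DB_PASS"
    POSTGRES_DB: "$ALFA_DB_NAME"
- name: docker.myorg.com/testdb:current
  alias: test-db
variables:
  # The URL must now include the non-default port.
  ALFA_DB_URL: postgresql://$DB_USER:$DB_PASS@$ALFA_DB_HOST:5433/$ALFA_DB_NAME

Since both aliases still resolve to 127.0.0.1, it is the differing ports, not the hostnames, that keep the two connections apart.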