CI errors with IPv6 EHOSTUNREACH

Problem to solve

Our GitLab CI job runs a test suite on a node:18-bullseye-slim Docker image, connecting to a postgres:12 service.

Sometime between 12:25pm and 1:54pm today (US Eastern time), these jobs started failing with errors similar to the following:

connect EHOSTUNREACH fc00::242:ac11:3:5432
Error: connect EHOSTUNREACH fc00::242:ac11:3:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)

Steps to reproduce

A comparison of successful and failing jobs shows that the SHAs of the postgres:12 and node:18-bullseye-slim images haven’t changed. If I rerun a previously successful job, it now fails.

In the past, I’ve tried and failed to connect to hosts via IPv6 from GitLab CI shared runners (due to gitlab-runner #37419 and/or #37437?), so my best guess is that part of the CI stack has started trying to use IPv6, which isn’t (fully?) supported. However, I don’t know what changed or what to do about it.

Configuration

image: node:18-bullseye-slim

stages:
  - build
  - test

test:
  stage: test
  services:
    - postgres:12
  script:
    - yarn test:ci
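
For what it’s worth, a quick way to see what the postgres service alias resolves to inside the job is to add a debug line before the tests. This is just a sketch; it assumes getent (from libc-bin) is present in the image, which it should be on Debian-based images like node:18-bullseye-slim:

test:
  stage: test
  services:
    - postgres:12
  script:
    # Debug: print the address(es) the "postgres" service alias resolves to in this container
    - getent hosts postgres || true
    - yarn test:ci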

Versions

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

Hi @joshkel, not sure if it’s related, but we have had a very similar issue in our test job since yesterday, also running alongside a service.

In our case, the service is an eclipse-mosquitto MQTT broker with alias: localhost, so our test suite can connect to mqtt://localhost:1883 (so using IPv4, not IPv6).

We never had any problem at all with this setup, and 21 hours ago it was working fine. But since approximately the time you mention, the jobs have been failing with connection errors, never reaching the service :sweat:

If I rerun a previously successful job, it now fails.

Same here (also using GitLab.com SaaS), so it’s something broken there…

Our YAML looks like this:

test backend:
  stage: test
  services:
    - name: eclipse-mosquitto:1.6.15
      alias: localhost
  image: ${IMAGE_FULL_NAME}
  script:
    - ...

I’m thinking about installing a self-hosted runner to run the job locally, to check if I can see WTF is going on.

I am having the same issue at the moment, with node:18 and postgres:14 as a service, when running tests:

1) routes
       tests
         "before each" hook for "pre-test":
     Error: connect EHOSTUNREACH fc00::242:ac11:3:5432
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)

I am having the same issue.

Node 18, postgres 13.3.

connection error Error: connect EHOSTUNREACH fc00::123:ac11:3:5432

    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1555:16)

Error: Jest: Got error running globalSetup - /builds/xxxxxx/engineering/xxxxxx/api/v2/test/common/test-setup/delete-test-dbs.ts, reason: Connection terminated unexpectedly

Same here. It was just fine yesterday and started failing today with no changes to that part of the code.

My org’s test jobs that use shared runners and connect to Mongo, Redis, and Docker services started failing with network errors yesterday. Node 18 on Debian bullseye-slim.

+1 - seeing a similar issue in a Django project using a postgres:15.5 service.

Connections to the test job DB started failing yesterday evening (between 5pm and 9pm UTC+1 on 7 May). There was no change in the codebase between those times. I ruled out a lot of different explanations before finding this thread.

The exception lists an IPv6 address:

psycopg.OperationalError: connection failed: (fc00::242:ac11:3), port 5432 failed: server closed the connection unexpectedly

Hi folks, looking into this internally. We weren’t able to reproduce with our Example project yet. Does anyone have a minimal reproducible config available that consistently fails?

Here’s a quick-and-dirty repro using LocalStack as the service: test (#6808172424) · Jobs · Mikko Piuhola / repro-gl-runner-ipv6-connectivity · GitLab

Shows IPv4 working but it’s now defaulting to IPv6 and failing to reach the server.
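
For anyone who wants to check the same split in their own job, a rough diagnostic might look like the following. This is only a sketch: it assumes curl and getent are available in the job image, that the service alias is localstack, and that a recent LocalStack is listening on its default edge port 4566 with the /_localstack/health endpoint.

  script:
    # What the service alias resolves to over IPv4 vs. IPv6
    - getent ahostsv4 localstack || true
    - getent ahostsv6 localstack || true
    # Force the connection over each protocol to see which one actually works
    - curl -4 -sf http://localstack:4566/_localstack/health && echo "IPv4 reachable" || echo "IPv4 unreachable"
    - curl -6 -sf http://localstack:4566/_localstack/health && echo "IPv6 reachable" || echo "IPv6 unreachable"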

Related issue: Error: connect EHOSTUNREACH using postgres as a service (#460821) · Issues · GitLab.org / GitLab · GitLab

For us, the issues started with the GL 17.0 breaking changes window #3 when we swapped from gitlab-org (i.e. *.shared-gitlab-org.runners-manager.gitlab.com) tagged runners to saas-linux-small-amd64 (i.e. *.saas-linux-small-amd64.runners-manager.gitlab.com/default).

Similar issue in our Django project. Was running just fine until yesterday evening UTC time.
Initial config:

services:
  - docker:19.03.5-dind
  - kartoza/postgis:latest
variables:
  CONTAINER_IMAGE: ...
  POSTGRES_DB: test
  POSTGRES_USER: ...
  POSTGRES_PASSWORD: ...
  POSTGRES_HOST_AUTH_METHOD: trust
  ALLOW_IP_RANGE: 0.0.0.0/0
script:
  - docker pull ${CONTAINER_IMAGE}
  - docker run
    --rm --add-host=kartoza-postgis:$(getent hosts kartoza-postgis  | awk '{ print $1 }')
    --env DB_NAME=${POSTGRES_DB}
    --env DB_TEST_HOST=kartoza-postgis
    --env DB_TEST_USERNAME=${POSTGRES_USER}
    --env DB_TEST_USER_PASS=${POSTGRES_PASSWORD}
    ${CONTAINER_IMAGE} sh -c "pytest"

Output:

docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.

I updated the Docker version and changed the Postgres service:

services:
  - docker:20.10.17-dind
  - postgres:14.11
variables:
  CONTAINER_IMAGE: ...
  POSTGRES_DB: testing
  POSTGRES_USER: ...
  POSTGRES_PASSWORD: ...
  POSTGRES_HOST_AUTH_METHOD: trust
  ALLOW_IP_RANGE: 0.0.0.0/0
script:
  - docker pull ${CONTAINER_IMAGE}
  - docker run
      --rm --add-host=postgres:$(getent hosts postgres | awk '{ print $1 }')
      --env DB_NAME=${POSTGRES_DB}
      --env DB_TEST_HOST=postgres
      --env DB_TEST_USERNAME=${POSTGRES_USER}
      --env DB_TEST_USER_PASS=${POSTGRES_PASSWORD}
      ${CONTAINER_IMAGE} sh -c "pytest"

Output:

psycopg2.OperationalError: could not connect to server: Cannot assign requested address
Is the server running on host "postgres" (fc00::242:ac11:4) and accepting TCP/IP connections on port 5432?

As a note: “getent hosts postgres” returns only an IPv6 address.
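
If that resolution is the culprit, one possible tweak (just a sketch, assuming the service container still has an IPv4 address at all) would be to resolve the alias with getent ahostsv4 instead of getent hosts in the --add-host line:

script:
  - docker pull ${CONTAINER_IMAGE}
  # getent ahostsv4 returns only IPv4 results; awk takes the first address
  - docker run
      --rm --add-host=postgres:$(getent ahostsv4 postgres | awk '{ print $1; exit }')
      --env DB_NAME=${POSTGRES_DB}
      --env DB_TEST_HOST=postgres
      --env DB_TEST_USERNAME=${POSTGRES_USER}
      --env DB_TEST_USER_PASS=${POSTGRES_PASSWORD}
      ${CONTAINER_IMAGE} sh -c "pytest"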

Hi @manuelgrabowski,

I tried the given example project (.gitlab-ci.yml · master · GitLab-examples / postgres · GitLab), observed the issue, and got the following errors:

Using docker image sha256:8e4fc9e184899a58735e7ee333f5e272d7d2298cf59302006b71f33e217be130 for postgres with digest postgres@sha256:4aea012537edfad80f98d870a36e6b90b4c09b27be7f4b4759d72db863baeebb ...
$ export PGPASSWORD=$POSTGRES_PASSWORD
$ psql -h "postgres" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "SELECT 'OK' AS status;"
psql: error: connection to server at "postgres" (fc00::242:ac11:3), port 5432 failed: No route to host
	Is the server running on that host and accepting TCP/IP connections?
connection to server at "postgres" (172.17.0.3), port 5432 failed: FATAL:  password authentication failed for user "postgres"
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1

Update: I’m sorry, that was due to an incorrect configuration in my code. The example actually ran successfully.

Anyway, my observation from my current pipeline was that “localhost” resolved to IPv6, causing “EHOSTUNREACH”. When I then set “localhost” to “127.0.0.1” to force IPv4, I got a different error, “ECONNREFUSED”.

Best regards,
Sunsern

I found another post (Error: connect EHOSTUNREACH using postgres as a service (#218) · Issues · GitLab.org / Ops Sub-Department / shared-runners / infrastructure · GitLab) and, following a discussion there, I tried setting FF_NETWORK_PER_BUILD: “true”, which seems to resolve the issue.

That’s the issue where we’re now tracking this problem internally, thanks for sharing!

Setting FF_NETWORK_PER_BUILD: "true" in your CI config should indeed be a workaround to get everyone unblocked for now, but we’re looking into making this the default so it doesn’t have to be set at the individual level.
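
For anyone applying the workaround in the meantime, it’s a single CI/CD variable. A minimal example based on the configuration from the original post would look something like this:

image: node:18-bullseye-slim

variables:
  # Create a per-job Docker network that the build and service containers share
  FF_NETWORK_PER_BUILD: "true"

test:
  stage: test
  services:
    - postgres:12
  script:
    - yarn test:ci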