Connectivity issues between gitlab-runner and gitlab in a Docker environment

Hi, we’ve been running GitLab and one runner on a build server for several years with no problems. Both run in Docker.

Excerpt from docker-compose.yml:

    gitlab:
        image: gitlab/gitlab-ce:13.8.4-ce.0
        container_name: gitlab
        restart: always
        hostname: "git.mydomain.com"
        ports:
            - "5001:80" # web
            - "2222:22" # git
            - "5050:5050" # containers
        volumes:
            - /mnt/data/gitlab/config:/etc/gitlab
            - /mnt/data/gitlab/logs:/var/log/gitlab
            - /mnt/data/gitlab/data:/var/opt/gitlab
        networks:
            - git

    gitlab-runner:
        image: gitlab/gitlab-runner:alpine-v13.8.0
        container_name: gitlab-runner
        restart: always
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock
            - /mnt/data/gitlab-runner/config:/etc/gitlab-runner
        networks:
            - git

A separate nginx proxy (non-Docker) runs in front of GitLab, mapping the URLs and handling SSL etc.
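Roughly, the proxy looks something like this (a simplified sketch rather than the exact config — certificate paths are placeholders; port 5001 matches the web port mapping above):

    server {
        listen 443 ssl;
        server_name git.mydomain.com;

        # placeholder certificate paths
        ssl_certificate     /etc/ssl/certs/git.mydomain.com.crt;
        ssl_certificate_key /etc/ssl/private/git.mydomain.com.key;

        location / {
            # forward to the gitlab container's published web port
            proxy_pass http://127.0.0.1:5001;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto https;
        }
    }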

config.toml for the runner looks like this:

    concurrent = 1

    [[runners]]
      name = "RunnerOne"
      url = "https://git.mydomain.com/"
      token = "XXXX"
      executor = "docker"
      environment = ["DOCKER_TLS_CERTDIR=/certs"]
      [runners.docker]
        image = "alpine:latest"
        disable_cache = false
        cache_dir = "/cache"
        pull_policy = "if-not-present"
        volumes = ["/mnt/data/gitlab-runner/certs:/certs","/mnt/data/gitlab-runner/cache:/cache","/var/run/docker.sock:/var/run/docker.sock"]

This all runs fine for around 5 minutes, but then the runner starts throwing these errors on every poll:

    2021-02-22T14:21:56.810505826Z WARNING: Checking for jobs... failed runner=G51F98QW status=couldn't execute POST against https://git.mydomain.com/api/v4/jobs/request: Post https://git.mydomain.com/api/v4/jobs/request: dial tcp 172.25.0.3:443: connect: connection refused

At this point the only solution is to move both GitLab and the runner to a new Docker network (e.g. “git1”) and restart everything.

It would seem that the GitLab API is rate-limiting connections from the runner. This can be confirmed by adding, e.g., check_interval = 15 to the runner config: the runner then polls without errors for proportionally longer, according to the interval set.
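For anyone trying this: check_interval is a global setting, so it goes at the top level of config.toml, not inside the [[runners]] section:

    concurrent = 1
    # seconds between job polls; larger values delay the errors but also slow down builds
    check_interval = 15

    [[runners]]
      name = "RunnerOne"
      url = "https://git.mydomain.com/"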

Sadly this is not a solution, as a) setting large intervals increases build times and b) it still fails eventually.

Changing the runner’s config.toml from:

    url = "https://git.mydomain.com/"

to

    url = "http://gitlab/"

solves this first issue by polling the internal Docker address. However, the runner then throws this error on uploading artifacts, making multi-step builds impossible:

    ERROR: Uploading artifacts as "archive" to coordinator... error error=couldn't execute POST against http://gitlab/api/v4/jobs/23688/artifacts?artifact_format=zip&artifact_type=archive&expire_in=2+hrs: Post http://gitlab/api/v4/jobs/23688/artifacts?artifact_format=zip&artifact_type=archive&expire_in=2+hrs: dial tcp: lookup gitlab on 67.207.67.2:53: no such host id=23688 token=5iUB9Td4

As mentioned, this was working fine until a couple of months ago. I’m not sure what changed, but I’ve since tried to rollback versions of GitLab, runner, docker etc with no success. What are we doing wrong here?

Many thanks!

There is something weird in your DNS setup. If you have the Runner configured for https://git.mydomain.com, why do you get a Docker interface IP (172.25.0.3:443) in the error log?
You should get the IP that the FQDN resolves to: the IP of the build server itself (on which nginx is listening).
If 172.25.0.3 is the IP of your build server, make sure it does not conflict with the subnets of the Docker interfaces, as they usually come from 172.xx.yy.zz/16 ranges.

Thank you so much @balonik!

This has been causing grief for months and I’ve tried so many different options, but your answer pointed me in the right direction.

The culprit seems to be the “hostname: git.mydomain.com” line in docker-compose.yml. That line has always been there and it’s included in the official documentation (GitLab Docker images | GitLab), so I’m not sure what changed. Removing it and using the FQDN in the runner config solves all issues.
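For anyone else hitting this, the fix is just removing that one line from the excerpt above (everything else stays the same):

    gitlab:
        image: gitlab/gitlab-ce:13.8.4-ce.0
        container_name: gitlab
        restart: always
        # hostname: "git.mydomain.com"   <- removed
        ports:
            - "5001:80" # web
            - "2222:22" # git
            - "5050:5050" # containers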

Thanks again!

@sbmatt I am glad that it works for you. To be honest, I missed the hostname in the docker-compose.yml file. I guess GitLab has that in their official documentation so that all GitLab services communicate within the container and traffic does not go out to the Docker host and back. With the hostname present, and with both the GitLab and Runner containers on the same user-defined network, the DNS works just fine.
When I took a second look I noticed that you have https in the url in the Runner’s config.toml, but SSL is handled by the outside Nginx proxy, so the GitLab container only listens on http. The connection refused on port 443 then makes sense. Changing the protocol in the url in config.toml to http://git.mydomain.com should actually solve your problem and is, in my opinion, a better option than having the traffic go through the Nginx proxy.
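So the relevant part of config.toml would become (rest unchanged):

    [[runners]]
      name = "RunnerOne"
      url = "http://git.mydomain.com/"
      token = "XXXX"
      executor = "docker"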

@balonik You’re quite right: reinstating hostname for the GitLab service and switching the runner to point at http:// does indeed work and solves the first issue.

However, I did then run into an artifact upload issue again:

    WARNING: Uploading artifacts as "archive" to coordinator... failed id=XXXXX responseStatus=404 Not Found status=404 token=XXXXX
    WARNING: Retrying... context=artifacts-uploader error=invalid argument

I agree that having internal traffic pass through the Nginx proxy isn’t ideal, but to be honest, if it works I’m happy. If the runner(s) were on a different server this would happen anyway.

Thanks again for your time and help.