Hi, we’ve been running GitLab and one runner on a build server for several years with no problems. Both run in Docker.
Excerpt from docker-compose.yml:
gitlab:
  image: gitlab/gitlab-ce:13.8.4-ce.0
  container_name: gitlab
  restart: always
  hostname: "git.mydomain.com"
  ports:
    - "5001:80" # web
    - "2222:22" # git
    - "5050:5050" # containers
  volumes:
    - /mnt/data/gitlab/config:/etc/gitlab
    - /mnt/data/gitlab/logs:/var/log/gitlab
    - /mnt/data/gitlab/data:/var/opt/gitlab
  networks:
    - git
gitlab-runner:
  image: gitlab/gitlab-runner:alpine-v13.8.0
  container_name: gitlab-runner
  restart: always
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - /mnt/data/gitlab-runner/config:/etc/gitlab-runner
  networks:
    - git
A separate nginx proxy (non-Docker) runs in front of GitLab, mapping the URLs and handling SSL etc.
config.toml for the runner looks like this:
concurrent = 1
[[runners]]
  name = "RunnerOne"
  url = "https://git.mydomain.com/"
  token = "XXXX"
  executor = "docker"
  environment = ["DOCKER_TLS_CERTDIR=/certs"]
  [runners.docker]
    image = "alpine:latest"
    disable_cache = false
    cache_dir = "/cache"
    pull_policy = "if-not-present"
    volumes = ["/mnt/data/gitlab-runner/certs:/certs","/mnt/data/gitlab-runner/cache:/cache","/var/run/docker.sock:/var/run/docker.sock"]
This all runs fine for around 5 minutes, but then the runner starts throwing these errors on every poll:
2021-02-22T14:21:56.810505826Z WARNING: Checking for jobs... failed runner=G51F98QW status=couldn't execute POST against https://git.mydomain.com/api/v4/jobs/request: Post https://git.mydomain.com/api/v4/jobs/request: dial tcp 172.25.0.3:443: connect: connection refused
At this point the only workaround is to move both GitLab and the runner to a new Docker network (e.g. “git1”) and restart everything.
It would seem that the GitLab API is rate limiting connections from the runner. This appears to be supported by adding e.g. check_interval = 15 to the runner config: the runner then polls without errors for proportionally longer, in line with the interval set.
Sadly this is not a solution as a) setting large intervals increases build times and b) it will still fail eventually.
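For reference, a minimal sketch of where that setting goes, assuming it belongs at the top level of config.toml next to concurrent rather than inside [[runners]] (the value 15 is just the one tried here, not a recommendation):

```toml
# Sketch only: check_interval is a global runner setting, so it sits at the
# top level alongside concurrent, not inside the [[runners]] section.
concurrent = 1
check_interval = 15  # seconds between polls for new jobs; 15 is just the value tried here

[[runners]]
  name = "RunnerOne"
  url = "https://git.mydomain.com/"
  # ... rest of the runner section unchanged
```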
Changing runner config.toml from:
url = "https://git.mydomain.com/"
to
url = "http://gitlab/"
This solves the first issue by polling on the internal Docker address. However, the runner now throws this error when uploading artifacts, making multi-step builds impossible:
ERROR: Uploading artifacts as "archive" to coordinator... error error=couldn't execute POST against http://gitlab/api/v4/jobs/23688/artifacts?artifact_format=zip&artifact_type=archive&expire_in=2+hrs: Post http://gitlab/api/v4/jobs/23688/artifacts?artifact_format=zip&artifact_type=archive&expire_in=2+hrs: dial tcp: lookup gitlab on 67.207.67.2:53: no such host id=23688 token=5iUB9Td4
As mentioned, this was working fine until a couple of months ago. I’m not sure what changed, but I’ve since tried rolling back the versions of GitLab, the runner, Docker etc. with no success. What are we doing wrong here?
Many thanks!