I use GitLab Runner with Docker-in-Docker. Unfortunately, our runners sometimes (infrequently, and not reproducibly) hang on one of the following commands while building the Docker image:
RUN pip download --no-cache-dir -r requirements.txt -d /artifacts
or
RUN pip install --no-cache-dir --no-index --find-links=/tmp/artifacts /tmp/artifacts/*
It hangs there for an hour until the build times out. The only fix is to restart the build, which then usually completes within just 3-5 minutes!
The .gitlab-ci.yml looks roughly like the following minimal example:
services:
  - docker:dind

before_script:
  - docker info

build:
  stage: build
  image: docker:1.11
  tags:
    - docker
  script:
    - ./configure
    - make build
    - make push
and make build simply runs:
docker build $(IMAGE_NAME) -t $(IMAGE_NAME_SHORT) -f Dockerfile .
Any idea what could cause this, or where to start digging to identify the problem?
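Since restarting the build usually gets it through in a few minutes, one stopgap is to automate that restart inside the build step: kill the command when a single attempt runs far longer than a normal build, then try again. Below is a minimal Python sketch of that idea, not a fix for the underlying hang. The script name run_with_watchdog.py, the 600-second per-attempt limit and the three attempts are assumptions, and it presumes Python is available in the image being built.

```python
import subprocess
import sys


def run_with_watchdog(cmd, attempt_timeout, max_attempts):
    """Run cmd; retry it if it fails or one attempt exceeds attempt_timeout seconds.

    Returns (returncode of the last attempt or None if it timed out, attempts used).
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        try:
            # On timeout, subprocess.run kills the direct child (not any
            # grandchildren it spawned), waits for it, then raises TimeoutExpired.
            completed = subprocess.run(cmd, timeout=attempt_timeout)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out after {attempt_timeout}s", file=sys.stderr)
            returncode = None
            continue
        returncode = completed.returncode
        if returncode == 0:
            return returncode, attempt
        print(f"attempt {attempt}: exited with {returncode}", file=sys.stderr)
    return returncode, max_attempts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python run_with_watchdog.py pip install ...
    # 600 s per attempt and 3 attempts are assumed values, not recommendations.
    rc, _ = run_with_watchdog(sys.argv[1:], attempt_timeout=600, max_attempts=3)
    sys.exit(1 if rc is None else rc)
```

In the Dockerfile this would look something like RUN python run_with_watchdog.py pip download --no-cache-dir -r requirements.txt -d /artifacts, after COPYing the (hypothetical) script into the image.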
Bumping this post: we are currently having significant issues on many of our builds, which time out after hanging on pip installs.
The problem is sporadic, but a frequent kind of sporadic in our case.
Lots of this:
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)",)':
Among other things. Builds work fine locally, requirements.txt has not changed, and the Docker commands are simple:
RUN pip install -r requirements.txt
RUN pip install --no-cache-dir -vvv -r requirements.txt
We also tried all sorts of ways of adding indexes to pip explicitly, but that did nothing.
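To see how often those retries happen and which host they hit, one option is to save the build log and tally pip's retry warnings. This is a rough Python sketch: the regex is written against the single ReadTimeoutError warning quoted above, assumes each warning sits on one line as pip prints it, and may need adjusting for other pip versions or error types.

```python
import re
from collections import Counter

# Written against the warning format quoted above:
# WARNING: Retrying (Retry(total=4, ...)) after connection broken by
# 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): ...
RETRY_RE = re.compile(
    r"WARNING: Retrying \(Retry\(total=\d+.*?"
    r"after connection broken by '(?P<error>\w+)\("
    r"(?:.*?host='(?P<host>[^']+)')?"
)


def summarize_retries(lines):
    """Count pip retry warnings per (error type, host); host is None if not found."""
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            counts[(match.group("error"), match.group("host"))] += 1
    return counts
```

With a saved log (file name hypothetical) that would be: with open("build.log") as f: print(summarize_retries(f)).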
Hey @ghostsquad, thanks for the link. Yeah, I should have been more explicit: like the original poster, my runner times out after 1 hour (the default). The stack trace above doesn't always appear, and it isn't necessarily where the pip install command hangs. We did try increasing the pip timeout to 360 seconds, but that hasn't quite solved it.
Regarding your second point:
The job pulls an Ubuntu image and clones repositories and sub-repositories, and often it manages to reach PyPI, download and install several packages, and then hangs. So we know it has a network connection (but at some point it gets wonky).
@iamliamc is that a Docker-in-Docker runner, or something else? Have you tried different varieties to see if it's specific to the runner?
Can you encode your job as a script and run it 100+ times (in the same Docker image as your pipeline) without using GitLab Runner? That might help isolate whether this is a runner problem, a script problem, a network problem, etc.
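As a concrete version of that suggestion, here is a rough Python harness that runs a command many times and tallies the outcomes. The job script name, the 100 runs and the one-hour cap per run are assumptions; the idea would be to run it inside the same image the pipeline uses (e.g. via docker run) but outside GitLab Runner.

```python
import subprocess
import sys
from collections import Counter


def stress(cmd, runs, timeout):
    """Run cmd `runs` times; tally 'ok', 'failed' (non-zero exit) or 'timeout'."""
    tally = Counter()
    for _ in range(runs):
        try:
            # Output is discarded to keep the tally readable; drop the
            # DEVNULL redirects to see the job's own logs.
            result = subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            tally["timeout"] += 1
            continue
        tally["ok" if result.returncode == 0 else "failed"] += 1
    return tally


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python stress.py ./job.sh
    # 100 runs, each capped at one hour to mirror the default job timeout.
    print(dict(stress(sys.argv[1:], runs=100, timeout=3600)))
```

If the timeouts show up here too, the runner itself is probably not the culprit.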
This keeps happening even with --default-timeout=999.
Here's the beginning of the GitLab Runner log:
Running with gitlab-runner 11.11.0-rc2 (7f58b1ec)
on docker-auto-scale fa6cab46
Using Docker executor with image docker:1.11 ...
Starting service docker:dind ...
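One thing worth noting about raising the timeout: pip's --default-timeout (an alias of --timeout) is a socket timeout, so it only fires when no data arrives for that long, and with the default --retries of 5 a stalled request gets up to 6 attempts. If every attempt stalls, a bigger timeout mostly makes each stall last longer. A rough back-of-the-envelope in Python, assuming those defaults and ignoring the short backoff sleeps between attempts:

```python
def stall_budget_seconds(timeout_s, retries=5):
    """Rough time one request can block if every attempt stalls until the timeout.

    Assumes pip's default --retries of 5 (1 initial attempt + 5 retries) and
    ignores the backoff sleeps between attempts.
    """
    attempts = 1 + retries
    return attempts * timeout_s


for timeout_s in (15, 360, 999):
    minutes = stall_budget_seconds(timeout_s) / 60
    print(f"timeout={timeout_s}s -> up to ~{minutes:.1f} min per stalled request")
```

That works out to 90 seconds at the default of 15, 36 minutes at 360, and roughly 100 minutes at 999, which is already past a one-hour job timeout. So a lower timeout may at least surface the failure sooner, although neither setting explains why the connection stalls in the first place.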
For now it seems to work again. Can someone from GitLab comment on this? Could it have been network issues on GitLab's side or on PyPI's side? Thanks!
Yeah, I have to agree: I haven't had this problem this week either, and I never came to an actual conclusion or fix on my end. We did upgrade to the newest GitLab. Thanks for all the comments and collaboration on possible fixes.
I'm trying to build a simple Python application, and in some builds the runner times out reaching pypi.org during pip install. My base image is python:3.7.8-slim.
The runner runs on Kubernetes on AWS. I don't have any problems with the internet connection otherwise.
We're getting this pretty much all the time for GitLab runners on Kubernetes with the Kubernetes executor. We only see it in Docker-in-Docker jobs, for example where we run a docker build that does a pip install.
I haven't taken a very scientific approach to solving the issue yet, due to lack of time lately, but I wanted to share a couple of setups where I have seen it work (and not work).
Since the 1.10.5 k8s cluster on CoreOS, I haven’t been able to get the docker in docker jobs running using the k8s gitlab-runner on any setup other than the EKS cluster with the AWS VPC CNI. Admittedly I haven’t gone too low-level on the issue yet.
Would be interested in knowing what versions/technologies people are running where this works in AWS.
Here is a version of a job (from .gitlab-ci.yml) that fails during the docker build when doing a pip install:
We had a similar issue when running the k8s executor on AWS EKS. However, it was resolved when we added the option --network host (along with our other options) to docker build, and there are no more connectivity errors.
This is still occurring for me with gitlab-runner 12.6.0 for all Python images. I'm not even doing a docker build; the job just runs pip install and then a Python command. It fails about 50% of the time, unable to reach PyPI. Can someone please help?
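For reference, --network is a regular docker build option that sets the networking mode for RUN instructions during the build; with host, RUN steps skip the daemon's default bridge network. Applied to the setup in the original post, that would mean adding --network host to the docker build call behind make build. A small Python sketch that assembles such a command line; the function name and the image tag are placeholders:

```python
import shlex


def docker_build_cmd(tag, dockerfile="Dockerfile", context=".", network=None):
    """Build the argv for `docker build`, optionally with --network <mode>."""
    cmd = ["docker", "build", "-t", tag, "-f", dockerfile]
    if network:
        # --network sets the networking mode for RUN instructions during the build.
        cmd += ["--network", network]
    cmd.append(context)  # the build context goes last
    return cmd


print(shlex.join(docker_build_cmd("registry.example.com/app:latest", network="host")))
```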
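To put a number on that 50%, it might help to probe PyPI directly from a job in the same image, outside pip. A minimal standard-library Python sketch; the URL, attempt count and timeout are assumptions, and the tally is keyed by exception type so DNS failures, resets and timeouts show up separately.

```python
import urllib.request
from collections import Counter


def probe(url, attempts, timeout):
    """Fetch url `attempts` times; tally 'ok' or the exception type name."""
    tally = Counter()
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()  # read the full body, not just the headers
            tally["ok"] += 1
        except Exception as exc:  # e.g. URLError, TimeoutError, ConnectionResetError
            tally[type(exc).__name__] += 1
    return tally
```

For example: print(probe("https://pypi.org/simple/pip/", attempts=20, timeout=15)). If this also fails around half the time, the problem is below pip.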