Gitlab runner hangs infrequently on Python's `pip install` or `pip download`

Hi,

I use GitLab Runner with Docker-in-Docker. Unfortunately, our runners occasionally (and not reproducibly) hang on one of the following commands while building the Docker image:

RUN pip download --no-cache-dir -r requirements.txt -d /artifacts

or

RUN pip install --no-cache-dir --no-index --find-links=/tmp/artifacts /tmp/artifacts/*

It hangs there for an hour until the build times out. The only solution is to restart the build process (and then it usually completes within just 3-5 minutes!).

The GitLab YAML looks roughly like the following minimal example:

services:
  - docker:dind
before_script:
  - docker info

build:
  stage: build
  image: docker:1.11
  tags:
    - docker
  script:
    - ./configure
    - make build
    - make push

and make build simply runs
docker build -t $(IMAGE_NAME) -t $(IMAGE_NAME_SHORT) -f Dockerfile .

Any idea what could be causing this, or where to start digging to identify the problem?

  • Is this most likely a pypi issue?
  • Or is this a docker issue?
  • Or is this a gitlab issue?

Thanks!

Unfortunately, this keeps happening. Does anyone have any idea?

Maybe this might help?

I wanted to draw attention to this post: we are currently having significant issues on many of our builds due to timeouts after pip installations hang.

The problem is sporadic, but also a frequent kind of sporadic in our cases.

Lots of this:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)",)':

Among other things. Builds work fine locally, with no changes to requirements.txt, and the Dockerfile command is simple:

RUN pip install -r requirements.txt
RUN pip install --no-cache-dir -vvv -r requirements.txt

Also tried all sorts of stuff about adding indexes explicitly to pip but that did nothing.

@iamliamc maybe this would help you? TL;DR: change the timeout

https://stackoverflow.com/questions/43298872/how-to-solve-readtimeouterror-httpsconnectionpoolhost-pypi-python-org-port
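For anyone who wants to try the timeout change from a script instead of editing every pip call, here is a minimal sketch in Python. It relies on pip reading its command-line options from `PIP_<OPTION>` environment variables, so `PIP_DEFAULT_TIMEOUT` and `PIP_RETRIES` correspond to `--default-timeout` and `--retries`. The helper names `pip_env` and `pip_install` and the values 100 and 10 are my own assumptions for illustration, not taken from the linked answer.

```python
import os
import subprocess
import sys


def pip_env(timeout_s: int = 100, retries: int = 10) -> dict:
    """Return a copy of the current environment with pip's network settings raised.

    The defaults (100 s, 10 retries) are arbitrary illustration values.
    """
    env = dict(os.environ)
    env["PIP_DEFAULT_TIMEOUT"] = str(timeout_s)  # same as --default-timeout
    env["PIP_RETRIES"] = str(retries)            # same as --retries
    return env


def pip_install(requirements: str, timeout_s: int = 100, retries: int = 10) -> int:
    """Run `pip install -r <requirements>` with the settings above; return the exit code."""
    cmd = [sys.executable, "-m", "pip", "install", "--no-cache-dir", "-r", requirements]
    return subprocess.run(cmd, env=pip_env(timeout_s, retries)).returncode
```

In a Dockerfile the same idea would be an `ENV PIP_DEFAULT_TIMEOUT=100 PIP_RETRIES=10` line before the `RUN pip install` step. This only gives slow responses more time, so it may not help if the connection stalls completely.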

Some other things that come to mind:

  1. When this happens, can you identify which runner (if you have multiple) it occurs on?
  2. Can you do some other internet connectivity tests before this command (like ping, etc) to see if there’s a problem there?
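As a rough sketch of the second point: since the image already has Python, a plain TCP connect to port 443 can stand in for ping, and it is closer to what pip actually does. This uses only the standard library. The function names `can_connect` and `probe` and the attempt counts are hypothetical, and I am assuming `pypi.org:443` is the host to test because that is what shows up in the warning above.

```python
import socket
import time


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def probe(host: str, port: int, attempts: int = 5,
          timeout_s: float = 5.0, delay_s: float = 1.0) -> int:
    """Try several connections in a row and return how many succeeded."""
    succeeded = 0
    for i in range(attempts):
        if can_connect(host, port, timeout_s):
            succeeded += 1
        if i < attempts - 1:
            time.sleep(delay_s)  # small pause between attempts
    return succeeded
```

Printing the result of `probe("pypi.org", 443)` right before the pip step would show in the job log whether the network was already flaky at that point.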

Hey @ghostsquad - thanks for the link. Yeah, I should’ve been more explicit: like the original poster, my runner times out after 1 hr (the default). The stack trace above doesn’t actually always happen, nor is it necessarily the place where the pip install command hangs. We did try increasing the timeout to 360 seconds, but that hasn’t quite solved it.

Regarding your second point:
The job includes pulling an ubuntu image, cloning repositories and sub-repositories, and often the job is able to reach pypi and download and install multiple packages before it hangs. So we know it has a network connection (but at some point it gets wonked out).

@iamliamc docker in docker runner? Or other? Have you tried multiple varieties to see if it’s specific to the runner?

Can you encode your job as a script and run it 100+ times (in the same docker image as in your pipeline) without using the GitLab runner? That might help isolate whether this is a runner problem, script problem, network problem, etc.
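Something along these lines could do the repeated runs. It is only a sketch using the Python standard library: `run_repeatedly` and `EXAMPLE_CMD` are names I made up, the 600 second limit is an arbitrary assumption, and the example command is modelled on the `pip download` line from the first post.

```python
import subprocess
import sys
from collections import Counter

# Hypothetical command to repeat; swap in whatever step hangs in your job.
EXAMPLE_CMD = [sys.executable, "-m", "pip", "download", "--no-cache-dir",
               "-r", "requirements.txt", "-d", "/tmp/artifacts"]


def run_repeatedly(cmd, runs=100, timeout_s=600.0):
    """Run cmd `runs` times and count how often it passed, failed or hung.

    A run counts as "hung" when it exceeds timeout_s; subprocess.run kills
    the child process in that case before raising TimeoutExpired.
    """
    results = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            results["hung"] += 1
            continue
        results["ok" if proc.returncode == 0 else "failed"] += 1
    return results
```

Running `run_repeatedly(EXAMPLE_CMD)` once on a plain docker host and once inside the dind service, then comparing the "hung" counts, would be one way to narrow it down.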

This keeps happening even with --default-timeout=999.
Here’s the beginning of the gitlab runner log:

Running with gitlab-runner 11.11.0-rc2 (7f58b1ec)
  on docker-auto-scale fa6cab46
Using Docker executor with image docker:1.11 ...
Starting service docker:dind ...

For now it seems to work again. Can someone from GitLab comment on this? Could it have been network issues on GitLab’s side or issues on PyPI’s side? Thanks!

Yeah, I’ve got to agree: I’ve no longer had this problem this week, and I never came to an actual conclusion or fix on my end… we did upgrade to the newest GitLab. Thanks for all the comments and collaboration on possible fixes.

One year later, and it keeps hanging in python:3.6-alpine3.11 :roll_eyes: :cry:

I resolved it with this accepted answer

I wrote in my docker-compose.yml file:

version: '3.4'
services:
  image_name:
    build:
      context: .
      network: host

Sorry guys, but I’m getting this issue too in GitLab.

Like @robert.meyer said, infrequently.

I’m trying to build a simple Python application, and in some builds the runner times out reaching pypi.org during pip install. My base image is python:3.7.8-slim.

The runner is running on Kubernetes on AWS. I don’t have any problems with the internet connection.

Using GitLab Community Edition 13.1.2

Below is an example:

Setting up zlib1g-dev:amd64 (1:1.2.11.dfsg-1) ...
Setting up libgnutls28-dev:amd64 (3.6.7-4+deb10u4) ...
Setting up libmariadb-dev (1:10.3.22-0+deb10u1) ...
Setting up libmariadb-dev-compat:amd64 (1:10.3.22-0+deb10u1) ...
Setting up default-libmysqlclient-dev:amd64 (1.0.5) ...
Processing triggers for libc-bin (2.28-10) ...
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/mysqlclient/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/mysqlclient/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/mysqlclient/
Collecting mysqlclient==1.4.6
  Downloading mysqlclient-1.4.6.tar.gz (85 kB)
Collecting Flask==1.1.2
  Downloading Flask-1.1.2-py2.py3-none-any.whl (94 kB)
Collecting Flask-RESTful==0.3.8

Some builds pass OK and others fail with a timeout.

I’m getting this pretty much all the time on GitLab runners running on Kubernetes with the Kubernetes executor. I only see it with docker in docker jobs, where we are doing a docker build that does a pip install, for example.

GitLab runner version: gitlab/gitlab-runner:alpine-v13.3.0
Helm chart version: gitlab-runner-0.20.0

This issue occurs with the following setups (all in AWS):

Kops: 1.15.1
K8s: 1.15.9
CNI: Calico
OS: Debian 9 (stretch)
Kernel: 4.9.0
Container runtime: docker://18.6.3

Kops: 1.15.1
K8s: 1.15.9
CNI: Calico
OS: Amazon Linux 2
Kernel: 4.14.193
Container runtime: docker://19.3.6

Kops: 1.17.2
K8s: 1.17.9
CNI: Calico
OS: Flatcar 2512.3.0 (Oklo)
Kernel version: 4.19.123
Container runtime: docker://18.6.3

This issue DOESN’T occur with the following setups:

Kops: 1.15.1
K8s: 1.10.5
CNI: Calico
OS: CoreOS 2512.3.0 (Oklo)
Kernel version: 4.19.123
Container runtime: docker://18.6.3

eksctl: 0.28.1
K8s: 1.17.9
CNI: AWS VPC
OS: Amazon Linux 2
Kernel version: 4.14.193
Container runtime: docker://19.3.6

Haven’t taken an overly scientific approach to solving the issue yet, due to lack of time lately, but just wanted to show a couple of setups where I have seen it work (and not work).

Since the 1.10.5 k8s cluster on CoreOS, I haven’t been able to get the docker in docker jobs running using the k8s gitlab-runner on any setup other than the EKS cluster with the AWS VPC CNI. Admittedly I haven’t gone too low-level on the issue yet.

Would be interested in knowing what versions/technologies people are running where this works in AWS.

Here is a version of a job (from .gitlab-ci.yml) that fails during the docker build when doing a pip install:

variables:  
  DOCKER_DRIVER: overlay2
  DOCKER_HOST: tcp://localhost:2375

stages:
  - build

build: 
  image: docker:stable
  stage: build
  services:
    - docker:stable-dind
  script:
    - docker build -t myrepo.io/my-image .
    - docker push myrepo.io/my-image

Any ideas?

We had a similar issue when running the k8s executor on AWS EKS. However, it was resolved when we used the option “--network host” along with the other docker build options, and there are no more connectivity errors:

docker build --network host -t ${DOCKER_REGISTRY}/${IMAGE_NAME}:${IMAGE_VERSION} .

The link below has more details

Hope it helps!

This is still occurring for me with gitlab-runner 12.6.0 for all Python images. I’m not even doing any docker build; the job just does a pip install and runs a Python command. It fails 50% of the time, unable to reach PyPI. Can someone please help?

@Alageshan-M - were you able to solve the issue? I am new to Docker and trying to install pip packages, and it fails.