Pulling docker image gitlab/gitlab-runner-helper times out (not docker-in-docker)

Hi community,

What are you seeing, and how does that differ from what you expect to see?

I am using GitLab CI on a self-hosted instance (GitLab 12.10.3), with runners that use the docker executor and run in a self-hosted Docker environment (see details below). The pipeline fails “every once in a while” with this error message:

Pulling docker image gitlab/gitlab-runner-helper:x86_64-c5874a4b ...
ERROR: Job failed: execution took longer than 1h0m0s seconds

By “every once in a while” I mean that a failed job may succeed when I run it a second or third time.

I already found these threads, but they build Docker images with docker-in-docker, whereas in my case I am running a Maven and Node.js build.

These are the last messages in the job’s log. The runner’s log level is set to debug:

[...] 
Downloading artifacts
Running before_script and script
Authenticating with credentials from $DOCKER_AUTH_CONFIG
$ mvn $MAVEN_CLI_OPTS verify --projects gui
[INFO] Error stacktraces are turned on.
[INFO] Scanning for projects...
[INFO] 
[INFO] ------------------------< fxmi:gui >------------------------
[INFO] Building gui 1.18.0-SNAPSHOT
[INFO] --------------------------------[ pom ]---------------------------------
[INFO] 
[INFO] --- frontend-maven-plugin:1.3:install-node-and-npm (install node and npm) @ gui ---
[INFO] Node v8.16.2 is already installed.
[INFO] NPM 6.4.1 is already installed.
[INFO] 
[INFO] --- frontend-maven-plugin:1.3:npm (npm install) @ gui ---
[INFO] Running 'npm ci --cache ../.npm' in /builds/gitlab/im/fxmi/gui
[INFO] 
[INFO] > uws@9.14.0 install /builds/gitlab/im/fxmi/gui/node_modules/uws
[INFO] > node-gyp rebuild > build_log.txt 2>&1 || exit 0
[INFO] 
Running after_script
WARNING: Failed to inspect build container 79a6a08400ad12696541620e90709a8b1c037dfa469de16a1958d4a3ad7be1c2 context deadline exceeded (docker_command.go:77:0s)
Authenticating with credentials from $DOCKER_AUTH_CONFIG
Pulling docker image my-registry/maven-nvm:3.6-jdk-8 ...
Uploading artifacts for failed job
Pulling docker image gitlab/gitlab-runner-helper:x86_64-c5874a4b ...
ERROR: Job failed: execution took longer than 1h0m0s seconds

Docker image used by this job

I use a custom Docker image for running this job. This is the Dockerfile that creates the image:

FROM maven:3.6-jdk-8

RUN mkdir /builds \
    && useradd --no-log-init -mr -g users -N non-root \
    && mkdir -p /home/non-root/.m2 \
    && chown -R non-root:users /builds /home/non-root

USER non-root

ENV MAVEN_CONFIG="/home/non-root/.m2"

ENV NVM_DIR="/home/non-root/.nvm"
ENV YVM_DIR="/home/non-root/.yvm"

COPY settings.xml /home/non-root/.m2/

COPY install-scripts /
RUN ./install-node-yarn-version-manager.sh

.gitlab-ci.yml

image: my-registry/maven-nvm:3.6-jdk-8
variables:
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
  MAVEN_CLI_OPTS: "--batch-mode --errors"

stages:
  - build-gui
  - build-backend

build-gui:
  stage: build-gui
  cache:
    key: "gui"
    paths:
      - .m2/repository/
      - .npm/
  script:
    - mvn $MAVEN_CLI_OPTS verify --projects gui
  artifacts:
    paths:
      - gui/target

[...]

This Docker host runs the GitLab runners and the build containers:

OS Information: linux x86_64 Debian GNU/Linux 10 (buster)
Kernel Version: 4.19.0-8-amd64

Engine Details

Version: 19.03.9 (API: 1.40)
Root directory: /var/lib/docker
Storage Driver: overlay2
Logging Driver: json-file
Volume Plugins: local
Network Plugins: bridge, host, ipvlan, macvlan, null, overlay

Any ideas? I was hoping that this issue https://github.com/moby/moby/issues/40514 had been fixed; according to its milestone label “19.03.7”, I should be safe running 19.03.9 here. 🤷

Where is your docker host located?

Does it have a solid and reliable connection to the internet or is it on a corporate network that has to go through a proxy of some sort?

We’ve had issues with this in the past when trying to reach public images.

If you can reliably access your private registry, you can try pulling, tagging, and pushing the public GitLab Runner helper image into your private registry.
You can then configure your docker executor to use the gitlab helper image from your private registry with this configuration option
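A minimal sketch of the pull/tag/push step, assuming the Docker host is already logged in to the private registry (`docker login`). The target path my-registry/gitlab/gitlab-runner-helper is hypothetical, and `mirror_image` is an illustrative helper name, not part of any GitLab tooling; the tag x86_64-c5874a4b is the one shown in the job log above.

```shell
#!/bin/sh
# Hypothetical helper: mirror_image SOURCE_IMAGE TARGET_IMAGE
# Pulls a public image, retags it for the private registry, and pushes it.
# Assumes `docker login` for the target registry has already been done.
mirror_image() {
  src=$1
  dst=$2
  # Each step only runs if the previous one succeeded.
  docker pull "$src" &&
    docker tag "$src" "$dst" &&
    docker push "$dst"
}
```

Usage would look like `mirror_image gitlab/gitlab-runner-helper:x86_64-c5874a4b my-registry/gitlab/gitlab-runner-helper:x86_64-c5874a4b`. Each runner could then be pointed at the copy via `helper_image = "my-registry/gitlab/gitlab-runner-helper:x86_64-c5874a4b"` in the `[runners.docker]` section of its config.toml (presumably the configuration option meant here). The tag encodes the runner's revision, so the mirrored image has to be refreshed whenever the runners are upgraded.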

Thanks @tmos22 for the advice. I didn’t know about this configuration option, and I’ll give it a try. We have a private Docker registry and 3 Docker hosts on a corporate network, each running 5 shared GitLab Runner containers. Proxying the helper image would at least speed things up a bit.

Sorry, I had redacted part of the .gitlab-ci.yml. The full version below makes me suspect that the internet connection is not the cause of this problem.

build-gui:
  stage: build-gui
  cache:
    key: "gui"
    paths:
      - .m2/repository/
      - .npm/
  script:
    - mvn $MAVEN_CLI_OPTS verify --projects gui
  artifacts:
    paths:
      - gui/target

build-gui-vue:
  stage: build-gui
  cache:
    key: "gui"
    paths:
      - .m2/repository/
      - .npm/
  script:
    - mvn $MAVEN_CLI_OPTS verify --projects gui-vue
  artifacts:
    paths:
      - gui-vue/dist

There are two similar jobs in the same stage. Both jobs effectively execute npm install via this mvn verify. In the build-gui job, this npm install (Node.js 8.16) sometimes fails, but in build-gui-vue the npm install (Node.js 10.16) finishes successfully. Several other projects run their pipelines frequently on this same infrastructure, and they don’t experience this timeout problem.

I find it notable that the timeout always occurs at the same step of npm install.

Expected log:

Authenticating with credentials from $DOCKER_AUTH_CONFIG
$ mvn $MAVEN_CLI_OPTS verify --projects gui
[INFO] Error stacktraces are turned on.
[INFO] Scanning for projects...
[INFO] 
[INFO] ------------------------< fxmi:gui >------------------------
[INFO] Building rudi-gui 1.20.0-SNAPSHOT
[INFO] --------------------------------[ pom ]---------------------------------
[INFO] 
[INFO] --- frontend-maven-plugin:1.3:install-node-and-npm (install node and npm) @ gui ---
[INFO] Node v8.16.2 is already installed.
[INFO] NPM 6.4.1 is already installed.
[INFO] 
[INFO] --- frontend-maven-plugin:1.3:npm (npm install) @ gui ---
[INFO] Running 'npm ci --cache ../.npm' in /builds/gitlab/im/fxmi/gui
[INFO] 
[INFO] > uws@9.14.0 install /builds/gitlab/im/fxmi/gui/node_modules/uws
[INFO] > node-gyp rebuild > build_log.txt 2>&1 || exit 0
[INFO] 
[INFO] 
[INFO] > fsevents@1.1.3 install /builds/gitlab/im/fxmi/gui/node_modules/fsevents
[INFO] > node install
[INFO] 
[INFO] 
[INFO] > node-sass@4.7.2 install /builds/gitlab/im/fxmi/gui/node_modules/node-sass
[INFO] > node scripts/install.js
[INFO] 
[INFO] Cached binary found at /builds/gitlab/im/fxmi/.npm/node-sass/4.7.2/linux-x64-57_binding.node
[INFO] 
[INFO] > node-sass@4.7.2 postinstall /builds/gitlab/im/fxmi/gui/node_modules/node-sass
[INFO] > node scripts/build.js
[INFO] 
[INFO] Binary found at /builds/gitlab/im/fxmi/gui/node_modules/node-sass/vendor/linux-x64-57/binding.node
[INFO] Testing binary
[INFO] Binary is fine

Every time the job times out, it stops here:

[INFO] > uws@9.14.0 install /builds/gitlab/im/fxmi/gui/node_modules/uws
[INFO] > node-gyp rebuild > build_log.txt 2>&1 || exit 0
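Because `node-gyp rebuild` redirects all of its output to build_log.txt (and `|| exit 0` swallows a failure, though not a hang), the job log shows nothing about where it stalls. One way to surface that output is a small wrapper that runs the build under coreutils `timeout` with a limit below the 1h job timeout and, on failure, prints the log file. This is a sketch: `run_with_timeout` is a hypothetical helper name, and the log path gui/node_modules/uws/build_log.txt is an assumption based on the install path in the log above.

```shell
#!/bin/sh
# Hypothetical helper: run_with_timeout LOG_FILE DURATION COMMAND [ARGS...]
# Runs COMMAND under coreutils `timeout`. If COMMAND fails or times out,
# the contents of LOG_FILE (if it exists) are printed to stderr, so output
# that was redirected away from the job log becomes visible.
run_with_timeout() {
  log_file=$1
  duration=$2
  shift 2

  status=0
  # `|| status=$?` keeps a failure from aborting a shell running with `set -e`.
  timeout "$duration" "$@" || status=$?

  if [ "$status" -ne 0 ]; then
    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
      echo "Command timed out after $duration" >&2
    fi
    if [ -f "$log_file" ]; then
      echo "--- contents of $log_file ---" >&2
      cat "$log_file" >&2
    fi
  fi
  return "$status"
}
```

In the build-gui job this could be called as `run_with_timeout gui/node_modules/uws/build_log.txt 30m mvn $MAVEN_CLI_OPTS verify --projects gui` (30m is an arbitrary example limit). The job then fails with the uws build output in its log, instead of running until the 1h job timeout with no further output.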