Issue: CI builds using Docker are failing conditionally.
GitLab type: Self-Hosted (paid)
GitLab version: 12.6.2-ee
Docker image: Locally built Docker image (Debian Buster w/ PHP 7.4)
Notes: Previous CI/Docker images continue to work properly.
I’m working on a new CI/Deployment, which requires we upgrade our (local) Docker images we use.
I’ve built a new Debian Buster image the same way we’ve done previously for older OS versions.
Added a new runner, similar to all our others (30+). It didn’t work initially, failing with the same connection refused on port 443 as described below.
While attempting to debug the issue, I noticed if I was actively running the container while the job processed, it succeeded.
Example: gitlab-host:~# docker run -it --rm buster-php74 bash
Retrying the pipeline that had been failing then results in success.
Now, log out of the container mentioned above and retry the pipeline; it will fail with connection refused on port 443:
fatal: unable to access 'https://gitlab.example.com/project/project-api.git/': Failed to connect to gitlab.example.com port 443: Connection refused
Since you are not specifying the image keyword here, I’d say that this is run via the shell executor, and not in Docker itself. The GitLab runner config.toml would be interesting here.
If you start your Docker container once, it most likely maps ports (80, 443) to the host system, and then the CI pipeline succeeds. If you stop the container, those service ports are gone.
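For illustration, that would be the case if the container were started with published ports, something along these lines (using the image name from above):

    docker run -d -p 80:80 -p 443:443 buster-php74   # publishes 80/443 on the host for as long as the container runs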
If you add the following to your script section, it shows where this is executed.
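For example, something like this in the script section of .gitlab-ci.yml (the <gitlab host> placeholder stands in for your real hostname):

    script:
      - echo "Running on $(hostname) via <gitlab host>..."
      - exit 0   # stop here so nothing else in the job runs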
Hmm, we don’t specify “image” in any of our others, but I went ahead and added it for good measure (it is already specified in the config.toml); no change in the behavior.
FWIW, this is identical to our other runners with the exception of name/token/image
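A minimal sketch of the relevant config.toml entry, with the real name/token/image replaced by placeholders:

    [[runners]]
      name = "buster-php74-runner"        # placeholder
      url = "https://gitlab.example.com/"
      token = "xxxxxxxx"                  # redacted
      executor = "docker"
      [runners.docker]
        image = "buster-php74"
        privileged = false
        volumes = ["/cache"]

With the suggested echo added to the script section, the tail of the job output looks like this: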
+ eval 'echo "Running on $(hostname) via <gitlab host>..."
'
+++ hostname
++ echo 'Running on runner-xxxxxxx-project-759-concurrent-0 via <gitlab hostname>...'
Running on runner-xxxxxxx-project-759-concurrent-0 via <gitlab hostname>...
+ exit 0
When I see the message below I would straight away guess there is a connection issue. Have you checked the DNS? Maybe just run a curl or ping to the repository URL.
fatal: unable to access 'https://gitlab.example.com/project/project-api.git/': Failed to connect to gitlab.example.com port 443: Connection refused
Maybe something changed in Debian Buster compared to the older releases?
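For example, something like this at the top of the job (the exact commands are just a sketch):

    before_script:
      - getent hosts gitlab.example.com                   # does DNS resolve inside the container?
      - ping -c 3 gitlab.example.com || true              # basic reachability (ICMP may be blocked)
      - curl -svo /dev/null https://gitlab.example.com/   # can the HTTPS port be reached?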
Yeah, I did check them; they are set to our defaults. Furthermore, when running the containers manually, I can clone any repo just fine and do anything else network-related that I would expect to work, such as curl-ing the main gitlab.example.com page.
The Connection refused would indicate at the least that DNS is working, but the host is refusing, so I also double-checked our firewall to make sure we’re not blocking any of the traffic (though this host has been running for years and no one has made any firewall changes recently).
Just some extra info: while we were debugging using some echo commands, we found that we could hold the runner container open if we left a never-ending ping running. I did have to start a container manually (it doesn’t seem to matter which one I start) to get the build to progress far enough to reach the ping command, but from there I was able to grab the CI token info from the ENV and confirm I was able to clone the repo without issue.
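That check was roughly the following (URL taken from the error message above; CI_JOB_TOKEN is the job token the runner exports into the environment):

    git clone https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.example.com/project/project-api.git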
I’m trying to add a before_script to the failing builds so it will stay open long enough to examine, but so far a simple ping at the top isn’t keeping it open.
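Roughly what I’m attempting (a sketch; the target and duration are arbitrary):

    before_script:
      - ping -c 600 gitlab.example.com   # try to hold the container open for ~10 minutes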
In the interest of no one wasting time, we just tcpdump-ed the docker0 interface while one of the doomed containers was starting up and we were able to capture the following:
15:41:13.122317 IP (tos 0xc0, ttl 64, id 23994, offset 0, flags [none], proto ICMP (1), length 88)
<gitlab host> > 192.168.0.3: ICMP <gitlab host> tcp port https unreachable, length 68
Which sure seems to suggest there is some form of networking issue, though why a manually run container would resolve it is still unclear.
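For reference, the capture was done with something along these lines (exact flags and filter are approximate):

    tcpdump -vni docker0 icmp or port 443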
Does anyone know if the “runner-helper” containers do anything special with network?
@nightman68 yeah, it all fails before anything starts building
If I understood right, you set up a new runner with a new image. Have you tried running the jobs on the new runner with one of the old images that were working on the old runner?
If the git clone/fetch fails I would enable some git debugging in the CI file:
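Something along these lines (a sketch; GIT_TRACE and GIT_CURL_VERBOSE are standard git variables, and CI_DEBUG_TRACE is GitLab’s own debug switch):

    variables:
      GIT_TRACE: "1"           # low-level git activity
      GIT_CURL_VERBOSE: "1"    # HTTP/TLS details for the clone/fetch
      CI_DEBUG_TRACE: "true"   # full shell trace of the whole job (very noisy)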
It seems something changed with our docker0 interface and its traffic was running into our firewall (we use FireHOL, FWIW). Strange, because we didn’t make any actual changes to the firewall, so it is possibly related to the GitLab Runner upgrade we did earlier that week?
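In case it helps anyone hitting the same thing, a quick way to spot this kind of reject on the host (a sketch, not the exact commands we ran):

    iptables -nvL | grep -iE 'docker0|reject'   # look for REJECT rules that catch docker0 traffic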
@nightman68 thanks for suggesting the verbose options, it did help illustrate where the breakdown was happening.
@dnsmichi thanks for looking into it, it was a great learning experience with GitLab CI
Glad you could figure it out by yourself. I’ve learned new things with the great help from @nightman68 myself. Maybe you’ll stay here for a bit and try to help others too? Or you could throw in some likes showing your appreciation, or mark one reply as the solution.
In case you want to share your CI experience, there’s a great epic/issue for doing so. I added my feedback there too.