CI Build using new (local) docker image

Issue: CI builds using Docker are failing conditionally.
GitLab type: Self-hosted (paid)
GitLab version: 12.6.2-ee
Docker image: locally built Docker image (Debian Buster w/ PHP 7.4)
Notes: Previous CI/Docker images continue to work properly.

I’m working on a new CI/deployment setup, which requires upgrading the (local) Docker images we use.
I’ve built a new Debian Buster image the same way we’ve built the images for older OS versions.
I added a new runner, configured like all our others (30+). It didn’t work initially, failing with the same connection-refused error on port 443 described below.

While attempting to debug the issue, I noticed that if the container was actively running while the job was processing, the job succeeded.

Example:
gitlab-host:~# docker run -it --rm buster-php74 bash

Retrying the pipeline that had been failing now results in success.

Now, log out of the container mentioned above and retry the pipeline; it fails with the connection-refused error on port 443:

fatal: unable to access 'https://gitlab.example.com/project/project-api.git/': Failed to connect to gitlab.example.com port 443: Connection refused

Open to suggestions as to what’s going on here.

Hi,

Can you share the .gitlab-ci.yml configuration? I have a hard time understanding the relationship between running a container manually and the CI runners.

Cheers,
Michael

Yeah, same here. I just happened to have the container open once when I retried the pipeline, noticed the job worked, and the behavior has been consistent.

.gitlab-ci.yml

stages:
  - build

branch-build:
  variables:
    CI_DEBUG_TRACE: "true"
  stage: build
  script:
    - APP_ENV=dev composer install --optimize-autoloader --no-interaction
    - composer dump-env dev
    - bin/console cache:clear --env=dev --no-warmup
    - bin/console cache:warmup --env=dev
    - bin/console doctrine:database:create --if-not-exists
    - bin/console doctrine:migrations:migrate --no-interaction
    - vendor/bin/phpcs --warning-severity=8 --extensions=php --standard=PSR1,PSR2 src/
  only:
    - branches
    - pushes
  except:
    - dev
    - master
  tags:
    - symfony
    - docker
    - php74
  services:
    - mysql:5.6

Since you are not specifying the image keyword here, I’d say that this is run via the shell executor, and not in Docker itself. The GitLab runner config.toml would be interesting here.

If you start your Docker container manually, it quite likely maps ports (80, 443) to the host system, and then the CI pipeline succeeds. Once you stop the container, those service ports are gone.
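
A quick way to check that theory: while your manually started container is running, list any published ports on the Docker host. Nothing here is specific to your setup, just plain Docker commands (the container name is a placeholder):

gitlab-host:~# docker ps --format '{{.Names}}: {{.Ports}}'
gitlab-host:~# docker port <container-name>

If neither command shows a mapping to 80/443, the port-mapping theory is out and it’s something else on the network side.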

If you add the following to your script section, it shows where this is executed.

  script:
    - echo $(hostname -f)
    - APP_ENV=dev composer install --optimize-autoloader --no-interaction
    - composer dump-env dev
    - bin/console cache:clear --env=dev --no-warmup
    - bin/console cache:warmup --env=dev
    - bin/console doctrine:database:create --if-not-exists
    - bin/console doctrine:migrations:migrate --no-interaction
    - vendor/bin/phpcs --warning-severity=8 --extensions=php --standard=PSR1,PSR2 src/

Cheers,
Michael

Hmm, we don’t specify “image” in any of our other pipelines, but I went ahead and added it for good measure (it is specified in the config.toml); no change in the behavior.

FWIW, this is identical to our other runners with the exception of name/token/image.

/etc/gitlab-runner/config.toml (snippet):

[[runners]]
  name = "Symfony/composer project building using docker with PHP 7.4"
  url = "https://gitlab.example.com/ci"
  token = "xxxxxxx"
  executor = "docker"
  environment = ["MYSQL_ALLOW_EMPTY_PASSWORD=1"]
  [runners.docker]
    tls_verify = false
    image = "buster-php74"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    pull_policy = "if-not-present"
    shm_size = 0

Output from the echo:

+ eval 'echo "Running on $(hostname) via <gitlab host>..."
'
+++ hostname
++ echo 'Running on runner-xxxxxxx-project-759-concurrent-0 via <gitlab hostname>...'
Running on runner-xxxxxxx-project-759-concurrent-0 via <gitlab hostname>...
+ exit 0

Hi there,

When I see the message below, my first guess would be a connection issue. Have you checked DNS? Maybe just run a curl or ping against the repository URL.

fatal: unable to access 'https://gitlab.example.com/project/project-api.git/': Failed to connect to gitlab.example.com port 443: Connection refused

Maybe something changed in Debian Buster compared to the older releases?
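
For example, straight from the image you built (image name taken from your post; this assumes ping and curl are installed in it):

gitlab-host:~# docker run --rm buster-php74 getent hosts gitlab.example.com
gitlab-host:~# docker run --rm buster-php74 ping -c 3 gitlab.example.com
gitlab-host:~# docker run --rm buster-php74 curl -sv -o /dev/null https://gitlab.example.com

The first command checks DNS resolution, the other two check basic reachability and the HTTPS port itself.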


Yeah, I did check DNS; it is set to our defaults. Furthermore, when running the containers manually I can clone any repo just fine and do anything else network-related that I’d expect to work, such as curl-ing the main gitlab.example.com page.

The “Connection refused” would indicate that DNS is at least working but the host is refusing the connection, so I also double-checked our firewall to make sure we’re not blocking any of the traffic (though this host has been running for years and no one has made any firewall changes recently).

Just some extra info: while we were debugging with some echo commands, we found that we could hold the runner container open if we left a never-ending ping running. I did have to start a container manually (it doesn’t seem to matter which one) to get the build to progress far enough to reach the ping command, but from there I was able to grab the CI token info from the ENV and confirm that I could clone the repo without issue.
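
For reference, this is roughly what we’re doing from the runner host while the ping holds the job container open (container name taken from the echo output above; CI_JOB_TOKEN is the standard predefined variable):

gitlab-host:~# docker ps --filter name=runner- --format '{{.Names}}'
gitlab-host:~# docker exec -it runner-xxxxxxx-project-759-concurrent-0 bash
# then, inside the job container:
env | grep CI_JOB
git ls-remote https://gitlab-ci-token:$CI_JOB_TOKEN@gitlab.example.com/project/project-api.git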

I’m trying to add a before_script to the failing builds so the container will stay open long enough to examine it, but so far a simple ping at the top isn’t keeping it open.

One more question: does the connection error come from git when the repository is cloned, i.e. before the script tasks run?

In the interest of not wasting anyone’s time, we tcpdump-ed the docker0 interface while one of the doomed containers was starting up and captured the following:

15:41:13.122317 IP (tos 0xc0, ttl 64, id 23994, offset 0, flags [none], proto ICMP (1), length 88)
    <gitlab host> > 192.168.0.3: ICMP <gitlab host> tcp port https unreachable, length 68

That sure seems to suggest there is some form of networking issue, though why a manually run container would resolve it is still unclear.

Does anyone know if the “runner-helper” containers do anything special with the network?
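
In case anyone wants to reproduce the capture, something along these lines on the Docker host shows the same thing (the exact filter doesn’t matter much):

gitlab-host:~# tcpdump -nvi docker0 'icmp or tcp port 443'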

@nightman68 yeah, it all fails before anything starts building.

If I understood right, you set up a new runner with a new image. Have you tried running the jobs on the new runner with one of the old images that were working on the old runner?

If the git clone/fetch fails, I would enable some git debugging in the CI file:

variables:
  GIT_CURL_VERBOSE: 1
  GIT_TRACE: 1
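
You can get the same verbose output outside of CI too, by running the clone by hand in the image (assuming git is installed there; for a private repo it will stop at authentication, but the connection part is what matters here):

gitlab-host:~# docker run --rm -e GIT_CURL_VERBOSE=1 -e GIT_TRACE=1 buster-php74 \
    git clone https://gitlab.example.com/project/project-api.git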

I never had issues with the helper images…


Just wanted to comment that this got resolved.

It seems that something changed on our docker0 interface and the traffic was running into our firewall (we use FireHOL, FWIW). Strange, because we didn’t make any actual changes to the firewall, so it’s possibly related to the GitLab Runner upgrade we did earlier that week?
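
For anyone hitting the same thing: the ICMP “port unreachable” in the tcpdump is what you typically get from a firewall REJECT rule rather than from GitLab itself refusing the connection. A check along these lines on the Docker host (chain names depend on your setup) shows which rule is doing it; the packet counters on the matching rule increase every time a doomed job starts:

gitlab-host:~# iptables -vnL INPUT | grep -i reject
gitlab-host:~# iptables -vnL FORWARD | grep -i reject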

@nightman68 thanks for suggesting the verbose options; they did help illustrate where the breakdown was happening.

@dnsmichi thanks for looking into it; this was a great learning experience with GitLab CI.

You’re welcome!

Glad you could figure it out yourself :slight_smile: I’ve learned new things myself with the great help from @nightman68 :slight_smile: Maybe you’ll stay around for a bit and try to help others too? Or throw in some likes to show your appreciation, or mark one reply as the solution :slight_smile:

In case you want to share your CI experience, there’s a great epic/issue for doing so. I added my feedback there too.

And in case you want to dig even deeper into CI, my employer’s trainings are open-sourced: https://github.com/NETWAYS?utf8=✓&q=gitlab&type=&language= (just check the release pages with PDFs). I’m the author, in case you ask :wink:

Cheers,
Michael
