GitLab Runner 12.10 - network per build - healthcheck for services times out

benkelukas · May 12, 2020, 1:19pm

Hi, I’m having trouble with the healthcheck for services in GitLab CI when using network per build. I have a number of services which start just fine and work as expected, but the healthcheck always ends in a timeout, which adds significant time to the build duration. This worked previously, when I was not using the network per build feature. I have all the appropriate ports exposed in my Dockerfiles (I cannot share them, as they are on my company’s private Artifactory).

I’m seeing the following logs when running a job with services defined (again, the services themselves work as expected, but waiting for the healthcheck prolongs the overall build time):

Version information

GitLab: 12.10 - self managed
Runner: 12.10 - self managed
using Docker with socket mounting

Relevant config files

Relevant part of .gitlab-ci.yml (sensitive details redacted):

component tests:
  tags:
    ...
  image: **redacted**/php72-cli:2.1.0
  services:
    - name: **redacted**
      alias: **redacted**
    - .... other services
  variables:
    ...
  stage: test
  script:
    - ...

Example Dockerfile

....
    EXPOSE 8091
....

Troubleshooting

I’ve double-checked my exposed ports and read the documentation on healthchecks and services; I’m not sure what else I can do.

If anybody has any ideas or clues, they would be much appreciated. Thanks very much in advance!

dnsmichi · May 12, 2020, 5:51pm

Hi,

This is an area I haven’t worked in before, so please bear with me if my guesses are wrong.

It sounds like a race condition: the service comes up, but the health check does not detect it soon enough. Or there is a problem with the exposed ports. Is there any chance that your Dockerfile exposes multiple ports? That could point to

Cheers,
Michael

benkelukas · May 12, 2020, 6:37pm

Hey Michael, thanks a lot for your response.
As for exposed ports, I’ve checked with docker inspect, and there is only one exposed port per service.

As for a race condition, I think that is unlikely, because when I remove FF_NETWORK_PER_BUILD: 1 (relevant docs here) from the variables in my .gitlab-ci.yml, the problem goes away; please see the screenshot:

Unfortunately, without the flag the services cannot see each other, which is a problem for me because I’m trying to run integration tests.
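For reference, a minimal sketch of where the feature flag sits in a job like the one above. All image names, aliases and the curl check are hypothetical placeholders standing in for the redacted values (port 8091 is taken from the example Dockerfile), and the script line assumes the service answers HTTP on that port and that curl is installed in the job image. With network per build enabled, the job container and its services should share one per-build Docker network, so they can reach each other by alias; without the flag, the Docker executor falls back to legacy container links.

```yaml
component tests:
  stage: test
  # Hypothetical placeholder image; the real one is redacted above.
  image: registry.example.com/php72-cli:2.1.0
  services:
    # Hypothetical service images and aliases.
    - name: registry.example.com/service-a:latest
      alias: service-a
    - name: registry.example.com/service-b:latest
      alias: service-b
  variables:
    # Feature flag that creates a dedicated Docker network for each build.
    # Removing this line falls back to the legacy container links.
    FF_NETWORK_PER_BUILD: 1
  script:
    # Hypothetical check: reach service-a by its alias on the exposed port
    # (assumes an HTTP endpoint on 8091 and curl in the job image).
    - curl --fail http://service-a:8091/
```

Removing only the `FF_NETWORK_PER_BUILD` line from such a job is the A/B comparison described in this post: the healthcheck stops timing out, but service-to-service traffic by alias is lost.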

Do you have any other ideas?

Thanks again for your reply.

dnsmichi · May 13, 2020, 11:15am

Hi,

I’ve asked our engineers; maybe you have hit a bug here that needs to be investigated. Your analysis of enabling the feature flag and using network per build is a good one: it narrows the search down to the health checks again. Maybe the Docker version introduces trouble here.

That being said, please collect all the details here. Ideally, create a reproducible environment and share it in a new bug report.

Thanks & cheers,
Michael

benkelukas · May 13, 2020, 3:37pm

Hey Michael,
Thanks for getting back to me so quickly. I’ve filed bug report 25660.

I’ve also included a public repository with two pipelines that demonstrate the issue.

I hope that’s enough, and thanks for your support.