Hi, I’m having trouble with the healthcheck for services in GitLab CI when using the network-per-build feature. I have a bunch of services which start just fine and work as expected, but the healthcheck always ends in a timeout, which adds significant time to the build duration. This worked previously, when I was not using the network-per-build feature. I have all the appropriate ports exposed in my Dockerfiles (I cannot share them, as they are on my company’s private Artifactory).
I’m seeing the following logs when running a job with defined services (again, the services themselves work as expected, but waiting for the healthcheck prolongs the overall build time)
- GitLab: 12.10 - self-managed
- Runner: 12.10 - self-managed
- using Docker with socket mounting
Relevant config files
relevant part of .gitlab-ci.yml (redacted sensitive stuff)
- name: **redacted**
- .... other services
I’ve double-checked my exposed ports and checked the documentation on healthchecks and services; there is not much else I can do, I guess.
If anybody has any ideas or clues, it would be much appreciated. Thanks very much in advance.
This is an area where I haven’t been before, so please bear with me if my guesses are wrong.
It sounds like a race condition where the service comes up but the health check does not detect it soon enough. Or there is a problem with the exposed ports. Is there any chance that your Dockerfile exposes multiple ports? That could point to
Hey Michael, thanks a lot for your response.
For exposed ports, I’ve checked with docker inspect, and there is only one exposed port for the service.
As for a race condition, I think that is unlikely, because when I remove FF_NETWORK_PER_BUILD: 1 (relevant docs here) from the variables in my .gitlab-ci.yml, the problem goes away, please see screenshot:
Unfortunately, without the flag the services do not see each other, which is a problem for me because I’m trying to run integration tests.
Do you have any other ideas?
Thanks again for your reply
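For readers following along, the kind of setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the job name, image names, aliases, and test script are hypothetical placeholders, not the redacted configuration from the original post. Only the `FF_NETWORK_PER_BUILD: 1` variable is taken from the thread.

```yaml
# Hypothetical sketch: job name, images, aliases, and script are placeholders.
variables:
  # Feature flag from the thread: the Docker executor creates a network per build.
  # Removing this line made the healthcheck timeout disappear for the poster,
  # but then the services could no longer see each other.
  FF_NETWORK_PER_BUILD: 1

integration-tests:
  image: registry.example.com/my-app-tests:latest
  services:
    - name: registry.example.com/my-service-a:latest
      alias: service-a
    - name: registry.example.com/my-service-b:latest
      alias: service-b
  script:
    # Assumed test entry point; it would reach the services via their aliases.
    - ./run-integration-tests.sh
```

Running the same job once with and once without the variable, as the poster did, is a simple way to confirm whether the timeout is tied to the feature flag.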
I’ve asked our engineers - maybe you have hit a bug here which needs to be investigated. Your analysis, comparing runs with and without the network-per-build feature flag, is a good one; it narrows things down to the health checks again … maybe the Docker version introduces trouble here.
That being said, please collect all the details in here. Best would be if you can create a reproducible environment and share it in a new bug report.
Thanks & cheers,
Thanks for getting back to me so quickly. I’ve filed bug report 25660.
I’ve also included a public repo with two pipelines which demonstrate the issue.
I hope it is enough, and thanks for your support.