Get net/http TLS handshake timeout

work_account · January 13, 2020, 3:57pm

Getting an intermittent error during my CI pipeline.
Setup:

2+ runners with docker executor. Docker v18 and v19, gitlab-runner v12.0 and v12.5
1 registry
I have scheduled pipelines to run a couple of times a day.

Error:

Every day, out of 5 pipeline executions, at least one fails. Usually the one in the morning.
The error message is generic when running docker push my-registry.some.co
journalctl -u docker shows:
dockerd[29874]: time="2020-01-13T07:22:08.328674059-05:00" level=error msg="Handler for POST /v1.39/images/create returned error: Get https://registry.mine.mine/v2/: net/http: TLS handshake timeout"

dockerd[10544]: time="2020-01-12T07:24:48.573875158-05:00" level=info msg="Attempting next endpoint for push after error: Get https://registry.mine.mine/v2/: net/http: TLS handshake timeout"

I use a proxy and have it configured. I believe it is configured correctly, because my pipelines for other projects work fine and this one works most of the time (about 80%).

Any suggestions?

dnsmichi · January 13, 2020, 4:22pm

Hi,

from a network perspective, where are the runners and the registry located? Since you mentioned a proxy: which connections are proxied, and which proxy and configuration are you using?

TLS handshake timeouts are caused not only by high-latency network connections, but also by limited CPU resources on the end performing the handshake. If the registry host is overloaded with other tasks or connections, for example, the cryptographic calculations can pile up and block, so the handshake request from the other end times out.
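A rough way to tell the two causes apart is to time the TCP connect and the TLS handshake separately: a slow connect points at the network, a slow handshake at the end doing the crypto. Below is a minimal Python sketch, not a definitive tool. It assumes the registry listens on port 443 and is reached directly (no proxy), and the function name is made up for illustration.

```python
import socket
import ssl
import time


def time_tls_handshake(host, port=443, timeout=10.0, context=None):
    """Return (tcp_seconds, tls_seconds) for one direct connection to host:port."""
    # Assumption: the default context trusts the registry's certificate.
    # For an internal CA, pass a context created with that CA file instead.
    context = context or ssl.create_default_context()

    t0 = time.perf_counter()
    # Plain TCP connect: dominated by network latency.
    sock = socket.create_connection((host, port), timeout=timeout)
    t1 = time.perf_counter()
    try:
        # wrap_socket runs the TLS handshake, which includes the server's crypto work.
        with context.wrap_socket(sock, server_hostname=host):
            t2 = time.perf_counter()
    except Exception:
        sock.close()
        raise
    return t1 - t0, t2 - t1
```

Calling `time_tls_handshake("registry.mine.mine")` from a runner around the time the pipeline usually fails would show which of the two numbers grows.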

Which TLS versions and ciphers are offered/used by the registry host? You can check that e.g. with sslscan.

Cheers,
Michael

work_account · January 13, 2020, 4:40pm

Hi Michael,

Thanks for the prompt reply!
The runners, GitLab, and the registry are all in the same VLAN, so I have http_proxy and no_proxy configured.
The CPU should not be a problem because the failures happen outside business hours, but I will keep an eye on that.
I’ll try sslscan, what should I expect to see?

dnsmichi · January 14, 2020, 8:49am

Hi,

sslscan should return the TLS versions and the cipher suites the registry offers. Some of these are more CPU-intensive than others, so it was a quick shot in the dark.
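If sslscan is not at hand, Python's ssl module can at least show what a single client ends up negotiating. This is only a sketch: it reports the one TLS version and cipher chosen for this connection, not the full list the server offers, and the function name is hypothetical.

```python
import socket
import ssl


def negotiated_tls(host, port=443, timeout=10.0, context=None):
    """Return the TLS version and cipher negotiated with host:port."""
    # Assumption: the default context trusts the registry's certificate;
    # pass a custom context for an internal CA.
    context = context or ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # cipher() returns (cipher name, protocol version, secret bits).
            name, _protocol, bits = tls.cipher()
            return {"version": tls.version(), "cipher": name, "bits": bits}
```

For example, `negotiated_tls("registry.mine.mine")` returns a small dict that can be compared against the sslscan output.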

Typically, I’d rather monitor the CPU load on the system, and especially that of the Docker registry daemon, and correlate these graphs with the TLS timeouts. You can do that e.g. with an HTTP check against the registry that runs periodically, à la curl registryurl.com or docker login registryurl.com, collecting the HTTP response time and state as metrics.
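The periodic HTTP check could look roughly like the Python sketch below, which writes one CSV row (timestamp, status, seconds) per probe. The function names, the interval, and the CSV layout are assumptions for illustration. Note that urllib honours the http_proxy, https_proxy and no_proxy environment variables, so run it with the same proxy settings the Docker daemon uses.

```python
import csv
import sys
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=15.0):
    """Return (status, seconds); status is an HTTP code or an error string."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # An HTTP error status (e.g. 401) still means the TLS handshake succeeded.
        status = err.code
    except OSError as err:
        # URLError, socket timeouts and TLS errors are all OSError subclasses.
        status = f"error: {err}"
    return status, time.perf_counter() - start


def monitor(url, interval=60.0, count=None, out=sys.stdout):
    """Probe url every `interval` seconds; stop after `count` probes if given."""
    writer = csv.writer(out)
    done = 0
    while count is None or done < count:
        status, seconds = probe(url)
        timestamp = datetime.now(timezone.utc).isoformat()
        writer.writerow([timestamp, status, f"{seconds:.3f}"])
        out.flush()
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

Something like `monitor("https://registry.mine.mine/v2/", interval=300)` with the output redirected to a file gives a series that can be lined up with the CPU graphs. An unauthenticated request to /v2/ is often answered with 401, which is fine here since only the timing and reachability matter.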

Cheers,
Michael

HuyVo2112 · July 10, 2024, 1:52pm

Hi @work_account,

We are running into the same problem with intermittent TLS handshake timeout errors when connecting to our GitLab registry.

Could you please share how you fixed it?

Thanks,
Huy
