Environment:
- K8s v1.14.1 on bare metal, deployed via kubespray v2.10.0 on a single Debian 9.9 node. All default inventory settings except: flannel instead of calico, kube_proxy_mode set to iptables, and helm enabled
- Rook installed via the latest helm chart (v1.0.1):
helm install --namespace rook-ceph rook-release/rook-ceph --set agent.flexVolumeDirPath=/var/lib/kubelet/volume-plugins
with cluster-test.yml and storageclass-test.yml applied. This storage class was made the default.
- GitLab installed via helm (chart 11.11)
- nginx ingress service changed from LoadBalancer to externalIP
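For clarity, the LoadBalancer-to-externalIP change above amounts to something like this on the ingress controller's Service (the IP here is a placeholder; the real node IP is redacted as XXX elsewhere in this post):

```yaml
# sketch only — field names per the Kubernetes Service spec
spec:
  type: ClusterIP
  externalIPs:
    - 203.0.113.10   # placeholder for the node's actual IP
```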
job log:
$ docker push "$BUILD_IMAGE_NAME"
The push refers to repository [registry.mydomain.com/web/auto-build-image/master]
0a667c142b26: Preparing
1e8ec32b2f91: Preparing
a21c0a6873db: Preparing
c895bf09456a: Preparing
968d46c1d20e: Preparing
b87598efb2f0: Preparing
f1b5933fe4b5: Preparing
b87598efb2f0: Waiting
f1b5933fe4b5: Waiting
1e8ec32b2f91: Layer already exists
968d46c1d20e: Layer already exists
b87598efb2f0: Layer already exists
a21c0a6873db: Layer already exists
f1b5933fe4b5: Layer already exists
0a667c142b26: Pushed
c895bf09456a: Retrying in 5 seconds
c895bf09456a: Retrying in 4 seconds
c895bf09456a: Retrying in 3 seconds
c895bf09456a: Retrying in 2 seconds
c895bf09456a: Retrying in 1 second
c895bf09456a: Retrying in 10 seconds
c895bf09456a: Retrying in 9 seconds
c895bf09456a: Retrying in 8 seconds
c895bf09456a: Retrying in 7 seconds
c895bf09456a: Retrying in 6 seconds
c895bf09456a: Retrying in 5 seconds
c895bf09456a: Retrying in 4 seconds
c895bf09456a: Retrying in 3 seconds
c895bf09456a: Retrying in 2 seconds
c895bf09456a: Retrying in 1 second
c895bf09456a: Retrying in 15 seconds
c895bf09456a: Retrying in 14 seconds
c895bf09456a: Retrying in 13 seconds
c895bf09456a: Retrying in 12 seconds
c895bf09456a: Retrying in 11 seconds
c895bf09456a: Retrying in 10 seconds
c895bf09456a: Retrying in 9 seconds
c895bf09456a: Retrying in 8 seconds
c895bf09456a: Retrying in 7 seconds
c895bf09456a: Retrying in 6 seconds
c895bf09456a: Retrying in 5 seconds
c895bf09456a: Retrying in 4 seconds
c895bf09456a: Retrying in 3 seconds
c895bf09456a: Retrying in 2 seconds
c895bf09456a: Retrying in 1 second
c895bf09456a: Retrying in 20 seconds
c895bf09456a: Retrying in 19 seconds
c895bf09456a: Retrying in 18 seconds
c895bf09456a: Retrying in 17 seconds
c895bf09456a: Retrying in 16 seconds
c895bf09456a: Retrying in 15 seconds
c895bf09456a: Retrying in 14 seconds
c895bf09456a: Retrying in 13 seconds
c895bf09456a: Retrying in 12 seconds
c895bf09456a: Retrying in 11 seconds
c895bf09456a: Retrying in 10 seconds
c895bf09456a: Retrying in 9 seconds
c895bf09456a: Retrying in 8 seconds
c895bf09456a: Retrying in 7 seconds
c895bf09456a: Retrying in 6 seconds
c895bf09456a: Retrying in 5 seconds
c895bf09456a: Retrying in 4 seconds
c895bf09456a: Retrying in 3 seconds
c895bf09456a: Retrying in 2 seconds
c895bf09456a: Retrying in 1 second
received unexpected HTTP status: 504 Gateway Time-out
ERROR: Job failed: command terminated with exit code 1
registry pod log:
time="2019-05-28T02:19:02.052602654Z" level=error msg="client disconnected during blob PATCH" auth.user.name=fury contentLength=-1 copied=24731056 error="http: unexpected EOF reading trailer" go.version=go1.11.2 http.request.host=registry.mydomain.com http.request.id=7ab88f1b-4e47-491f-aaeb-387ce10a70ce http.request.method=PATCH http.request.remoteaddr=10.233.64.1 http.request.uri="/v2/web/auto-build-image/master/blobs/uploads/705390eb-eb88-423e-9c72-d39433f35ac4?_state=BJG7VilD0PITqxhJw36_4PnZhiYu_56crVZFlxjFkUZ7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiNzA1MzkwZWItZWI4OC00MjNlLTljNzItZDM5NDMzZjM1YWM0IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjE2OjI5LjI3MjUyOTM1NFoifQ%3D%3D" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/4.9.0-9-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" vars.name="web/auto-build-image/master" vars.uuid=705390eb-eb88-423e-9c72-d39433f35ac4
...
10.233.64.117 - - [28/May/2019:02:30:19 +0000] "PATCH /v2/web/auto-build-image/master/blobs/uploads/1c334efa-e7df-43d1-a82b-e92fc7d67de8?_state=5vY-ETO868QRnHC5SwBmCUXvFcQDNzBSUj-8cye0aTp7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiMWMzMzRlZmEtZTdkZi00M2QxLWE4MmItZTkyZmM3ZDY3ZGU4IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjMwOjA5LjEzODczNzk0MloifQ%3D%3D HTTP/1.1" 500 89 "" "docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/4.9.0-9-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \\(linux\\))"
time="2019-05-28T02:35:59.851231763Z" level=error msg="response completed with error" auth.user.name=fury err.code=unknown err.detail="client disconnected" err.message="unknown error" go.version=go1.11.2 http.request.host=registry.mydomain.com http.request.id=d5280657-f0f2-4420-9fed-c34ab26caa04 http.request.method=PATCH http.request.remoteaddr=10.233.64.1 http.request.uri="/v2/web/auto-build-image/master/blobs/uploads/1c334efa-e7df-43d1-a82b-e92fc7d67de8?_state=5vY-ETO868QRnHC5SwBmCUXvFcQDNzBSUj-8cye0aTp7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiMWMzMzRlZmEtZTdkZi00M2QxLWE4MmItZTkyZmM3ZDY3ZGU4IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjMwOjA5LjEzODczNzk0MloifQ%3D%3D" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/4.9.0-9-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=5m40.543384288s http.response.status=500 http.response.written=89 vars.name="web/auto-build-image/master" vars.uuid=1c334efa-e7df-43d1-a82b-e92fc7d67de8
2019/05/28 02:36:46 http: multiple response.WriteHeader calls
10.233.64.112 - - [28/May/2019:02:31:55 +0000] "PATCH /v2/web/auto-build-image/master/blobs/uploads/a949ebbb-cbd2-4b94-81e7-3b009ea95af1?_state=omQnXuq6SJvTunGR1PM8dBp7E4bYWu3nIs-bsJ3-lNN7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiYTk0OWViYmItY2JkMi00Yjk0LTgxZTctM2IwMDllYTk1YWYxIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjMxOjQ5LjkzMjM2NzA1WiJ9 HTTP/1.1" 500 89 "" "docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/4.9.0-9-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \\(linux\\))"
time="2019-05-28T02:36:46.646680713Z" level=error msg="response completed with error" auth.user.name=fury err.code=unknown err.detail="client disconnected" err.message="unknown error" go.version=go1.11.2 http.request.host=registry.mydomain.com http.request.id=2300b4cc-7f3c-44d5-b370-c210b36fc50e http.request.method=PATCH http.request.remoteaddr=10.233.64.1 http.request.uri="/v2/web/auto-build-image/master/blobs/uploads/a949ebbb-cbd2-4b94-81e7-3b009ea95af1?_state=omQnXuq6SJvTunGR1PM8dBp7E4bYWu3nIs-bsJ3-lNN7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiYTk0OWViYmItY2JkMi00Yjk0LTgxZTctM2IwMDllYTk1YWYxIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjMxOjQ5LjkzMjM2NzA1WiJ9" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/4.9.0-9-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=4m51.363115069s http.response.status=500 http.response.written=89 vars.name="web/auto-build-image/master" vars.uuid=a949ebbb-cbd2-4b94-81e7-3b009ea95af1
nginx-ingress log (where XXX is the node's actual IP):
XXX - [XXX] - - [28/May/2019:02:22:19 +0000] "PUT /registry/docker/registry/v2/repositories/web/auto-build-image/master/_uploads/397a7c95-4c6d-446a-ba2f-0702e5855150/startedat HTTP/1.1" 200 0 "-" "aws-sdk-go/1.15.11 (go1.11.2; linux; amd64)" 1093 0.030 [default-gitlab-minio-svc-9000] 10.233.64.139:9000 0 0.028 200 21d1666fff24c3894bf07d2971164211
XXX - [XXX] - - [28/May/2019:02:22:21 +0000] "PUT /registry/docker/registry/v2/repositories/web/auto-build-image/master/_uploads/397a7c95-4c6d-446a-ba2f-0702e5855150/hashstates/sha256/0 HTTP/1.1" 200 0 "-" "aws-sdk-go/1.15.11 (go1.11.2; linux; amd64)" 1192 0.009 [default-gitlab-minio-svc-9000] 10.233.64.139:9000 0 0.008 200 6ae2a7087337387128b6a182e37ae273
XXX - [XXX] - - [28/May/2019:02:22:38 +0000] "POST /api/v4/jobs/request HTTP/1.1" 204 0 "-" "gitlab-runner 11.11.0 (11-11-stable; go1.8.7; linux/amd64)" 917 0.041 [default-gitlab-unicorn-8181] 10.233.64.114:8181 0 0.044 204 f22af74c035dfffce71006e65e200457
2019/05/28 02:23:48 [error] 2252#2252: *28463 upstream timed out (110: Connection timed out) while sending request to upstream, client: 10.233.64.1, server: registry.mydomain.com, request: "PATCH /v2/web/auto-build-image/master/blobs/uploads/397a7c95-4c6d-446a-ba2f-0702e5855150?_state=wL6Un-taQENi2-I9YMYXPyiXM6sqB8t-Yzdi5E1dUPJ7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiMzk3YTdjOTUtNGM2ZC00NDZhLWJhMmYtMDcwMmU1ODU1MTUwIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjIyOjE4LjE1MTkzMzgwMloifQ%3D%3D HTTP/1.1", upstream: "http://10.233.64.110:5000/v2/web/auto-build-image/master/blobs/uploads/397a7c95-4c6d-446a-ba2f-0702e5855150?_state=wL6Un-taQENi2-I9YMYXPyiXM6sqB8t-Yzdi5E1dUPJ7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiMzk3YTdjOTUtNGM2ZC00NDZhLWJhMmYtMDcwMmU1ODU1MTUwIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjIyOjE4LjE1MTkzMzgwMloifQ%3D%3D", host: "registry.mydomain.com"
10.233.64.1 - [10.233.64.1] - - [28/May/2019:02:23:48 +0000] "PATCH /v2/web/auto-build-image/master/blobs/uploads/397a7c95-4c6d-446a-ba2f-0702e5855150?_state=wL6Un-taQENi2-I9YMYXPyiXM6sqB8t-Yzdi5E1dUPJ7Ik5hbWUiOiJ3ZWIvYXV0by1idWlsZC1pbWFnZS9tYXN0ZXIiLCJVVUlEIjoiMzk3YTdjOTUtNGM2ZC00NDZhLWJhMmYtMDcwMmU1ODU1MTUwIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE5LTA1LTI4VDAyOjIyOjE4LjE1MTkzMzgwMloifQ%3D%3D HTTP/1.1" 504 160 "-" "docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/4.9.0-9-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \x5C(linux\x5C))" 24096768 86.710 [default-gitlab-registry-5000] 10.233.64.110:5000 0 86.707 504 219162a5049d01db9fa5677b9ae0cf68
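In case it's relevant: these are the nginx-ingress annotations that are supposed to govern upload size and upstream timeouts on the registry Ingress, per the ingress-nginx controller docs. I haven't confirmed what values the GitLab chart actually sets here, so treat this as a sketch of what I'd expect to see, not my current config:

```yaml
# annotation names from the ingress-nginx controller documentation
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "0"       # disables the request body size limit
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"  # controller default is 60s
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
```

The 504 above lands at roughly 86 seconds, which is suspiciously close to the controller's 60-second default plus retry overhead.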
I’ve tried installing the runner as a Docker runner on two different machines to no avail.
The failure is consistently on that particular layer in this project; in the other project I've tried building, it's consistently some other (but again consistent) layer.
The cluster's external DNS runs through Cloudflare, but because Cloudflare has a 100 MB upload limit, I've set up /etc/hosts entries and gitlab-runner extra_hosts so that gitlab.mydomain.com and registry.mydomain.com point directly at the node's IP. I suspected traffic was still routing through Cloudflare, but according to the nginx log only 22-24 MB of the upload makes it through before the 504 (after about a minute: either a terrible transfer rate, or something is just locking up?).
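The runner side of that override is just the extra_hosts option in config.toml, roughly like this (placeholder IP):

```toml
# /etc/gitlab-runner/config.toml — IP is a placeholder for the node's real address
[[runners]]
  [runners.docker]
    extra_hosts = ["gitlab.mydomain.com:203.0.113.10", "registry.mydomain.com:203.0.113.10"]
```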
I'm leaning toward a minio problem, since I don't see this issue on my cluster at work, which was deployed via the old omnibus helm chart but is otherwise very similar (and uses no Cloudflare at all).
Any ideas?