GitLab Runner unreliable when cloning repository

Roughly one build in ten fails to clone from the GitLab repo, with errors such as the following:

remote: Internal server error
error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500 Internal Server Error
fatal: The remote end hung up unexpectedly
fatal: unable to access 'https://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@gitlab.com/blahblah.git/': The requested URL returned error: 500

A retry generally works, but this failure rate is too high for something intended for use as a regression test.

EDIT: This fails both on GitLab.com runners and on my own private Docker runner.

Is this a known issue, and is a fix planned?

(Maybe an automatic retry when the job fails before running any customer test code would at least be an improvement?)

This could be related to https://gitlab.com/gitlab-com/infrastructure/issues/3909#note_64513266.

If you keep seeing the failure, please comment in the thread.

I’ll see how it goes and let you know.

From a system perspective, I think the runner end needs to be more robust against these “should never happen” events. I’ve also seen the Docker image pull fail, BTW.

I would suggest at least a couple of retries (against different servers) before giving up.
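
To make the retry idea concrete, here is a minimal Python sketch of the behaviour being asked for. It is not how gitlab-runner works internally: `run_with_retries` and its parameters are hypothetical names, and it simply re-runs the same command with a growing delay rather than switching to a different server.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, base_delay=2.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code with exponential backoff.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails. `run` and `sleep` are injectable
    so the function can be exercised without touching the network.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

It could be wrapped around the clone step, for example `run_with_retries(["git", "clone", repo_url, "checkout"])`, where `repo_url` is a placeholder for the project URL.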

And possibly a different final state. An infrastructure failure (before any user code has been compiled at all) should not look the same as a test failure.
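
One way to picture a separate final state is a wrapper that reports setup problems with its own exit code. This is only a sketch under assumptions: `run_job` and the constants are hypothetical, the value 75 is borrowed from `EX_TEMPFAIL` in `sysexits.h`, and nothing here reflects how GitLab actually reports job status.

```python
import subprocess

EXIT_OK = 0
EXIT_TEST_FAILURE = 1
EXIT_INFRA_FAILURE = 75  # assumption: reuse EX_TEMPFAIL for "try again later"


def run_job(setup_cmd, test_cmd, run=subprocess.run):
    """Run setup (e.g. the clone), then the tests, and classify the result.

    A failure before any user code runs maps to EXIT_INFRA_FAILURE, so it
    can be told apart from a genuine test failure.
    """
    if run(setup_cmd).returncode != 0:
        return EXIT_INFRA_FAILURE
    if run(test_cmd).returncode != 0:
        return EXIT_TEST_FAILURE
    return EXIT_OK
```

A dashboard or notification hook could then treat code 75 as "infrastructure problem, retry" instead of "the build is broken".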

It is essential to a CI flow that the underlying system is very solid, as any failure signals to all developers that the build is broken.

Did you figure anything out on this? I’m seeing similar errors in a large repo that was working fine before GitLab 11.

I’ve only seen one random failure since then. However, I haven’t been using it as intensively.