Intermittent 500 errors on token retrieval and git clone

We are seeing a substantial number of 500 errors during our builds, with an increase over the last few days. We host our own runners and use GitLab’s control plane.

During our build, after we do a shallow clone of our repo, we then fetch a limited depth of revs.
Then, if the running pipeline job determines that this pipeline is no longer the latest, we get it to cancel itself with:

# Cancel this pipeline if it is no longer the latest for the current branch
[[ $CI_BRANCH_LATEST_COMMIT_SHA != "$CI_COMMIT_SHA" ]] && curl --header "PRIVATE-TOKEN: $GITLAB_USER_TOKEN" -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/pipelines/$CI_PIPELINE_ID/cancel"

All of these steps work in the general case. However, we are seeing a sharp uptick in random 500 errors coming from GitLab’s CI servers:

fatal: unable to access 'https://gitlab-ci-token:[MASKED]@gitlab.com/redacted/redacted.git/': The requested URL returned error: 500

and:

error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500 Internal Server Error
fatal: The remote end hung up unexpectedly

Sometimes the original clone itself errors too (but we have set up a retry on that).

GitLab’s status page shows nothing wrong, but we are seeing instability. We shall try to increase our resiliency, but can you see any increase in error rates on your servers?

Hello, @hlascelles and thanks for reaching out to us.

Are you still experiencing these issues? I think that our CI should be working properly now.

You can always find more status info via https://twitter.com/gitlabstatus, but feel free to reach out to our Support by opening a ticket at https://support.gitlab.com

We have wrapped it all in custom retry code, so we aren’t seeing failures bubble up any more.
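For anyone hitting the same errors, here is a minimal sketch of the kind of retry wrapper we mean. It is not our exact code: the `retry` function name, the `RETRY_MAX_ATTEMPTS` and `RETRY_INITIAL_DELAY` variables, and their defaults are all illustrative assumptions.

```shell
#!/usr/bin/env bash

# retry: run the given command, retrying with exponential backoff on failure.
# Hypothetical helper; the variable names and defaults are assumptions.
retry() {
  local max_attempts="${RETRY_MAX_ATTEMPTS:-5}"   # total tries before giving up
  local delay="${RETRY_INITIAL_DELAY:-2}"         # seconds; doubled after each failure
  local attempt=1
  local status=0

  while :; do
    status=0
    # "|| status=$?" captures the exit code without tripping "set -e".
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts (exit $status): $*" >&2
      return "$status"
    fi
    echo "retry: attempt $attempt failed (exit $status); sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example usage (the fetch depth of 50 is an assumed value):
#   retry git fetch --depth 50 origin "$CI_COMMIT_REF_NAME"
#   retry curl --fail --silent --show-error \
#     --header "PRIVATE-TOKEN: $GITLAB_USER_TOKEN" \
#     -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/pipelines/$CI_PIPELINE_ID/cancel"
```

Note the `--fail` flag on the curl call: without it, curl exits 0 even when the server answers with a 500, so the wrapper would never see a failure to retry.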