We are seeing a substantial number of 500 errors during our builds, and the rate has increased over the last few days. We host our own runners and use GitLab’s control plane.
During our build, after we do a shallow clone of our repo, we then fetch a limited depth of revs.
Then, if the running pipeline job determines that this pipeline is no longer the latest, we get it to cancel itself with:
# Cancel this pipeline if it is no longer the latest for the current branch
[[ $CI_BRANCH_LATEST_COMMIT_SHA != "$CI_COMMIT_SHA" ]] && curl --header "PRIVATE-TOKEN: $GITLAB_USER_TOKEN" -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/pipelines/$CI_PIPELINE_ID/cancel"
All of these steps work in the general case. However, we are seeing a sharp uptick in random 500 errors coming from GitLab’s CI servers:
fatal: unable to access 'https://gitlab-ci-token:[MASKED]@gitlab.com/redacted/redacted.git/': The requested URL returned error: 500
and:
error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500 Internal Server Error
fatal: The remote end hung up unexpectedly
Sometimes the original clone itself errors too (but we have set up a retry on that).
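For anyone hitting the same thing, here is a minimal sketch of the kind of retry wrapper that could be put around the fetch step as well. It is an illustration, not our exact script: the `retry` function name, the `RETRY_DELAY` variable, the attempt count, and the fetch depth in the usage comment are all assumptions.

```shell
# Hypothetical helper: run a command up to N times, doubling the delay
# between attempts. Usage: retry <max_attempts> <command> [args...]
retry() {
  local max_attempts=$1
  shift
  local delay=${RETRY_DELAY:-2}   # initial delay in seconds (assumed default)
  local attempt=1

  # Re-run the command until it succeeds or attempts are exhausted.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example use in the job script (depth value is illustrative):
# retry 5 git fetch --depth=50 origin "$CI_COMMIT_REF_NAME"
```

This only hides transient failures from the job; it does not explain why the 500s are happening in the first place.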
GitLab’s status page shows nothing wrong, but we are seeing instability. We will try to increase our resiliency, but can you see any increase in error rates on your servers?