CI Job stuck after successful completion

I’m facing an issue where CI jobs execute successfully: the logs show “Job succeeded” and the artifacts can be viewed without issue. However, the affected job never leaves the running state until it times out several hours later.

The expected result is for the CI jobs to go to the completed state after finishing, which they did until the issue started yesterday.

I’m using a self-managed GitLab instance on Kubernetes, running the latest version (Helm Chart: gitlab-4.10.0, App: 13.10.0-ee).
I’m also using the GitLab runner included with the Helm chart.
There have been no configuration changes in either GitLab or .gitlab-ci.yml, and no updates were applied in the meantime.

So far I’ve taken the following steps to attempt to troubleshoot the issue:

  • Checked the available disk space for all related PersistentVolumes in Kubernetes: no problems there
  • Checked the size of Sidekiq’s job queue after reading the following issue and its follow-up (Pipelines stuck in "Running" despite jobs having completed successfully (#47226), GitLab.org / GitLab FOSS): Sidekiq has 0 enqueued jobs, so an overflowing queue isn’t the issue
  • Tried another GitLab runner (official Docker image, Docker executor with a volume mount to allow it to use Docker on the host) on another server: this resulted in the same behaviour
  • Ran a CI job on another GitLab server (GitLab CE Docker image, 13.10.0) with a runner configured identically to the one above: this was successful, ruling out issues with the GitLab runner itself or the helper image used for the Docker and Kubernetes executors
  • Downgraded the instance to 13.9.0 and 13.8.0: this did not solve the issue either
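
To confirm the symptom from outside the web UI, the job status reported by the API can be compared with the job log. The following is only a minimal sketch, assuming a personal access token with API read access and a numeric project ID; the helper names (fetch, fetch_job_and_trace, looks_stuck) are made up for illustration and are not part of GitLab. It uses the Jobs API endpoints for a single job and its trace.

```python
import json
import urllib.request

# Final line the runner prints in the job log, as quoted in the post above.
SUCCESS_MARKER = "Job succeeded"


def fetch(url, token):
    """GET a URL with a GitLab personal access token and return the body as text."""
    req = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_job_and_trace(base_url, project_id, job_id, token):
    """Return (job metadata dict, job log text) for a single job.

    base_url, project_id, job_id and token are placeholders to be
    replaced with values from your own instance.
    """
    job_url = f"{base_url}/api/v4/projects/{project_id}/jobs/{job_id}"
    job = json.loads(fetch(job_url, token))
    trace = fetch(job_url + "/trace", token)
    return job, trace


def looks_stuck(job, trace):
    """True if the log reports success while the API still says 'running'."""
    return job.get("status") == "running" and SUCCESS_MARKER in trace
```

If looks_stuck returns True for a job, the runner has finished its work but the job's state on the server was never updated, which would match the behaviour described above.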

I would appreciate any ideas or suggestions on how to find and/or resolve the issue.
Feel free to ask me any questions about my setup if you feel there is essential information missing.

Any updates/ideas from someone?

I have a very similar issue: in my case the job completes about 5 minutes after showing “Job succeeded”, and only then does the next stage start. This increases our build time a lot…

Any update here?

I’m also experiencing the same issue.