GitLab CI jobs queued on retry

Hello GitLab community!

I have a pipeline that is configured to retry failing jobs. That worked perfectly for a long time, but over the last few weeks we have observed that something changed. When a job in the pipeline fails, its retry stays pending with the message:
This job is in pending state and is waiting to be picked by a runner
We created a couple of additional runners because we thought the problem was a lack of them, but it isn't: the only jobs that stay pending are the ones with the retry option. I configure retry using the snippet below:

.runner_tags: &runner_tags
  image: ${ANSIBLE_DOCKER_IMAGE}:${ANSIBLE_DOCKER_TAG}
  retry: 1
  extends: .ansible_run_tags
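
For reference, as far as I understand, `retry: 1` is just the short form of the expanded syntax below; I show it only for clarity, `when: always` being the default and matching what we use:

.runner_tags: &runner_tags
  image: ${ANSIBLE_DOCKER_IMAGE}:${ANSIBLE_DOCKER_TAG}
  # long form of `retry: 1`: retry at most once, on any failure type
  retry:
    max: 1
    when: always
  extends: .ansible_run_tags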

I'm using public GitLab (gitlab.com) with my own custom runners (Docker executor). One additional note: I can't prove it, but I think the pending time is related to how long the failing job took.

I hope someone can help me find a solution. It's annoying because we trigger these pipelines through the API and another process waits, with a timeout, until the pipeline finishes. So the pipeline is still running, but the main process reports that it took too long :frowning:
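
To give a bit more context, the calling process does roughly this (a simplified Python sketch using the requests library; the project ID, token, and timeout are placeholders, not our real values):

import time
import requests  # assumes the requests library is installed

GITLAB = "https://gitlab.com/api/v4"
PROJECT_ID = "12345"                      # placeholder, not our real project ID
HEADERS = {"PRIVATE-TOKEN": "<api token>"}
TIMEOUT = 3600                            # placeholder timeout in seconds

# start a pipeline on master
pipeline = requests.post(
    f"{GITLAB}/projects/{PROJECT_ID}/pipeline",
    headers=HEADERS, params={"ref": "master"},
).json()

# poll until the pipeline reaches a finished state or the timeout expires
deadline = time.time() + TIMEOUT
while time.time() < deadline:
    status = requests.get(
        f"{GITLAB}/projects/{PROJECT_ID}/pipelines/{pipeline['id']}",
        headers=HEADERS,
    ).json()["status"]
    if status in ("success", "failed", "canceled", "skipped"):
        break
    time.sleep(30)
else:
    # this is the error we keep hitting while the retried job sits in pending
    raise TimeoutError("pipeline did not finish within the timeout")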

My job is queued, and in the runner logs I can see:

Nov 23 09:15:24 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot  builds=0 runner=cxs9Mjg-
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runners to channel  builds=0
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel  builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Checking for jobs...nothing  runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Processing runner  builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring executor from provider  builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring job slot  builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot  builds=0 runner=2nAJXMiK
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel  builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Checking for jobs...nothing  runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Processing runner  builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring executor from provider  builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring job slot  builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot  builds=0 runner=cxs9Mjg-
Nov 23 09:15:29 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runners to channel  builds=0
Nov 23 09:15:29 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel  builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Checking for jobs...nothing  runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Processing runner  builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring executor from provider  builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring job slot  builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot  builds=0 runner=2nAJXMiK
Nov 23 09:15:42 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel  builds=0 runner=cxs9Mjg-

Could this be related to the version of GitLab Runner?

And after the job is finally executed, there is nothing about it in the runner logs at all. So is it possible that the job is picked up but GitLab CI is not aware of that?

One additional thing:
the job stays pending and queued until it is run on the same runner, even though that runner is available. But when I pause the runner on the project and then enable it again, the job immediately starts running on it. API call responses:

  "id": 3368503264,
  "status": "pending",
  "stage": "healthcheck-2",
  "name": "hvves",
  "ref": "master",
  "tag": false,
  "coverage": null,
  "allow_failure": false,
  "created_at": "2022-11-23T12:31:45.443Z",
  "started_at": null,
  "finished_at": null,
  "duration": null,
  "queued_duration": 103.90318713,
  "user": {
    
  },
  "commit": {
    
  },
  "pipeline": {

  },
  "web_url": "https://gitlab.com/Orange-OpenSource/lfn/onap/xtesting-onap/-/jobs/3368503264",
  "project": {
    "ci_job_token_scope_enabled": false
  },
  "artifacts": [],
  "runner": null,
  "artifacts_expire_at": null,
  "tag_list": [
    "kubernetes"
  ]
}

running

{
  "id": 3368503264,
  "status": "running",
  "stage": "healthcheck-2",
  "name": "hvves",
  "ref": "master",
  "tag": false,
  "coverage": null,
  "allow_failure": false,
  "created_at": "2022-11-23T12:31:45.443Z",
  "started_at": "2022-11-23T12:37:27.524Z",
  "finished_at": null,
  "duration": 2.918878457,
  "queued_duration": 341.971386,
  "user": {

  },
  "commit": {

  },
  "pipeline": {

  },
  "web_url": "https://gitlab.com/Orange-OpenSource/lfn/onap/xtesting-onap/-/jobs/3368503264",
  "project": {
    "ci_job_token_scope_enabled": false
  },
  "artifacts": [],
  "runner": {
    "id": 19198544,
    "description": "ONAP integration runner #1",
    "active": true,
    "paused": false,
    "is_shared": false,
    "runner_type": "project_type",
    "name": "gitlab-runner",
    "online": true,
    "status": "online"
  },
  "artifacts_expire_at": null,
  "tag_list": [
    "kubernetes"
  ]
}

Since you are running your own runners, it always helps to actually include their version in the post.

Hi @balonik ,

Sorry, I missed that, but to be honest I'm not sure it really matters in this case, as we checked a lot of runners and we see the same issue on each of them. To be clear, we tested on runners with these versions:

  • 15.6.0
  • 15.5.1
  • 14.3.0

Right now, our interim solution is to use the GitLab API to get the pending jobs and, if there are any, pause and re-activate one of the runners; that unblocks the pending jobs.
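
In code, the workaround is roughly this (again a simplified Python sketch with the requests library; the project ID and token are placeholders, and here we simply toggle the first project runner we find):

import requests  # assumes the requests library is installed

GITLAB = "https://gitlab.com/api/v4"
PROJECT_ID = "12345"                      # placeholder, not our real project ID
HEADERS = {"PRIVATE-TOKEN": "<api token>"}

# list jobs stuck in the pending state
pending = requests.get(
    f"{GITLAB}/projects/{PROJECT_ID}/jobs",
    headers=HEADERS, params={"scope": "pending"},
).json()

if pending:
    # take one of the project runners and toggle it: pause, then re-enable
    runner = requests.get(
        f"{GITLAB}/projects/{PROJECT_ID}/runners", headers=HEADERS
    ).json()[0]
    requests.put(f"{GITLAB}/runners/{runner['id']}",
                 headers=HEADERS, data={"active": "false"})
    requests.put(f"{GITLAB}/runners/{runner['id']}",
                 headers=HEADERS, data={"active": "true"})
    # once the runner comes back, the pending retried jobs get picked up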