GitLab CI jobs queued on retry

Hello GitLab community!

I have a pipeline that is configured to retry failing jobs. It worked perfectly for a long time, but in the last few weeks we noticed that something changed. Now, when a job fails in the pipeline, its retry stays pending with the message:
This job is in pending state and is waiting to be picked by a runner
We created a couple of extra runners because we thought the problem was a lack of them, but it's not. The only jobs stuck in pending are the ones with the retry option. I configure retry using the configuration below:

.runner_tags: &runner_tags
  image: ${ANSIBLE_DOCKER_IMAGE}:${ANSIBLE_DOCKER_TAG}
  retry: 1
  extends: .ansible_run_tags

I’m using public GitLab with my own custom runners (Docker executor). One additional note: I can’t prove it, but I think the pending time is related to how long the failing job took.

I hope someone can help me find a solution. It’s annoying because we trigger these pipelines via the API, and another process waits (with a timeout) until the pipeline is finished. So the pipeline is still running, but the main process reports that it took too long :frowning:
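For context, our orchestration is roughly like the sketch below. This is a minimal, hypothetical example rather than our exact script: PROJECT_ID, TRIGGER_TOKEN, API_TOKEN and the timeout are placeholders, and it only uses the standard pipeline trigger and pipeline status endpoints of the GitLab API.

import time
import requests

GITLAB = "https://gitlab.com/api/v4"
PROJECT_ID = "12345"          # placeholder: our project ID
TRIGGER_TOKEN = "glptt-..."   # placeholder: pipeline trigger token
API_TOKEN = "glpat-..."       # placeholder: token used to read pipeline status
TIMEOUT = 3600                # overall timeout (seconds) the caller is willing to wait

# Trigger the pipeline on master through the pipeline trigger API.
resp = requests.post(
    f"{GITLAB}/projects/{PROJECT_ID}/trigger/pipeline",
    data={"token": TRIGGER_TOKEN, "ref": "master"},
)
resp.raise_for_status()
pipeline_id = resp.json()["id"]

# Poll the pipeline until it reaches a terminal state or the timeout expires.
deadline = time.time() + TIMEOUT
while time.time() < deadline:
    status = requests.get(
        f"{GITLAB}/projects/{PROJECT_ID}/pipelines/{pipeline_id}",
        headers={"PRIVATE-TOKEN": API_TOKEN},
    ).json()["status"]
    if status in ("success", "failed", "canceled", "skipped"):
        print(f"pipeline {pipeline_id} finished: {status}")
        break
    time.sleep(30)
else:
    # This is the annoying case: the pipeline is still running (for example a
    # retried job is stuck in pending), but the caller gives up and reports
    # that it took too long.
    print(f"pipeline {pipeline_id} did not finish within {TIMEOUT}s")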


My job is queued, and in the runner logs I can see:

Nov 23 09:15:24 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot              builds=0 runner=cxs9Mjg-
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runners to channel          builds=0
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel           builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Checking for jobs...nothing         runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Processing runner                   builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring executor from provider    builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring job slot                  builds=0 runner=2nAJXMiK
Nov 23 09:15:26 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot              builds=0 runner=2nAJXMiK
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel           builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Checking for jobs...nothing         runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Processing runner                   builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring executor from provider    builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring job slot                  builds=0 runner=cxs9Mjg-
Nov 23 09:15:27 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot              builds=0 runner=cxs9Mjg-
Nov 23 09:15:29 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runners to channel          builds=0
Nov 23 09:15:29 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel           builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Checking for jobs...nothing         runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Processing runner                   builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring executor from provider    builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring job slot                  builds=0 runner=2nAJXMiK
Nov 23 09:15:41 integrationgitlabrunner-1 gitlab-runner[416730]: Acquiring request slot              builds=0 runner=2nAJXMiK
Nov 23 09:15:42 integrationgitlabrunner-1 gitlab-runner[416730]: Feeding runner to channel           builds=0 runner=cxs9Mjg-

Could this be related to the GitLab Runner version?

And when the job finally executes, there is nothing about it in the runner logs at all. So is it possible that the job is picked up but GitLab CI is not aware of that?

One additional thing: the job stays pending and queued until it runs on the same runner, even if that runner is available; when I pause the runner on the project and enable it again, the job immediately starts running on it. API call responses:

  "id": 3368503264,
  "status": "pending",
  "stage": "healthcheck-2",
  "name": "hvves",
  "ref": "master",
  "tag": false,
  "coverage": null,
  "allow_failure": false,
  "created_at": "2022-11-23T12:31:45.443Z",
  "started_at": null,
  "finished_at": null,
  "duration": null,
  "queued_duration": 103.90318713,
  "user": {
    
  },
  "commit": {
    
  },
  "pipeline": {

  },
  "web_url": "https://gitlab.com/Orange-OpenSource/lfn/onap/xtesting-onap/-/jobs/3368503264",
  "project": {
    "ci_job_token_scope_enabled": false
  },
  "artifacts": [],
  "runner": null,
  "artifacts_expire_at": null,
  "tag_list": [
    "kubernetes"
  ]
}

running

{
  "id": 3368503264,
  "status": "running",
  "stage": "healthcheck-2",
  "name": "hvves",
  "ref": "master",
  "tag": false,
  "coverage": null,
  "allow_failure": false,
  "created_at": "2022-11-23T12:31:45.443Z",
  "started_at": "2022-11-23T12:37:27.524Z",
  "finished_at": null,
  "duration": 2.918878457,
  "queued_duration": 341.971386,
  "user": {

  },
  "commit": {

  },
  "pipeline": {

  },
  "web_url": "https://gitlab.com/Orange-OpenSource/lfn/onap/xtesting-onap/-/jobs/3368503264",
  "project": {
    "ci_job_token_scope_enabled": false
  },
  "artifacts": [],
  "runner": {
    "id": 19198544,
    "description": "ONAP integration runner #1",
    "active": true,
    "paused": false,
    "is_shared": false,
    "runner_type": "project_type",
    "name": "gitlab-runner",
    "online": true,
    "status": "online"
  },
  "artifacts_expire_at": null,
  "tag_list": [
    "kubernetes"
  ]
}

Since you are running your own runners, it always helps to actually include their version in the post.

Hi @balonik ,

Sorry, I missed that, but to be honest I'm not sure it really matters in this case, as we checked on a lot of runners and saw the same issue on each of them. To be clear, we tested on runners with the following versions:

  • 15.6.0
  • 15.5.1
  • 14.3.0

Right now our interim solution is to use the GitLab API to get the pending jobs and, if there are any, pause and re-activate one of the runners; that unlocks the pending jobs.
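For anyone interested, the workaround looks roughly like the sketch below. Again a minimal, hypothetical example rather than our exact script: PROJECT_ID and API_TOKEN are placeholders, it assumes the token is allowed to manage the project's runners, and it uses the legacy active flag of the Runners API (newer GitLab versions expose this as paused).

import requests

GITLAB = "https://gitlab.com/api/v4"
PROJECT_ID = "12345"        # placeholder: our project ID
API_TOKEN = "glpat-..."     # placeholder: token that may manage the project runners
HEADERS = {"PRIVATE-TOKEN": API_TOKEN}

# Look for jobs stuck in the pending state.
pending_jobs = requests.get(
    f"{GITLAB}/projects/{PROJECT_ID}/jobs",
    headers=HEADERS,
    params={"scope": "pending"},
).json()

if pending_jobs:
    # Take one of the project-specific runners and toggle it: pause it,
    # then activate it again. In our case that is enough for the stuck
    # retried jobs to be picked up.
    runners = requests.get(
        f"{GITLAB}/projects/{PROJECT_ID}/runners",
        headers=HEADERS,
        params={"type": "project_type"},
    ).json()
    runner_id = runners[0]["id"]
    for state in ("false", "true"):
        requests.put(
            f"{GITLAB}/runners/{runner_id}",
            headers=HEADERS,
            data={"active": state},
        )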

Hey!

We are having the same issue. The queue/retry system used to work, but not anymore. Do we know anything more about this? The interim solution doesn't work in our org :frowning:

Thanks for the help! :slight_smile:


We are facing the same issue as well. Any progress on this?

We have the same issue as well, with public GitLab and custom Docker-based runners on a Linux machine (no Kubernetes involved).

Hi there, we also have the same issue with the following deployment:

GitLab Server:
Version: 15.7.0-ee
Type: On Premise (Self Hosted)
Installation Method: Omnibus
OS: RHEL 8

GitLab Runner Server:
Version: 15.7.1
Type: On Premise (Self Hosted)
Installation Method: Omnibus (from GitLab repo)
OS: RHEL 8
Runner Engine: Docker

Feel free to ping me if any other information is needed.

Thank you.

Hi, is there any solution for this bug?

It's weird, since the jobs fail to run only on the Docker runner (inside a GitLab Runner VM); when I use a GitLab Runner inside Kubernetes with the Kubernetes executor, the queue works.

Is there a bug in the GitLab Runner Docker executor?

cc GitLab team: @tnir @dnsmichi

Hi,

I solved the problem by commenting out listen_address = "[::]:8093" in /etc/gitlab-runner/config.toml.

It seems related to these gitlab-runner issues: "409 Conflict" causes Runner to not run any jobs, and give up checking for new jobs for half an hour (#4360) · Issues · GitLab.org / gitlab-runner · GitLab, and Runner doesn't pick up jobs, receives 409 conflict on each attempt. (#29466) · Issues · GitLab.org / gitlab-runner · GitLab.

Please try this on your runner.

There is another workaround described here: gitlab-runner sometimes ignores jobs (#22088) · Issues · GitLab.org / GitLab · GitLab.

I believe this issue will be fixed by Start pipeline in after_commit callback when retrying jobs (!116480) · Merge requests · GitLab.org / GitLab · GitLab.