Job never finishes after being successful

rvanlieshout · September 11, 2024, 10:37am

Problem to solve

I’m setting up a new CI server and almost everything is smooth sailing.

Most jobs run perfectly, but some appear to hang after their steps have completed successfully, and eventually end with:

WARNING: step_script could not run to completion because the timeout was exceeded. For more control over job and script timeouts see: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-script-and-after_script-timeouts

Steps to reproduce

I’m using VirtualBox for this and did some debugging on a running instance. According to ps, nothing is running on the machine anymore. An idle SSH session is left open. If I kill that process, the job fails with an EOF.

The runner log shows the job as still running, but as far as I can tell nothing is actually busy. I don’t know how else to debug the link between gitlab-runner and the VirtualBox machine.

Any help is really appreciated.

Configuration

I don’t have a reproducible case yet. Some jobs pass, some don’t. I cannot share the whole project. A job that seems to fail every time:

deploy to staging:
  stage: deploy
  environment: staging
  script: |
    eval `ssh-agent -s`
    ssh-add -t 5m <(echo "$SSH_PRIVATE_KEY_STAGING")

    if [ "$(git rev-parse origin/master)" == "$CI_COMMIT_SHA" ] ; then
      bundle exec cap staging deploy deploy:cleanup
    else
      echo "We're not on the last commit anymore... skipping deploy"
    fi
  only:
    - master

The task ‘bundle exec cap staging deploy deploy:cleanup’ itself finishes successfully.

Versions

GitLab (Web: /help or self-managed system information): 17.3.1
GitLab Runner, if self-hosted (Web /admin/runners or CLI gitlab-runner --version): 17.3.1

rvanlieshout · September 12, 2024, 11:52am

Managed to debug a bit further.

gitlab-runner opens an SSH session to the VirtualBox machine and runs the command. After the command has run, this is what is left over:

root         707  0.0  0.1  15720  8832 ?        Ss   13:34   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root        1630  0.0  0.1  19316 11136 ?        Ss   13:34   0:00  \_ sshd: gitlab-runner [priv]
gitlab-+    1633  0.0  0.1  19316  6472 ?        S    13:34   0:00  |   \_ sshd: gitlab-runner@notty

gitlab-runner keeps the SSH session open, but nothing is running anymore. It stays like this until the whole job times out and is killed.
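
One general mechanism that can produce this picture, offered as an assumption and not as a confirmed diagnosis of this job: a session without a tty only ends once every process holding its stdout/stderr has exited, so a background process that outlives the script can keep the session open even though the script itself is done. The sketch below reproduces that effect locally with a plain pipe. The function names (leaky, detached, seconds_until_eof) are made up for illustration, and sleep stands in for whatever process might be left behind.

```shell
#!/bin/sh
# Sketch: a reader on a pipe only sees EOF once every process that
# holds the write end has exited, not when the main script exits.

# The shell exits right after echo, but the background sleep has
# inherited stdout, so the pipe stays open until sleep ends.
leaky() { sh -c 'sleep 2 & echo finished'; }

# Same script, but the background process is detached from the
# inherited stdin/stdout/stderr, so nothing keeps the pipe open.
detached() { sh -c 'sleep 2 > /dev/null 2>&1 < /dev/null & echo finished'; }

# Print how many whole seconds a reader waits for EOF from the
# function named in $1.
seconds_until_eof() {
  start=$(date +%s)
  "$1" | cat > /dev/null
  end=$(date +%s)
  echo $(( end - start ))
}
```

Running seconds_until_eof leaky should report at least 2 seconds, while seconds_until_eof detached should return almost immediately. On the VM, something like ps -u gitlab-runner -o pid,ppid,args lists every process owned by that user, including ones that have been re-parented and therefore no longer appear under the sshd subtree shown above.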
