Job never finishes after being successful

Problem to solve

I’m setting up a new CI server and almost everything is smooth sailing.

Most jobs run perfectly, but some seem to get stuck after running their steps successfully, and eventually end with:

WARNING: step_script could not run to completion because the timeout was exceeded. For more control over job and script timeouts see: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-script-and-after_script-timeouts

Steps to reproduce

I’m using the VirtualBox executor and did some debugging on a running instance. The machine itself shows nothing running anymore (checked with ps). There is an idle SSH session left open. If I kill that process, the job fails with an EOF.

The runner logs show the job is still running, but as far as I can tell nothing is actually busy. I don’t know how else to debug the link between gitlab-runner and the VirtualBox machine.

Any help is really appreciated.

Configuration

I don’t have a reproducible test case yet. Some jobs pass, some don’t. I cannot share the whole project. A job that seems to hang every time:

deploy to staging:
  stage: deploy
  environment: staging
  script: |
    eval `ssh-agent -s`
    ssh-add -t 5m <(echo "$SSH_PRIVATE_KEY_STAGING")

    if [ "$(git rev-parse origin/master)" == "$CI_COMMIT_SHA" ] ; then
      bundle exec cap staging deploy deploy:cleanup
    else
      echo "We're not on the last commit anymore... skipping deploy"
    fi
  only:
    - master

The task ‘bundle exec cap staging deploy deploy:cleanup’ itself finishes successfully.
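One thing that may be worth ruling out: the job above starts an `ssh-agent` with `eval` and never stops it, so the agent outlives the script. Whether that is what keeps the session open here is an assumption and has not been confirmed. A minimal sketch of a cleanup that could be placed at the top of the `script:` block, before the `eval` line, follows. The helper name `stop_agent` is hypothetical and not part of the original job.

```shell
# Hypothetical helper (not from the original job): stop the agent that
# `eval "$(ssh-agent -s)"` started in this shell, if there is one.
stop_agent() {
  if [ -n "${SSH_AGENT_PID:-}" ]; then
    # ssh-agent -s exports SSH_AGENT_PID; sending that PID a TERM signal
    # stops the agent, which is the same effect `ssh-agent -k` has.
    kill "$SSH_AGENT_PID" 2>/dev/null || true
    unset SSH_AGENT_PID SSH_AUTH_SOCK
  fi
}

# Run the helper whenever the script exits, whether it succeeded or failed.
trap stop_agent EXIT
```

If the job still hangs with this in place, the agent can be crossed off the list and the search can move on to other background processes.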

Versions

Please select whether options apply, and add the version information.

  • Self-managed
  • GitLab.com SaaS
  • Self-hosted Runners

Managed to debug a bit further.

gitlab-runner opens an SSH session to the VirtualBox machine and runs the command. After the command has finished, this is what is left over:

root         707  0.0  0.1  15720  8832 ?        Ss   13:34   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root        1630  0.0  0.1  19316 11136 ?        Ss   13:34   0:00  \_ sshd: gitlab-runner [priv]
gitlab-+    1633  0.0  0.1  19316  6472 ?        S    13:34   0:00  |   \_ sshd: gitlab-runner@notty

gitlab-runner keeps the SSH session open, but nothing is running anymore. It stays like this until the whole job times out and is killed.
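The tree above only shows the sshd processes and their children. A process that detached itself and was reparented to PID 1 would not appear under sshd, yet it can still keep a non-interactive session open if it inherited the session's output streams. A rough sketch for listing everything the CI user still owns inside the VM is below. It assumes a procps-style `ps`, and the function name `list_user_procs` is made up for illustration.

```shell
# Hypothetical helper: list every process owned by the given user,
# including daemons that no longer hang under the sshd tree.
# Assumes a procps-style ps that understands -u and -o.
list_user_procs() {
  ps -u "$1" -o pid,ppid,etime,args
}
```

Inside the VM this could be called as `list_user_procs gitlab-runner`, assuming that is the login user the runner connects as. Anything with a PPID of 1 and a start time matching the job would be a candidate to look at.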