Jobs in K8s exit as expected, but the pod keeps running


Describe your question in as much detail as possible:

  • What are you seeing, and how does that differ from what you expect to see?
    Some of the jobs finish with

{"command_exit_code": 0, "script": "/scripts-142-420934/step_script"}

or

{"command_exit_code": 1, "script": "/scripts-142-420934/step_script"}

    and the pod then randomly gets stuck in a “Running” state.
    In the UI the user sees the job as done and CI completes as expected, but for me, as the one taking care of the cluster, this is very annoying.

  • What version are you on? Are you using self-managed or GitLab.com?

    • GitLab (Hint: /help): 14.6.1-ee
    • Runner (Hint: /admin/runners): gitlab-org/gitlab-runner:alpine-v16.3.0
  • Add the CI configuration from .gitlab-ci.yml and other configuration if relevant (e.g. docker-compose.yml)
    This is not relevant: the jobs worked fine on the normal Docker runner. We just moved to K8s and everything works the same, but sometimes the job’s exit leaves the pod stuck in Running, so it is not related to any specific pipeline.

  • What troubleshooting steps have you already taken? Can you link to any docs or other resources so we know where you have been?
    Nothing yet; I don’t know where to start.
    I have a cron job that checks whether a pod’s last log line contains “"command_exit_code":” and, if so, deletes the pod. This is a workaround, not a solution.
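
    For reference, the cron-job workaround above can be sketched roughly as follows. This is a hypothetical sketch, not my exact script: it assumes the runner pods live in a namespace called gitlab-runner (adjust to your setup) and that kubectl is configured for the cluster.

```shell
#!/bin/sh
# Workaround sketch: delete Running pods whose last log line already
# reports a command_exit_code, i.e. the job finished but the pod
# never terminated.
NAMESPACE=gitlab-runner   # assumed namespace, adjust as needed

for pod in $(kubectl -n "$NAMESPACE" get pods \
    --field-selector=status.phase=Running -o name); do
  # Fetch only the last log line of the pod
  last_line=$(kubectl -n "$NAMESPACE" logs "$pod" --tail=1 2>/dev/null)
  case "$last_line" in
    *'"command_exit_code":'*)
      echo "Deleting finished-but-stuck $pod"
      kubectl -n "$NAMESPACE" delete "$pod" --wait=false
      ;;
  esac
done
```

    Run it from cron every few minutes; it only touches pods whose job has clearly already exited, so in-flight jobs are left alone.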