While investigating a problem on one of our runner servers, we found a number of processes (some up to 5 months old) that looked like they had been started by a CI job. More precisely, they looked like they had been started indirectly, by one of the scripts we have written to perform various CI tasks. Luckily, the jobs have an argument (visible on the command line) that reveals which project they were for, and on investigation there seems to be a correspondence between when these processes were started and when jobs on those projects were cancelled. So our current theory is that proper cleanup isn’t performed when jobs are cancelled.
Understanding what happens when a job is cancelled matters for deciding how to improve cleanup. In particular, does the script (called from the project’s .gitlab-ci.yml) receive a signal?
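
One way to answer this empirically would be a small probe that a test job runs and that we then cancel by hand. The following is only a minimal sketch, not something taken from GitLab's documentation: the names `LOG_PATH`, `record_signal`, `install_handlers` and `wait_for_signal` are made up for illustration, the log location is an arbitrary choice, and the set of signals to watch is a guess. Note that SIGKILL cannot be caught, so an empty log would mean either that no signal arrived or that the process was killed outright.

```python
import os
import signal
import tempfile
import time

# Hypothetical log location; it should live somewhere that survives the job.
LOG_PATH = os.path.join(tempfile.gettempdir(), "cancel_probe.log")

# Names of the signals seen so far, in arrival order.
received = []


def record_signal(signum, frame):
    """Remember the signal and append a line to the log file."""
    name = signal.Signals(signum).name
    received.append(name)
    with open(LOG_PATH, "a") as log:
        log.write(f"{time.time():.0f} pid={os.getpid()} received {name}\n")


def install_handlers(signals=(signal.SIGTERM, signal.SIGINT, signal.SIGHUP)):
    """Register record_signal for each signal we guess might be sent."""
    for sig in signals:
        signal.signal(sig, record_signal)


def wait_for_signal(timeout_s=3600.0, poll_s=0.5):
    """Idle until a signal is recorded or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while not received and time.monotonic() < deadline:
        time.sleep(poll_s)
    return list(received)
```

A test job would call `install_handlers()` followed by `wait_for_signal()`, and we would cancel the job while it waits. If the log then names a signal, the same handler is the natural place for our CI scripts to terminate the child processes they started.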