I'm trying to set up a GitLab CI pipeline for a special project. Its unit tests build a kernel module and load it with `insmod`. However, bugs in the code can sometimes cause a kernel crash. I think I can use a GitLab runner with the `ssh` executor to run the tests. After a kernel crash, I would like the CI pipeline to detect the failure and run some commands to reboot the remote machine. Is this possible?
In the runner configuration you can set a `post_build_script`, which runs after the `script` section of a pipeline job but before the `after_script` section. This is where you might reboot your machine.
However, if you can, it might be easier to use a Docker runner: if the container for your build job crashes, you won't need to reboot the physical machine.
There are a few other forum topics and blog posts about building kernels, so you might find some useful inspiration there.
Sure, using a Docker runner is easier. However, our unit tests need an RDMA NIC, which makes a Docker container setup more complex.
OK, I'll try `post_build_script`. I previously tried `after_script` and simulated a crash by rebooting the remote test machine through iDRAC. It seems the broken SSH connection is treated as a serious failure, and the runner cannot do anything to reboot the machine after detecting it. It just reports the error `ERROR: Job failed (system failure): wait: remote command exited without exit status or exit signal` and doesn't execute the `after_script`. Also, it seems that using `post_build_script` means the machine would be rebooted every time a job finishes. Am I right?
I want to detect the failure and reboot the machine remotely using the `racadm` command. From what I have learned about GitLab CI so far, this looks impossible.
I tried `post_build_script`, and it didn't work: the runner machine didn't execute the command in `post_build_script` when the crash occurred.
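One possible workaround for the behaviour described above, offered as a sketch and not as a GitLab feature: run the job on a separate control machine (for example with a `shell` executor) and let a small wrapper open the SSH session itself. The `ssh` client exits with status 255 when the connection fails or drops, so the wrapper can treat that status as "the test machine probably crashed" and call remote `racadm` to power-cycle it. The function names, host names, and credentials are hypothetical, and the sketch assumes `racadm` is installed on the control machine and that `serveraction powercycle` is permitted for the given iDRAC user.

```python
import subprocess
from typing import Callable, List

# Exit status the ssh client itself returns when the connection fails or drops.
# Caveat: a remote command that exits with 255 is indistinguishable from this.
SSH_CONNECTION_ERROR = 255


def build_ssh_command(host: str, remote_command: str) -> List[str]:
    """Build an ssh invocation that gives up quickly if the remote side stops responding."""
    return [
        "ssh",
        "-o", "BatchMode=yes",            # never prompt for a password
        "-o", "ServerAliveInterval=10",   # probe the server every 10 seconds
        "-o", "ServerAliveCountMax=3",    # disconnect after 3 unanswered probes
        host,
        remote_command,
    ]


def build_racadm_powercycle(idrac_host: str, user: str, password: str) -> List[str]:
    """Build a remote racadm call that power-cycles the server (assumed iDRAC setup)."""
    return ["racadm", "-r", idrac_host, "-u", user, "-p", password,
            "serveraction", "powercycle"]


def run_test_with_recovery(
    test_cmd: List[str],
    recovery_cmd: List[str],
    run: Callable[[List[str]], int] = subprocess.call,
) -> int:
    """Run the test over ssh; if the connection was lost, run the recovery command.

    Returns the exit status of the test command so the CI job still fails.
    """
    status = run(test_cmd)
    if status == SSH_CONNECTION_ERROR:
        run(recovery_cmd)
    return status
```

In a job script this might be called as `run_test_with_recovery(build_ssh_command("test-machine", "make test"), build_racadm_powercycle("idrac-host", user, password))`, with the credentials read from CI/CD variables. Because the reboot is only triggered on status 255, the machine is not rebooted after every job.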
By the way, is it possible for a runner whose executor is `ssh` to execute some command after the SSH connection? For example, I run a test and find a kernel BUG in `dmesg`, so I want to reboot the remote machine. But since the runner is still connected, `script` and `after_script` won't work. Letting the runner know that there is a problem also seems difficult.
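For the `dmesg` case, where the machine is still reachable, one way to let the runner know about the problem is to scan the kernel log at the end of the test and fail the job if anything suspicious shows up. The following is a minimal sketch: the function name and the pattern list are assumptions chosen for illustration, and the patterns are a heuristic that may need tuning for a particular kernel.

```python
import re
from typing import List

# Heuristic markers that commonly appear in the kernel log after a serious fault.
# This list is an assumption for illustration, not an exhaustive set.
KERNEL_FAILURE_PATTERNS = [
    r"kernel BUG at",
    r"BUG: ",
    r"Oops",
    r"Kernel panic",
    r"general protection fault",
]
_FAILURE_RE = re.compile("|".join(KERNEL_FAILURE_PATTERNS))


def find_kernel_failures(dmesg_output: str) -> List[str]:
    """Return the lines of dmesg output that match any failure pattern."""
    return [line for line in dmesg_output.splitlines() if _FAILURE_RE.search(line)]
```

A test script could pass the captured `dmesg` text to `find_kernel_failures` and exit with a non-zero status when the result is non-empty, so the job is marked as failed. A later step or a separate machine could then decide whether to reboot.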
The `pre_clone_script` is probably the closest you'll get to executing something right after the `ssh` connection has been made.
Thanks!