(Mostly-)Reproducible CI instance crash when using KitchenCI testing on Debian 10 & openSUSE Leap 15.2

I was about to report this as a GitLab issue [1], but I’m following the instructions there to try here first.


We’re using GitLab CI for ~100 repos under our SaltStack-Formulas GitHub organisation [2]. This has been working really well since the end of last year.

I’ve just hit a mostly-reproducible CI crash (Segmentation fault) while refactoring a couple of our formula repos. I cannot reproduce the crash in GitHub Actions or when testing locally. The crash appears to be of the same type in both repos and, in both, I can only trigger it when testing on Debian 10 and openSUSE Leap 15.2. None of the other Debian and openSUSE instances have crashed even once (not even openSUSE Leap 15.3), and none of the other Linux instances are affected. These changes have also been tested in CI and locally on Windows and FreeBSD without issue.


packages-formula

These are the main repo’s pipelines:

  • [3]
    • While there have been the usual occasional failures, this crash has never been encountered before.

I was testing out the refactor in my fork.

This was the first pipeline where I triggered the crash [4].

I then retriggered the CI without pushing another commit and this time only the Debian 10 instance crashed [5].

I then reduced the instances to only include the likely crash candidates but this time the Leap 15.2 instance crashed [6].

I had already tested locally a number of times by this point and couldn’t reproduce it. Since we’ve got an easy way to test the same setup in GitHub Actions, I ran that as well, to see if I could reproduce it. I couldn’t, even with the subsequent attempts at narrowing down:

In the last pipeline I ran on GitLab, it seemed to be passing for a while but repeating the instances eventually reproduced it again:


php-formula

These are the main repo’s pipelines:

  • [7]
    • Again, the crash has never been encountered before.

I was testing out the refactor in my fork. The same situation occurred again.

First pipeline [8].

Reducing the instances and rerunning:

Again, all fine locally and then in GitHub Actions:


Further information

  • The actual gem that seems to be the source of the crash [9].
  • The repo that produces the gem [10].
  • The testing framework in use, which crashes when attempting to run kitchen verify [11].
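
In case it helps anyone trying to reproduce this, here is a minimal sketch of rerunning a command until it dies from a signal, since the crash only shows up on some attempts. The function names are my own for illustration, and the instance name in the usage comment is hypothetical. I'm also assuming the crash surfaces as a signal-style exit status (negative from Python's subprocess, or above 128 when wrapped by a shell); I haven't confirmed which status the CI job actually reports.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def killed_by_signal(returncode: int) -> Optional[int]:
    """Return the signal number if the exit status looks like a signal death."""
    if returncode < 0:
        # subprocess reports "terminated by signal N" as -N on POSIX.
        return -returncode
    if returncode > 128:
        # Shell convention (heuristic): 128 + N means "terminated by signal N".
        return returncode - 128
    return None


def run_until_crash(
    command: Sequence[str], max_attempts: int = 20
) -> Optional[Tuple[int, int]]:
    """Rerun command until it is killed by a signal.

    Returns (attempt, signal_number) for the first crash, or None if no
    attempt crashed within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(list(command))
        signum = killed_by_signal(result.returncode)
        print(f"attempt {attempt}: exit status {result.returncode}", file=sys.stderr)
        if signum is not None:
            return attempt, signum
    return None


# Usage (hypothetical instance name, adjust to your own kitchen.yml):
#   run_until_crash(["kitchen", "verify", "default-debian-10-master-py3"])
```

An ordinary test failure (exit status 1, for example) does not stop the loop, so only signal-style exits are reported as a crash.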

Reducing links using manual footnotes

Apologies, but I couldn’t post all the links because I hit the “Sorry, new users can only put 10 links in a post.” limit, so I’ve moved some of them here instead. Unfortunately, this makes the post less pleasant to read.

[1] https://gitlab.com/gitlab-org/gitlab/-/issues/new?issue
[2] https://github.com/saltstack-formulas
[3] https://gitlab.com/saltstack-formulas/packages-formula/-/pipelines
[4] https://gitlab.com/myii/packages-formula/-/pipelines/381398800
[5] https://gitlab.com/myii/packages-formula/-/pipelines/381499971
[6] https://gitlab.com/myii/packages-formula/-/pipelines/381529171
[7] https://gitlab.com/saltstack-formulas/php-formula/-/pipelines
[8] https://gitlab.com/myii/php-formula/-/pipelines/381561725
[9] https://rubygems.org/gems/aws-sdk-ec2
[10] https://github.com/aws/aws-sdk-ruby
[11] https://github.com/test-kitchen/test-kitchen