JUnit result report merge strategy for multiple jobs and multiple re-runs of flaky tests

Hello,

Does anyone know how the JUnit test reports work with the GitLab pipeline view “Test” tab when we have:

  1. multiple jobs?

  2. parallelized matrix jobs?

  3. rerunning flaky test suites?

I can see that points 1 and 2 seem to work fine (the results are merged together in the Test tab of the pipeline in the GitLab UI).

What I cannot find documentation on is item 3: rerunning tests.

When the results from all those jobs are merged, does the tab show the last result for each test, the first, the most positive, or the most negative?

Example:

Imagine we have one test that is run 3 times:

  1. first run: failing
  2. second run: succeeds
  3. third run: failing again

How will this test be shown in the test panel? I assume it is the last result (3rd run, “failing” in this example), but I am struggling to find docs on that.
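
To make the question concrete, here is a small Python sketch of the four candidate strategies applied to the example above. It only illustrates what I am asking about and makes no claim about what GitLab actually does; the strategy names and the `merge` helper are my own invention.

```python
# Outcomes of one test across three runs, in execution order (the example above).
runs = ["failed", "passed", "failed"]


def merge(outcomes, strategy):
    """Collapse several outcomes of the same test into one, per the named strategy."""
    if strategy == "first":
        return outcomes[0]
    if strategy == "last":
        return outcomes[-1]
    if strategy == "most_positive":
        # Green if the test passed at least once, regardless of order.
        return "passed" if "passed" in outcomes else "failed"
    if strategy == "most_negative":
        # Red if the test failed at least once, regardless of order.
        return "failed" if "failed" in outcomes else "passed"
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("first", "last", "most_positive", "most_negative"):
    print(name, "->", merge(runs, name))
```

For this example only "most_positive" yields a passing result; the other three all report the test as failed.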

More background on what/why:

We have many long-running and still somewhat flaky e2e tests as part of our build pipeline. To speed things up, we have parallelized them (with the parallel:matrix feature in GitLab CI) and scheduled them nightly, with some automatic reruns on failure and the option of additional manual reruns.

With a parallelization of 4, for a given pipeline stage we can have e.g. 16 jobs where 4 of them are reruns of the same job, with different tests failing in different runs. (Sounds horrible, yes, I know, and we are continuously working on stabilizing it.)

Sometimes all tests go green, so we know that the baseline on our master branch can be all OK. Then, before a release, we run all those tests, usually a few times (nightly). For now, as long as a test goes green once, we assume that the other failed runs of that test are flaky (something timing out etc.) and move on with the release, as the new code presumably did not completely break the test. Before we get a green light to release, there is a lot of chaotic sifting through build logs, copy-pasting, and diffing test reports to figure out whether the same tests failed every time.

So in our case it would be good if I could assume (or configure) that the pipeline Test view merges results using the most positive outcome (i.e. if a test succeeded at least once, regardless of the execution order, show it green). I see that this may not be the desired merge strategy in many other cases (for our unit tests, the slightest flakiness introduced should break the build), so a configurable strategy would probably be best.
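
As a stopgap for the manual diffing, something like the following Python sketch is what I have in mind. It applies the "most positive" rule to a set of JUnit XML files and lists only the tests that never passed in any run. It assumes the report files from all runs have already been collected in one place, that tests can be identified by their `classname` and `name` attributes, and that a `<failure>` or `<error>` child marks a failed test case. The function names are hypothetical and none of this is a GitLab feature.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict


def outcome(testcase):
    """Classify one <testcase> element by its child elements."""
    if testcase.find("failure") is not None or testcase.find("error") is not None:
        return "failed"
    if testcase.find("skipped") is not None:
        return "skipped"
    return "passed"


def collect(report_paths):
    """Map (classname, name) to the list of outcomes across all report files."""
    results = defaultdict(list)
    for path in report_paths:
        root = ET.parse(path).getroot()
        # iter() finds <testcase> whether the root is <testsuites> or <testsuite>.
        for case in root.iter("testcase"):
            key = (case.get("classname", ""), case.get("name", ""))
            results[key].append(outcome(case))
    return results


def never_passed(results):
    """Tests that failed in at least one run and passed in none."""
    return sorted(
        key for key, outcomes in results.items()
        if "failed" in outcomes and "passed" not in outcomes
    )


def main(paths):
    failing = never_passed(collect(paths))
    for classname, name in failing:
        print(f"never passed: {classname}.{name}")
    # Non-zero exit code if any test failed in every run it appeared in.
    return 1 if failing else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved under a name of your choosing (say `merge_junit.py`, a made-up filename), it would be called with all report files as arguments, e.g. `python merge_junit.py reports/*.xml`. A test that is green at least once is treated as flaky and ignored, which matches our release rule but would be the wrong choice for unit tests.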

Other tips/tools would also be helpful (apart from getting our shit together and getting our e2e test reliability straightened out; we are painfully aware of that and working on it).