Flaky Tests in CI and How To Look At Them
Since flaky tests have been identified as one of the pain points of dev-owned testing, I had long wanted to make our CI test runs more resilient to them. The solution described here might not be the best, but I hope it will suffice.
Introduction
There were a few hurdles I needed to overcome:
- PHPUnit silently deprecated the --retry option
- PHPUnit no longer supports hooks, opting instead for a read-only event system whose subscribers can observe test results but not change them
- I wanted failing tests to be properly retried, not merely skipped
So it made sense to:
- perform an initial run, where cheap and simple tests are expected to pass
- collect the tests that failed in the initial run
- retry them up to n times, with progressively longer delays between attempts (backoff)
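The plan above boils down to a small loop. Since TestRunner.php is not shown, the following is only a Python sketch of the idea, not the author's code: retry_with_backoff, run_test, max_attempts and base_delay are hypothetical names, and doubling the delay each round is an assumption, as the text only says the attempts back off progressively. Each round retries the remaining tests in their original order and drops the ones that pass; sleep is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List


def retry_with_backoff(
    failing: List[str],
    run_test: Callable[[str], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Retry initially failing tests in rounds; return the ones that never passed."""
    remaining = list(failing)  # keep the order from the initial run
    for attempt in range(max_attempts):
        if not remaining:
            break  # everything passed, no need to wait any longer
        # Assumed exponential backoff: base_delay, 2x, 4x, ... before each round.
        sleep(base_delay * 2 ** attempt)
        # Retry in order; a test that passes is dropped from later rounds.
        remaining = [test for test in remaining if not run_test(test)]
    return remaining


# Usage with a fake runner: "A" passes at once, "B" passes on its
# second try and "C" never passes. Sleeping is replaced by recording.
tries = {"B": 0}


def fake_run(test: str) -> bool:
    if test == "A":
        return True
    if test == "B":
        tries["B"] += 1
        return tries["B"] >= 2
    return False


delays: List[float] = []
still_failing = retry_with_backoff(["A", "B", "C"], fake_run, sleep=delays.append)
print(still_failing, delays)  # ['C'] [1.0, 2.0, 4.0]
```

Running the rounds over the whole remaining set, rather than exhausting every attempt for one test before moving on, gives each flaky test the same growing delays and lets a test leave the retry list as soon as it passes.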
Implementation
The implementation is not as pretty as the idea behind it. Because the PHPUnit runner is inflexible, I had to resort to a number of less-than-optimal workarounds.
All of the logic lives in TestRunner.php, since the retry handling is somewhat involved.
Without further ado:
1. An initial run is performed, which:
   - runs with TEST_LOG_FLAKY_TESTS=true, logging failing tests to tmp/test-flaky.log
   - writes the JUnit report to tmp/junit-initial.xml
2. If tmp/junit-initial.xml contains any failing tests:
   - the retries are performed in order
   - each retry writes its report to tmp/junit-${HASH_OF_TEST}.xml
   - when a test passes, it is removed from the retry list and the initial state is updated
3. When there are no more tests to retry, the final state is written to tmp/junit-final.xml
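The JUnit bookkeeping in these steps can be sketched in Python as well. This is not TestRunner.php, only an illustration under stated assumptions: PHPUnit-style JUnit XML (testcase elements with class/classname and name attributes, and failure or error children for failing tests), a test identified as Class::method, and SHA-1 as a hypothetical stand-in for ${HASH_OF_TEST}. All function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Iterable, List


def case_id(testcase: ET.Element) -> str:
    # Prefer the PHP class name ("class"); fall back to the dotted "classname".
    cls = testcase.get("class") or testcase.get("classname", "")
    return f"{cls}::{testcase.get('name', '')}"


def is_failing(testcase: ET.Element) -> bool:
    return testcase.find("failure") is not None or testcase.find("error") is not None


def failing_tests(report_path: str) -> List[str]:
    """Return the ids of failing test cases, in report order."""
    root = ET.parse(report_path).getroot()
    return [case_id(tc) for tc in root.iter("testcase") if is_failing(tc)]


def retry_report_path(test: str, tmp_dir: str = "tmp") -> str:
    # Hypothetical naming: SHA-1 of the test id stands in for ${HASH_OF_TEST}.
    digest = hashlib.sha1(test.encode("utf-8")).hexdigest()
    return f"{tmp_dir}/junit-{digest}.xml"


def write_final(initial_path: str, final_path: str, passed: Iterable[str]) -> None:
    """Copy the initial report, clearing failures of tests that passed on retry."""
    passed_ids = set(passed)
    tree = ET.parse(initial_path)
    root = tree.getroot()
    for tc in root.iter("testcase"):
        if case_id(tc) in passed_ids:
            for child in list(tc):  # copy: we remove while iterating
                if child.tag in ("failure", "error"):
                    tc.remove(child)
    # Recompute per-suite counters (nested suites included) to match the cases.
    for suite in root.iter("testsuite"):
        cases = list(suite.iter("testcase"))
        suite.set("failures", str(sum(tc.find("failure") is not None for tc in cases)))
        suite.set("errors", str(sum(tc.find("error") is not None for tc in cases)))
    tree.write(final_path, encoding="utf-8", xml_declaration=True)
```

A retry loop would call failing_tests on tmp/junit-initial.xml, point each rerun's report at retry_report_path(test) (PHPUnit's --filter and --log-junit options are one way to drive such a rerun), and finish with write_final over the ids that eventually passed to produce tmp/junit-final.xml. The counters are recomputed so the suite summary does not contradict the edited test cases.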
Running with Retry Logic on Your Machine
It's actually dead simple: just run make test-integration-retry.