
Probabilistic and Systematic Coverage of Consecutive Test-Method Pairs for Detecting Order-Dependent Flaky Tests

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12651)


Software developers frequently check their code changes by running a set of tests against their code. Tests that can nondeterministically pass or fail when run on the same code version are called flaky tests. These tests are a major problem because they can mislead developers into debugging their recent code changes when the failures are unrelated to those changes. One prominent category of flaky tests is order-dependent (OD) tests, which deterministically pass or fail depending on the order in which the set of tests is run. By detecting OD tests in advance, developers can fix these tests before they change their code. Because exploring all possible orders (n! permutations for n tests) is prohibitively expensive, prior work has developed tools that randomize test orders to detect OD tests. Experiments have shown that randomization can detect many OD tests, and that most OD tests depend on just one other test to fail. However, there was no analysis of the probability that randomized orders detect OD tests. In this paper, we present the first such analysis and also present a simple change to how random test orders are sampled that increases this probability. Finally, we present a novel algorithm that systematically explores all consecutive pairs of tests; it is guaranteed to detect all OD tests that depend on one other test while running substantially fewer orders and tests than simply running all test pairs.
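As an illustration only (this is not the authors' analysis or algorithm, which are not reproduced here), the Python sketch below makes both ideas concrete under one simplifying assumption: an OD test fails only when the single test it depends on runs immediately before it. Under that assumption, a uniformly random order of n tests exposes a given ordered pair with probability (n-1)!/n! = 1/n, because gluing the pair together leaves (n-1)! permutations. For systematic coverage, each order of n tests contains n-1 consecutive pairs and there are n(n-1) ordered pairs, so at least n orders are needed. The sketch swaps in a classic row-complete Latin square construction (cyclic shifts of a Williams-style zigzag sequence), which uses exactly n orders for even n and n+1 orders for odd n by padding with one dummy slot. The function names `consecutive_pairs`, `estimate_pair_probability`, and `pair_covering_orders` are hypothetical.

```python
import random


def consecutive_pairs(order):
    """Return the ordered pairs (a, b) such that b runs immediately after a."""
    return {(order[k], order[k + 1]) for k in range(len(order) - 1)}


def estimate_pair_probability(n, trials, seed=0):
    """Monte Carlo estimate of the chance that test 0 runs immediately
    before test 1 in a uniformly random order of n tests (expected: 1/n).
    Assumes the OD test fails only when its dependency is adjacent."""
    rng = random.Random(seed)
    tests = list(range(n))
    hits = 0
    for _ in range(trials):
        rng.shuffle(tests)  # fresh uniformly random order
        if (0, 1) in consecutive_pairs(tests):
            hits += 1
    return hits / trials


def _zigzag(m):
    """Sequence 0, 1, m-1, 2, m-2, ... for even m. Its successive
    differences mod m are all distinct and nonzero."""
    seq = [0]
    for k in range(1, m):
        seq.append((k + 1) // 2 if k % 2 == 1 else m - k // 2)
    return seq


def pair_covering_orders(tests):
    """Return test orders whose consecutive pairs cover every ordered pair
    of distinct tests: n orders for even n, n + 1 orders for odd n."""
    n = len(tests)
    if n < 2:
        return [list(tests)]
    m = n if n % 2 == 0 else n + 1  # pad odd n with one dummy index (= n)
    base = _zigzag(m)
    orders = []
    for shift in range(m):
        # Cyclic shifts of the zigzag form a row-complete Latin square:
        # each ordered pair (a, b), a != b, is adjacent in exactly one row.
        row = [(x + shift) % m for x in base]
        # Dropping the dummy index keeps every adjacency between real tests.
        orders.append([tests[x] for x in row if x < n])
    return orders
```

For example, `pair_covering_orders(["t0", "t1", "t2", "t3"])` returns four orders that together contain all 12 ordered pairs, whereas running every pair in isolation would take 12 separate two-test runs. The cyclic construction is chosen here only because it is short and provably complete; it ignores practical constraints such as keeping test methods of the same class together.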


  • Flaky tests
  • Order-dependent
  • Test-pair coverage

Tao Xie is with the Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, China.


Author information

Corresponding author

Correspondence to Tao Xie.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2021 The Author(s)

About this paper

Cite this paper

Wei, A., Yi, P., Xie, T., Marinov, D., Lam, W. (2021). Probabilistic and Systematic Coverage of Consecutive Test-Method Pairs for Detecting Order-Dependent Flaky Tests. In: Groote, J.F., Larsen, K.G. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2021. Lecture Notes in Computer Science, vol 12651. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72015-5

  • Online ISBN: 978-3-030-72016-2

  • eBook Packages: Computer Science, Computer Science (R0)