Abstract
Rotten Green Tests are tests that pass, but not because the assertions they contain are true: a rotten test passes because some or all of its assertions are not actually executed. The presence of a rotten green test is a test smell, and a bad one, because the existence of the test gives us false confidence that the code under test is valid, when in fact that code may not have been tested at all. This article reports on an empirical evaluation of the tests in a corpus of projects found in the wild. We selected approximately one hundred mature projects written in each of Java, Pharo, and Python. We looked for rotten green tests in each project, taking into account test helper methods, inherited helpers, and trait composition. Previous work has shown the presence of rotten green tests in Pharo projects; the results reported here show that they are also present in Java and Python projects, and that they fall into similar categories. Furthermore, we found code bugs that were hidden by rotten tests in Pharo and Python. We also discuss two test smells, missed fail and missed skip, that arise from the misuse of testing frameworks, and which we observed in tests written in all three languages.
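To make the smell concrete, here is a minimal illustrative sketch using Python's unittest. The Calculator class and both test methods are hypothetical examples constructed for this sketch, not code taken from the studied corpus; the first test is rotten green (its assertion is never executed), and the second shows a missed fail (the framework's fail method is referenced but never called):

```python
import unittest

class Calculator:
    """Hypothetical class standing in for the code under test."""
    def add(self, a, b):
        return a + b

class RottenGreenExamples(unittest.TestCase):
    def test_add_over_empty_data(self):
        # Rotten green test: the data list is empty, so the loop body --
        # and the assertEqual inside it -- never executes. The test still
        # passes, giving false confidence that add() was exercised.
        for a, b, expected in []:
            self.assertEqual(Calculator().add(a, b), expected)

    def test_add_rejects_strings(self):
        # Missed fail: the author expects add() on strings to raise
        # TypeError, but "a" + "b" evaluates to "ab", so execution
        # reaches self.fail. Because the parentheses are missing, the
        # method is merely referenced, not called: nothing happens,
        # and the test is green.
        try:
            Calculator().add("a", "b")
            self.fail
        except TypeError:
            pass

# Run the two tests: both pass, although neither checks anything.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RottenGreenExamples)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both methods report success, which is exactly why such tests are hard to spot by looking at test-runner output alone.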
Notes
A false negative would be generated by a call site that the analysis labelled “executed” but that was not actually executed.
A list of the assertion methods in unittest can be found at https://docs.python.org/3/library/unittest.html
PYPL (PopularitY of Programming Language Index) https://pypl.github.io/PYPL.html was created by analyzing how often language tutorials are sought on Google. By this metric, Python and Java were the two top programming languages in March 2021.
Raw data are available at https://github.com/rmod-team/2020-rotten-green-tests-experiment-data
Repository accessed September 2019, commit 91a404073acac40a7945bf7d584e8b30bc7a08cb
with commit 9ca80538d9e9418ae658772516f9b7dfb1e02ccd
See the discussion on StackOverflow at https://stackoverflow.com/q/12939362/1168342
Communicated by: Tingting Yu
Cite this article
Aranega, V., Delplanque, J., Martinez, M. et al. Rotten green tests in Java, Pharo and Python. Empir Software Eng 26, 130 (2021). https://doi.org/10.1007/s10664-021-10016-2