
The secret life of test smells - an empirical study on test smell evolution and maintenance

Published in: Empirical Software Engineering

Abstract

In recent years, researchers and practitioners have been studying the impact of test smells on test maintenance. However, there is still limited empirical evidence on why developers remove test smells during software maintenance and the mechanisms they employ to address them. In this paper, we conduct an empirical study on 12 real-world open-source systems to study the evolution and maintenance of test smells, and how test smells relate to software quality. Our results show that: 1) Although the number of test smell instances increases, test smell density decreases as systems evolve. 2) However, our qualitative analysis of the removed test smells reveals that most test smell removal (83%) is a by-product of feature maintenance activities. 45% of the removed test smells relocate to other test cases due to refactoring, while developers deliberately address only 17% of the test smell instances, largely consisting of Exception Catch/Throw and Sleepy Test smells. 3) Our statistical model shows that test smell metrics provide additional explanatory power for post-release defects over traditional baseline metrics (an average increase of 8.25% in AUC). However, most types of test smells have a minimal effect on post-release defects. Our study provides insight into how developers resolve test smells and into current test maintenance practices. Future studies on test smells may consider focusing on the specific types of test smells that have a higher correlation with defect-proneness when helping developers with test code maintenance.
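To make the two smell types that developers most often address deliberately concrete, here is a minimal, self-contained JUnit 4 sketch. The OrderService class and all names are invented for illustration and do not come from the studied systems; the test exhibits both a Sleepy Test (a hard-coded Thread.sleep() used as a synchronization mechanism) and an Exception Catch/Throw (manually catching a checked exception inside the test instead of letting the framework handle it):

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;

import org.junit.Test;

// Hypothetical system under test (invented for illustration): processes
// an order on a background thread and exposes its status afterwards.
class OrderService {
    private volatile String status = "PENDING";

    void submitOrderAsync(String orderId) {
        new Thread(() -> status = "PROCESSED").start();
    }

    String getStatus(String orderId) {
        return status;
    }
}

public class OrderServiceTest {

    @Test
    public void testAsyncOrderProcessing() {
        OrderService service = new OrderService();
        service.submitOrderAsync("order-42");

        try {
            // Sleepy Test smell: a fixed Thread.sleep() waits for the
            // asynchronous work, making the test slow and timing-dependent.
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            // Exception Catch/Throw smell: the test catches the checked
            // exception and fails manually, instead of declaring
            // `throws InterruptedException` and letting JUnit report it.
            fail("Test was interrupted: " + e.getMessage());
        }

        assertEquals("PROCESSED", service.getStatus("order-42"));
    }
}
```

A smell-free revision would declare throws InterruptedException on the test method and replace the fixed sleep with an explicit synchronization point, e.g., a CountDownLatch or a polling wait with a timeout.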


Notes

  1. https://github.com/SPEAR-SE/TestSmellEmpirical_Data

  2. https://github.com/apache/flink/pull/4446

  3. Logistic regression via the lrm function from the rms R package.

  4. Redundancy analysis from the Hmisc R package.

  5. VIF analysis from the regclass R package.
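Notes 3 to 5 outline the statistical toolchain. As a sketch of the underlying setup (our paraphrase, based on the baseline metric names used in the appendix tables, not the paper's exact specification): a logistic regression model of post-release defect-proneness is fit on the baseline metrics, then augmented with test smell metrics, and the two models are compared by AUC.

$$
\Pr(\text{defect}_i = 1) = \frac{1}{1 + e^{-z_i}},
\qquad
z_i^{\text{base}} = \beta_0 + \beta_1\,\text{LOC}_i + \beta_2\,\text{CHURNS}_i + \beta_3\,\text{PRE}_i + \beta_4\,\text{COUPLING}_i
$$

$$
z_i^{\text{full}} = z_i^{\text{base}} + \sum_{k} \gamma_k\,\text{SMELL}_{k,i},
\qquad
\Delta\text{AUC} = \text{AUC}\big(z^{\text{full}}\big) - \text{AUC}\big(z^{\text{base}}\big)
$$

Redundancy analysis (note 4) and VIF analysis (note 5) are applied to drop highly correlated predictors before fitting, which keeps the coefficient estimates interpretable; the abstract's reported figure is an average AUC increase of 8.25% from the test smell metrics.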


Author information


Corresponding author

Correspondence to Dong Jae Kim.

Additional information

Communicated by: Andy Zaidman

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 9 The statistics of the regression models showing additive defect explainability of PD(TEST_PRODUCT) + PR(TEST_PROCESS) metrics over the BASE(LOC+CHURNS+PRE+COUPLING)
Table 10 The statistics of the regression models showing additive defect explainability of PD(TEST_PRODUCT) + PR(TEST_PROCESS) metrics over the BASE(LOC+CHURNS+PRE+COUPLING)
Table 11 The statistics of the regression models showing additive defect explainability of PD(TEST_PRODUCT) + PR(TEST_PROCESS) metrics over the BASE(LOC+CHURNS+PRE+COUPLING)
Table 12 The statistics of the regression models showing additive defect explainability of PD(TEST_PRODUCT) + PR(TEST_PROCESS) metrics over the BASE(LOC+CHURNS+PRE+COUPLING)
Table 13 The statistics of the regression models showing additive defect explainability of PD(TEST_PRODUCT) + PR(TEST_PROCESS) metrics over the BASE(LOC+CHURNS+PRE+COUPLING)
Table 14 The effect size of the test smell metrics on post-release defects
Table 15 The effect size of the test smell metrics on post-release defects


About this article


Cite this article

Kim, D.J., Chen, TH. & Yang, J. The secret life of test smells - an empirical study on test smell evolution and maintenance. Empir Software Eng 26, 100 (2021). https://doi.org/10.1007/s10664-021-09969-1

