Empirical Software Engineering, Volume 20, Issue 4, pp 1052–1094

Are test smells really harmful? An empirical study

  • Gabriele Bavota
  • Abdallah Qusef
  • Rocco Oliveto
  • Andrea De Lucia
  • Dave Binkley


Bad code smells have been defined as indicators of potential problems in source code, and techniques to identify and mitigate them have been proposed and studied. Recently, bad test code smells (test smells for short) have been put forward as a kind of bad code smell specific to tests, such as unit tests. What has been missing is an empirical investigation into the prevalence and impact of test smells. Two studies aimed at providing this missing empirical data are presented. The first study finds that test smells are widely diffused in both open source and industrial software systems, with 86 % of JUnit tests exhibiting at least one test smell and six tests having six distinct test smells. The second study provides evidence that test smells have a strong negative impact on program comprehension and maintenance. Highlights from this second study include the finding that comprehension is 30 % better in the absence of test smells.


Keywords: Test smells · Unit testing · Mining software repositories · Controlled experiments

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Gabriele Bavota, University of Sannio, Benevento (BN), Italy
  2. Abdallah Qusef, Princess Sumaya University for Technology, Amman, Jordan
  3. Rocco Oliveto, University of Molise, Pesche (IS), Italy
  4. Andrea De Lucia, University of Salerno, Fisciano (SA), Italy
  5. Dave Binkley, Loyola University Maryland, Baltimore, USA
