Empirical Software Engineering, Volume 19, Issue 2, pp 277–302

On the role of tests in test-driven development: a differentiated and partial replication



Background: Test-Driven Development (TDD) is claimed to have positive effects on external code quality and programmers' productivity. The main driver for these possible improvements is the tests enforced by the test-first nature of TDD, as previously investigated in a controlled experiment (i.e. the original study). Aim: Our goal is to examine the nature of the relationship between tests and external code quality, as well as programmers' productivity, in order to verify/refute the results of the original study. Method: We conducted a differentiated and partial replication of the original setting and the related analyses, with a focus on the role of tests. Specifically, while the original study compared test-first vs. test-last, our replication employed the test-first treatment only. The replication involved 30 students, working in pairs or individually, in the context of a graduate course, and resulted in 16 software artifacts. We performed linear regression to test the original study's hypotheses, and analyses of covariance to test the additional hypotheses imposed by the changes in the replication settings. Results: We found a significant correlation (Spearman coefficient = 0.66, p-value = 0.004) between the number of tests and productivity, with a positive regression coefficient (p-value = 0.011). We found no significant correlation (Spearman coefficient = 0.41, p-value = 0.11) between the number of tests and external code quality (regression coefficient p-value = 0.0513). In both cases we observed no statistically significant interaction caused by the subject units being individuals or pairs. Further, our results are consistent with the original study despite changes in the timing constraints for finishing the task and in the enforced development processes. Conclusions: This replication study confirms the results of the original study concerning the relationship between the number of tests vs. external code quality and programmer productivity. Moreover, this replication allows us to identify additional context variables for which the original results still hold, namely the subject unit, the timing constraint, and the isolation of the test-first process. Based on our findings, we recommend that practitioners implement as many tests as possible in order to achieve higher baselines for quality and productivity.
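The correlation statistic reported above is Spearman's rank coefficient, i.e. the Pearson correlation computed on the rank vectors of the two variables. As an illustrative sketch only (the data and variable names below are hypothetical, not the study's), the computation can be written in plain Python:

```python
# Illustrative sketch of Spearman's rank correlation, the statistic the
# replication uses to relate the number of tests to productivity/quality.
# All data below is made up for demonstration; it is NOT the study's data.

def ranks(xs):
    """1-based ranks of xs, with ties assigned the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the tie group
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Pearson correlation of the rank vectors of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-artifact measurements:
num_tests = [5, 9, 12, 20, 25]
productivity = [3, 6, 5, 11, 14]
print(round(spearman(num_tests, productivity), 2))  # prints 0.9
```

Because the coefficient depends only on ranks, it captures any monotone relationship between tests and the outcome measure, not just a linear one; the linear regression reported in the Results complements it by estimating the direction and slope of the effect.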


Keywords: Test-driven development · Software quality · Productivity · Software testing · Replication



This research is supported in part by the Finnish Funding Agency for Technology and Innovation (TEKES) under the Cloud Software Program and the Academy of Finland with Grant Decision No. 260871. The authors would like to thank Hakan Erdogmus, Maurizio Morisio and Marco Torchiano for providing valuable insights along with the materials needed to conduct this replication. The authors also acknowledge the anonymous reviewers, whose suggestions have significantly improved earlier versions of the manuscript.


References

  1. Astels D (2003) Test-driven development: a practical guide. Prentice Hall Professional Technical Reference
  2. Beck K (2003) Test-driven development: by example. The Addison-Wesley signature series. Addison-Wesley
  3. Bhadauria V (2009) To test before or to test after: an experimental investigation of the impact of test driven development. PhD thesis, The University of Texas at Arlington
  4. Bramel D, Friend R (1981) Hawthorne, the myth of the docile worker, and class bias in psychology. Am Psychol 36(8):867
  5. Brooks A, Roper M, Wood M, Daly J, Miller J (2008) Replication's role in software engineering. In: Guide to advanced empirical software engineering, pp 365–379
  6. Carver J (2010) Towards reporting guidelines for experimental replications: a proposal. In: Proceedings of the 1st international workshop on replication in empirical software engineering research
  7. Cousineau D, Chartier S (2010) Outliers detection and treatment: a review. Int J Psychol Res 3(1):58–67
  8. Dieste O, Fernandez E, García R, Juristo N (2010) Hidden evidence behind useless replications. In: Proceedings of the 1st international workshop on replication in empirical software engineering research
  9. Erdogmus H, Morisio M, Torchiano M (2005) On the effectiveness of the test-first approach to programming. IEEE Trans Softw Eng 31(3):226–237
  10. Flohr T, Schneider T (2006) Lessons learned from an XP experiment with students: test-first needs more teachings. In: Münch J, Vierimaa M (eds) Product-focused software process improvement. Lecture notes in computer science, vol 4034. Springer, Berlin/Heidelberg, pp 305–318
  11. George B, Williams L (2004) A structured experiment of test-driven development. Inform Softw Technol 46(5):337–342
  12. George B, Williams L (2003) An initial investigation of test driven development in industry. In: Proceedings of the 2003 ACM symposium on applied computing, SAC '03. ACM, New York, NY, USA, pp 1135–1139
  13. Huang L (2007) Analysis and quantification of test first programming. PhD thesis, The University of Sheffield
  14. Janzen D, Saiedian H (2007) A leveled examination of test-driven development acceptance. In: 29th international conference on software engineering, ICSE 2007, pp 719–722
  15. Johnson P, Kou H (2007) Automated recognition of test-driven development with Zorro. In: AGILE 2007. IEEE, pp 15–25
  16. Juristo N, Vegas S (2009) Using differences among replications of software engineering experiments to gain knowledge. In: Proceedings of the 2009 3rd international symposium on empirical software engineering and measurement, ESEM '09. IEEE Computer Society, Washington, DC, USA, pp 356–366
  17. Keefe K, Sheard J, Dick M (2006) Adopting XP practices for teaching object oriented programming. In: Proceedings of the 8th Australasian conference on computing education, ACE '06, vol 52. Australian Computer Society, Inc., Darlinghurst, Australia, pp 91–100
  18. Madeyski L (2010) Test-driven development: an empirical evaluation of agile practice. Springer-Verlag New York Inc
  19. Marchenko A, Abrahamsson P, Ihme T (2009) Long-term effects of test-driven development: a case study. In: Abrahamsson P, Marchesi M, Maurer F (eds) Agile processes in software engineering and extreme programming. Lecture notes in business information processing, vol 31. Springer, Berlin/Heidelberg, pp 13–22
  20. Melnik G, Maurer F (2005) A cross-program investigation of students' perceptions of agile methods. In: Proceedings of the 27th international conference on software engineering, ICSE 2005, pp 481–488
  21. Müller M, Höfer A (2007) The effect of experience on the test-driven development process. Empir Softw Eng 12(6):593–615
  22. Pančur M, Ciglarič M (2011) Impact of test-driven development on productivity, code and tests: a controlled experiment. Inf Softw Technol 53(6):557–573
  23. Pedroso B, Jacobi R, Pimenta M (2010) TDD effects: are we measuring the right things? In: Agile processes in software engineering and extreme programming, pp 393–394
  24. Philipp M (2009) Comparison of the test-driven development processes of novice and expert programmer pairs
  25. Rafique Y, Mišić VB (2013) The effects of test-driven development on external quality and productivity: a meta-analysis. IEEE Trans Softw Eng 39(6):836–856. doi: 10.1109/TSE.2012.28
  26. Raubenheimer D, Simpson S (1992) Analysis of covariance: an alternative to nutritional indices. Entomol Exp Appl 62(3):221–231
  27. Sanchez JC, Williams L, Maximilien EM (2007) On the sustained use of a test-driven development practice at IBM. In: Proceedings of the AGILE 2007, AGILE '07. IEEE Computer Society, Washington, DC, USA, pp 5–14
  28. Shull F, Melnik G, Turhan B, Layman L, Diep M, Erdogmus H (2010) What do we know about test-driven development? IEEE Softw 27(6):16–19
  29. Turhan B, Layman L, Diep M, Erdogmus H, Shull F (2010) How effective is test driven development? O'Reilly Media
  30. Wohlin C (2000) Experimentation in software engineering: an introduction, vol 6. Springer
  31. Xu S, Li T (2009) Evaluation of test-driven development: an academic case study. In: Software engineering research, management and applications, pp 229–238
  32. Yenduri S, Perkins L (2006) Impact of using test-driven development: a case study. In: Proceedings of the 2006 international conference on software engineering research and practice and conference on programming languages and compilers, SERP '06, vol 1

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Information Processing Science, University of Oulu, Oulu, Finland