
A family of experiments on test-driven development

Published in: Empirical Software Engineering

Abstract

Context:

Test-driven development (TDD) is an agile software development approach that has been widely claimed to improve software quality. However, the extent to which TDD improves quality appears to be largely dependent upon the characteristics of the study in which it is evaluated (e.g., the research method, participant type, or programming environment). The particularities of each study make the aggregation of results untenable.

Objectives:

The goal of this paper is to increase the accuracy and generalizability of the results achieved in isolated experiments on TDD, to provide joint conclusions on the performance of TDD across different industrial and academic settings, and to assess the extent to which the characteristics of the experiments affect the quality-related performance of TDD.

Method:

We conduct a family of 12 experiments on TDD in academia and industry. We aggregate their results by means of meta-analysis. We perform exploratory analyses to identify variables impacting the quality-related performance of TDD.

Results:

TDD novices achieve slightly higher code quality with iterative test-last development (ITL, the reverse approach of TDD) than with TDD. The task being developed largely determines quality. The programming environment, the order in which TDD and ITL are applied, and the learning effects carried over from one development approach to the other do not appear to affect quality. The quality-related performance of professionals using TDD drops more than that of students. We hypothesize that this may be because professionals are more resistant to change and potentially less motivated than students.

Conclusion:

Previous studies seem to provide conflicting results on TDD performance (positive in some studies, negative in others). We hypothesize that these conflicting results may be due to different study durations, experiment participants being unfamiliar with the TDD process, or case studies comparing the performance achieved with TDD vs. the control approach (e.g., the waterfall model) when each is applied to develop a different system. Further experiments with TDD experts are needed to validate these hypotheses.
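
The Method above aggregates the 12 experiments by meta-analysis. As a concrete illustration of what that pooling step involves, here is a minimal sketch of a random-effects meta-analysis (DerSimonian-Laird), the standard technique described by Borenstein et al. (2011) in the reference list below; the per-experiment effect sizes and variances are hypothetical placeholders, not data from this paper.

    # Minimal random-effects meta-analysis (DerSimonian-Laird) sketch.
    # The effect sizes below are HYPOTHETICAL standardized mean differences
    # (TDD minus ITL quality) with their variances, one per experiment.
    import numpy as np

    d = np.array([-0.30, -0.10, 0.05, -0.25, 0.10, -0.15])  # illustrative effects
    v = np.array([0.08, 0.12, 0.10, 0.09, 0.15, 0.11])      # illustrative variances

    w = 1.0 / v                                  # fixed-effect weights
    d_fixed = np.sum(w * d) / np.sum(w)          # fixed-effect pooled estimate
    q = np.sum(w * (d - d_fixed) ** 2)           # Cochran's Q (heterogeneity)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(d) - 1)) / c)      # between-experiment variance

    w_re = 1.0 / (v + tau2)                      # random-effects weights
    d_re = np.sum(w_re * d) / np.sum(w_re)       # pooled effect
    se = np.sqrt(1.0 / np.sum(w_re))
    print(f"pooled d = {d_re:.3f}, 95% CI [{d_re - 1.96 * se:.3f}, {d_re + 1.96 * se:.3f}]")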


Notes

  1. For simplicity’s sake, we refer to quality and external quality interchangeably throughout the rest of the article. We acknowledge the limitations of this simplification under the threats to validity.

  2. Note that in our experiments the programming language is confounded with other variables: the IDE, testing tools, and other variables related to the programming environment (the use of Java implies the use of Java-related technologies, while the use of C++/C# implies the use of C++/C#-related technologies). We have grouped all of these confounded variables under the name programming environment.

  3. Due to space restrictions, we moved the references of the primary studies to Appendix A.

  4. It was not feasible to compute any response ratio synthesizing the quality achieved with TDD with respect to a control approach (the conventional definition of the response ratio is sketched after these notes).

  5. The outlier observed in [P35] may be due to the small number of participants and the larger variability of results expected with small sample sizes (Cumming 2013).

  6. Throughout the rest of the paper we refer to the treatment—terminology commonly used in experimental design and data analysis (Brown and Prescott 2014; Higgins et al. 2008; Juristo and Moreno 2001; Wohlin et al. 2012)—and the development approach (i.e., either ITL or TDD) interchangeably.

  7. https://github.com/GRISE-UPM/FiDiPro_ESEIL_TDD.

  8. Note that the implementation style influences the size of the programs. We do not claim that these are gold implementations with optimal design and coding practices. In fact, there are several implementations of these katas in public GitHub repositories.

  9. Both measured with eclEmma: https://www.eclemma.org/

  10. Measured with muJava: https://cs.gmu.edu/~offutt/mujava/ (the usual mutation score formula is recalled after these notes).

  11. Note that the fact that participants have no ITL or TDD experience does not mean that they have no software testing experience. ITL and TDD concern knowledge of slicing development into small steps, not knowledge of testing. Therefore, participants with testing experience might conceivably have no experience with either ITL or TDD.

  12. https://www.eui.eu/documents/servicesadmin/deanofstudies/researchethics/guide-data-protection-research.pdf

  13. We analyzed the data with t-tests. Therefore, the mean difference (i.e., the slope of the line) provides useful information for evaluating experiment results (an illustrative computation is sketched after these notes).

  14. In our experiments, the IDEs and testing tools used with C++ and C# differ. In this study, however, we simplify by treating them as part of the same group of technologies, merely for the purposes of comparison.

  15. A type of reactivity in which individuals modify an aspect of their behavior in response to their awareness of being observed.
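
Note 4 above refers to a response ratio. For readers unfamiliar with the metric, the conventional meta-analytic definition (cf. Borenstein et al. 2011) is the ratio of treatment and control means, usually analyzed on the log scale. This is the textbook formulation, not necessarily the exact variant that proved infeasible to compute here:

\[
RR = \frac{\bar{x}_{\mathrm{TDD}}}{\bar{x}_{\mathrm{control}}},
\qquad
\ln RR = \ln \bar{x}_{\mathrm{TDD}} - \ln \bar{x}_{\mathrm{control}}
\]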
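
Note 10 reports measuring mutation scores with muJava. The usual definition in the mutation testing literature (cf. Jia and Harman 2011) is the proportion of non-equivalent mutants killed by the test suite; in practice tools approximate the denominator, since detecting equivalent mutants is undecidable, and the paper may use a tool-specific variant:

\[
\mathit{MS} = \frac{\#\,\mathrm{killed\ mutants}}{\#\,\mathrm{generated\ mutants} - \#\,\mathrm{equivalent\ mutants}}
\]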
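
Note 13 states that the data were analyzed with t-tests, so the mean difference between treatments is the effect of interest. The following is an illustrative computation only, with made-up QLTY scores for two independent groups rather than the experiments' data:

    # Illustrative t-test on made-up QLTY scores for two independent groups.
    import numpy as np
    from scipy import stats

    qlty_itl = np.array([62.0, 70.0, 55.0, 68.0, 73.0, 60.0])  # hypothetical ITL group
    qlty_tdd = np.array([58.0, 66.0, 52.0, 71.0, 64.0, 57.0])  # hypothetical TDD group

    mean_diff = qlty_tdd.mean() - qlty_itl.mean()  # the effect reported by the t-test
    t, p = stats.ttest_ind(qlty_tdd, qlty_itl)     # Student's t-test (equal variances)
    print(f"mean difference = {mean_diff:.2f}, t = {t:.2f}, p = {p:.3f}")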

References

  • Astels D (2003) Test driven development: A practical guide. Prentice Hall Professional Technical Reference

  • Baltes S, Diehl S (2018) Towards a theory of software development expertise. arXiv:1807.06087

  • Basili V R (1992) Software modeling and measurement: the goal/question/metric paradigm

  • Basili V R, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE Trans Softw Eng 25(4):456–473


  • Beck K (2003) Test-driven development: by example. Addison-Wesley Professional

  • Bergersen GR, Sjøberg DIK, Dybå T (2014) Construction and validation of an instrument for measuring programming skill. IEEE Trans Softw Eng 40(12):1163–1184


  • Bertolino A (2007) Software testing research: Achievements, challenges, dreams. In: 2007 Future of Software Engineering. IEEE Computer Society, pp 85–103

  • Bissi W, Neto A G S S, Emer M C F P (2016) The effects of test driven development on internal quality, external quality and productivity: A systematic review. Inf Softw Technol 74:45–54


  • Borenstein M, Hedges L V, Higgins JPT, Rothstein H R (2011) Introduction to meta-analysis. Wiley

  • Brown H, Prescott R (2014) Applied mixed models in medicine. Wiley

  • Causevic A, Sundmark D, Punnekkat S (2011) Factors limiting industrial adoption of test driven development: A systematic review. In: 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation (ICST). IEEE, pp 337–346

  • Cooper H, Patall E A (2009) The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychol Methods 14(2):165


  • Cumming G (2013) Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge

  • de Winter JCF (2013) Using the Student’s t-test with extremely small sample sizes. Pract Assess Res Eval 18(10). [Online; accessed 28-August-2018]

  • Dieste O, Aranda A M, Uyaguari F, Turhan B, Tosun A, Fucci D, Oivo M, Juristo N (2017) Empirical evaluation of the effects of experience on code quality and programmer productivity: an exploratory study. Empir Softw Eng 22(5):2457–2542


  • Falessi D, Juristo N, Wohlin C, Turhan B, Münch J, Jedlitschka A, Oivo M (2018) Empirical software engineering experts on the use of students and professionals in experiments. Empir Softw Eng 23(1):452–489


  • Feigenspan J, Kästner C, Liebig J, Apel S, Hanenberg S (2012) Measuring programming experience. In: 2012 IEEE 20th International Conference on Program Comprehension (ICPC). IEEE, pp 73–82

  • Field A (2013) Discovering statistics using ibm spss statistics. Sage

  • Fisher DJ, Copas AJ, Tierney JF, Parmar MKB (2011) A critical review of methods for the assessment of patient-level interactions in individual participant data meta-analysis of randomized trials, and guidance for practitioners. J Clin Epidemiol 64(9):949–967


  • Fucci D, Erdogmus H, Turhan B, Oivo M, Juristo N (2017) A dissection of the test-driven development process: does it really matter to test-first or to test-last?. IEEE Trans Softw Eng 43(7):597–614


  • Gómez OS, Juristo N, Vegas S (2014) Understanding replication of experiments in software engineering: a classification. Inf Softw Technol 56(8):1033–1048


  • Gurevitch J, Koricheva J, Nakagawa S, Stewart G (2018) Meta-analysis and the science of research synthesis. Nature 555(7695):175


  • Higgins JPT, Green S, et al. (2008) Cochrane handbook for systematic reviews of interventions, vol 5. Wiley Online Library

  • ISO/IEC 25010:2011 (2011) https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en

  • Jia Y, Harman M (2011) An analysis and survey of the development of mutation testing. IEEE Trans Softw Eng 37(5):649–678


  • Jung J, Hoefig K, Domis D, Jedlitschka A, Hiller M (2013) Experimental comparison of two safety analysis methods and its replication. In: 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. IEEE, pp 223–232

  • Juristo N, Moreno A M (2001) Basics of software engineering experimentation. Springer Science & Business Media

  • Juristo N, Vegas S (2009) Using differences among replications of software engineering experiments to gain knowledge. In: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement. IEEE Computer Society, pp 356–366

  • Karac I, Turhan B (2018) What do we (really) know about test-driven development?. IEEE Softw 35(4):81–85


  • Karac E I, Turhan B, Juristo N (2019) A controlled experiment with novice developers on the impact of task description granularity on software quality in test-driven development. IEEE Transactions on Software Engineering

  • Kitchenham B (2008) The role of replications in empirical software engineering, a word of warning. Empir Softw Eng 13(2):219–221


  • Kollanus S (2010) Test-driven development – still a promising approach? In: 2010 Seventh International Conference on the Quality of Information and Communications Technology (QUATIC), pp 403–408

  • Kruger J, Dunning D (1999) Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J Person Soc Psychol 77(6):1121


  • Lau J, Ioannidis John PA, Schmid C H (1998) Summing up evidence: one answer is not always enough. Lancet 351(9096):123–127


  • Lumley T, Diehr P, Emerson S, Chen L (2002) The importance of the normality assumption in large public health data sets. Ann Rev Public Health 23(1):151–169


  • Mäkinen S, Münch J (2014) Effects of test-driven development: A comparative analysis of empirical studies. In: International Conference on Software Quality. Springer, pp 155–169

  • Martin CR (2001) Advanced principles, patterns and process of software development. Prentice Hall

  • Munir H, Moayyed M, Petersen K (2014) Considering rigor and relevance when evaluating test driven development: A systematic review. Inf Softw Technol 56(4):375–394


  • Myers G J, Sandler C, Badgett T (2011) The art of software testing. Wiley

  • Norman G (2010) Likert scales, levels of measurement and the laws of statistics. Adv Health Sci Educ 15(5):625–632


  • Offutt J (2018) Why don’t we publish more TDD research papers?. Softw Test Verif Reliab 28(4):e1670


  • Quinn G P, Keough M J (2002) Experimental design and data analysis for biologists. Cambridge University Press

  • Rafique Y, Mišić V B (2013) The effects of test-driven development on external quality and productivity: A meta-analysis. IEEE Trans Softw Eng 39(6):835–856


  • Riley RD, Lambert PC, Abo-Zaid G (2010) Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ 340:c221


  • Rosenthal R (1991) Meta-analytic procedures for social research, vol 6. Sage

  • Santos A, Gómez OS, Juristo N (2018a) Analyzing families of experiments in SE: a systematic mapping study. IEEE Trans Softw Eng. https://doi.org/10.1109/TSE.2018.2864633

  • Santos A, Jarvinen J, Partanen J, Oivo M, Juristo N (2018b) Does the performance of TDD hold across software companies and premises? A group of industrial experiments on TDD. In: International Conference on Product-Focused Software Process Improvement. Springer, pp 227–242

  • Santos A, Vegas S, Oivo M, Juristo N (2018c) Guidelines for analyzing families of experiments in SE. Submitted to IEEE Transactions on Software Engineering

  • Schmider E, Ziegler M, Danay E, Beyer L, Bühner M (2010) Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology 6(4):147–151

  • Shull F, Melnik G, Turhan B, Layman L, Diep M, Erdogmus H (2010) What do we know about test-driven development?. IEEE Softw 27(6):16–19


  • Sjøberg DIK, Bergersen GR (2018) The price of using students: comments on “Empirical software engineering experts on the use of students and professionals in experiments”. CoRR, arXiv:1810.10791

  • Thorlund K, Imberger G, Johnston BC, Walsh M, Awad T, Thabane L, Gluud C, Devereaux PJ, Wetterslev J (2012) Evolution of heterogeneity (I²) estimates and their 95% confidence intervals in large meta-analyses. PLoS ONE 7(7):e39471


  • Tosun A, Dieste O, Fucci D, Vegas S, Turhan B, Erdogmus H, Santos A, Oivo M, Toro K, Jarvinen J et al (2017) An industry experiment on the effects of test-driven development on external quality and productivity. Empir Softw Eng 22(6):2763–2805


  • Tosun A, Dieste O, Vegas S, Pfahl D, Rungi K, Juristo N (In press) Investigating the impact of development task on external quality in test-driven development: An industry experiment. IEEE Transactions on Software Engineering

  • Vegas S, Dieste O, Juristo N (2015) Difficulties in running experiments in the software industry: experiences from the trenches. In: Proceedings of the Third International Workshop on Conducting Empirical Studies in Industry at ICSE. IEEE Press, pp 3–9

  • Vickers A J (2005) Parametric versus non-parametric statistics in the analysis of randomized trials with non-normally distributed data. BMC Med Res Methodol 5(1):35


  • Williams L, Kessler R (2002) Pair programming illuminated. Addison-Wesley Longman Publishing Co., Inc.

  • Wohlin C, Runeson P, Höst M, Ohlsson M C, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Science & Business Media


Acknowledgements

This research was developed with the support of project PGC2018-097265-B-I00, funded by FEDER/Spanish Ministry of Science and Innovation (Research State Agency). We would like to thank the participants in the ESEIL experiments: this research would not have been possible without their help. We would also like to thank the anonymous reviewers for their valuable comments during the review of the manuscript.


Corresponding author

Correspondence to Sira Vegas.

Additional information

Communicated by: Jeff Offutt


Appendix A: Primary Studies

  1. [P1]

    Aniche, M.F., Gerosa, M.A.: Most common mistakes in test-driven development practice: Results from an online survey with developers. In: Software Testing, Verification, and Validation Workshops (ICSTW), 2010 Third International Conference on, pp. 469–478. IEEE (2010)

  2. [P2]

    Bannerman, S., Martin, A.: A multiple comparative study of test-with development product changes and their effects on team speed and product quality. Empirical Software Engineering 16(2), 177–210 (2011)

  3. [P3]

    Bhat, T., Nagappan, N.: Evaluating the efficacy of test-driven development: industrial case studies. In: Proceedings of the 2006 ACM/IEEE international symposium on Empirical software engineering, pp. 356–363. ACM (2006)

  4. [P4]

    Damm, L.O., Lundberg, L.: Quality impact of introducing component-level test automation and test-driven development. In: European Conference on Software Process Improvement, pp. 187–199. Springer (2007)

  5. [P5]

    Desai, C., Janzen, D.S., Clements, J.: Implications of integrating test-driven development into cs1/cs2 curricula. In: ACM SIGCSE Bulletin, vol. 41, pp. 148–152. ACM (2009)

  6. [P6]

    Dogša, T., Batič, D.: The effectiveness of test-driven development: an industrial case study. Software Quality Journal 19(4), 643–661 (2011)

  7. [P7]

    Domino, M.A., Collins, R.W., Hevner, A.R.: Controlled experimentation on adaptations of pair programming. Information Technology and Management 8(4), 297–312 (2007)

  8. [P8]

    Edwards, S.H.: Using test-driven development in the classroom: Providing students with automatic, concrete feedback on performance. In: Proceedings of the international conference on education and information systems: technologies and applications EISTA, vol. 3. Citeseer (2003)

  9. [P9]

    Erdogmus, H., Morisio, M., Torchiano, M.: On the effectiveness of the test-first approach to programming. IEEE Transactions on software Engineering 31(3), 226–237 (2005)

  10. [P10]

    George, B., Williams, L.: A structured experiment of test-driven development. Information and software Technology 46(5), 337–342 (2004)

  11. [P11]

    George, B., et al.: Analysis and quantification of test driven development approach (2002)

  12. [P12]

    Geras, A., Smith, M., Miller, J.: A prototype empirical evaluation of test driven development. In: Software Metrics, 2004. Proceedings. 10th International Symposium on, pp. 405–416. IEEE (2004)

  13. [P13]

    Gupta, A., Jalote, P.: An experimental evaluation of the effectiveness and efficiency of the test driven development. In: First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 285–294. IEEE (2007)

  14. [P14]

    Huang, L., Holcombe, M.: Empirical investigation towards the effectiveness of test first programming. Information and Software Technology 51(1), 182–194 (2009)

  15. [P15]

    Kobayashi, O., Kawabata, M., Sakai, M., Parkinson, E.: Analysis of the interaction between practices for introducing XP effectively. In: Proceedings of the 28th international conference on Software engineering, pp. 544–550. ACM (2006)

  16. [P16]

    LeJeune, N.F.: Teaching software engineering practices with extreme programming. Journal of Computing Sciences in Colleges 21(3), 107–117 (2006)

  17. [P17]

    Lui, K.M., Chan, K.C.: Test driven development and software process improvement in China. In: International Conference on Extreme Programming and Agile Processes in Software Engineering, pp. 219–222. Springer (2004)

  18. [P18]

    Madeyski, L., Szała, L.: The impact of test-driven development on software development productivity – an empirical study. In: European Conference on Software Process Improvement, pp. 200–211. Springer (2007)

  19. [P19]

    Marchenko, A., Abrahamsson, P., Ihme, T.: Long-term effects of test-driven development: a case study. In: International Conference on Agile Processes and Extreme Programming in Software Engineering, pp. 13–22. Springer (2009)

  20. [P20]

    Maximilien, E.M., Williams, L.: Assessing test-driven development at IBM. In: Software Engineering, 2003. Proceedings. 25th International Conference on, pp. 564–569. IEEE (2003)

  21. [P21]

    McDaid, K., Rust, A., Bishop, B.: Test-driven development: can it work for spreadsheets? In: Proceedings of the 4th international workshop on End-user software engineering, pp. 25–29. ACM (2008)

  22. [P22]

    Mueller, M.M., Hagner, O.: Experiment about test-first programming. IEE Proceedings-Software 149(5), 131–136 (2002)

  23. [P23]

    Nagappan, N., Maximilien, E.M., Bhat, T., Williams, L.: Realizing quality improvement through test driven development: results and experiences of four industrial teams. Empirical Software Engineering 13(3), 289–302 (2008)

  24. [P24]

    Pančur, M., Ciglarič, M.: Impact of test-driven development on productivity, code and tests: A controlled experiment. Information and Software Technology 53(6), 557–573 (2011)

  25. [P25]

    Pančur, M., Ciglarič, M., Trampuš, M., Vidmar, T.: Towards empirical evaluation of test-driven development in a university environment. In: EUROCON 2003. Computer as a Tool. The IEEE Region 8, vol. 2, pp. 83–86. IEEE (2003)

  26. [P26]

    Paula Filho, W.P.: Quality gates in use-case driven development. In: Proceedings of the 2006 international workshop on Software quality, pp. 33–38. ACM (2006)

  27. [P27]

    Rahman, S.M.: Applying the TBC method in introductory programming courses. In: Frontiers In Education Conference-Global Engineering: Knowledge Without Borders, Opportunities Without Passports, 2007. FIE’07. 37th Annual, pp. T1E–20. IEEE (2007)

  28. [P28]

    Sanchez, J.C., Williams, L., Maximilien, E.M.: On the sustained use of a test-driven development practice at IBM. In: Agile Conference (AGILE), 2007, pp. 5–14. IEEE (2007)

  29. [P29]

    Siniaalto, M., Abrahamsson, P.: Does test-driven development improve the program code? Alarming results from a comparative case study. In: Balancing Agility and Formalism in Software Engineering, pp. 143–156. Springer (2008)

  30. [P30]

    Slyngstad, O.P.N., Li, J., Conradi, R., Rønneberg, H., Landre, E., Wesenberg, H.: The impact of test driven development on the evolution of a reusable framework of components–an industrial case study. In: Software Engineering Advances, 2008. ICSEA’08. The Third International Conference on, pp. 214–223. IEEE (2008)

  31. [P31]

    Vu, J.H., Frojd, N., Shenkel-Therolf, C., Janzen, D.S.: Evaluating test-driven development in an industry-sponsored capstone project. In: Proceedings of the Sixth International Conference on Information Technology: New Generations, p. 229 (2009)

  32. [P32]

    Wilkerson, J.W., Nunamaker Jr, J.F., Mercer, R.: Comparing the defect reduction benefits of code inspection and test-driven development. IEEE Transactions on Software Engineering 38(3), 547 (2012)

  33. [P33]

    Xu, S., Li, T.: Evaluation of test-driven development: An academic case study. In: Software Engineering Research, Management and Applications 2009, pp. 229–238. Springer (2009)

  34. [P34]

    Yenduri, S., Perkins, A.L.: Impact of using test-driven development: A case study. Software Engineering Research and Practice 1(2006), 126–129 (2006)

  35. [P35]

    Ynchausti, R.A.: Integrating unit testing into a software development team’s process. XP 1, 84–87 (2001)

  36. [P36]

    Zielinski, K., Szmuc, T.: Preliminary analysis of the effects of pair programming and test-driven development on the external code quality. Frontiers in Artificial Intelligence and Applications p. 113 (2005)



Cite this article

Santos, A., Vegas, S., Dieste, O. et al. A family of experiments on test-driven development. Empir Software Eng 26, 42 (2021). https://doi.org/10.1007/s10664-020-09895-8
