Empirical Software Engineering, Volume 16, Issue 3, pp 295–324

The role of non-exact replications in software engineering experiments

  • Natalia Juristo
  • Sira Vegas


In no science or engineering discipline does it make sense to speak of isolated experiments: the results of a single experiment cannot be viewed as representative of the underlying reality. Experiment replication is the repetition of an experiment to double-check its results, and multiple replications of an experiment increase the confidence in those results. Software engineering has tried its hand at the identical (exact) replication of experiments in the manner of the natural sciences (physics, chemistry, etc.). After numerous attempts over the years, apart from experiments replicated by the same researchers at the same site, no exact replications have yet been achieved. One key reason for this is the complexity of the software development setting, which prevents the many experimental conditions from being identically reproduced. This paper reports research into whether non-exact replications can be of any use. We propose a process aimed at researchers running non-exact replications. Researchers enacting this process will be able to identify new variables that may be affecting experiment results. The process consists of four phases: replication definition and planning, replication operation and analysis, replication interpretation, and analysis of the replication’s contribution. To test the effectiveness of the proposed process, we have conducted a multiple-case study, which reveals the variables learned from two different replications of an experiment.


Keywords: Experimentation · Experiment replication · Combination of experiment results



Acknowledgments

We would like to thank the reviewers for their thorough and insightful comments on this paper, which have unquestionably helped to improve this work. We would also like to thank Óscar Dieste for sharing his deep knowledge of meta-analysis and for the fruitful conversation on random variations among experiment results. This work was funded by research grant TIN2008-00555 of the Spanish Ministry of Science and Innovation.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain
