The role of non-exact replications in software engineering experiments

Published in Empirical Software Engineering.

Abstract

In no science or engineering discipline does it make sense to speak of isolated experiments. The results of a single experiment cannot be viewed as representative of the underlying reality. Experiment replication is the repetition of an experiment to double-check its results. Multiple replications of an experiment increase the confidence in its results. Software engineering has tried its hand at the identical (exact) replication of experiments in the way of the natural sciences (physics, chemistry, etc.). After numerous attempts over the years, apart from experiments replicated by the same researchers at the same site, no exact replications have yet been achieved. One key reason for this is the complexity of the software development setting, which prevents the many experimental conditions from being identically reproduced. This paper reports research into whether non-exact replications can be of any use. We propose a process aimed at researchers running non-exact replications. Researchers enacting this process will be able to identify new variables that are possibly having an effect on experiment results. The process consists of four phases: replication definition and planning, replication operation and analysis, replication interpretation, and analysis of the replication’s contribution. To test the effectiveness of the proposed process, we have conducted a multiple-case study, revealing the variables learned from two different replications of an experiment.


Notes

  1. Note that experience is a context variable in both the baseline experiment and the replication, as neither of the two explores its relationship to the response variable. If there are grounds to suspect that it has an influence (identified, for example, using the process proposed here), an experiment can be designed to test the hypothesis that there is a relationship between technique effectiveness and experience.

  2. Note that this is what experimenters expect to happen, and it will need to be confirmed by the results of the replication. Even if the results of the replication seem to corroborate the possible influence of this variable, it will still have to be further explored in other experiments.

  3. A subject applying a testing technique may generate a test case that exercises a fault and produces a failure, but the tester may fail to identify and report the failure. For this reason, a distinction is made between detected defects and reported defects.

  4. Running a replication in the same country as the UPM experiment increases the likelihood that many experimental conditions will be the same.

  5. Note that leaving out a technique does not have an effect on the values for the response variables of the other two techniques. Therefore, the replications are comparable.

  6. In any case, this change affects the reported defects response variable, which has been left out of this study.

  7. Note that leaving out a response variable does not have an effect on the values for the other response variables of the experiment. Therefore, the replications are comparable.

  8. As subject motivation increases, so does effectiveness.

  9. The more pressure the subject works under, the less effective the application of the technique is.



Acknowledgements

We would like to thank the reviewers for their thorough and insightful comments, which have unquestionably helped to improve this work. We would also like to thank Óscar Dieste for sharing his deep knowledge of meta-analysis, and for the fruitful conversations on random variations among experiments’ results. This work was funded by research grant TIN2008-00555 of the Spanish Ministry of Science and Innovation.

Author information


Corresponding author

Correspondence to Sira Vegas.

Additional information

Editor: James Miller


Cite this article

Juristo, N., Vegas, S. (2011) The role of non-exact replications in software engineering experiments. Empirical Software Engineering 16:295–324.
