Replication of Software Engineering Experiments

  • Natalia Juristo
  • Omar S. Gómez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7007)


Experimentation has played a major role in scientific advancement. Replication is one of the essentials of the experimental method: an experiment is repeated in order to check its results. Successful replication increases the validity and reliability of the outcomes observed in an experiment.

There is debate about the best way of running replications of Software Engineering (SE) experiments. Some of the questions raised in this debate are: Should replicators reuse the baseline experiment materials? What is the appropriate form of communication between experimenters and replicators, if any? Which elements of the experimental structure can be changed while the study is still considered a replication rather than a new experiment? A deeper understanding of the concept of replication should help to clarify these issues, as well as increase and improve replications in SE experimental practice.

In this chapter, we study the concept of replication in order to gain insight into it. The chapter starts with an introduction to the importance of replication and the state of replication in Experimental Software Engineering (ESE). Then we discuss replication from both the statistical and the scientific viewpoint. Based on a review of the diverse types of replication used in other scientific disciplines, we identify the types of replication that can feasibly be run in our discipline. Finally, we present the different purposes that replication can serve in ESE.
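The statistical viewpoint on replication can be illustrated with a short calculation (a sketch added here, not taken from the chapter): under a simple Bayesian model with hypothetical values for the prior probability that an effect is real, statistical power, and significance level, each additional successful replication sharply raises the probability that a reported effect is genuine.

```python
# Illustrative sketch (hypothetical values, not from the chapter):
# how successive successful replications raise the probability that
# a statistically significant effect is real.
# Assumptions: prior P(effect is real) = 0.10, power = 0.80, alpha = 0.05.

def ppv_after_replications(k, prior=0.10, power=0.80, alpha=0.05):
    """Bayesian positive predictive value after k significant results."""
    true_pos = prior * power ** k          # real effect detected k times
    false_pos = (1 - prior) * alpha ** k   # k false positives in a row
    return true_pos / (true_pos + false_pos)

for k in range(1, 4):
    print(f"{k} significant result(s): P(effect is real) = "
          f"{ppv_after_replications(k):.3f}")
# 1 significant result(s): P(effect is real) = 0.640
# 2 significant result(s): P(effect is real) = 0.966
# 3 significant result(s): P(effect is real) = 0.998
```

With these assumed parameters, one significant result alone leaves a sizeable chance of a false finding, while two or three independent successful replications make it very likely that the effect is real.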


Keywords: Experimental Replication · Types of Replication · Experimental Software Engineering · Empirical Software Engineering




Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Natalia Juristo (1)
  • Omar S. Gómez (1)

  1. Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain
