Replaying development history to assess the effectiveness of change propagation tools

Empirical Software Engineering

Abstract

As developers modify software entities such as functions or variables to introduce new features, enhance old ones, or fix bugs, they must ensure that other entities in the software system are updated to be consistent with these new changes. Many hard-to-find bugs are introduced by developers who did not notice dependencies between entities and failed to propagate changes correctly. Most modern development environments offer tools to assist developers in propagating changes. For example, dependency browsers show static code dependencies between source code entities. Other sources of information, such as historical co-change or code layout information, could be used by tools to support developers in propagating changes. We present the Development Replay (DR) approach, which empirically assesses and compares the effectiveness of several not-yet-existing change propagation tools by reenacting the changes stored in source control repositories using these tools. We present a case study of five large open source systems with a total of over 40 years of development history. Our empirical results show that historical co-change information recovered from source control repositories, along with code layout information, can guide developers in propagating changes better than simple static dependency information.
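
The replay idea can be illustrated with a small sketch. It walks through recorded change sets in chronological order, asks a co-change heuristic which entities it would suggest given one changed entity, and scores the suggestions against what actually changed using precision and recall. The names `CoChangeHeuristic` and `replay`, the choice of the first listed entity as the starting point, the suggestion limit, the scoring of an empty suggestion as zero precision, and the toy history are all assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter, defaultdict


class CoChangeHeuristic:
    """Hypothetical tool: suggests entities that changed with a given one before."""

    def __init__(self):
        # co_changes[a][b] = number of past change sets containing both a and b
        self.co_changes = defaultdict(Counter)

    def suggest(self, entity, limit=3):
        ranked = self.co_changes.get(entity, Counter()).most_common(limit)
        return {other for other, _ in ranked}

    def learn(self, change_set):
        for a in change_set:
            for b in change_set:
                if a != b:
                    self.co_changes[a][b] += 1


def replay(change_sets, heuristic):
    """Replay change sets in order and score the heuristic on each one."""
    scores = []
    for change_set in change_sets:
        changed = list(dict.fromkeys(change_set))  # drop duplicates, keep order
        if len(changed) >= 2:
            # Assumption: the first entity is the one the developer starts from.
            start, actual = changed[0], set(changed[1:])
            predicted = heuristic.suggest(start)
            hits = len(predicted & actual)
            # Assumption: an empty suggestion list counts as zero precision.
            precision = hits / len(predicted) if predicted else 0.0
            recall = hits / len(actual)
            scores.append((precision, recall))
        # The heuristic only sees a change set after it has been scored on it.
        heuristic.learn(changed)
    return scores


# Toy history (made-up file names), oldest change set first.
history = [
    ["parse.c", "parse.h"],
    ["parse.c", "parse.h", "lex.c"],
    ["parse.c", "parse.h"],
]
scores = replay(history, CoChangeHeuristic())
for precision, recall in scores:
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the heuristic learns from a change set only after being scored on it, the sketch never lets the tool see the answer it is being evaluated against. Other heuristics (for example, ones based on static dependencies or code layout) could be compared by giving them the same `suggest` and `learn` methods.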

Notes

  1. Using a parametric paired \(t\)-test and a non-parametric paired Wilcoxon signed rank test. The \(t\)-test is performed on the square root of the precision/recall for each change set to ensure that the data has a normal distribution, a requirement for the \(t\)-test. Due to the large number of change sets used in our analysis, the normality of the data is not a major concern, as the \(t\)-test is a robust test. Nevertheless, we ensure normality to guarantee the validity of our results.
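
The square-root transformation followed by a paired \(t\)-test can be sketched as follows, using only the Python standard library. The function name `paired_t_statistic` and the recall values are made up for illustration, and the sketch stops at the test statistic and degrees of freedom; in practice a statistics package such as SciPy (`scipy.stats.ttest_rel` for the paired \(t\)-test, `scipy.stats.wilcoxon` for the signed rank test) would also supply the p-value.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Paired t statistic on square-root-transformed scores.

    Returns (t, degrees_of_freedom). Both inputs hold one score per
    change set, in the same order.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    # Transform each score, then take the per-change-set difference.
    diffs = [math.sqrt(a) - math.sqrt(b) for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_error == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / std_error, len(diffs) - 1


# Made-up per-change-set recall values for two hypothetical heuristics.
recall_a = [0.64, 0.81, 0.49, 1.00, 0.36]
recall_b = [0.49, 0.49, 0.36, 0.64, 0.25]
t_value, dof = paired_t_statistic(recall_a, recall_b)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```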

Acknowledgments

The authors gratefully acknowledge the significant contributions from the members of the open source community who have given freely of their time to produce large software systems with rich and detailed source code repositories, and who assisted us in understanding and acquiring these valuable repositories. They also thank Lionel Briand for his very helpful comments and suggestions to improve the statistical analysis and presentation of our results.

Author information

Corresponding author

Correspondence to Ahmed E. Hassan.

About this article

Cite this article

Hassan, A.E., Holt, R.C. Replaying development history to assess the effectiveness of change propagation tools. Empir Software Eng 11, 335–367 (2006). https://doi.org/10.1007/s10664-006-9006-4

  • DOI: https://doi.org/10.1007/s10664-006-9006-4
