Abstract
Code cloning has often been described as a bad software development practice. However, many studies in the literature indicate that this is not always the case. In fact, either changes to cloned code are propagated consistently, or cloning is used as a kind of templating strategy in which cloned source code fragments evolve independently. This paper (a) proposes an automatic approach to classify the evolution of source code clone fragments, and (b) reports a fine-grained analysis of clone evolution in four Java and C software systems, aimed at investigating to what extent clones are consistently propagated or evolve independently. The paper also investigates the relationship between the presence of clone evolution patterns and other characteristics such as clone radius, clone size, and the kind of change the clones underwent, i.e., corrective maintenance or enhancement.
Acknowledgements
We would like to thank the anonymous reviewers for their very constructive comments on early versions of this manuscript. We also thank William Harris for his review comments that helped us to improve the draft. Luigi Cerulo, Lerina Aversano, and Massimiliano Di Penta are partially supported by the project METAMORPHOS (MEthods and Tools for migrAting software systeMs towards web and service Oriented aRchitectures: exPerimental evaluation, usability, and tecHnOlogy tranSfer), funded by MiUR (Ministero dell’Università e della Ricerca) under grant PRIN2006-2006098097. Suresh Thummalapenta is partially supported by NSF grant CCF-0725190 and ARO grant W911NF-08-1-0443.
Additional information
Editor: Murray Wood
About this article
Cite this article
Thummalapenta, S., Cerulo, L., Aversano, L. et al. An empirical study on the maintenance of source code clones. Empir Software Eng 15, 1–34 (2010). https://doi.org/10.1007/s10664-009-9108-x
DOI: https://doi.org/10.1007/s10664-009-9108-x