Empirical Software Engineering, Volume 15, Issue 1, pp 1–34

An empirical study on the maintenance of source code clones

  • Suresh Thummalapenta
  • Luigi Cerulo
  • Lerina Aversano
  • Massimiliano Di Penta

Abstract

Code cloning has often been described as a bad software development practice. However, many studies in the literature indicate that this is not always the case: either changes occurring in cloned code are consistently propagated, or cloning is used as a kind of templating strategy in which cloned source code fragments evolve independently. This paper (a) proposes an automatic approach to classify the evolution of source code clone fragments, and (b) reports a fine-grained analysis of clone evolution in four different Java and C software systems, aimed at investigating to what extent clones are consistently propagated or evolve independently. The paper also investigates the relationship between the presence of clone evolution patterns and other characteristics such as clone radius, clone size, and the kind of change the clones underwent, i.e., corrective maintenance or enhancement.

Keywords

Software clones, Software maintenance, Mining software repositories, Clone evolution

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Suresh Thummalapenta (1)
  • Luigi Cerulo (2)
  • Lerina Aversano (2)
  • Massimiliano Di Penta (2)

  1. North Carolina State University, Raleigh, USA
  2. Department of Engineering, University of Sannio, Benevento, Italy