“Cloning considered harmful” considered harmful: patterns of cloning in software

Abstract

Literature on the topic of code cloning often asserts that duplicating code within a software system is a bad practice, that it causes harm to the system’s design and should be avoided. However, in our studies, we have found significant evidence that cloning is often used in a variety of ways as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them. We also examine through a case study the frequencies of these clones in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71% of the clones could be considered to have a positive impact on the maintainability of the software system.

Keywords

Clone detection, Clone analysis, Reverse engineering, Case study

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Software Architecture Group (SWAG), David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
