“Cloning considered harmful” considered harmful: patterns of cloning in software

Abstract

Literature on the topic of code cloning often asserts that duplicating code within a software system is a bad practice, that it causes harm to the system’s design and should be avoided. However, in our studies, we have found significant evidence that cloning is often used in a variety of ways as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them. We also examine through a case study the frequencies of these clones in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71% of the clones could be considered to have a positive impact on the maintainability of the software system.

Author information

Corresponding author

Correspondence to Cory J. Kapser.

Additional information

Editors: Massimiliano Di Penta and Susan Sim

About this article

Cite this article

Kapser, C.J., Godfrey, M.W. “Cloning considered harmful” considered harmful: patterns of cloning in software. Empir Software Eng 13, 645 (2008). https://doi.org/10.1007/s10664-008-9076-6

  • DOI: https://doi.org/10.1007/s10664-008-9076-6

Keywords

  • Clone detection
  • Clone analysis
  • Reverse engineering
  • Case study