Empirical Software Engineering, Volume 17, Issue 4–5, pp 503–530

Clones: what is that smell?

Abstract

Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell (Fowler et al. 1999) and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we try to validate the conventional wisdom empirically to see whether cloning makes code more defect prone. This paper analyses the relationship between cloning and defect proneness. For the four medium-to-large open source projects that we studied, we find that, first, the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect prone than non-cloned code. Third, we find little evidence that clones with more copies are actually more error prone. Fourth, we find little evidence to support the claim that clone groups that span more than one file or directory are more defect prone than collocated clones. Finally, we find that developers do not need to expend disproportionately more effort to fix clone-dense bugs. Our findings do not support the claim that clones are really a “bad smell” (Fowler et al. 1999). Perhaps we can clone, and breathe easily, at the same time.

Keywords

Empirical software engineering, Software maintenance, Software clone, Software quality, Software evolution

References

  1. Alkhatib G (1992) The maintenance problem of application software: an empirical analysis. J Softw Maint: Res Pract 4(2):83–104. doi:10.1002/smr.4360040203
  2. Bachmann A, Bernstein A (2009) Data retrieval, processing and linking for software process data analysis. Technical report, University of Zurich. http://www.ifi.uzh.ch/ddis/people/adrian-bachmann/pdq/. Accessed May 2009
  3. Baker BS (1995) On finding duplication and near-duplication in large software systems. In: WCRE ’95: proceedings of the 2nd working conference on reverse engineering. IEEE Computer Society, Washington, pp 86–95. http://portal.acm.org/citation.cfm?id=836911
  4. Balazinska M, Merlo E, Dagenais M, Lague B, Kontogiannis K (1999) Partial redesign of java software systems based on clone analysis. In: WCRE ’99: proceedings of the 6th working conference on reverse engineering. IEEE Computer Society, Washington, pp 326–336. http://portal.acm.org/citation.cfm?id=837061
  5. Barbour L, Khomh F, Zou Y (2011) Late propagation in software clones
  6. Baxter ID, Yahin A, Moura L, Sant’Anna M, Bier L (1998) Clone detection using abstract syntax trees. In: Proceedings of the international conference on software maintenance, pp 368–377. doi:10.1109/ICSM.1998.738528
  7. Berkus J (2007) The 5 types of open source projects. http://www.powerpostgresql.com/5_types. Accessed 20 March 2007
  8. Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: ESEC/FSE ’09: proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, New York, pp 121–130. doi:10.1145/1595696.1595716
  9. Bruntink M, van Deursen A, van Engelen R, Tourwe T (2005) On the use of clone detection for identifying crosscutting concern code. IEEE Trans Softw Eng 31(10):804–818. doi:10.1109/TSE.2005.114
  10. Cai D, Kim M (2011) An empirical study of long-lived code clones. Fundamental approaches to software engineering, pp 432–446
  11. Čubranić D, Murphy GC (2003) Hipikat: recommending pertinent software development artifacts. In: ICSE ’03: proceedings of the 25th international conference on software engineering. IEEE Computer Society, Washington, pp 408–418. http://portal.acm.org/citation.cfm?id=776816.776866
  12. Ducasse S, Rieger M, Demeyer S (1999) A language independent approach for detecting duplicated code. In: Proc. IEEE int. conf. on software maintenance 1999 (’99). Oxford, UK, pp 109–118
  13. Ekoko ED, Robillard MP (2007) Tracking code clones in evolving software. In: ICSE ’07: proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, pp 158–167. doi:10.1109/ICSE.2007.90
  14. Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: ICSM ’03: proceedings of the international conference on software maintenance. IEEE Computer Society, Washington, pp 23–32. http://portal.acm.org/citation.cfm?id=943568
  15. Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code, 1st edn. Addison-Wesley Professional. http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0201485672
  16. Gabel M, Jiang L, Su Z (2008) Scalable detection of semantic clones. In: ICSE ’08: proceedings of the 30th international conference on Software engineering. ACM, New York, pp 321–330. doi:10.1145/1368088.1368132
  17. Geiger R, Fluri B, Gall H, Pinzger M (2006) Relation of code clones and change couplings. In: Baresi L, Heckel R (eds) Fundamental approaches to software engineering. Lecture notes in computer science, vol 3922, chap 31. Springer, Berlin/Heidelberg, pp 411–425. doi:10.1007/11693017_31
  18. Göde N, Koschke R (2011) Frequency and risks of changes to clones. In: Proceeding of the 33rd international conference on software engineering. ACM, pp 311–320
  19. Higo Y, Kamiya T, Kusumoto S, Inoue K (2005) Aries: refactoring support tool for code clone. SIGSOFT Softw Eng Notes 30(4):1–4. doi:10.1145/1082983.1083306
  20. Jiang L, Misherghi G, Su Z, Glondu S (2007a) Deckard: scalable and accurate tree-based detection of code clones. In: ICSE ’07: proceedings of the 29th international conference on software engineering. IEEE Computer Society, Washington, pp 96–105. doi:10.1109/ICSE.2007.30
  21. Jiang L, Su Z, Chiu E (2007b) Context-based detection of clone-related bugs. In: ESEC-FSE ’07: proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, New York, pp 55–64. doi:10.1145/1287624.1287634
  22. Juergens E, Deissenboeck F, Hummel B, Wagner S (2009) Do code clones matter? In: ICSE ’09: proceedings of the 2009 IEEE 31st international conference on software engineering. IEEE Computer Society, Washington, pp 485–495. doi:10.1109/ICSE.2009.5070547
  23. Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 28(7):654–670. doi:10.1109/TSE.2002.1019480
  24. Kan S (2002) Metrics and models in software quality engineering. Addison-Wesley Longman Publishing Co., Inc., Boston
  25. Kapser C, Godfrey M (2008) Cloning considered harmful considered harmful: patterns of cloning in software. Empir Software Eng 13(6):645–692
  26. Kapser C, Godfrey MW (2006) “Cloning considered harmful” considered harmful. In: Working conference on reverse engineering, pp 19–28. doi:10.1109/WCRE.2006.1
  27. Kawaguchi S, Yamashina T, Uwano H, Fushida K, Kamei Y, Nagura M, Iida H (2009) Shinobi: a tool for automatic code clone detection in the ide. In: Working conference on reverse engineering, pp 313–314. doi:10.1109/WCRE.2009.36
  28. Kim M, Bergman L, Lau T, Notkin D (2004) An ethnographic study of copy and paste programming practices in oopl. In: International symposium on empirical software engineering, pp 83–92. doi:10.1109/ISESE.2004.1334896
  29. Kim M, Sazawal V, Notkin D, Murphy G (2005) An empirical study of code clone genealogies. SIGSOFT Softw Eng Notes 30(5):187–196. doi:10.1145/1095430.1081737
  30. Kim S, Zimmermann T, Pan K, Whitehead EJ Jr (2006) Automatic identification of bug-introducing changes. In: ASE ’06: proceedings of the 21st IEEE/ACM international conference on automated software engineering. IEEE Computer Society, Washington, pp 81–90. doi:10.1109/ASE.2006.23
  31. Kim S, Whitehead E, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Trans Softw Eng 34(2):181–196
  32. Komondoor R, Horwitz S (2001) Using slicing to identify duplication in source code. In: Cousot P (ed) Static analysis, lecture notes in computer science, chap 3, vol 2126. Springer, Berlin, pp 40–56. doi:10.1007/3-540-47764-0_3
  33. Komondoor R, Horwitz S (2003) Effective, automatic procedure extraction. In: IWPC ’03: proceedings of the 11th IEEE international workshop on program comprehension. IEEE Computer Society, Washington, pp 33–42. http://portal.acm.org/citation.cfm?id=857023
  34. Krinke J (2007) A study of consistent and inconsistent changes to code clones. In: WCRE ’07: proceedings of the 14th working conference on reverse engineering. IEEE Computer Society, Washington, pp 170–178. doi:10.1109/WCRE.2007.7
  35. Krinke J (2008) Is cloned code more stable than non-cloned code? In: 2008 8th IEEE international working conference on source code analysis and manipulation, pp 57–66. doi:10.1109/SCAM.2008.14
  36. Li Z, Lu S, Myagmar S, Zhou Y (2004) CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In: OSDI’04: proceedings of the 6th conference on symposium on operating systems design & implementation. USENIX Association, Berkeley, p 20. http://portal.acm.org/citation.cfm?id=1251274
  37. Mäntylä M, Lassenius C (2006) Subjective evaluation of software evolvability using code smells: an empirical study. Empir Software Eng 11(3):395–431. doi:10.1007/s10664-006-9002-8
  38. Mockus A, Votta LG (2000) Identifying reasons for software changes using historic databases. In: Proceedings international conference on software maintenance, 2000. IEEE Computer Society, Los Alamitos, pp 120–130. doi:10.1109/ICSM.2000.883028
  39. Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi JM, Nguyen TN (2009) Clone-aware configuration management. In: ASE ’09: proceedings of the 2009 IEEE/ACM international conference on automated software engineering. IEEE Computer Society, Washington, pp 123–134. doi:10.1109/ASE.2009.90
  40. Rahman F, Bird C, Devanbu P (2010) Clones: what is that smell? In: Proceedings of the 7th working conference on mining software repositories. IEEE Computer Society
  41. Roy C, Cordy J (2007) A survey on software clone detection research. Queens School of Computing TR 541:115
  42. Selim G, Barbour L, Shang W, Adams B, Hassan A, Zou Y (2010) Studying the impact of clones on software defects. In: 2010 17th working conference on reverse engineering (WCRE). IEEE, pp 13–21
  43. Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: MSR ’05: proceedings of the 2005 international workshop on mining software repositories. ACM, New York, pp 1–5. doi:10.1145/1083142.1083147
  44. Thummalapenta S, Cerulo L, Aversano L, Di Penta M (2009) An empirical study on the maintenance of source code clones. Empir Software Eng 15(1):1–34. doi:10.1007/s10664-009-9108-x
  45. Toomim M, Begel A, Graham SL (2004) Managing duplicated code with linked editing. In: VLHCC ’04: proceedings of the 2004 IEEE symposium on visual languages—human centric computing. IEEE Computer Society, Washington, pp 173–180. doi:10.1109/VLHCC.2004.35

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Foyzur Rahman (1)
  • Christian Bird (2)
  • Premkumar Devanbu (1)

  1. Department of Computer Science, University of California, Davis, Davis, USA
  2. Empirical Software Engineering, Microsoft Research, Redmond, USA
