Empirical Software Engineering, Volume 23, Issue 4, pp 2232–2278

Cloned and non-cloned Java methods: a comparative study

  • Vaibhav Saini
  • Hitesh Sajnani
  • Cristina Lopes


Reusing code via copy-and-paste, with or without modification, is common in software engineering. Traditionally, cloning has been considered a bad smell that suggests flaws in design decisions. Many studies target clone discovery, removal, and refactoring. However, few studies empirically compare the quality of cloned code with that of code that has not been cloned. To this end, we present a statistical study that examines whether qualitative differences exist between cloned and non-cloned methods in Java projects. The dataset consists of 3562 open source Java projects containing 412,705 cloned and 616,604 non-cloned methods. The study uses 27 software metrics as a proxy for quality, spanning the complexity, modularity, and documentation (code-comments) categories. When controlling for size, no statistically significant differences were found between cloned and non-cloned methods for most of the metrics; three metrics were the exceptions. The main statistically significant difference was that cloned methods are on average 18% smaller than non-cloned methods. After a mixed-method analysis, we provide some insight into why cloned methods are smaller.


Keywords: Code clones; Quality metrics; Open source software



This work was partially supported by a grant from the National Science Foundation, No. 1218228, and by the DARPA MUSE program.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. University of California, Irvine, Irvine, USA
  2. Microsoft, Redmond, USA
