Abstract
Reusing code via copy-and-paste, with or without modification, is common practice in software engineering. Traditionally, cloning has been considered a bad smell that suggests flaws in design decisions. Many studies target clone discovery, removal, and refactoring. However, few studies empirically compare the quality of cloned code with that of code that has not been cloned. To this end, we present a statistical study that examines whether qualitative differences exist between cloned and non-cloned methods in Java projects. The dataset consists of 3,562 open source Java projects containing 412,705 cloned and 616,604 non-cloned methods. The study uses 27 software metrics as a proxy for quality, spanning three categories: complexity, modularity, and documentation (code comments). When controlling for size, no statistically significant differences were found between cloned and non-cloned methods for all but three of the metrics. The main statistically significant difference was that cloned methods are on average 18% smaller than non-cloned methods. Using a mixed-methods analysis, we provide some insight into why cloned methods are smaller.
Notes
Specific hardware details and testing conditions are in the original article.
Acknowledgments
This work was partially supported by National Science Foundation grant No. 1218228 and by the DARPA MUSE program.
Additional information
Communicated by: Bram Adams and Denys Poshyvanyk
About this article
Cite this article
Saini, V., Sajnani, H. & Lopes, C. Cloned and non-cloned Java methods: a comparative study. Empir Software Eng 23, 2232–2278 (2018). https://doi.org/10.1007/s10664-017-9572-7
DOI: https://doi.org/10.1007/s10664-017-9572-7