
Cloned and non-cloned Java methods: a comparative study

Empirical Software Engineering

Abstract

Reusing code via copy-and-paste, with or without modification, is a common practice in software engineering. Traditionally, cloning has been considered a bad smell that suggests flaws in design decisions. Many studies target clone discovery, removal, and refactoring. However, few studies empirically investigate and compare the quality of cloned code with that of code that has not been cloned. To this end, we present a statistical study that investigates whether differences in quality exist between cloned and non-cloned methods in Java projects. The dataset consists of 3562 open source Java projects containing 412,705 cloned and 616,604 non-cloned methods. The study uses 27 software metrics as a proxy for quality, spanning the complexity, modularity, and documentation (code comments) categories. When controlling for size, no statistically significant differences were found between cloned and non-cloned methods for all but three of the metrics. The main statistically significant difference was that cloned methods are, on average, 18% smaller than non-cloned methods. Using a mixed-methods analysis, we provide some insight into why cloned methods are smaller.
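The abstract says the metrics were compared "when controlling for size" but does not say how. Purely as a hypothetical illustration, and not the authors' procedure, the Python sketch below shows two pieces of arithmetic behind claims like these. The first is the relative size difference behind a figure such as "18% smaller". The second is one standard way to control for size: fit an ordinary least-squares model `metric = b0 + b1*size + b2*is_clone` and read `b2` as the difference attributable to cloning at equal size. The function names and the toy data are assumptions made for this sketch.

```python
from statistics import mean


def relative_size_difference(cloned_sizes, non_cloned_sizes):
    """Hypothetical helper: fraction by which the average cloned method is
    smaller than the average non-cloned method (0.18 means '18% smaller')."""
    return (mean(non_cloned_sizes) - mean(cloned_sizes)) / mean(non_cloned_sizes)


def _solve(a, b):
    """Solve the square linear system a . x = b by Gaussian elimination
    with partial pivoting (assumes the system is non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def size_adjusted_clone_effect(metric, size, is_clone):
    """Hypothetical helper: fit metric = b0 + b1*size + b2*is_clone by OLS
    and return b2, the cloned-vs-non-cloned difference at equal size.
    Requires both groups to be present and size to vary."""
    rows = [[1.0, float(s), 1.0 if c else 0.0] for s, c in zip(size, is_clone)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, metric)) for i in range(3)]
    return _solve(xtx, xty)[2]


if __name__ == "__main__":
    # Toy data: the metric depends only on size, and cloned methods are smaller.
    sizes = [10, 20, 30] + [15, 25, 35, 45]
    flags = [True] * 3 + [False] * 4
    metric = [2 + 0.5 * s for s in sizes]
    print(relative_size_difference(sizes[:3], sizes[3:]))
    print(size_adjusted_clone_effect(metric, sizes, flags))
```

In the toy data the raw metric averages differ between the two groups only because cloned methods are smaller, so the size-adjusted effect comes out at (numerically) zero. A mixed-effects or rank-based model may suit real metric distributions better; this sketch only illustrates the idea of holding size fixed.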


Figures: Fig. 1–25. Listings: Listing 1–12.


Notes

  1. Specific hardware details and testing conditions are in the original article.


Acknowledgments

This work was partially supported by a grant from the National Science Foundation (No. 1218228) and by the DARPA MUSE program.

Author information

Correspondence to Vaibhav Saini.

Additional information

Communicated by: Bram Adams and Denys Poshyvanyk

About this article

Cite this article

Saini, V., Sajnani, H. & Lopes, C. Cloned and non-cloned Java methods: a comparative study. Empir Software Eng 23, 2232–2278 (2018). https://doi.org/10.1007/s10664-017-9572-7

  • DOI: https://doi.org/10.1007/s10664-017-9572-7
