Abstract
Deep learning applications are becoming increasingly popular worldwide. Like developers in every other area of software development, developers of deep learning systems strive to write code that is efficient in terms of performance, complexity, and maintenance. The continuous evolution of deep learning systems, which imposes tighter development timelines, and their increasing complexity may lead developers to make bad design decisions. Moreover, because they rely on common frameworks and repeatedly implement similar tasks, deep learning developers are likely to copy-paste code, leading to clones in deep learning code. Code cloning is considered a bad software development practice because developers can inadvertently fail to properly propagate changes to all cloned fragments during a maintenance activity. However, to the best of our knowledge, no study has investigated code cloning practices in deep learning development; the majority of research on deep learning systems has focused on improving the dependability of the models. Given the negative impacts of clones on software quality reported in studies of traditional systems, and the inherent complexity of maintaining deep learning systems (e.g., bug fixing), it is important to understand the characteristics and potential impacts of code clones in deep learning systems. This paper examines the frequency, distribution, and impacts of code clones, as well as code cloning practices, in deep learning systems. To accomplish this, we use the NiCad clone detection tool to detect clones in 59 Python, 14 C#, and 6 Java deep learning systems and an equal number of traditional software systems. We then compare the frequency and distribution of code clones in the deep learning systems and the traditional ones. Further, we study the distribution of the detected code clones using a location-based taxonomy.
In addition, we study the correlation between bugs and code clones to assess the impact of clones on the quality of the studied systems. Finally, we introduce a code clone taxonomy for deep learning programs, based on 6 of the 59 DL systems, and identify the deep learning development phases in which cloning carries the highest risk of faults. Our results show that code cloning is a frequent practice in deep learning systems and that deep learning developers often clone code from files located in distant directories of the system. In addition, we found that code cloning occurs most frequently during DL model construction, model training, and data pre-processing. We also found that hyperparameter setting is the phase of deep learning model construction during which cloning is the riskiest, since it often leads to faults.
Acknowledgements
This work is supported by Fonds de Recherche du Quebec (FRQ) and the Natural Sciences and Engineering Research Council of Canada (NSERC). We would like to thank Dr. Amin Nikanjam for his valuable comments on the manuscript.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Communicated by: Denys Poshyvanyk
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Study Design
Table 22 shows the name, URL, number of lines of code (SLOC), number of commits, and size of each of the 6 selected DL repositories.
Appendix B: RQ1 Additional Results
B.1 Results of Clone Detection Using a Threshold of 20%
In this section, we provide additional results for Java and C# using a dissimilarity threshold of 20%, in order to explore the impact of the threshold on clone detection (our main analysis uses a 30% threshold). Figure 29 shows code clone occurrences in DL and traditional Java projects for both clone granularities. Figure 30 shows the same analysis for C# projects.
We further extend our analysis by comparing clone types. Figures 31 and 32 illustrate the clone density by clone type and granularity in DL and traditional projects, for Java and C# respectively.
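To make the role of the dissimilarity threshold concrete, the following is a minimal sketch of the idea, not NiCad's actual pretty-printing and normalization pipeline: two fragments form a (Type-3) clone pair when their line-based dissimilarity does not exceed the threshold. The code fragments and the crude whitespace normalization are illustrative assumptions.

```python
from difflib import SequenceMatcher

def dissimilarity(fragment_a: str, fragment_b: str) -> float:
    """Line-based dissimilarity between two code fragments after a
    crude normalization (strip indentation, drop blank lines)."""
    norm = lambda src: [line.strip() for line in src.splitlines() if line.strip()]
    a, b = norm(fragment_a), norm(fragment_b)
    similarity = SequenceMatcher(None, a, b).ratio()  # in [0, 1]
    return 1.0 - similarity

def is_clone_pair(a: str, b: str, threshold: float = 0.30) -> bool:
    """Report a clone pair if the fragments differ by at most `threshold`."""
    return dissimilarity(a, b) <= threshold

# Hypothetical fragments: identical except for one line out of four,
# so the dissimilarity is 1 - (2*3)/(4+4) = 0.25.
f1 = "x = load_data()\nx = normalize(x)\nmodel.fit(x)\nprint(score)\n"
f2 = "x = load_data()\nx = normalize(x)\nmodel.fit(x)\nprint(accuracy)\n"
print(is_clone_pair(f1, f2, threshold=0.30))  # True
print(is_clone_pair(f1, f2, threshold=0.20))  # False
```

The pair is reported at the 30% threshold but not at the stricter 20% threshold, which is why the two settings can yield different clone densities in Figs. 29-32.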
Appendix C: RQ2 Additional Results
C.1 Analysis Results for the Other Programming Languages
We study the distribution of the different clone types by clone location in DL and traditional code in Java projects (Fig. 33) and in C# projects (Fig. 34).
C.2 Results of Clone Detection Using a Threshold of 20%
In this section, we present the additional analysis performed to address RQ2. We examine the code clone distribution by location in DL and traditional Java (Fig. 35) and C# (Fig. 36) systems, using a 20% dissimilarity threshold.
We further study the percentage of the average number of cloned fragments by clone location in both deep learning and traditional systems, using a 20% dissimilarity threshold, for Java (Fig. 37) and C# (Fig. 38).
We then study the distribution of the different clone types across the clone locations (same file, same directory, and different directories), using a 20% dissimilarity threshold, for Java (Fig. 39) and C# (Fig. 40).
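The location-based taxonomy used throughout RQ2 can be derived directly from the file paths of a clone pair's two fragments. A minimal sketch, assuming paths relative to the repository root (the example paths are hypothetical):

```python
from pathlib import PurePosixPath

def clone_location(path_a: str, path_b: str) -> str:
    """Classify a clone pair by the relative location of its fragments:
    'same file', 'same directory', or 'different directories'."""
    a, b = PurePosixPath(path_a), PurePosixPath(path_b)
    if a == b:
        return "same file"
    if a.parent == b.parent:
        return "same directory"
    return "different directories"

pairs = [
    ("model/train.py", "model/train.py"),   # same file
    ("model/train.py", "model/eval.py"),    # same directory
    ("model/train.py", "data/loader.py"),   # different directories
]
for a, b in pairs:
    print(f"{a} <-> {b}: {clone_location(a, b)}")
```

With such a classifier, counting clone pairs per category over a detector's output yields distributions like those in Figs. 39 and 40.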
Appendix D: RQ3 Additional Results
In this section, we provide additional analysis of the distribution of the sizes of cloned and non-cloned functions in DL and traditional systems (Fig. 41). This is done to understand whether size plays an important confounding role in identifying bug-fixing commits related to clones. We study the distribution of the mean size of cloned and non-cloned functions per system in DL and traditional systems, for Python projects (Fig. 42) and for Java and C# projects (Fig. 43).
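Comparisons of such size distributions are typically accompanied by a non-parametric effect size such as Cliff's delta. A minimal O(n*m) sketch of the statistic; the SLOC values below are hypothetical, not data from the study:

```python
def cliffs_delta(xs, ys):
    """Cliff's delta effect size: (#pairs with x > y minus #pairs with
    x < y) divided by the total number of pairs. Ranges from -1
    (ys dominate) to +1 (xs dominate); 0 means no dominance."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Hypothetical function sizes (SLOC) for cloned vs. non-cloned functions.
cloned = [12, 18, 25, 30, 41]
non_cloned = [8, 10, 15, 22, 27]
# 19 pairs with cloned > non_cloned, 6 pairs reversed, 25 pairs total:
# d = (19 - 6) / 25 = 0.52, a positive effect (cloned functions larger).
print(round(cliffs_delta(cloned, non_cloned), 2))
```

A paired ranking test (e.g., Wilcoxon-Mann-Whitney) would establish whether the difference is statistically significant, while the delta quantifies how large it is.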
Appendix E: RQ4 Additional Results
Complementing the RQ4 results, which are reported as percentages, Table 23 shows the total number of code clones attributed to each DL phase. The total number of manually analyzed code clones is 595.
About this article
Cite this article
Jebnoun, H., Rahman, M.S., Khomh, F. et al. Clones in deep learning code: what, where, and why?. Empir Software Eng 27, 84 (2022). https://doi.org/10.1007/s10664-021-10099-x