Cross-project smell-based defect prediction

Sotto-Mayor, Bruno; Kalech, Meir

doi:10.1007/s00500-021-06254-7

Data analytics and machine learning
Published: 04 October 2021

Volume 25, pages 14171–14181, (2021)

Soft Computing

Abstract

Defect prediction is a technique for optimizing the testing phase of the software development pipeline by predicting which software components are likely to contain defects. A classifier is trained on a set of features measured on each component of the target project and then predicts whether a component is defective. When the target project has no defect labels available for training, the classifier must instead be trained on data from external projects; this approach is called cross-project defect prediction. Bad code smells are a category of features that have previously been explored in defect prediction and shown to be good predictors of defects. Code smells are patterns of poor development practice in the code that indicate flaws in its design and implementation. Although they have been studied in the context of defect prediction, they have not been studied as features for cross-project defect prediction. In our experiment, we train defect prediction models for 100 projects to evaluate the predictive performance of bad code smells. We implemented four cross-project approaches known in the literature and compared the performance of 37 smells against 56 code metrics commonly used for defect prediction. The results show that cross-project defect prediction models trained with code smells achieved a significant improvement of 6.50% in ROC AUC over models trained with the code metrics.
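
To make the cross-project setting concrete, the following is a minimal sketch and not the implementation used in the study. It assumes the simplest possible strategy, pooling every external project into one training set, which is not necessarily one of the four approaches evaluated. The nearest-centroid scorer, the function names, and the toy feature values (imagined as per-component smell counts) are all hypothetical and chosen only to keep the example dependency-free. ROC AUC is computed as the fraction of (defective, clean) pairs in which the defective component receives the higher score.

```python
from math import fsum


def roc_auc(labels, scores):
    """ROC AUC: probability that a defective component outscores a clean one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean component")
    # Count a win as 1, a tie as 0.5, over all (defective, clean) pairs.
    wins = fsum(1.0 if p > n else 0.5 if p == n else 0.0
                for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [fsum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def fit_scorer(rows, labels):
    """Hypothetical stand-in classifier: higher score = closer to the
    defective centroid than to the clean centroid."""
    clean = centroid([r for r, y in zip(rows, labels) if y == 0])
    defective = centroid([r for r, y in zip(rows, labels) if y == 1])
    return lambda row: sq_dist(row, clean) - sq_dist(row, defective)


def cross_project_auc(projects, target):
    """Train on all projects except `target`, then evaluate on `target`."""
    train_rows, train_labels = [], []
    for name, (rows, labels) in projects.items():
        if name != target:  # the target's own labels are never used for training
            train_rows.extend(rows)
            train_labels.extend(labels)
    scorer = fit_scorer(train_rows, train_labels)
    target_rows, target_labels = projects[target]
    return roc_auc(target_labels, [scorer(r) for r in target_rows])


# Toy data (invented for illustration): each row is one component,
# each label is 1 for defective and 0 for clean.
projects = {
    "A": ([[0, 0], [1, 0], [5, 5], [6, 5]], [0, 0, 1, 1]),
    "B": ([[0, 1], [1, 1], [5, 6], [6, 6]], [0, 0, 1, 1]),
    "C": ([[1, 1], [0, 0], [6, 6], [5, 5]], [0, 0, 1, 1]),
}

for name in projects:
    print(name, cross_project_auc(projects, name))
```

Comparing two feature families, such as smells and code metrics, would amount to running the same loop once per feature set and comparing the resulting per-project ROC AUC values.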

Availability of data and material

The data sets generated and analyzed during the current study are available in the public data repository https://zenodo.org/record/4697491.

Funding

This work was supported by the Cyber Security Research Center at the Ben-Gurion University of the Negev.

Author information

Authors and Affiliations

Ben-Gurion University of the Negev, Beer-Sheva, Israel
Bruno Sotto-Mayor & Meir Kalech

Contributions

Conceptualization: BSM, MK; Funding acquisition: MK; Investigation: BSM, MK; Methodology: BSM, MK; Supervision: MK; Visualization: BSM; Writing—original draft: BSM; Writing—review & editing: BSM, MK.

Corresponding author

Correspondence to Bruno Sotto-Mayor.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing interests or personal relationships that could have influenced the work reported in this paper.

Code availability

The software developed during the current study is available in the public repository at https://github.com/Bruno81930/smells.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sotto-Mayor, B., Kalech, M. Cross-project smell-based defect prediction. Soft Comput 25, 14171–14181 (2021). https://doi.org/10.1007/s00500-021-06254-7

Accepted: 09 September 2021
Published: 04 October 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s00500-021-06254-7
