Is deep learning good enough for software defect prediction?

Pandey, Sushant Kumar; Haldar, Arya; Tripathi, Anil Kumar

doi:10.1007/s11334-023-00542-1

Original Paper
Published: 08 October 2023


Innovations in Systems and Software Engineering

Abstract

Given the growing influence of internet technology and the rapid evolution of software systems, detecting software defects with high accuracy has become a difficult challenge. Traditional software defect prediction (SDP) research mainly concentrates on manually designing features (e.g., complexity metrics) and feeding them into machine learning classifiers to identify defective code. To achieve higher prediction accuracy, researchers have developed several deep learning and other computationally intensive models for SDP. However, achieving better results with these models involves several critical conditions and unresolved theoretical problems. This article investigates SDP using two deep learning techniques: SqueezeNet and Bottleneck models. We employed seven open-source datasets from the NASA repository to perform this comparative study. Using F-Measure as the performance metric, we found that SqueezeNet and the Bottleneck model statistically outperform eight state-of-the-art methods, with mean F-Measures of 0.93 ± 0.014 and 0.90 ± 0.013, respectively. Both methods are significantly more effective in terms of F-Measure on large- and moderate-size projects, but they are computationally expensive in terms of training time. As software projects grow larger and more sophisticated, such deep learning methods are worth applying.
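The abstract reports results as F-Measure, the harmonic mean of precision and recall on the defective class, F = 2PR/(P + R). The following minimal Python sketch shows how such a score can be computed from binary predictions and how per-dataset scores can be summarized as a mean and spread. It assumes that label 1 marks a defective module and treats the ± figure as a sample standard deviation; both are assumptions, and the function names and example labels are hypothetical rather than taken from the authors' replication code.

```python
from statistics import mean, stdev


def f_measure(y_true, y_pred, positive=1):
    """F-Measure (F1) for the defective (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correctly predicted defects: precision or recall is 0 (or undefined), report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(scores):
    """Mean and sample standard deviation of per-dataset F-Measures.

    Assumption: the paper's "±" is read here as a sample standard deviation.
    """
    return mean(scores), stdev(scores)


# Hypothetical labels for one dataset (1 = defective module, 0 = clean).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(f_measure(y_true, y_pred), 3))  # prints 0.75
```

With these labels there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F-Measure is 0.75. Scores from several datasets could then be passed to `summarize` to obtain a mean ± spread pair.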

Data Availability

We have publicly shared all source code, along with the datasets and results, for replication purposes (https://github.com/Ary4/Is-Deep-learning-good-enough-for-software-defect-prediction-). The datasets used in our experiments can be found in the PROMISE repository (http://promise.site.uottawa.ca/SERepository/) (Sayyad Shirabad and Menzies 2005) and the UCI repository (http://archive.ics.uci.edu/ml/index.php).

Notes

https://github.com/Ary4/Is-Deep-learning-good-enough-for-software-defect-prediction-.

References

Arena P, Basile A, Bucolo M, Fortuna L (2003) Image processing for medical diagnosis using CNN. Nucl Instrum Methods Phys Res Sect A Accel Spectrom Detect Assoc Equip 497(1):174–178
Catal C, Sevim U, Diri B (2011) Practical development of an Eclipse-based software fault prediction tool using naive Bayes algorithm. Expert Syst Appl 38(3):2347–2353
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Cliff N (2014) Ordinal methods for behavioral data analysis. Psychology Press, London
Deng J, Lu L, Qiu S (2020) Software defect prediction via LSTM. IET Softw 14(4):443–450
Fan G, Diao X, Yu H, Yang K, Chen L (2019) Software defect prediction via attention-based recurrent neural network. Sci Program. https://doi.org/10.1155/2019/6230953
Fenton NE, Neil M (1999) A critique of software defect prediction models. IEEE Trans Softw Eng 25(5):675–689
Friedman J, Hastie T, Tibshirani R et al (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 28(2):337–407
Garner SR et al (1995) WEKA: the Waikato environment for knowledge analysis. In: Proceedings of the New Zealand computer science research students conference, pp 57–64
Ghosh D, Singh J (2020) A novel approach of software fault prediction using deep learning technique. In: Automated Software Engineering: A Deep Learning-Based Approach, pp 73–91. Springer
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Appl 13(4):18–28
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360
Jorayeva M, Akbulut A, Catal C, Mishra A (2022) Deep learning-based defect prediction for mobile applications. Sensors 22(13):4734
Katiyar S, Borgohain SK (2021) Comparative evaluation of CNN architectures for image caption generation. arXiv preprint arXiv:2102.11506
Kayalibay B, Jensen G, van der Smagt P (2017) CNN-based segmentation of medical imaging data. arXiv preprint arXiv:1701.03056
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Koh PW, Nguyen T, Tang YS, Mussmann S, Pierson E, Kim B, Liang P (2020) Concept bottleneck models. In: International Conference on Machine Learning, pp 5338–5348. PMLR
Kumar L, Sripada SK, Sureka A, Rath SK (2018) Effective fault prediction model developed using least square support vector machine (LSSVM). J Syst Softw 137:686–712
Laradji IH, Alshayeb M, Ghouti L (2015) Software defect prediction using ensemble learning on selected features. Inf Softw Technol 58:388–402
Li J, He P, Zhu J, Lyu MR (2017) Software defect prediction via convolutional neural network. In: 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), pp 318–328. IEEE
Li N, Shepperd M, Guo Y (2020) A systematic review of unsupervised learning techniques for software defect prediction. Inf Softw Technol 122:106287
Liaw A, Wiener M et al (2002) Classification and regression by randomForest. R News 2(3):18–22
Majd A, Vahidi-Asl M, Khalilian A, Poorsarvi-Tehrani P, Haghighi H (2020) SLDeep: statement-level software defect prediction using deep-learning model on static code features. Expert Syst Appl 147:113156
Malhotra R (2015) A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput 27:504–518
Malhotra R, Yadav HS (2021) An improved CNN-based architecture for within-project software defect prediction. In: Soft Computing and Signal Processing, pp 335–349. Springer
Matloob F, Ghazal TM, Taleb N, Aftab S, Ahmad M, Khan MA, Abbas S, Soomro TR (2021) Software defect prediction using ensemble learning: a systematic literature review. IEEE Access 9:98754–98771
Munir HS, Ren S, Mustafa M, Siddique CN, Qayyum S (2021) Attention based GRU-LSTM for software defect prediction. PLoS One 16(3):e0247444
Murphy KP et al (2006) Naive Bayes classifiers. University of British Columbia, Vancouver
Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181
Omri S, Sinz C (2020) Deep learning for software defect prediction: a survey. In: Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, pp 209–214
Pan C, Lu M, Xu B, Gao H (2019) An improved CNN model for within-project software defect prediction. Appl Sci 9(10):2138
Pandey SK, Mishra RB, Tripathi AK (2020) BPDET: an effective software bug prediction model using deep representation and ensemble learning techniques. Expert Syst Appl 144:113085
Pandey SK, Mishra RB, Tripathi AK (2021) Machine learning based methods for software fault prediction: a survey. Expert Syst Appl 172:114595
Pandey SK, Rathee D, Tripathi AK (2020) Software defect prediction using K-PCA and various kernel-based extreme learning machine: an empirical study. IET Softw 14(7):768–782
Pandey SK, Tripathi AK (2020) BCV-Predictor: a bug count vector predictor of a successive version of the software system. Knowledge-Based Syst 197:105924
Pandey SK, Tripathi AK (2021) Class imbalance issue in software defect prediction models by various machine learning techniques: an empirical study. In: 2021 8th International Conference on Smart Computing and Communications (ICSCC), pp 58–63. IEEE
Pandey SK, Tripathi AK (2021) DNNAttention: a deep neural network and attention based architecture for cross project defect number prediction. Knowledge-Based Syst 233:107541
Pandey SK, Tripathi AK (2021) An empirical study toward dealing with noise and class imbalance issues in software defect prediction. Soft Comput 25(21):13465–13492
Qiao L, Li X, Umer Q, Guo P (2020) Deep learning based software defect prediction. Neurocomputing 385:100–110
Ruck DW, Rogers SK, Kabrisky M, Oxley ME, Suter BW. The multilayer perceptron as an approximation to a Bayes optimal discriminant function
Ryu D, Choi O, Baik J (2016) Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng 21(1):43–71
Sayyad Shirabad J, Menzies T (2005) The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. http://promise.site.uottawa.ca/SERepository
Shepperd M, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Steinwart I, Christmann A (2008) Support vector machines. Springer, Berlin
Sun Y, Xu L, Li Y, Guo L, Ma Z, Wang Y (2018) Utilizing deep architecture networks of VAE in software fault prediction. In: 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp 870–877. IEEE
Suresh Kumar P, Behera HS, Nayak J, Naik B (2021) Bootstrap aggregation ensemble learning-based reliable approach for software defect prediction by using characterized code feature. Innov Syst Softw Eng 17(4):355–379
Tantithamthavorn CK (2016) NASA software defect prediction dataset. https://github.com/klainfo/NASADefectDataset
Tong H, Liu B, Wang S (2018) Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Inf Softw Technol 96:94–111
Wang H, Zhuang W, Zhang X (2021) Software defect prediction based on gated hierarchical LSTMs. IEEE Trans Reliab 70(2):711–727
Wang T, Zhang Z, Jing X, Zhang L (2016) Multiple kernel ensemble learning for software defect prediction. Autom Softw Eng 23(4):569–590
Xu Z, Liu J, Luo X, Yang Z, Zhang Y, Yuan P, Tang Y, Zhang T (2019) Software defect prediction based on kernel PCA and weighted extreme learning machine. Inf Softw Technol 106:182–200
Yedida R, Menzies T (2021) On the value of oversampling for deep learning in software defect prediction. IEEE Trans Softw Eng 48(8):3103–3116
Zhu K, Ying S, Zhang N, Zhu D (2021) Software defect prediction based on enhanced metaheuristic feature selection optimization and a hybrid deep neural network. J Syst Softw 180:111026

Acknowledgements

This work is supported by IIT (BHU), India. The authors would also like to thank Professor David Lo of SMU, Singapore, for encouraging this work. His valuable suggestions and critical comments improved the quality of the manuscript.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Indian Institute of Technology (BHU), Varanasi, 221001, India
Sushant Kumar Pandey, Arya Haldar & Anil Kumar Tripathi

Contributions

SKP (corresponding author) contributed to conceptualization, methodology, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review and editing, and visualization and provided software. AH (corresponding author) was involved in conceptualization, validation, formal analysis, investigation, resources, data curation, writing—original draft, editing, and visualization and provided software. AKT contributed to supervision, writing—review and editing, resources, formal analysis, and methodology.

Corresponding author

Correspondence to Sushant Kumar Pandey.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article. Sushant Kumar Pandey, Arya Haldar, and Anil Kumar Tripathi declare that they have no financial interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pandey, S.K., Haldar, A. & Tripathi, A.K. Is deep learning good enough for software defect prediction? Innovations Syst Softw Eng (2023). https://doi.org/10.1007/s11334-023-00542-1

Received: 10 August 2022
Accepted: 15 September 2023
Published: 08 October 2023
DOI: https://doi.org/10.1007/s11334-023-00542-1
