Optimized ensemble machine learning model for software bugs prediction

Johnson, Femi; Oluwatobi, Olayiwola; Folorunso, Olusegun; Ojumu, Alomaja Victor; Quadri, Alatishe

doi:10.1007/s11334-022-00506-x

S.i. : Intelligence for Systems and Software Engineering
Published: 03 December 2022

Volume 19, pages 91–101, (2023)
Innovations in Systems and Software Engineering

Femi Johnson¹,
Olayiwola Oluwatobi²,
Olusegun Folorunso¹,
Alomaja Victor Ojumu³ &
…
Alatishe Quadri²

Abstract

System users increasingly treat checks of software accuracy and efficiency as a prerequisite for adoption. As a result, twenty-first-century programmers are consciously developing less buggy, more efficient, and more robust software with a higher degree of accuracy. Occasionally, bugs that go undetected in large software, owing to code complexity and other associated parametric attributes, cause hardware to malfunction. In this paper, an ensemble model of Logistic Regression and Extra Trees classifier algorithms is deployed on parametric software attributes for the accurate classification and prediction of software bugs. The implementation was performed on different platforms (WEKA, MATLAB and PyCharm) to determine the rate of memory utilization, optimize prediction time, maximize the model’s efficiency and compare accuracy rankings among similar machine learning models. A publicly available software defects dataset from the National Aeronautics and Space Administration (NASA), containing 16,962 instances and 38 attributes for software defects prediction, was collected, pre-processed and used in the implementation of this study. The collected data were vectorized, subjected to principal component analysis (PCA) for dimension reduction based on ranking values, and divided in the ratio 3:2 for training and testing of the ensemble model classifier, respectively, on new sets of buggy software datasets. The prediction accuracy of the ensemble model rose from 96.7% on the un-vectorized dataset to 97.8% on the vectorized dataset, a gain of 1.1 percentage points. Prediction time also fell appreciably, from 26.9 s for the un-vectorized dataset to 19.7 s for the vectorized dataset. In addition, memory utilization for the vectorized dataset increased during the training phase because of the number of bits, but decreased in the final testing phase of the software bug prediction. Overall, the accuracy of 97.8% recorded by the optimized ensemble model demonstrates its capability to classify and predict buggy software accurately, with efficient memory utilization and within an optimal time.
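The pipeline described in the abstract (PCA for dimension reduction, a 3:2 train/test split, and an ensemble of Logistic Regression and Extra Trees) can be sketched roughly as follows. This is only an illustrative sketch using scikit-learn, not the authors' implementation: the data are synthetic stand-ins for the NASA dataset, and the standardization step, the number of retained components (10), the soft-voting combination rule and all hyperparameters are assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 38 attributes as in the abstract,
# but far fewer instances and no relation to the real NASA defect records.
X, y = make_classification(
    n_samples=1000, n_features=38, n_informative=10, random_state=0
)

# A 3:2 ratio means 60% of instances for training and 40% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Ensemble of the two classifier families named in the abstract.
# Soft voting (averaging predicted probabilities) is an assumption here.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)

# Standardize, reduce to 10 principal components (assumed), then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=10), ensemble)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 96.7% and 97.8% figures reported in the paper.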

Author information

Authors and Affiliations

Department of Computer Science, Federal University of Agriculture, Abeokuta, Nigeria
Femi Johnson & Olusegun Folorunso
Department of Computer Science, Olabisi Onabanjo University, Ago-Iwoye, Nigeria
Olayiwola Oluwatobi & Alatishe Quadri
Department of Computer Technology, Yaba College of Technology, Lagos, Nigeria
Alomaja Victor Ojumu

Corresponding author

Correspondence to Femi Johnson.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Johnson, F., Oluwatobi, O., Folorunso, O. et al. Optimized ensemble machine learning model for software bugs prediction. Innovations Syst Softw Eng 19, 91–101 (2023). https://doi.org/10.1007/s11334-022-00506-x

Received: 04 May 2022
Accepted: 22 November 2022
Published: 03 December 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11334-022-00506-x
