Abstract
Software accuracy and efficiency checks are becoming of paramount interest to system users before utilization. As a result, twenty-first-century programmers are consciously developing less buggy, highly efficient, and robust software with a higher degree of accuracy. Occasionally, undetected bugs in large software due to the complexity of codes and other associated parametric attributes cause hardware to malfunction. In this paper, an ensemble model of Logistic Regression and Extra tree classifier algorithms is deployed on parametric software attributes for the accurate classification and prediction of software bugs. The implementation was performed on different platforms (WEKA, MATLAB and PyCharm) to determine the rate of memory utilization, optimize prediction time, maximize the model’s efficiency and compare accuracy rankings among similar machine models. A publicly available software defects dataset from the National Aeronautics and Space Administration (NASA) containing 16,962 instances and 38 attributes for software defects prediction was collected, pre-processed and used in the implementation of this study. The collected data were vectorized, subjected to principal component analysis (PCA) for dimension reduction based on ranking values and divided in the ratio 3:2 for training and testing of the ensemble model classifier, respectively, on new sets of buggy software datasets. The result from the ensembled model showed a significant increase from 96.7–97.8% in the prediction accuracy of the un-vectorized dataset to vectorized dataset. An appreciable decrease in the prediction time (19.7 s) of the vectorized dataset was also observed against the initial time (26.9 s) recorded for the un-vectorized dataset. In addition, memory utilization for vectorized datasets increased during the training phase due to the number of bits but got reduced at the final testing phase of the software bug prediction. However, the overall accuracy of 97.8% recorded by the optimized ensemble model for buggy software prediction proved the model’s capability to accurately classify and predict buggy software with efficient memory utilization at optimal time duration.
Similar content being viewed by others
References
Jin C, Jin SW (2015) Prediction approach of software fault-proneness based on hybrid artificial neural network and quantum particle swarm optimization. Appl Soft Comput 35:717–725
Zhang ZW, Jing XY, Wang TJ (2017) Label propagation based semi-supervised learning for software defect prediction. Autom Softw Eng 24(1):47–69
Chen X, Zhang D, Zhao Y, Cui Z, Ni C (2019) Software defect number prediction: unsupervised vs. supervised methods. Inf Softw Technol 106:161–181
Phan H, Andreotti F, Cooray N, Chén OY, De Vos M (2018) Joint classification and prediction CNN framework for automatic sleep stage classification. IEEE Trans Biomed Eng 66(5):1285–1296
Chatterjee S, Maji B (2016) A new fuzzy rule-based algorithm for estimating software faults in early phase of development. Soft Comput 20(10):4023–4035
Mustaqeem M, Saqib M (2021) Principal component based support vector machine (PC-SVM): a hybrid technique for software defect detection. Clust Comput. https://doi.org/10.1007/s10586-021-03282-8
Sasidharan R, Sriram P (2014) Hyper-quadtree-based k-means algorithm for software fault prediction. Computational intelligence cyber security and computational models. Springer, New Delhi, pp 107–118
Turabieh H, Mafarja M, Li X (2019) Iterated feature selection algorithms with layered recurrent neural network for software fault prediction. Expert Syst Appl 122:27–42
Yang X, Lo D, Xia X, Zhang Y, and Sun J (2015) Deep learning for just-in-time defect prediction. In: IEEE international conference on software quality, reliability and security, p 17–26
Ji H, Huang S, Wu Y, Hui Z, Zheng C (2019) A new weighted naive Bayes method based on information diffusion for software defect prediction. Softw Qual J 27(3):923–968
Rathore SS, Kumar S (2015) Predicting number of faults in software system using genetic programming. Procedia Comput Sci 62:303–311
Soleimani A and Asdaghi F (2014) An AIS based feature selection method for software fault prediction. In: Iranian conference on intelligent systems (ICIS), IEEE, p 1–
Tong H, Liu B, Wang S (2018) Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Inf Softw Technol 96:94–111
Ricky MY, Purnomo F, and Yulianto B, (2016) Mobile application software defect prediction. In: IEEE symposium on service-oriented system engineering (SOSE), p 307–313
Nam J and Kim S (2015) Clami: defect prediction on unlabeled datasets. In: IEEE/ACM international conference on automated software engineering (ASE), p 452–463
Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181
Mendis C, Yan C, Pu Y, Amasinghe S, Carbin M (2019) Auto-vectorization with imitation learning. In: Conference on neural information processing systems, (NEURIPS), p 1–12
Hall T, Zhang M, Bowes D, Sun Y (2015) Some code smells have a significant but small effect on faults. ACM Trans Softw Eng Methodol (TOSEM) 23(4):1–39
Yuzhou L (2021) A novel DL approach to PE malware detection: exploring GLOVE vectorization, MCC-RNN and feature fusion. p 1–19
Yadav HB, Yadav DK (2015) A fuzzy logic-based approach for phase-wise software defects prediction using software metrics”. Inf Softw Technol 63:44–57
Biçer MS and Diri B (2015) Predicting defect prone modules in web applications. In: International conference on information and software technologies, Springer, Cham, p 577–591
Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660
Maua G, Grbac GT (2017) Co-evolutionary multi-population genetic programming for classification in software defect prediction. Appl Soft Comput 55:331–351
Parag CP (2010) Exhaustive and heuristic search approaches for learning software defect prediction model. Eng Appl Artif Intell 23(1):34–40
Abaei G, Selamat A, Fujita H (2015) An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction. Knowl Based Syst 74:28–39
Li B, Shen B, Wang J, Chen Y, Zhang T, and Wang J (2014) A scenario-based approach to predicting software defects using compressed C4. 5 model. In: IEEE 38th annual computer software and applications conference, p 406–415
Aman H, Amasaki S, Sasaki T, Kawahara M (2015) Lines of comments as a noteworthy metric for analyzing fault-proneness in methods. IEICE Trans Inf Syst 98(12):2218–2228
Ulan M, Löwe W, Ericsson M, Wingkvist A (2021) Copula-based software metrics aggregation. Softw Qual J. https://doi.org/10.1007/s11219-021-09568-9
Biçer MS, Diri B (2016) Defect prediction for cascading style sheets. Appl Soft Comput 49:1078–1084
Bowes D, Hall T, Harman M, Jia Y, Sarro F, and Wu F, (2016) Mutation-aware fault prediction. In: Proceedings of the 25th international symposium on software testing and analysis, p 330–341
Kim S, Whitehead EJ, Zhang Y (2008) Classifying software changes: clean or buggy. IEEE Trans Softw Eng 34(2):181–196
Zhao Y, Yang Y, Lu H, Liu J, Leung H, Wu Y, Xu B (2017) Understanding the value of considering client usage context in package cohesion for fault-proneness prediction. Autom Softw Eng 24(2):393–453
Goyal S (2021) Predicting the defects using stacked ensemble learner with filtered dataset. Autom Softw Eng 28:14. https://doi.org/10.1007/s10515-021-00285-y
Yao J, Shepperd M (2021) The impact of using biased performance metrics on software defect prediction research. Inf Softw Technol 139:1–14. https://doi.org/10.1016/j.infsof.2021.106664
Matloob F, Ghazal TM, Taleb N, Aftab S, Ahmad M, Adnan-Khan M, Abbas S, Rahim-Soomro T (2021) Software defect prediction using ensemble learning: a systematic literature review. Digit Object Identif. https://doi.org/10.1109/ACCESS.2021.3095559
Musinat B, Johnson F, Folorunso O, Ezinne I (2021) Genetic algorithm-based multi-objective optimization model for software bugs prediction. Ann J Tech Univ Varna Bulg 6(1):34–48
Wang Y, Zorzi S and Bittner K (2021) Machine-learned 3D Building Vectorization from Satellite Imagery. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), p 1–14. https://arxiv.org/pdf/2104.06485.pdf
Balogun AO, Oladele RO, Mojeed HA, Amin-Balogun B, Adeyemo VE, Aro TO (2021) Performance analysis of selected clustering techniques for software defects prediction. Afr J Comput ICT 12(2):30–42
Alattas K (2021) System error estimate using combination of classification and optimization technique. J Comput Sci 17(3):319–329. https://doi.org/10.3844/jcssp.2021.319.329
Haj-Ali A, Ahmad NK, Willkie T, Sopjia Y, Asanovic K, and Stoica I (2020) Neurovectorizer: end-to-end vectorization with deep learning. ACM 1SBN978–01–4503, p 242–245
Matveiev O, Zubenko A, Yevtushenko D, and Cherednichenko O (2022) Towards classifying HTML embedded product data based on machine learning approach. pp 1–11
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Johnson, F., Oluwatobi, O., Folorunso, O. et al. Optimized ensemble machine learning model for software bugs prediction. Innovations Syst Softw Eng 19, 91–101 (2023). https://doi.org/10.1007/s11334-022-00506-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11334-022-00506-x