Skip to main content
Log in

Optimized ensemble machine learning model for software bugs prediction

  • S.i. : Intelligence for Systems and Software Engineering
  • Published:
Innovations in Systems and Software Engineering Aims and scope Submit manuscript

Abstract

Software accuracy and efficiency checks are becoming of paramount interest to system users before utilization. As a result, twenty-first-century programmers are consciously developing less buggy, highly efficient, and robust software with a higher degree of accuracy. Occasionally, undetected bugs in large software due to the complexity of codes and other associated parametric attributes cause hardware to malfunction. In this paper, an ensemble model of Logistic Regression and Extra tree classifier algorithms is deployed on parametric software attributes for the accurate classification and prediction of software bugs. The implementation was performed on different platforms (WEKA, MATLAB and PyCharm) to determine the rate of memory utilization, optimize prediction time, maximize the model’s efficiency and compare accuracy rankings among similar machine models. A publicly available software defects dataset from the National Aeronautics and Space Administration (NASA) containing 16,962 instances and 38 attributes for software defects prediction was collected, pre-processed and used in the implementation of this study. The collected data were vectorized, subjected to principal component analysis (PCA) for dimension reduction based on ranking values and divided in the ratio 3:2 for training and testing of the ensemble model classifier, respectively, on new sets of buggy software datasets. The result from the ensembled model showed a significant increase from 96.7–97.8% in the prediction accuracy of the un-vectorized dataset to vectorized dataset. An appreciable decrease in the prediction time (19.7 s) of the vectorized dataset was also observed against the initial time (26.9 s) recorded for the un-vectorized dataset. In addition, memory utilization for vectorized datasets increased during the training phase due to the number of bits but got reduced at the final testing phase of the software bug prediction. However, the overall accuracy of 97.8% recorded by the optimized ensemble model for buggy software prediction proved the model’s capability to accurately classify and predict buggy software with efficient memory utilization at optimal time duration.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

References

  1. Jin C, Jin SW (2015) Prediction approach of software fault-proneness based on hybrid artificial neural network and quantum particle swarm optimization. Appl Soft Comput 35:717–725

    Article  Google Scholar 

  2. Zhang ZW, Jing XY, Wang TJ (2017) Label propagation based semi-supervised learning for software defect prediction. Autom Softw Eng 24(1):47–69

    Article  Google Scholar 

  3. Chen X, Zhang D, Zhao Y, Cui Z, Ni C (2019) Software defect number prediction: unsupervised vs. supervised methods. Inf Softw Technol 106:161–181

    Article  Google Scholar 

  4. Phan H, Andreotti F, Cooray N, Chén OY, De Vos M (2018) Joint classification and prediction CNN framework for automatic sleep stage classification. IEEE Trans Biomed Eng 66(5):1285–1296

    Article  Google Scholar 

  5. Chatterjee S, Maji B (2016) A new fuzzy rule-based algorithm for estimating software faults in early phase of development. Soft Comput 20(10):4023–4035

    Article  Google Scholar 

  6. Mustaqeem M, Saqib M (2021) Principal component based support vector machine (PC-SVM): a hybrid technique for software defect detection. Clust Comput. https://doi.org/10.1007/s10586-021-03282-8

    Article  Google Scholar 

  7. Sasidharan R, Sriram P (2014) Hyper-quadtree-based k-means algorithm for software fault prediction. Computational intelligence cyber security and computational models. Springer, New Delhi, pp 107–118

    Chapter  Google Scholar 

  8. Turabieh H, Mafarja M, Li X (2019) Iterated feature selection algorithms with layered recurrent neural network for software fault prediction. Expert Syst Appl 122:27–42

    Article  Google Scholar 

  9. Yang X, Lo D, Xia X, Zhang Y, and Sun J (2015) Deep learning for just-in-time defect prediction. In: IEEE international conference on software quality, reliability and security, p 17–26

  10. Ji H, Huang S, Wu Y, Hui Z, Zheng C (2019) A new weighted naive Bayes method based on information diffusion for software defect prediction. Softw Qual J 27(3):923–968

    Article  Google Scholar 

  11. Rathore SS, Kumar S (2015) Predicting number of faults in software system using genetic programming. Procedia Comput Sci 62:303–311

    Article  Google Scholar 

  12. Soleimani A and Asdaghi F (2014) An AIS based feature selection method for software fault prediction. In: Iranian conference on intelligent systems (ICIS), IEEE, p 1–

  13. Tong H, Liu B, Wang S (2018) Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Inf Softw Technol 96:94–111

    Article  Google Scholar 

  14. Ricky MY, Purnomo F, and Yulianto B, (2016) Mobile application software defect prediction. In: IEEE symposium on service-oriented system engineering (SOSE), p 307–313

  15. Nam J and Kim S (2015) Clami: defect prediction on unlabeled datasets. In: IEEE/ACM international conference on automated software engineering (ASE), p 452–463

  16. Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181

    Article  Google Scholar 

  17. Mendis C, Yan C, Pu Y, Amasinghe S, Carbin M (2019) Auto-vectorization with imitation learning. In: Conference on neural information processing systems, (NEURIPS), p 1–12

  18. Hall T, Zhang M, Bowes D, Sun Y (2015) Some code smells have a significant but small effect on faults. ACM Trans Softw Eng Methodol (TOSEM) 23(4):1–39

    Article  Google Scholar 

  19. Yuzhou L (2021) A novel DL approach to PE malware detection: exploring GLOVE vectorization, MCC-RNN and feature fusion. p 1–19

  20. Yadav HB, Yadav DK (2015) A fuzzy logic-based approach for phase-wise software defects prediction using software metrics”. Inf Softw Technol 63:44–57

    Article  Google Scholar 

  21. Biçer MS and Diri B (2015) Predicting defect prone modules in web applications. In: International conference on information and software technologies, Springer, Cham, p 577–591

  22. Elish KO, Elish MO (2008) Predicting defect-prone software modules using support vector machines. J Syst Softw 81(5):649–660

    Article  Google Scholar 

  23. Maua G, Grbac GT (2017) Co-evolutionary multi-population genetic programming for classification in software defect prediction. Appl Soft Comput 55:331–351

    Article  Google Scholar 

  24. Parag CP (2010) Exhaustive and heuristic search approaches for learning software defect prediction model. Eng Appl Artif Intell 23(1):34–40

    Article  Google Scholar 

  25. Abaei G, Selamat A, Fujita H (2015) An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction. Knowl Based Syst 74:28–39

    Article  Google Scholar 

  26. Li B, Shen B, Wang J, Chen Y, Zhang T, and Wang J (2014) A scenario-based approach to predicting software defects using compressed C4. 5 model. In: IEEE 38th annual computer software and applications conference, p 406–415

  27. Aman H, Amasaki S, Sasaki T, Kawahara M (2015) Lines of comments as a noteworthy metric for analyzing fault-proneness in methods. IEICE Trans Inf Syst 98(12):2218–2228

    Article  Google Scholar 

  28. Ulan M, Löwe W, Ericsson M, Wingkvist A (2021) Copula-based software metrics aggregation. Softw Qual J. https://doi.org/10.1007/s11219-021-09568-9

    Article  Google Scholar 

  29. Biçer MS, Diri B (2016) Defect prediction for cascading style sheets. Appl Soft Comput 49:1078–1084

    Article  Google Scholar 

  30. Bowes D, Hall T, Harman M, Jia Y, Sarro F, and Wu F, (2016) Mutation-aware fault prediction. In: Proceedings of the 25th international symposium on software testing and analysis, p 330–341

  31. Kim S, Whitehead EJ, Zhang Y (2008) Classifying software changes: clean or buggy. IEEE Trans Softw Eng 34(2):181–196

    Article  Google Scholar 

  32. Zhao Y, Yang Y, Lu H, Liu J, Leung H, Wu Y, Xu B (2017) Understanding the value of considering client usage context in package cohesion for fault-proneness prediction. Autom Softw Eng 24(2):393–453

    Article  Google Scholar 

  33. Goyal S (2021) Predicting the defects using stacked ensemble learner with filtered dataset. Autom Softw Eng 28:14. https://doi.org/10.1007/s10515-021-00285-y

    Article  Google Scholar 

  34. Yao J, Shepperd M (2021) The impact of using biased performance metrics on software defect prediction research. Inf Softw Technol 139:1–14. https://doi.org/10.1016/j.infsof.2021.106664

    Article  Google Scholar 

  35. Matloob F, Ghazal TM, Taleb N, Aftab S, Ahmad M, Adnan-Khan M, Abbas S, Rahim-Soomro T (2021) Software defect prediction using ensemble learning: a systematic literature review. Digit Object Identif. https://doi.org/10.1109/ACCESS.2021.3095559

    Article  Google Scholar 

  36. Musinat B, Johnson F, Folorunso O, Ezinne I (2021) Genetic algorithm-based multi-objective optimization model for software bugs prediction. Ann J Tech Univ Varna Bulg 6(1):34–48

    Article  Google Scholar 

  37. Wang Y, Zorzi S and Bittner K (2021) Machine-learned 3D Building Vectorization from Satellite Imagery. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), p 1–14. https://arxiv.org/pdf/2104.06485.pdf

  38. Balogun AO, Oladele RO, Mojeed HA, Amin-Balogun B, Adeyemo VE, Aro TO (2021) Performance analysis of selected clustering techniques for software defects prediction. Afr J Comput ICT 12(2):30–42

    Google Scholar 

  39. Alattas K (2021) System error estimate using combination of classification and optimization technique. J Comput Sci 17(3):319–329. https://doi.org/10.3844/jcssp.2021.319.329

    Article  Google Scholar 

  40. Haj-Ali A, Ahmad NK, Willkie T, Sopjia Y, Asanovic K, and Stoica I (2020) Neurovectorizer: end-to-end vectorization with deep learning. ACM 1SBN978–01–4503, p 242–245

  41. Matveiev O, Zubenko A, Yevtushenko D, and Cherednichenko O (2022) Towards classifying HTML embedded product data based on machine learning approach. pp 1–11

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Femi Johnson.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Johnson, F., Oluwatobi, O., Folorunso, O. et al. Optimized ensemble machine learning model for software bugs prediction. Innovations Syst Softw Eng 19, 91–101 (2023). https://doi.org/10.1007/s11334-022-00506-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11334-022-00506-x

Keywords

Navigation