Abstract
Software defect prediction is a crucial discipline within the software development life cycle. Accurately identifying defective modules saves developers time and cost. The extreme learning machine (ELM), a single-hidden-layer feedforward neural network, offers rapid training and robust learning capability, and numerous researchers have applied it to software defect prediction. However, ELM suffers from random parameter selection and limited generalization ability. To enhance its predictive performance in software defect prediction, most researchers use swarm intelligence optimization algorithms to optimize the ELM, but these optimization methods can become trapped in local optima. This paper introduces a 2-step optimization sparrow search algorithm (2SSSA) built upon the original sparrow search algorithm. To enhance the original algorithm’s ability to escape local extrema, pinhole imaging reverse learning and somersault foraging strategies are employed. The optimization performance and convergence speed of 2SSSA are assessed using 8 randomly selected benchmark functions and 8 CEC2017 functions. Ensemble learning, a prominent research focus in software defect prediction, is known for its ability to significantly enhance prediction performance and model generalization. The ELM optimized using 2SSSA therefore serves as the base predictor in a bagging ensemble learning algorithm. We propose the resulting ensemble algorithm for software defect prediction, denoted 2SSEBA. In an evaluation on 25 publicly available software defect prediction datasets using 5 commonly employed metrics, the predictive performance of 2SSEBA significantly outperforms that of five other advanced prediction algorithms. This conclusion is supported by both Friedman ranking and Holm’s post-hoc test.
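As a rough illustration of the two building blocks named in the abstract, the sketch below trains a plain ELM (random hidden layer, closed-form output weights) and combines several of them by bagging with majority voting. It is not the authors' 2SSEBA implementation: the 2SSSA optimization step is omitted, and the class and function names (`ELM`, `bagging_fit`, `bagging_predict`), the sigmoid activation, the hidden-layer size, and the synthetic toy data are all assumptions made for this example.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine for binary labels (illustrative only)."""

    def __init__(self, n_hidden=20, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn at random and never trained;
        # these are the parameters a swarm optimizer would tune instead.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights come from a least-squares fit via the pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        # Threshold the regression output at 0.5 to obtain a 0/1 label.
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


def bagging_fit(X, y, n_estimators=11, n_hidden=20, seed=0):
    """Train one ELM per bootstrap resample of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        model = ELM(n_hidden, seed=int(rng.integers(0, 2**31 - 1)))
        models.append(model.fit(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    """Majority vote over the base predictors."""
    votes = np.mean([m.predict(X) for m in models], axis=0)
    return (votes >= 0.5).astype(int)


# Synthetic, well-separated toy data (assumption; not a defect dataset).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 4)), rng.normal(2.0, 1.0, (100, 4))])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

models = bagging_fit(X, y)
accuracy = float(np.mean(bagging_predict(models, X) == y))
```

An odd number of base predictors avoids tied votes. In the scheme the abstract describes, each base ELM would have its random parameters tuned by 2SSSA before being added to the ensemble.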
Data availability
Data will be made available on reasonable request. The datasets are available at http://promise.site.uottawa.ca/SERepository/datasets-page.html.
Acknowledgements
This work is supported by the National Key Research and Development Program of China (2022YFB3105105). We would like to thank the editor and anonymous reviewers for their valuable comments and suggestions to improve the paper.
Funding
This work was funded by the National Key Research and Development Program of China (Grant No. 2022YFB3105105).
Author information
Contributions
Yu Tang: Writing-original draft, Editing and visualization. Qi Dai: Methodology and review. Mengyuan Yang: Data curation and visualization. Lifang Chen: Review and conceptualization. Ye Du: Resources, Supervision and review.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Tang, Y., Dai, Q., Yang, M. et al. Software defect prediction ensemble learning algorithm based on 2-step sparrow optimizing extreme learning machine. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04446-y
DOI: https://doi.org/10.1007/s10586-024-04446-y