Abstract
Ensemble learning is a machine learning paradigm that combines the outputs of multiple base learners according to a given rule to obtain a better classification result. It has been widely used in many fields, but existing methods still struggle to guarantee the diversity of base learners and suffer from low prediction accuracy. To overcome these problems, we consider ensemble learning from the perspective of attribute space division, define the concept of a neighborhood approximate reduct using neighborhood rough set theory, and propose an ensemble learning algorithm based on neighborhood approximate reducts, called ELNAR. ELNAR divides the attribute space of a data set into multiple subspaces. Base learners trained on the data sets corresponding to different subspaces differ substantially from one another, which helps ensure strong generalization performance of the ensemble learner. To verify its effectiveness, we apply ELNAR to software defect prediction. Experiments on 20 NASA MDP data sets show that ELNAR improves the performance of software defect prediction more than existing ensemble learning algorithms.
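The general idea of training base learners on different attribute subspaces and combining their outputs can be illustrated with a minimal sketch. This is not the ELNAR algorithm: the round-robin attribute partition stands in for the neighborhood approximate reducts, and the 1-nearest-neighbor base learner and majority-vote rule are assumptions chosen for brevity. All function names and the toy data are hypothetical.

```python
from collections import Counter


def split_attributes(n_attributes, n_subspaces):
    """Round-robin partition of attribute indices into disjoint subspaces.

    Assumption: a placeholder for a reduct-based division of the attribute space.
    """
    return [list(range(start, n_attributes, n_subspaces))
            for start in range(n_subspaces)]


def project(row, attrs):
    """Keep only the attributes of one subspace."""
    return [row[a] for a in attrs]


def nearest_neighbor_label(train_rows, train_labels, query):
    """1-NN base learner using squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for row, label in zip(train_rows, train_labels):
        dist = sum((x - q) ** 2 for x, q in zip(row, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def subspace_ensemble_predict(X_train, y_train, query, n_subspaces):
    """One base learner per attribute subspace, combined by majority vote."""
    votes = []
    for attrs in split_attributes(len(query), n_subspaces):
        projected_train = [project(row, attrs) for row in X_train]
        votes.append(
            nearest_neighbor_label(projected_train, y_train, project(query, attrs))
        )
    return Counter(votes).most_common(1)[0][0]


# Toy data: four numeric attributes per module, two classes (hypothetical).
X_train = [
    [0.0, 0.1, 0.2, 0.0],
    [0.1, 0.0, 0.1, 0.2],
    [1.0, 0.9, 1.1, 1.0],
    [0.9, 1.0, 1.0, 0.8],
]
y_train = ["clean", "clean", "defective", "defective"]

prediction = subspace_ensemble_predict(X_train, y_train, [0.95, 0.9, 0.1, 0.9], 3)
```

In this toy query the third attribute alone resembles a clean module, so the base learner on that subspace disagrees with the other two and is outvoted. Disagreement of this kind between base learners is the diversity that an attribute space division aims to produce.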
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos. 61973180, 62172249, U1806201), and the Shandong Provincial Natural Science Foundation, China (Grant Nos. ZR2022MF326, ZR2021QF074, ZR2018MF007).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Z., Du, J., Hu, Q., Jiang, F. (2022). Neighborhood Approximate Reducts-Based Ensemble Learning Algorithm and Its Application in Software Defect Prediction. In: Yao, J., Fujita, H., Yue, X., Miao, D., Grzymala-Busse, J., Li, F. (eds) Rough Sets. IJCRS 2022. Lecture Notes in Computer Science, vol 13633. Springer, Cham. https://doi.org/10.1007/978-3-031-21244-4_8
DOI: https://doi.org/10.1007/978-3-031-21244-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21243-7
Online ISBN: 978-3-031-21244-4
eBook Packages: Computer Science, Computer Science (R0)