Abstract
Predicting software defects is critical for ensuring software quality. Many supervised learning approaches have been used in recent years to detect defect-prone instances. However, their efficacy remains inadequate, and more sophisticated techniques are needed to improve defect prediction models. In this paper, we present an ensemble learning methodology based on the light gradient boosting machine (LGBM) that performs feature selection with Recursive Feature Elimination (RFE) and hyperparameter tuning with random search concurrently. The proposed LGBM + Randomsearch + RFE method is evaluated on the AEEEM dataset, which comprises Apache Lucene, Eclipse JDT Core, Equinox, Mylyn, and Eclipse PDE UI. The experimental findings demonstrate that the proposed approach outperforms LGBM + Randomsearch, LGBM alone, and the top classical machine learning algorithms on all performance criteria considered.
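One way to read "concurrent" feature selection and tuning is a single pipeline in which RFE and the classifier are searched together by random search. The sketch below illustrates that reading only; it is not the authors' implementation. The data are synthetic (scikit-learn's `make_classification` stands in for AEEEM), and the search ranges, the F1 scoring choice, and the fallback to scikit-learn's `GradientBoostingClassifier` when `lightgbm` is not installed are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Prefer LightGBM; fall back to scikit-learn's gradient boosting if it is
# not installed (assumption: the fallback only keeps the sketch runnable).
try:
    from lightgbm import LGBMClassifier as Booster
    EXTRA = {"verbose": -1}  # silence LightGBM training logs
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier as Booster
    EXTRA = {}


def make_booster(n_estimators=100):
    """Build a gradient boosting classifier with a fixed seed."""
    return Booster(n_estimators=n_estimators, random_state=0, **EXTRA)


# Synthetic, imbalanced stand-in for a defect dataset (NOT AEEEM).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# RFE ranks features with a small booster, then the final booster is
# trained on the surviving features.
pipeline = Pipeline([
    ("rfe", RFE(estimator=make_booster(n_estimators=30), step=2)),
    ("clf", make_booster()),
])

# Hypothetical search space: the number of kept features is sampled
# together with the classifier's hyperparameters.
param_distributions = {
    "rfe__n_features_to_select": [5, 10, 15],
    "clf__n_estimators": [50, 100, 200],
    "clf__learning_rate": [0.01, 0.05, 0.1],
    "clf__max_depth": [3, 5, 7],
}

search = RandomizedSearchCV(pipeline, param_distributions, n_iter=5,
                            scoring="f1", cv=3, random_state=0)
search.fit(X_train, y_train)

# Boolean mask of the features kept by RFE in the best configuration.
selected = search.best_estimator_.named_steps["rfe"].support_
test_f1 = f1_score(y_test, search.best_estimator_.predict(X_test))
print("best params:", search.best_params_)
print("features kept:", int(selected.sum()), "test F1:", round(test_f1, 3))
```

Because RFE sits inside the pipeline, it is refit on each training fold, so the feature subset is never chosen with knowledge of the validation fold. Any scores printed here describe the synthetic data only and say nothing about the results reported in the paper.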
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pemmada, S.K., Nayak, J., Behera, H.S., Pelusi, D. (2022). Light Gradient Boosting Machine in Software Defect Prediction: Concurrent Feature Selection and Hyper Parameter Tuning. In: Raj, J.S., Shi, Y., Pelusi, D., Balas, V.E. (eds) Intelligent Sustainable Systems. Lecture Notes in Networks and Systems, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-19-2894-9_32
DOI: https://doi.org/10.1007/978-981-19-2894-9_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2893-2
Online ISBN: 978-981-19-2894-9
eBook Packages: Intelligent Technologies and Robotics (R0)