Light Gradient Boosting Machine in Software Defect Prediction: Concurrent Feature Selection and Hyper Parameter Tuning

  • Conference paper
Intelligent Sustainable Systems

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 458)

Abstract

Predicting software defects is critical for ensuring software quality. In recent years, many supervised learning approaches have been used to detect defect-prone instances. However, the efficacy of these approaches remains inadequate, and more sophisticated techniques are needed to improve defect prediction models. In this paper, we present an ensemble learning approach based on the light gradient boosting machine (LGBM) that performs feature selection (Recursive Feature Elimination, RFE) and hyperparameter tuning (random search) simultaneously. The proposed LGBM + Randomsearch + RFE method is evaluated on the AEEEM dataset, which comprises Apache Lucene, Eclipse JDT Core, Equinox, Mylyn, and Eclipse PDE UI. The experimental findings show that the proposed approach outperforms LGBM + Randomsearch, plain LGBM, and the top classical machine learning algorithms on all performance criteria considered.
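One way to read "simultaneous" feature selection and tuning is to let a single random search sample both the number of features RFE keeps and the booster's hyperparameters. The sketch below illustrates that reading with scikit-learn; it is not the authors' implementation. The synthetic data is a stand-in for AEEEM, and the parameter grid, the F1 scoring, and the fallback to scikit-learn's GradientBoostingClassifier when lightgbm is not installed are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import RandomizedSearchCV, train_test_split

try:
    # The learner named in the abstract.
    from lightgbm import LGBMClassifier as Booster
    booster_kwargs = {"verbose": -1}  # silence training logs
except ImportError:
    # Assumption: fall back to a different gradient boosting model
    # so the sketch still runs without lightgbm.
    from sklearn.ensemble import GradientBoostingClassifier as Booster
    booster_kwargs = {}

# Synthetic, imbalanced stand-in for a defect dataset (NOT AEEEM data).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# RFE wraps the booster and drops the 2 least important features per round.
rfe = RFE(estimator=Booster(random_state=0, **booster_kwargs), step=2)

# One search space covers both the RFE setting and the booster settings,
# so each sampled candidate fixes them together (illustrative values only).
param_distributions = {
    "n_features_to_select": [4, 6, 8],
    "estimator__learning_rate": [0.01, 0.05, 0.1, 0.2],
    "estimator__n_estimators": [20, 40, 60],
    "estimator__max_depth": [2, 3, 4],
}

search = RandomizedSearchCV(
    estimator=rfe,
    param_distributions=param_distributions,
    n_iter=5,          # number of random candidates to try
    scoring="f1",      # assumed metric; the abstract lists none
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

selected = search.best_estimator_.support_   # boolean mask of kept features
test_score = search.score(X_test, y_test)    # F1 on the held-out split

print("best parameters:", search.best_params_)
print("features kept:", int(selected.sum()), "of", X.shape[1])
print("held-out F1:", round(test_score, 3))
```

Because RFE runs inside each cross-validation fit, the features are re-selected on every training fold, which keeps the selection step from seeing the validation data.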



Corresponding author

Correspondence to Suresh Kumar Pemmada.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Pemmada, S.K., Nayak, J., Behera, H.S., Pelusi, D. (2022). Light Gradient Boosting Machine in Software Defect Prediction: Concurrent Feature Selection and Hyper Parameter Tuning. In: Raj, J.S., Shi, Y., Pelusi, D., Balas, V.E. (eds) Intelligent Sustainable Systems. Lecture Notes in Networks and Systems, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-19-2894-9_32
