
A new ensemble classification approach based on Rotation Forest and LightGBM

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Recent research shows that Rotation Forest and its variants can outperform other widely used ensemble methods on classification tasks. However, training a Rotation Forest on large-scale, high-dimensional data is very time-consuming. To improve classification accuracy and reduce computational cost, this paper presents a novel ensemble classification algorithm named RoF-GBM. In the proposed algorithm, singular value decomposition is employed to compute the rotation matrix, and whitening is then applied to reduce the computational complexity of PCA; LightGBM is used to train the base classifiers, which reduces the number of PCA computations in the whole model. The effectiveness of RoF-GBM is evaluated on twelve small and medium-scale datasets, three large-scale datasets, three high-dimensional datasets, and two artificial datasets, against five comparison algorithms. Extensive experiments demonstrate that RoF-GBM markedly improves the training speed of Rotation Forest and achieves higher classification accuracy than LightGBM and Rotation Forest variants such as RotBoost and Rotation of Random Forest. Moreover, RoF-GBM shows strong robustness on datasets with noise and extreme outliers.
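The SVD-based rotation with whitening that the abstract describes can be sketched as follows. This is a minimal illustration under assumed details (centering and the whitening scale are our choices), not the paper's implementation; the variable names are illustrative. Taking the SVD of the centered data yields the PCA rotation directly, without ever forming the d × d covariance matrix:

```python
import numpy as np

# Hedged sketch of the SVD-based PCA rotation with whitening; the exact
# centering/scaling conventions are assumptions, not the paper's method.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))  # correlated features

Xc = X - X.mean(axis=0)                    # center the data
# SVD of the centered data gives the principal axes directly
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
R = Vt.T                                   # rotation matrix (principal axes)
W = R * (np.sqrt(len(X) - 1) / s)          # whitening: rescale each axis

Z = Xc @ W                                 # rotated, whitened features
# every rotated component now has unit sample variance
print(np.allclose(Z.std(axis=0, ddof=1), 1.0))  # True
```

In the full RoF-GBM pipeline, features rotated in this way would then be fed to the gradient-boosted base learner (LightGBM in the paper).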


Data availability

The datasets analyzed during the current study are publicly available in the UCI repository at http://archive.ics.uci.edu/ml/datasets.php and the KEEL repository at http://www.keel.es/.

References

  1. Ding Y, Zhao X, Zhang Z, Cai W, Yang N (2021) Multiscale graph sample and aggregate network with context-aware learning for hyperspectral image classification. IEEE J Sel Top Appl Earth Observ Remote Sens 14:4561–4572

  2. Zhang Y, Liu Y, Yang G, Song J (2022) SSIT: a sample selection-based incremental model training method for image recognition. Neural Comput Appl 34(4):3117–3134

  3. Asim MN, Ghani MU, Ibrahim MA, Mahmood W, Dengel A, Ahmed S (2021) Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification. Neural Comput Appl 33(11):5437–5469

  4. Wang Y, Wang A, Ai Q, Sun H (2019) Ensemble based fuzzy weighted extreme learning machine for gene expression classification. Appl Intell 49(3):1161–1171

  5. Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

  6. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  7. Rokach L (2016) Decision forest: twenty years of research. Inf Fus 27:111–125

  8. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139

  9. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  10. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of the 13th International Conference on Machine Learning (ICML), pp 148–156

  11. Xu J, Dang D, Ma Q, Liu X, Han Q (2022) A novel and robust data anomaly detection framework using LAL-AdaBoost for structural health monitoring. J Civil Struct Health Monit. https://doi.org/10.1007/s13349-021-00544-2

  12. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat. https://doi.org/10.1214/aos/1013203451

  13. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 785–794

  14. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30

  15. Raffei AFM, Asmuni H, Hassan R, Othman RM (2015) A low lighting or contrast ratio visible iris recognition using iso-contrast limited adaptive histogram equalization. Knowl Based Syst 74:40–48

  16. Wang Q, Nguyen TT, Huang JZ, Nguyen TT (2018) An efficient random forests algorithm for high dimensional data classification. Adv Data Anal Classif 12(4):953–972

  17. Shi Y, Liu J, Qi Z, Wang B (2018) Learning from label proportions on high-dimensional data. Neural Netw 103:9–18

  18. Shafizadeh-Moghadam H (2021) Fully component selection: an efficient combination of feature selection and principal component analysis to increase model performance. Expert Syst Appl 186:115678

  19. Conn D, Ngun T, Li G, Ramirez CM (2019) Fuzzy forests: extending random forest feature selection for correlated, high-dimensional data. J Stat Softw 91:1–25

  20. Reis I, Baron D, Shahaf S (2018) Probabilistic random forest: a machine learning algorithm for noisy data sets. Astron J 157(1):16

  21. Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28(10):1619–1630

  22. Guo H, Diao X, Liu H (2018) Embedding undersampling rotation forest for imbalanced problem. Comput Intell Neurosci. https://doi.org/10.1155/2018/6798042

  23. Su C, Ju S, Liu Y, Yu Z (2015) Improving random forest and rotation forest for highly imbalanced datasets. Intell Data Anal 19(6):1409–1432

  24. Xia J, Falco N, Benediktsson JA, Du P, Chanussot J (2017) Hyperspectral image classification with rotation random forest via KPCA. IEEE J Sel Topics Appl Earth Observ Remote Sens 10(4):1601–1609

  25. Eeti LN, Buddhiraju KM (2021) Two hidden layer neural network-based rotation forest ensemble for hyperspectral image classification. Geocarto Int 36(16):1820–1837

  26. Feng W, Quan Y, Dauphin G, Li Q, Gao L, Huang W, Xia J, Zhu W, Xing M (2021) Semi-supervised rotation forest based on ensemble margin theory for the classification of hyperspectral image with limited training data. Inf Sci 575:611–638

  27. Lu H, Yang L, Yan K, Xue Y, Gao Z (2017) A cost-sensitive rotation forest algorithm for gene expression data classification. Neurocomputing 228:270–276

  28. Aličković E, Subasi A (2017) Breast cancer diagnosis using GA feature selection and Rotation Forest. Neural Comput Appl 28(4):753–763

  29. Zhang CX, Zhang JS (2008) RotBoost: a technique for combining Rotation Forest and AdaBoost. Pattern Recogn Lett 29(10):1524–1536

  30. Stiglic G, Rodriguez JJ, Kokol P (2011) Rotation of random forests for genomic and proteomic classification problems. In: Software Tools and Algorithms for Biological Systems. Springer, pp 211–221

  31. Zhang C, Liu C, Zhang X, Almpanidis G (2017) An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst Appl 82:128–150

  32. Dhar J (2022) An adaptive intelligent diagnostic system to predict early stage of Parkinson's disease using two-stage dimension reduction with genetically optimized LightGBM algorithm. Neural Comput Appl 34(6):4567–4593

  33. Shaker B, Yu MS, Song JS, Ahn S, Ryu JY, Oh KS, Na D (2021) LightBBB: computational prediction model of blood–brain-barrier penetration based on LightGBM. Bioinformatics 37(8):1135–1139

  34. Tang M, Zhao Q, Wu H, Wang Z (2021) Cost-sensitive LightGBM-based online fault detection method for wind turbine gearboxes. Front Energy Res. https://doi.org/10.3389/fenrg.2021.701574

  35. Ma X, Sha J, Wang D, Yu Y, Yang Q, Niu X (2018) Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGBoost algorithms according to different high dimensional data cleaning. Electron Commer Res Appl 31:24–39

  36. Li Z, Zhang J, Yao X, Kou G (2021) How to identify early defaults in online lending: a cost-sensitive multi-layer learning framework. Knowl-Based Syst 221:106963

  37. Dua D, Taniskidou EK (2017) UCI machine learning repository (http://archive.ics.uci.edu/ml). University of California, School of Information and Computer Science, Irvine

  38. Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17

  39. Rahman MG, Islam MZ (2013) Missing value imputation using decision trees and decision forests by splitting and merging records: two novel techniques. Knowl-Based Syst 53:51–65

  40. Speybroeck N (2012) Classification and regression trees. Int J Public Health 57(1):243–246


Acknowledgements

This work was supported by the National Natural Science Foundation of China [Grant Number 52104146]; the Social Science Foundation of Shaanxi Province [Grant Number 2020R005]; and the Shaanxi Province Fund for Distinguished Young Scholars [Grant Number 2020JC-44].

Author information


Corresponding author

Correspondence to Qinghua Gu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Fig. 11.

Fig. 11 The 10-run accuracy of the compared algorithms on small and medium datasets

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gu, Q., Sun, W., Li, X. et al. A new ensemble classification approach based on Rotation Forest and LightGBM. Neural Comput & Applic 35, 11287–11308 (2023). https://doi.org/10.1007/s00521-023-08297-3

