Abstract
Recent research shows that Rotation Forest and several of its variants can outperform other widely used ensemble methods on classification problems. However, training a Rotation Forest on large-scale, high-dimensional data is very time-consuming. To improve classification accuracy and reduce computational cost, this paper presents a novel ensemble classification algorithm named RoF-GBM. In the proposed algorithm, singular value decomposition is employed to solve the rotation matrix, and whitening is then applied to reduce the computational complexity of PCA; LightGBM is used to train the base classifiers, which reduces the number of PCA computations required by the whole model. The effectiveness of RoF-GBM is evaluated on twelve small and medium-scale datasets, three large-scale datasets, three high-dimensional datasets, and two artificial datasets, against five comparison algorithms. Extensive experiments demonstrate that RoF-GBM markedly improves the training speed of Rotation Forest and achieves higher classification accuracy than LightGBM and Rotation Forest variants such as RotBoost and Rotation of Random Forest. Moreover, RoF-GBM shows strong robustness on datasets with noise and extreme outliers.
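The abstract's core computational idea, replacing the eigendecomposition-based PCA in Rotation Forest with an SVD of the centered data plus whitening, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `pca_rotation_svd` and the small synthetic dataset are assumptions made here for demonstration.

```python
import numpy as np

def pca_rotation_svd(X, whiten=True, eps=1e-8):
    """Build a PCA rotation matrix via SVD, optionally whitened.

    Solving PCA through the SVD of the centered data matrix avoids
    explicitly forming and eigendecomposing the d x d covariance
    matrix; whitening rescales each principal axis so the projected
    features have unit variance.
    """
    Xc = X - X.mean(axis=0)                    # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    R = Vt.T                                   # columns = principal axes
    if whiten:
        n = X.shape[0]
        # singular values relate to component std devs by S / sqrt(n-1)
        R = R / (S / np.sqrt(n - 1) + eps)
    return R

# toy usage on correlated synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
R = pca_rotation_svd(X)
Z = (X - X.mean(axis=0)) @ R                   # rotated, whitened features
```

In a Rotation Forest, such a rotation would be computed per feature subset on a bootstrap sample and the resulting block-diagonal rotation applied before training each base learner (LightGBM in RoF-GBM).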
Data availability
The datasets analyzed during the current study are publicly available in the UCI repository at http://archive.ics.uci.edu/ml/datasets.php and the KEEL repository at http://www.keel.es/.
Acknowledgements
This work is supported by the National Natural Science Foundation of China [Grant Number 52104146], the Social Science Foundation of Shaanxi Province [Grant Number 2020R005], and the Shaanxi Province Fund for Distinguished Young Scholars [Grant Number 2020JC-44].
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Cite this article
Gu, Q., Sun, W., Li, X. et al. A new ensemble classification approach based on Rotation Forest and LightGBM. Neural Comput & Applic 35, 11287–11308 (2023). https://doi.org/10.1007/s00521-023-08297-3