Abstract
Recent research shows that Rotation Forest and several of its variants can outperform other widely used ensemble methods on classification problems. However, training a Rotation Forest on large-scale, high-dimensional data is very time-consuming. To improve classification accuracy and reduce computational cost, this paper presents a novel ensemble classification algorithm named RoF-GBM. In the proposed algorithm, singular value decomposition is employed to solve the rotation matrix, and whitening is then applied to reduce the computational complexity of PCA; LightGBM is used to train the base classifiers, which reduces the number of PCA computations required by the whole model. The effectiveness of RoF-GBM is evaluated on twelve small and medium-scale datasets, three large-scale datasets, three high-dimensional datasets, and two artificial datasets, against five comparison algorithms. Extensive experiments demonstrate that RoF-GBM markedly improves the training speed of Rotation Forest and achieves higher classification accuracy than LightGBM and Rotation Forest variants such as RotBoost and Rotation of Random Forest. Moreover, RoF-GBM shows strong robustness on datasets with noise and extreme outliers.
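The abstract's core computational idea, replacing the eigendecomposition-based PCA in Rotation Forest with an SVD of the centered data plus whitening, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `pca_rotation_svd` and the small synthetic dataset are assumptions made here for demonstration.

```python
import numpy as np

def pca_rotation_svd(X, whiten=True, eps=1e-8):
    """Build a PCA rotation matrix via SVD, optionally whitened.

    Solving PCA through the SVD of the centered data matrix avoids
    explicitly forming and eigendecomposing the d x d covariance
    matrix; whitening rescales each principal axis so the projected
    features have unit variance.
    """
    Xc = X - X.mean(axis=0)                    # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    R = Vt.T                                   # columns = principal axes
    if whiten:
        n = X.shape[0]
        # singular values relate to component std devs by S / sqrt(n-1)
        R = R / (S / np.sqrt(n - 1) + eps)
    return R

# toy usage on correlated synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
R = pca_rotation_svd(X)
Z = (X - X.mean(axis=0)) @ R                   # rotated, whitened features
```

In a Rotation Forest, such a rotation would be computed per feature subset on a bootstrap sample and the resulting block-diagonal rotation applied before training each base learner (LightGBM in RoF-GBM).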
Data availability
The datasets analyzed during the current study are publicly available in the UCI repository at http://archive.ics.uci.edu/ml/datasets.php and the KEEL repository at http://www.keel.es/.
Acknowledgements
This work is supported by the National Natural Science Foundation of China [Grant Number 52104146], the Social Science Foundation of Shaanxi Province [Grant Number 2020R005], and the Shaanxi Province Fund for Distinguished Young Scholars [Grant Number 2020JC-44].
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Cite this article
Gu, Q., Sun, W., Li, X. et al. A new ensemble classification approach based on Rotation Forest and LightGBM. Neural Comput & Applic 35, 11287–11308 (2023). https://doi.org/10.1007/s00521-023-08297-3