
Feature-ranked self-growing forest: a tree ensemble based on structure diversity for classification and regression

  • S.I.: Latin American Computational Intelligence

Neural Computing and Applications

Abstract

Tree ensemble algorithms, such as random forest (RF), are among the most widely applied methods in machine learning. However, an important hyperparameter, the number of classification or regression trees in the ensemble, must be specified for these algorithms. The number of trees can adversely affect bias or computational cost and should ideally be adapted to each task. For this reason, a novel tree ensemble is described, the feature-ranked self-growing forest (FSF), which grows a tree ensemble automatically based on the structural diversity of the first two levels of the trees' nodes. The algorithm's performance was tested on 30 classification and 30 regression datasets and compared with RF. Its computational complexity was also analyzed theoretically and experimentally. Compared with RF, FSF achieved significantly higher performance on 57% and equivalent performance on 27% of the classification datasets, and higher performance on 70% and equivalent performance on 7% of the regression datasets. The computational complexity of FSF was competitive with that of other tree ensembles, depending mainly on the number of observations in the dataset. These results suggest that FSF is a suitable out-of-the-box approach, with potential as a tool for feature ranking and for analyzing a dataset's complexity through the number of trees computed for a particular task. MATLAB and Python implementations of the algorithm, together with working examples for classification and regression, are provided for academic use.
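The growth criterion described in the abstract can be pictured with a short sketch. The Python code below is a minimal illustration, not the authors' published algorithm: it assumes that a tree's "structure" is summarized by the split features at its root and the root's two children, that candidate trees are trained RF-style (bootstrap samples, random feature subsets), and that growth stops after a number of consecutive candidates (`patience`) add no unseen structure. The helper `top_two_level_signature`, the `patience` parameter, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def top_two_level_signature(tree):
    """Split features at the root and its two children (-2 marks a leaf)."""
    t = tree.tree_
    left, right = t.children_left[0], t.children_right[0]
    children = tuple(t.feature[c] if c != -1 else -2 for c in (left, right))
    return (t.feature[0],) + children

def grow_forest(X, y, patience=10, seed=0):
    """Grow trees until `patience` consecutive candidates add no new structure."""
    rng = np.random.default_rng(seed)
    forest, signatures, stale = [], set(), 0
    while stale < patience:
        idx = rng.integers(0, len(X), size=len(X))        # bootstrap sample
        tree = DecisionTreeClassifier(
            max_features="sqrt",                          # RF-style feature subsets
            random_state=int(rng.integers(1 << 31)),
        ).fit(X[idx], y[idx])
        sig = top_two_level_signature(tree)
        if sig in signatures:
            stale += 1                                    # nothing structurally new
        else:
            signatures.add(sig)                           # novel top-two-level structure
            forest.append(tree)
            stale = 0
    return forest
```

Under this reading, a dataset with few distinct informative splits would stop growing early, which is consistent with the abstract's suggestion that the number of trees computed for a task can serve as a proxy for the dataset's complexity.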


Acknowledgments

The authors would like to acknowledge the Consejo Nacional de Ciencia y Tecnología (CONACYT) for supporting this work under grant number SALUD-2018-02-B-S-45803. The authors would also like to acknowledge the UC Irvine Machine Learning Repository for hosting the datasets used in this work.

Author information

Corresponding author

Correspondence to Jessica Cantillo-Negrete.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Availability of data and material

The datasets analyzed during the current study are available in the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/index.php. The FSF algorithm can be downloaded from https://github.com/RubenICarinoEscobar/Feature-Ranked-Self-Growing-Forest.
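The repository above ships its own working classification and regression examples. Independently of those, the toy run below exercises the `grow_forest` sketch from the abstract section on the Iris dataset (which is also hosted by the UCI repository); the majority-voting aggregation is an assumed choice for illustration, not necessarily the rule FSF itself uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Toy run of the grow_forest() sketch shown earlier; majority voting over the
# grown trees is an assumed aggregation rule for this illustration.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = grow_forest(X_tr, y_tr)
votes = np.stack([t.predict(X_te) for t in forest])       # shape (n_trees, n_samples)
pred = np.apply_along_axis(
    lambda v: np.bincount(v.astype(int)).argmax(), 0, votes
)
print(f"{len(forest)} trees grown, test accuracy = {np.mean(pred == y_te):.2f}")
```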

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 772 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Carino-Escobar, R.I., Alonso-Silverio, G.A., Alarcón-Paredes, A. et al. Feature-ranked self-growing forest: a tree ensemble based on structure diversity for classification and regression. Neural Comput & Applic 35, 9285–9298 (2023). https://doi.org/10.1007/s00521-023-08202-y

