
Reducing the number of trees in a forest using noisy features

  • Original Paper
  • Published in: Evolving Systems

Abstract

Random Forest is one of the most popular supervised machine learning algorithms: an ensemble of decision trees whose combined predictions discover more rules and ensure diversity. However, growing a large number of trees tends to produce redundant ones, which inflates storage and computation costs and can hurt both predictive performance and interpretability. Many methods have been proposed in the literature to select a sub-forest while maintaining, or even improving, the performance of the whole ensemble. In this paper, a new sub-forest selection method is proposed with two goals: first, selecting the smallest possible number of trees, and second, matching or improving the performance of the original ensemble. A noisy-variable technique is introduced as an indicator of underperforming trees: a randomly generated variable is injected into the feature space at each node during tree construction, and any tree that relies on it is flagged as noisy and eliminated from the final sub-forest. To validate the proposed method, we employ real and synthetic benchmark datasets. The results confirm that the generated sub-forest is small and performs well compared to state-of-the-art algorithms.
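The core idea can be illustrated with a short sketch. The Python code below is not the authors' exact algorithm (they inject the noisy variable into the candidate feature set at every node during construction, which scikit-learn does not expose); it is a simplified approximation under stated assumptions: append one random noise column to the data, grow a standard forest, and discard every tree that splits on that column. The dataset, the hyper-parameters, and the `vote` helper are illustrative choices, not taken from the paper.

```python
# Simplified sketch of noisy-feature forest pruning: a tree that splits on a
# feature known to carry no class information is, by construction, fitting noise.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X = np.hstack([X, rng.normal(size=(X.shape[0], 1))])  # append one noise column
noise_idx = X.shape[1] - 1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# tree_.feature lists the feature tested at each internal node (-2 marks
# leaves), so a tree is kept only if it never tests the noise column.
kept = [t for t in forest.estimators_ if noise_idx not in t.tree_.feature]
if not kept:  # degenerate case in this toy setup: fall back to the full forest
    kept = list(forest.estimators_)

def vote(trees, X):
    """Unweighted majority vote over a subset of trees (binary labels)."""
    preds = np.stack([t.predict(X) for t in trees])
    return (preds.mean(axis=0) >= 0.5).astype(int)

print(f"kept {len(kept)} of {len(forest.estimators_)} trees")
print("full forest accuracy:", accuracy_score(y_te, forest.predict(X_te)))
print("sub-forest accuracy :", accuracy_score(y_te, vote(kept, X_te)))
```

Because the injected variable is independent of the class label, any split on it can only fit noise, which is what justifies removing such trees; the paper's per-node injection makes this test available at every split of every tree, rather than only through a single appended column as in this sketch.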

Author information

Corresponding author

Correspondence to Youness Manzali.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Manzali, Y., Akhiat, Y., Chahhou, M. et al. Reducing the number of trees in a forest using noisy features. Evolving Systems 14, 157–174 (2023). https://doi.org/10.1007/s12530-022-09441-5

