
Reducing the number of trees in a forest using noisy features

  • Original Paper
  • Published in: Evolving Systems

Abstract

Random Forest is one of the most popular supervised machine learning algorithms: an ensemble of decision trees whose combined predictions discover more rules and ensure diversity. However, growing a large number of trees tends to produce redundant ones, which inflates storage and computation costs and can hurt both predictive performance and interpretability. Many methods have been proposed in the literature to select a sub-forest while maintaining, or even improving, the performance of the whole ensemble. In this paper, a new sub-forest selection method is proposed with two goals: first, selecting the smallest possible number of trees, and second, matching or improving the performance of the original ensemble. A noisy-variable technique is introduced as an indicator of underperforming trees: a randomly generated variable is injected into the feature space at each node during tree construction, and any tree that relies on it is flagged as noisy and eliminated from the final sub-forest. To validate the proposed method, we employ real and synthetic benchmark datasets. The results confirm that the generated sub-forest is small and performs well compared to state-of-the-art algorithms.
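The core idea can be illustrated with a short sketch. The Python code below is not the authors' exact algorithm (they inject the noisy variable into the candidate feature set at every node during construction, which scikit-learn does not expose); it is a simplified approximation under stated assumptions: append one random noise column to the data, grow a standard forest, and discard every tree that splits on that column. The dataset, the hyper-parameters, and the `vote` helper are illustrative choices, not taken from the paper.

```python
# Simplified sketch of noisy-feature forest pruning: a tree that splits on a
# feature known to carry no class information is, by construction, fitting noise.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X = np.hstack([X, rng.normal(size=(X.shape[0], 1))])  # append one noise column
noise_idx = X.shape[1] - 1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# tree_.feature lists the feature tested at each internal node (-2 marks
# leaves), so a tree is kept only if it never tests the noise column.
kept = [t for t in forest.estimators_ if noise_idx not in t.tree_.feature]
if not kept:  # degenerate case in this toy setup: fall back to the full forest
    kept = list(forest.estimators_)

def vote(trees, X):
    """Unweighted majority vote over a subset of trees (binary labels)."""
    preds = np.stack([t.predict(X) for t in trees])
    return (preds.mean(axis=0) >= 0.5).astype(int)

print(f"kept {len(kept)} of {len(forest.estimators_)} trees")
print("full forest accuracy:", accuracy_score(y_te, forest.predict(X_te)))
print("sub-forest accuracy :", accuracy_score(y_te, vote(kept, X_te)))
```

Because the injected variable is independent of the class label, any split on it can only fit noise, which is what justifies removing such trees; the paper's per-node injection makes this test available at every split of every tree, rather than only through a single appended column as in this sketch.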

Author information

Corresponding author

Correspondence to Youness Manzali.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Manzali, Y., Akhiat, Y., Chahhou, M. et al. Reducing the number of trees in a forest using noisy features. Evolving Systems 14, 157–174 (2023). https://doi.org/10.1007/s12530-022-09441-5

