
Soft Computing, Volume 23, Issue 21, pp 10739–10754

A comparison of random forest based algorithms: random credal random forest versus oblique random forest

  • Carlos J. Mantas (corresponding author)
  • Javier G. Castellano
  • Serafín Moral-García
  • Joaquín Abellán
Methodologies and Application

Abstract

Random forest (RF) is an ensemble learning method that is considered a reference classifier due to its excellent performance, and several improvements to it have been published. One kind of improvement to the RF algorithm is based on multivariate decision trees with a local optimization process (oblique RF). Another provides additional diversity to the univariate decision trees by means of imprecise probabilities (random credal random forest, RCRF). The aim of this work is to compare these improvements of the RF algorithm experimentally. It is shown that improving RF with additional diversity and imprecise probabilities achieves better results than using RF with multivariate decision trees.
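To make the comparison concrete, the two families of improvements can be sketched using the standard formulations from the literature (a minimal sketch; the notation below, including the imprecise Dirichlet model hyperparameter \(s\), follows the conventional presentation and is not necessarily the exact variant used by the compared algorithms). A node of a standard RF tree tests a single attribute, whereas an oblique node tests a learned linear combination of attributes:

\[
\text{univariate: } X_i \le t, \qquad \text{oblique: } \mathbf{w}^{\top}\mathbf{x} \le b .
\]

A credal tree, by contrast, keeps univariate tests but replaces the precise class probabilities at a node with intervals from Walley's imprecise Dirichlet model: with \(n_j\) instances of class \(c_j\) out of \(N\),

\[
P(c_j) \in \left[ \frac{n_j}{N+s},\ \frac{n_j+s}{N+s} \right], \qquad
H^{*}(\mathcal{K}) \;=\; \max_{p \in \mathcal{K}} \Bigl( -\textstyle\sum_{j} p_j \log_2 p_j \Bigr),
\]

where \(\mathcal{K}\) is the credal set induced by these intervals and the maximum entropy \(H^{*}\) replaces the usual Shannon entropy in the split criterion. The interval width \(s/(N+s)\) is one source of the additional diversity mentioned above.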

Keywords

Classification · Ensemble schemes · Random forest · Imprecise probabilities · Credal sets

Notes

Acknowledgements

This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R.

Compliance with ethical standards

Conflict of interest

Carlos J. Mantas, Javier G. Castellano, Serafín Moral-García and Joaquín Abellán declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Carlos J. Mantas (corresponding author)
  • Javier G. Castellano
  • Serafín Moral-García
  • Joaquín Abellán

  1. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
