On Oblique Random Forests

  • Bjoern H. Menze
  • B. Michael Kelm
  • Daniel N. Splitthoff
  • Ullrich Koethe
  • Fred A. Hamprecht
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6912)

Abstract

In his original paper on random forests, Breiman proposed two different decision tree ensembles: one generated from “orthogonal” trees with thresholds on individual features in every split, and one from “oblique” trees separating the feature space by randomly oriented hyperplanes. Despite rising interest in the random forest framework, however, ensembles built from orthogonal trees (RF) have received most, if not all, of the attention so far.
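To make the two split types concrete, the following Python/NumPy sketch (helper names are hypothetical and not from the paper) contrasts an axis-aligned test on a single feature with a test against a randomly oriented hyperplane:

```python
import numpy as np

def orthogonal_split(x, feature, threshold):
    """Axis-aligned split as in standard RF: test a single feature."""
    return x[feature] <= threshold

def random_oblique_split(x, weights, threshold):
    """Oblique split in the sense of Breiman's Forest-RC: test a random
    linear combination of features, i.e. a randomly oriented hyperplane."""
    return np.dot(weights, x) <= threshold

# Example on a sample with three features
x = np.array([0.2, 1.5, -0.7])
print(orthogonal_split(x, feature=1, threshold=1.0))            # False
print(random_oblique_split(x, np.array([0.5, -1.0, 0.3]), 0.0)) # True
```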

In the present work we propose to employ “oblique” random forests (oRF) built from multivariate trees which explicitly learn optimal split directions at internal nodes using linear discriminative models, rather than using random coefficients as in the original oRF. This oRF outperforms RF, as well as other classifiers, on nearly all data sets except those with discrete factorial features. Learned node models perform distinctly better than random splits. An oRF feature importance score proves preferable to standard RF feature importance scores such as the Gini or permutation importance. The topology of the oRF decision space appears smoother and better adapted to the data, resulting in improved generalization performance. Overall, the oRF proposed here may be preferred over standard RF on most learning tasks involving numerical and spectral data.
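As a rough illustration of a learned split direction, the sketch below fits a ridge-regularized least-squares model to the samples reaching a node and splits at the resulting hyperplane. This is only a minimal stand-in, under assumed details, for the linear discriminative node models studied in the paper; all function names are hypothetical.

```python
import numpy as np

def learn_oblique_split(X, y, alpha=1.0):
    """Sketch of a learned oblique split at one internal node: fit a
    ridge-regularized least-squares direction to the samples (X, y in
    {-1, +1}) that reach the node, then split at w.x + b = 0."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])   # append a bias column
    A = Xb.T @ Xb + alpha * np.eye(d + 1)  # regularized normal equations
    w = np.linalg.solve(A, Xb.T @ y)       # ridge solution
    return w[:d], w[d]                     # split direction and offset

def split_node(X, w, b):
    """Route samples to the left/right child by the learned hyperplane."""
    return X @ w + b <= 0

# Toy usage on two Gaussian blobs in five dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 5)), rng.normal(+1, 1, (50, 5))])
y = np.hstack([-np.ones(50), np.ones(50)])
w, b = learn_oblique_split(X, y)
left = split_node(X, w, b)
print(left.sum(), "samples routed left,", (~left).sum(), "right")
```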


References

  1. Archer, K.J., Kimes, R.V.: Empirical characterization of random forest variable importance measures. Comput. Stat. Data Anal. 52, 2249–2260 (2008)
  2. Biau, G., Devroye, L., Lugosi, G.: Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res. 9, 2015–2033 (2008)
  3. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
  4. Breiman, L.: Arcing classifiers. Technical Report, UC Berkeley (1998)
  5. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
  6. Breiman, L.: Consistency for a simple model of random forests. Tech. Rep. 670, UC Berkeley (2004)
  7. Caputo, B., Sim, K., Furesjo, F., Smola, A.: Appearance-based object recognition using SVMs: which kernel should I use? In: Proc. NIPS WS (2002)
  8. Chan, K.Y., Loh, W.Y.: LOTUS: An algorithm for building accurate and comprehensible logistic regression trees. J. Comp. Graph. Stat. 13, 826–852 (2004)
  9. Criminisi, A., Shotton, J., Bucciarelli, S.: Decision forests with long-range spatial context for organ localization in CT volumes. In: Proc. MICCAI-PMMIA (2009)
  10. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40, 139–157 (2000)
  11. Frank, I.E., Friedman, J.H.: A statistical view of some chemometrics regression tools. Technometrics 35, 109–135 (1993)
  12. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitányi, P.M.B. (ed.) EuroCOLT 1995. LNCS, vol. 904. Springer, Heidelberg (1995)
  13. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63, 3–42 (2006)
  14. Geurts, P., Fillet, M., de Seny, D., Meuwis, M.A., Malaise, M., Merville, M.P., Wehenkel, L.: Proteomic mass spectra classification using decision tree based ensemble methods. Bioinformatics 21, 3138–3145 (2005)
  15. Hastie, T., Tibshirani, R., Eisen, M., Alizadeh, A., Levy, R., Staudt, L., Chan, W., Botstein, D., Brown, P.: Gene shaving as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol. 1, 1–8 (2000)
  16. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, Heidelberg (2009)
  17. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998)
  18. Hothorn, T., Leisch, F., Zeileis, A., Hornik, K.: The design and analysis of benchmark experiments. Tech. rep., TU Vienna (2003)
  19. Jiang, H., Deng, Y., Chen, H.S., Tao, L., Sha, Q., Chen, J., Tsai, C.J., Zhang, S.: Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics 5, 81 (2004)
  20. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2, 18–22 (2002)
  21. Lin, Y., Jeon, Y.: Random forests and adaptive nearest neighbors. J. Am. Stat. Assoc. 101, 578–590 (2006)
  22. Martinez-Munoz, G., Hernandez-Lobato, D., Suarez, A.: An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 245–259 (2009)
  23. Menze, B.H., Kelm, B.M., Masuch, R., Himmelreich, U., Petrich, W., Hamprecht, F.A.: A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics 10, 213 (2009)
  24. Menze, B.H., Lichy, M.P., Bachert, P., Kelm, B.M., Schlemmer, H.P., Hamprecht, F.A.: Optimal classification of long echo time in vivo magnetic resonance spectra in the detection of recurrent brain tumors. NMR Biomed. 19, 599–610 (2006)
  25. Menze, B.H., Petrich, W., Hamprecht, F.A.: Multivariate feature selection and hierarchical classification for infrared spectroscopy: serum-based detection of bovine spongiform encephalopathy. Anal. Bioanal. Chem. 387, 1801–1807 (2007)
  26. Menze, B.H., Ur, J.A., Sherratt, A.G.: Detection of ancient settlement mounds: Archaeological survey based on the SRTM terrain model. Photogramm. Eng. Remote Sens. 72, 321–327 (2006)
  27. Murthy, S.K., Kasif, S., Salzberg, S.: A system for induction of oblique decision trees. J. Artif. Intell. Res. 2, 1–32 (1994)
  28. Nicodemus, K., Malley, J., Strobl, C., Ziegler, A.: The behaviour of random forest permutation-based variable importance measures under predictor correlation. BMC Bioinformatics 11, 110 (2010)
  29. Pal, M.: Random forest classifier for remote sensing classification. Int. J. Remote Sensing 1, 217–222 (2005)
  30. Pisetta, V., Jouve, P.-E., Zighed, D.A.: Learning with ensembles of randomized trees: New insights. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010. LNCS, vol. 6323, pp. 67–82. Springer, Heidelberg (2010)
  31. Platt, J.: Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In: Smola, A., Bartlett, P., Schoelkopf, B., Schuurmans, D. (eds.) Advances in Large Margin Classifiers. MIT Press, Cambridge (2000)
  32. Robnik-Šikonja, M.: Improving random forests. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 359–370. Springer, Heidelberg (2004)
  33. Rodriguez, J., Kuncheva, L., Alonso, C.: Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1619–1630 (2006)
  34. Saeys, Y., Abeel, T., Van de Peer, Y.: Robust feature selection using ensemble feature selection techniques. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 313–325. Springer, Heidelberg (2008)
  35. Segal, M.R.: Machine learning benchmarks and random forest regression. Tech. rep., UC San Francisco (2004)
  36. Sethi, I.K.: Entropy nets: from decision trees to neural networks. Proc. IEEE 78, 1605–1613 (1990)
  37. Shen, K.Q., Ong, C.J., Li, X.P., Zheng, H., Wilder-Smith, E.P.V.: A feature selection method for multi-level mental fatigue EEG classification. IEEE Trans. Biomed. Eng. 54, 1231–1237 (2007)
  38. Su, X., Tsai, C.L., Wang, H., Nickerson, D.M., Li, B.: Subgroup analysis via recursive partitioning. J. Mach. Learn. Res. 10, 141–158 (2009)
  39. Svetnik, V., Liaw, A., Tong, C., Culberson, J.C., Sheridan, R.P., Feuston, B.P.: Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Model. 43, 1947–1958 (2003)
  40. Tan, P.J., Dowe, D.L., Webb, G.I., Yu, X.: MML inference of oblique decision trees. In: Proc. AJCAI, pp. 1082–1088 (2004)
  41. Tan, P.J., Dowe, D.L.: Decision forests with oblique decision trees. In: Gelbukh, A., Reyes-Garcia, C.A. (eds.) MICAI 2006. LNCS (LNAI), vol. 4293, pp. 593–603. Springer, Heidelberg (2006)
  42. Tu, Z., Bai, X.: Auto-context and its application to high-level vision tasks and 3D brain image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (preprint) (2009)
  43. Tu, Z.: Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering. In: Proc. ICCV, pp. 1589–1596 (2005)
  44. Tuv, E., Borisov, A., Runger, G., Torkkola, K.: Feature selection with ensembles, artificial variables, and redundancy elimination. J. Mach. Learn. Res. 10, 1341–1366 (2009)
  45. Yao, B., Khosla, A., Fei-Fei, L.: Combining randomization and discrimination for fine-grained image categorization. In: Proc. CVPR (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Bjoern H. Menze (1, 2)
  • B. Michael Kelm (1)
  • Daniel N. Splitthoff (1)
  • Ullrich Koethe (1)
  • Fred A. Hamprecht (1)

  1. Interdisciplinary Center for Scientific Computing, University of Heidelberg, Germany
  2. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, USA
