Random and Deterministic Forests

Part of the Springer Series in Statistics book series (SSS, volume 0)


Forest-based classification and prediction is one of the most commonly used nonparametric statistical methods in many scientific and engineering areas, particularly in machine learning and the analysis of high-throughput genomic data. In this chapter, we first introduce the construction of random forests and deterministic forests, and then address a fundamental and practical question: how large do the forests need to be?







Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, USA
  2. Emerging Pathogens Institute, University of Florida, Gainesville, USA
