Random and Deterministic Forests

Chapter
Part of the Springer Series in Statistics book series (SSS, volume 0)

Abstract

Forest-based classification and prediction is among the most commonly used nonparametric statistical methods in many scientific and engineering areas, particularly in machine learning and the analysis of high-throughput genomic data. In this chapter, we first introduce the construction of random forests and deterministic forests, and then address a fundamental and practical question: how large do the forests need to be?

Keywords

Random Forest; Importance Measure; Importance Score; Correct Class; Importance Index
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, USA
  2. Emerging Pathogens Institute, University of Florida, Gainesville, USA
