Rotation of Random Forests for Genomic and Proteomic Classification Problems

Chapter
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 696)

Abstract

Random Forests have recently been widely used for many kinds of classification problems. One of them is the classification of gene expression samples, a problem known for its extremely high dimensionality that therefore demands well-suited classification techniques. Owing to their strong robustness with respect to large feature sets, Random Forests show a significant increase in accuracy compared with other ensemble-based classifiers that were widely used before their introduction. In this chapter, we present another ensemble of decision trees, called Rotation Forest, and evaluate its classification performance on different microarray datasets. Rotation Forest can also be applied to existing ensembles of classifiers, such as Random Forest, to improve their accuracy and robustness. This study evaluates the Rotation Forest classification technique, with decision trees as base classifiers, on 14 different datasets containing genomic and proteomic data. The results show that Rotation Forest, as well as the proposed rotation of Random Forests, outperform most widely used ensembles of classifiers, including Random Forests, on the majority of datasets.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Gregor Stiglic
    • 1
    • 2
  • Juan J. Rodriguez
  • Peter Kokol
  1. Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
  2. Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia