Rotation of Random Forests for Genomic and Proteomic Classification Problems

Stiglic, Gregor; Rodriguez, Juan J.; Kokol, Peter

doi:10.1007/978-1-4419-7046-6_21


Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 696)


Abstract

Random Forests have recently been widely used for many kinds of classification problems. One of them is the classification of gene expression samples, a problem known for its extremely high dimensionality that therefore demands suitable classification techniques. Owing to their strong robustness with respect to large feature sets, Random Forests show a significant increase in accuracy compared with the ensemble-based classifiers that were widely used before their introduction. In this chapter, we present another ensemble of decision trees, called Rotation Forest, and evaluate its classification performance on different microarray datasets. Rotation Forest can also be applied to existing ensembles of classifiers, such as Random Forest, to improve their accuracy and robustness. This study evaluates the Rotation Forest classification technique, using decision trees as base classifiers, on 14 different datasets with genomic and proteomic data. The results show that Rotation Forest, as well as the proposed rotation of Random Forests, outperforms the most widely used classifier ensembles, including Random Forests, on the majority of datasets.
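The rotation idea can be illustrated with a minimal sketch, assuming NumPy and scikit-learn are available. It is not the implementation evaluated in the chapter. For each ensemble member, the features are randomly split into small subsets, PCA is fitted on a bootstrap sample of each subset, and the resulting axes are assembled into a rotation matrix that the member is trained on. The names `rotation_matrix` and `RotationEnsemble`, the subset size of 3 and the 75% sample fraction are assumptions chosen for illustration. The class-subset elimination step of the original Rotation Forest description is omitted. Passing a `RandomForestClassifier` as the base learner mimics the idea of rotating a Random Forest, but the authors' exact setup may differ.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def rotation_matrix(X, subset_size, rng, sample_frac=0.75):
    """Hypothetical helper: PCA rotation on random disjoint feature subsets."""
    n_samples, n_features = X.shape
    # Randomly partition feature indices into disjoint subsets of ~subset_size.
    perm = rng.permutation(n_features)
    n_subsets = int(np.ceil(n_features / subset_size))
    R = np.zeros((n_features, n_features))
    for subset in np.array_split(perm, n_subsets):
        # Bootstrap sample of rows (sample_frac is an assumed value); at least
        # len(subset) rows so that PCA returns a full square basis.
        n_rows = max(len(subset), int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=n_rows, replace=True)
        pca = PCA(svd_solver="full").fit(X[np.ix_(rows, subset)])
        # Write the principal axes back into the original feature positions,
        # giving a matrix that is block-diagonal up to a feature permutation.
        R[np.ix_(subset, subset)] = pca.components_.T
    return R


class RotationEnsemble:
    """Hypothetical Rotation Forest-style ensemble; one rotation per member.

    base: any scikit-learn classifier (a decision tree by default, or a
    RandomForestClassifier to rotate a Random Forest).
    """

    def __init__(self, base=None, n_estimators=10, subset_size=3, random_state=0):
        self.base = base if base is not None else DecisionTreeClassifier(random_state=0)
        self.n_estimators = n_estimators
        self.subset_size = subset_size
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.members_ = []
        for _ in range(self.n_estimators):
            R = rotation_matrix(X, self.subset_size, rng)
            model = clone(self.base).fit(X @ R, y)  # train on rotated features
            self.members_.append((R, model))
        self.classes_ = self.members_[0][1].classes_
        return self

    def predict_proba(self, X):
        # Average class-probability estimates of all rotated members.
        return np.mean([m.predict_proba(X @ R) for R, m in self.members_], axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Usage on synthetic high-dimensional data (few samples, many features).
X, y = make_classification(n_samples=120, n_features=200, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rot_tree = RotationEnsemble(n_estimators=10).fit(X_tr, y_tr)
rot_rf = RotationEnsemble(RandomForestClassifier(n_estimators=50, random_state=0),
                          n_estimators=5).fit(X_tr, y_tr)
for name, model in [("rotated trees", rot_tree), ("rotated Random Forest", rot_rf)]:
    print(name, "test accuracy:", np.mean(model.predict(X_te) == y_te))
```

Because each rotation is orthogonal, no information is lost. The members differ only in the random feature partition and bootstrap sample. This diversity, together with axis-parallel trees now splitting on principal-component directions, is what a rotation ensemble relies on. Members are combined by averaging class probabilities.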



Author information

Authors and Affiliations

Faculty of Health Sciences, University of Maribor, Zitna ulica 15, 2000, Maribor, Slovenia
Gregor Stiglic
Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000, Maribor, Slovenia
Gregor Stiglic


Corresponding author

Correspondence to Gregor Stiglic.

Editor information

Editors and Affiliations

Dept. Computer Science, University of Georgia, Athens, 30602-7404, Georgia, USA
Hamid R. Arabnia
Department of Computer Science, Lamar University, Beaumont, 77710, Texas, USA
Quoc-Nam Tran


About this chapter

Cite this chapter

Stiglic, G., Rodriguez, J.J., Kokol, P. (2011). Rotation of Random Forests for Genomic and Proteomic Classification Problems. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_21


DOI: https://doi.org/10.1007/978-1-4419-7046-6_21
Published: 15 March 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences (R0)
