Abstract
Many advanced machine learning and statistical methods have recently been applied to the classification of gene expression measurements. Although many of these methods achieve high accuracy, their classification process is generally hard to comprehend. In this paper, a new method for interpreting small ensembles of classifiers is applied to gene expression data from a real-world dataset. It has been shown that interactive interpretation systems developed for classical machine learning problems also offer a wide range of possibilities to scientists in the bioinformatics field. We therefore chose a gene expression dataset discriminating three types of leukemia as a testbed for the proposed Visual Interpretation of Small Ensembles (VISE) tool. Our results show that combining the accuracy of ensembles with added comprehensibility yields results that are not only accurate but may also represent new knowledge about specific gene functions.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stiglic, G., Khan, N., Verlic, M., Kokol, P. (2007). Gene Expression Analysis of Leukemia Samples Using Visual Interpretation of Small Ensembles: A Case Study. In: Rajapakse, J.C., Schmidt, B., Volkert, G. (eds) Pattern Recognition in Bioinformatics. PRIB 2007. Lecture Notes in Computer Science, vol 4774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75286-8_19
DOI: https://doi.org/10.1007/978-3-540-75286-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75285-1
Online ISBN: 978-3-540-75286-8
eBook Packages: Computer Science, Computer Science (R0)