Generation of Comprehensible Hypotheses from Gene Expression Data
Machine learning techniques have been recognized as powerful tools for the analysis of gene expression data. However, most learning techniques used for class prediction in gene expression analysis in recent years generate black-box models. Although these models can achieve very high prediction accuracy, they provide little insight into the underlying biology. This paper takes the view that a more appropriate role for machine learning techniques is to generate hypotheses that human experts can verify or refine, rather than to make decisions on their behalf. Based on this view, a general approach to generating comprehensible hypotheses from gene expression data is described and applied to human acute leukemias as a test case. The results demonstrate the feasibility of using machine learning techniques to help form hypotheses about the relationship between genes and certain diseases.
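To make the contrast with black-box models concrete, the following is a minimal Python sketch of one simple kind of comprehensible hypothesis: a single-gene threshold rule (a decision stump) found by exhaustive search over genes and thresholds. It is not the approach described in the paper. The function name, the gene names, and the toy expression values are hypothetical and serve only to show how a learned rule can be read, checked, and refined by a domain expert.

```python
from typing import List, Tuple

# Hypothetical toy data: rows are samples, columns are genes.
# Labels: 1 = AML, 0 = ALL (illustrative values only, not real measurements).
GENES = ["GENE_A", "GENE_B", "GENE_C"]
TOY_X = [
    [0.1, 5.0, 2.0],  # ALL
    [0.3, 4.5, 1.0],  # ALL
    [0.2, 5.5, 3.0],  # ALL
    [2.1, 4.8, 2.5],  # AML
    [1.9, 5.2, 1.5],  # AML
    [2.5, 4.9, 2.8],  # AML
]
TOY_Y = [0, 0, 0, 1, 1, 1]


def best_threshold_rule(
    X: List[List[float]], y: List[int], gene_names: List[str]
) -> Tuple[str, float, int, float]:
    """Return (gene, threshold, direction, training accuracy) for the best rule.

    direction == 1 means  'IF expr(gene) >  t THEN class 1 ELSE class 0';
    direction == -1 means 'IF expr(gene) <= t THEN class 1 ELSE class 0'.
    """
    n_samples = len(X)
    best = ("", 0.0, 1, -1.0)
    for g, name in enumerate(gene_names):
        values = sorted(set(row[g] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in candidates:
            for direction in (1, -1):
                correct = 0
                for row, label in zip(X, y):
                    above = row[g] > t
                    pred = 1 if above == (direction == 1) else 0
                    correct += int(pred == label)
                acc = correct / n_samples
                # Strict '>' keeps the first rule found among ties.
                if acc > best[3]:
                    best = (name, t, direction, acc)
    return best


if __name__ == "__main__":
    gene, t, direction, acc = best_threshold_rule(TOY_X, TOY_Y, GENES)
    op = ">" if direction == 1 else "<="
    print(f"IF {gene} {op} {t:.2f} THEN AML ELSE ALL (training accuracy {acc:.2f})")
```

On the toy data this yields a rule of the form "IF GENE_A > 1.10 THEN AML ELSE ALL". A rule that fits the training samples perfectly is still only a hypothesis: with thousands of genes and few samples, some genes can separate the classes by chance, so such rules need independent validation before they are taken as biological findings.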
Keywords: Support Vector Machine; Acute Myeloid Leukemia; Acute Lymphoblastic Leukemia; Gene Expression Data; Linear Discriminant Analysis