Relational Subgroup Discovery for Descriptive Analysis of Microarray Data
This paper presents a method that combines gene ontologies with the paradigm of relational subgroup discovery to help find descriptions of groups of genes that are differentially expressed in specific cancers. The descriptions are represented by relational features extracted from gene ontology information, and they are straightforwardly interpretable by medical experts. We applied the proposed method to two well-known data sets: acute lymphoblastic leukemia (ALL) vs. acute myeloid leukemia (AML), and the classification of fourteen types of cancer. A significant number of the discovered groups of genes had a description, confirmed by a medical expert, that highlighted the underlying biological process responsible for distinguishing one class from the other classes. We view our methodology not just as a prototypical example of applying sophisticated machine learning algorithms to microarray data, but also as a motivation for developing more sophisticated functional annotations and ontologies that can be processed by such learning algorithms.
Keywords: Gene Ontology; Acute Myeloid Leukemia; Acute Lymphoblastic Leukemia; Gene Expression Data; Inductive Logic Programming
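To make the idea of relational features built from gene ontology information more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a propositionalization step in which a hypothetical feature `annotated_with(T)` holds when a gene is annotated with GO term T or any of its descendants. Conjunctions of these features are then scored with weighted relative accuracy (WRAcc), a common subgroup-discovery heuristic, against the class in which each gene is assumed to be differentially expressed. The GO fragment, term names, genes, and labels are all invented for illustration.

```python
from itertools import combinations

# Hypothetical toy GO fragment (not real GO IDs): term -> parent terms (is_a edges).
GO_PARENTS = {
    "GO:biological_process": [],
    "GO:cell_cycle": ["GO:biological_process"],
    "GO:mitosis": ["GO:cell_cycle"],
    "GO:immune_response": ["GO:biological_process"],
}


def ancestors(term):
    """Return the term plus all of its ancestors in the toy hierarchy."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(GO_PARENTS.get(t, []))
    return seen


def relational_features(annotations):
    """Propositionalize: annotated_with(T) holds if the gene is annotated
    with T or with any descendant of T (hypothetical feature naming)."""
    feats = set()
    for term in annotations:
        feats |= {f"annotated_with({a})" for a in ancestors(term)}
    return feats


def wracc(cond, genes, target):
    """Weighted relative accuracy: p(Cond) * (p(Class | Cond) - p(Class))."""
    covered = [g for g in genes if cond <= g["features"]]
    if not covered:
        return 0.0
    n = len(genes)
    p_class = sum(g["label"] == target for g in genes) / n
    p_class_given_cond = sum(g["label"] == target for g in covered) / len(covered)
    return len(covered) / n * (p_class_given_cond - p_class)


def top_subgroups(genes, target, max_len=2, k=3):
    """Score every conjunction of up to max_len features and return the k best.
    Ties are broken in favor of shorter descriptions."""
    all_feats = sorted(set().union(*(g["features"] for g in genes)))
    scored = []
    for r in range(1, max_len + 1):
        for combo in combinations(all_feats, r):
            scored.append((wracc(frozenset(combo), genes, target), list(combo)))
    scored.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return scored[:k]


# Hypothetical genes, each labeled with the class it is assumed to be
# differentially expressed in.
GENES = [
    {"name": "g1", "annotations": ["GO:mitosis"], "label": "ALL"},
    {"name": "g2", "annotations": ["GO:mitosis"], "label": "ALL"},
    {"name": "g3", "annotations": ["GO:mitosis"], "label": "ALL"},
    {"name": "g4", "annotations": ["GO:cell_cycle"], "label": "AML"},
    {"name": "g5", "annotations": ["GO:immune_response"], "label": "AML"},
    {"name": "g6", "annotations": ["GO:immune_response"], "label": "AML"},
]
for g in GENES:
    g["features"] = relational_features(g["annotations"])

if __name__ == "__main__":
    for target in ("ALL", "AML"):
        score, desc = top_subgroups(GENES, target, k=1)[0]
        print(target, round(score, 3), " AND ".join(desc))
```

On this toy data the best description for the ALL genes is `annotated_with(GO:mitosis)` (WRAcc 0.25), and for the AML genes it is `annotated_with(GO:immune_response)` (WRAcc about 0.167). Exhaustive enumeration is used here only for brevity. A practical system would search the feature space heuristically and prune redundant conjunctions.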