Towards Knowledge Discovery from cDNA Microarray Gene Expression Data
The advent of cDNA microarrays has offered the first opportunity to obtain a global understanding of biological processes in living organisms through simultaneous readouts of tens of thousands of genes. Initial experiments suggest that genes with similar function have similar expression patterns in microarray experiments. Until now, most approaches to the computational analysis of gene expression have used unsupervised learning. Although unsupervised methods may be sufficient in some cases, the biological processes are so complex that purely syntactic analyses are unlikely to exploit the full richness of microarray data. In addition, it seems natural to reuse existing biological (background) knowledge. In this paper, we present some elements of a methodology for knowledge discovery from microarray experiments. Two sources of biomedical knowledge are used: Ashburner’s gene ontology and our own literature-derived network of gene-gene relations, obtained by analysing Medline citation records. Predictive models can be induced, their classification quality validated through ROC/AUC analysis, and the models then applied to provide hypotheses about the function of unclassified genes. The methodology has so far been tested on publicly available gene expression data, and its results have been evaluated by molecular biologists and medical researchers.
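As an illustration of the ROC/AUC validation mentioned above, the following minimal sketch computes the area under the ROC curve for a binary classifier from its scores. It is not the authors' implementation: the function name `roc_auc`, the labels, and the scores are all hypothetical, and the sketch assumes that a label of 1 means a gene belongs to a given functional class and 0 means it does not.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example;
    tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = gene annotated to the class, 0 = not annotated.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical classifier scores for the same six genes.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. The pairwise form is quadratic in the number of examples, which is acceptable for a small illustration but not for large data sets.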
Keywords: Knowledge Discovery, Squalene Epoxidase, Computational Molecular Biology, Norwegian Radium Hospital, Universal Academy