Towards Knowledge Discovery from cDNA Microarray Gene Expression Data

  • Jan Komorowski
  • Torgeir R. Hvidsten
  • Tor-Kristian Jenssen
  • Dyre Tjeldvoll
  • Eivind Hovig
  • Arne K. Sandvik
  • Astrid Lægreid
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1910)


The advent of so-called cDNA microarrays has offered the first opportunity to obtain a global understanding of biological processes in living organisms through simultaneous readouts of tens of thousands of genes. Initial experiments suggest that genes with similar function have similar expression patterns in microarray experiments. Until now, most approaches to the computational analysis of gene expression have used unsupervised learning. Although unsupervised methods may be sufficient in some cases, the complexity of the biological processes is so high that purely syntactical analyses are unlikely to fully exploit the richness of the microarray data. In addition, it seems natural to re-use existing biological (background) knowledge. In this paper, we present some elements of a methodology for knowledge discovery from microarray experiments. Two sources of bio-medical knowledge are used: Ashburner’s gene ontology and our own literature-derived network of gene-gene relations obtained by analysing Medline citation records. Predictive models can be induced, their classification quality validated through ROC/AUC analysis, and then applied to provide hypotheses regarding the function of unclassified genes. The methodology has so far been tested on publicly available gene expression data, and its results have been evaluated by molecular biologists and medical researchers.
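The ROC/AUC validation mentioned in the abstract can be illustrated with a minimal sketch. It computes the area under the ROC curve as the fraction of (positive, negative) pairs that a classifier ranks correctly, which is the standard Mann-Whitney formulation of AUC. The function name `auc` and the scores and labels are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper does not state how its AUC values were computed.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    scores: classifier outputs, higher means "more likely positive".
    labels: 1 for genes that belong to the functional class, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six genes (illustrative values only).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)
```

An AUC of 1.0 means every positive gene is scored above every negative one, while 0.5 corresponds to a ranking no better than chance.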


Knowledge Discovery; Squalene Epoxidase; Computational Molecular Biology; Norwegian Radium Hospital; Universal Academy



Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Jan Komorowski (1)
  • Torgeir R. Hvidsten (1)
  • Tor-Kristian Jenssen (1)
  • Dyre Tjeldvoll (1)
  • Eivind Hovig (2)
  • Arne K. Sandvik (3)
  • Astrid Lægreid (3)

  1. Knowledge Systems Group, Department of Information and Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
  2. Department of Tumor Biology, Institute for Cancer Research, Oslo, Norway
  3. Department of Physiology and Biomedical Engineering, Norwegian University of Science and Technology, Trondheim, Norway
