A Distributed, Parallel System for Large-Scale Structure Recognition in Gene Expression Data

  • Jens Ernst
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4208)


With the development of very high-throughput lab technology, known as DNA microarrays, it has become feasible for scientists to monitor the transcriptional activity of all known genes in many living organisms. Such assays are typically conducted repeatedly, along a time course or across a series of predefined experimental conditions, yielding a set of expression profiles. Arranging these profiles into subsets based on their pairwise similarity is known as clustering. Clusters of genes exhibiting similar expression behavior are often related in a biologically meaningful way, and such relationships are of central interest to research in functional genomics.
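To make the notion of pairwise similarity concrete, the following is a minimal sketch that builds a full similarity matrix from a handful of expression profiles. The abstract does not name the similarity measure used by the system, so Pearson correlation is an assumption made here for illustration, and the function names `pearson` and `similarity_matrix` as well as the toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # A flat profile has no shape to compare; treat it as uncorrelated.
        return 0.0
    return cov / (norm_x * norm_y)


def similarity_matrix(profiles):
    """Symmetric matrix of pairwise similarities (diagonal fixed at 1.0)."""
    n = len(profiles)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = pearson(profiles[i], profiles[j])
            sim[i][j] = s
            sim[j][i] = s
    return sim


# Toy data (assumed): each row is one gene measured at four time points.
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # gene A: rising
    [2.0, 4.0, 6.0, 8.0],  # gene B: rising, same shape as A
    [4.0, 3.0, 2.0, 1.0],  # gene C: falling
]
sim = similarity_matrix(profiles)
```

Genes A and B have the same shape and are therefore maximally similar under this measure, while gene C is anti-correlated with both. The matrix grows quadratically with the number of profiles, which is what makes clustering with full similarity information expensive at large scale.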

We present a distributed, parallel system based on spectral graph theory and numerical linear algebra that can solve this problem for datasets generated by the latest generation of microarrays, even at high levels of experimental noise. It allows us to process hundreds of thousands of expression profiles, thereby vastly increasing the current size limit for unsupervised clustering with full similarity information.
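The abstract does not spell out the algorithm, so the following is only a generic, single-process illustration of how spectral graph theory can split a similarity graph into two clusters. It is not the author's distributed method. It assumes a symmetric matrix of non-negative similarities describing a connected graph, and it computes the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian L = D - W) by shifted power iteration. The names `fiedler_vector` and `bipartition` and the toy weights are hypothetical.

```python
import math
import random


def fiedler_vector(weights, iterations=200, seed=0):
    """Approximate the Fiedler vector of the Laplacian L = D - W.

    Runs power iteration on (shift * I - L) while projecting out the
    constant vector, which is the eigenvector of L for eigenvalue 0.
    Assumes a connected graph with symmetric, non-negative weights.
    """
    n = len(weights)
    degree = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    # Gershgorin bound: no eigenvalue of L exceeds twice the largest degree.
    shift = 2.0 * max(degree)

    def apply_shifted(v):
        out = []
        for i in range(n):
            laplacian_v = degree[i] * v[i] - sum(
                weights[i][j] * v[j] for j in range(n) if j != i
            )
            out.append(shift * v[i] - laplacian_v)
        return out

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]          # remove the constant component
        v = apply_shifted(v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def bipartition(weights):
    """Assign each vertex to cluster 0 or 1 by the sign of its Fiedler entry."""
    return [0 if x >= 0.0 else 1 for x in fiedler_vector(weights)]


# Toy similarity graph (assumed): two groups of three genes, with strong
# similarity (0.9) inside a group and weak similarity (0.1) across groups.
n = 6
weights = [
    [0.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1) for j in range(n)]
    for i in range(n)
]
labels = bipartition(weights)
```

On this toy graph the sign pattern of the Fiedler vector separates the first three genes from the last three; which group receives label 0 depends on the random start. Each iteration is dominated by a matrix-vector product, an operation that can be divided by rows among workers, although how the described system distributes its computation is not stated in the abstract.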


computational biology, structure recognition, gene expression analysis, unsupervised clustering, spectral graph theory





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jens Ernst
    • 1
  1. Lehrstuhl für Effiziente Algorithmen, Institut für Informatik, Technische Universität München
