A Distributed, Parallel System for Large-Scale Structure Recognition in Gene Expression Data
With the development of DNA microarrays, a very high-throughput laboratory technology, it has become feasible for scientists to monitor the transcriptional activity of all known genes in many living organisms. Such assays are typically conducted repeatedly, along a time course or across a series of predefined experimental conditions, yielding a set of expression profiles. Arranging these profiles into subsets based on their pairwise similarity is known as clustering. Clusters of genes exhibiting similar expression behavior are often related in a biologically meaningful way, which makes them a central interest of research in functional genomics.
We present a distributed, parallel system based on spectral graph theory and numerical linear algebra that can solve this problem for datasets generated by the latest generation of microarrays, even at high levels of experimental noise. It allows us to process hundreds of thousands of expression profiles, thereby vastly increasing the current size limit for unsupervised clustering with full similarity information.
Keywords: computational biology, structure recognition, gene expression analysis, unsupervised clustering, spectral graph theory
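The abstract does not spell out the algorithm, so the following is only a minimal, single-machine sketch of the general idea behind spectral clustering with full similarity information, not the distributed system itself. It assumes Pearson correlation as the pairwise similarity, clips negative correlations to zero, and splits the profiles in two by the sign of an approximate Fiedler vector of the graph Laplacian, obtained by plain power iteration. The function names, the similarity measure, and the toy data are all hypothetical choices made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_matrix(profiles):
    """Full pairwise similarity; negative correlations are clipped to 0 (assumption)."""
    n = len(profiles)
    return [[0.0 if i == j else max(0.0, pearson(profiles[i], profiles[j]))
             for j in range(n)] for i in range(n)]

def fiedler_vector(w, iterations=200):
    """Approximate the Laplacian eigenvector for the second-smallest eigenvalue.

    Runs power iteration on (c*I - L) with L = D - W, projecting out the
    constant vector, which is the eigenvector of L for eigenvalue 0.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    # 2 * max degree bounds the eigenvalues of L (Gershgorin); +1 keeps c*I - L positive definite.
    c = 2.0 * max(degree) + 1.0
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        v = [(c - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def bipartition(profiles):
    """Split profiles into two groups by the sign of the Fiedler vector."""
    v = fiedler_vector(similarity_matrix(profiles))
    return [0 if x < 0 else 1 for x in v]

# Toy data (made up): three rising profiles followed by three oscillating ones.
example_profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.3, 2.9, 4.2, 5.1],
    [0.9, 1.8, 3.2, 3.9, 4.8],
    [5.0, 1.0, 5.0, 1.0, 5.0],
    [4.8, 1.2, 5.1, 0.9, 5.2],
    [5.2, 0.8, 4.9, 1.1, 4.7],
]
labels = bipartition(example_profiles)
print(labels)
```

On this toy input the three rising profiles should receive one label and the three oscillating profiles the other. The dense similarity matrix grows quadratically with the number of profiles, which suggests why a problem with hundreds of thousands of profiles calls for a distributed, parallel implementation.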