Cluster Analysis and Its Applications to Gene Expression Data

  • R. Sharan
  • R. Elkon
  • R. Shamir
Part of the Ernst Schering Research Foundation Workshop book series (SCHERING FOUND, volume 38)


Technologies for generating high-density arrays of cDNAs and oligonucleotides are developing rapidly and are changing the landscape of biological and biomedical research. They enable, for the first time, a global, simultaneous view of the transcription levels of many thousands of genes as the cell undergoes specific processes and under specific conditions. For several organisms, the sequences of all genes are available, so transcript levels of the complete gene collection can already be monitored today. The potential of such technologies is tremendous. Monitoring gene expression levels in different developmental stages, tissue types, clinical conditions, and organisms can help us understand gene function and gene networks, assist in the diagnosis of disease conditions, and reveal the effects of medical treatments. Undoubtedly, other applications will emerge in the coming years.


Keywords: Acute Lymphoblastic Leukemia; Gene Expression Data; Cluster Solution; Cluster Problem; Reference Vector





Copyright information

© Springer-Verlag Berlin Heidelberg 2002
