Journal of Statistical Physics

Volume 110, Issue 3–6, pp 1117–1139

Cluster Analysis of Gene Expression Data

  • Eytan Domany


The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample, such as tissue extracted from a particular tumor. The results of such an experiment comprise several hundred thousand numbers, which come in the form of a table with several thousand rows (one for each gene) and 50–100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is, and how it is measured by DNA chips. Next I explain what is meant by “clustering” and how we analyze the massive amounts of data from such experiments, and I present results obtained from the analysis of data on colon cancer, brain tumors, and breast cancer.
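To make the data layout concrete for readers new to the subject, here is a minimal sketch (Python, standard library only) of a gene-by-sample expression table and one simple way to group its rows (genes) or its columns (samples). The toy numbers, the names `expression`, `pearson`, `columns` and `single_linkage`, and the 0.9 similarity threshold are all hypothetical. The grouping shown is plain single-linkage clustering on Pearson correlation, a generic stand-in chosen for brevity, not the author's own clustering methodology.

```python
import math
from itertools import combinations

# Hypothetical toy expression table: one row per gene, one column per sample.
# A real experiment has several thousand rows and 50-100 columns.
expression = {
    "gene_A": [2.1, 2.3, 0.4, 0.5],
    "gene_B": [1.9, 2.0, 0.6, 0.3],
    "gene_C": [0.2, 0.4, 1.8, 2.2],
    "gene_D": [0.3, 0.1, 2.0, 1.9],
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def columns(table):
    """Transpose gene rows into per-sample profiles (one vector per column)."""
    rows = list(table.values())
    return {f"sample_{j}": [row[j] for row in rows] for j in range(len(rows[0]))}


def single_linkage(profiles, threshold):
    """Single-linkage clustering cut at a fixed similarity: two items end up
    in the same cluster if a chain of pairs with correlation >= threshold
    connects them (connected components, found with union-find)."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, halving paths on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


print(single_linkage(expression, 0.9))           # clusters of genes (rows)
print(single_linkage(columns(expression), 0.9))  # clusters of samples (columns)
# → [['gene_A', 'gene_B'], ['gene_C', 'gene_D']]
# → [['sample_0', 'sample_1'], ['sample_2', 'sample_3']]
```

The same routine clusters genes or samples depending only on whether it is fed rows or columns of the table, which mirrors the two ways such a matrix is usually explored.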

Keywords: gene expression; DNA chips; microarrays; clustering





Copyright information

© Plenum Publishing Corporation 2003

Authors and Affiliations

  • Eytan Domany (1)
  1. Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
