Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data

  • Volker Roth
  • Mikio L. Braun
  • Tilman Lange
  • Joachim M. Buhmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2415)

Abstract

The concept of cluster stability is introduced to assess the validity of data partitionings found by clustering algorithms. It allows us to explicitly quantify the quality of a clustering solution without depending on external information. The principle of maximizing cluster stability can be interpreted as choosing the most self-consistent data partitioning. We present an empirical estimator for the theoretically derived stability index, based on resampling. Experiments are conducted on well-known gene expression data sets, re-analyzing the work by Alon et al. [1] and by Spellman et al. [8].
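
The abstract does not spell out the estimator, so the following is only a generic sketch of resampling-based stability, not the authors' stability index. It assumes k-means as the clustering algorithm, pairwise co-membership agreement (the Rand index) as the similarity between two partitionings, and overlapping random subsamples as the resampling scheme. The function names `kmeans`, `rand_index`, and `stability` and all parameters are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Farthest-point initialisation (an illustrative, deterministic choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree about co-membership."""
    agree = total = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            agree += (a[i] == a[j]) == (b[i] == b[j])
            total += 1
    return agree / total

def stability(points, k, rounds=20, frac=0.8, seed=0):
    """Mean agreement between clusterings of two random subsamples (assumed scheme)."""
    rng = random.Random(seed)
    n, m = len(points), int(frac * len(points))
    scores = []
    for _ in range(rounds):
        s1, s2 = rng.sample(range(n), m), rng.sample(range(n), m)
        l1 = dict(zip(s1, kmeans([points[i] for i in s1], k)))
        l2 = dict(zip(s2, kmeans([points[i] for i in s2], k)))
        # Compare the two solutions only on points present in both subsamples.
        shared = sorted(set(s1) & set(s2))
        scores.append(rand_index([l1[i] for i in shared], [l2[i] for i in shared]))
    return sum(scores) / len(scores)
```

Under these assumptions, a candidate number of clusters could be chosen by evaluating `stability(points, k)` over a range of `k` and preferring the most stable value. Raw agreement scores are not directly comparable across different `k`, so a practical estimator would need some normalization, which this sketch omits.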

References

  1. U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci., 96:6745–6750, 1999.
  2. J. Breckenridge. Replicating cluster analysis: Method, consistency and validity. Multivariate Behavioral Research, 1989.
  3. J. Fridlyand and S. Dudoit. Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method. Stat. Berkeley Tech Report No. 600, 2001.
  4. E. Levine and E. Domany. Resampling method for unsupervised estimation of cluster validity. Neural Computation, 13:2573–2593, 2001.
  5. D. H. Mack, E. Y. Tom, M. Mahadev, H. Dong, M. Mittman, S. Dee, A. J. Levine, T. R. Gingeras, and D. J. Lockhart. In: Biology of Tumors, eds. K. Mihich and C. Croce (Plenum, New York), pp. 123, 1998.
  6. C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.
  7. K. Rose, E. Gurewitz, and G. Fox. Vector quantization and deterministic annealing. IEEE Trans. Inform. Theory, 38(4):1249–1257, 1992.
  8. P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9:3273–3297, 1998.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Volker Roth (1)
  • Mikio L. Braun (1)
  • Tilman Lange (1)
  • Joachim M. Buhmann (1)

  1. Institute of Computer Science, Dept. III, University of Bonn, Bonn, Germany
