Effectivity of Internal Validation Techniques for Gene Clustering

  • Chunmei Yang
  • Baikun Wan
  • Xiaofeng Gao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4345)


Clustering is a major exploratory technique for gene expression data in the post-genomic era. As essential tools within cluster analysis, cluster validation techniques can assess both the quality of clustering results and the performance of clustering algorithms, and thereby aid the interpretation of clustering results. In this work, the validation ability of the Silhouette index, Dunn's index, the Davies-Bouldin index and the figure of merit (FOM) in gene clustering was investigated on public gene expression datasets clustered by hierarchical single-linkage and average-linkage clustering, K-means and self-organizing maps (SOMs). The results show that the Silhouette index and FOM are well suited to validating both the performance of clustering algorithms and the quality of clustering results; that Dunn's index should not be used directly in gene clustering validation because of its high susceptibility to outliers; and that the Davies-Bouldin index provides better validation than Dunn's index, except for its bias toward hierarchical single-linkage clustering.
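To make concrete what the four internal measures compute, here is a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions: Euclidean distance between genes (the study's actual similarity metric is not stated here), the centroid-based Davies-Bouldin variant with scatter taken as the mean distance to the centroid, and the unadjusted aggregate FOM of Yeung et al. (sum over conditions of the leave-one-condition-out RMS deviation). All function names are hypothetical, and `kmeans` is a hypothetical, deterministically initialized stand-in for the clustering algorithms named above. Higher Silhouette and Dunn values, and lower Davies-Bouldin and FOM values, indicate better clustering.

```python
import numpy as np

# Hypothetical sketch of four internal validation measures, assuming
# Euclidean distance between the rows (genes) of an expression matrix X.

def _pairwise(X):
    # Euclidean distance matrix between all rows of X
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def silhouette(X, labels):
    # Mean silhouette width; higher is better. Requires at least 2 clusters.
    D = _pairwise(X)
    ks = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters get s(i) = 0
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

def dunn(X, labels):
    # Smallest between-cluster distance divided by largest cluster diameter; higher is better.
    D = _pairwise(X)
    ks = np.unique(labels)
    diam = max(D[np.ix_(labels == k, labels == k)].max() for k in ks)
    sep = min(D[np.ix_(labels == p, labels == q)].min()
              for p in ks for q in ks if p < q)
    return float(sep / diam)

def davies_bouldin(X, labels):
    # Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance; lower is better.
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    ratios = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i)
              for i in range(len(ks))]
    return float(np.mean(ratios))

def kmeans(X, k, iters=100):
    # Hypothetical minimal Lloyd's k-means; init = first k rows (for reproducibility only).
    cents = X[:k].astype(float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Empty clusters keep their previous centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else cents[j]
                        for j in range(k)])
        if np.allclose(new, cents):
            break
        cents = new
    return labels

def fom(X, k, cluster_fn=kmeans):
    # Aggregate figure of merit: leave condition (column) e out, cluster the genes on the
    # remaining conditions, then take the RMS deviation of column e from its cluster means.
    # Summed over all conditions; lower is better.
    n, m = X.shape
    total = 0.0
    for e in range(m):
        labels = cluster_fn(np.delete(X, e, axis=1), k)
        sq = sum(((X[labels == c, e] - X[labels == c, e].mean()) ** 2).sum()
                 for c in np.unique(labels))
        total += np.sqrt(sq / n)
    return float(total)

if __name__ == "__main__":
    # Toy data: 6 genes x 3 conditions forming two well-separated groups.
    X = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 1],
                  [10, 10, 10], [10, 11, 10], [11, 10, 11]], dtype=float)
    labels = kmeans(X, 2)
    print("Silhouette:", silhouette(X, labels))
    print("Dunn:", dunn(X, labels))
    print("Davies-Bouldin:", davies_bouldin(X, labels))
    print("FOM k=2:", fom(X, 2), "FOM k=1:", fom(X, 1))
```

Because Dunn's index takes a minimum over cluster separations and a maximum over cluster diameters, a single outlying gene can inflate one cluster's diameter and collapse the whole index, which is consistent with the susceptibility to outliers reported above. The Silhouette index and Davies-Bouldin index, by contrast, average over genes or clusters and are less dominated by any one point.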


Keywords: gene expression data; gene clustering; cluster validation; internal validation measure





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chunmei Yang (1)
  • Baikun Wan (1)
  • Xiaofeng Gao (2)
  1. Department of Biomedical Engineering and Scientific Instrumentations, Tianjin University, Tianjin, China
  2. Motorola (China) Electronics Ltd., Tianjin, China
