Medical & Biological Engineering & Computing, Volume 45, Issue 12, pp 1175–1185

An effective non-parametric method for globally clustering genes from expression profiles

Original Article


Clustering is widely used in bioinformatics to find gene correlation patterns. Although many algorithms have been proposed, they usually struggle to deliver both automation and high clustering quality. In this paper, we propose a novel algorithm for clustering genes from their expression profiles. The algorithm has two distinguishing features: it takes global, rather than local, gene correlation information into account during clustering; and it incorporates a clustering quality measurement into the clustering process to achieve non-parametric, automatic and globally optimal gene clustering. Evaluation on simulated and real gene data sets demonstrates the effectiveness of the algorithm.
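The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea it describes: building clusters from all pairwise gene correlations and letting a quality measurement, rather than a user-supplied cluster count, select the final partition. Everything specific here is an assumption made for illustration and is not taken from the paper: the average-linkage merging, the mean silhouette width as the quality index, the function names, and the toy profiles.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def distance_matrix(profiles):
    """Correlation distance (1 - r) for every pair of genes."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - pearson(profiles[i], profiles[j])
    return d


def avg_link(a, b, d):
    """Average pairwise distance between two clusters of gene indices."""
    return sum(d[g][h] for g in a for h in b) / (len(a) * len(b))


def silhouette(clusters, d):
    """Mean silhouette width; genes in singleton clusters score 0."""
    total, n = 0.0, sum(len(c) for c in clusters)
    for ci, c in enumerate(clusters):
        if len(c) == 1:
            continue
        for g in c:
            a = sum(d[g][h] for h in c if h != g) / (len(c) - 1)
            b = min(sum(d[g][h] for h in o) / len(o)
                    for oi, o in enumerate(clusters) if oi != ci)
            total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def auto_cluster(profiles):
    """Merge clusters bottom-up and keep the partition with the best quality.

    Illustrative stand-in only: no cluster count is passed in; the
    silhouette score chooses among the partitions the merging produces.
    """
    if len(profiles) < 3:
        raise ValueError("need at least three profiles")
    d = distance_matrix(profiles)
    clusters = [[i] for i in range(len(profiles))]
    best_score, best = -1.0, None
    while len(clusters) > 2:
        # Merge the two clusters that are closest under average linkage.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        score = silhouette(clusters, d)
        if score > best_score:
            best_score, best = score, [sorted(c) for c in clusters]
    return sorted(best), best_score


# Hypothetical toy data: three rising and three falling profiles.
profiles = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0], [4.1, 2.9, 2.2, 0.8], [3.8, 3.1, 1.9, 1.1],
]
clusters, score = auto_cluster(profiles)
print(clusters, round(score, 3))
```

On this toy input the sketch should group the three rising profiles together and the three falling profiles together, without being told how many clusters to look for. A production analysis would more likely rely on an established library for the hierarchical clustering and validity index than on this pure-Python version.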


Keywords: Bioinformatics, Microarray, Gene expression, Clustering, Data mining



Copyright information

© International Federation for Medical and Biological Engineering 2007

Authors and Affiliations

  1. School of Engineering and Information Technology, Deakin University, Burwood, Australia
  2. The Walter and Eliza Hall Institute of Medical Research (WEHI), Parkville, Australia
