Cluster analysis using different correlation coefficients

  • Regular Article
  • Statistical Papers

Abstract

Partitioning objects into closely related groups that have different states makes it possible to understand the underlying structure of a data set. Various similarity measures are commonly combined with clustering algorithms to find an optimal clustering, or one closely akin to the original clustering. Using shrinkage-based and rank-based correlation coefficients, which are known to be robust, the recovery level of six chosen clustering algorithms is evaluated using Rand’s C values. Recovery levels based on the weighted likelihood estimate of the correlation coefficient are also obtained and compared with the results obtained from those correlation coefficients when applying agglomerative clustering algorithms.
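As an illustration of the recovery measure named in the abstract, the sketch below computes Rand’s C (Rand 1971) as the fraction of object pairs on which two partitions agree, that is, pairs placed together in both or apart in both. It is a minimal sketch and not the authors' implementation: the function name `rand_c` and the example label vectors are hypothetical, and the example does not reproduce the paper's correlation coefficients or its six clustering algorithms.

```python
from itertools import combinations


def rand_c(labels_a, labels_b):
    """Rand's C: share of object pairs treated the same way by two partitions.

    A pair counts as an agreement when both partitions put the two objects
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if len(labels_a) < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total


# Hypothetical example: a known ("original") clustering of six objects
# and a clustering recovered by some algorithm.
original = [0, 0, 0, 1, 1, 1]
recovered = [0, 0, 1, 1, 2, 2]
c_value = rand_c(original, recovered)
print(round(c_value, 4))
```

A value of 1 means the recovered clustering matches the original exactly, up to a relabeling of clusters; lower values indicate poorer recovery.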

Author information

Corresponding author

Correspondence to Seong S. Chae.

Additional information

This work was supported by RIC(R) grants from the Traditional and Bio-Medical Research Center, Daejeon University (RRC04713, 2005) by ITEP in the Republic of Korea.

About this article

Cite this article

Chae, S.S., Kim, C., Kim, JM. et al. Cluster analysis using different correlation coefficients. Stat Papers 49, 715–727 (2008). https://doi.org/10.1007/s00362-006-0043-2
