Abstract
Partitioning objects into closely related groups that have different states helps reveal the underlying structure of a data set. Various similarity measures are commonly combined with clustering algorithms to find an optimal clustering, or one close to the original clustering. We evaluate the recovery level of six chosen clustering algorithms, measured by Rand’s C values, using shrinkage-based and rank-based correlation coefficients, which are known to be robust. Recovery levels obtained with the weighted likelihood estimate of the correlation coefficient are then compared with the results from those correlation coefficients when applying agglomerative clustering algorithms.
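As a rough illustration of how a recovery level can be scored, the sketch below computes a pair-counting agreement statistic between an original clustering and a recovered one. It assumes that Rand’s C is the proportion of object pairs on which the two partitions agree (both place the pair together, or both place it apart). The function name `rand_c` and the example labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def rand_c(labels_true, labels_found):
    """Proportion of object pairs on which two partitions agree.

    Assumption: this is the pair-agreement form of Rand's statistic.
    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    """
    n = len(labels_true)
    if n != len(labels_found):
        raise ValueError("both labelings must cover the same objects")
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_true = labels_true[i] == labels_true[j]
        same_in_found = labels_found[i] == labels_found[j]
        if same_in_true == same_in_found:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Hypothetical example: six objects, one object assigned to the wrong group.
original = [0, 0, 0, 1, 1, 1]
recovered = [0, 0, 1, 1, 1, 1]
print(round(rand_c(original, recovered), 4))
```

A value of 1 means the recovered clustering matches the original up to a relabeling of clusters; lower values indicate poorer recovery. The statistic depends only on which objects are grouped together, not on the cluster labels themselves.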
Additional information
This work was supported by RIC(R) grants from the Traditional and Bio-Medical Research Center, Daejeon University (RRC04713, 2005), by ITEP in the Republic of Korea.
Cite this article
Chae, S.S., Kim, C., Kim, JM. et al. Cluster analysis using different correlation coefficients. Stat Papers 49, 715–727 (2008). https://doi.org/10.1007/s00362-006-0043-2
DOI: https://doi.org/10.1007/s00362-006-0043-2