Cluster analysis using different correlation coefficients

Chae, Seong S.; Kim, Chansoo; Kim, Jong-Min; Warde, William D.

doi:10.1007/s00362-006-0043-2

Regular Article
Published: 30 December 2006

Statistical Papers, volume 49, pages 715–727 (2008)

Abstract

Partitioning objects into closely related groups that have different states makes it possible to understand the underlying structure of the data set under study. Different kinds of similarity measures are commonly combined with clustering algorithms to find an optimal clustering, or one closely akin to the original clustering. Using shrinkage-based and rank-based correlation coefficients, which are known to be robust, the recovery level of six chosen clustering algorithms is evaluated using Rand’s C values. The recovery levels obtained with the weighted likelihood estimate of the correlation coefficient are also computed and compared with the results from those correlation coefficients when agglomerative clustering algorithms are applied.
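
As a rough illustration of how recovery can be scored, the sketch below computes Rand's C statistic in its standard form: the fraction of object pairs on which two partitions agree, meaning the pair is either grouped together in both or separated in both. It is not taken from the article. The function name `rand_c` and the two example labelings are hypothetical, and the article's own simulation design, correlation coefficients, and clustering algorithms are not reproduced here.

```python
from itertools import combinations


def rand_c(labels_a, labels_b):
    """Return the fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two objects
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")

    agreements = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreements += 1

    total_pairs = n * (n - 1) // 2
    return agreements / total_pairs


# Hypothetical example: a known two-cluster structure and a clustering
# that an algorithm might return for the same six objects.
original = [0, 0, 0, 1, 1, 1]
recovered = [0, 0, 1, 1, 2, 2]
print(rand_c(original, recovered))
```

The value lies between 0 and 1, with 1 meaning the two partitions are identical up to relabeling. In the hypothetical example, 10 of the 15 pairs agree, so the statistic is 10/15. Cluster labels themselves carry no meaning; only co-membership of pairs is compared.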

Author information

Authors and Affiliations

Department of Applied Statistics, Daejeon University, Daejeon, 300-716, Republic of Korea
Seong S. Chae
Department of Statistics, Oklahoma State University, Stillwater, OK, 74078, USA
Chansoo Kim & William D. Warde
University of Minnesota, Statistics Discipline, Morris, MN, 56267, USA
Jong-Min Kim

Corresponding author

Correspondence to Seong S. Chae.

Additional information

This work was supported by RIC(R) grants from the Traditional and Bio-Medical Research Center, Daejeon University (RRC04713, 2005) by ITEP in the Republic of Korea.

About this article

Cite this article

Chae, S.S., Kim, C., Kim, JM. et al. Cluster analysis using different correlation coefficients. Stat Papers 49, 715–727 (2008). https://doi.org/10.1007/s00362-006-0043-2

Received: 22 June 2005
Revised: 27 November 2006
Published: 30 December 2006
Issue Date: October 2008
DOI: https://doi.org/10.1007/s00362-006-0043-2
