Setting the Number of Clusters in K-Means Clustering

Huh, Myung-Hoe

doi:10.1007/978-4-431-68544-9_5

Summary

K-means clustering is an efficient non-hierarchical clustering method that has become widely used in data mining. In applying the method, however, one needs to specify k, the number of clusters, a priori. In this short paper, we propose an exploratory procedure for setting k using Euclidean and/or Mahalanobis inter-point distances.
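
The summary names the two kinds of inter-point distance but does not spell out the procedure itself. As a minimal sketch of those two quantities only (not the chapter's procedure), the following computes all pairwise Euclidean and Mahalanobis distances for a small 2-D data set. The function names, the restriction to two dimensions, and the use of the pooled sample covariance are assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean_distances(points):
    """Pairwise Euclidean distances between 2-D points, in combinations() order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of 2-D points (divisor n - 1)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_distances(points):
    """Pairwise Mahalanobis distances using the overall sample covariance.

    Assumption: the covariance is estimated from all points together;
    other choices (e.g. a within-cluster estimate) are possible.
    """
    sxx, sxy, syy = covariance_2d(points)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for p, q in combinations(points, 2):
        dx, dy = p[0] - q[0], p[1] - q[1]
        # Quadratic form d' S^{-1} d with the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
    print(euclidean_distances(pts))
    print(mahalanobis_distances(pts))
```

On the four corner points of a 2-by-1 rectangle used above, the horizontal and vertical sides have Euclidean lengths 2 and 1 but the same Mahalanobis length, because the Mahalanobis distance rescales each direction by the spread of the data along it.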

Author information

Authors and Affiliations

Dept. of Statistics, Korea University, 136-701, Anam-Dong 5-1, Seoul, Korea
Myung-Hoe Huh

Editor information

Editors and Affiliations

Center for Information on Statistical Sciences, The Institute of Statistical Mathematics, 4-6-7 Minami Azabu, 106-8569, Minato-ku, Tokyo, Japan
Yasumasa Baba (Professor) & Koji Kanefuji (Associate Professor)
School of Industrial and Systems Engineering, Georgia Institute of Technology, 30332-0205, Atlanta, GA, USA
Anthony J. Hayter (Associate Professor)
Department of Fundamental Statistical Theory, The Institute of Statistical Mathematics, 4-6-7 Minami Azabu, 106-8569, Minato-ku, Tokyo, Japan
Satoshi Kuriki (Associate Professor)

About this paper

Cite this paper

Huh, MH. (2002). Setting the Number of Clusters in K-Means Clustering. In: Baba, Y., Hayter, A.J., Kanefuji, K., Kuriki, S. (eds) Recent Advances in Statistical Research and Data Analysis. Springer, Tokyo. https://doi.org/10.1007/978-4-431-68544-9_5

DOI: https://doi.org/10.1007/978-4-431-68544-9_5
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-68546-3
Online ISBN: 978-4-431-68544-9
eBook Packages: Springer Book Archive
