A parametric k-means algorithm

Tarpey, Thaddeus

doi:10.1007/s00180-007-0022-7

Original Paper
Published: 01 February 2007

Volume 22, pages 71–89 (2007)

Computational Statistics

Thaddeus Tarpey¹

321 Accesses
25 Citations

Abstract

The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points by running the k-means algorithm on a very large simulated data set from a distribution whose parameters are estimated using maximum likelihood. Theoretical and simulation results are presented comparing the parametric k-means algorithm to the usual k-means algorithm, and an example on determining gas mask sizes is used to illustrate the method.
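
The procedure summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the author's code: it assumes a univariate normal model (so the maximum likelihood estimates are the sample mean and the divide-by-n standard deviation) and uses plain Lloyd iterations for k-means. The names `normal_mle`, `kmeans_1d`, and `parametric_kmeans` and the default simulation size are illustrative choices. Principal points ξ1, …, ξk minimize E[min_j (X − ξj)²]; for a normal distribution with k = 2 they are known in closed form as μ ± σ√(2/π), which gives a convenient check on the simulation.

```python
import bisect
import math
import random
import statistics


def normal_mle(data):
    """ML estimates (mu, sigma) under an assumed univariate normal model.

    The MLE of the variance divides by n, so pstdev (not stdev) is used.
    """
    return statistics.fmean(data), statistics.pstdev(data)


def kmeans_1d(points, k, max_iter=200):
    """Lloyd's k-means for 1-D data; returns one cluster mean per interval, left to right."""
    xs = sorted(points)
    n = len(xs)
    # prefix[i] holds sum(xs[:i]), so the mean of any run of sorted points costs O(1).
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    # Start at evenly spaced empirical quantiles (an arbitrary but deterministic choice).
    centers = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(max_iter):
        # Assignment step: in 1-D each nearest-center region is an interval,
        # so the sorted data are split at midpoints between adjacent centers.
        mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
        cuts = [0] + [bisect.bisect_left(xs, m) for m in mids] + [n]
        # Update step: each center moves to the mean of its interval (empty clusters stay put).
        new = [
            (prefix[hi] - prefix[lo]) / (hi - lo) if hi > lo else centers[j]
            for j, (lo, hi) in enumerate(zip(cuts, cuts[1:]))
        ]
        if new == centers:  # assignments stopped changing: a fixed point is reached
            break
        centers = new
    return centers


def parametric_kmeans(data, k, sim_size=200_000, seed=1):
    """Hypothetical sketch: fit the model by ML, simulate a large sample, then run k-means."""
    mu_hat, sigma_hat = normal_mle(data)
    rng = random.Random(seed)
    simulated = [rng.gauss(mu_hat, sigma_hat) for _ in range(sim_size)]
    return kmeans_1d(simulated, k)


if __name__ == "__main__":
    rng = random.Random(2007)
    sample = [rng.gauss(10.0, 2.0) for _ in range(50)]
    mu_hat, sigma_hat = normal_mle(sample)
    half_width = sigma_hat * math.sqrt(2 / math.pi)  # closed form for k = 2 under normality
    print("nonparametric k-means:", kmeans_1d(sample, 2))
    print("parametric k-means:   ", parametric_kmeans(sample, 2))
    print("closed form (k = 2):  ", [mu_hat - half_width, mu_hat + half_width])
```

Because the simulated sample can be made as large as desired, the Monte Carlo error of the parametric estimate shrinks as `sim_size` grows, leaving the error in the fitted parameters as the main source of variability; the ordinary k-means estimate instead depends directly on the n observed points. The one-dimensional shortcut (splitting sorted data at midpoints with `bisect`) is only an efficiency choice for this sketch; a multivariate setting such as the gas mask example would need a general k-means implementation.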

Author information

Authors and Affiliations

Department of Mathematics and Statistics, Wright State University, Dayton, OH, USA
Thaddeus Tarpey

Corresponding author

Correspondence to Thaddeus Tarpey.

About this article

Cite this article

Tarpey, T. A parametric k-means algorithm. Computational Statistics 22, 71–89 (2007). https://doi.org/10.1007/s00180-007-0022-7

Accepted: 14 December 2006
Published: 01 February 2007
Issue Date: April 2007
DOI: https://doi.org/10.1007/s00180-007-0022-7