Abstract
K-means clustering is a popular data clustering algorithm. Principal component analysis (PCA) is a widely used statistical technique for dimension reduction. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering, with a clear simplex cluster structure. Our results prove that PCA-based dimension reductions are particularly effective for K-means clustering. New lower bounds for the K-means objective function are derived, given by the total variance minus the eigenvalues of the data covariance matrix.
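The two claims in the abstract can be checked numerically on a toy data set. The following is a minimal sketch, not the authors' code: it assumes the covariance matrix is left unnormalized (the scatter matrix of the centered data, with no 1/n factor) and that the bound for K clusters subtracts the K-1 largest eigenvalues. The function names `kmeans_objective`, `pca_lower_bound`, and `pc1_split` are hypothetical, and the synthetic two-blob data is chosen only for illustration.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def pca_lower_bound(X, K):
    """Total scatter minus the K-1 largest eigenvalues of the scatter matrix.

    Assumption: the covariance is unnormalized (Y^T Y on centered data Y).
    """
    Y = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Y.T @ Y)  # returned in ascending order
    return (Y ** 2).sum() - eigvals[::-1][:K - 1].sum()

def pc1_split(X):
    """Two-way split by the sign of the first principal component score."""
    Y = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return (Y @ Vt[0] > 0).astype(int)

# Synthetic data: two Gaussian blobs in 5 dimensions (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, size=(50, 5)),
               rng.normal(+3.0, 1.0, size=(50, 5))])

labels = pc1_split(X)
J = kmeans_objective(X, labels)
bound = pca_lower_bound(X, K=2)
print(f"K-means objective = {J:.2f}, PCA lower bound = {bound:.2f}")
```

For K = 2 the sign of the first principal component score acts as a relaxed cluster indicator, and the objective of any partition into two clusters should stay at or above the value returned by `pca_lower_bound`.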
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ding, C., He, X. (2004). Cluster Structure of K-means Clustering via Principal Component Analysis. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_50
DOI: https://doi.org/10.1007/978-3-540-24775-3_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive