Cluster Structure of K-means Clustering via Principal Component Analysis

Ding, Chris; He, Xiaofeng

doi:10.1007/978-3-540-24775-3_50


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3056)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

K-means clustering is a popular data clustering algorithm. Principal component analysis (PCA) is a widely used statistical technique for dimension reduction. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering, with a clear simplex cluster structure. Our results prove that PCA-based dimension reduction is particularly effective for K-means clustering. We also derive new lower bounds for the K-means objective function: the total variance minus the sum of the K-1 largest eigenvalues of the data covariance matrix.
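The lower bound stated in the abstract can be checked numerically. The sketch below is a minimal illustration under assumptions not spelled out on this page: the data matrix `X` holds one point per row, the "covariance matrix" is taken to be the unnormalized scatter matrix of the centered data (no 1/n factor), and the bound subtracts its K-1 largest eigenvalues from the total variance. The helper names `kmeans_objective` and `pca_lower_bound` are hypothetical. Because the bound comes from relaxing the discrete cluster indicators to continuous vectors, any partition into K clusters, not only the K-means optimum, should score at or above it.

```python
import numpy as np


def kmeans_objective(X, labels):
    """K-means objective J_K: sum of squared distances from each point
    to the centroid of its assigned cluster."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]            # points assigned to cluster k
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total


def pca_lower_bound(X, K):
    """Total variance minus the sum of the K-1 largest eigenvalues of the
    scatter matrix of the centered data (the normalization is an assumption)."""
    Y = X - X.mean(axis=0)                  # center the data; one point per row
    total_variance = float((Y ** 2).sum())  # sum of squared distances to the mean
    scatter = Y.T @ Y                       # p x p scatter matrix, no 1/n factor
    eigvals = np.linalg.eigvalsh(scatter)[::-1]  # eigenvalues, largest first
    return total_variance - float(eigvals[:K - 1].sum())


# Usage: two synthetic 2-D blobs, scored with their generating labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels = np.repeat([0, 1], 50)
J = kmeans_objective(X, labels)
bound = pca_lower_bound(X, K=2)
print(f"J_K = {J:.2f}, lower bound = {bound:.2f}")
```

The gap `J - bound` should be non-negative up to floating-point rounding for any labeling with K clusters; with K = 1 the bound reduces to the total variance, which equals the objective of the single-cluster partition.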



Author information

Authors and Affiliations

Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, US
Chris Ding & Xiaofeng He


Editor information

Editors and Affiliations

School of Engineering and Information Technology, Deakin University, VIC 3125, Australia
Honghua Dai
University of Illinois at Urbana-Champaign, 61801, Urbana, IL, USA
Ramakrishnan Srikant
Faculty of Engineering and Information Technology, Centre for Quantum Computation and Intelligent Systems, and Australian ACS National Committee for Artificial Intelligence, University of Technology, Sydney, Australia
Chengqi Zhang


About this paper

Cite this paper

Ding, C., He, X. (2004). Cluster Structure of K-means Clustering via Principal Component Analysis. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_50


DOI: https://doi.org/10.1007/978-3-540-24775-3_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive
