Isotropic PCA and Affine-Invariant Clustering

Chapter in: Building Bridges

Part of the book series: Bolyai Society Mathematical Studies (BSMS, volume 19)

Abstract

We present an extension of Principal Component Analysis (PCA) and a new algorithm for clustering points in R^n based on it. The key property of the algorithm is that it is affine-invariant. When the input is a sample from a mixture of two arbitrary Gaussians, the algorithm correctly classifies the sample assuming only that the two components are separable by a hyperplane, i.e., there exists a halfspace that contains most of the probability mass of one Gaussian and almost none of the other. This is nearly the best possible, substantially improving known results [15, 10, 1]. For k > 2 components, the algorithm requires only that there be some (k−1)-dimensional subspace in which the overlap in every direction is small. Here we define overlap to be the ratio of the following two quantities: 1) the average squared distance between a point and the mean of its component, and 2) the average squared distance between a point and the mean of the mixture. The main result may also be stated in the language of linear discriminant analysis: if the standard Fisher discriminant [9] is small enough, labels are not needed to estimate the optimal subspace for projection. Our main tools are isotropic transformation, spectral projection and a simple reweighting technique. We call this combination isotropic PCA.
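The three tools named above (isotropic transformation, reweighting, and spectral projection) compose naturally. The following Python sketch is illustrative only and is not the chapter's exact algorithm: the Gaussian weight function, its bandwidth, and the rule for extracting k−1 projection directions are hypothetical choices made for this example; the chapter's analysis specifies them precisely.

```python
import numpy as np

def isotropic_pca(X, k=2):
    """Illustrative sketch of isotropic PCA on a sample X of shape (n, d).

    Returns a basis of (k - 1) candidate projection directions
    (in the whitened coordinates) along which mixture components
    may separate. Weight function and selection rule are hypothetical.
    """
    d = X.shape[1]

    # 1) Isotropic transformation: affine map taking the sample mean
    #    to 0 and the sample covariance to the identity. This step is
    #    what makes the overall procedure affine-invariant.
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # covariance^(-1/2)
    Y = (X - mu) @ W

    # 2) Reweighting: down-weight points far from the origin.
    #    A Gaussian weight with bandwidth d is one possible choice.
    w = np.exp(-np.sum(Y ** 2, axis=1) / (2.0 * d))

    # 3) Spectral projection: after whitening, plain PCA is uninformative
    #    (every direction has variance 1), but the reweighted second-moment
    #    matrix is generally no longer isotropic; eigenvalues deviating
    #    most from 1 flag directions with inter-component structure.
    M = (Y * w[:, None]).T @ Y / w.sum()
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(np.abs(evals - 1.0))[::-1]
    return evecs[:, order[: k - 1]]
```

Projecting the whitened sample onto the returned directions and clustering in that low-dimensional space (for k = 2, splitting along a hyperplane) is the intended use; when the overlap defined above is small, the components should separate well after this projection.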



References

  1. D. Achlioptas and F. McSherry, On spectral learning of mixtures of distributions, in: Proc. of COLT (2005).
  2. K. Chaudhuri and S. Rao, Beyond Gaussians: Spectral methods for learning mixtures of heavy-tailed distributions (2008).
  3. K. Chaudhuri and S. Rao, Learning mixtures of product distributions using correlations and independence (2008).
  4. A. Dasgupta, J. Hopcroft, J. Kleinberg and M. Sandler, On learning mixtures of heavy-tailed distributions, in: Proc. of FOCS (2005).
  5. S. DasGupta, Learning mixtures of Gaussians, in: Proc. of FOCS (1999).
  6. A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B, 39 (1977), 1–38.
  7. J. Feldman, R. A. Servedio and R. O'Donnell, PAC learning axis-aligned mixtures of Gaussians with no separation assumption, in: Proc. of COLT (2006), pp. 20–34.
  8. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press (1990).
  9. R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, John Wiley & Sons (2001).
  10. R. Kannan, H. Salmasian and S. Vempala, The spectral method for general mixture models, in: Proceedings of the 18th Conference on Learning Theory (2005).
  11. L. Lovász and S. Vempala, The geometry of logconcave functions and sampling algorithms, Random Structures and Algorithms, 30(3) (2007), 307–358.
  12. J. B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, University of California Press (1967), pp. 281–297.
  13. M. Rudelson, Random vectors in the isotropic position, Journal of Functional Analysis, 164 (1999), 60–72.
  14. M. Rudelson and R. Vershynin, Sampling from large matrices: An approach through geometric functional analysis, J. ACM, 54(4) (2007).

  15. S. Arora and R. Kannan, Learning mixtures of arbitrary Gaussians, Ann. Appl. Probab., 15(1A) (2005), 69–92.

  16. L. Schulman and S. DasGupta, A two-round variant of EM for Gaussian mixtures, in: Proc. of the Sixteenth Conference on Uncertainty in Artificial Intelligence (2000).
  17. G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press (1990).
  18. S. Vempala and G. Wang, A spectral algorithm for learning mixtures of distributions, in: Proc. of FOCS (2002); J. Comput. Syst. Sci., 68(4) (2004), 841–860.



Copyright information

© 2008 János Bolyai Mathematical Society and Springer-Verlag

Cite this chapter

Brubaker, S., Vempala, S.S. (2008). Isotropic PCA and Affine-Invariant Clustering. In: Grötschel, M., Katona, G.O.H., Sági, G. (eds) Building Bridges. Bolyai Society Mathematical Studies, vol 19. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85221-6_8
