Abstract
In the previous chapters, we studied the problem of fitting a low-dimensional linear or affine subspace to a collection of points. In practical applications, however, a linear or affine subspace may not be able to capture nonlinear structures in the data. For instance, consider the set of all images of a face obtained by rotating it about its main axis of symmetry. While all such images live in a high-dimensional space whose dimension is the number of pixels, there is only one degree of freedom in the data, namely the angle of rotation. In fact, the space of all such images is a one-dimensional circle embedded in a high-dimensional space, whose structure is not well captured by a one-dimensional line. More generally, a collection of face images observed from different viewpoints is not well approximated by a single linear or affine subspace, as illustrated in the following example.
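To make the geometry concrete, here is a minimal numpy sketch of the circle example (not the book's face-image data; the dimensions, the random orthonormal embedding, and all names are illustrative assumptions). It samples a circle, embeds it in a 100-dimensional ambient space, and fits the best one-dimensional linear subspace by PCA; the line captures only about half of the variance, confirming that a single linear component cannot represent the circle.

```python
import numpy as np

# One degree of freedom (the rotation angle), high-dimensional ambient space.
N = 200
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)])       # 2 x N points on a circle

# Embed the circle in D = 100 dimensions with a random orthonormal map,
# mimicking images that live in a high-dimensional pixel space.
rng = np.random.default_rng(0)
D = 100
Q, _ = np.linalg.qr(rng.standard_normal((D, 2)))        # D x 2, orthonormal columns
X = Q @ circle                                          # D x N data matrix

# Best one-dimensional linear fit (PCA with a single principal component).
Xc = X - X.mean(axis=1, keepdims=True)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
explained = S[0] ** 2 / np.sum(S ** 2)
print(f"variance explained by one component: {explained:.2f}")  # ~0.50
```

Because the circle's covariance has two equal nonzero eigenvalues, the best line can explain at most half of the variance, no matter how it is chosen.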
Notes
- 1. In principle, we should use the notation \(\hat{\Sigma }_{\phi (\boldsymbol{x})}\) to indicate that it is an estimate of the actual covariance matrix. But for simplicity, we will drop the hat in the sequel and simply use \(\Sigma _{\phi (\boldsymbol{x})}\). The same goes for the eigenvectors and the principal components.
- 2. The remaining M − N eigenvectors of \(\Phi \Phi ^{\top }\) are associated with the eigenvalue zero (this fact is checked numerically in the sketch following these notes).
- 3. In PCA, we center the data by subtracting its mean. Here, we first subtract the mean of the embedded data and then compute the kernel, whence the name centered kernel.
- 4. In PCA, if X is the data matrix, then XJ is the centered (mean-subtracted) data matrix.
- 5. “Almost every” means except for a set of measure zero.
- 6.
- 7. Notice that \(A = JX^{\top }XJ\), where \(J = I - \frac{1}{N}\boldsymbol{1}\boldsymbol{1}^{\top }\) is the centering matrix (see the sketch following these notes).
- 8. By scaled low-dimensional representation we mean replacing \(\boldsymbol{y}_{j}\) by \(d_{jj}\boldsymbol{y}_{j}\).
- 9.
- 10. Notice that the above objective is closely related to the MAP-EM algorithm for a mixture of isotropic Gaussians discussed in Appendix B.3.2.
- 11. AT&T Laboratories, Cambridge, http://www.cl.cam.ac.uk/Research/DTG/attarchive/facedatabase.html.
- 12. A graph is connected when there is a path between every pair of vertices.
- 13. This constraint is needed to prevent the trivial solution \(U = \boldsymbol{0}\). Alternatively, we could enforce \(U^{\top }U = \mbox{diag}(\vert \mathcal{G}_{1}\vert, \vert \mathcal{G}_{2}\vert, \ldots, \vert \mathcal{G}_{n}\vert )\). However, this is impossible, because we do not know \(\vert \mathcal{G}_{i}\vert\).
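Notes 2, 4, 7, and 8 state small linear-algebra facts about the centering matrix and the Gram matrix used in kernel PCA and MDS. The following minimal numpy sketch checks them on random data; the dimensions, the stand-in matrix \(\Phi\), and the square-root eigenvalue scaling convention for MDS are assumptions made for illustration, not the book's code.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 10, 6                          # feature dimension M > number of points N
Phi = rng.standard_normal((M, N))     # stand-in for the embedded data matrix

# Notes 4 and 7: J = I - (1/N) 1 1^T is the centering matrix; multiplying
# by J on the right subtracts the mean from every column.
J = np.eye(N) - np.ones((N, N)) / N
assert np.allclose(Phi @ J, Phi - Phi.mean(axis=1, keepdims=True))

# Note 7: A = J Phi^T Phi J is the Gram matrix of the centered data.
A = J @ Phi.T @ Phi @ J
assert np.allclose(A, (Phi @ J).T @ (Phi @ J))

# Note 2: Phi^T Phi and Phi Phi^T share their nonzero eigenvalues, and the
# remaining M - N eigenvalues of the M x M matrix Phi Phi^T are zero.
small = np.linalg.eigvalsh(Phi.T @ Phi)   # N eigenvalues, ascending
big = np.linalg.eigvalsh(Phi @ Phi.T)     # M eigenvalues, ascending
assert np.allclose(big[:M - N], 0, atol=1e-9)
assert np.allclose(big[M - N:], small, atol=1e-9)

# Note 8 (one common reading): the scaled MDS representation multiplies the
# j-th coordinate of the embedding by the square root of the j-th eigenvalue.
lam, V = np.linalg.eigh(A)
lam, V = lam[::-1], V[:, ::-1]            # sort eigenpairs in descending order
d = 3
Y = np.sqrt(np.clip(lam[:d], 0, None))[:, None] * V[:, :d].T   # d x N embedding
print("embedding shape:", Y.shape)
```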