
Nonlinear and Nonparametric Extensions

  • Chapter in Generalized Principal Component Analysis
  • Part of the book series: Interdisciplinary Applied Mathematics (IAM, volume 40)

Abstract

In the previous chapters, we studied the problem of fitting a low-dimensional linear or affine subspace to a collection of points. In practical applications, however, a linear or affine subspace may not be able to capture nonlinear structures in the data. For instance, consider the set of all images of a face obtained by rotating it about its main axis of symmetry. While all such images live in a high-dimensional space whose dimension is the number of pixels, there is only one degree of freedom in the data, namely the angle of rotation. In fact, the space of all such images is a one-dimensional circle embedded in a high-dimensional space, whose structure is not well captured by a one-dimensional line. More generally, a collection of face images observed from different viewpoints is not well approximated by a single linear or affine subspace, as illustrated in the following example.

Notes

  1. In principle, we should use the notation \(\hat{\Sigma}_{\phi(\boldsymbol{x})}\) to indicate that it is the estimate of the actual covariance matrix. But for simplicity, we will drop the hat in the sequel and simply use \(\Sigma_{\phi(\boldsymbol{x})}\). The same goes for the eigenvectors and the principal components.

  2. The remaining \(M - N\) eigenvectors of \(\Phi\Phi^{\top}\) are associated with the eigenvalue zero.

  3. In PCA, we center the data by subtracting its mean. Here, we first subtract the mean of the embedded data and then compute the kernel, whence the name centered kernel (see the kernel PCA sketch after these notes).

  4. In PCA, if X is the data matrix, then XJ is the centered (mean-subtracted) data matrix.

  5. “Almost every” means except for a set of measure zero.

  6. See Davison (1983) for alternative optimization methods for minimizing the objective in (4.32).

  7. Notice that \(A = JX^{\top}XJ\), where \(J = I - \frac{1}{N}\boldsymbol{1}\boldsymbol{1}^{\top}\) is the centering matrix (see the classical MDS sketch after these notes).

  8. By scaled low-dimensional representation we mean replacing \(\boldsymbol{y}_{j}\) by \(d_{jj}\boldsymbol{y}_{j}\).

  9. As we will see in Chapters 7 and 8, spectral clustering methods will play a crucial role in many approaches to subspace clustering.

  10. Notice that the above objective is closely related to the MAP-EM algorithm for a mixture of isotropic Gaussians discussed in Appendix B.3.2.

  11. AT&T Laboratories, Cambridge, http://www.cl.cam.ac.uk/Research/DTG/attarchive/facedatabase.html.

  12. A graph is connected when there is a path between every pair of vertices.

  13. This constraint is needed to prevent the trivial solution \(U = \boldsymbol{0}\). Alternatively, we could enforce \(U^{\top}U = \operatorname{diag}(\vert\mathcal{G}_{1}\vert, \vert\mathcal{G}_{2}\vert, \ldots, \vert\mathcal{G}_{n}\vert)\). However, this is impossible, because we do not know \(\vert\mathcal{G}_{i}\vert\) (see the spectral clustering sketch after these notes).
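
To make notes 1–4 concrete, here is a minimal NumPy sketch of kernel PCA using the centered kernel matrix \(JKJ\), with \(J = I - \frac{1}{N}\boldsymbol{1}\boldsymbol{1}^{\top}\). The Gaussian kernel, its bandwidth, and all variable names are illustrative choices for this sketch, not the chapter's specific settings.

```python
import numpy as np

def kernel_pca(X, kernel, d):
    """Project the columns of X (D x N) onto d nonlinear principal
    components via the eigenvectors of the centered kernel matrix."""
    N = X.shape[1]
    # Uncentered kernel matrix K[i, j] = k(x_i, x_j).
    K = np.array([[kernel(X[:, i], X[:, j]) for j in range(N)] for i in range(N)])
    # Centering matrix J = I - (1/N) 1 1^T; JKJ is the centered kernel (notes 3-4).
    J = np.eye(N) - np.ones((N, N)) / N
    Kc = J @ K @ J
    # Eigendecomposition (np.linalg.eigh returns eigenvalues in ascending order).
    evals, evecs = np.linalg.eigh(Kc)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    # Normalize so that the implicit feature-space directions have unit norm;
    # the principal components are then Kc @ alphas = sqrt(lambda_i) * v_i.
    alphas = evecs[:, :d] / np.sqrt(np.maximum(evals[:d], 1e-12))
    return (Kc @ alphas).T  # d x N matrix of nonlinear principal components

# Illustrative usage with a Gaussian (RBF) kernel.
rbf = lambda x, y, sigma=1.0: np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))
X = np.random.randn(3, 50)      # 50 points in R^3
Y = kernel_pca(X, rbf, d=2)     # 2 x 50 embedding
```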
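
As a companion to note 7, the sketch below follows the standard construction of classical MDS: double-center a matrix of squared pairwise distances to obtain \(A = -\frac{1}{2} J D^{(2)} J\) (which equals \(JX^{\top}XJ\) for Euclidean distances), then embed with the top eigenvectors scaled by the square roots of the eigenvalues. This is a generic sketch under those standard assumptions, not necessarily the exact algorithm as stated in the chapter.

```python
import numpy as np

def classical_mds(D2, d):
    """Classical (metric) MDS from an N x N matrix of squared pairwise
    distances D2, returning a d x N embedding."""
    N = D2.shape[0]
    # Centering matrix J = I - (1/N) 1 1^T (note 7).
    J = np.eye(N) - np.ones((N, N)) / N
    # Double centering: for squared Euclidean distances, A = J X^T X J.
    A = -0.5 * J @ D2 @ J
    evals, evecs = np.linalg.eigh(A)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    # Keep the top d eigenpairs and scale each eigenvector by sqrt(eigenvalue)
    # to obtain the low-dimensional coordinates.
    scales = np.sqrt(np.maximum(evals[:d], 0.0))
    return (evecs[:, :d] * scales).T  # d x N

# Illustrative usage: recover a 2-D configuration from distances of points in R^5.
X = np.random.randn(5, 40)
G = X.T @ X
D2 = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # squared distances
Y = classical_mds(D2, d=2)
```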
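
Finally, a sketch of the spectral clustering pipeline touched on in notes 9, 12, and 13, under common default choices (a symmetrically normalized Laplacian and k-means on the rows of \(U\)). Because the retained eigenvectors are orthonormal, this relaxation uses \(U^{\top}U = I\) as the scale-fixing constraint rather than the unknowable \(\operatorname{diag}(\vert\mathcal{G}_{i}\vert)\) mentioned in note 13. The affinity construction in the example is illustrative.

```python
import numpy as np

def spectral_clustering(W, n_clusters, n_iter=100, seed=0):
    """Cluster N points from an N x N affinity matrix W by embedding them with
    the eigenvectors of the normalized graph Laplacian and running k-means on
    the rows of U (orthonormal columns, so U^T U = I)."""
    N = W.shape[0]
    deg = W.sum(axis=1)
    # Symmetrically normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    Dinv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(N) - Dinv_sqrt @ W @ Dinv_sqrt
    # Eigenvectors of the n_clusters smallest eigenvalues of L.
    _, evecs = np.linalg.eigh(L)
    U = evecs[:, :n_clusters]
    # Simple k-means (Lloyd's algorithm) on the rows of U.
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(N, n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels

# Illustrative usage: any symmetric nonnegative affinity with zero diagonal works.
X = np.random.randn(2, 60)
W = np.exp(-((X[:, :, None] - X[:, None, :]) ** 2).sum(0))  # Gaussian affinity
np.fill_diagonal(W, 0)
labels = spectral_clustering(W, n_clusters=2)
```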

References

  • Belkin, M., & Niyogi, P. (2002). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of Neural Information Processing Systems (NIPS) (pp. 585–591).
  • Bottou, L., & Bengio, Y. (1995). Convergence properties of the k-means algorithms. In Proceedings of Neural Information Processing Systems (NIPS).
  • Burges, C. (2005). Geometric methods for feature extraction and dimensional reduction - a guided tour. In The data mining and knowledge discovery handbook (pp. 59–92). Boston: Kluwer Academic.
  • Burges, C. J. C. (2010). Dimension reduction: A guided tour. Foundations and Trends in Machine Learning, 2(4), 275–365.
  • Chung, F. (1997). Spectral graph theory. Washington: Conference Board of the Mathematical Sciences.
  • Cox, T. F., & Cox, M. A. A. (1994). Multidimensional scaling. London: Chapman and Hall.
  • Davis, C., & Kahan, W. (1970). The rotation of eigenvectors by a perturbation. SIAM Journal on Numerical Analysis, 7(1), 1–46.
  • Davison, M. (1983). Multidimensional scaling. New York: Wiley.
  • Donoho, D., & Grimes, C. (2003). Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10), 5591–5596.
  • Forgy, E. (1965). Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications (abstract). Biometrics, 21, 768–769.
  • Gower, J. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325–338.
  • Hastie, T. (1984). Principal curves and surfaces. Technical Report, Stanford University.
  • Hastie, T., & Stuetzle, W. (1989). Principal curves. Journal of the American Statistical Association, 84(406), 502–516.
  • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441.
  • Jancey, R. (1966). Multidimensional group analysis. Australian Journal of Botany, 14, 127–130.
  • Kruskal, J. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika.
  • Lee, J. A., & Verleysen, M. (2007). Nonlinear dimensionality reduction (1st ed.). New York: Springer.
  • Lloyd, S. (1957). Least squares quantization in PCM. Technical Report, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory, 28, 128–137.
  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297).
  • Mercer, J. (1909). Functions of positive and negative types and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, Series A, 209, 415–446.
  • Ng, A., Weiss, Y., & Jordan, M. (2001). On spectral clustering: Analysis and an algorithm. In Proceedings of Neural Information Processing Systems (NIPS) (pp. 849–856).
  • Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.
  • Roweis, S., & Saul, L. (2003). Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4, 119–155.
  • Schölkopf, B., & Smola, A. (2002). Learning with kernels. Cambridge: MIT Press.
  • Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.
  • Selim, S., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1), 81–87.
  • Sha, F., & Saul, L. (2005). Analysis and extension of spectral methods for nonlinear dimensionality reduction. In Proceedings of International Conference on Machine Learning (pp. 784–791).
  • Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
  • Shi, T., Belkin, M., & Yin, B. (2008). Data spectroscopy: Eigenspace of convolution operators and clustering. arXiv:0807.3719v1.
  • Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.
  • Torgerson, W. (1958). Theory and methods of scaling. New York: Wiley.
  • von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
  • Weinberger, K. Q., & Saul, L. (2004). Unsupervised learning of image manifolds by semidefinite programming. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 988–995).
  • Williams, C. (2002). On a connection between kernel PCA and metric multidimensional scaling. Machine Learning, 46, 11–19.
  • Zhang, Z., & Zha, H. (2005). Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing, 26(1), 313–338.

Copyright information

© 2016 Springer-Verlag New York

Cite this chapter

Vidal, R., Ma, Y., Sastry, S.S. (2016). Nonlinear and Nonparametric Extensions. In: Generalized Principal Component Analysis. Interdisciplinary Applied Mathematics, vol 40. Springer, New York, NY. https://doi.org/10.1007/978-0-387-87811-9_4
