Nonlinear and Nonparametric Extensions

  • René Vidal
  • Yi Ma
  • S. Shankar Sastry
Part of the Interdisciplinary Applied Mathematics book series (IAM, volume 40)


In the previous chapters, we studied the problem of fitting a low-dimensional linear or affine subspace to a collection of points. In practical applications, however, a linear or affine subspace may not be able to capture nonlinear structures in the data. For instance, consider the set of all images of a face obtained by rotating it about its main axis of symmetry. While all such images live in a high-dimensional space whose dimension is the number of pixels, there is only one degree of freedom in the data, namely the angle of rotation. In fact, the space of all such images is a one-dimensional circle embedded in a high-dimensional space, whose structure is not well captured by a one-dimensional line. More generally, a collection of face images observed from different viewpoints is not well approximated by a single linear or affine subspace, as illustrated in the following example.
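
The circle argument above can be checked numerically. The following is a minimal sketch, not the face-image example the text refers to: it assumes a synthetic unit circle placed in a random two-dimensional plane of a 50-dimensional space, and the function names `circle_in_high_dim` and `explained_variance_ratio` are hypothetical helpers introduced only for this illustration. It uses ordinary PCA (computed through an SVD of the centered data) to measure how much of the variance a single linear direction can capture.

```python
import numpy as np


def circle_in_high_dim(n_points=200, ambient_dim=50, seed=0):
    """Sample points on a unit circle lying in a random 2-D plane of R^ambient_dim.

    The data have one degree of freedom (the angle), mirroring the
    rotating-face argument, but the setup itself is a synthetic assumption.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis (ambient_dim x 2) of a random 2-D plane, via QR.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 2)))
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n_points, 2)
    return circle @ basis.T  # (n_points, ambient_dim)


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal direction."""
    X_centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


X = circle_in_high_dim()
ratios = explained_variance_ratio(X)
print(np.round(ratios[:3], 3))
```

Under these assumptions the first two ratios are each about one half and the rest are numerically zero: the best one-dimensional line captures only half of the variance, and a linear method needs a two-dimensional subspace to represent data that have a single intrinsic degree of freedom.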


Nonlinear PCA (NLPCA); Locally Linear Embedding (LLE); Nonlinear Principal Component; Laplacian Eigenmaps (LE); Kernel PCA (KPCA)



Copyright information

© Springer-Verlag New York 2016

Authors and Affiliations

  • René Vidal: Center for Imaging Science, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
  • Yi Ma: School of Information Science and Technology, ShanghaiTech University, Shanghai, China
  • S. Shankar Sastry: Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, USA
