Principal Component Analysis

Chapter in: Generalized Principal Component Analysis

Part of the book series: Interdisciplinary Applied Mathematics (IAM, volume 40)

Abstract

Principal component analysis (PCA) is the problem of fitting a low-dimensional affine subspace to a set of data points in a high-dimensional space. PCA is, by now, well established in the literature, and has become one of the most useful tools for data modeling, compression, and visualization.
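
To make the abstract concrete, here is a minimal sketch (not from the chapter; the function name pca_fit, the toy data, and the column-per-data-point convention are illustrative assumptions) of the standard SVD-based computation: subtract the sample mean, take the singular value decomposition of the centered data matrix, and keep the top d left singular vectors as a basis for the fitted affine subspace.

```python
import numpy as np

def pca_fit(X, d):
    """Fit a d-dimensional affine subspace to the columns of X (D x N).

    Returns the sample mean mu (D x 1), an orthonormal basis U_d (D x d) of
    principal directions, and coordinates Y (d x N) with X ~ mu + U_d @ Y.
    """
    mu = X.mean(axis=1, keepdims=True)            # offset of the affine subspace
    Xc = X - mu                                   # mean-centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_d = U[:, :d]                                # top-d left singular vectors
    Y = U_d.T @ Xc                                # low-dimensional representation
    return mu, U_d, Y

# Toy example: noisy samples near a 2-dimensional affine subspace of R^5.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((5, 2)))
X = 3.0 + basis @ rng.standard_normal((2, 200)) + 0.01 * rng.standard_normal((5, 200))
mu, U_d, Y = pca_fit(X, d=2)
X_hat = mu + U_d @ Y                              # rank-2 affine approximation of the data
print("relative reconstruction error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

Equivalently, the same basis can be obtained from the top eigenvectors of the sample covariance matrix; the SVD route is the more numerically stable of the two.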


Notes

  1. The reason for this is that both \(\boldsymbol{u}_{1}\) and its orthogonal complement \(\boldsymbol{u}_{1}^{\perp }\) are invariant subspaces of \(\Sigma _{\boldsymbol{x}}\).

  2. In the statistical setting, \(\boldsymbol{x}_{j}\) and \(\boldsymbol{y}_{j}\) will be samples of two random variables \(\boldsymbol{x}\) and \(\boldsymbol{y}\), respectively. Then this constraint is equivalent to setting their means to zero.

  3. From a statistical standpoint, the column vectors of U give the directions in which the data X has the largest variance, whence the name “principal components.”

  4. In Section 1.2.1, we have seen an example in which a similar process can be applied to an ensemble of face images from multiple subspaces, where the first d = 3 principal components are calculated and visualized.

  5. We leave it as an exercise for the reader to calculate the number of parameters needed to specify a d-dimensional subspace in \(\mathbb{R}^{D}\) and then the additional parameters needed to specify a Gaussian distribution inside the subspace; a sketch of one way to count them is given after these notes.

  6. Even if one chooses to compare models by their algorithmic complexity, such as the minimum message length (MML) criterion (Wallace and Boulton 1968), an extension of Kolmogorov complexity to model selection, a strong connection with the above information-theoretic criteria, such as minimum description length (MDL), can readily be established via Shannon’s optimal coding theory (see Wallace and Dowe 1999).

  7. It can be shown that the nuclear norm is the convex envelope of the rank function on the set of matrices with spectral norm at most one; a short singular value thresholding sketch illustrating its use is given after these notes.

  8. http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html.
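
Regarding footnote 5 above, here is a sketch of one standard way to do the count (offered as a hint under the usual parameterization, not as the book’s own solution): a d-dimensional linear subspace of \(\mathbb{R}^{D}\) can be identified with a point on the Grassmannian, which has dimension \(d(D-d)\), and a Gaussian supported inside the subspace additionally requires a d-dimensional mean and a \(d\times d\) symmetric positive-definite covariance, for a total of

\[ d(D-d) \;+\; d \;+\; \frac{d(d+1)}{2} \]

parameters.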
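
Regarding footnote 7, the practical consequence of the convex-envelope property is that rank can be replaced by the nuclear norm in optimization problems, which are then solved by singular value thresholding, as in the algorithm of Cai, Candès, and Shen cited in the references. The following minimal sketch (illustrative only; the function name svt and the toy matrix are assumptions, not the book’s code) evaluates the proximal operator of \(\tau \|\cdot \|_{*}\) by soft-thresholding the singular values:

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: proximal operator of tau * (nuclear norm).

    Returns the minimizer of 0.5 * ||X - A||_F^2 + tau * ||X||_* over X,
    obtained by soft-thresholding the singular values of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # shrink each singular value toward zero
    return U @ np.diag(s_shrunk) @ Vt

# Toy example: a noisy, nearly rank-3 matrix. Thresholding above the noise
# level suppresses the small singular values contributed by the noise.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
A_noisy = A + 0.1 * rng.standard_normal((50, 40))
A_thr = svt(A_noisy, tau=5.0)
print("rank before:", np.linalg.matrix_rank(A_noisy), "after:", np.linalg.matrix_rank(A_thr))
```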

References

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

  • Basri, R., & Jacobs, D. (2003). Lambertian reflection and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2), 218–233.

  • Beltrami, E. (1873). Sulle funzioni bilineari. Giornale di Matematiche di Battaglini, 11, 98–106.

  • Cai, J.-F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.

  • Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276.

  • Collins, M., Dasgupta, S., & Schapire, R. (2001). A generalization of principal component analysis to the exponential family. In Neural Information Processing Systems (Vol. 14).

  • Ding, C., Zha, H., He, X., Husbands, P., & Simon, H. D. (2004). Link analysis: Hubs and authorities on the world wide web. SIAM Review, 46(2), 256–268.

  • Donoho, D., & Gavish, M. (2014). The optimal hard threshold for singular values is \(4/\sqrt{3}\). IEEE Transactions on Information Theory, 60(8), 5040–5053.

  • Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.

  • Gabriel, K. R. (1978). Least squares approximation of matrices by additive and multiplicative models. Journal of the Royal Statistical Society B, 40, 186–196.

  • Georghiades, A., Belhumeur, P., & Kriegman, D. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 643–660.

  • Hansen, M., & Yu, B. (2001). Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96, 746–774.

  • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441.

  • Householder, A. S., & Young, G. (1938). Matrix approximation and latent roots. American Mathematical Monthly, 45, 165–171.

  • Hubert, L., Meulman, J., & Heiser, W. (2000). Two purposes for matrix factorization: A historical appraisal. SIAM Review, 42(1), 68–82.

  • Jolliffe, I. (1986). Principal Component Analysis. New York: Springer.

  • Jolliffe, I. (2002). Principal Component Analysis (2nd ed.). New York: Springer.

  • Jordan, M. (1874). Mémoire sur les formes bilinéaires. Journal de Mathématiques Pures et Appliquées, 19, 35–54.

  • Kanatani, K. (1998). Geometric information criterion for model selection. International Journal of Computer Vision, 171–189.

  • Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

  • Minka, T. (2000). Automatic choice of dimensionality for PCA. In Neural Information Processing Systems (Vol. 13, pp. 598–604).

  • Mirsky, L. (1975). A trace inequality of John von Neumann. Monatshefte für Mathematik, 79, 303–306.

  • Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, 2, 559–572.

  • Recht, B., Fazel, M., & Parrilo, P. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.

  • Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.

  • Shabalin, A., & Nobel, A. (2010). Reconstruction of a low-rank matrix in the presence of Gaussian noise (pp. 1–34). arXiv preprint arXiv:1007.4148.

  • Tipping, M., & Bishop, C. (1999b). Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3), 611–622.

  • Turk, M., & Pentland, A. (1991). Face recognition using eigenfaces. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 586–591).

  • Wallace, C., & Boulton, D. (1968). An information measure for classification. The Computer Journal, 11, 185–194.

  • Wallace, C., & Dowe, D. (1999). Minimum message length and Kolmogorov complexity. The Computer Journal, 42(4), 270–283.


Copyright information

© 2016 Springer-Verlag New York

Cite this chapter

Vidal, R., Ma, Y., Sastry, S.S. (2016). Principal Component Analysis. In: Generalized Principal Component Analysis. Interdisciplinary Applied Mathematics, vol 40. Springer, New York, NY. https://doi.org/10.1007/978-0-387-87811-9_2
