Abstract
Principal component analysis (PCA) is the problem of fitting a low-dimensional affine subspace to a set of data points in a high-dimensional space. PCA is, by now, well established in the literature, and has become one of the most useful tools for data modeling, compression, and visualization.
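A minimal numerical sketch of this fitting problem, assuming nothing beyond standard NumPy (the function name `pca_fit`, the synthetic data, and the parameter choices below are illustrative, not taken from the chapter): center the data and keep the top d right singular vectors of the centered data matrix.

```python
import numpy as np

def pca_fit(X, d):
    """Fit a d-dimensional affine subspace to the rows of X (N x D) via SVD.

    Returns the sample mean mu (D,), an orthonormal basis U (D x d) for the
    principal subspace, and the low-dimensional coordinates Y (N x d).
    """
    mu = X.mean(axis=0)              # base point of the affine subspace
    Xc = X - mu                      # centered data
    # Thin SVD of the centered data; the right singular vectors are the
    # principal directions (eigenvectors of the sample covariance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Vt[:d].T                     # top-d principal directions, D x d
    Y = Xc @ U                       # principal components (coordinates)
    return mu, U, Y

# Example: noisy points near a 2-dimensional affine subspace of R^5
rng = np.random.default_rng(0)
N, D, d = 200, 5, 2
basis = np.linalg.qr(rng.standard_normal((D, d)))[0]
X = rng.standard_normal((N, d)) @ basis.T + 3.0 + 0.05 * rng.standard_normal((N, D))

mu, U, Y = pca_fit(X, d)
X_hat = mu + Y @ U.T                 # reconstruction from the fitted subspace
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # small relative error
```

If the points indeed lie near a d-dimensional affine subspace, the printed relative reconstruction error is small; the columns of U are the principal directions discussed in the notes below.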
Notes
- 1.
The reason for this is that both \(\boldsymbol{u}_{1}\) and its orthogonal complement \(\boldsymbol{u}_{1}^{\perp }\) are invariant subspaces of \(\Sigma _{\boldsymbol{x}}\); a one-line verification is sketched after these notes.
- 2.
In the statistical setting, \(\boldsymbol{x}_{j}\) and \(\boldsymbol{y}_{j}\) will be samples of two random variables \(\boldsymbol{x}\) and \(\boldsymbol{y}\), respectively. Then this constraint is equivalent to setting their means to zero.
- 3.
From a statistical standpoint, the column vectors of U give the directions in which the data X has the largest variance, whence the name “principal components.”
- 4.
In Section 1.2.1, we have seen an example in which a similar process can be applied to an ensemble of face images from multiple subspaces, where the first d = 3 principal components are calculated and visualized.
- 5.
We leave it as an exercise to the reader to calculate the number of parameters needed to specify a d-dimensional subspace of \(\mathbb{R}^{D}\), and then the additional parameters needed to specify a Gaussian distribution inside the subspace.
- 6.
Even if one chooses to compare models by their algorithmic complexity, such as the minimum message length (MML) criterion (Wallace and Boulton 1968), an extension of Kolmogorov complexity to model selection, a strong connection with the above information-theoretic criteria, such as minimum description length (MDL), can readily be established via Shannon's optimal coding theory (see Wallace and Dowe 1999).
- 7.
It can be shown that the nuclear norm is the convex envelope of the rank function for matrices; a precise statement is recalled after these notes.
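To make note 1 concrete, here is a one-line verification of the invariance, under the standard PCA assumption that \(\boldsymbol{u}_{1}\) is a unit eigenvector of the symmetric matrix \(\Sigma_{\boldsymbol{x}}\) with eigenvalue \(\lambda_{1}\):
\[
\Sigma_{\boldsymbol{x}}\boldsymbol{u}_{1} = \lambda_{1}\boldsymbol{u}_{1} \in \mathrm{span}(\boldsymbol{u}_{1}),
\qquad
\boldsymbol{v} \perp \boldsymbol{u}_{1} \;\Longrightarrow\;
\boldsymbol{u}_{1}^{\top}\Sigma_{\boldsymbol{x}}\boldsymbol{v}
= (\Sigma_{\boldsymbol{x}}\boldsymbol{u}_{1})^{\top}\boldsymbol{v}
= \lambda_{1}\,\boldsymbol{u}_{1}^{\top}\boldsymbol{v} = 0,
\]
so \(\Sigma_{\boldsymbol{x}}\) maps \(\mathrm{span}(\boldsymbol{u}_{1})\) into itself and \(\boldsymbol{u}_{1}^{\perp}\) into \(\boldsymbol{u}_{1}^{\perp}\).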
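The precise form of the convex-envelope statement in note 7, as established in the matrix rank minimization literature (see Recht, Fazel, and Parrilo 2010): on the spectral-norm ball \(\{X : \|X\|_{2} \le 1\}\), every matrix satisfies
\[
\|X\|_{*} \;=\; \sum_{i} \sigma_{i}(X) \;\le\; \mathrm{rank}(X),
\]
and the nuclear norm \(\|X\|_{*}\) is the largest convex function that lies below \(\mathrm{rank}(\cdot)\) on this set, i.e., its convex envelope there.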
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Basri, R., & Jacobs, D. (2003). Lambertian reflection and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2), 218–233.
Beltrami, E. (1873). Sulle funzioni bilineari. Giornale di Matematiche di Battaglini, 11, 98–106.
Cai, J.-F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276.
Collins, M., Dasgupta, S., & Schapire, R. (2001). A generalization of principal component analysis to the exponential family. In Neural Information Processing Systems (Vol. 14)
Ding, C., Zha, H., He, X., Husbands, P., & Simon, H. D. (2004). Link analysis: Hubs and authorities on the World Wide Web. SIAM Review, 46(2), 256–268.
Donoho, D., & Gavish, M. (2014). The optimal hard threshold for singular values is \(4/\sqrt{3}\). IEEE Transactions on Information Theory, 60(8), 5040–5053.
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.
Gabriel, K. R. (1978). Least squares approximation of matrices by additive and multiplicative models. Journal of the Royal Statistical Society B, 40, 186–196.
Georghiades, A., Belhumeur, P., & Kriegman, D. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 643–660.
Hansen, M., & Yu, B. (2001). Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96, 746–774.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441.
Householder, A. S., & Young, G. (1938). Matrix approximation and latent roots. American Mathematical Monthly, 45, 165–171.
Hubert, L., Meulman, J., & Heiser, W. (2000). Two purposes for matrix factorization: A historical appraisal. SIAM Review, 42(1), 68–82.
Jolliffe, I. (1986). Principal Component Analysis. New York: Springer.
Jolliffe, I. (2002). Principal Component Analysis (2nd ed.). New York: Springer.
Jordan, C. (1874). Mémoire sur les formes bilinéaires. Journal de Mathématiques Pures et Appliquées, 19, 35–54.
Kanatani, K. (1998). Geometric information criterion for model selection. International Journal of Computer Vision, 26(3), 171–189.
Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Minka, T. (2000). Automatic choice of dimensionality for PCA. In Neural Information Processing Systems (Vol. 13, pp. 598–604).
Mirsky, L. (1975). A trace inequality of John von Neumann. Monatshefte für Mathematik, 79, 303–306.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2, 559–572.
Recht, B., Fazel, M., & Parrilo, P. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Shabalin, A., & Nobel, A. (2010). Reconstruction of a low-rank matrix in the presence of Gaussian noise (pp. 1–34). arXiv preprint arXiv:1007.4148.
Tipping, M., & Bishop, C. (1999b). Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3), 611–622.
Turk, M., & Pentland, A. (1991). Face recognition using eigenfaces. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 586–591).
Wallace, C., & Boulton, D. (1968). An information measure for classification. The Computer Journal, 11, 185–194.
Wallace, C., & Dowe, D. (1999). Minimum message length and Kolmogorov complexity. The Computer Journal, 42(4), 270–283.
Cite this chapter
Vidal, R., Ma, Y., Sastry, S.S. (2016). Principal Component Analysis. In: Generalized Principal Component Analysis. Interdisciplinary Applied Mathematics, vol 40. Springer, New York, NY. https://doi.org/10.1007/978-0-387-87811-9_2
Print ISBN: 978-0-387-87810-2
Online ISBN: 978-0-387-87811-9