Robust PCA for High-dimensional Data

Hubert, M.; Rousseeuw, P. J.; Verboven, S.

doi:10.1007/978-3-642-57338-5_14

Summary

Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Robust alternatives to classical PCA are therefore needed. When the number of variables is small enough, and in particular smaller than the number of observations, one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the resulting robust scatter matrix.
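
As a brief illustration of the low-dimensional case, the plug-in construction can be written out as follows. The notation ($\hat{\mu}$, $\hat{\Sigma}$, $P$, $\Lambda$, $k$, $t_i$) is assumed here for illustration and is not taken from the chapter itself.

```latex
% Classical PCA: eigendecomposition of the empirical covariance matrix
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}, \qquad
S = P \Lambda P^{\top}
\]

% Robust plug-in PCA: replace (\bar{x}, S) by a robust location and
% scatter estimate (\hat{\mu}, \hat{\Sigma}) and decompose \hat{\Sigma} instead
\[
\hat{\Sigma} = \hat{P}\,\hat{\Lambda}\,\hat{P}^{\top}, \qquad
t_i = \hat{P}_k^{\top}\,(x_i - \hat{\mu}), \quad i = 1, \dots, n
\]
```

Here $\hat{P}_k$ holds the $k$ eigenvectors with the largest eigenvalues and $t_i$ are the robust scores. Estimators such as the MCD need more observations than variables for the scatter matrix of a subset to be nonsingular, which is why this route is limited to the case described above.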

The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.
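
To make the projection-pursuit idea concrete, here is a minimal sketch in Python using only the standard library. It is not the algorithm of any of the cited papers: it assumes the coordinatewise median as center, the MAD as robust scale, and candidate directions through the centered data points, and it finds only the first component. The function names `mad` and `first_robust_direction` and the toy data are hypothetical.

```python
import math
from statistics import median, variance


def mad(values):
    """Median absolute deviation, scaled for consistency at the normal model."""
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def first_robust_direction(rows):
    """Return the candidate direction with the largest MAD of the projections.

    Assumptions of this sketch: center = coordinatewise median,
    candidates = unit vectors pointing from the center to each observation.
    """
    p = len(rows[0])
    center = [median(r[j] for r in rows) for j in range(p)]
    centered = [[r[j] - center[j] for j in range(p)] for r in rows]

    best_dir, best_scale = None, -1.0
    for c in centered:
        norm = math.sqrt(sum(x * x for x in c))
        if norm == 0.0:
            continue  # an observation at the center defines no direction
        v = [x / norm for x in c]
        # Project all centered observations on the candidate direction.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        s = mad(proj)
        if s > best_scale:
            best_dir, best_scale = v, s
    return best_dir, best_scale


# Toy data: 21 points spread along the first axis with small alternating
# noise on the second axis, plus one gross outlier on the second axis.
rows = [(float(t), 0.1 * (1 if t % 2 == 0 else -1)) for t in range(-10, 11)]
rows.append((0.0, 50.0))

# Classical view: variance along each coordinate axis.
classical_var_x = variance([r[0] for r in rows])
classical_var_y = variance([r[1] for r in rows])

# Robust view: direction maximizing the MAD of the projections.
direction, scale = first_robust_direction(rows)

print("classical variances (x, y):", classical_var_x, classical_var_y)
print("robust first direction:", direction, "with robust scale", scale)
```

On this toy data the classical variance is largest along the second coordinate because of the single outlier, whereas the direction found by the sketch stays close to the first coordinate axis. Because every candidate direction is evaluated through univariate projections only, the same loop runs unchanged when there are more variables than observations.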

References

G. Boente and L. Orellana. A robust approach to common principal components. In L.T. Fernholz, S. Morgenthaler, and W. Stahel, editors, Statistics in Genetics and in Environmental Sciences, pages 117–146. Birkhäuser, Basel, 2001.
C. Croux and G. Haesbroeck. Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika, 87:603–618, 2000.
C. Croux and A. Ruiz-Gazen. A fast algorithm for robust principal components based on projection pursuit. In A. Prat, editor, Proceedings in Computational Statistics, pages 211–217. Physica, Heidelberg, 1996.
C. Croux and A. Ruiz-Gazen. High breakdown estimators for principal components: The projection-pursuit approach revisited. Université libre de Bruxelles, 2000. Preprint.
P.L. Davies. Asymptotic behavior of S-estimators of multivariate location and dispersion matrices. Ann. Statist., 15:1269–1292, 1987.
D.L. Donoho. Breakdown properties of multivariate location estimators. PhD thesis, Qualifying paper, Harvard University, Boston, 1982.
M. Hubert, P.J. Rousseeuw, and S. Verboven. A fast method for robust principal components with applications to chemometrics. Chemometrics and Intelligent Laboratory Systems, 60:101–111, 2002.
G. Li and Z. Chen. Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo. J. Am. Statist. Assoc., 80:759–766, 1985.
B.D. Marx and P.H.C. Eilers. Generalized linear regression on sampled signals and curves: A P-spline approach. Technometrics, 41:1–13, 1999.
B.G. Osborne, T. Fearn, A.R. Miller, and S. Douglas. Application of near infrared reflectance spectroscopy to the compositional analysis of biscuits and biscuit dough. J. of the Science of Food and Agriculture, 35:99–105, 1984.
P.J. Rousseeuw. Least median of squares regression. J. Am. Statist. Assoc., 79:871–880, 1984.
P.J. Rousseeuw and C. Croux. Alternatives to the median absolute deviation. J. Am. Statist. Assoc., 88:1273–1283, 1993.
P.J. Rousseeuw and A.M. Leroy. Robust regression and outlier detection. Wiley, New York, 1987.
P.J. Rousseeuw and K. Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223, 1999.
P.J. Rousseeuw and B.C. Van Zomeren. Unmasking multivariate outliers and leverage points. J. Am. Statist. Assoc., 85:633–651, 1990.
W.A. Stahel. Robust estimation: Infinitesimal optimality and covariance matrix estimators. PhD thesis, ETH, Zürich, 1981.

Author information

Authors and Affiliations

Department of Mathematics, Catholic University of Leuven, Belgium
M. Hubert
Department of Mathematics and Computer Science, University of Antwerp (UIA), Belgium
P. J. Rousseeuw & S. Verboven

Editor information

Editors and Affiliations

Institute of Statistics, Vienna University of Technology, Wiedner Hauptstraße 8-10, Vienna, 1040, Austria
Rudolf Dutter & Peter Filzmoser
Statistics Department, University of Dortmund, Dortmund, 44221, Germany
Ursula Gather
Department of Mathematics and Computer Science, University of Antwerp, Universiteitsplein 1, Antwerp, 2610, Belgium
Peter J. Rousseeuw

About this paper

Cite this paper

Hubert, M., Rousseeuw, P.J., Verboven, S. (2003). Robust PCA for High-dimensional Data. In: Dutter, R., Filzmoser, P., Gather, U., Rousseeuw, P.J. (eds) Developments in Robust Statistics. Physica, Heidelberg. https://doi.org/10.1007/978-3-642-57338-5_14

DOI: https://doi.org/10.1007/978-3-642-57338-5_14
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-642-63241-9
Online ISBN: 978-3-642-57338-5
eBook Packages: Springer Book Archive
