Robust PCA for High-dimensional Data
Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. There is therefore a clear need for robust PCA methods. When the original number of variables is small enough, and in particular smaller than the number of observations, one can apply a robust estimator of multivariate location and scatter and then compute the eigenvectors of the resulting scatter matrix.
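The low-dimensional recipe described above can be illustrated with a small sketch. It is not the estimator used in the paper: the function `concentrated_scatter`, the median-based starting subset, the subset size `h`, and the toy data are all assumptions made for illustration. The sketch iterates simple concentration steps (keep the `h` points with the smallest Mahalanobis distances, then re-estimate) and compares the first eigenvector of the resulting scatter matrix with that of the classical covariance matrix.

```python
import numpy as np

def concentrated_scatter(X, h, n_iter=50):
    """Crude MCD-style location/scatter estimate (illustrative assumption only)."""
    # Starting subset: the h points closest to the coordinatewise median,
    # with each coordinate scaled by its median absolute deviation.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    dist = np.sum(((X - med) / mad) ** 2, axis=1)
    subset = np.sort(np.argsort(dist)[:h])
    for _ in range(n_iter):
        mu = X[subset].mean(axis=0)
        S = np.cov(X[subset], rowvar=False)
        diff = X - mu
        # Squared Mahalanobis distance of every observation to (mu, S).
        dist = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)
        new_subset = np.sort(np.argsort(dist)[:h])
        if np.array_equal(new_subset, subset):
            break  # the subset no longer changes
        subset = new_subset
    # No consistency factor is applied; eigenvectors do not depend on it.
    return X[subset].mean(axis=0), np.cov(X[subset], rowvar=False)

def first_component(S):
    """Eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    return eigvecs[:, -1]

# Toy data: 180 regular points spread mainly along the x-axis,
# plus 20 outliers placed far away along the y-axis.
rng = np.random.default_rng(0)
clean = rng.normal(size=(180, 2)) * np.array([3.0, 0.5])
outliers = rng.normal(size=(20, 2)) * 0.5 + np.array([0.0, 25.0])
X = np.vstack([clean, outliers])

classical_pc1 = first_component(np.cov(X, rowvar=False))
_, robust_S = concentrated_scatter(X, h=150)
robust_pc1 = first_component(robust_S)

print("classical first component:", classical_pc1)
print("robust first component:   ", robust_pc1)
```

In this toy setting the classical first component is pulled toward the outliers, while the component computed from the concentrated subset stays close to the direction of the regular points. The approach requires inverting a covariance matrix, which is why it is limited to the case with fewer variables than observations.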
The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.
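To make the notion of outlyingness concrete, the following sketch approximates the Stahel-Donoho outlyingness of each observation, that is, the largest robustly standardized distance from the median over a set of projection directions. It is a simplified illustration and not the proposed method: the abstract does not specify how outlyingness is combined with FAST-MCD, so the sketch merely runs an ordinary PCA on the least outlying observations. The function name `sd_outlyingness`, the use of directions through random pairs of observations, the median/MAD standardization, the subset size `h`, and the toy data are all assumptions.

```python
import numpy as np

def sd_outlyingness(X, n_dirs=500, seed=0):
    """Approximate Stahel-Donoho outlyingness using random pair directions."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Directions through pairs of observations; usable when p > n.
    i = rng.integers(0, n, size=n_dirs)
    j = rng.integers(0, n, size=n_dirs)
    distinct = i != j
    V = X[i[distinct]] - X[j[distinct]]
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    P = X @ V.T  # projections: one column per direction
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0)
    ok = mad > 0  # skip degenerate directions
    # Maximum standardized deviation over all retained directions.
    return np.max(np.abs(P[:, ok] - med[ok]) / mad[ok], axis=1)

# Toy data with more variables (200) than observations (50):
# 40 regular points followed by 10 shifted outliers.
rng = np.random.default_rng(1)
n_clean, n_out, p = 40, 10, 200
X = np.vstack([
    rng.normal(size=(n_clean, p)),
    rng.normal(size=(n_out, p)) + 10.0,
])

out = sd_outlyingness(X)

# Keep the h least outlying observations and run PCA on them via the SVD,
# which avoids forming the p x p covariance matrix.
h = 38
keep = np.argsort(out)[:h]
center = X[keep].mean(axis=0)
_, svals, Vt = np.linalg.svd(X[keep] - center, full_matrices=False)
components = Vt[:2]  # first two principal directions, each of length p

print("largest outlyingness among regular points:", out[:n_clean].max())
print("smallest outlyingness among outliers:     ", out[n_clean:].min())
```

Because only a finite set of directions is searched, the result is an approximation of the outlyingness, and it depends on the random seed and on `n_dirs`.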
- C. Croux and A. Ruiz-Gazen. A fast algorithm for robust principal components based on projection pursuit. In A. Prat, editor, Proceedings in Computational Statistics, pages 211–217. Physica, Heidelberg, 1996.
- C. Croux and A. Ruiz-Gazen. High breakdown estimators for principal components: The projection-pursuit approach revisited. Université libre de Bruxelles, 2000. Preprint.
- D.L. Donoho. Breakdown properties of multivariate location estimators. PhD thesis, Qualifying paper, Harvard University, Boston, 1982.
- W.A. Stahel. Robust estimation: Infinitesimal optimality and covariance matrix estimators. PhD thesis, ETH, Zürich, 1981.