Statistics and Computing, Volume 25, Issue 2, pp 471–486

Regularised PCA to denoise and visualise data

Abstract

Principal component analysis (PCA) is a well-established dimensionality reduction method commonly used to denoise and visualise data. A classical PCA model is the fixed effect model, in which data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression, we suggest a regularised version of PCA that essentially selects a certain number of dimensions and shrinks the corresponding singular values. Each singular value is multiplied by a term which can be seen as the ratio of the signal variance over the total variance of the associated dimension. The regularisation term is derived analytically using asymptotic results and can also be justified by a Bayesian treatment of the model. Regularised PCA provides promising results in terms of the recovery of the true signal and the graphical outputs in comparison with classical PCA and with a soft thresholding estimation strategy. The distinction between PCA and regularised PCA becomes especially important in the case of very noisy data.
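
To make the shrinkage concrete, here is a minimal sketch in Python/NumPy, under stated assumptions: the data matrix is centred column-wise, the number of underlying dimensions S is chosen in advance, and the noise variance is estimated by a simple plug-in choice (the mean of the discarded eigenvalues). The paper derives its estimator from the fixed effect model, so this is an illustration of the idea rather than the authors' exact procedure; the function name regularised_pca is ours.

    import numpy as np

    def regularised_pca(X, S):
        """Denoise X by shrinking the first S singular values (sketch)."""
        # Centre the data: PCA operates on centred variables.
        mean = X.mean(axis=0)
        Xc = X - mean
        n, p = Xc.shape

        U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
        eig = d ** 2 / n  # variance carried by each dimension

        # Plug-in noise variance: mean variance of the discarded
        # dimensions (an assumption of this sketch, not the paper's
        # exact estimator).
        sigma2 = eig[S:].mean()

        # Shrinkage factor per dimension: the ratio of estimated
        # signal variance to total variance, clipped at zero.
        shrink = np.maximum((eig[:S] - sigma2) / eig[:S], 0.0)

        # Reconstruct with shrunken singular values and re-add the mean.
        return mean + (U[:, :S] * (d[:S] * shrink)) @ Vt[:S, :]

The np.maximum guard keeps the shrinkage factor non-negative when a retained eigenvalue falls below the estimated noise variance; in that regime the dimension is treated as pure noise and suppressed entirely.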

Keywords

Principal component analysis · Shrinkage · Regularised PCA · Fixed effect model · Denoising · Visualisation

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Applied Mathematics Department, Agrocampus Ouest, Rennes, France
