Regularised PCA to denoise and visualise data

Abstract

Principal component analysis (PCA) is a well-established dimensionality reduction method commonly used to denoise and visualise data. A classical PCA model is the fixed effect model, in which data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as ridge regression, we suggest a regularised version of PCA that selects a certain number of dimensions and shrinks the corresponding singular values. Each singular value is multiplied by a term that can be seen as the ratio of the signal variance to the total variance of the associated dimension. The regularisation term is derived analytically using asymptotic results and can also be justified by a Bayesian treatment of the model. Regularised PCA gives promising results, in terms of both recovery of the true signal and graphical outputs, in comparison with classical PCA and with a soft-thresholding estimation strategy. The distinction between PCA and regularised PCA becomes especially important for very noisy data.
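The shrinkage idea in the abstract can be sketched in a few lines: keep k singular dimensions and multiply each retained singular value by an estimated signal-to-total variance ratio. The noise-variance estimate below (the mean squared discarded singular value) is a simple plug-in choice for illustration, not the paper's exact estimator.

```python
import numpy as np

def regularised_pca(X, k):
    """Minimal sketch of a rank-k regularised PCA reconstruction.

    Each retained singular value is shrunk by an estimated ratio of
    signal variance to total variance for its dimension; classical PCA
    corresponds to a shrinkage factor of 1.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Crude noise estimate: average squared singular value over the
    # discarded dimensions (assumption: they carry only noise).
    noise = np.mean(s[k:] ** 2)
    # Signal variance / total variance per retained dimension, clipped to [0, 1].
    shrink = np.clip(1.0 - noise / s[:k] ** 2, 0.0, 1.0)
    # Reconstruct with shrunken singular values.
    return (U[:, :k] * (s[:k] * shrink)) @ Vt[:k]
```

On very noisy data the shrinkage factors approach zero, so the regularised reconstruction is pulled strongly towards the origin, which is exactly where its behaviour departs most from classical PCA.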




Author information


Correspondence to Julie Josse.


Cite this article

Verbanck, M., Josse, J. & Husson, F. Regularised PCA to denoise and visualise data. Stat Comput 25, 471–486 (2015). https://doi.org/10.1007/s11222-013-9444-y


Keywords

  • Principal component analysis
  • Shrinkage
  • Regularised PCA
  • Fixed effect model
  • Denoising
  • Visualisation