Adaptive shrinkage of singular values

Abstract

To recover a low-rank structure from a noisy matrix, truncated singular value decomposition has been extensively used and studied. Recent studies have suggested that the signal can be better estimated by shrinking the singular values as well. We pursue this line of research and propose a new estimator offering a continuum of thresholding and shrinking functions. To avoid an unstable and costly cross-validation search, we propose new rules to select the two thresholding and shrinking parameters from the data. In particular, we propose a generalized Stein unbiased risk estimation criterion that does not require knowledge of the noise variance and is computationally fast. A Monte Carlo simulation shows that our estimator outperforms the other tested methods in terms of mean squared error, on both low-rank and general signal matrices, across different signal-to-noise ratio regimes. In addition, it accurately estimates the rank of the signal when it is detectable.
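As an illustration of the kind of estimator the abstract describes, the sketch below applies a two-parameter shrinkage rule, ŝ_i = s_i · max(0, 1 − (λ/s_i)^γ), to the singular values of a noisy matrix. The function name, the specific functional form, and the fixed (λ, γ) values are our own assumptions for illustration; in the paper both parameters are selected from the data by a generalized SURE criterion, which is not implemented here.

```python
import numpy as np

def adaptive_shrink(X, lam, gamma):
    """Denoise X by shrinking its singular values (illustrative sketch).

    gamma = 1 gives soft thresholding (nuclear-norm shrinkage);
    large gamma approaches hard thresholding of singular values at lam.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Shrink each singular value: s_i * max(0, 1 - (lam / s_i)^gamma)
    ratio = lam / np.maximum(s, 1e-300)  # guard against s_i == 0
    s_shrunk = s * np.maximum(0.0, 1.0 - ratio**gamma)
    # Reassemble with the shrunken spectrum (U * s_shrunk scales columns)
    return (U * s_shrunk) @ Vt
```

With γ = 1 this reduces to soft thresholding, and as γ grows it interpolates toward hard thresholding, which is the continuum of shrinking and thresholding functions the abstract refers to.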




Acknowledgments

The authors are grateful to the editors and reviewers for their helpful comments. J. J. is supported by an AgreenSkills fellowship of the European Union Marie-Curie FP7 COFUND People Programme. S. S. is supported by the Swiss National Science Foundation. This work started while both authors were visiting Stanford University, and the authors would like to thank the Department of Statistics for hosting them and for its stimulating seminars.

Author information


Correspondence to Sylvain Sardy.



Cite this article

Josse, J., Sardy, S. Adaptive shrinkage of singular values. Stat Comput 26, 715–724 (2016). https://doi.org/10.1007/s11222-015-9554-9

Keywords

  • Denoising
  • Singular values shrinking and thresholding
  • Stein’s unbiased risk estimate
  • Adaptive trace norm
  • Rank estimation