Statistics and Computing, Volume 26, Issue 3, pp 715–724

Adaptive shrinkage of singular values

Abstract

To recover a low-rank structure from a noisy matrix, truncated singular value decomposition has been extensively used and studied. Recent studies suggested that the signal can be better estimated by shrinking the singular values as well. We pursue this line of research and propose a new estimator offering a continuum of thresholding and shrinking functions. To avoid an unstable and costly cross-validation search, we propose new rules to select two thresholding and shrinking parameters from the data. In particular we propose a generalized Stein unbiased risk estimation criterion that does not require knowledge of the variance of the noise and that is computationally fast. A Monte Carlo simulation reveals that our estimator outperforms the tested methods in terms of mean squared error on both low-rank and general signal matrices across different signal-to-noise ratio regimes. In addition, it accurately estimates the rank of the signal when it is detectable.
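The family of estimators described above can be illustrated with a minimal sketch. The code below is an assumed generic instance, not the authors' exact implementation: each singular value s is mapped to s · max(1 − (λ/s)^γ, 0), a two-parameter rule that interpolates between soft thresholding (γ = 1) and hard thresholding (γ → ∞). The function name and the parameter values are illustrative; in the paper the two parameters are chosen from the data rather than fixed by hand.

```python
import numpy as np

def shrink_singular_values(X, lam, gamma=1.0):
    """Reconstruct X after shrinking its singular values.

    Each singular value s (assumed positive) is replaced by
    s * max(1 - (lam / s) ** gamma, 0), which soft-thresholds
    when gamma = 1 and approaches hard thresholding as gamma grows.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    shrunk = s * np.maximum(1.0 - (lam / s) ** gamma, 0.0)
    return U @ np.diag(shrunk) @ Vt

# Noisy rank-2 signal: singular values below lam are zeroed,
# the ones carrying the signal are only mildly shrunk.
rng = np.random.default_rng(0)
signal = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))
noisy = signal + 0.1 * rng.standard_normal((50, 30))
denoised = shrink_singular_values(noisy, lam=2.0, gamma=2.0)
```

With a threshold above the largest noise singular value, the reconstruction recovers the rank of the signal and reduces the squared error relative to the noisy observation.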

Keywords

Denoising · Singular values shrinking and thresholding · Stein’s unbiased risk estimate · Adaptive trace norm · Rank estimation


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Agrocampus Ouest, Rennes, France
  2. Université de Genève, Geneva, Switzerland