Bayesian Penalty Mixing: The Case of a Non-separable Penalty

Conference paper
Part of the Abel Symposia book series (ABEL, volume 11)

Abstract

Separable penalties for sparse vector recovery are plentiful throughout statistical methodology and theory. Here, we confine attention to the problem of estimating sparse high-dimensional normal means. Separable penalized likelihood estimators are known to have a Bayesian interpretation as posterior modes under independent product priors. Such estimators can achieve rate-minimax performance when the correct level of sparsity is known. A fully Bayes approach, on the other hand, mixes the product priors over a shared complexity parameter. These constructions can yield a self-adaptive posterior that achieves rate-minimax performance when the sparsity level is unknown. Such optimality has also been established for posterior mean functionals. However, less is known about posterior modes in these setups. Ultimately, the mixing priors render the coordinates dependent through a penalty that is no longer separable. By tying the coordinates together, the hope is to gain adaptivity and achieve automatic hyperparameter tuning. Here, we study two examples of fully Bayes penalties: the fully Bayes LASSO and the fully Bayes Spike-and-Slab LASSO of Ročková and George (The Spike-and-Slab LASSO, Submitted). We discuss discrepancies and highlight the benefits of the two-group prior variant. We develop an Appell function apparatus for coping with adaptive selection thresholds. We show that the fully Bayes treatment of a complexity parameter is tantamount to oracle hyperparameter choice for sparse normal mean estimation.
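
To make the mixing construction concrete, here is a minimal worked sketch of one standard construction of this type (the Gamma(a, b) mixing prior on the Laplace rate λ and the dimension p are illustrative assumptions, not notation taken from the paper). Under the independent product prior

\[
\pi(\beta \mid \lambda) = \prod_{j=1}^{p} \frac{\lambda}{2}\, e^{-\lambda |\beta_j|},
\]

the posterior mode for fixed λ is the ordinary LASSO with separable penalty \(\lambda \sum_j |\beta_j|\). Mixing over \(\lambda \sim \mathrm{Gamma}(a, b)\) and integrating it out yields the marginal prior

\[
\pi(\beta) = \int_0^\infty \prod_{j=1}^{p} \frac{\lambda}{2}\, e^{-\lambda |\beta_j|} \cdot \frac{b^{a}}{\Gamma(a)}\, \lambda^{a-1} e^{-b\lambda}\, d\lambda
= \frac{b^{a}\,\Gamma(p+a)}{2^{p}\,\Gamma(a)} \bigl(b + \|\beta\|_1\bigr)^{-(p+a)},
\]

and hence the induced penalty

\[
\mathrm{pen}(\beta) = -\log \pi(\beta) = (p+a)\,\log\bigl(b + \|\beta\|_1\bigr) + \mathrm{const}.
\]

The penalty is non-separable: the coordinates are coupled through \(\|\beta\|_1\), and the effective shrinkage rate \(\partial\, \mathrm{pen}/\partial |\beta_j| = (p+a)/(b + \|\beta\|_1)\) relaxes as the remaining signal grows. This is the self-adaptive, automatically tuned behavior that the abstract attributes to penalty mixing.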

References

  1. Armero, C., Bayarri, M.: Prior assessments in prediction in queues. The Statistician 45, 139–153 (1994)
  2. Bondell, H., Reich, B.: Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64, 115–123 (2008)
  3. Brown, L.: Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Stat. 42, 855–903 (1971)
  4. Castillo, I., Schmidt-Hieber, J., van der Vaart, A.: Bayesian linear regression with sparse priors. Ann. Stat. 43, 1986–2018 (2015)
  5. Castillo, I., van der Vaart, A.: Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Stat. 40, 2069–2101 (2012)
  6. Donoho, D., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)
  7. Donoho, D., Johnstone, I.M., Hoch, J.C., Stern, A.S.: Maximum entropy and the nearly black object. J. R. Stat. Soc. B 54, 41–81 (1992)
  8. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
  9. Fan, Y., Lv, J.: Asymptotic properties for combined ℓ1 and concave regularization. Biometrika 101, 67–70 (2014)
  10. Friedman, J.: Fast sparse regression and classification. Technical Report, Department of Statistics, Stanford University (2008)
  11. George, E.I.: Combining minimax shrinkage estimators. J. Am. Stat. Assoc. 81, 437–445 (1986a)
  12. George, E.I.: Minimax multiple shrinkage estimation. Ann. Stat. 14, 188–205 (1986b)
  13. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. Academic, New York (2000)
  14. Griffin, J.E., Brown, P.J.: Bayesian hyper-LASSOs with non-convex penalization. Aust. N. Z. J. Stat. 53, 423–442 (2012)
  15. Ismail, M., Pitman, J.: Algebraic evaluations of some Euler integrals, duplication formulae for Appell's hypergeometric function F1, and Brownian variations. Can. J. Math. 52, 961–981 (2000)
  16. Johnstone, I.M., Silverman, B.W.: Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann. Stat. 32, 1594–1649 (2004)
  17. Karp, D., Sitnik, S.M.: Inequalities and monotonicity of ratios for generalized hypergeometric function. J. Approx. Theory 161, 337–352 (2009)
  18. Meier, L., Van de Geer, S., Bühlmann, P.: The group LASSO for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)
  19. Park, T., Casella, G.: The Bayesian LASSO. J. Am. Stat. Assoc. 103, 681–686 (2008)
  20. Polson, N., Scott, J.: Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat. 9, 501–539 (2010)
  21. Ročková, V.: Bayesian estimation of sparse signals with a continuous spike-and-slab prior. Ann. Stat. (2015, in revision)
  22. Ročková, V., George, E.: EMVS: The EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 109, 828–846 (2014)
  23. Ročková, V., George, E.: Fast Bayesian factor analysis via automatic rotations to sparsity. J. Am. Stat. Assoc. (2015a, accepted for publication)
  24. Ročková, V., George, E.: The Spike-and-Slab LASSO. J. Am. Stat. Assoc. (2015b, submitted)
  25. Stein, C.: Estimation of the mean of a multivariate normal distribution. In: Hájek, J. (ed.) Prague Symposium on Asymptotic Statistics. Univerzita Karlova, Prague, Czech Republic (1974)
  26. Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B 58, 267–288 (1996)
  27. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc. B 67, 91–108 (2005)
  28. Wang, Z., Liu, H., Zhang, T.: Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Stat. 42, 2164–2201 (2014)
  29. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
  30. Zheng, Z., Fan, Y., Lv, J.: High dimensional thresholded regression and shrinkage effect. J. R. Stat. Soc. B 76, 627–649 (2014)
  31. Zou, H.: The adaptive LASSO and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
  32. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Statistics Department, The Wharton School of the University of Pennsylvania, Philadelphia, USA
