Bayesian Penalty Mixing: The Case of a Non-separable Penalty

  • Veronika Ročková
  • Edward I. George
Conference paper
Part of the Abel Symposia book series (ABEL, volume 11)


Separable penalties for sparse vector recovery are plentiful throughout statistical methodology and theory. Here, we confine attention to the problem of estimating sparse high-dimensional normal means. Separable penalized likelihood estimators are known to have a Bayesian interpretation as posterior modes under independent product priors. Such estimators can achieve rate-minimax performance when the correct level of sparsity is known. A fully Bayes approach, on the other hand, mixes the product priors over a shared complexity parameter. These constructions can yield a self-adaptive posterior that achieves rate-minimax performance when the sparsity level is unknown. Such optimality has also been established for posterior mean functionals. However, less is known about posterior modes in these setups. Ultimately, the mixing priors render the coordinates dependent through a penalty that is no longer separable. By tying the coordinates together, the hope is to gain adaptivity and achieve automatic hyperparameter tuning. Here, we study two examples of fully Bayes penalties: the fully Bayes LASSO and the fully Bayes Spike-and-Slab LASSO of Ročková and George (The Spike-and-Slab LASSO, Submitted). We discuss discrepancies and highlight the benefits of the two-group prior variant. We develop an Appell function apparatus for coping with adaptive selection thresholds. We show that the fully Bayes treatment of a complexity parameter is tantamount to oracle hyperparameter choice for sparse normal mean estimation.
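To make the separable starting point concrete: under the normal-means model y_i ~ N(θ_i, 1) with independent Laplace(λ) priors, the posterior mode is the LASSO estimator, which reduces to coordinate-wise soft thresholding. The sketch below (function name and data are ours, purely illustrative, not from the paper) shows this separability: each coordinate is shrunk independently of the others, in contrast to the non-separable fully Bayes penalties studied here.

```python
import numpy as np

def lasso_posterior_mode(y, lam):
    """Posterior mode of theta under y_i ~ N(theta_i, 1) with independent
    Laplace(lam) priors: minimizes 0.5*(y_i - t)^2 + lam*|t| for each
    coordinate separately, i.e. the soft-thresholding rule."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, 0.5, -2.0])
print(lasso_posterior_mode(y, 1.0))  # sub-threshold coordinates are set exactly to zero
```

Because the penalty is a sum over coordinates, the threshold λ is fixed and shared; the fully Bayes constructions discussed in the paper instead mix over a complexity parameter, producing selection thresholds that adapt to the data.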





This work was supported by NSF grant DMS-1406563 and AHRQ grant R21-HS021854.


References

  1. Armero, C., Bayarri, M.: Prior assessments for prediction in queues. The Statistician 45, 139–153 (1994)
  2. Bondell, H., Reich, B.: Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64, 115–123 (2008)
  3. Brown, L.: Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Stat. 42, 855–903 (1971)
  4. Castillo, I., Schmidt-Hieber, J., van der Vaart, A.: Bayesian linear regression with sparse priors. Ann. Stat. 43, 1986–2018 (2015)
  5. Castillo, I., van der Vaart, A.: Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Stat. 40, 2069–2101 (2012)
  6. Donoho, D., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)
  7. Donoho, D., Johnstone, I.M., Hoch, J.C., Stern, A.S.: Maximum entropy and the nearly black object. J. R. Stat. Soc. B 54, 41–81 (1992)
  8. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
  9. Fan, Y., Lv, J.: Asymptotic properties for combined ℓ1 and concave regularization. Biometrika 101, 57–70 (2014)
  10. Friedman, J.: Fast sparse regression and classification. Technical Report, Department of Statistics, Stanford University (2008)
  11. George, E.I.: Combining minimax shrinkage estimators. J. Am. Stat. Assoc. 81, 437–445 (1986a)
  12. George, E.I.: Minimax multiple shrinkage estimation. Ann. Stat. 14, 188–205 (1986b)
  13. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. Academic Press, New York (2000)
  14. Griffin, J.E., Brown, P.J.: Bayesian hyper-LASSOs with non-convex penalization. Aust. N. Z. J. Stat. 53, 423–442 (2012)
  15. Ismail, M., Pitman, J.: Algebraic evaluations of some Euler integrals, duplication formulae for Appell's hypergeometric function F1, and Brownian variations. Can. J. Math. 52, 961–981 (2000)
  16. Johnstone, I.M., Silverman, B.W.: Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann. Stat. 32, 1594–1649 (2004)
  17. Karp, D., Sitnik, S.M.: Inequalities and monotonicity of ratios for generalized hypergeometric function. J. Approx. Theory 161, 337–352 (2009)
  18. Meier, L., van de Geer, S., Bühlmann, P.: The group LASSO for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)
  19. Park, T., Casella, G.: The Bayesian LASSO. J. Am. Stat. Assoc. 103, 681–686 (2008)
  20. Polson, N., Scott, J.: Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat. 9, 501–539 (2010)
  21. Ročková, V.: Bayesian estimation of sparse signals with a continuous spike-and-slab prior. In revision, Annals of Statistics (2015)
  22. Ročková, V., George, E.I.: EMVS: the EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 109, 828–846 (2014)
  23. Ročková, V., George, E.I.: Fast Bayesian factor analysis via automatic rotations to sparsity. J. Am. Stat. Assoc. (2015a, accepted for publication)
  24. Ročková, V., George, E.I.: The Spike-and-Slab LASSO. J. Am. Stat. Assoc. (2015b, submitted)
  25. Stein, C.: Estimation of the mean of a multivariate normal distribution. In: Hájek, J. (ed.) Proceedings of the Prague Symposium on Asymptotic Statistics. Univerzita Karlova, Prague (1974)
  26. Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B 58, 267–288 (1996)
  27. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc. B 67, 91–108 (2005)
  28. Wang, Z., Liu, H., Zhang, T.: Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Stat. 42, 2164–2201 (2014)
  29. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
  30. Zheng, Z., Fan, Y., Lv, J.: High dimensional thresholded regression and shrinkage effect. J. R. Stat. Soc. B 76, 627–649 (2014)
  31. Zou, H.: The adaptive LASSO and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
  32. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Statistics Department, The Wharton School of the University of Pennsylvania, Philadelphia, USA
