
Bayesian Penalty Mixing: The Case of a Non-separable Penalty

  • Conference paper

In: Statistical Analysis for High-Dimensional Data

Part of the book series: Abel Symposia (ABEL, volume 11)

Abstract

Separable penalties for sparse vector recovery are plentiful throughout statistical methodology and theory. Here, we confine attention to the problem of estimating sparse high-dimensional normal means. Separable penalized likelihood estimators are known to have a Bayesian interpretation as posterior modes under independent product priors. Such estimators can achieve rate-minimax performance when the correct level of sparsity is known. A fully Bayes approach, on the other hand, mixes the product priors over a shared complexity parameter. These constructions can yield a self-adaptive posterior that achieves rate-minimax performance when the sparsity level is unknown. Such optimality has also been established for posterior mean functionals. However, less is known about posterior modes in these setups. Ultimately, the mixing priors render the coordinates dependent through a penalty that is no longer separable. By tying the coordinates together, the hope is to gain adaptivity and achieve automatic hyperparameter tuning. Here, we study two examples of fully Bayes penalties: the fully Bayes LASSO and the fully Bayes Spike-and-Slab LASSO of Ročková and George (The Spike-and-Slab LASSO, Submitted). We discuss discrepancies and highlight the benefits of the two-group prior variant. We develop an Appell function apparatus for coping with adaptive selection thresholds. We show that the fully Bayes treatment of a complexity parameter is tantamount to oracle hyperparameter choice for sparse normal mean estimation.
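
To make the non-separability concrete, consider a minimal worked example (an illustration in the spirit of the fully Bayes LASSO, not necessarily the paper's exact construction): place independent Laplace priors on the coordinates and mix them over a shared Gamma\((a,b)\) prior on \(\lambda\). For \(\beta \in \mathbb{R}^{n}\),

\[
\pi(\beta) \;=\; \int_0^\infty \prod_{i=1}^{n} \frac{\lambda}{2} e^{-\lambda |\beta_i|} \, \frac{b^{a}}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda} \, \mathrm{d}\lambda
\;=\; \frac{b^{a}\,\Gamma(n+a)}{2^{n}\,\Gamma(a)} \bigl(\|\beta\|_1 + b\bigr)^{-(n+a)},
\]

so the induced penalty \(-\log \pi(\beta) = (n+a)\log\bigl(\|\beta\|_1 + b\bigr) + C\) depends on the coordinates only jointly through \(\|\beta\|_1\): it is no longer a sum of univariate terms, and its effective amount of shrinkage adapts to the overall signal size.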


Notes

  1. Following the proof technique of Ročková [21], Remark 5.1. This upper bound is useful for illustration and can be sharpened.

  2. Park and Casella [19] use a gamma prior on \(\lambda^{2}\) for ease of MCMC implementation, since the resulting full conditional is conjugate (see the sketch after these notes).
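
As a minimal sketch of why this choice is convenient (our illustration, not code from the paper): in the Bayesian LASSO's scale-mixture representation the latent scales satisfy \(\tau_j^{2} \mid \lambda^{2} \sim \mathrm{Exp}(\lambda^{2}/2)\), so a Gamma\((r,\delta)\) prior on \(\lambda^{2}\) yields a Gamma full conditional, and the hyperparameter update inside a Gibbs sampler is a single draw. The parameter names r and delta below are our own.

    import numpy as np

    rng = np.random.default_rng(0)

    def update_lambda_sq(tau_sq, r=1.0, delta=1.0, rng=rng):
        """One Gibbs step for lambda^2 in the Bayesian LASSO.

        Prior: lambda^2 ~ Gamma(r, rate=delta).
        Latent scales: tau_j^2 | lambda^2 ~ Exp(rate=lambda^2 / 2), j = 1..p.
        Full conditional: lambda^2 | tau^2 ~ Gamma(p + r, rate=delta + sum(tau_sq) / 2).
        """
        p = len(tau_sq)
        shape = p + r
        rate = delta + tau_sq.sum() / 2.0
        # numpy's gamma sampler is parameterized by scale = 1 / rate
        return rng.gamma(shape, 1.0 / rate)

    # Example: draw lambda^2 given p = 10 latent scales
    tau_sq = rng.exponential(scale=1.0, size=10)
    lam_sq = update_lambda_sq(tau_sq)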

References

  1. Armero, C., Bayarri, M.: Prior assessments for prediction in queues. The Statistician 45, 139–153 (1994)

  2. Bondell, H., Reich, B.: Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64, 115–123 (2008)

  3. Brown, L.: Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Stat. 42, 855–903 (1971)

  4. Castillo, I., Schmidt-Hieber, J., van der Vaart, A.: Bayesian linear regression with sparse priors. Ann. Stat. 43, 1986–2018 (2015)

  5. Castillo, I., van der Vaart, A.: Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Stat. 40, 2069–2101 (2012)

  6. Donoho, D., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)

  7. Donoho, D., Johnstone, I.M., Hoch, J.C., Stern, A.S.: Maximum entropy and the nearly black object. J. R. Stat. Soc. B 54, 41–81 (1992)

  8. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  9. Fan, Y., Lv, J.: Asymptotic properties for combined \(\ell_1\) and concave regularization. Biometrika 101, 57–70 (2014)

  10. Friedman, J.: Fast sparse regression and classification. Technical Report, Department of Statistics, Stanford University (2008)

  11. George, E.I.: Combining minimax shrinkage estimators. J. Am. Stat. Assoc. 81, 437–445 (1986a)

  12. George, E.I.: Minimax multiple shrinkage estimation. Ann. Stat. 14, 188–205 (1986b)

  13. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. Academic, New York (2000)

  14. Griffin, J.E., Brown, P.J.: Bayesian hyper-LASSOs with non-convex penalization. Aust. N. Z. J. Stat. 53, 423–442 (2012)

  15. Ismail, M., Pitman, J.: Algebraic evaluations of some Euler integrals, duplication formulae for Appell's hypergeometric function \(F_1\), and Brownian variations. Can. J. Math. 52, 961–981 (2000)

  16. Johnstone, I.M., Silverman, B.W.: Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann. Stat. 32, 1594–1649 (2004)

  17. Karp, D., Sitnik, S.M.: Inequalities and monotonicity of ratios for generalized hypergeometric function. J. Approx. Theory 161, 337–352 (2009)

  18. Meier, L., Van de Geer, S., Bühlmann, P.: The group LASSO for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)

  19. Park, T., Casella, G.: The Bayesian LASSO. J. Am. Stat. Assoc. 103, 681–686 (2008)

  20. Polson, N., Scott, J.: Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat. 9, 501–539 (2010)

  21. Ročková, V.: Bayesian estimation of sparse signals with a continuous spike-and-slab prior. In revision, Annals of Statistics (2015)

  22. Ročková, V., George, E.: EMVS: the EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 109, 828–846 (2014)

  23. Ročková, V., George, E.: Fast Bayesian factor analysis via automatic rotations to sparsity. J. Am. Stat. Assoc. (2015a, accepted for publication)

  24. Ročková, V., George, E.: The Spike-and-Slab LASSO. J. Am. Stat. Assoc. (2015b, submitted)

  25. Stein, C.: Estimation of the mean of a multivariate normal distribution. In: Hájek, J. (ed.) Prague Symposium on Asymptotic Statistics. Univerzita Karlova, Prague (1974)

  26. Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B 58, 267–288 (1996)

  27. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc. B 67, 91–108 (2005)

  28. Wang, Z., Liu, H., Zhang, T.: Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Stat. 42, 2164–2201 (2014)

  29. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  30. Zheng, Z., Fan, Y., Lv, J.: High dimensional thresholded regression and shrinkage effect. J. R. Stat. Soc. B 76, 627–649 (2014)

  31. Zou, H.: The adaptive LASSO and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

  32. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)


Acknowledgements

This work was supported by NSF grant DMS-1406563 and AHRQ grant R21-HS021854.

Author information

Correspondence to Veronika Ročková.

Copyright information

© 2016 Springer International Publishing Switzerland


Cite this paper

Ročková, V., George, E.I. (2016). Bayesian Penalty Mixing: The Case of a Non-separable Penalty. In: Frigessi, A., Bühlmann, P., Glad, I., Langaas, M., Richardson, S., Vannucci, M. (eds) Statistical Analysis for High-Dimensional Data. Abel Symposia, vol 11. Springer, Cham. https://doi.org/10.1007/978-3-319-27099-9_11
