Abstract
Separable penalties for sparse vector recovery are plentiful throughout statistical methodology and theory. Here, we confine attention to the problem of estimating sparse high-dimensional normal means. Separable penalized likelihood estimators are known to have a Bayesian interpretation as posterior modes under independent product priors. Such estimators can achieve rate-minimax performance when the correct level of sparsity is known. A fully Bayes approach, on the other hand, mixes the product priors over a shared complexity parameter. These constructions can yield a self-adaptive posterior that achieves rate-minimax performance when the sparsity level is unknown. Such optimality has also been established for posterior mean functionals. However, less is known about posterior modes in these setups. Ultimately, the mixing priors render the coordinates dependent through a penalty that is no longer separable. By tying the coordinates together, the hope is to gain adaptivity and achieve automatic hyperparameter tuning. Here, we study two examples of fully Bayes penalties: the fully Bayes LASSO and the fully Bayes Spike-and-Slab LASSO of Ročková and George (The Spike-and-Slab LASSO, Submitted). We discuss discrepancies and highlight the benefits of the two-group prior variant. We develop an Appell function apparatus for coping with adaptive selection thresholds. We show that the fully Bayes treatment of a complexity parameter is tantamount to oracle hyperparameter choice for sparse normal mean estimation.
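The two-group prior behind the Spike-and-Slab LASSO can be illustrated with a short sketch. The code below is a minimal, illustrative implementation (not the authors' own software): it assumes the standard form of the prior as a mixture of two Laplace densities, a peaked "spike" with large penalty parameter `lam0` and a diffuse "slab" with small `lam1`, mixed by a weight `theta`. The function and parameter names are our own choices for exposition.

```python
import math

def laplace_pdf(beta, lam):
    """Laplace (double-exponential) density: (lam/2) * exp(-lam * |beta|)."""
    return 0.5 * lam * math.exp(-lam * abs(beta))

def ssl_prior(beta, theta, lam0, lam1):
    """Two-group spike-and-slab LASSO prior for a single coordinate:
    a mixture of a spike (lam0 large, concentrated at zero) and a
    slab (lam1 small, heavy-tailed), with mixing weight theta."""
    return theta * laplace_pdf(beta, lam1) + (1.0 - theta) * laplace_pdf(beta, lam0)

def inclusion_weight(beta, theta, lam0, lam1):
    """Conditional probability that beta was drawn from the slab component;
    this drives the adaptive shrinkage/selection behavior."""
    slab = theta * laplace_pdf(beta, lam1)
    spike = (1.0 - theta) * laplace_pdf(beta, lam0)
    return slab / (slab + spike)

def ssl_penalty(beta, theta, lam0, lam1):
    """Penalty induced by the prior, normalized so the penalty at zero is 0:
    pen(beta) = -log[ pi(beta) / pi(0) ]."""
    return -math.log(ssl_prior(beta, theta, lam0, lam1)
                     / ssl_prior(0.0, theta, lam0, lam1))
```

With `theta` held fixed, this penalty is separable across coordinates. The fully Bayes construction studied in the paper instead places a prior on the shared weight `theta` and integrates it out, which ties the coordinates together and yields the non-separable penalty whose selection thresholds the Appell function apparatus is developed to handle.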
References
Armero, C., Bayarri, M.: Prior assessments for prediction in queues. The Statistician 45, 139–153 (1994)
Bondell, H., Reich, B.: Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64, 115–123 (2008)
Brown, L.: Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Stat. 42, 855–903 (1971)
Castillo, I., Schmidt-Hieber, J., van der Vaart, A.: Bayesian linear regression with sparse priors. Ann. Stat. 43, 1986–2018 (2015)
Castillo, I., van der Vaart, A.: Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Stat. 40, 2069–2101 (2012)
Donoho, D., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)
Donoho, D., Johnstone, I.M., Hoch, J.C., Stern, A.S.: Maximum entropy and the nearly black object. J. R. Stat. Soc. B 54, 41–81 (1992)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, Y., Lv, J.: Asymptotic properties for combined ℓ1 and concave regularization. Biometrika 101, 67–70 (2014)
Friedman, J.: Fast sparse regression and classification. Technical Report, Department of Statistics, Stanford University (2008)
George, E.I.: Combining minimax shrinkage estimators. J. Am. Stat. Assoc. 81, 437–445 (1986a)
George, E.I.: Minimax multiple shrinkage estimation. Ann. Stat. 14, 188–205 (1986b)
Gradshteyn, I., Ryzhik, I.: Table of Integrals, Series, and Products. Academic, New York (2000)
Griffin, J.E., Brown, P.J.: Bayesian hyper-LASSOs with non-convex penalization. Aust. N. Z. J. Stat. 53, 423–442 (2012)
Ismail, M., Pitman, J.: Algebraic evaluations of some Euler integrals, duplication formulae for Appell's hypergeometric function F1, and Brownian variations. Can. J. Math. 52, 961–981 (2000)
Johnstone, I.M., Silverman, B.W.: Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann. Stat. 32, 1594–1649 (2004)
Karp, D., Sitnik, S.M.: Inequalities and monotonicity of ratios for generalized hypergeometric function. J. Approx. Theory 161, 337–352 (2009)
Meier, L., Van de Geer, S., Bühlmann, P.: The group LASSO for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)
Park, T., Casella, G.: The Bayesian LASSO. J. Am. Stat. Assoc. 103, 681–686 (2008)
Polson, N., Scott, J.: Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat. 9, 501–539 (2010)
Ročková, V.: Bayesian estimation of sparse signals with a continuous spike-and-slab prior. Ann. Stat. (2015, in revision)
Ročková, V., George, E.: EMVS: The EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 109, 828–846 (2014)
Ročková, V., George, E.: Fast Bayesian factor analysis via automatic rotations to sparsity. J. Am. Stat. Assoc. (2015a, accepted for publication)
Ročková, V., George, E.: The Spike-and-Slab LASSO. J. Am. Stat. Assoc. (2015b, submitted)
Stein, C.: Estimation of the mean of a multivariate normal distribution. In: Hajek, J. (ed.) Prague Symposium on Asymptotic Statistics. Univerzita Karlova, Prague, Czech Republic (1974)
Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B 58, 267–288 (1996)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc. B 67, 91–108 (2005)
Wang, Z., Liu, H., Zhang, T.: Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Stat. 42, 2164–2201 (2014)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zheng, Z., Fan, Y., Lv, J.: High dimensional thresholded regression and shrinkage effect. J. R. Stat. Soc. B 76, 627–649 (2014)
Zou, H.: The adaptive LASSO and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)
Acknowledgements
This work was supported by NSF grant DMS-1406563 and AHRQ grant R21-HS021854.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ročková, V., George, E.I. (2016). Bayesian Penalty Mixing: The Case of a Non-separable Penalty. In: Frigessi, A., Bühlmann, P., Glad, I., Langaas, M., Richardson, S., Vannucci, M. (eds) Statistical Analysis for High-Dimensional Data. Abel Symposia, vol 11. Springer, Cham. https://doi.org/10.1007/978-3-319-27099-9_11
DOI: https://doi.org/10.1007/978-3-319-27099-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27097-5
Online ISBN: 978-3-319-27099-9
eBook Packages: Mathematics and Statistics (R0)