
Empirical Priors and Coverage of Posterior Credible Sets in a Sparse Normal Mean Model

Abstract

Bayesian methods provide a natural means for uncertainty quantification, that is, credible sets can be easily obtained from the posterior distribution. But is this uncertainty quantification valid in the sense that the posterior credible sets attain the nominal frequentist coverage probability? This paper investigates the frequentist validity of posterior uncertainty quantification based on a class of empirical priors in the sparse normal mean model. In particular, we show that our marginal posterior credible intervals achieve the nominal frequentist coverage probability under conditions slightly weaker than needed for selection consistency and a Bernstein–von Mises theorem for the full posterior, and numerical investigations suggest that our empirical Bayes method has superior frequentist coverage probability properties compared to other fully Bayes methods.
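The frequentist-coverage criterion discussed in the abstract can be made concrete with a small simulation. The sketch below uses an ordinary conjugate normal prior with a hypothetical variance of 100 (not the paper's empirical prior) purely to show how the coverage probability of a posterior credible interval is estimated by Monte Carlo.

```python
import numpy as np

# Normal mean model Y ~ N(theta, 1) with a conjugate N(0, tau2) prior.
# NOTE: this is a generic illustration of "frequentist coverage of a
# credible interval," not the empirical-prior method of the paper.
rng = np.random.default_rng(0)

def credible_interval(y, tau2=100.0):
    # Posterior for theta given y is N(w * y, w) with w = tau2 / (1 + tau2).
    w = tau2 / (1.0 + tau2)
    z = 1.959963984540054  # standard normal 0.975 quantile
    half = z * np.sqrt(w)
    return w * y - half, w * y + half

theta_true = 2.0
reps = 20000
y = theta_true + rng.standard_normal(reps)    # repeated experiments
lo, hi = credible_interval(y)
coverage = np.mean((lo <= theta_true) & (theta_true <= hi))
# With a diffuse prior the 95% credible interval nearly matches the usual
# z-interval, so the estimated coverage should sit near the nominal 0.95.
```

The question the paper studies is whether this near-nominal behavior survives in the sparse, high-dimensional setting, where shrinkage priors can pull intervals away from the truth.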


Figure 1

Notes

  1. In fact, it is not uncommon in seminar talks or less formal discussions to hear one motivate the construction of a new prior by saying that existing priors “don’t work” and/or the new prior “works better,” not that it more accurately reflects subjective prior beliefs, etc.

  2. The expression for πn(S) given in Section 4.1 of Martin et al. (2017) has a typo, but the correct formula is given in the supplement at https://arxiv.org/abs/1406.7718.

References

  • Arias-Castro, E. and Lounici, K. (2014). Estimation and variable selection with exponential weights. Electron. J. Stat. 8, 1, 328–354.


  • Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection. Ann. Statist. 32, 3, 870–897.


  • Belitser, E. (2017). On coverage and local radial rates of credible sets. Ann. Statist. 45, 3, 1124–1151.


  • Belitser, E. and Ghosal, S. (2019). Empirical Bayes oracle uncertainty quantification. Ann. Statist., to appear. http://www4.stat.ncsu.edu/ghoshal/papers/oracle_regression.pdf.

  • Belitser, E. and Nurushev, N. (2017). Needles and straw in a haystack: robust confidence for possibly sparse sequences. Unpublished manuscript. arXiv:1511.01803.

  • Bogdan, M., Chakrabarti, A., Frommlet, F. and Ghosh, J. K. (2011). Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. Ann. Statist. 39, 3, 1551–1579.


  • Bogdan, M., Ghosh, J. K. and Tokdar, S. T. (2008). A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing. In Balakrishnan, N., Peña, E. and Silvapulle, M. (eds.), IMS, Beachwood.

  • Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97, 2, 465–480.


  • Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist. 43, 5, 1986–2018.


  • Castillo, I. and Szabó, B. (2019). Spike and slab empirical Bayes sparse credible sets. Unpublished manuscript. arXiv:1808.07721.

  • Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Statist. 40, 4, 2069–2101.


  • Datta, J. and Ghosh, J. K. (2013). Asymptotic properties of Bayes risk for the horseshoe prior. Bayesian Anal. 8, 1, 111–131.


  • Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over lp-balls for lq-error. Probab. Theory Related Fields 99, 2, 277–303.


  • Ghosh, J. K., Delampady, M. and Samanta, T. (2006). An Introduction to Bayesian Analysis. Springer, New York.


  • Ghosh, P. and Chakrabarti, A. (2015). Posterior concentration properties of a general class of shrinkage estimators around nearly black vectors. Unpublished manuscript. arXiv:1412.8161.

  • Grünwald, P. and Mehta, N. (2017). Faster rates for general unbounded loss functions: from ERM to generalized Bayes. Unpublished manuscript. arXiv:1605.00252.

  • Grünwald, P. and van Ommen, T. (2017). Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. Bayesian Anal. 12, 4, 1069–1103.


  • Holmes, C. C. and Walker, S. G. (2017). Assigning a value to a power likelihood in a general Bayesian model. Biometrika 104, 2, 497–503.


  • Jiang, W. and Zhang, C. -H. (2009). General maximum likelihood empirical Bayes estimation of normal means. Ann. Statist. 37, 4, 1647–1684.


  • Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann. Statist. 32, 4, 1594–1649.


  • Li, K. -C. (1989). Honest confidence regions for nonparametric regression. Ann. Statist. 17, 3, 1001–1008.


  • Liu, C., Yang, Y., Bondell, H. and Martin, R. (2018). Bayesian inference in high-dimensional linear models using an empirical correlation-adaptive prior. Unpublished manuscript. arXiv:1810.00739.

  • Martin, R. (2017). Invited comment on the article by van der Pas, Szabó, and van der Vaart. Bayesian Anal. 12, 4, 1254–1258.


  • Martin, R. (2018). Empirical priors and posterior concentration rates for a monotone density. Sankhya A, to appear. arXiv:1706.08567.

  • Martin, R., Mess, R. and Walker, S. G. (2017). Empirical Bayes posterior concentration in sparse high-dimensional linear models. Bernoulli 23, 3, 1822–1847.


  • Martin, R. and Tang, Y. (2019). Empirical priors for prediction in sparse high-dimensional linear regression. Unpublished manuscript. arXiv:1903.00961.

  • Martin, R. and Walker, S. G. (2014). Asymptotically minimax empirical Bayes estimation of a sparse normal mean vector. Electron. J. Stat. 8, 2, 2188–2206.


  • Martin, R. and Walker, S. G. (2019). Data-dependent priors and their posterior concentration rates. Electron. J. Stat. 13, 2, 3049–3081.


  • Ning, B. and Ghosal, S. (2018). Bayesian linear regression for multivariate responses under group sparsity. Unpublished manuscript. arXiv:1807.03439.

  • Nurushev, N. and Belitser, E. (2019). General framework for projection structures. Unpublished manuscript. arXiv:1904.01003.

  • Salomond, J. -B. (2014). Concentration rate and consistency of the posterior distribution for selected priors under monotonicity constraints. Electron. J. Stat. 8, 1, 1380–1404.


  • Syring, N. and Martin, R. (2019). Calibrating general posterior credible regions. Biometrika 106, 2, 479–486.


  • Szabó, B., van der Vaart, A. W. and van Zanten, J. H. (2015). Frequentist coverage of adaptive nonparametric Bayesian credible sets. Ann. Statist. 43, 4, 1391–1428.


  • van der Pas, S., Scott, J., Chakraborty, A. and Bhattacharya, A. (2016). horseshoe: Implementation of the Horseshoe Prior. R package version 0.1.0.

  • van der Pas, S., Szabó, B. and van der Vaart, A. (2017a). Adaptive posterior contraction rates for the horseshoe. Electron. J. Stat. 11, 2, 3196–3225.


  • van der Pas, S., Szabó, B. and van der Vaart, A. (2017b). Uncertainty quantification for the horseshoe (with discussion). Bayesian Anal. 12, 4, 1221–1274. With a rejoinder by the authors.


  • van der Pas, S. L., Kleijn, B. J. K. and van der Vaart, A. W. (2014). The horseshoe estimator: posterior concentration around nearly black vectors. Electron. J. Stat. 8, 2, 2585–2618.



Acknowledgments

The authors thank the editors of the special issue of Sankhya A dedicated to Jayanta K. Ghosh for the invitation to contribute, and the anonymous reviewers for their helpful suggestions that improved both our results and presentation. This work is partially supported by the National Science Foundation, DMS–1737933.

Author information

Corresponding author

Correspondence to Ryan Martin.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of Theorem 2

The proof strategy here closely follows that of Theorem 5 in the supplement to Martin et al. (2017), as presented in the most recent arXiv version (arXiv:1406.7718). To fix notation, let \(S^{\star}\) be the true configuration of size \(s^{\star} = |S^{\star}|\), let \(S^{\dagger } \subseteq S^{\star }\) be the set of all i such that \(|\theta _{i}^{\star }| \geq \rho _{n}\), where ρn is as in Eq. 2.6, and write \(s^{\dagger} = |S^{\dagger}|\).

Based on Theorem 2 in Martin et al. (2017), we can restrict attention to configurations S such that \(|S| \leq C s^{\star}\), where \(s^{\star} = |S^{\star}|\) and C is a large constant. Take such an S that also satisfies \(S \not \supseteq S^{\dagger }\). Then πn(S) can be bounded as follows:

$$ \pi^{n}(S) \leq \frac{\pi^{n}(S)}{\pi^{n}(S^{\dagger})} = \frac{\pi(S)}{\pi(S^{\dagger})} z^{|S|-s^{\dagger}} e^{\frac{\alpha}{2\sigma^{2}}\{\|Y_{S^{\dagger c}}\|^{2} - \|Y_{S^{c}}\|^{2}\}}, $$

where \(z = (1 + \alpha \tau^{-1})^{-1/2} < 1\). A key observation is that

$$ \|Y_{S^{\dagger c}}\|^{2} - \|Y_{S^{c}}\|^{2} = \|Y_{S \cap S^{\dagger c}}\|^{2} - \|Y_{S^{c} \cap S^{\dagger}}\|^{2}, $$

and the latter two terms are independent since they depend on disjoint sets of Yi’s. Therefore, using this independence and the familiar central and non-central chi-square moment generating functions, we get

$$ \mathsf{E}_{\theta^{\star}} e^{\frac{\alpha}{2\sigma^{2}}\{\|Y_{S^{\dagger c}}\|^{2} - \|Y_{S^{c}}\|^{2}\}} = (1-\alpha)^{-|S \cap S^{\dagger c}|} (1+\alpha)^{-|S^{c} \cap S^{\dagger}|} e^{-\frac{\alpha}{2(1+\alpha)\sigma^{2}} \|\theta_{S^{c} \cap S^{\dagger}}^{\star}\|^{2}}. $$

By the definition of \(S^{\dagger}\), which gives \(\|\theta_{S^{c} \cap S^{\dagger}}^{\star}\|^{2} \geq |S^{c} \cap S^{\dagger}| \rho_{n}^{2}\), and the fact that 1 + α > 1, the above expectation can be upper-bounded by

$$ (1-\alpha)^{-|S \cap S^{\dagger c}|} (n^{M})^{-|S^{c} \cap S^{\dagger}|}. $$
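The norm decomposition used in the key observation above is a pure set identity: the common block \(S^{c} \cap S^{\dagger c}\) contributes to both squared norms on the left and cancels. It can be sanity-checked numerically with arbitrary index sets (the names below are illustrative only):

```python
import numpy as np

# Check that ||Y_{S'^c}||^2 - ||Y_{S^c}||^2
#          = ||Y_{S ∩ S'^c}||^2 - ||Y_{S^c ∩ S'}||^2,
# where S' plays the role of S-dagger.  The overlap S^c ∩ S'^c appears
# in both norms on the left-hand side and cancels on subtraction.
rng = np.random.default_rng(1)
n = 50
y = rng.standard_normal(n)
idx = set(range(n))
S = set(rng.choice(n, size=12, replace=False).tolist())
Sd = set(rng.choice(n, size=7, replace=False).tolist())  # "S-dagger"

def sq_norm(indices):
    # Squared Euclidean norm of y restricted to the given index set.
    return sum(y[i] ** 2 for i in indices)

lhs = sq_norm(idx - Sd) - sq_norm(idx - S)
rhs = sq_norm(S & (idx - Sd)) - sq_norm((idx - S) & Sd)
```

The two sides agree up to floating-point rounding for any choice of S and S-dagger, which is what makes the subsequent independence argument possible: the two remaining blocks are disjoint.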

Putting the pieces together we have

$$ \mathsf{E}_{\theta^{\star}} \pi^{n}(S) \leq \frac{\pi(S)}{\pi(S^{\dagger})} z^{|S|-s^{\dagger}} (1-\alpha)^{-|S \cap S^{\dagger c}|} (n^{M})^{-|S^{c} \cap S^{\dagger}|}. $$

We want to sum this bound over all \(S \not \supseteq S^{\dagger }\) but, since the bound depends on S only through the sizes \(|S|\) and \(|S \cap S^{\dagger}|\), it suffices to sum over those sizes. Indeed, after plugging in the definition of π(S) we get

$$ \sum\limits_{S: S \not\supseteq S^{\dagger}, |S| \leq C s^{\star}} \mathsf{E}_{\theta^{\star}} \pi^{n}(S) \leq \sum\limits_{s=0}^{Cs^{\star}} \sum\limits_{t=0}^{s \wedge s^{\dagger}} \frac{\binom{s^{\dagger}}{t} \binom{n-s^{\dagger}}{s-t} \binom{n}{s^{\dagger}}}{\binom{n}{s}} \frac{f_{n}(s)}{f_{n}(s^{\dagger})} z^{s-s^{\dagger}} (1-\alpha)^{-(s-t)} (n^{M})^{-(s^{\dagger} - t)}. $$

For the binomial coefficient ratio we have the following simplification and bound:

$$ \frac{\binom{s^{\dagger}}{t} \binom{n-s^{\dagger}}{s-t} \binom{n}{s^{\dagger}}}{\binom{n}{s}} = \binom{s}{t} \binom{n-s}{s^{\dagger} - t} \leq s^{s-t} n^{s^{\dagger} - t}. $$
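The binomial-coefficient identity above, with \(\binom{s^{\dagger}}{t}\) as the first factor in the numerator, holds because both cross-multiplied products count pairs of sets \((S^{\dagger}, S)\) of the given sizes with \(|S \cap S^{\dagger}| = t\). It can be verified exhaustively for small n:

```python
from math import comb

# Cross-multiplied form of the identity
#   C(sd, t) C(n - sd, s - t) C(n, sd) / C(n, s) = C(s, t) C(n - s, sd - t),
# checked for every admissible (s, sd, t) at a fixed small n.  Degenerate
# combinations give 0 == 0, so no cases need to be excluded.
ok = True
n = 20
for s in range(n + 1):
    for sd in range(n + 1):
        for t in range(min(s, sd) + 1):
            lhs = comb(sd, t) * comb(n - sd, s - t) * comb(n, sd)
            rhs = comb(s, t) * comb(n - s, sd - t) * comb(n, s)
            ok = ok and (lhs == rhs)
```

Working with the cross-multiplied form avoids division and keeps the check in exact integer arithmetic.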

Next, to bound the double-sum, split it into two parts:

$$ \sum\limits_{s=0}^{Cs^{\star}} \sum\limits_{t=0}^{s \wedge s^{\dagger}} (\cdots) = \sum\limits_{s=0}^{s^{\dagger} - 1} \sum\limits_{t=0}^{s} (\cdots) + \sum\limits_{s=s^{\dagger}}^{Cs^{\star}} \sum\limits_{t=0}^{s^{\dagger}} (\cdots). $$

We need to show that both parts on the right-hand side above vanish as \(n \to \infty \). For the first double-sum we have

$$ \sum\limits_{s=0}^{s^{\dagger} - 1} \sum\limits_{t=0}^{s} (\cdots) = \sum\limits_{s=0}^{s^{\dagger} - 1} \left(\frac{1}{K_{1} n^{M-a_{1}-1}} \right)^{s^{\dagger} - s} \sum\limits_{t=0}^{s} \left(\frac{s}{(1-\alpha)n^{M-1}} \right)^{s-t}. $$

Since \(M > 1 + a_{1}\), the inner sum is O(1), and the outer sum is o(1) as \(n \to \infty \) because every term carries at least one factor of \(n^{-(M-a_{1}-1)}\). Similarly, for the second double-sum we have

$$ \sum\limits_{s=s^{\dagger}}^{Cs^{\star}} \sum\limits_{t=0}^{s^{\dagger}} (\cdots) = \sum\limits_{s=s^{\dagger}}^{Cs^{\star}} \left(\frac{K_{2} s}{(1-\alpha)n^{a_{2}}} \right)^{s-s^{\dagger}} \sum\limits_{t=0}^{s^{\dagger}} \left(\frac{s}{1-\alpha} \right)^{s^{\dagger} - t} \left(\frac{1}{n^{M-1}} \right)^{s^{\dagger} - t}. $$

The inner sum is O(1) and, since \(a_{2} < a_{1}\), the outer sum is upper-bounded by \(O(s^{\star} n^{-a_{2}})\), which goes to 0 by assumption. Both parts of the double-sum above thus vanish as \(n \to \infty \), proving the claim.



Cite this article

Martin, R., Ning, B. Empirical Priors and Coverage of Posterior Credible Sets in a Sparse Normal Mean Model. Sankhya A 82, 477–498 (2020). https://doi.org/10.1007/s13171-019-00189-w


Keywords and phrases.

  • Bayesian inference
  • Bernstein–von Mises theorem
  • Concentration rate
  • High-dimensional model
  • Uncertainty quantification

AMS (2000) subject classification.

  • Primary 62C12
  • 62F12
  • Secondary 62E20