Maximum a posteriori estimators as a limit of Bayes estimators

Abstract

Maximum a posteriori and Bayes estimators are two common methods of point estimation in Bayesian statistics. It is commonly accepted that maximum a posteriori estimators are a limiting case of Bayes estimators with 0–1 loss. In this paper, we provide a counterexample which shows that in general this claim is false. We then correct the claim by providing a level-set condition on posterior densities under which the result holds. Since both estimators are defined in terms of optimization problems, the tools of variational analysis find a natural application to Bayesian point estimation.
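To make the limiting claim concrete (this illustration is ours, not part of the paper): under the 0–1 loss \(L_{\epsilon }(\theta , a) = \mathbb {1}\{\left| \left| \theta - a\right| \right| > \epsilon \}\), the Bayes estimator maximizes the posterior mass of the \(\epsilon \)-ball around the action, and the claim at issue is that these maximizers approach the maximum a posteriori estimator as \(\epsilon \rightarrow 0\). A minimal Python sketch under an assumed toy mixture posterior:

```python
import numpy as np

# Toy posterior on a grid: a skewed two-component Gaussian mixture
# (purely illustrative; any density with a unique mode would do).
grid = np.linspace(-3.0, 5.0, 8001)
dx = grid[1] - grid[0]
posterior = (0.7 * np.exp(-0.5 * (grid / 0.5) ** 2) / 0.5
             + 0.3 * np.exp(-0.5 * ((grid - 1.5) / 1.0) ** 2))
posterior /= posterior.sum() * dx  # normalize to integrate to 1

map_estimate = grid[np.argmax(posterior)]

# Bayes action under L_eps: maximize the posterior mass of [a - eps, a + eps],
# computed by convolving the density with a box kernel of width 2 * eps.
for eps in [1.0, 0.25, 0.05]:
    half = int(round(eps / dx))
    ball_mass = np.convolve(posterior, np.ones(2 * half + 1), mode="same") * dx
    bayes = grid[np.argmax(ball_mass)]
    print(f"eps = {eps:4.2f}: Bayes = {bayes:+.3f}, MAP = {map_estimate:+.3f}")
```

For this well-behaved posterior the small-ball Bayes actions drift toward the mode as \(\epsilon \) shrinks; the paper's counterexample shows that, without a level-set condition on the posterior, this convergence can fail.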


Acknowledgements

Both authors express their gratitude to Roger J.-B. Wets for his guidance and supervision. This paper is dedicated to him, in honor of his 80th birthday.

Author information


Corresponding author

Correspondence to Robert Bassett.

Additional information

Robert Bassett’s work was supported by the Programme Gaspard Monge pour l’Optimisation et la recherche opérationnelle (PGMO). Julio Deride was partially supported by NSF grant CMMI 1538263.

Proof of Lemma 1

In this appendix we provide the proof of Lemma 1.

Proof

Let \(\theta \in \mathbb {R}^{n}\). To show hypo-convergence, we must show that for each sequence \(\theta ^{\nu } \rightarrow \theta \), \(\limsup _{\nu } f^{\nu }(\theta ^{\nu }) \le f(\theta )\), and that there exists a sequence \(\theta ^{\nu } \rightarrow \theta \) with \(\liminf _{\nu } f^{\nu }(\theta ^{\nu }) \ge f(\theta )\).

Fix \(\epsilon >0\). Since f is upper semi-continuous at \(\theta \), there is a \(\delta >0\) such that \(\left| \left| z-\theta \right| \right| < 2 \delta \) gives \(f(z) - f(\theta ) < \epsilon \).

Consider any sequence \(\theta ^{\nu } \rightarrow \theta \). We have that

$$\begin{aligned} f^{\nu }(\theta ^{\nu })- f(\theta ) &= s_{n} \cdot \nu ^{n} \cdot \int _{\left| \left| \theta ^{\nu }-z\right| \right| < \frac{1}{\nu }} (f(z) - f(\theta )) \, dz \\ &= s_{n} \cdot \nu ^{n} \cdot \int _{\left| \left| z\right| \right| < \frac{1}{\nu }} (f(z+\theta ^{\nu })-f(\theta )) \, dz. \end{aligned}$$

Choose \(\nu _{0} \in \mathbb {N}\) so that \(\left| \left| \theta -\theta ^{\nu }\right| \right| <\delta \) and \(\frac{1}{\nu } < \delta \) for each \(\nu > \nu _{0}\). Then for any \(\nu > \nu _{0}\),

$$\begin{aligned} s_{n} \cdot \nu ^n \cdot \int _{\left| \left| z\right| \right|< \frac{1}{\nu }} \left( f(z+\theta ^{\nu })-f(\theta ) \right) \, dz \le s_{n} \cdot \nu ^n \cdot \epsilon \cdot \int _{\left| \left| z\right| \right| < \frac{1}{\nu }} \, dz = \epsilon . \end{aligned}$$

Thus \(\limsup _{\nu } f^{\nu }(\theta ^{\nu }) \le f(\theta )\).
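As a numerical aside (ours, not from the paper), the following Python sketch evaluates the one-dimensional specialization \(f^{\nu }(\theta ) = \frac{\nu }{2} \int _{\left| \theta - z\right| < \frac{1}{\nu }} f(z) \, dz\) (i.e. \(s_{1} = \frac{1}{2}\)) at a jump point of an assumed upper semi-continuous step function; the mollified values settle at the midpoint of the jump, illustrating that along the constant sequence \(\theta ^{\nu } \equiv \theta \) the limsup bound can hold strictly:

```python
import numpy as np

# Upper semi-continuous step function on [0, 1] with a jump at 0.5:
# f = 1.5 on [0, 0.5] and 0.5 on (0.5, 1]; usc because f(0.5) takes the upper value.
def f(z):
    z = np.asarray(z, dtype=float)
    inside = (z >= 0.0) & (z <= 1.0)
    return np.where(z <= 0.5, 1.5, 0.5) * inside

def f_nu(theta, nu, n_pts=20001):
    # Average of f over the ball (theta - 1/nu, theta + 1/nu),
    # which equals s_1 * nu * integral with s_1 = 1/2 in one dimension.
    z = np.linspace(theta - 1.0 / nu, theta + 1.0 / nu, n_pts)
    return float(f(z).mean())

for nu in [10, 100, 1000]:
    print(nu, f_nu(0.5, nu))  # stays near 1.0, strictly below f(0.5) = 1.5
```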

To establish the second part of the hypo-convergence definition, we construct a sequence that satisfies the required inequality.

Consider any \(\eta \in \mathbb {N}\). Recall that f is an upper semi-continuous density. Let C be the set where f is continuous. Because C is dense, for each \(\nu \in \mathbb {N}\), there is a \(y^{\nu } \in C\) such that \(\left| \left| y^{\nu }-\theta \right| \right| < \frac{1}{\nu }\). Furthermore, \(y^{\nu } \in C\) means that there is a \(\delta (y^{\nu },\eta ) >0\) such that any \(z \in {\varTheta }\) which satisfies \(\left| \left| y^{\nu } - z\right| \right| < \delta (y^{\nu },\eta )\) also has

$$\begin{aligned} \left| f(y^{\nu }) - f(z) \right| < \frac{1}{\eta }. \end{aligned}$$

Here we use function notation for \(\delta \) to emphasize that \(\delta \) depends on both \(y^{\nu }\) and \(\eta \).

For each \(\eta \), define a sequence such that

$$\begin{aligned} z^{\nu , \eta } = {\left\{ \begin{array}{ll} 0 &\quad \text { when } \frac{1}{\nu } > \delta (y^{1}, \eta ) \\ y^{1} &\quad \text { when } \delta (y^{2}, \eta ) \le \frac{1}{\nu }< \delta (y^{1},\eta ) \\ y^{2} &\quad \text { when } \delta (y^{3}, \eta ) \le \frac{1}{\nu }< \min \{\delta (y^{2}, \eta ), \delta (y^{1},\eta ) \} \\ y^{3} &\quad \text { when } \delta (y^{4}, \eta ) \le \frac{1}{\nu } < \min _{i \le 3}{\delta (y^{i}, \eta )} \\ \vdots &\quad \vdots \end{array}\right. } \end{aligned}$$

Extracting a diagonal subsequence from the sequences generated according to this procedure gives a sequence \(\theta ^{\nu }\) such that \(\theta ^{\nu } \rightarrow \theta \) and \(\frac{1}{\nu } < \delta (\theta ^{\nu }, \nu )\). In particular, \(\left| f(\theta ^{\nu }) - f(z) \right| < \frac{1}{\nu }\) for z with \(\left| \left| \theta ^{\nu }-z\right| \right| < \frac{1}{\nu } \).

Hence, for any \(\epsilon > 0\), choose \(\nu > \frac{2}{\epsilon }\). The first term below is at most \(\frac{1}{\nu } < \frac{\epsilon }{2}\) because \(f^{\nu }(\theta ^{\nu })\) is an average of f over the ball of radius \(\frac{1}{\nu }\) centered at \(\theta ^{\nu }\), on which f differs from \(f(\theta ^{\nu })\) by less than \(\frac{1}{\nu }\); the second is at most \(\frac{1}{\nu } < \frac{\epsilon }{2}\) because \(\theta ^{\nu }\) can be taken within \(\frac{1}{\nu }\) of \(\theta \), so \(z = \theta \) is admissible in the continuity estimate above. Thus

$$\begin{aligned} \left| f^{\nu }(\theta ^{\nu }) - f(\theta ) \right|&\le \left| f^{\nu }(\theta ^{\nu }) - f(\theta ^{\nu }) \right| + \left| f(\theta ^{\nu }) -f(\theta ) \right| \\&\le \frac{\epsilon }{2} + \frac{\epsilon }{2} = \epsilon . \end{aligned}$$

We conclude that \(\lim _{\nu } f^{\nu }(\theta ^{\nu }) = f(\theta )\), so the result is proven. \(\square \)
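Continuing the numerical sketch from above (again ours): the constant sequence \(\theta ^{\nu } \equiv 0.5\) stalls at the midpoint value 1.0, while continuity points approaching the jump from the left, in the spirit of the diagonal construction, recover \(f(0.5) = 1.5\):

```python
# Reusing f and f_nu from the earlier sketch. For theta_nu = 0.5 - 2/nu the
# averaging ball (0.5 - 3/nu, 0.5 - 1/nu) lies inside [0, 0.5], so the
# mollified values equal 1.5 = f(0.5) exactly once nu >= 6.
for nu in [10, 100, 1000]:
    print(nu, f_nu(0.5, nu), f_nu(0.5 - 2.0 / nu, nu))
```

This is precisely the role of the recovery sequence in the second half of the hypo-convergence definition.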


Cite this article

Bassett, R., Deride, J. Maximum a posteriori estimators as a limit of Bayes estimators. Math. Program. 174, 129–144 (2019). https://doi.org/10.1007/s10107-018-1241-0


Mathematics Subject Classification

  • 62C10
  • 62F10
  • 62F15
  • 65K10