
A review of Bayesian asymptotics in general insurance applications

Original Research Paper, European Actuarial Journal

Abstract

Over the last two decades, Bayesian methods have been widely used in general insurance applications, ranging from credibility theory to loss-reserve estimation, but this literature rarely addresses questions about the methods' asymptotic properties. In this paper, we review the Bayesian notion of posterior consistency in both parametric and nonparametric models and its implications for the sensitivity of the posterior to the actuary's choice of prior. We review some of the techniques for proving posterior consistency and, for illustration, apply these results to investigate the asymptotic properties of several recently proposed Bayesian methods in general insurance.


Notes

  1. This modern use of the term “nonparametric” differs a bit from that in the classical setting, e.g., [36], where it referred to methods free of distributional assumptions. The connection is that models depending on an infinite-dimensional parameter can avoid certain specifications. For example, in nonparametric regression (Example 4.4), the error distribution is normal, but taking the regression function itself to be the parameter avoids specifying a particular form like linear, quadratic, etc.

  2. The assumption that the \(X_i\)s are uniformly distributed is not particularly special. The key assumption is that the distribution is known, i.e., does not depend on any unknown parameters.

  3. By an \(\epsilon\)-net in a metric space \((X, d)\), we mean a subset Y of X such that for any \(x\in X\) there exists a \(y\in Y\) such that \(d(x, y)<\epsilon\).

References

  1. Barron A (1988) The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions. Technical Report 7, Department of Statistics, University of Illinois, Champaign, IL

  2. Barron A, Schervish MJ, Wasserman L (1999) The consistency of posterior distributions in nonparametric problems. Ann Stat 27:536–561

  3. Berger JO (1985) Statistical decision theory and Bayesian analysis, 2nd edn. Springer, New York

  4. Brockett PL, Chuang SL, Pitaktong U (2014) Generalized additive models and nonparametric regression. In: Predictive modeling applications in actuarial science. Cambridge University Press. pp 367–397

  5. Bühlmann H (1967) Experience rating and credibility. ASTIN Bull 4:199–207

  6. Bühlmann H, Gisler A (2005) A course in credibility theory and its applications. Springer, New York

  7. Bühlmann H, Straub E (1970) Glaubwürdigkeit für Schadensätze. Mitteilungen der Vereinigung Schweizerischer Versicherungs-Mathematiker 70:111–133

  8. Bunke O, Milhaud X (1998) Asymptotic behavior of Bayes estimates under possibly incorrect models. Ann Stat 26(2):617–644

  9. Cai X, Wen L, Wu X, Zhou X (2015) Credibility estimation of distribution functions with applications to experience rating in general insurance. N Am Actuar J 19(4):311–335

  10. Choi T, Ramamoorthi RV (2008) Remarks on consistency of posterior distributions. Pushing the limits of contemporary statistics: contributions in honor of Jayanta K. Ghosh. Inst Math Stat Collect 3:170–186

  11. Choi T, Schervish M (2007) On posterior consistency in nonparametric regression problems. J Multivar Anal 98:1969–1987

  12. de Alba E (2002) Bayesian estimation of outstanding claim reserves. N Am Actuar J 6(4):1–20

  13. de Alba E (2006) Claim reserving when there are negative values in the runoff triangle. N Am Actuar J 10(3):45–59

  14. De Blasi P, Walker SG (2013) Bayesian asymptotics with misspecified models. Stat Sin 23:169–187

  15. Diaconis P, Freedman D (1986) On the consistency of Bayes estimates. Ann Stat 14(1):1–26

  16. Doob JL (1949) Application of the theory of martingales. In: Le Calcul des Probabilités et ses applications. Colloques Internationaux du Centre National de la Recherche Scientifique. Paris. pp 23–27

  17. Escoto B (2013) Bayesian claim severity with mixed distributions. Variance 7(2):110–122

  18. Fellingham GW, Kottas A, Hartman BM (2015) Bayesian nonparametric predictive modeling of group health claims. Insur Math Econ 60:1–10

  19. Ferguson TS (1973) Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230

  20. Gangopadhyay A, Gau WC (2007) Bayesian nonparametric approach to credibility modeling. Ann Actuar Sci 2(I):91–114

  21. Ghosal S (2010) The Dirichlet process, related priors and posterior asymptotics. In: Hjort NL, Holmes C, Müller P, Walker SG (eds) Bayesian nonparametrics. Cambridge University Press, Cambridge, pp 35–79

  22. Ghosal S, Ghosh JK, Ramamoorthi RV (1999) Posterior consistency of Dirichlet mixtures in density estimation. Ann Stat 27:143–158

  23. Ghosh JK, Ramamoorthi RV (2003) Bayesian nonparametrics. Springer, New York

  24. Hong L, Martin R (2016) Discussion on “Credibility Estimation of Distribution Functions with Applications to Experience Rating in General Insurance”. N Am Actuar J 20(1):95–98

  25. Hong L, Martin R (2017) A flexible Bayesian nonparametric model for predicting future insurance claims. N Am Actuar J. doi:10.1080/10920277.2016.1247720

  26. Jara A, Hanson T, Quintana F, Müller P, Rosner G (2011) DPpackage: Bayesian semi- and nonparametric modeling in R. J Stat Softw 40(1):1–30

  27. Jeon Y, Kim JHT (2013) A gamma kernel density estimation for insurance loss data. Insur Math Econ 53:569–579

  28. Kaas R, Dannenburg D, Goovaerts M (1997) Exact credibility for weighted observations. ASTIN Bull 27(2):287–295

  29. Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91:1343–1370

  30. Kleijn BJK, van der Vaart AW (2006) Misspecification in infinite-dimensional Bayesian statistics. Ann Stat 34(2):837–877

  31. Klugman SA (1992) Bayesian statistics in actuarial science with emphasis on credibility. Kluwer, Boston

  32. Klugman SA, Panjer HH, Willmot GE (2008) Loss models: from data to decisions, 3rd edn. Wiley, Hoboken

  33. Kuo H (1975) Gaussian measures in Banach spaces. Springer, New York

  34. Lau WJ, Siu TK, Yang H (2006) On Bayesian mixture credibility. ASTIN Bull 36(2):573–588

  35. Lee SCK, Lin XS (2010) Modeling and evaluating insurance losses via mixtures of Erlang distributions. N Am Actuar J 14(1):107–130

  36. Lehmann EL (2006) Nonparametrics: statistical methods based on ranks, revised first edition. Springer, New York

  37. Lehmann EL, Casella G (1998) Theory of point estimation, 2nd edn. Springer, New York

  38. Makov UE, Smith AFM, Liu YH (1996) Bayesian methods in actuarial science. Statistician 45(4):503–515

  39. Makov UE (2001) Principal applications of Bayesian methods in actuarial science. N Am Actuar J 5(4):53–57

  40. Merz M, Wüthrich MV (2010) Paid-incurred chain claims reserving methods. Insur Math Econ 46:568–579

  41. Ntzoufras I, Dellaportas P (2002) Bayesian modeling of outstanding liabilities incorporating claim count uncertainty. N Am Actuar J 6(1):113–125

  42. Pan M, Wang R, Wu X (2008) On the consistency of credibility premiums regarding Esscher principles. Insur Math Econ 42:119–126

  43. Ramamoorthi RV, Sriram K, Martin R (2015) On posterior concentration in misspecified models. Bayesian Anal 10:759–789

  44. Rempala GA, Derrig RA (2005) Modeling hidden exposures in claim severity via the EM algorithm. N Am Actuar J 9(2):108–128

  45. Schervish MJ (1995) Theory of statistics. Springer, New York

  46. Schmidt KD (1991) Convergence of Bayes and credibility premiums. ASTIN Bull 20(2):167–172

  47. Schwartz L (1965) On Bayes procedures. Z Wahrscheinlichkeitstheorie Verw Gebiete 4:10–26

  48. Scollnik DPM (2001) Actuarial modeling with MCMC and BUGS. N Am Actuar J 5(2):96–124

  49. Shen X, Wasserman L (2001) Rates of convergence of posterior distributions. Ann Stat 29(3):687–714

  50. Shi P, Basu S, Meyers GG (2012) A Bayesian lognormal model for multivariate loss reserving. N Am Actuar J 16(1):1–29

  51. Shyamalkumar ND (1996) Cyclic \(I_0\) projections and its applications in statistics. Technical Report #96-24, Purdue University

  52. Tokdar ST (2006) Posterior consistency of Dirichlet location-scale mixture of normals in density estimation and regression. Sankhyā 67(4):90–110

  53. van de Geer S (2003) Asymptotic theory for maximum likelihood in nonparametric mixture models. Comput Stat Data Anal 41:453–464

  54. van der Vaart AW, Wellner J (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York

  55. Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann Math Stat 20:595–601

  56. Walker SG (2003) On sufficient conditions for Bayesian consistency. Biometrika 90:482–488

  57. Walker SG (2004) New approaches to Bayesian consistency. Ann Stat 32:2028–2043

  58. Werner G, Modlin C (2010) Basic ratemaking. Casualty Actuarial Society, Arlington

  59. Wu Y, Ghosal S (2008) Kullback Leibler property of kernel mixture priors in Bayesian density estimation. Electron J Stat 2:298–331

  60. Wüthrich MV (2012) Discussion of “A Bayesian log-normal model for multivariate loss reserving” by Peng Shi, Sanjib Basu, and Glenn G. Meyers. N Am Actuar J 16(2):398–401

  61. Zhang Y, Dukic V (2013) Predicting multivariate insurance loss payments under the Bayesian copula framework. J Risk Insur 80(4):891–919

Acknowledgements

The authors thank the Editor and two anonymous reviewers for their thoughtful comments and suggestions.

Author information

Correspondence to Liang Hong.

Deferred technical details

1.1 From Example 3.3

Since

$$\begin{aligned} \sup _{\theta >0}T(X_1, \theta )=T(X_1, \theta _{X_1})=\log \Bigl [\Bigl (\frac{\beta }{\theta ^\star }\Bigr )^{\beta }\frac{1}{(1+\beta )^{\beta +1}} \frac{(X_1+\theta ^\star )^{\beta +1}}{X_1}\Bigr ], \end{aligned}$$

we have

$$\begin{aligned} E_{\theta ^\star }\Bigl \{\sup _{\theta >0}T(X_1, \theta )\Bigl \} \le \log \Bigl [\Bigl (\frac{\beta }{\theta ^\star }\Bigr )^{\beta }\frac{1}{(\beta +1)^{\beta +1}}\Bigr ]+(1+\beta )|E_{\theta ^\star }[\log (X_1+\theta ^\star )]|+|E_{\theta ^\star }[\log X_1]|. \end{aligned}$$

In view of

$$\begin{aligned} E_{\theta ^\star }[\log X_1]=\beta (\theta ^\star )^{\beta }\int _0^{\infty } \frac{\log x}{(x+\theta ^\star )^{\beta +1}} dx, \end{aligned}$$

and

$$\begin{aligned} E_{\theta ^\star }[\log (X_1+\theta ^\star )]=\beta (\theta ^\star )^{\beta }\int _0^{\infty }\frac{\log (x+\theta ^\star )}{(x+\theta ^\star )^{\beta +1}}dx, \end{aligned}$$

it suffices to show that both integrals are finite.

To see that the first integral is finite, consider the complex-valued function \(f(z)=\frac{\log ^2 z}{(\theta +z)^{\beta +1}}\). It is clear that f has a pole at \(z=-\theta\). Take \(\epsilon >0\) and \(0<\rho<1<R\). Let \(L_1\) be the line segment \([\rho +\epsilon i, R+\epsilon i]\), \(L_2\) be the line segment \([\rho -\epsilon i, R-\epsilon i]\), \(\gamma _{\rho }\) be the part of the circle \(|z|=\rho\) from \(\rho -\epsilon i\) clockwise to \(\rho +\epsilon i\), and \(\gamma _R\) be the part of the circle \(|z|=R\) from \(R+\epsilon i\) counterclockwise to \(R-\epsilon i\). Then the Residue Theorem implies

$$\begin{aligned}&\int _{L_1}\frac{\log ^2 x}{(\theta +x)^{\beta +1}}\,dx+\int _{\gamma _R}\frac{\log ^2 z}{(\theta +z)^{\beta +1}}\,dz-\int _{L_2}\frac{\log ^2 x+4\pi i\log x-4\pi ^2}{(\theta +x)^{\beta +1}}\,dx\\&\quad +\int _{\gamma _{\rho }}\frac{\log ^2 z}{(\theta +z)^{\beta +1}}\,dz=2\pi i \, \mathrm {Res}(f, -\theta ), \end{aligned}$$

where \(\mathrm {Res}(f, -\theta )\) denotes the residue of f at \(-\theta\). We have

$$\begin{aligned} \bigg | \int _{\gamma _R}\frac{\log ^2 z}{(\theta +z)^{\beta +1}}\,dz \bigg | \le 2\pi R\frac{(\log R+2\pi )^2}{(R-\theta )^{\beta +1}}\rightarrow 0 \quad {\text {as}}\,\,R\rightarrow \infty , \end{aligned}$$

and

$$\begin{aligned} \bigg | \int _{\gamma _{\rho }}\frac{\log ^2 z}{(\theta +z)^{\beta +1}}\,dz \bigg | \le 2\pi \rho \frac{(|\log \rho |+2\pi )^2}{(\theta -\rho )^{\beta +1}}\rightarrow 0 \quad {\text {as}}\,\,\rho \rightarrow 0. \end{aligned}$$

Letting \(R\rightarrow \infty\) and \(\rho \rightarrow 0\), it follows that

$$\begin{aligned} i\int _0^{\infty } \frac{\log x}{(\theta +x)^{\beta +1}}\,dx=-\frac{i}{2}\,\mathrm {Res}(f, -\theta )+\pi \int _0^{\infty }\frac{1}{(\theta +x)^{\beta +1}}\,dx. \end{aligned}$$

Therefore, \(\int _0^{\infty } \frac{\log x}{(\theta +x)^{\beta +1}}\,dx\) equals the imaginary part of \(-\frac{i}{2} \, \mathrm {Res}(f, -\theta )\), which is finite. Similarly, the second integral is also finite.
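As a quick numerical sanity check (not part of the original argument), the finiteness of \(\int _0^{\infty } \log x \, (\theta +x)^{-(\beta +1)}\,dx\) can be confirmed directly with SciPy's quadrature routines; the \((\beta , \theta )\) values below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import quad

# Numerically confirm that int_0^inf log(x) / (theta + x)^(beta + 1) dx is
# finite; the (beta, theta) pairs below are arbitrary illustrative values.
def integrand(x, beta, theta):
    return np.log(x) / (theta + x) ** (beta + 1)

for beta, theta in [(0.5, 1.0), (2.0, 0.3), (1.5, 5.0)]:
    # split at x = 1 to isolate the integrable log singularity at zero
    head, _ = quad(integrand, 0, 1, args=(beta, theta))
    tail, _ = quad(integrand, 1, np.inf, args=(beta, theta))
    print(f"beta={beta}, theta={theta}: integral = {head + tail:.6f}")
```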

1.2 From Example 4.3

1.2.1 Verification of Conditions 1 and 2 of Theorem 4.1

To check Conditions 1–2 of Theorem 4.1, we need to identify a candidate sieve \(\Theta _n\). Fix an arbitrary \(\varepsilon \in (0,1)\) and let \(\delta < \varepsilon /4\) be as in Theorem 4.1. Given \(b> a > 0\), define the set of gamma scale mixtures \(\Theta _{a,b}^\delta =\{\theta _G: G((a,b]) \ge 1-\delta \}\). We will show that \(\Theta _n := \Theta _{a_n,b_n}^\delta\), with \(a_n=e^{-cn}\) and \(b_n=e^{cn}\), satisfies Conditions 1–2 of the theorem, where

$$\begin{aligned} c < \Bigl ( 1 + \log \frac{6 + \delta }{\delta } \Bigr )^{-1} \frac{\varepsilon ^2}{16}. \end{aligned}$$

We begin by checking Condition 2, which concerns the metric entropy. Let d be the \(L^1\)-distance and let \(k_u\) denote the gamma density with scale parameter u (and fixed shape s). Without loss of generality, let \(v> u > 0\). Then we get

$$\begin{aligned} d(k_u, k_v)&= \frac{1}{\Gamma (s)} \int _0^{\infty } x^{s-1} |u^{-s} e^{-x/u} - v^{-s} e^{-x/v}| \, dx \\&\le \frac{1}{\Gamma (s)} \Bigl \{u^{-s} \int _0^\infty x^{s-1} |e^{-x/u} - e^{-x/v}| \,dx + (u^{-s} - v^{-s}) \int _0^\infty x^{s-1} e^{-x/v} \,dx \Bigr \} \\&= 2 \Bigl ( \frac{v^s}{u^s} - 1 \Bigr ). \end{aligned}$$
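As an aside, the bound just derived, \(d(k_u, k_v) \le 2\{(v/u)^s - 1\}\), is easy to check numerically; this sketch uses SciPy, and the shape and scale values are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

# Check d(k_u, k_v) <= 2 * ((v/u)^s - 1) for gamma densities with common
# shape s and scales u < v; parameter values are illustrative.
s = 2.0
for u, v in [(1.0, 1.1), (0.5, 0.6), (2.0, 2.5)]:
    d, _ = quad(lambda x: abs(gamma.pdf(x, s, scale=u) - gamma.pdf(x, s, scale=v)),
                0, np.inf)
    bound = 2 * ((v / u) ** s - 1)
    print(f"u={u}, v={v}: d = {d:.4f}, bound = {bound:.4f}, holds: {d <= bound}")
```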

For the \(b> a > 0\) and \(\delta\) introduced above, let \(z=(1 + \delta /2)^{1/s}\) and define

$$\begin{aligned} u_m = a z^m, \quad m=1,\ldots ,M, \end{aligned}$$
(5)

where M is the smallest integer such that \(a z^{(M+1)} \ge b\), i.e., \(M \le (\log z)^{-1} \log (b/a)\). If we partition (ab] by the sub-intervals

$$\begin{aligned} E_m = (a z^m, a z^{m+1}], \quad m=1,\ldots ,M, \end{aligned}$$

then it is easy to check, based on the bound on the \(L^1\)-distance above, that

$$\begin{aligned} u,v \in E_m \implies d(k_u, k_v) \le \delta . \end{aligned}$$

Let \(\Delta _M\) be the probability simplex in \(\mathbb {R}^M\) and let \(\Delta _M^\delta\) be a \((\delta /6)\)-net (see Note 3) in \(\Delta _M\). An argument similar to that in the proofs of Lemmas 1 and 2 of Ghosal et al. [22] shows that

$$\begin{aligned} \Bigl \{\sum _{m=1}^M P_m k_{u_m}: (P_1,\ldots ,P_M) \in \Delta _M^\delta \Bigr \}, \end{aligned}$$

a set of finite scale mixtures of gammas with fixed scales \(u_1,\ldots ,u_M\) in (5), is a \(\delta\)-net in \(\Theta _{a,b}^\delta\). Therefore, \(\log N(\Theta _{a,b}^\delta , \delta , d) \le \log N(\Delta _M, \delta /6, \Vert \cdot \Vert _1)\), and the argument in the proof of Lemma 1 in Ghosal et al. [22], based on Lemma 8 in Barron et al. [2], shows that

$$\begin{aligned} \log N(\Delta _M, \delta /6, \Vert \cdot \Vert _1) \le \Bigl ( 1 + \log \frac{6 + \delta }{\delta } \Bigr ) M. \end{aligned}$$

Specifically, we put

$$\begin{aligned} D=\left\{ (P_1, \ldots , P_M)\in [0, \infty )^M: \sum _{m=1}^M P_m\le 1+\frac{\delta }{6}\right\} . \end{aligned}$$

Let \((P_1, \ldots , P_M)\in \Delta _M\) and \((\widetilde{P}_1, \ldots , \widetilde{P}_M)\in \Delta _M^{\delta }\). If \(|P_m-\widetilde{P}_m|<\delta /(6M)\) for each m, then \(\sum _{m=1}^M|P_m-\widetilde{P}_m|<\delta /6\). Therefore, the \(\delta /6\)-covering number \(N(\Delta _M, \delta /6, \Vert \cdot \Vert _1)\) of \(\Delta _M\) can be bounded above by the number of cubes of side length \(\delta /(6M)\) needed to cover D, which is at most the volume of D divided by the volume of a single cube; this gives the upper bound

$$\begin{aligned} \frac{1}{M!} \left( \frac{6M}{\delta }\right) ^M\left( 1+\frac{\delta }{6}\right) ^M. \end{aligned}$$

It follows that

$$\begin{aligned} \log N(\Delta _M, \delta /6, \Vert \cdot \Vert _1)\le & {} -\log M!+M\log \left( \frac{6M}{\delta }\right) +M\log \left( 1+\frac{\delta }{6}\right) \\\le & {} -M\log M + M +M \log M +M\log \frac{6+\delta }{\delta }\\= & {} \left( 1+\log \frac{6+\delta }{\delta }\right) M. \end{aligned}$$

Since M is bounded by a constant (depending only on \(\delta\)) times \(\log (b/a)\), we clearly have that Condition 2 of Theorem 4.1 holds with \(a_n = e^{-cn}\) and \(b_n = e^{cn}\).
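To make the growth rate concrete, the following sketch (with arbitrary illustrative values of s, \(\delta\), \(\varepsilon\), and a c strictly below the cap above) computes M and the resulting entropy bound, confirming that it grows linearly in n with slope proportional to c.

```python
import numpy as np

# Illustrative computation (values are arbitrary): with a_n = exp(-c*n) and
# b_n = exp(c*n), the grid size M is at most log(b_n/a_n)/log(z) = 2*c*n/log(z),
# so the metric-entropy bound (1 + log((6 + delta)/delta)) * M grows linearly
# in n, with slope proportional to c.
s, delta = 2.0, 0.1
eps = 0.45                                   # delta < eps/4 as required
c = 0.5 * eps ** 2 / 16 / (1 + np.log((6 + delta) / delta))
z = (1 + delta / 2) ** (1 / s)
for n in [100, 1000, 10000]:
    M = int(2 * c * n / np.log(z))
    bound = (1 + np.log((6 + delta) / delta)) * M
    print(f"n={n}: M={M}, entropy bound={bound:.1f}, bound/n={bound / n:.3f}")
```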

For Condition 1, concerning the prior mass assigned to the sieve \(\Theta _n = \Theta _{a_n,b_n}^\delta\), it suffices to bound the prior probability of the complement \(\{G: G((a_n, b_n]) < 1-\delta \}\). A fundamental property of the Dirichlet process is that G(A) has a beta distribution with parameters \(\alpha G_0(A)\) and \(\alpha G_0(A^c)\). In the present case, if we let

$$\begin{aligned} \alpha _n = \alpha G_0((a_n, b_n]) \quad {\text {and}} \quad \beta _n = \alpha \{1 - G_0((a_n, b_n])\}, \end{aligned}$$

then we have

$$\begin{aligned} \Pi (\{G: G((a_n,b_n]) < 1-\delta \}) = \frac{1}{B(\alpha _n,\beta _n)} \int _0^{1-\delta } z^{\alpha _n-1} (1-z)^{\beta _n-1} \,dz, \end{aligned}$$

where \(B(a,b) = \Gamma (a)\Gamma (b)/\Gamma (a+b)\) is the beta function. The right-hand side of the previous display is upper-bounded by

$$\begin{aligned} \frac{1}{B(\alpha _n, \beta _n)} (1-\delta )^{\alpha _n-1} \frac{1-\delta ^{\beta _n}}{\beta _n}. \end{aligned}$$

As \(n \rightarrow \infty\), it is clear that \(\alpha _n \rightarrow \alpha\) and \(\beta _n \rightarrow 0\). Some simple analysis shows that the latter two terms in the upper bound have finite and non-zero limits as \(n \rightarrow \infty\), so only the beta function term will be relevant. Using some basic properties of the gamma function we have

$$\begin{aligned} B(\alpha _n, \beta _n) = \frac{\Gamma (\alpha _n) \Gamma (\beta _n)}{\Gamma (\alpha _n + \beta _n)} = \{1 + o(1)\} \frac{\Gamma (\beta _n + 1)}{\beta _n} = \frac{O(1)}{\beta _n}. \end{aligned}$$
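This gamma-function asymptotic is easy to confirm numerically: the product \(B(\alpha _n, \beta _n)\,\beta _n\) tends to 1 as \(\beta _n \rightarrow 0\). A quick check with SciPy (the value of \(\alpha\) is an arbitrary illustrative choice):

```python
from scipy.special import beta as beta_fn

# As beta_n -> 0 with alpha_n -> alpha, B(alpha_n, beta_n) * beta_n -> 1,
# i.e., B(alpha_n, beta_n) = O(1)/beta_n; alpha = 1.7 is an arbitrary choice.
alpha = 1.7
for b in [1e-1, 1e-3, 1e-6, 1e-9]:
    print(f"beta_n = {b:.0e}: B * beta_n = {beta_fn(alpha, b) * b:.8f}")
```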

Therefore, the prior probability of the complement of the sieve vanishes at the rate \(\beta _n = \alpha \{1-G_0((a_n,b_n])\}\) as \(n \rightarrow \infty\). Following Lau et al. [34], if we take \(G_0\) to be a gamma distribution with shape parameter t and scale parameter r, then it is easy to see that

$$\begin{aligned} G_0((0,a_n]) = \int _0^{a_n} \frac{1}{r^t \Gamma (t)} x^{t-1} e^{-x/r} \,dx \lesssim a_n^{t} \end{aligned}$$

and, by Markov’s inequality,

$$\begin{aligned} G_0((b_n,\infty )) \lesssim b_n^{-1}. \end{aligned}$$

With \(a_n = e^{-cn}\) and \(b_n = e^{cn}\), it is clear that Condition 1 of Theorem 4.1 holds.
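Both tail bounds can be checked numerically; the shape and scale values below are arbitrary illustrative choices.

```python
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

# Check both tail bounds for G_0 = Gamma(shape t, scale r), illustrative t, r:
#   G_0((0, a])  <= a^t / (t * r^t * Gamma(t)),  from bounding e^{-x/r} <= 1;
#   G_0((b, oo)) <= t * r / b,                   from Markov's inequality.
t, r = 2.5, 1.3
for a in [1e-1, 1e-3]:
    print(f"a={a}: {gamma.cdf(a, t, scale=r):.3e} <= {a ** t / (t * r ** t * gamma_fn(t)):.3e}")
for b in [1e1, 1e3]:
    print(f"b={b}: {gamma.sf(b, t, scale=r):.3e} <= {t * r / b:.3e}")
```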

1.2.2 The best scale-mixture of gammas is a single gamma

Recall that \(\theta ^\star\) is a \(\mathsf{Gamma}(\alpha ^\star , \lambda ^\star )\) density and \(\theta ^\lambda\) is a \(\mathsf{Gamma}(\alpha , \lambda )\) density, where \(\alpha < \alpha ^\star\). To prove that the best mixture \(\theta _G = \int \theta ^\lambda \, G(d\lambda )\) corresponds to just a single gamma density with an appropriately chosen rate \(\lambda\), we need to consider the projection of \(\theta ^\star\) onto the space of mixtures \(\theta _G\). The claim is that the best mixture approximation to \(\theta ^\star\), according to the Kullback–Leibler divergence, is one with G equal to a point mass \(\delta _\Lambda\) at

$$\begin{aligned} \Lambda = \Lambda (\alpha , \alpha ^\star , \lambda ^\star ) = \frac{\alpha \lambda ^\star }{\alpha ^\star }, \end{aligned}$$
(6)

which is the value of \(\lambda\) that makes the mean of \(\theta ^\lambda\) the same as that of \(\theta ^\star\). In this context, according to Lemma 2.5 in Shyamalkumar [51] and Lemma 2.3 in Kleijn and van der Vaart [30], it suffices to show that

$$\begin{aligned} \sup _G \int \theta ^\star (x) \frac{\theta _G(x)}{\theta ^\Lambda (x)} \,dx \le 1, \end{aligned}$$

where \(\Lambda\) is as in (6). Writing out the definition of \(\theta _G\) and switching order of integration, we can see that it suffices to show that

$$\begin{aligned} \int \theta ^\star (x) \frac{\theta ^\lambda (x)}{\theta ^\Lambda (x)} \,dx \le 1 \quad \forall \; \lambda > 0. \end{aligned}$$
(7)

A straightforward calculation shows that

$$\begin{aligned} \int \theta ^\star (x) \frac{\theta ^\lambda (x)}{\theta ^\Lambda (x)} \,dx = \Bigl ( \frac{\lambda ^\star }{\lambda ^\star - \Lambda + \lambda } \Bigr )^{\alpha ^\star } \Bigl (\frac{\lambda }{\Lambda } \Bigr )^\alpha = \Bigl ( \frac{\lambda ^\star }{\lambda + \frac{\alpha ^\star -\alpha }{\alpha ^\star } \lambda ^\star } \Bigr )^{\alpha ^\star } \Bigl ( \frac{\alpha ^\star \lambda }{\alpha \lambda ^\star } \Bigr )^\alpha . \end{aligned}$$

The right-hand side clearly equals 1 if \(\lambda = \Lambda\); this is also obvious from the definition. Moreover, using the fact that \(\alpha < \alpha ^\star\), it is an easy calculus exercise to show that the right-hand side is actually maximized at \(\lambda = \Lambda = (\alpha /\alpha ^\star )\lambda ^\star\). This justifies the intuition from Example 4.3 that the best Kullback–Leibler approximation of \(\theta ^\star\) (or \(\eta ^\star\)) by mixtures of \(\theta ^\lambda\) (or \(\eta ^\lambda\)) is just a single \(\theta ^\Lambda\) (or \(\eta ^\Lambda\)), for suitably chosen \(\Lambda\). Therefore, \(K(\theta ^\star , \theta _G) \ge K(\theta ^\star , \theta ^\Lambda )\), and this lower bound is strictly positive.
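As a numerical sanity check (not part of the original argument), the closed-form expression above and the claim that it is maximized, with value 1, at \(\lambda = \Lambda\) can both be confirmed with SciPy; the parameter values below are arbitrary, subject to \(\alpha < \alpha ^\star\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

# Verify the closed form of the integral in (7) and that it is maximized,
# with value 1, at lambda = Lambda = alpha * lambda_star / alpha_star.
a_star, lam_star, a = 3.0, 2.0, 1.5
Lam = a * lam_star / a_star          # = 1.0 here

def closed_form(lam):
    return (lam_star / (lam_star - Lam + lam)) ** a_star * (lam / Lam) ** a

def integrand(x, lam):
    # theta_star(x) * theta^lambda(x) / theta^Lambda(x), computed in log space
    return np.exp(gamma.logpdf(x, a_star, scale=1 / lam_star)
                  + gamma.logpdf(x, a, scale=1 / lam)
                  - gamma.logpdf(x, a, scale=1 / Lam))

for lam in [0.5, 1.0, 3.0]:
    num, _ = quad(integrand, 0, np.inf, args=(lam,))
    print(f"lambda={lam}: quad={num:.6f}, closed form={closed_form(lam):.6f}")

grid = np.linspace(0.05, 10, 2000)
vals = closed_form(grid)
print("max over grid:", vals.max(), "at lambda ~", grid[vals.argmax()])
```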

1.3 From Example 4.4

We first want to show that the proposed prior satisfies the Kullback–Leibler property at \(\theta ^\star\). Let \(p_\theta (x,y)\) be the joint density of (XY) under the proposed nonparametric regression model. For any two regression functions \(\theta\) and \(\eta\), the Kullback–Leibler divergence \(K(p_{\theta }, p_{\eta })\) of \(p_{\eta }\) from \(p_{\theta }\) can be expressed in terms of the \(L^2\)-distance between \(\theta\) and \(\eta\). Indeed, since

$$\begin{aligned} \{y-\eta (x)\}^2=\{y-\theta (x)\}^2+2\{\theta (x)-\eta (x)\}\{y-\theta (x)\} +\{\theta (x)-\eta (x)\}^2, \end{aligned}$$

and \(\int \{y-\theta (x)\}p_{\theta }(x, y) \, dy=0\) for all x, we have

$$\begin{aligned} K(p_{\theta }, p_{\eta })= & {} \frac{1}{2} \int \int \bigl [\{y-\eta (x)\}^2-\{y-\theta (x)\}^2\bigr ] p_{\theta }(x, y) \, dy \, dx\nonumber \\= & {} \frac{1}{2}\int \{\theta (x)-\eta (x)\}^2 \, dx\nonumber \\= & {} \frac{1}{2}\Vert \theta -\eta \Vert _2^2, \end{aligned}$$
(8)

where \(\Vert \cdot \Vert _2\) denotes the \(L^2\)-norm on \([0, 1]^q\). In view of (8), to verify the Kullback–Leibler property of \(\Pi\) at \(\theta ^\star\), it suffices to show that \(\Pi (\{\theta : \Vert \theta -\theta ^\star \Vert _2 \le \varepsilon \})>0\) for all \(\varepsilon >0\). Since \(\theta ^\star \in L^2([0, 1]^q)\), the Bessel inequality and (4) imply that for any \(\varepsilon >0\), there exists an integer J such that

$$\begin{aligned} \sum _{j>J}(\sigma _j^2+\theta _j^{\star 2})\le \frac{\varepsilon ^2}{4}. \end{aligned}$$

Following Shen and Wasserman [49, Lemma 5], we have

$$\begin{aligned} \Pi \{\theta : \Vert \theta -\theta ^\star \Vert _2\le \varepsilon \}&=\Pi \Bigl \{\theta : \sum _j(\theta _j-\theta _j^\star )^2<\varepsilon ^2\Bigr \}\\&\ge \Pi \Bigl \{\theta : \sum _{j=1}^J(\theta _j-\theta _j^\star )^2<\frac{\varepsilon ^2}{2}\Bigr \} \, \Pi \Bigl \{\theta : \sum _{j>J}(\theta _j-\theta _j^\star )^2< \frac{\varepsilon ^2}{2} \Bigr \}\\&= \Pi \Bigl \{\theta : \sum _{j=1}^J(\theta _j-\theta _j^\star )^2< \frac{\varepsilon ^2}{2}\Bigr \} \Bigl [1-\Pi \Bigl \{\theta : \sum _{j>J}(\theta _j-\theta _j^\star )^2> \frac{\varepsilon ^2}{2} \Bigr \}\Bigr ]\\&\ge \Pi \Bigl \{\theta : \sum _{j=1}^J(\theta _j-\theta _j^\star )^2< \frac{\varepsilon ^2}{2} \Bigr \} \Bigl [1-\frac{2}{\varepsilon ^2}\sum _{j>J}E(\theta _j-\theta _j^\star )^2\Bigr ]\\&\ge \frac{1}{2} \Pi \Bigl \{\theta : \sum _{j=1}^J(\theta _j-\theta _j^\star )^2<\frac{\varepsilon ^2}{2} \Bigr \}. \end{aligned}$$

The first inequality is due to monotonicity of \(\Pi\) and the independence of the \(\theta _j\)'s under \(\Pi\); the second inequality is due to the Markov inequality; and the third inequality follows from the definition of J and the fact that \(E(\theta _j-\theta _j^\star )^2=\sigma _j^2+\theta _j^{\star 2}\). The lower bound involves a ball probability for a J-dimensional normal distribution and, since such a distribution has a positive density on \(\mathbb {R}^J\), that probability is positive, proving that \(\Pi \{\theta : \Vert \theta -\theta ^\star \Vert _2\le \varepsilon \}>0\) for all \(\varepsilon >0\). Therefore, the given \(\Pi\), with variances \(\sigma _j^2\), satisfies the Kullback–Leibler condition at any \(\theta ^\star\) such that (4) holds.
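Before moving on, identity (8) itself is easy to check numerically in the case \(q = 1\); the two regression functions in this sketch are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import dblquad, quad

# Check identity (8) for q = 1: with X ~ Unif[0, 1] and Y | X = x ~ N(theta(x), 1),
# the KL divergence K(p_theta, p_eta) equals (1/2) * ||theta - eta||_2^2.
theta = lambda x: np.sin(2 * np.pi * x)
eta = lambda x: x ** 2

def kl_integrand(y, x):
    # p_theta(x, y) * log{p_theta(x, y) / p_eta(x, y)}
    log_ratio = 0.5 * ((y - eta(x)) ** 2 - (y - theta(x)) ** 2)
    return log_ratio * np.exp(-0.5 * (y - theta(x)) ** 2) / np.sqrt(2 * np.pi)

# y-limits of +/- 15 are effectively infinite for a unit-variance normal
kl, _ = dblquad(kl_integrand, 0, 1, lambda x: -15, lambda x: 15)
half_l2_sq, _ = quad(lambda x: 0.5 * (theta(x) - eta(x)) ** 2, 0, 1)
print(f"KL = {kl:.6f}, (1/2)||theta - eta||_2^2 = {half_l2_sq:.6f}")
```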

Next, to prove consistency with respect to the \(L^2\)-distance d, it suffices to work with the \(L^\infty\)-norm \(\Vert \cdot \Vert _\infty\), since \(\Vert \cdot \Vert _2 \le \Vert \cdot \Vert _\infty\). For an increasing sequence \(M_n\) to be specified later, define the sets

$$\begin{aligned} \Theta _{n0}=\{\theta : ||\theta ||_{\infty }<M_n\}\quad {\text {and}}\,\,\Theta _{n1}=\{\theta : ||\theta '||_{\infty }<M_n\}. \end{aligned}$$

Take \(\Theta _n=\Theta _{n0} \cap \Theta _{n1}\) as the sieve set. By the relation between the \(L^2\)- and \(L^{\infty }\)-distances on regression functions, the \(L^2\) covering numbers for \(\Theta _n\) are no greater than the corresponding \(L^{\infty }\) covering numbers. It follows from Theorem 2.7.1 in [54] that the \(\delta\)-covering number of \(\Theta _n\), relative to the \(L^{\infty }\)-distance, satisfies \(\log N(\Theta _n, \delta , \Vert \cdot \Vert _{\infty }) \le CM_n\delta ^{-1}\), where C is a constant that does not depend on n or \(\delta\). If we take \(\delta <\epsilon /4\), as in Theorem 4.1, then we can get \(\log N(\Theta _n, \delta , \Vert \cdot \Vert _{\infty })<n\beta\), with \(\beta =\delta ^2<\epsilon ^2/16<\epsilon ^2/8\), by selecting \(M_n= r n\), where \(r<\delta ^3/C\). This verifies Condition 2 of Theorem 4.1.

To verify Condition 1 of Theorem 4.1, we need to show that the \(\Pi\)-probability of \(\Theta _n^c\) is exponentially small. For this, it suffices to show that both \(\Theta ^c_{n0}\) and \(\Theta ^c_{n1}\) have exponentially small \(\Pi\)-probability. Start with \(\Theta _{n0}\). As in Choi and Schervish [11, Sec. 6.1], we have the following, based on the Chernoff inequality:

$$\begin{aligned} \Pi \{\theta : ||\theta ||_{\infty }>M_n\}\le & {} \Pi \Bigl \{\theta : \sum _j a_j|\theta _j|>M_n\Bigr \}\\\le & {} e^{-tM_n}E\bigl (e^{t\sum _ja_j|\theta _j|}\bigr ), \quad \forall \; t>0\\= & {} e^{-tM_n}\prod _jE\bigl (e^{ta_j\sigma _j|Z_j|}\bigr )\\= & {} e^{-tM_n}e^{(t^2/2)\sum _ja_j^2\sigma _j^2}\prod _j2\Phi (a_j\sigma _jt), \end{aligned}$$

where the \(Z_j\) are independent standard normal random variables with density \(\phi\) and distribution function \(\Phi\). Here we used the formula for the moment-generating function of the half-normal |Z|:

$$\begin{aligned} \frac{1}{\sqrt{2\pi }}\int _{-\infty }^{\infty }e^{t|z|}e^{-z^2/2}dz= & {} \frac{2}{\sqrt{2\pi }}\int _0^{\infty }e^{tz}e^{-z^2/2}dz \\= & {} \frac{2e^{t^2/2}}{\sqrt{2\pi }}\int _0^{\infty }e^{-(z-t)^2/2}dz\\= & {} \frac{2e^{t^2/2}}{\sqrt{2\pi }}\int _{-t}^{\infty }e^{-u^2/2}du\quad (u=z-t)\\= & {} 2e^{t^2/2}\Phi (t). \end{aligned}$$
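This half-normal moment-generating function formula can be confirmed numerically:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Check the half-normal moment-generating function derived above:
#   E[exp(t|Z|)] = 2 * exp(t^2 / 2) * Phi(t)  for Z ~ N(0, 1).
for t in [0.1, 0.5, 1.0, 2.0]:
    lhs, _ = quad(lambda z: np.exp(t * abs(z)) * norm.pdf(z), -np.inf, np.inf)
    rhs = 2 * np.exp(t ** 2 / 2) * norm.cdf(t)
    print(f"t={t}: numeric = {lhs:.6f}, formula = {rhs:.6f}")
```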

Since \(\Phi (z)\) is concave on \([0, \infty )\), it is bounded from above, for all \(z\ge 0\), by its first-order Taylor approximation at \(z=0\). This implies that \(\log \{2\Phi (z)\}\le \log \{1+2\phi (0)z\}\) which, in our case, gives

$$\begin{aligned} \log \prod _j2\Phi (a_j\sigma _jt)\le \sum _j\log \left( 1+\frac{2a_j\sigma _jt}{\sqrt{2\pi }}\right) \le \frac{2t}{\sqrt{2\pi }}\sum _j a_j\sigma _j. \end{aligned}$$

Since \(\sum _j a_j\sigma _j<\infty\) by assumption, we also have \(\sum _ja_j^2\sigma ^2_j<\infty\). In addition, we have \(M_n=O(n)\). Thus, the upper bound for \(\Pi (\Theta ^c_{n0})\) is of the form \(c_1e^{-c_2n}\) for some constants \(c_1\) and \(c_2\). The exact same calculation, with \(b_j\) in place of \(a_j\), gives \(\Pi (\Theta ^c_{n1})\le c_1e^{-c_2n}\) for some different \(c_1\) and \(c_2\). Since \(\Pi (\Theta ^c_{n})\le \Pi (\Theta ^c_{n0})+\Pi (\Theta ^c_{n1})\), Condition 1 of Theorem 4.1 has been verified. Therefore, the posterior distribution is consistent with respect to the \(L^2\)-distance at any \(\theta ^\star\) satisfying (4).
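Finally, as a numerical sanity check (not part of the original argument), the bound \(\log \prod _j 2\Phi (a_j\sigma _j t) \le (2t/\sqrt{2\pi })\sum _j a_j\sigma _j\) can be verified for an illustrative summable sequence:

```python
import numpy as np
from scipy.stats import norm

# Check log(prod_j 2 Phi(a_j sigma_j t)) <= (2t / sqrt(2 pi)) * sum_j a_j sigma_j
# for an illustrative summable sequence a_j * sigma_j = j^{-2} and t = 0.7.
t = 0.7
aj_sigma = np.arange(1, 10001, dtype=float) ** -2
lhs = np.sum(np.log(2 * norm.cdf(t * aj_sigma)))
rhs = 2 * t / np.sqrt(2 * np.pi) * aj_sigma.sum()
print(f"lhs = {lhs:.6f} <= rhs = {rhs:.6f}: {lhs <= rhs}")
```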

About this article

Cite this article

Hong, L., Martin, R. A review of Bayesian asymptotics in general insurance applications. Eur. Actuar. J. 7, 231–255 (2017). https://doi.org/10.1007/s13385-017-0151-5
