Abstract
Over the last two decades, Bayesian methods have been widely used in general insurance applications, ranging from credibility theory to loss-reserve estimation, but this literature rarely addresses questions about the methods' asymptotic properties. In this paper, we review the Bayesian notion of posterior consistency in both parametric and nonparametric models and its implications for the sensitivity of the posterior to the actuary's choice of prior. We review some of the techniques for proving posterior consistency and, for illustration, apply these results to investigate the asymptotic properties of several recently proposed Bayesian methods in general insurance.
Notes
This modern use of the term “nonparametric” differs a bit from that in the classical setting, e.g., [36], where it referred to methods free of distributional assumptions. The connection is that models depending on an infinite-dimensional parameter can avoid certain specifications. For example, in nonparametric regression (Example 4.4), the error distribution is normal, but taking the regression function itself to be the parameter avoids specifying a particular form like linear, quadratic, etc.
The assumption that the \(X_i\)s are uniformly distributed is not particularly special. The key assumption is that the distribution is known, i.e., does not depend on any unknown parameters.
By an \(\epsilon\)-net in a metric space (X, d), we mean a subset Y of X such that for any \(x\in X\) there exists a \(y\in Y\) such that \(d(x, y)<\epsilon\).
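The footnote's definition can be illustrated with a small numerical check (our own example, not from the paper): an evenly spaced grid is an \(\epsilon\)-net of [0, 1] under the absolute-value metric, since every point lies within \(\epsilon /2\) of a grid center.

```python
import random

def is_eps_net(net, sample, dist, eps):
    """Check the eps-net definition against a finite sample of points:
    every sampled x must have some y in `net` with dist(x, y) < eps."""
    return all(any(dist(x, y) < eps for y in net) for x in sample)

eps = 0.1
# Centers of 10 cells of width eps covering [0, 1]; every point of the
# interval is within eps/2 < eps of some center.
net = [eps / 2 + k * eps for k in range(10)]   # 0.05, 0.15, ..., 0.95

random.seed(0)
sample = [random.random() for _ in range(10_000)]
ok = is_eps_net(net, sample, lambda x, y: abs(x - y), eps)
```

The same check works in any metric space once `dist` is supplied; only the grid construction is specific to the interval.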
References
Barron A (1988) The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions. Technical Report 7, Department of Statistics, University of Illinois, Champaign, IL
Barron A, Schervish MJ, Wasserman L (1999) The consistency of posterior distributions in nonparametric problems. Ann Stat 27:536–561
Berger JO (1985) Statistical decision theory and Bayesian analysis, 2nd edn. Springer, New York
Brockett PL, Chuang SL, Pitaktong U (2014) Generalized additive models and nonparametric regression. In: Predictive modeling applications in actuarial science. Cambridge University Press, Cambridge, pp 367–397
Bühlmann H (1967) Experience rating and credibility. ASTIN Bull 4:199–207
Bühlmann H, Gisler A (2005) A course in credibility theory and its applications. Springer, New York
Bühlmann H, Straub E (1970) Glaubwürdigkeit für Schadensätze. Mitteilungen der Vereinigung Schweizerischer Versicherungs-Mathematiker 70:111–133
Bunke O, Milhaud X (1998) Asymptotic behavior of Bayes estimates under possibly incorrect models. Ann Stat 26(2):617–644
Cai X, Wen L, Wu X, Zhou X (2015) Credibility estimation of distribution functions with applications to experience rating in general insurance. N Am Actuar J 19(4):311–335
Choi T, Ramamoorthi RV (2008) Remarks on consistency of posterior distributions. Pushing the limits of contemporary statistics: contributions in honor of Jayanta K. Ghosh. Inst Math Stat Collect 3:170–186
Choi T, Schervish M (2007) On posterior consistency in nonparametric regression problems. J Multivar Anal 98:1969–1987
de Alba E (2002) Bayesian estimation of outstanding claim reserves. N Am Actuar J 6(4):1–20
de Alba E (2006) Claim reserving when there are negative values in the runoff triangle. N Am Actuar J 10(3):45–59
De Blasi P, Walker SG (2013) Bayesian asymptotics with misspecified models. Stat Sin 23:169–187
Diaconis P, Freedman D (1986) On the consistency of Bayes estimates. Ann Stat 14(1):1–26
Doob JL (1949) Application of the theory of martingales. In: Le Calcul des Probabilités et ses Applications. Colloques Internationaux du Centre National de la Recherche Scientifique, Paris, pp 23–27
Escoto B (2013) Bayesian claim severity with mixed distributions. Variance 7(2):110–122
Fellingham GW, Kottas A, Hartman BM (2015) Bayesian nonparametric predictive modeling of group health claims. Insur Math Econ 60:1–10
Ferguson TS (1973) Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
Gangopadhyay A, Gau WC (2007) Bayesian nonparametric approach to credibility modeling. Ann Actuar Sci 2(I):91–114
Ghosal S (2010) The Dirichlet process, related priors and posterior asymptotics. In: Hjort NL, Holmes C, Müller P, Walker SG (eds) Bayesian nonparametrics. Cambridge University Press, Cambridge, pp 35–79
Ghosal S, Ghosh JK, Ramamoorthi RV (1999) Posterior consistency of Dirichlet mixtures in density estimation. Ann Stat 27:143–158
Ghosh JK, Ramamoorthi RV (2003) Bayesian nonparametrics. Springer, New York
Hong L, Martin R (2016) Discussion on “Credibility Estimation of Distribution Functions with Applications to Experience Rating in General Insurance”. N Am Actuar J 20(1):95–98
Hong L, Martin R (2017) A flexible Bayesian nonparametric model for predicting future insurance claims. N Am Actuar J. doi:10.1080/10920277.2016.1247720
Jara A, Hanson T, Quintana F, Müller P, Rosner G (2011) DPpackage: Bayesian semi- and nonparametric modeling in R. J Stat Softw 40(1):1–30
Jeon Y, Kim JHT (2013) A gamma kernel density estimation for insurance loss data. Insur Math Econ 53:569–579
Kaas R, Dannenburg D, Goovaerts M (1997) Exact credibility for weighted observations. ASTIN Bull 27(2):287–295
Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91:1343–1370
Kleijn BJK, van der Vaart AW (2006) Misspecification in infinite-dimensional Bayesian statistics. Ann Stat 34(2):837–877
Klugman SA (1992) Bayesian statistics in actuarial science with emphasis on credibility. Kluwer, Boston
Klugman SA, Panjer HH, Willmot GE (2008) Loss models: from data to decisions, 3rd edn. Wiley, Hoboken
Kuo H (1975) Gaussian measures in Banach spaces. Springer, New York
Lau WJ, Siu TK, Yang H (2006) On Bayesian mixture credibility. ASTIN Bull 36(2):573–588
Lee SCK, Lin XS (2010) Modeling and evaluating insurance losses via mixtures of Erlang distributions. N Am Actuar J 14(1):107–130
Lehmann EL (2006) Nonparametrics: statistical methods based on ranks, revised first edition. Springer, New York
Lehmann EL, Casella G (1998) Theory of point estimation, 2nd edn. Springer, New York
Makov UE, Smith AFM, Liu YH (1996) Bayesian methods in actuarial science. Statistician 45(4):503–515
Makov UE (2001) Principal applications of Bayesian methods in actuarial science. N Am Actuar J 5(4):53–57
Merz M, Wüthrich MV (2010) Paid-incurred chain claims reserving methods. Insur Math Econ 46:568–579
Ntzoufras I, Dellaportas P (2002) Bayesian modeling of outstanding liabilities incorporating claim count uncertainty. N Am Actuar J 6(1):113–125
Pan M, Wang R, Wu X (2008) On the consistency of credibility premiums regarding Esscher principles. Insur Math Econ 42:119–126
Ramamoorthi RV, Sriram K, Martin R (2015) On posterior concentration in misspecified models. Bayesian Anal 10:759–789
Rempala GA, Derrig RA (2005) Modeling hidden exposures in claim severity via the EM algorithm. N Am Actuar J 9(2):108–128
Schervish MJ (1995) Theory of statistics. Springer, New York
Schmidt KD (1991) Convergence of Bayes and credibility premiums. ASTIN Bull 20(2):167–172
Schwartz L (1965) On Bayes procedures. Z Wahrscheinlichkeitstheorie und Verw Gebiete 4:10–26
Scollnik DPM (2001) Actuarial modeling with MCMC and BUGS. N Am Actuar J 5(2):96–124
Shen X, Wasserman L (2001) Rates of convergence of posterior distributions. Ann Stat 29(3):687–714
Shi P, Basu S, Meyers GG (2012) A Bayesian lognormal model for multivariate loss reserving. N Am Actuar J 16(1):1–29
Shyamalkumar ND (1996) Cyclic \(I_0\) projections and its applications in statistics. Technical Report #96-24, Purdue University
Tokdar ST (2006) Posterior consistency of Dirichlet location-scale mixture of normals in density estimation and regression. Sankhyā 67(4):90–110
van de Geer S (2003) Asymptotic theory for maximum likelihood in nonparametric mixture models. Comput Stat Data Anal 41:453–464
van der Vaart AW, Wellner J (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann Math Stat 20:595–601
Walker SG (2003) On sufficient conditions for Bayesian consistency. Biometrika 90:482–488
Walker SG (2004) New approaches to Bayesian consistency. Ann Stat 32:2028–2043
Werner G, Modlin C (2010) Basic ratemaking. Casualty Actuarial Society, Arlington
Wu Y, Ghosal S (2008) Kullback Leibler property of kernel mixture priors in Bayesian density estimation. Electron J Stat 2:298–331
Wüthrich MV (2012) “A Bayesian log-normal model for multivariate loss reserving, Peng Shi, Sanjib Basu, and Glenn G. Meyers, March 2012”. N Am Actuar J 16(2):398–401
Zhang Y, Dukic V (2013) Predicting multivariate insurance loss payments under the Bayesian copula framework. J Risk Insur 80(4):891–919
Acknowledgements
The authors thank the Editor and two anonymous reviewers for their thoughtful comments and suggestions.
Deferred technical details
1.1 From Example 3.3
Since
we have
In view of
and
it suffices to show that both integrals are finite.
To see that the first integral is finite, consider the complex-valued function \(f(z)=\frac{\log ^2 z}{(\theta +z)^{\beta +1}}\). It is clear that f has a pole at \(z=-\theta\). Take \(\epsilon >0\) and \(0<\rho<1<R\). Let \(L_1\) be the line segment \([\rho +\epsilon i, R+\epsilon i]\), \(L_2\) be the line segment \([\rho -\epsilon i, R-\epsilon i]\), \(\gamma _{\rho }\) be the part of the circle \(|z|=\rho\) from \(\rho -\epsilon i\) clockwise to \(\rho +\epsilon i\), and \(\gamma _R\) be the part of the circle \(|z|=R\) from \(R+\epsilon i\) counterclockwise to \(R-\epsilon i\). Then the Residue Theorem implies
where \(\mathrm {Res}(f, -\theta )\) denotes the residue of f at \(-\theta\). We have
and
It follows that
Therefore, \(\int _0^{\infty } \frac{\log x}{(\theta +x)^{\beta +1}} dx\) equals the imaginary part of \(\frac{i}{2} \, \mathrm {Res}(f, -\theta )\) which is finite. Similarly, the second integral is also finite.
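As a sanity check on this finiteness claim (separate from the contour argument), one can evaluate the log-squared integral \(\int _0^\infty \log ^2 x \, (\theta +x)^{-(\beta +1)} dx\) numerically; with \(\theta =1\) and \(\beta =1\) it has the classical value \(\pi ^2/3\), which we use here only as a check. The substitution and grid below are our own choices.

```python
import math

def log2_integral(theta, beta, lo=-40.0, hi=40.0, n=200_000):
    """Trapezoidal evaluation of the integral of (log x)^2 / (theta + x)^(beta+1)
    over (0, inf) after the substitution x = e^t, which makes both tails decay
    rapidly: the transformed integrand is t^2 e^t / (theta + e^t)^(beta+1)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        e = math.exp(t)
        total += w * t * t * e / (theta + e) ** (beta + 1)
    return total * h

# With theta = 1 and beta = 1 the integral equals pi^2 / 3 exactly.
val = log2_integral(theta=1.0, beta=1.0)
```

The truncation at \(|t|=40\) is harmless because the transformed integrand decays like \(t^2 e^{-\beta |t|}\) in both tails.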
1.2 From Example 4.3
1.2.1 Verification of Conditions 1 and 2 of Theorem 4.1
To check Conditions 1–2 of Theorem 4.1, we need to identify a candidate sieve \(\Theta _n\). Fix an arbitrary \(\varepsilon \in (0,1)\) and let \(\delta < \varepsilon /4\) be as in Theorem 4.1. Given \(b> a > 0\), define the set of gamma scale mixtures \(\Theta _{a,b}^\delta =\{\theta _G: G((a,b]) \ge 1-\delta \}\). We will show that \(\Theta _n := \Theta _{a_n,b_n}^\delta\), with \(a_n=e^{-cn}\) and \(b_n=e^{cn}\), satisfies Conditions 1–2 of the theorem, where
We begin by checking Condition 2, concerning the metric entropy. Let d be the \(L^1\)-distance and let \(k_u\) denote the gamma density with scale parameter u (and fixed shape s). Without loss of generality, let \(v> u > 0\). Then we get
For the \(b> a > 0\) and \(\delta\) introduced above, let \(z=(1 + \delta /2)^{1/s}\) and define
where M is the smallest integer such that \(a z^{(M+1)} \ge b\), i.e., \(M \le (\log z)^{-1} \log (b/a)\). If we partition (a, b] by the sub-intervals
then it is easy to check, based on the bound on the \(L^1\)-distance above, that
Let \(\Delta _M\) be the probability simplex in \(\mathbb {R}^{M+1}\) and let \(\Delta _M^\delta\) be a \((\delta /6)\)-net in \(\Delta _M\). An argument similar to that in the proofs of Lemmas 1 and 2 of Ghosal et al. [22] shows that
a set of finite scale mixtures of gammas with fixed scales \(u_1,\ldots ,u_M\) in (5), is a \(\delta\)-net in \(\Theta _{a,b}^\delta\). Therefore, \(\log N(\Theta _{a,b}^\delta , \delta , d) \le \log N(\Delta _M, \delta /6, \Vert \cdot \Vert _1)\), and the argument in the proof of Lemma 1 in Ghosal et al. [22], based on Lemma 8 in Barron et al. [2], shows that
Specifically, we put
Let \((P_1, \ldots , P_M)\in \Delta _M\) and \((\widetilde{P}_1, \ldots , \widetilde{P}_M)\in \Delta _M^{\delta }\). If \(|P_m-\widetilde{P}_m|<\delta /(6M)\) for each m, then \(\sum _{m=1}^M|P_m-\widetilde{P}_m|<\delta /6\). Therefore, the \(\delta /6\)-covering number \(N(\Delta _M, \delta /6, \Vert \cdot \Vert _1)\) of \(\Delta _M\) can be bounded above by the product of the number of cubes of side length \(\delta /6\) covering \([0, 1]^M\) and the volume of D, which is further bounded above by
It follows that
Since M is bounded by a constant (depending only on \(\delta\)) times \(\log (b/a)\), we clearly have that Condition 2 of Theorem 4.1 holds with \(a_n = e^{-cn}\) and \(b_n = e^{cn}\).
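The grid construction behind this entropy bound can be probed numerically. In the sketch below we parameterize the gamma density by its rate (standing in for the "scale parameter" in the text; the \(L^1\)-distance is the same under either parameterization) and choose illustrative values \(s=2\), \(\delta =0.2\), \(a=10^{-3}\), \(b=10^3\) (ours, not the paper's). We check that densities at adjacent grid scales u and zu, with \(z=(1+\delta /2)^{1/s}\), are close in \(L^1\), and that M satisfies \(M \le (\log z)^{-1}\log (b/a)\).

```python
import math

def gamma_pdf(x, s, u):
    # Gamma density with shape s and rate u (our parameterization choice;
    # the L1 distance does not depend on the rate-vs-scale convention).
    return u ** s * x ** (s - 1) * math.exp(-u * x) / math.gamma(s)

def l1_dist(s, u, v, hi=60.0, n=120_000):
    # Trapezoidal approximation of the L1 distance on (0, hi].
    h = hi / n
    total = 0.0
    for i in range(1, n + 1):
        x = i * h
        w = 0.5 if i == n else 1.0
        total += w * abs(gamma_pdf(x, s, u) - gamma_pdf(x, s, v))
    return total * h

s, delta = 2.0, 0.2
z = (1 + delta / 2) ** (1 / s)   # ratio between consecutive grid scales
d = l1_dist(s, 1.0, z)           # adjacent grid points u = 1 and v = z

# Grid size: smallest M with a * z^(M+1) >= b.
a, b = 1e-3, 1e3
M = math.ceil(math.log(b / a) / math.log(z)) - 1
```

With these values the adjacent-scale \(L^1\)-distance comes out near 0.05, comfortably below \(\delta\), consistent with the covering argument.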
For Condition 1 about the prior mass assigned to the sieve \(\Theta _n = \Theta _{a_n,b_n}^\delta\), it suffices to bound the prior probability of \(\{G: G((a_n, b_n]) \ge 1-\delta \}\). A fundamental property of the Dirichlet process is that G(A) has a beta distribution with parameters \(\alpha G_0(A)\) and \(\alpha G_0(A^c)\). In the present case, if we let
then we have
where \(B(a,b) = \Gamma (a)\Gamma (b)/\Gamma (a+b)\) is the beta function. The right-hand side of the previous display is upper-bounded by
As \(n \rightarrow \infty\), it is clear that \(\alpha _n \rightarrow \alpha\) and \(\beta _n \rightarrow 0\). Some simple analysis shows that the latter two terms in the upper bound have finite and non-zero limits as \(n \rightarrow \infty\), so only the beta function term will be relevant. Using some basic properties of the gamma function we have
Therefore, the prior probability of the complement of the sieve vanishes at the rate \(\beta _n = \alpha \{1-G_0((a_n,b_n])\}\) as \(n \rightarrow \infty\). Following Lau et al. [34], if we take \(G_0\) to be a gamma distribution with shape parameter t and scale parameter r, then it is easy to see that
and, by Markov’s inequality,
With \(a_n = e^{-cn}\) and \(b_n = e^{cn}\), it is clear that Condition 1 of Theorem 4.1 holds.
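The beta-distribution property of the Dirichlet process invoked above can be checked by simulation. The sketch below uses truncated stick-breaking with an assumed base measure \(G_0=\mathsf{Exp}(1)\) and \(\alpha =2\) (illustrative choices, not from the paper), and compares the empirical mean and variance of \(G((a,b])\) with the moments of \(\mathsf{Beta}(\alpha G_0(A), \alpha G_0(A^c))\).

```python
import random, math

random.seed(1)

alpha = 2.0                        # DP concentration parameter (illustrative)
a, b = 0.5, 2.0
p = math.exp(-a) - math.exp(-b)    # G0((a, b]) for G0 = Exp(1)

def dp_mass(n_sticks=200):
    """One draw of G((a, b]) from DP(alpha, G0) via truncated stick-breaking."""
    mass, remaining = 0.0, 1.0
    for _ in range(n_sticks):
        v = 1 - random.random() ** (1 / alpha)   # v ~ Beta(1, alpha)
        atom = random.expovariate(1.0)           # atom ~ G0
        if a < atom <= b:
            mass += remaining * v
        remaining *= 1 - v
    return mass

draws = [dp_mass() for _ in range(4000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
# Beta(alpha*p, alpha*(1-p)) has mean p and variance p(1-p)/(alpha+1).
```

The truncation at 200 sticks leaves expected residual mass \((\alpha /(\alpha +1))^{200}\), which is negligible here.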
1.2.2 The best scale-mixture of gammas is a single gamma
Recall that \(\theta ^\star\) is a \(\mathsf{Gamma}(\alpha ^\star , \lambda ^\star )\) density and \(\theta ^\lambda\) is a \(\mathsf{Gamma}(\alpha , \lambda )\) density, where \(\alpha < \alpha ^\star\). To prove that the best mixture \(\theta _G = \int \theta ^\lambda \, G(d\lambda )\) corresponds to just a single gamma density with an appropriately chosen rate \(\lambda\), we need to consider the projection of \(\theta ^\star\) onto the space of mixtures \(\theta _G\). The claim is that the best mixture approximation to \(\theta ^\star\), according to the Kullback–Leibler divergence, is one with G equal to a point mass \(\delta _\Lambda\) at
which is the value of \(\lambda\) that makes the mean of \(\theta ^\lambda\) the same as that of \(\theta ^\star\). In this context, according to Lemma 2.5 in Shyamalkumar [51] and Lemma 2.3 in Kleijn and van der Vaart [30], it suffices to show that
where \(\Lambda\) is as in (6). Writing out the definition of \(\theta _G\) and switching order of integration, we can see that it suffices to show that
A straightforward calculation shows that
The right-hand side clearly equals 1 if \(\lambda = \Lambda\); this is also obvious from the definition. Moreover, using the fact that \(\alpha < \alpha ^\star\), it is an easy calculus exercise to show that the right-hand side is actually maximized at \(\lambda = \Lambda = (\alpha /\alpha ^\star )\lambda ^\star\). This justifies the intuition from Example 4.3 that the best Kullback–Leibler approximation of \(\theta ^\star\) (or \(\eta ^\star\)) by mixtures of \(\theta ^\lambda\) (or \(\eta ^\lambda\)) is just a single \(\theta ^\Lambda\) (or \(\eta ^\Lambda\)), for suitably chosen \(\Lambda\). Therefore, \(K(\theta ^\star , \theta _G) \ge K(\theta ^\star , \theta ^\Lambda )\), and the lower bound is strictly positive.
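This minimization is easy to check numerically. Using the standard closed form for the Kullback–Leibler divergence between gamma densities (we assume the rate parameterization here), the only \(\lambda\)-dependent terms of \(K(\theta ^\star , \theta ^\lambda )\) are \(\alpha ^\star \lambda /\lambda ^\star -\alpha \log \lambda\), and a grid search recovers the minimizer \((\alpha /\alpha ^\star )\lambda ^\star\). The numerical values below are illustrative.

```python
import math

a_star, lam_star = 5.0, 2.0   # shape/rate of theta* (illustrative values)
a = 3.0                       # mixture component shape, with a < a_star

def kl_lam_terms(lam):
    """Lambda-dependent part of KL(Gamma(a_star, lam_star) || Gamma(a, lam)),
    from the standard closed form under the rate parameterization, whose
    lam-dependent terms are a*(log lam_star - log lam) + a_star*(lam - lam_star)/lam_star,
    i.e. a_star*lam/lam_star - a*log(lam) up to constants."""
    return a_star * lam / lam_star - a * math.log(lam)

grid = [0.01 * i for i in range(1, 1001)]   # lambda on (0, 10]
best_lam = min(grid, key=kl_lam_terms)      # should sit at (a/a_star)*lam_star
```

Setting the derivative \(\alpha ^\star /\lambda ^\star - \alpha /\lambda\) to zero gives the same answer analytically.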
1.3 From Example 4.4
We first want to show that the proposed prior satisfies the Kullback–Leibler property at \(\theta ^\star\). Let \(p_\theta (x,y)\) be the joint density of (X, Y) under the proposed nonparametric regression model. For any two regression functions \(\theta\) and \(\eta\), the Kullback–Leibler divergence \(K(p_{\theta }, p_{\eta })\) of \(p_{\eta }\) from \(p_{\theta }\) can be expressed in terms of the \(L^2\)-distance between \(\theta\) and \(\eta\). Indeed, since
and \(\int \{y-\theta (x)\}p_{\theta }(x, y) \, dy=0\) for all x, we have
where \(\Vert \cdot \Vert _2\) denotes the \(L^2\)-norm on \([0, 1]^q\). In view of (8), to verify the Kullback–Leibler property of \(\Pi\) at \(\theta ^\star\), it suffices to show that \(\Pi (\{\theta : \Vert \theta -\theta ^\star \Vert _2 \le \varepsilon \})>0\) for all \(\varepsilon >0\). Since \(\theta \in L^2([0, 1]^q)\), the Bessel inequality and (4) imply that for any \(\varepsilon >0\), there exists an integer J such that
Following Shen and Wasserman ([49], Lemma 5), we have
The first inequality is due to the monotonicity of \(\Pi\) and the independence of the \(\theta _j\)'s under \(\Pi\); the second is the Markov inequality; and the third follows from the definition of J and the fact that \(E(\theta _j-\theta _j^\star )^2=\sigma _j^2+\theta _j^{\star 2}\). The lower bound involves a ball probability for a J-dimensional normal distribution and, since such a distribution has a positive density on \(\mathbb {R}^J\), that probability is positive, proving that \(\Pi \{\theta : \Vert \theta -\theta ^\star \Vert _2\le \varepsilon \}>0\) for all \(\varepsilon >0\). Therefore, the given \(\Pi\), with variances \(\sigma _j^2\), satisfies the Kullback–Leibler condition at any \(\theta ^\star\) such that (4) holds.
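The positivity of this ball probability can also be seen by direct simulation. The sketch below truncates the series at 50 terms and uses illustrative choices \(\sigma _j=j^{-2}\), \(\theta _j^\star =0.5\,j^{-3}\), and \(\varepsilon =0.5\) (ours, not the paper's); a non-negligible fraction of prior draws lands within \(\varepsilon\) of \(\theta ^\star\) in \(\ell ^2\).

```python
import random

random.seed(2)

J_MAX = 50                                            # truncation (our choice)
sigma = [j ** -2.0 for j in range(1, J_MAX + 1)]      # prior sds sigma_j = j^-2
theta_star = [0.5 * j ** -3.0 for j in range(1, J_MAX + 1)]  # target coefficients
eps = 0.5

def sq_dist_draw():
    """Draw theta_j ~ N(0, sigma_j^2) independently and return the squared
    l2 distance between the draw and theta*."""
    return sum((random.gauss(0.0, s) - t) ** 2
               for s, t in zip(sigma, theta_star))

n = 20_000
frac = sum(sq_dist_draw() <= eps ** 2 for _ in range(n)) / n
```

The tail beyond \(J=50\) contributes expected squared mass \(\sum _{j>50}(\sigma _j^2+\theta _j^{\star 2})\), which is negligible for these decay rates, so the truncation does not distort the check.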
Next, to prove consistency with respect to the \(L^2\)-distance d, it suffices to work with the \(L^\infty\)-norm \(\Vert \cdot \Vert _\infty\), since \(\Vert \cdot \Vert _2 \le \Vert \cdot \Vert _\infty\). For an increasing sequence \(M_n\) to be specified later, define the sets
Take \(\Theta _n=\Theta _{n0} \cap \Theta _{n1}\) as the sieve. By the relation between the \(L^2\)- and \(L^{\infty }\)-distances on regression functions, the \(L^2\) covering numbers of \(\Theta _n\) are no larger than the corresponding \(L^{\infty }\) covering numbers. It follows from Theorem 2.7.1 in [54] that the \(\delta\)-covering number of \(\Theta _n\), relative to the \(L^{\infty }\)-distance, satisfies \(\log N(\Theta _n, \delta , \Vert \cdot \Vert _{\infty }) \le CM_n\delta ^{-1}\), where C is a constant that depends on neither n nor \(\delta\). If we take \(\delta <\epsilon /4\), as in Theorem 4.1, then we get \(\log N(\Theta _n, \delta , \Vert \cdot \Vert _{\infty })<nr\), for \(\beta =\delta ^2<\epsilon ^2/16<\epsilon ^2/8\), by selecting \(M_n= r n\), where \(r<\delta ^3/C\). This verifies Condition 2 of Theorem 4.1.
To verify Condition 1 of Theorem 4.1, we need to show that the \(\Pi\)-probability of \(\Theta _n^c\) is exponentially small. For this, it suffices to show that both \(\Theta ^c_{n0}\) and \(\Theta ^c_{n1}\) have exponentially small \(\Pi\)-probability. Start with \(\Theta _{n0}\). As in Choi and Schervish ([11], Sect. 6.1), the Chernoff inequality gives the following:
where \(Z_j\) is a standard normal random variable with density \(\phi\) and distribution function \(\Phi\). Here we used the formula \(E(e^{t|Z|})=2e^{t^2/2}\Phi (t)\) for the moment-generating function of the half-normal \(|Z|\).
Since \(\Phi (z)\) is concave on \([0, \infty )\), it is bounded from above, for \(z\ge 0\), by its first-order Taylor approximation at \(z=0\), so that \(2\Phi (z)\le 1+2\phi (0)z\). This implies that \(\log \{2\Phi (z)\}\le \log \{1+2\phi (0)z\}\) which, in our case, gives
By the assumption \(\sum _j a_j\sigma _j<\infty\), we also have \(\sum _j a_j^2\sigma ^2_j<\infty\). In addition, \(M_n=O(n)\). Thus, the upper bound for \(\Pi (\Theta ^c_{n0})\) is of the form \(c_1e^{-c_2n}\) for some constants \(c_1\) and \(c_2\). The same calculation, with \(b_j\) in place of \(a_j\), gives \(\Pi (\Theta ^c_{n1})\le c_1e^{-c_2n}\) for different constants \(c_1\) and \(c_2\). Since \(\Pi (\Theta ^c_{n})\le \Pi (\Theta ^c_{n0})+\Pi (\Theta ^c_{n1})\), Condition 1 of Theorem 4.1 is verified. Therefore, the posterior distribution is consistent with respect to the \(L^2\)-distance at any \(\theta ^\star\) satisfying (4).
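The half-normal moment-generating-function identity \(E(e^{t|Z|})=2e^{t^2/2}\Phi (t)\), used in the Chernoff bound above, is standard and easy to verify by Monte Carlo; the sample size and the value of t below are arbitrary choices of ours.

```python
import math, random

random.seed(3)

def half_normal_mgf(t):
    """Closed form E exp(t|Z|) = 2 exp(t^2/2) Phi(t) for Z ~ N(0, 1),
    with the normal cdf Phi written via the error function."""
    Phi = 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return 2 * math.exp(t * t / 2) * Phi

t = 0.7
n = 200_000
mc = sum(math.exp(t * abs(random.gauss(0.0, 1.0))) for _ in range(n)) / n
exact = half_normal_mgf(t)
```

The Monte Carlo average agrees with the closed form to well within its sampling error at this sample size.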
Hong, L., Martin, R. A review of Bayesian asymptotics in general insurance applications. Eur. Actuar. J. 7, 231–255 (2017). https://doi.org/10.1007/s13385-017-0151-5