Abstract
Over the last two decades, Bayesian methods have been widely used in general insurance applications, ranging from credibility theory to loss-reserve estimation, but this literature rarely addresses questions about the methods' asymptotic properties. In this paper, we review the Bayesian notion of posterior consistency in both parametric and nonparametric models and its implications for the sensitivity of the posterior to the actuary's choice of prior. We review some of the techniques for proving posterior consistency and, for illustration, apply these results to investigate the asymptotic properties of several recently proposed Bayesian methods in general insurance.
Notes
This modern use of the term “nonparametric” differs a bit from that in the classical setting, e.g., [36], where it referred to methods free of distributional assumptions. The connection is that models depending on an infinite-dimensional parameter can avoid certain specifications. For example, in nonparametric regression (Example 4.4), the error distribution is normal, but taking the regression function itself to be the parameter avoids specifying a particular form like linear, quadratic, etc.
The assumption that the \(X_i\)s are uniformly distributed is not particularly special. The key assumption is that the distribution is known, i.e., does not depend on any unknown parameters.
By an \(\epsilon\)-net in a metric space (X, d), we mean a subset Y of X such that for any \(x\in X\) there exists a \(y\in Y\) such that \(d(x, y)<\epsilon\).
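The footnote's definition can be illustrated with a small numerical check (our own example, not from the paper): an evenly spaced grid is an \(\epsilon\)-net of [0, 1] under the absolute-value metric, since every point lies within \(\epsilon /2\) of a grid center.

```python
import random

def is_eps_net(net, sample, dist, eps):
    """Check the eps-net definition against a finite sample of points:
    every sampled x must have some y in `net` with dist(x, y) < eps."""
    return all(any(dist(x, y) < eps for y in net) for x in sample)

eps = 0.1
# Centers of 10 cells of width eps covering [0, 1]; every point of the
# interval is within eps/2 < eps of some center.
net = [eps / 2 + k * eps for k in range(10)]   # 0.05, 0.15, ..., 0.95

random.seed(0)
sample = [random.random() for _ in range(10_000)]
ok = is_eps_net(net, sample, lambda x, y: abs(x - y), eps)
```

The same check works in any metric space once `dist` is supplied; only the grid construction is specific to the interval.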
References
Barron A (1988) The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions. Technical Report 7, Department of Statistics, University of Illinois, Champaign, IL
Barron A, Schervish MJ, Wasserman L (1999) The consistency of posterior distributions in nonparametric problems. Ann Stat 27:536–561
Berger JO (1985) Statistical decision theory and Bayesian analysis, 2nd edn. Springer, New York
Brockett PL, Chuang SL, Pitaktong U (2014) Generalized additive models and nonparametric regression. In: Predictive modeling applications in actuarial science. Cambridge University Press, Cambridge, pp 367–397
Bühlmann H (1967) Experience rating and credibility. ASTIN Bull 4:199–207
Bühlmann H, Gisler A (2005) A course in credibility theory and its applications. Springer, New York
Bühlmann H, Straub E (1970) Glaubwürdigkeit für Schadensätze. Mitteilungen der Vereinigung Schweizerischer Versicherungs-Mathematiker 70:111–133
Bunke O, Milhaud X (1998) Asymptotic behavior of Bayes estimates under possibly incorrect models. Ann Stat 26(2):617–644
Cai X, Wen L, Wu X, Zhou X (2015) Credibility estimation of distribution functions with applications to experience rating in general insurance. N Am Actuar J 19(4):311–335
Choi T, Ramamoorthi RV (2008) Remarks on consistency of posterior distributions. Pushing the limits of contemporary statistics: contributions in honor of Jayanta K. Ghosh. Inst Math Stat Collect 3:170–186
Choi T, Schervish M (2007) On posterior consistency in nonparametric regression problems. J Multivar Anal 98:1969–1987
de Alba E (2002) Bayesian estimation of outstanding claim reserves. N Am Actuar J 6(4):1–20
de Alba E (2006) Claim reserving when there are negative values in the runoff triangle. N Am Actuar J 10(3):45–59
De Blasi P, Walker SG (2013) Bayesian asymptotics with misspecified models. Stat Sin 23:169–187
Diaconis P, Freedman D (1986) On the consistency of Bayes estimates. Ann Stat 14(1):1–26
Doob JL (1949) Application of the theory of martingales. In: Le Calcul des Probabilités et ses Applications. Colloques Internationaux du Centre National de la Recherche Scientifique, Paris, pp 23–27
Escoto B (2013) Bayesian claim severity with mixed distributions. Variance 7(2):110–122
Fellingham GW, Kottas A, Hartman BM (2015) Bayesian nonparametric predictive modeling of group health claims. Insur Math Econ 60:1–10
Ferguson TS (1973) Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
Gangopadhyay A, Gau WC (2007) Bayesian nonparametric approach to credibility modeling. Ann Actuar Sci 2(I):91–114
Ghosal S (2010) The Dirichlet process, related priors and posterior asymptotics. In: Hjort NL, Holmes C, Müller P, Walker SG (eds) Bayesian nonparametrics. Cambridge University Press, Cambridge, pp 35–79
Ghosal S, Ghosh JK, Ramamoorthi RV (1999) Posterior consistency of Dirichlet mixtures in density estimation. Ann Stat 27:143–158
Ghosh JK, Ramamoorthi RV (2003) Bayesian nonparametrics. Springer, New York
Hong L, Martin R (2016) Discussion on “Credibility Estimation of Distribution Functions with Applications to Experience Rating in General Insurance”. N Am Actuar J 20(1):95–98
Hong L, Martin R (2017) A flexible Bayesian nonparametric model for predicting future insurance claims. N Am Actuar J. doi:10.1080/10920277.2016.1247720
Jara A, Hanson T, Quintana F, Müller P, Rosner G (2011) DPpackage: Bayesian semi- and nonparametric modeling in R. J Stat Softw 40(1):1–30
Jeon Y, Kim JHT (2013) A gamma kernel density estimation for insurance loss data. Insur Math Econ 53:569–579
Kaas R, Dannenburg D, Goovaerts M (1997) Exact credibility for weighted observations. ASTIN Bull 27(2):287–295
Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91:1343–1370
Kleijn BJK, van der Vaart AW (2006) Misspecification in infinite-dimensional Bayesian statistics. Ann Stat 34(2):837–877
Klugman SA (1992) Bayesian statistics in actuarial science with emphasis on credibility. Kluwer, Boston
Klugman SA, Panjer HH, Willmot GE (2008) Loss models: from data to decisions, 3rd edn. Wiley, Hoboken
Kuo H (1975) Gaussian measures in Banach spaces. Springer, New York
Lau WJ, Siu TK, Yang H (2006) On Bayesian mixture credibility. ASTIN Bull 36(2):573–588
Lee SCK, Lin XS (2010) Modeling and evaluating insurance losses via mixtures of Erlang distributions. N Am Actuar J 14(1):107–130
Lehmann EL (2006) Nonparametrics: statistical methods based on ranks, revised first edition. Springer, New York
Lehmann EL, Casella G (1998) Theory of point estimation, 2nd edn. Springer, New York
Makov UE, Smith AFM, Liu YH (1996) Bayesian methods in actuarial science. Statistician 45(4):503–515
Makov UE (2001) Principal applications of Bayesian methods in actuarial science. N Am Actuar J 5(4):53–57
Merz M, Wüthrich MV (2010) Paid-incurred chain claims reserving methods. Insur Math Econ 46:568–579
Ntzoufras I, Dellaportas P (2002) Bayesian modeling of outstanding liabilities incorporating claim count uncertainty. N Am Actuar J 6(1):113–125
Pan M, Wang R, Wu X (2008) On the consistency of credibility premiums regarding Esscher principles. Insur Math Econ 42:119–126
Ramamoorthi RV, Sriram K, Martin R (2015) On posterior concentration in misspecified models. Bayesian Anal 10:759–789
Rempala GA, Derrig RA (2005) Modeling hidden exposures in claim severity via the EM algorithm. N Am Actuar J 9(2):108–128
Schervish MJ (1995) Theory of statistics. Springer, New York
Schmidt KD (1991) Convergence of Bayes and credibility premiums. ASTIN Bull 20(2):167–172
Schwartz L (1965) On Bayes procedures. Z Wahrscheinlichkeitstheorie und Verw Gebiete 4:10–26
Scollnik DPM (2001) Actuarial modeling with MCMC and BUGS. N Am Actuar J 5(2):96–124
Shen X, Wasserman L (2001) Rates of convergence of posterior distributions. Ann Stat 29(3):687–714
Shi P, Basu S, Meyers GG (2012) A Bayesian lognormal model for multivariate loss reserving. N Am Actuar J 16(1):1–29
Shyamalkumar ND (1996) Cyclic \(I_0\) projections and its applications in statistics. Technical Report #96-24, Purdue University
Tokdar ST (2006) Posterior consistency of Dirichlet location-scale mixture of normals in density estimation and regression. Sankhyā 67(4):90–110
van de Geer S (2003) Asymptotic theory for maximum likelihood in nonparametric mixture models. Comput Stat Data Anal 41:453–464
van der Vaart AW, Wellner J (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann Math Stat 20:595–601
Walker SG (2003) On sufficient conditions for Bayesian consistency. Biometrika 90:482–488
Walker SG (2004) New approaches to Bayesian consistency. Ann Stat 32:2028–2043
Werner G, Modlin C (2010) Basic ratemaking. Casualty Actuarial Society, Arlington
Wu Y, Ghosal S (2008) Kullback Leibler property of kernel mixture priors in Bayesian density estimation. Electron J Stat 2:298–331
Wüthrich MV (2012) “A Bayesian log-normal model for multivariate loss reserving, Peng Shi, Sanjib Basu, and Glenn G. Meyers, March 2012”. N Am Actuar J 16(2):398–401
Zhang Y, Dukic V (2013) Predicting multivariate insurance loss payments under the Bayesian copula framework. J Risk Insur 80(4):891–919
Acknowledgements
The authors thank the Editor and two anonymous reviewers for their thoughtful comments and suggestions.
Deferred technical details
1.1 From Example 3.3
Since
we have
In view of
and
it suffices to show that both integrals are finite.
To see that the first integral is finite, consider the complex-valued function \(f(z)=\frac{\log ^2 z}{(\theta +z)^{\beta +1}}\). It is clear that f has a pole at \(z=-\theta\). Take \(\epsilon >0\) and \(0<\rho<1<R\). Let \(L_1\) be the line segment \([\rho +\epsilon i, R+\epsilon i]\), \(L_2\) be the line segment \([\rho -\epsilon i, R-\epsilon i]\), \(\gamma _{\rho }\) be the part of the circle \(|z|=\rho\) from \(\rho -\epsilon i\) clockwise to \(\rho +\epsilon i\), and \(\gamma _R\) be the part of the circle \(|z|=R\) from \(R+\epsilon i\) counterclockwise to \(R-\epsilon i\). Then the Residue Theorem implies
where \(\mathrm {Res}(f, -\theta )\) denotes the residue of f at \(-\theta\). We have
and
It follows that
Therefore, \(\int _0^{\infty } \frac{\log x}{(\theta +x)^{\beta +1}} dx\) equals the imaginary part of \(\frac{i}{2} \, \mathrm {Res}(f, -\theta )\) which is finite. Similarly, the second integral is also finite.
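As a sanity check on this finiteness claim (separate from the contour argument), one can evaluate the log-squared integral \(\int _0^\infty \log ^2 x \, (\theta +x)^{-(\beta +1)} dx\) numerically; with \(\theta =1\) and \(\beta =1\) it has the classical value \(\pi ^2/3\), which we use here only as a check. The substitution and grid below are our own choices.

```python
import math

def log2_integral(theta, beta, lo=-40.0, hi=40.0, n=200_000):
    """Trapezoidal evaluation of the integral of (log x)^2 / (theta + x)^(beta+1)
    over (0, inf) after the substitution x = e^t, which makes both tails decay
    rapidly: the transformed integrand is t^2 e^t / (theta + e^t)^(beta+1)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        e = math.exp(t)
        total += w * t * t * e / (theta + e) ** (beta + 1)
    return total * h

# With theta = 1 and beta = 1 the integral equals pi^2 / 3 exactly.
val = log2_integral(theta=1.0, beta=1.0)
```

The truncation at \(|t|=40\) is harmless because the transformed integrand decays like \(t^2 e^{-\beta |t|}\) in both tails.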
1.2 From Example 4.3
1.2.1 Verification of Conditions 1 and 2 of Theorem 4.1
To check Conditions 1–2 of Theorem 4.1, we need to identify a candidate sieve \(\Theta _n\). Fix an arbitrary \(\varepsilon \in (0,1)\) and let \(\delta < \varepsilon /4\) be as in Theorem 4.1. Given \(b> a > 0\), define the set of gamma scale mixtures \(\Theta _{a,b}^\delta =\{\theta _G: G((a,b]) \ge 1-\delta \}\). We will show that \(\Theta _n := \Theta _{a_n,b_n}^\delta\), with \(a_n=e^{-cn}\) and \(b_n=e^{cn}\), satisfies Conditions 1–2 of the theorem, where
We begin by checking Condition 2, concerning the metric entropy. Let d be the \(L^1\)-distance and let \(k_u\) denote the gamma density with scale parameter u (and fixed shape s). Without loss of generality, let \(v> u > 0\). Then we get
For the \(b> a > 0\) and \(\delta\) introduced above, let \(z=(1 + \delta /2)^{1/s}\) and define
where M is the smallest integer such that \(a z^{(M+1)} \ge b\), i.e., \(M \le (\log z)^{-1} \log (b/a)\). If we partition (a, b] by the sub-intervals
then it is easy to check, based on the bound on the \(L^1\)-distance above, that
Let \(\Delta _M\) be the probability simplex in \(\mathbb {R}^{M+1}\) and let \(\Delta _M^\delta\) be a \((\delta /6)\)-net in \(\Delta _M\). An argument similar to that in the proofs of Lemmas 1 and 2 of Ghosal et al. [22] shows that
a set of finite scale mixtures of gammas with fixed scales \(u_1,\ldots ,u_M\) in (5), is a \(\delta\)-net in \(\Theta _{a,b}^\delta\). Therefore, \(\log N(\Theta _{a,b}^\delta , \delta , d) \le \log N(\Delta _M, \delta /6, \Vert \cdot \Vert _1)\), and the argument in the proof of Lemma 1 in Ghosal et al. [22], based on Lemma 8 in Barron et al. [2], shows that
Specifically, we put
Let \((P_1, \ldots , P_M)\in \Delta _M\) and \((\widetilde{P}_1, \ldots , \widetilde{P}_M)\in \Delta _M^{\delta }\). If \(|P_m-\widetilde{P}_m|<\delta /(6M)\) for each m, then \(\sum _{m=1}^M|P_m-\widetilde{P}_m|<\delta /6\). Therefore, the \(\delta /6\)-covering number \(N(\Delta _M, \delta /6, \Vert \cdot \Vert _1)\) of \(\Delta _M\) can be bounded above by the product of the number of cubes of side length \(\delta /6\) covering \([0, 1]^M\) and the volume of D, which is further bounded above by
It follows that
Since M is bounded by a constant (depending only on \(\delta\)) times \(\log (b/a)\), we clearly have that Condition 2 of Theorem 4.1 holds with \(a_n = e^{-cn}\) and \(b_n = e^{cn}\).
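The grid construction behind this entropy bound can be probed numerically. In the sketch below we parameterize the gamma density by its rate (standing in for the "scale parameter" in the text; the \(L^1\)-distance is the same under either parameterization) and choose illustrative values \(s=2\), \(\delta =0.2\), \(a=10^{-3}\), \(b=10^3\) (ours, not the paper's). We check that densities at adjacent grid scales u and zu, with \(z=(1+\delta /2)^{1/s}\), are close in \(L^1\), and that M satisfies \(M \le (\log z)^{-1}\log (b/a)\).

```python
import math

def gamma_pdf(x, s, u):
    # Gamma density with shape s and rate u (our parameterization choice;
    # the L1 distance does not depend on the rate-vs-scale convention).
    return u ** s * x ** (s - 1) * math.exp(-u * x) / math.gamma(s)

def l1_dist(s, u, v, hi=60.0, n=120_000):
    # Trapezoidal approximation of the L1 distance on (0, hi].
    h = hi / n
    total = 0.0
    for i in range(1, n + 1):
        x = i * h
        w = 0.5 if i == n else 1.0
        total += w * abs(gamma_pdf(x, s, u) - gamma_pdf(x, s, v))
    return total * h

s, delta = 2.0, 0.2
z = (1 + delta / 2) ** (1 / s)   # ratio between consecutive grid scales
d = l1_dist(s, 1.0, z)           # adjacent grid points u = 1 and v = z

# Grid size: smallest M with a * z^(M+1) >= b.
a, b = 1e-3, 1e3
M = math.ceil(math.log(b / a) / math.log(z)) - 1
```

With these values the adjacent-scale \(L^1\)-distance comes out near 0.05, comfortably below \(\delta\), consistent with the covering argument.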
For Condition 1 about the prior mass assigned to the sieve \(\Theta _n = \Theta _{a_n,b_n}^\delta\), it suffices to bound the prior probability of \(\{G: G((a_n, b_n]) \ge 1-\delta \}\). A fundamental property of the Dirichlet process is that G(A) has a beta distribution with parameters \(\alpha G_0(A)\) and \(\alpha G_0(A^c)\). In the present case, if we let
then we have
where \(B(a,b) = \Gamma (a)\Gamma (b)/\Gamma (a+b)\) is the beta function. The right-hand side of the previous display is upper-bounded by
As \(n \rightarrow \infty\), it is clear that \(\alpha _n \rightarrow \alpha\) and \(\beta _n \rightarrow 0\). Some simple analysis shows that the latter two terms in the upper bound have finite and non-zero limits as \(n \rightarrow \infty\), so only the beta function term will be relevant. Using some basic properties of the gamma function we have
Therefore, the prior probability of the complement of the sieve vanishes at the rate \(\beta _n = \alpha \{1-G_0((a_n,b_n])\}\) as \(n \rightarrow \infty\). Following Lau et al. [34], if we take \(G_0\) to be a gamma distribution with shape parameter t and scale parameter r, then it is easy to see that
and, by Markov’s inequality,
With \(a_n = e^{-cn}\) and \(b_n = e^{cn}\), it is clear that Condition 1 of Theorem 4.1 holds.
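The beta-distribution property of the Dirichlet process invoked above can be checked by simulation. The sketch below uses truncated stick-breaking with an assumed base measure \(G_0=\mathsf{Exp}(1)\) and \(\alpha =2\) (illustrative choices, not from the paper), and compares the empirical mean and variance of \(G((a,b])\) with the moments of \(\mathsf{Beta}(\alpha G_0(A), \alpha G_0(A^c))\).

```python
import random, math

random.seed(1)

alpha = 2.0                        # DP concentration parameter (illustrative)
a, b = 0.5, 2.0
p = math.exp(-a) - math.exp(-b)    # G0((a, b]) for G0 = Exp(1)

def dp_mass(n_sticks=200):
    """One draw of G((a, b]) from DP(alpha, G0) via truncated stick-breaking."""
    mass, remaining = 0.0, 1.0
    for _ in range(n_sticks):
        v = 1 - random.random() ** (1 / alpha)   # v ~ Beta(1, alpha)
        atom = random.expovariate(1.0)           # atom ~ G0
        if a < atom <= b:
            mass += remaining * v
        remaining *= 1 - v
    return mass

draws = [dp_mass() for _ in range(4000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
# Beta(alpha*p, alpha*(1-p)) has mean p and variance p(1-p)/(alpha+1).
```

The truncation at 200 sticks leaves expected residual mass \((\alpha /(\alpha +1))^{200}\), which is negligible here.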
1.2.2 The best scale-mixture of gammas is a single gamma
Recall that \(\theta ^\star\) is a \(\mathsf{Gamma}(\alpha ^\star , \lambda ^\star )\) density and \(\theta ^\lambda\) is a \(\mathsf{Gamma}(\alpha , \lambda )\) density, where \(\alpha < \alpha ^\star\). To prove that the best mixture \(\theta _G = \int \theta ^\lambda \, G(d\lambda )\) corresponds to just a single gamma density with an appropriately chosen rate \(\lambda\), we need to consider the projection of \(\theta ^\star\) onto the space of mixtures \(\theta _G\). The claim is that the best mixture approximation to \(\theta ^\star\), according to the Kullback–Leibler divergence, is one with G equal to a point mass \(\delta _\Lambda\) at
which is the value of \(\lambda\) that makes the mean of \(\theta ^\lambda\) the same as that of \(\theta ^\star\). In this context, according to Lemma 2.5 in Shyamalkumar [51] and Lemma 2.3 in Kleijn and van der Vaart [30], it suffices to show that
where \(\Lambda\) is as in (6). Writing out the definition of \(\theta _G\) and switching order of integration, we can see that it suffices to show that
A straightforward calculation shows that
The right-hand side clearly equals 1 if \(\lambda = \Lambda\); this is also obvious from the definition. Moreover, using the fact that \(\alpha < \alpha ^\star\), it is an easy calculus exercise to show that the right-hand side is actually maximized at \(\lambda = \Lambda = (\alpha /\alpha ^\star )\lambda ^\star\). This justifies the intuition from Example 4.3 that the best Kullback–Leibler approximation of \(\theta ^\star\) (or \(\eta ^\star\)) by mixtures of \(\theta ^\lambda\) (or \(\eta ^\lambda\)) is just a single \(\theta ^\Lambda\) (or \(\eta ^\Lambda\)), for suitably chosen \(\Lambda\). Therefore, \(K(\theta ^\star , \theta _G) \ge K(\theta ^\star , \theta ^\Lambda )\), and the lower bound is strictly positive.
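This minimization is easy to check numerically. Using the standard closed form for the Kullback–Leibler divergence between gamma densities (we assume the rate parameterization here), the only \(\lambda\)-dependent terms of \(K(\theta ^\star , \theta ^\lambda )\) are \(\alpha ^\star \lambda /\lambda ^\star -\alpha \log \lambda\), and a grid search recovers the minimizer \((\alpha /\alpha ^\star )\lambda ^\star\). The numerical values below are illustrative.

```python
import math

a_star, lam_star = 5.0, 2.0   # shape/rate of theta* (illustrative values)
a = 3.0                       # mixture component shape, with a < a_star

def kl_lam_terms(lam):
    """Lambda-dependent part of KL(Gamma(a_star, lam_star) || Gamma(a, lam)),
    from the standard closed form under the rate parameterization, whose
    lam-dependent terms are a*(log lam_star - log lam) + a_star*(lam - lam_star)/lam_star,
    i.e. a_star*lam/lam_star - a*log(lam) up to constants."""
    return a_star * lam / lam_star - a * math.log(lam)

grid = [0.01 * i for i in range(1, 1001)]   # lambda on (0, 10]
best_lam = min(grid, key=kl_lam_terms)      # should sit at (a/a_star)*lam_star
```

Setting the derivative \(\alpha ^\star /\lambda ^\star - \alpha /\lambda\) to zero gives the same answer analytically.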
1.3 From Example 4.4
We first want to show that the proposed prior satisfies the Kullback–Leibler property at \(\theta ^\star\). Let \(p_\theta (x,y)\) be the joint density of (X, Y) under the proposed nonparametric regression model. For any two regression functions \(\theta\) and \(\eta\), the Kullback–Leibler divergence \(K(p_{\theta }, p_{\eta })\) of \(p_{\eta }\) from \(p_{\theta }\) can be expressed in terms of the \(L^2\)-distance between \(\theta\) and \(\eta\). Indeed, since
and \(\int \{y-\theta (x)\}p_{\theta }(x, y) \, dy=0\) for all x, we have
where \(\Vert \cdot \Vert _2\) denotes the \(L^2\)-norm on \([0, 1]^q\). In view of (8), to verify the Kullback–Leibler property of \(\Pi\) at \(\theta ^\star\), it suffices to show that \(\Pi (\{\theta : \Vert \theta -\theta ^\star \Vert _2 \le \varepsilon \})>0\) for all \(\varepsilon >0\). Since \(\theta \in L^2([0, 1]^q)\), the Bessel inequality and (4) imply that for any \(\varepsilon >0\), there exists an integer J such that
Following Shen and Wasserman ([49], Lemma 5), we have
The first inequality is due to the monotonicity of \(\Pi\) and the independence of the \(\theta _j\)'s under \(\Pi\); the second is the Markov inequality; and the third follows from the definition of J and the fact that \(E(\theta _j-\theta _j^\star )^2=\sigma _j^2+\theta _j^{\star 2}\). The lower bound involves a ball probability for a J-dimensional normal distribution and, since such a distribution has a positive density on \(\mathbb {R}^J\), that probability is positive, proving that \(\Pi \{\theta : \Vert \theta -\theta ^\star \Vert _2\le \varepsilon \}>0\) for all \(\varepsilon >0\). Therefore, the given \(\Pi\), with variances \(\sigma _j^2\), satisfies the Kullback–Leibler condition at any \(\theta ^\star\) such that (4) holds.
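The positivity of this ball probability can also be seen by direct simulation. The sketch below truncates the series at 50 terms and uses illustrative choices \(\sigma _j=j^{-2}\), \(\theta _j^\star =0.5\,j^{-3}\), and \(\varepsilon =0.5\) (ours, not the paper's); a non-negligible fraction of prior draws lands within \(\varepsilon\) of \(\theta ^\star\) in \(\ell ^2\).

```python
import random

random.seed(2)

J_MAX = 50                                            # truncation (our choice)
sigma = [j ** -2.0 for j in range(1, J_MAX + 1)]      # prior sds sigma_j = j^-2
theta_star = [0.5 * j ** -3.0 for j in range(1, J_MAX + 1)]  # target coefficients
eps = 0.5

def sq_dist_draw():
    """Draw theta_j ~ N(0, sigma_j^2) independently and return the squared
    l2 distance between the draw and theta*."""
    return sum((random.gauss(0.0, s) - t) ** 2
               for s, t in zip(sigma, theta_star))

n = 20_000
frac = sum(sq_dist_draw() <= eps ** 2 for _ in range(n)) / n
```

The tail beyond \(J=50\) contributes expected squared mass \(\sum _{j>50}(\sigma _j^2+\theta _j^{\star 2})\), which is negligible for these decay rates, so the truncation does not distort the check.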
Next, to prove consistency with respect to the \(L^2\)-distance d, it suffices to work with the \(L^\infty\)-norm \(\Vert \cdot \Vert _\infty\), since \(\Vert \cdot \Vert _2 \le \Vert \cdot \Vert _\infty\). For an increasing sequence \(M_n\) to be specified later, define the sets
Take \(\Theta _n=\Theta _{n0} \cap \Theta _{n1}\) as the sieve. By the relation between the \(L^2\)- and \(L^{\infty }\)-distances on regression functions, the \(L^2\) covering numbers of \(\Theta _n\) are no larger than the corresponding \(L^{\infty }\) covering numbers. It follows from Theorem 2.7.1 in [54] that the \(\delta\)-covering number of \(\Theta _n\), relative to the \(L^{\infty }\)-distance, satisfies \(\log N(\Theta _n, \delta , \Vert \cdot \Vert _{\infty }) \le CM_n\delta ^{-1}\), where C is a constant that depends on neither n nor \(\delta\). If we take \(\delta <\epsilon /4\), as in Theorem 4.1, then we get \(\log N(\Theta _n, \delta , \Vert \cdot \Vert _{\infty })<nr\), for \(\beta =\delta ^2<\epsilon ^2/16<\epsilon ^2/8\), by selecting \(M_n= r n\), where \(r<\delta ^3/C\). This verifies Condition 2 of Theorem 4.1.
To verify Condition 1 of Theorem 4.1, we need to show that the \(\Pi\)-probability of \(\Theta _n^c\) is exponentially small. For this, it suffices to show that both \(\Theta ^c_{n0}\) and \(\Theta ^c_{n1}\) have exponentially small \(\Pi\)-probability. Start with \(\Theta _{n0}\). As in Choi and Schervish ([11], Sect. 6.1), the Chernoff inequality gives the following:
where \(Z_j\) is a standard normal random variable with density \(\phi\) and distribution function \(\Phi\). Here we used the formula \(E(e^{t|Z|})=2e^{t^2/2}\Phi (t)\) for the moment-generating function of the half-normal \(|Z|\).
Since \(\Phi (z)\) is concave on \([0, \infty )\), it is bounded from above, for \(z\ge 0\), by its first-order Taylor approximation at \(z=0\), so that \(2\Phi (z)\le 1+2\phi (0)z\). This implies that \(\log \{2\Phi (z)\}\le \log \{1+2\phi (0)z\}\) which, in our case, gives
By the assumption \(\sum _j a_j\sigma _j<\infty\), we also have \(\sum _j a_j^2\sigma ^2_j<\infty\). In addition, \(M_n=O(n)\). Thus, the upper bound for \(\Pi (\Theta ^c_{n0})\) is of the form \(c_1e^{-c_2n}\) for some constants \(c_1\) and \(c_2\). The same calculation, with \(b_j\) in place of \(a_j\), gives \(\Pi (\Theta ^c_{n1})\le c_1e^{-c_2n}\) for different constants \(c_1\) and \(c_2\). Since \(\Pi (\Theta ^c_{n})\le \Pi (\Theta ^c_{n0})+\Pi (\Theta ^c_{n1})\), Condition 1 of Theorem 4.1 is verified. Therefore, the posterior distribution is consistent with respect to the \(L^2\)-distance at any \(\theta ^\star\) satisfying (4).
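The half-normal moment-generating-function identity \(E(e^{t|Z|})=2e^{t^2/2}\Phi (t)\), used in the Chernoff bound above, is standard and easy to verify by Monte Carlo; the sample size and the value of t below are arbitrary choices of ours.

```python
import math, random

random.seed(3)

def half_normal_mgf(t):
    """Closed form E exp(t|Z|) = 2 exp(t^2/2) Phi(t) for Z ~ N(0, 1),
    with the normal cdf Phi written via the error function."""
    Phi = 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return 2 * math.exp(t * t / 2) * Phi

t = 0.7
n = 200_000
mc = sum(math.exp(t * abs(random.gauss(0.0, 1.0))) for _ in range(n)) / n
exact = half_normal_mgf(t)
```

The Monte Carlo average agrees with the closed form to well within its sampling error at this sample size.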
Hong, L., Martin, R. A review of Bayesian asymptotics in general insurance applications. Eur. Actuar. J. 7, 231–255 (2017). https://doi.org/10.1007/s13385-017-0151-5