Abstract
Claims modeling is a classical actuarial task aimed at understanding the claim distribution given a set of risk factors. Yet some risk factors may be subject to misrepresentation, giving rise to bias in the estimated risk effects. Motivated by the unique characteristics of real health insurance data, we propose a novel class of two-part aggregate loss models that can (a) account for the semi-continuous feature of aggregate loss data, (b) test and adjust for misrepresentation risk in insurance ratemaking, and (c) incorporate an arbitrary number of correctly measured risk factors. The unobserved misrepresentation status is captured via a latent factor shared by the two regression models on the occurrence and size of aggregate losses. For this complex two-part model, we derive explicit iterative formulas for the expectation maximization algorithm adopted in parameter estimation. Analytical expressions are obtained for the observed Fisher information matrix, ensuring computational efficiency in large-sample inferences on risk effects. We perform extensive simulation studies to demonstrate the convergence and robustness of the estimators under model misspecification, and illustrate the practical usefulness of the models through two empirical applications based on real medical claims data.
Notes
Available for download via http://www.stat.purdue.edu/~jianxi/research.html
References
Akakpo R, Xia M, Polansky A (2019) Frequentist inference in insurance ratemaking models adjusting for misrepresentation. ASTIN Bull 49:117–146
Bernard D, Banthin J (2006) Family level expenditures on health care and insurance premiums among the nonelderly population. In: MEPS Research Findings. No. 29. March 2009. Agency for Healthcare Research and Quality, Rockville
Blostein M, Miljkovic T (2019) On modeling left-truncated loss data using mixtures of distributions. Insur Math Econ 85:35–46
Blough DK, Ramsey SD (2000) Using generalized linear models to assess medical care costs. Health Serv Outcomes Res Methodol 1(2):185–202
de Jong P, Heller GZ (2008) Generalized linear models for insurance data. Cambridge University Press, New York
Diehr P, Yanez D, Ash A, Hornbrook M, Lin DY (1999) Methods for analyzing health care utilization and costs. Annu Rev Public Health 20(1):125–144
Duan N, Manning WG, Morris CN, Newhouse JP (1983) A comparison of alternative models for the demand for medical care. J Bus Econ Stat 1(2):115–126
FBI (2011) Financial crimes report. Technical report, Federal Bureau of Investigation. Retrieved Dec 11, 2019, from https://www.fbi.gov/stats-services/publications/financial-crimes-report-2010-2011/financial-crimes-report-2010-2011
Frees EW (2009) Regression modeling with actuarial and financial applications. Cambridge University Press, Cambridge
Gabaldón IM, Vázquez Hernández FJ, Watt R (2014) The effect of contract type on insurance fraud. J Insur Regul 33(8):197–230
Smyth GK, Jørgensen B (2002) Fitting Tweedie’s compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bull 32(1):143–157
Gustafson P (2014) Bayesian statistical methodology for observational health sciences data. Statistics in action. Chapman and Hall, London, pp 187–200
Hua L (2015) Tail negative dependence and its applications for aggregate loss modeling. Insur Math Econ 61:135–145
Hurn M, Justel A, Robert CP (2003) Estimating mixtures of regressions. J Comput Gr Stat 12(1):55–79
Jørgensen B, de Souza MC (1994) Fitting Tweedie’s compound Poisson model to insurance claims data. Scand Actuar J 1994(1):69–93
Kashihara D, Carper K (2009) National health care expenses in the U.S. civilian noninstitutionalized population. Statistical Brief. No. 355. January 2012. Agency for Healthcare Research and Quality, Rockville
Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44(2):226–233
Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS—a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 10:325–337
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions. Wiley, New York
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Miljkovic T, Grün B (2016) Modeling loss data using mixtures of distributions. Insur Math Econ 70:387–396
Ratovomirija G, Tamraz M, Vernic R (2017) On some multivariate Sarmanov mixed Erlang reinsurance risks: aggregation and capital allocation. Insur Math Econ 74:197–209
Scollnik DPM (2001) Actuarial modeling with MCMC and BUGS. North Am Actuar J 5(2):96–124
Scollnik DPM (2002) Modeling size-of-loss distributions for exact data in WinBUGS. J Actuar Pract 10:202–227
Xia M, Gustafson P (2016) Bayesian regression models adjusting for unidirectional covariate misclassification. Can J Stat 44(2):198–218
Xia M, Gustafson P (2018) Bayesian inference for unidirectional misclassification of a binary response trait. Stat Med 37(6):933–947
Xia M, Hua L, Vadnais G (2018) Embedded predictive analysis of misrepresentation risk in GLM ratemaking models. Variance 12(1):39–58
Zhou XH, Tu W (2000) Interval estimation for the ratio in means of log-normally distributed medical costs with zero values. Comput Stat Data Anal 35(2):201–210
Acknowledgements
We are indebted to the two anonymous referees for their very thorough reading of the paper, and the many suggestions that resulted in an improved version.
Appendices
Appendix A: Technical proofs
Proof of Proposition 2
The derivations of all three formulas hinge on the partial derivatives of the Q-function (10). We begin with the iterative formula for \(\lambda\). Setting
\(\frac{\partial }{\partial \lambda } Q\left( \varvec{\Psi }\big |\varvec{\Psi }^{(s)}\right) =0\)
yields \({{\hat{\lambda }}}^{(s+1)}= \sum _{i=1}^{n}(1-v_{i}^{*})\ \eta _{i}^{(s+1)} \big / \sum _{i=1}^{n}(1-v_{i}^{*})\).
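The update is simply a weighted average of the E-step quantities \(\eta _{i}^{(s+1)}\), with weights \(1-v_{i}^{*}\). A minimal numerical sketch in Python, where the arrays `v_star` and `eta` are hypothetical stand-ins for the E-step outputs:

```python
import numpy as np

# Hypothetical E-step outputs for n = 5 observations: v_star[i] stands in for
# v_i^* and eta[i] for eta_i^{(s+1)} in the iterative formula above.
v_star = np.array([0.10, 0.90, 0.20, 0.05, 0.80])
eta = np.array([1.4, 2.0, 1.1, 0.9, 2.3])

# M-step update: weighted average of eta with weights (1 - v_star)
lam_new = np.sum((1 - v_star) * eta) / np.sum(1 - v_star)
```
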
Turning to the estimator for \({\varvec{\alpha }}\), we first recall
For expositional reasons, assume \(x_0=1\). It is straightforward that, for \(v\in \{0,1\}\) and \(j\in \{0,{\mathcal {S}}\}\),
and
Simple algebraic operations yield the following partial derivative formula:
for \(j\in \{0,{\mathcal {S}}\}\), and
Setting \(\frac{\partial }{\partial \alpha _{j}} Q\left( \varvec{\Psi }\big |\varvec{\Psi }^{(s)}\right) =0\) for \(j\in \{0,{\mathcal {S}},k+1\}\) forms a system of \((|{\mathcal {S}}|+2)\) linear equations in the same number of unknowns. The system admits the closed-form solution \(({\varvec{B}}^T {\varvec{B}}+\varvec{E})^{-1}\ {\varvec{B}}^T \varvec{t}\), which yields the estimator for \({\varvec{\alpha }}\).
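In code, the M-step update for \({\varvec{\alpha }}\) amounts to a single linear solve. A minimal Python sketch, where `B`, `E`, and `t` are illustrative placeholders (random values, illustrative dimensions) for the matrices defined in the proposition:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, p = 10, 4           # illustrative dimensions, p playing the role of |S| + 2

B = rng.normal(size=(n_obs, p))             # stand-in for the matrix B
E = np.diag(rng.uniform(0.1, 1.0, size=p))  # stand-in for the matrix E
t = rng.normal(size=n_obs)                  # stand-in for the vector t

# Solve (B^T B + E) alpha = B^T t; a linear solve is numerically
# preferable to explicitly forming the matrix inverse
alpha = np.linalg.solve(B.T @ B + E, B.T @ t)
```
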
Finally, set \(\frac{\partial }{\partial \sigma ^{2}} Q\left( \varvec{\Psi }\big |\varvec{\Psi }^{(s)}\right) =0\). With some simple algebraic calculations, we obtain
This completes the proof of the proposition. \(\square\)
Proof of Corollary 3
Under the simplified assumption \(\pi _{v,{\varvec{x}}}=\pi _v=\beta _0+\beta _1v\) for \(v=0,1\), we first aim to find \({{\hat{\pi }}}_0\) and \(\hat{\pi }_1\) that maximize Q-function (10). Note that
Setting
\(\frac{\partial }{\partial \pi _{0}} Q\left( \varvec{\Psi }\big |\varvec{\Psi }^{(s)}\right) =0\)
and
\(\frac{\partial }{\partial \pi _{1}} Q\left( \varvec{\Psi }\big |\varvec{\Psi }^{(s)}\right) =0\),
we obtain
The iterative formula for \(\varvec{\beta }\) can be obtained via the one-to-one correspondence between \((\beta _0,\beta _1)\) and \((\pi _0,\pi _1)\). The proof is completed. \(\square\)
Proof of Proposition 5
For ease of presentation, we report only the Fisher information matrix for the \(i\)th observation, \(i=1,\ldots ,n\); the associated complete-data log likelihood function is denoted by \(l_{c,i}\). With an observed sample of size \(n\), the observed Fisher information matrix is computed by summing the individual information matrices over the \(n\) observations.
First, we study the expected Fisher information matrix associated with the complete-data log likelihood function used in the EM algorithm. To this end, we need the expected second derivatives of the complete-data log likelihood function. Tedious yet manageable calculations yield
The expected second derivatives are equal to zero otherwise. By using the aforementioned derivative formulas, the Fisher information matrix associated with the complete-data log likelihood function can now be constructed according to Eq. (13).
Next, we study the covariance matrix of the gradient vector of the complete data log likelihood function. We again set \(x_{i0}=1\), \(i=1,\ldots ,n\), for notational convenience. It holds that
Note that all the partial derivatives reported above are linear in \(z_{i}\). Consequently, we readily obtain
The application of the observed Fisher information formula in Lemma 4 completes the proof for the proposition. \(\square\)
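Numerically, the observed information in Lemma 4 (Louis [17]) is the complete-data information minus the covariance of the complete-data score, accumulated over the \(n\) observations. A schematic Python sketch, with hypothetical per-observation arrays standing in for the analytical expressions derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3

# Hypothetical stand-ins for the analytical per-observation quantities:
# info_c[i] plays the role of the expected complete-data information of
# observation i, and cov_score[i] the conditional covariance of its
# complete-data score vector.
info_c = np.stack([np.eye(p) * rng.uniform(0.5, 2.0) for _ in range(n)])
scores = rng.normal(size=(n, p))
cov_score = np.stack([0.1 * np.outer(s, s) for s in scores])

# Observed information: complete-data information minus the score
# covariance, summed over the n observations
I_obs = np.sum(info_c - cov_score, axis=0)

# Large-sample standard errors from the inverse observed information
se = np.sqrt(np.diag(np.linalg.inv(I_obs)))
```
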
Appendix B: Bayesian implementation
For the Bayesian implementation, we may use MCMC methods based on the complete-data log likelihood function (8). Either Gibbs sampling or the Metropolis–Hastings algorithm can be used for the posterior simulations. References such as Hurn et al. [14] and McLachlan and Peel [20] give comprehensive reviews of such algorithms for mixture models. In the current paper, we focus on the implementation of the two-part misrepresentation models using the BUGS language [18]. Owing to the excellent introductions by Scollnik [23, 24], the BUGS language has been widely used in the actuarial literature for implementing Bayesian models.
For the sake of illustration, let us consider the setting in Sect. 4 where there are four rating factors \((X_1,X_2,X_3,V)\) with \({\mathcal {S}}=\{1,2,3\}\) and \({\mathcal {F}}=\{1,2\}\). The rating factor \(V\) is subject to misrepresentation. For the parameters in \(\varvec{\Psi }\), we assume \(Normal(0,10)\) priors for the regression coefficients in \({\varvec{\alpha }}\) and \(\varvec{\beta }\), an inverse gamma \(IG(0.001,0.001)\) prior for the shape parameter \(\sigma\), and \(Uniform(0,1)\) priors for the misrepresentation prevalence parameters \(\lambda\) and \(\theta =\mathbb {P}[V=1]\). Such a non-informative prior specification represents a situation where we have no prior knowledge about the parameters. The following BUGS implementation of the two-part misrepresentation model utilizes the ones trick to specify the complete-data log likelihood. In the ones trick, a Bernoulli trial is assumed for a vector of ones, with the success probability of each observation set to its likelihood contribution divided by a large constant so that the probability lies in \((0,1)\).
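The authors' original listing is not reproduced in this excerpt. As a rough illustration only, a generic BUGS skeleton of the ones trick is sketched below, where `loglik[i]`, the constant `C`, and the node names are hypothetical placeholders rather than the authors' code. Note that BUGS parameterizes the normal distribution by its precision, so a \(Normal(0,10)\) prior (variance 10) corresponds to `dnorm(0, 0.1)`:

```
model {
  for (i in 1:n) {
    # loglik[i] would encode the complete-data log likelihood (8)
    # contribution of observation i under the two-part model
    p.ones[i] <- exp(loglik[i]) / C   # C: large constant keeping p.ones[i] in (0, 1)
    ones[i] ~ dbern(p.ones[i])        # ones[i] = 1 is supplied as data
  }
  # Priors as specified above (remaining parameters analogous)
  beta0  ~ dnorm(0, 0.1)   # Normal(0, 10) prior, precision parameterization
  theta  ~ dunif(0, 1)     # prevalence of V = 1
  lambda ~ dunif(0, 1)     # misrepresentation prevalence
}
```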
For the application studies, the Bayesian implementation gives similar results on the estimation of the parameters and their standard errors, so we do not present those results here. For the simulation study, Bayesian MCMC simulations are much slower than their frequentist counterparts, making repeated simulations with large sample sizes computationally prohibitive. When there is no prior knowledge of the misrepresentation behavior, the frequentist methods discussed in Sect. 3 are more convenient for implementing the proposed two-part misrepresentation models.
Cite this article
Chen, LC., Su, J. & Xia, M. Two-part models for assessing misrepresentation on risk status. Eur. Actuar. J. 11, 503–539 (2021). https://doi.org/10.1007/s13385-021-00263-4
Keywords
- Expectation maximization algorithm
- Loss modeling
- Misrepresentation fraud
- Mixture models
- Semi-continuous data