1 Introduction

In many developed countries, population ageing driven by lower fertility and mortality rates, together with the increasing cost of new medical technologies, is leading to an increase in health spending. This phenomenon has been accentuated by the spread of Covid-19. In 2019, OECD countries spent, on average, around 8.8% of their GDP on health care, with an average per capita health expenditure estimated at more than USD 4000 (OECD, 2021). Government financing schemes are the main form of health care financing in many OECD countries, but rising costs, the economic crisis and budgetary constraints have led to financial cuts in many countries and a reduction in the overall share of public health spending (Rechel, 2019). This leads to an increased demand for enrolment in health plans and for the purchase of health insurance policies on the market: in Italy, for example, the number of health plan policyholders increased from fewer than 7 million in 2013 to more than 14 million in 2019 (Ministero della Salute, 2021).

The first pillar of the Italian health system is the National Health Service (Servizio Sanitario Nazionale, SSN), which accounts for 74% of total health spending (OECD, 2021). The SSN does not allow people to opt out of the system; therefore, there is no substitute insurance, but only complementary and supplementary private health insurance. These policies cover several medical services not provided by the SSN, offer a higher level of comfort in hospital facilities and allow a wider choice between public and private providers (Tikkanen et al., 2020). The second pillar is mainly characterized by private group health plans, which are differentiated into “fully-insured” and “self-insured”.

A financial sustainability analysis of “fully-insured” and “self-insured” health plans requires the adoption of actuarial risk assessment methodologies (Carroll & Mange, 2017). This is especially true for a self-insured health plan, which is a true risk-taking entity: it retains the risk of paying the medical claims made by the plan’s participants and operates on its own.

The academic literature has mainly focused on forecasting the annual cost of healthcare in both the public and the private sector, due to its overall impact on the economy. Projections of annual healthcare expenditure, for an individual or a group, have been obtained using several methodologies: loss models, multivariate regression models (Frees et al., 2011, 2013), phase-type models (see Govorun et al. (2015) and the literature therein) and, more recently, artificial intelligence methodologies (Vimont et al., 2021; Duncan et al., 2016). For a review of models applied to healthcare in an actuarial context, the reader can refer to Yang et al. (2016) and Duncan (2018).

A standard methodology to estimate health plan expenditure is the two-part model, where the first part models a frequency component and the second a claim amount (or expenditure) (Frees et al., 2011; Duncan et al., 2016). In the actuarial literature, the frequency-severity approach (Frees, 2014) has been extensively used to model non-life insurance claim amounts. In a standard frequency-severity approach, the number of claims and the expenditure per claim are modeled separately. For an example of the use of a frequency-severity model in health insurance, see Keeler and Rolph (1988).

A key element in determining the expenditure of a health plan is the introduction of reimbursement limitations (e.g., deductibles and co-payments). On the one hand, they reduce costs for the health plan through cost sharing between the plan and its policyholders and allow a reduction in contributions; on the other hand, they motivate policyholders to reduce expenses and, more generally, counteract moral hazard (Daykin et al., 1993; van Winssen et al., 2016).

Coverage modifications change the claim frequency and severity distributions (e.g., through censoring or truncation). Parametric loss models for deductible ratemaking make it possible to measure the effects of coverage modifications on the claim frequency and severity distributions. The main advantage of these models is that they are accurate and allow the calculation of theoretically correct deductible rates (Klugman et al., 2012).

Alternatively, a regression approach can be adopted, including the deductible as an explanatory variable within a Generalized Linear Model (GLM) (Lee, 2017). A standard approach proposed in the actuarial literature for reimbursement limitations focuses on the estimate of the “Indicated Deductible Relativity” (IDR), which provides an assessment of how much the insurance loss is reduced by a deductible. Considering that a health plan offers several medical services, reimbursement rules may differ among services (e.g., dental care, specialist medical visits). Therefore, in determining a reimbursement limitation strategy, the level of protection offered to members should be considered along with the costs and risks of the health plan. Cost-sharing strategies can make it difficult for members to understand the degree of coverage offered by a health plan. A possible index is given by the Actuarial Value (AV), a measure of the relative percentages paid by a health benefits plan and its members. It is calculated using the expected health claims from a standard population, along with a plan’s cost-sharing provisions, to simulate the payment of claims. It ranges from 0 for a plan that pays nothing to 1 for a plan that pays all medical expenses.

The main goal of the paper is to assess the optimal IDR for each health service by defining a mean-variance optimization problem. In this way, we calculate the IDRs by minimizing the total variance of the expenditure of the Health Plan (HP) for a fixed expected gain. The optimization problem is solved both in the special case of no correlation and in the general case of correlation among health services. This problem adapts, to the health insurance context, the original optimization problem introduced by de Finetti (1940) in the context of proportional reinsurance. Furthermore, our optimization problem requires as input an estimate of the mean and variance of the expenditure for each health service. Hence, we propose a regression approach based on GLMs. Starting from an Italian health plan dataset, we compare two approaches: Two-Part and Tweedie. In general, these two approaches are used to determine the expected value of the response variable, and their results are not significantly different. However, for the purposes of this work it is relevant to measure the variance of the expenditure, in order to compare the optimization results obtained using different methods to estimate the mean and variance of the expenditure.

The paper is organized as follows. Section 2 describes the actuarial framework, the introduction of reimbursement limitations and the optimization problem. Section 3 compares the Two-Part regression model and the Tweedie regression model for the estimation of the mean and variance of the expenditure. Section 4 presents a case study based on a real dataset and discusses the main findings. Section 5 concludes.

2 Actuarial framework and optimization problem

2.1 The frequency-severity model

We consider a health plan (HP) offering health care services to r policyholders. We group the health services into J groups (or “branches”). Let:

  • i index the ith policyholder, \(1\le i \le r\);

  • j index the jth branch of health expenditure, \(1\le j \le J\).

Following a standard approach in the actuarial literature, we adopt a doubly stochastic aggregate expenditure model, where both the number of claims and the size of each claim are stochastic variables (see, for instance, Daykin et al. (1993)): the so-called frequency-severity model. Therefore, we separately model the number of claims in a certain time period (frequency) and the expenditure per claim (severity). In the following we denote by:

  • N the random variable (r.v.) number of claims per year;

  • Y the r.v. expenditure for a single claim.

In the classical collective risk model, the expenditure of the ith policyholder for a specific branch j is:

$$\begin{aligned} Z_{i,j}=\sum _{k=1}^{N_{i,j}}Y_{i,j,k}, \end{aligned}$$
(1)

where \(Y_{i,j,k}\) is the r.v. expenditure for the ith policyholder, due to the kth claim in the jth branch.

Under the typical assumption of independence between \(N_{i,j}\) and \(Y_{i,j,k}\), and of identically distributed claim amounts \(Y_{i,j,k}\sim Y_{i,j}, \forall k=1,\ldots ,N_{i,j}\) (there are, however, works investigating the dependence between frequency and claim amount; for further details, see Garrido et al. (2016) and Clemente et al. (2022)), we get the well-known formulas for the mean and variance of \(Z_{i,j}\) (see Daykin et al., 1993):

$$\begin{aligned} \textrm{E}\left[ Z_{i,j}\right] = \textrm{E}\left[ N_{i,j}\right] \cdot \textrm{E}\left[ Y_{i,j}\right] , \end{aligned}$$
(2)
$$\begin{aligned} \textrm{Var}\left[ Z_{i,j}\right] = \textrm{E}\left[ N_{i,j}\right] \cdot \textrm{Var}\left[ Y_{i,j}\right] + \textrm{Var}\left[ N_{i,j}\right] \cdot \textrm{E}\left[ Y_{i,j}\right] ^2. \end{aligned}$$
(3)
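As a minimal illustration of Eqs. (2)-(3), the following Python sketch computes the compound moments from given frequency and severity moments; the input values are purely illustrative and are not taken from the dataset used later in the paper.

```python
# A minimal sketch of Eqs. (2)-(3): moments of the compound expenditure
# Z = sum_{k=1}^{N} Y_k under independence of N and Y.
# All input values are illustrative placeholders.

def compound_moments(mean_n, var_n, mean_y, var_y):
    """Return (E[Z], Var[Z]) for a compound sum with independent N and Y."""
    mean_z = mean_n * mean_y                      # Eq. (2)
    var_z = mean_n * var_y + var_n * mean_y ** 2  # Eq. (3)
    return mean_z, var_z

print(compound_moments(mean_n=1.8, var_n=2.4, mean_y=120.0, var_y=9500.0))
# -> (216.0, 51660.0)
```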

In order to reduce its expenditure, the health plan can introduce payment limitations. The reimbursement amount \(L_{i,j,k}\), after the application of a coverage modification, is obtained as a function \(h(Y_{i,j,k},\Theta )\), with \(\Theta \) a set of possible coverage limitation rules. To measure the impact of the reimbursement rules on the expenditure, we focus on the r.v. proportion of expenditure reimbursed (IDR) for a single policyholder, service and claim:

$$\begin{aligned} IDR_{i,j,k}=\frac{L_{i,j,k}}{Y_{i,j,k}}, \end{aligned}$$
(4)

whereas the \(IDR_j\) at branch level is:

$$\begin{aligned} IDR_j=\frac{\sum _{i=1}^{r}{\sum _{k=1}^{N_{i,j}}L_{i,j,k}}}{\sum _{i=1}^{r}{\sum _{k=1}^{N_{i,j}}Y_{i,j,k}}}. \end{aligned}$$
(5)

2.2 The effect of reimbursement limitations on the profit and loss of a health plan

In the following we assume no administrative and operational costs and no related expense loading. We assume that the total contribution requested by the health plan for the jth service group, \(C_j\), is determined as the sum of the expected expenditure for the service, \(\textrm{E}[Z_j]\), and a safety loading \(m_j\):

$$\begin{aligned} C_j=\textrm{E}[Z_j]+m_j \end{aligned}$$
(6)

Under the typical actuarial assumption of independence between policyholders, the expected value and variance of the expenditure for a single branch are:

$$\begin{aligned} \textrm{E}[Z_j] = \sum _{i=1}^{r}\textrm{E}\left[ Z_{i,j} \right] , \end{aligned}$$
(7)
$$\begin{aligned} \textrm{Var}[Z_j] = \sum _{i=1}^{r} \textrm{Var}\left[ Z_{i,j} \right] , \end{aligned}$$
(8)

respectively.

In the following, we assume that a proportional reimbursement rule (coinsurance) is applied to each claim, based on a coinsurance factor \(0\le \alpha _j\le 1\) for each branch j. Hence, the r.v. reimbursement for a single claim, \(L_{i,j,k}\), is obtained as follows:

$$\begin{aligned} L_{i,j,k}=Y_{i,j,k} \cdot \alpha _j \end{aligned}$$
(9)

where \(\alpha _j\) is the jth element of the J-vector \(\varvec{\alpha }=(\alpha _1,\ldots ,\alpha _J)\) and it represents the proportion of expenditure reimbursed for a single claim.

It is easy to show from Eq. (5) that \(\alpha _j\) corresponds to the Indicated Deductible Relativity of the jth branch (i.e. \(IDR_j=\alpha _j\)).

However, the introduction of a policy limitation reduces not only the reimbursement at branch level but also the contribution by the same proportion. We can then define a new r.v. profit (loss) for the jth branch as follows:

$$\begin{aligned} G_j(\alpha _j)=\alpha _j\cdot \left( C_j-Z_j\right) \end{aligned}$$
(10)

whose expected value is \(\textrm{E}\left[ G_j(\alpha _j)\right] =\alpha _j\cdot m_j\).

Therefore, the total expected gain of the HP after the introduction of the reimbursement rule is a function of the \(IDR_j=\alpha _j\) as follows:

$$\begin{aligned} \textrm{E}\left[ G(\varvec{\alpha })\right] =\sum _{j=1}^J{\textrm{E}\left[ G_j(\alpha _j)\right] }=\sum _{j=1}^J{\alpha _j\cdot m_j} \end{aligned}$$
(11)

It is important to note that the expected total profit of the HP before the application of the reimbursement rules, i.e. with \(\alpha _j=1, j=1,\ldots , J\), is \(\textrm{E}\left[ G({\textbf {1}})\right] =\sum _{j=1}^{J}m_j\).

Under the assumption that the health service groups are uncorrelated, the variance of total profit is given by:

$$\begin{aligned} \textrm{Var}[G(\varvec{\alpha })] = \sum _{j=1}^{J}\textrm{Var}[G_j(\alpha _j)]=\sum _{j=1}^{J}\alpha ^2_j \cdot \sigma ^2_j \end{aligned}$$
(12)

whereas, for correlated service groups, \(\textrm{Var}[G(\varvec{\alpha })]\) is:

$$\begin{aligned} \textrm{Var}[G(\varvec{\alpha })] = \sum _{j=1}^{J} \sum _{w=1}^{J} \alpha _j \alpha _w \sigma _{jw} \end{aligned}$$
(13)

where \(\sigma ^2_j=\textrm{Var}\left[ Z_j \right] \) is the variance of \(Z_j\) and \(\sigma _{jw}=\textrm{cov}\left[ Z_j,Z_w \right] \) is the covariance between \(Z_j\) and \(Z_w\).
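Both variance formulas can be evaluated with a single quadratic form, since Eq. (12) is the special case of Eq. (13) with a diagonal \(\Sigma \). A minimal sketch, with a purely illustrative two-branch covariance matrix:

```python
import numpy as np

# Var[G(alpha)] = alpha^T Sigma alpha, Eq. (13); Eq. (12) is the special
# case of a diagonal Sigma. The 2x2 matrix below is illustrative only.

def var_profit(alpha, sigma):
    alpha = np.asarray(alpha, dtype=float)
    return alpha @ sigma @ alpha

sigma = np.array([[4.0, 1.2],
                  [1.2, 9.0]])            # covariance of (Z_1, Z_2)
print(var_profit([1.0, 1.0], sigma))      # full reimbursement: 15.4
print(var_profit([0.5, 1.0], sigma))      # reducing branch 1: 11.2
```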

2.3 The optimization problem

Once the expected value and variance of the r.v. \(Z_j\) have been defined, it is possible to introduce an optimization problem to determine the optimal \(IDR_j=\alpha _j\), \(j=1,\ldots , J\), for each branch. The optimal strategy is achieved by finding the vector \(\varvec{\alpha }^*\) that minimizes the total variance \(\textrm{Var}\left[ G(\varvec{\alpha }) \right] \) once the expected profit target \(\bar{G}\) has been set. Then, denoting by \(\Sigma \) the non-singular variance-covariance matrix, we can formalize the optimization problem:

$$\begin{aligned} \left\{ \begin{array}{l} \min \varvec{\alpha }^T \Sigma \, \varvec{\alpha } \\ \text {s.t.}\\ \varvec{\alpha }^T{\textbf {m}} = \bar{G} \\ 0<\varvec{\alpha }\le 1 \end{array}\right. \end{aligned}$$
(14)

where \({\textbf {m}}\) is the J-vector of safety loadings, whose generic element is \(m_j\).
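Problem (14) is a quadratic program and can be solved with any constrained optimizer. The following is a minimal sketch using SciPy's SLSQP routine; the strict lower bound \(0<\alpha _j\) is approximated by a small positive constant, and the function name `solve_idr` as well as the inputs `sigma`, `m` and `g_bar` are illustrative placeholders, not quantities from the paper's dataset.

```python
import numpy as np
from scipy.optimize import minimize

# A sketch of problem (14): minimize alpha^T Sigma alpha subject to
# alpha^T m = G_bar and 0 < alpha <= 1 (strict bound approximated by 1e-9).

def solve_idr(sigma, m, g_bar):
    J = len(m)
    cons = [{"type": "eq", "fun": lambda a: a @ m - g_bar}]
    bounds = [(1e-9, 1.0)] * J
    x0 = np.full(J, g_bar / m.sum())   # feasible start on the constraint plane
    res = minimize(lambda a: a @ sigma @ a, x0,
                   method="SLSQP", bounds=bounds, constraints=cons)
    return res.x

# Illustrative inputs: three branches.
m = np.array([10.0, 20.0, 30.0])            # safety loadings m_j
sigma = np.diag([400.0, 900.0, 2500.0])     # uncorrelated branches
alpha_star = solve_idr(sigma, m, g_bar=0.8 * m.sum())
```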

Following Pressacco and Ziani (2010), it is possible to write the Lagrangian of this problem as:

$$\begin{aligned} L(\varvec{\alpha },\tau ,{\textbf {u}},{\textbf {v}})=\frac{1}{2}\varvec{\alpha }^T \Sigma \varvec{\alpha }+\tau \cdot \left( \bar{G}-\varvec{\alpha }^T{\textbf {m}}\right) +{\textbf {u}}\cdot (\varvec{\alpha }-1)-{\textbf {v}}\cdot \varvec{\alpha } \end{aligned}$$
(15)

Making recourse to the Karush-Kuhn-Tucker (KKT) conditions, it turns out that \(\varvec{\alpha }^{*}\) is optimal if there exists a non-negative triplet \(\left( \tau ^{*},{\textbf {u}}^{*},{\textbf {v}}^{*} \right) \) such that:

  i) \(\varvec{\alpha }^{*}\) minimizes \(L(\varvec{\alpha },\tau ,{\textbf {u}}^{*},{\textbf {v}}^{*})\);

  ii) \(\varvec{\alpha }^{*}\) is feasible;

  iii) either \({\alpha _j}^{*}=0\) or \({v_j}^{*}=0\) (or both), and either \({\alpha _j}^{*}=1\) or \({u_j}^{*}=0\) (or both).

The optimization problem (14) is the same as the one proposed in De Finetti (1940), where a solution to the optimal retention problem in the reinsurance market is provided.

De Finetti described an algorithm to solve this optimization problem in the uncorrelated case and suggested a procedure for the general case of correlated risks under an implicit “regularity hypothesis” (see Pressacco and Ziani (2010)). The basic idea underlying de Finetti's solution, even in the case of correlation, is that, rather than dealing with two goal variables (variance and expectation), it is convenient to work with what he called advantage functions:

$$\begin{aligned} F_j(\varvec{\alpha })= {\left\{ \begin{array}{ll} \frac{\sigma ^2_{j}}{m_{j}}\alpha _j, &{} \text {uncorrelated branches}\\ \sum _{w=1}^{J}\frac{ \sigma _{jw}}{m_j}\alpha _w, &{} \text {correlated branches} \end{array}\right. }, \, \, \,j=1,\ldots , J \end{aligned}$$
(16)

The advantage functions intuitively capture the advantage coming, at \(\varvec{\alpha }\), from a small reduction of the jth risk. As stated in Pressacco and Ziani (2010, p. 73): “Given the form of the advantage functions, it was easily seen that this implied a movement on a segment of the cube characterized by the set of equations \(F_i(x) = \lambda \) for all the current best performers. Here \(\lambda \) plays the role of the benefit parameter. And we should go on this segment until the advantage function of another, previously non efficient, risk matches the current value of the benefit parameter, thus becoming a member of the efficient set. Accordingly, at this point, the direction of the efficient path is changed as it is defined by a new set of equations, with the addition of the equation of the newcomer risk.”

In our context this means that, starting from the case of absence of reimbursement rules, i.e. \(\bar{G}=\textrm{E}\left[ G({\textbf {1}}) \right] =\sum _{j=1}^{J}m_j\), the introduction of reimbursement rules requires the reduction of the most efficient branch in terms of the advantage function, i.e. \(\left\{ j:\underset{j}{\max }\,F_j({\textbf {1}})\right\} \). The reduction on branch j stops when the advantage function of another branch, say w, becomes higher than the one measured on j.

To find the path of optimal solutions, we solve problem (14) for several decreasing values of the expected gain \(\bar{G}\), as illustrated in the sketch below.
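A short sketch of this sweep, reusing `solve_idr`, `sigma` and `m` from the previous sketch (all illustrative placeholders):

```python
import numpy as np

# Tracing the path of optimal solutions: re-solve problem (14) on a grid
# of decreasing targets G_bar = delta * sum_j m_j.

for delta in np.linspace(1.0, 0.2, 5):
    alpha_star = solve_idr(sigma, m, delta * m.sum())
    print(f"delta={delta:.1f}  alpha*={np.round(alpha_star, 3)}")
```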

In order to solve the optimization problem, it is first necessary to obtain robust estimates \(\hat{\textrm{E}}\left[ Z_{i,j}\right] \) and \(\hat{\textrm{Var}}\left[ Z_{i,j}\right] \), and then calculate \(\hat{\textrm{E}}\left[ Z_{j}\right] \) and \(\hat{\textrm{Var}}\left[ Z_{j}\right] \) according to Eqs. (7) and (8). To this aim, we propose in the following section two actuarial approaches based on regression models.

3 One-part model versus two-part model for health expenditure estimate

In the actuarial literature, a usual approach to predict the cost of claims is to decompose the data into two parts: one for the frequency, indicating whether a claim has occurred or, more generally, the number of claims; one for the severity, indicating the amount of a claim. This approach is also known as the Two-Part or Two-Stage model (Duan et al., 1983) and, in health care expenditure analysis, has been proposed in Frees (2010) and Frees et al. (2013), among others.

However, the demand for health-care services can be influenced by a set of demographic and/or geographic variables, among others. Hence, a univariate analysis for each homogeneous risk class is not feasible, and a multivariate approach is more appropriate. We adopt a GLM (McCullagh & Nelder, 1989) as it represents the industry standard for ratemaking. For further details on the basic theory of GLMs see Appendix 1. For a full description of regression models in actuarial and financial applications, the reader can refer to Frees (2010).

As the expenditure per single policyholder and category has a probability mass at zero and a positive, continuous and often right-skewed component for positive values, Tweedie regression is a valid alternative to Two-Part models for predicting such expenditure (Kurz, 2017).

3.1 A negative binomial-Gamma two-part regression model for the estimate of the health expenditure

Fixing the jth branch, the goal is the calculation of \(\textrm{E}[N_{i,j}]\), \(\textrm{Var}[N_{i,j}]\), \(\textrm{E}[Y_{i,j}]\) and \(\textrm{Var}[Y_{i,j}]\), which are necessary for the estimate of the mean and variance of \(Z_{i,j}\) (see (2) and (3)). To this aim, we consider a two-part model where a Negative Binomial regression is used for the estimate of the frequency component \(N_j\) and a Gamma regression for the estimate of the severity component \(Y_j\).

Considering (29) with a logarithmic link function, it is possible to write the density of the Negative Binomial r.v. as:

$$\begin{aligned} f_{N_j}\left( n_i,\textbf{x}_{i},\beta _{N_j},\theta _j\right) = \frac{ \Gamma \left( n_i+ 1/\theta _j\right) }{\Gamma \left( n_i+1 \right) \cdot \Gamma \left( 1/\theta _j \right) }\cdot \frac{(\theta _j e^{\textbf{x}_i \cdot \beta _{N_j}})^{n_i}}{(1+\theta _j e^{\textbf{x}_i \cdot \beta _{N_j}} )^{n_i+1/\theta _j}} \end{aligned}$$
(17)

The likelihood function associated with the Negative Binomial model is then:

$$\begin{aligned} L_{N_j}\left( n_i|\textbf{x}_{i},\beta _{N_j},\theta _j\right) = \prod _{i=1}^{r} f_{N_j}\left( n_i,\textbf{x}_{i},\beta _{N_j},\theta _j\right) \end{aligned}$$
(18)

Therefore, it is possible to estimate the \(\beta _{N_j}\) and \(\theta _j\) parameters by maximizing (18). Exploiting (29) and (28), it is possible to calculate estimates of \(\textrm{E}\left[ N_{i,j}\right] \) and \(\textrm{Var}\left[ N_{i,j}\right] \). It is important to note that J Negative Binomial GLMs must be fitted, one for each branch.

For the severity component, selecting the logarithmic link function, it is possible to write the density of the Gamma r.v. as:

$$\begin{aligned} f_{Y_j}\left( y_i,\textbf{x}_{i},\beta _{Y_j},\eta _j \right) = \frac{1}{y_i\cdot \Gamma (\eta _j)}\cdot \left( \frac{\eta _j\cdot y_i}{exp(\textbf{x}_{i}\beta _{Y_j})}\right) ^{\eta _j}\cdot e^{-\eta _j\cdot y_i/\left( exp(\textbf{x}_{i}\beta _{Y_j})\right) } \end{aligned}$$
(19)

The likelihood function associated with the Gamma model is:

$$\begin{aligned} L_{Y_j}\left( y_i|\textbf{x}_{i},\beta _{Y_j},\eta _j\right) = \prod _{i=1}^{r} f_{Y_j}\left( y_i,\textbf{x}_{i},\beta _{Y_j},\eta _j \right) \end{aligned}$$
(20)

Therefore, it is possible to estimate the \(\beta _{Y_j}\) and \(\eta _j\) parameters by maximizing (20). Exploiting (29) and (28), it is possible to calculate estimates of \(\textrm{E}\left[ Y_{i,j}\right] \) and \(\textrm{Var}\left[ Y_{i,j}\right] \). It is important to note that J Gamma GLMs must be fitted, one for each branch.
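The two-part fit described above can be reproduced, for a single branch, with standard GLM software. The following is a hedged sketch using Python's statsmodels; `df` is a hypothetical data frame with columns `n_claims`, `total_amount`, `sex` and `age`, and the covariate coding is a simplified stand-in for the one used in Sect. 4, not the paper's actual specification.

```python
import pandas as pd
import statsmodels.api as sm

# A sketch of the two-part model of Sect. 3.1 for one branch j.
# `df` is a hypothetical frame with columns n_claims, total_amount, sex, age.

X = pd.get_dummies(df[["sex"]], drop_first=True).astype(float)
X["age"] = df["age"]
X["age2"] = df["age"] ** 2
X["age3"] = df["age"] ** 3
X = sm.add_constant(X)

# Frequency: Negative Binomial (NB2) with log link; the fitted `alpha`
# plays the role of theta_j, so Var[N] = mu + alpha * mu^2.
nb_res = sm.NegativeBinomial(df["n_claims"], X).fit(disp=0)
mu_n = nb_res.predict(X)
var_n = mu_n + nb_res.params["alpha"] * mu_n ** 2

# Severity: Gamma GLM with log link, fitted on claimants only.
pos = df["n_claims"] > 0
y = df.loc[pos, "total_amount"] / df.loc[pos, "n_claims"]  # amount per claim
gamma_res = sm.GLM(y, X[pos],
                   family=sm.families.Gamma(sm.families.links.Log())).fit()
mu_y = gamma_res.predict(X)
var_y = gamma_res.scale * mu_y ** 2  # Gamma: Var[Y] = phi * mu^2

# Combine per policyholder via Eqs. (2)-(3) and aggregate via (7)-(8).
mean_z = mu_n * mu_y
var_z = mu_n * var_y + var_n * mu_y ** 2
E_Zj, Var_Zj = mean_z.sum(), var_z.sum()
```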

3.2 Tweedie regression model for the estimate of mean and variance of the expenditure

In this section, we use a Tweedie regression to obtain, with a single model, an estimate of the mean and conditional variance of the response variable. Considering the collective risk model in Eq. (1), we temporarily remove, for the sake of simplicity, the dependence on i and j. Suppose that N has a Poisson distribution with mean \(\psi \), and that each Y has a Gamma distribution with shape parameter \(\zeta \) and scale parameter \(\rho \). The Tweedie distribution is derived as the Poisson sum of Gamma variables. By using iterated expectations, its mean and variance can be expressed as:

$$\begin{aligned} \textrm{E} (Z)=\psi \zeta \rho ~~~~\textrm{and}~~~~\textrm{Var} (Z)=\psi \zeta (1+\zeta )\rho ^2. \end{aligned}$$
(21)

Now, define three parameters \(\mu \), \(\phi \) and p through the following reparametrization:

$$\begin{aligned} \psi =\frac{\mu ^{2-p}}{\phi (2-p)},~~~~\zeta =\frac{2-p}{p-1},~~~~\rho =\phi (p-1)\mu ^{p-1}, \end{aligned}$$
(22)

It is easy to show that \(\textrm{E}(Z)=\mu \) and \(\textrm{Var} (Z)=\phi \mu ^{p}\) (see Kaas (2005) and Tweedie (1956)). As a consequence, under these assumptions Z is a Tweedie random variable with mean \(\mu >0\) and variance \(\phi \mu ^{p}\) \(\left( \phi >0\right) \), where \(\phi \) is the so-called dispersion parameter and \(p \in (- \infty ,0] \cup [1,\infty )\) is the power parameter.
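A quick Monte Carlo check of the reparametrization (22), simulating the Poisson sum of Gammas and comparing the empirical moments with \(\mu \) and \(\phi \mu ^p\); the parameter values below are chosen arbitrarily for illustration.

```python
import numpy as np

# Numerical check of Eqs. (21)-(22): simulate Z as a Poisson(psi) sum of
# Gamma(zeta, rho) variables and compare with mu and phi * mu^p.
rng = np.random.default_rng(0)
mu, phi, p = 200.0, 4.0, 1.5

psi = mu ** (2 - p) / (phi * (2 - p))
zeta = (2 - p) / (p - 1)
rho = phi * (p - 1) * mu ** (p - 1)

counts = rng.poisson(psi, size=100_000)
# Sum of n iid Gamma(zeta, rho) variables is Gamma(n * zeta, rho).
z = np.array([rng.gamma(zeta * n, rho) if n > 0 else 0.0 for n in counts])
print(z.mean(), mu)               # both ~ 200
print(z.var(), phi * mu ** p)     # both ~ 11313.7
```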

It is worth noting that the Tweedie distribution is a special case of the exponential dispersion models, the class of models used to describe the random component in GLMs; this allows us to use the Tweedie distribution within a GLM to model \(\textrm{E}(Z_{i,j})\) and \(\textrm{Var}(Z_{i,j})\).

The conditional expected expenditure and variance for the ith policyholder and jth branch are given by:

$$\begin{aligned} \textrm{E}\left[ Z_j|\textbf{x}_{i} \right] =\textrm{E}\left[ Z_{i,j} \right] =g^{-1}(\textbf{x}_{i} \cdot {\beta }_{Z_j}), \,\,\,\, \textrm{Var} \left[ Z_{j}|\textbf{x}_{i} \right] =\textrm{Var} \left[ Z_{i,j} \right] =\phi \cdot \mu _{i,j}^p \end{aligned}$$
(23)

The estimates of the row vector of regression parameters \({\beta }_{Z_j}\), of the dispersion parameter \(\phi \) and of the power p are obtained by a Maximum Likelihood Estimation (MLE) approach (see Frees (2014)).
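In software, the Tweedie GLM can be fitted once the power is fixed (or profiled over a grid). A minimal sketch with statsmodels, where `df` and `X` are the hypothetical data frame and design matrix of the Sect. 3.1 sketch, and the power p = 1.5 is an arbitrary placeholder for the value the paper estimates by MLE:

```python
import statsmodels.api as sm

# A sketch of the Tweedie GLM of Eq. (23) with log link. The power p is
# fixed here for brevity; in the paper p is estimated by MLE.

p = 1.5
fam = sm.families.Tweedie(var_power=p, link=sm.families.links.Log())
tw_res = sm.GLM(df["total_amount"], X, family=fam).fit()

mu_ij = tw_res.predict(X)          # E[Z_{i,j} | x_i], Eq. (23)
phi_hat = tw_res.scale             # Pearson-based dispersion estimate
var_ij = phi_hat * mu_ij ** p      # Var[Z_{i,j} | x_i], Eq. (23)
E_Zj, Var_Zj = mu_ij.sum(), var_ij.sum()
```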

It is worth noting that J Tweedie regressions must be fitted, one for each branch. In the following section, we report a numerical investigation applied to a real dataset.

4 Numerical application

4.1 Data set

We consider a dataset from an Italian HP, where \(J=5\) branches are covered. A description of each branch is shown in Table 1.

Table 1 Description of branches policy limitations

The observations cover a time frame of three years (2017, 2018 and 2019). Each year contains the claims of \(r=16,206\) policyholders, whose ages vary from 0 to 69. In the following, we report some descriptive statistics of this HP between 2017 and 2019.

In Table 2, the number of claims and the total expenditure (data in EUR thousand) are classified by branch; SMV (\(j=5\)) is the branch with the greatest number of claims, whereas the greatest annual expenditure corresponds to DC (\(j=3\)).

Table 2 Number of claims and total expenditure (in EUR thousand) per branch

In Table 3, the observed frequencies for the three most claimed branches (SMV, DC and OH) are reported by sex and age class.

SMV shows a greater frequency for females than for males, with a peak in age class [50, 60). Females also have a higher frequency for DC and OH, but the differences between genders are smaller than in the SMV case.

Table 3 Observed frequency in \(\%\) of branch \(j=5,3,1\) for sex and age

In Table 4, the observed average expenditure per claim for the three most claimed branches (SMV, DC and OH) is reported by sex and age class.

OH shows no significant differences by sex; conversely, females are in general more expensive than males for SMV and DC.

Table 4 Observed average expenditure per claim (in EUR thousand) of branch \(j=5,3,1\) for sex and age class

In Table 5, the observed average expenditure per policyholder for the three most claimed branches (SMV, DC and OH) is reported by sex and age class. It is important to note that, in the SMV case, females have a greater expenditure per policyholder than males, with a peak in age class [40, 50). For DC and OH, the higher riskiness of females is confirmed; furthermore, considering the whole population, the most expensive age classes are [50, 60) for DC (144.85 EUR thousand) and [40, 50) for OH (16.67 EUR thousand).

Table 5 Observed average expenditure per policyholder (in EUR thousand) of branch \(j=5,3,1\) for sex and age class

4.2 Regression models results

In the following paragraphs, the regression model outcomes are reported. All the regression models introduced in the previous section use sex and age as covariates. This choice is motivated by actuarial practice and literature (Frees, 2014; Duncan, 2018), where these demographic factors are considered crucial in any model for annual expenditure. We limit the analysis to these two risk factors to facilitate the presentation of results. The age variable is treated as quantitative, and we use a polynomial function whose degree is selected by a stepwise procedure.

For the estimate of the expected number of claims, we consider a Negative Binomial model; the results in Table 6 are largely in line with those found in the literature (Frees et al., 2011): sex and age are statistically significant, and females show greater frequencies than males for all branches, in accordance with the observed frequencies in Table 3. The stepwise regression suggests a third-degree polynomial to model age, without the coefficient of degree 1.

Table 6 Output of negative binomial model

For the expenditure amounts (Gamma regression model), Table 7 shows that almost every age coefficient is not significant. It is important to note that the most significant age coefficients are those of SMV, in accordance with Table 4. Age is modelled by a third-degree polynomial without the coefficient of degree 1.

Table 7 Output of Gamma model

As previously explained, a valid alternative approach to estimate the mean and variance of \(Z_{i,j}\) is the application of the Tweedie regression, whose results are reported in Table 8. As one can see, females are riskier than males, in accordance with the observations reported in Table 5. The coefficients are all significant. Age is modeled by a third-degree polynomial without the coefficient of degree 1.

Table 8 Output of Tweedie model

In order to measure the estimation accuracy, we introduce a goodness-of-fit (GoF) measure called the Nash-Sutcliffe Efficiency (NSE). The latter is a normalized statistic that determines the relative magnitude of the residual variance compared to the variance of the observed data (see Nash & Sutcliffe, 1970). Given a dependent variable y and its estimate \(\hat{y}\), the NSE is defined as: \(NSE=1- \frac{\sum (y - \hat{y})^2 }{\sum (y -\textrm{E}(y))^2}\).

The NSE can be interpreted as a test statistic for the accuracy of model predictions. It ranges from \(-\infty \) to 1: if \(NSE=1\), there is a perfect match between the model and the observed data; if \(NSE=0\), the model predictions are as accurate as the mean of the observed data; if \(-\infty<NSE<0\), the observed mean is a better predictor than the model. The NSE is thus a GoF measure that allows one to accept or reject the model using 0 as the critical value.
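The NSE is straightforward to compute; a minimal sketch with illustrative values:

```python
import numpy as np

# Direct transcription of the NSE formula; y and y_hat are illustrative.
def nse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

print(nse([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))  # 0.98: accepted
```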

Table 9 reports the NSE results:

Table 9 NSE values for branch. Two-Part and Tweedie case

All the values are positive and therefore compliant with acceptance.

Finally, exploiting Eqs. (2) and (3), it is possible to calculate \(\textrm{E}[Z_{i,j}]\) and \(\textrm{Var}[Z_{i,j}]\) in the Two-Part model case, whereas the Tweedie model gives a direct estimate of these values. By (7) and (8), we can then obtain an estimate of the mean and variance of the expenditure for each branch; the results for 2019 are reported in Table 10.

Table 10 \(\textrm{E}[Z_j]\) (data in EUR) and \(\textrm{Var}[Z_j]\) (data in EUR squared) for branch, Year 2019

The branch DC has the greatest variance of expenditure. If we consider the ratio between variance and mean, representing the variance per unit of mean, the DC value is the greatest (1558 and 1807 in the Two-Part and Tweedie estimates, respectively) and the OH value is the lowest (102 and 96 in the Two-Part and Tweedie estimates, respectively), confirming the ranking observed in Table 11.

4.3 The calculation of the optimal IDR

We now show an example of the calculation of the optimal IDRs in the cases of no correlation and of correlation among branches. Before proceeding, we need to establish the general rule used to assess the risk loading \(m_j\) for each branch j. To this aim, standard actuarial pricing principles (see Gerber (1979)) use a functional that assigns a premium to any distribution of the expenditure per policyholder. Traditional “loaded” net premium principles correct the equivalence principle to generate a premium that is larger than the expected loss. In the following we use the simplest such method (the expected value principle), which corrects the equivalence premium by adding a positive “safety loading” proportional to the expected loss, i.e. \(m_j=\beta \cdot \textrm{E}[Z_j]\), \(\beta >0\). Therefore, in the following we consider \(\beta =10\%\) for each branch, and the risk loadings \(m_j \, (j=1,\ldots , 5)\) are obtained from the values of \(\textrm{E}[Z_j]\) shown in Table 10. To find the path of optimal solutions, we solve the optimization problem in Eq. (14) for several decreasing values of the expected gain \(\bar{G}\), set as a proportion \(0\le \delta \le 1\) of the maximum expected gain \(\textrm{E}[G({\textbf {1}})]\), i.e. \(\bar{G}=\delta \cdot \textrm{E}[G({\textbf {1}})]\). These solutions are equivalent to considering an overall IDR (or AV) equal to \(\delta \).

4.3.1 The no-correlation case

Assuming the absence of correlation among branches, the optimization problem in Eq. (14) can be solved numerically using Eq. (15), where the objective function simplifies to the weighted sum of the \(\textrm{Var}[Z_j]\), with weights \(\alpha ^2_j\).

Alternatively, identical solutions can be obtained by following the de Finetti “algorithm” (for further details see Pressacco and Ziani (2010)) proposed in his famous “problema dei pieni” (De Finetti (1940)). In the uncorrelated case, the de Finetti “algorithm” can be summarized as follows (a sketch in code is given after the list):

  i) order the branches by the terms \(F_j(\textbf{1})=\frac{\sigma ^2_j }{m_j}\), \(j=1,\ldots ,J\), which is the expression of the advantage function in Eq. (16) in the no-correlation case;

  ii) by reducing the expected gain to a level \(\bar{G}\), the IDRs must be reduced accordingly, following the order defined in step i): the first IDR to be reduced is the one for which \(\frac{\sigma ^2_j }{m_j}\) is maximum and the last to be reduced is the one for which \(\frac{\sigma ^2_j }{m_j}\) is minimum;

  iii) for any \(0\le \lambda \le \max _j {\frac{\sigma ^2_j }{m_j}}\), an optimal retention is given by \(\alpha _j^*=\min \left\{ \frac{\lambda }{\frac{\sigma ^2_j }{m_j}},1 \right\} \).
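A minimal sketch of steps i)-iii); the variances and loadings below are illustrative placeholders, not the figures of Tables 10 and 11.

```python
import numpy as np

# de Finetti's closed form in the uncorrelated case (steps i-iii):
# alpha_j*(lambda) = min(lambda / (sigma_j^2 / m_j), 1).

sigma2 = np.array([3.0e9, 4.0e8, 2.5e7])      # Var[Z_j] (illustrative)
m = np.array([60_000.0, 40_000.0, 15_000.0])  # safety loadings m_j
f = sigma2 / m                                # advantage functions F_j(1)

for lam in [f.max(), f.max() / 2, f.max() / 10]:
    print(f"lambda={lam:9.1f}  alpha*={np.minimum(lam / f, 1.0)}")
# The branch with the largest F_j(1) is the first whose IDR drops below 1.
```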

As previously stated, \(\lambda \) plays the role of the benefit parameter. Table 11 shows the values of \(F_j(\textbf{1})\) obtained using the values in Table 10.

Table 11 de Finetti ordered \(F_j(\textbf{1})\) values (data in EUR). No-Correlation case

According to the de Finetti algorithm, we expect the first branch to move to be DC and the last OH. This is confirmed by the values reported in Table 12, which shows the optimal IDRs for decreasing values of the contribution proportion \(\delta \) under the Two-Part regression model. Similar values are obtained with the Tweedie regression model.

Table 12 No-correlation case–optimal IDRs for different values of expected contribution reduction \(\delta \). Two-Part regression model. Mean and standard deviation in EUR

Figure 1 gives a direct representation of the functioning of the de Finetti algorithm. Indeed, the first branch to be reduced is DC, as it has the maximum \(F_3(\textbf{1})\), whereas the last one is OH, where the minimum \(F_1(\textbf{1})\) is observed. It is worth noting that the second highest value of the ratio is \(F_2(\textbf{1})\) (SC), for which the Tweedie model shows a higher value than the Two-Part model. This causes the SC IDR to move from 1 to 0 more quickly, for high values of \(\delta \), in the Tweedie model than in the Two-Part one.

Fig. 1 No-correlation case–optimal IDR paths for different values of overall IDR \(\delta \)

4.3.2 The general case of correlated branches

To provide an analysis of correlated branches, it is necessary to introduce the correlation matrix \(\Omega \) and, consequently, to calculate the variance-covariance matrix \(\Sigma \) for each of the two regression models:

$$\begin{aligned} \Omega = \begin{bmatrix} 1.0000 & 0.0540 & 0.1048 & 0.0959 & 0.2521 \\ 0.0540 & 1.0000 & 0.0122 & 0.1537 & 0.0877 \\ 0.1048 & 0.0122 & 1.0000 & 0.0294 & 0.0926 \\ 0.0959 & 0.1537 & 0.0294 & 1.0000 & 0.1495 \\ 0.2521 & 0.0877 & 0.0926 & 0.1495 & 1.0000 \end{bmatrix} \end{aligned}$$
(24)
$$\begin{aligned} \Sigma _{Two-Part} = \begin{bmatrix} 25,598,871 & 3,775,780 & 29,334,013 & 4,234,214 & 25,721,650 \\ 3,775,780 & 190,682,291 & 9,329,362 & 18,511,149 & 24,418,098 \\ 29,334,013 & 9,329,362 & 3,060,383,134 & 14,206,070 & 103,304,284 \\ 4,234,214 & 18,511,149 & 14,206,070 & 76,112,543 & 26,297,286 \\ 25,721,650 & 24,418,098 & 103,304,284 & 26,297,286 & 406,617,622 \end{bmatrix} \end{aligned}$$
(25)
$$\begin{aligned} \Sigma _{Tweedie} = \begin{bmatrix} 24,147,372 & 4,923,393 & 30,671,320 & 4,104,565 & 23,784,895 \\ 4,923,393 & 343,698,039 & 13,484,093 & 24,804,849 & 31,212,104 \\ 30,671,320 & 13,484,093 & 3,546,897,983 & 15,264,418 & 105,884,565 \\ 4,104,565 & 24,804,849 & 15,264,418 & 75,822,123 & 24,989,563 \\ 23,784,895 & 31,212,104 & 105,884,565 & 24,989,563 & 368,588,641 \end{bmatrix} \end{aligned}$$
(26)
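The variance-covariance matrices in Eqs. (25)-(26) follow from the correlation matrix in Eq. (24) as \(\Sigma = D\,\Omega \,D\), where D is the diagonal matrix of the branch standard deviations. A short check for the Two-Part case, using the diagonal of Eq. (25):

```python
import numpy as np

# Rebuilding Eq. (25) from Eq. (24): Sigma = D Omega D, where
# D = diag(sd_1, ..., sd_J) and sd_j = sqrt(Var[Z_j]).

omega = np.array([
    [1.0000, 0.0540, 0.1048, 0.0959, 0.2521],
    [0.0540, 1.0000, 0.0122, 0.1537, 0.0877],
    [0.1048, 0.0122, 1.0000, 0.0294, 0.0926],
    [0.0959, 0.1537, 0.0294, 1.0000, 0.1495],
    [0.2521, 0.0877, 0.0926, 0.1495, 1.0000],
])
var_z = np.array([25_598_871, 190_682_291, 3_060_383_134,
                  76_112_543, 406_617_622], dtype=float)
d = np.diag(np.sqrt(var_z))
sigma_two_part = d @ omega @ d   # matches Eq. (25) up to rounding
# The solver sketched in Sect. 2.3 then applies to this Sigma unchanged.
```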

Once again, the optimization problem in Eq. (14) can be solved numerically; alternatively, the de Finetti “algorithm” can be extended to this setting. The latter, however, is based on the advantage function in Eq. (16), where the condition \(F_j(\varvec{\alpha })=\lambda \) depends on the variances and covariances between branches. Hence, a more complex algorithm than the one introduced in the no-correlation case is required; for further details, the reader can refer to Pressacco and Serafini (2007). However, the basic rule of using the maximum advantage function to identify the first risk to be reduced still holds. Table 13 shows the ranking obtained in the correlated-branches case.

Table 13 de Finetti ordered \(F_j(\textbf{1})\) values (data in EUR). Correlation case
Table 14 Correlation case - Optimal IDR for different values of expected contribution reduction \(\delta \). Two-Part regression model. Mean and standard deviation in EUR

As one can see in Table 14, the first branch to move is DC and the last is OH, as in the no-correlation case. This time, however, the Tweedie model does not produce similar values, as shown in Fig. 2. Indeed, the ranking changes: under the Tweedie model, the branch with the maximum value of \(F_j(\textbf{1})\) is SC instead of DC (see Table 13), unlike under the Two-Part model.

Fig. 2 Correlation case - Optimal IDR paths for different values of overall IDR \(\delta \)

Fig. 3 A comparison of optimal IDR solutions with \(\delta =0.6\) for uncorrelated and correlated branches

5 Conclusion

This study investigates a methodological framework to assess the optimal reimbursement strategy for employer-based health insurance plans. The reimbursement strategy is measured in terms of the percentage of expenditure reimbursed by the health plan, defined in the actuarial literature as the Indicated Deductible Relativity and, in the U.S., as the Actuarial Value. The IDR may be different for each category of services (or branch) provided by the HP, and the optimal strategy may be found by minimizing the risk, measured through the variance, for a fixed expected gain. To this aim, we propose an optimization problem similar to the one proposed and solved by de Finetti (1940) in the context of proportional reinsurance. First, we adopt a basic actuarial regression model based on a two-part approach (Negative Binomial-Gamma model) and compare it with a Tweedie regression model. The latter has the main advantage of analyzing the distribution of the loss, representing the total expenditure per policyholder, as a whole. This distribution is particularly suitable given the large number of zero claims we observe in the data for each group of services. The regression models allow for the calculation of the mean and variance of the expenditure r.v. for each policyholder profile considered. Then, assuming independence among policyholders, we assess the overall expected expenditure and variance for each branch. These values are the inputs of the optimization problem, whose aim is to identify the optimal IDR for each branch in the absence or presence of correlation among branches.

We calibrate our model to a real database covering five types of services. The findings confirm that the advantage function, as defined by de Finetti, is the main quantity to be controlled in defining an optimal strategy. In the case of the HP considered, the positive correlation observed among branches is another relevant element to be taken into account. In Fig. 3, we compare the sets of optimal solutions obtained with \(\delta =0.60\).

The quantitative analysis shows that, to obtain the same gain, the positive correlation case requires a greater reduction of the IDRs. In such cases, the IDR is reduced by more than \(50\%\), a condition that may not be feasible, as the IDR (or AV) is commonly used by policyholders to score health insurance plans.

In the US HP context, it is relevant that the IDR is not lower than 60%, as stated in Obamacare (2010), to ensure that policyholders have proper insurance cover. Hence, further research could address the use of 60% as a lower limit for \(\alpha \) in the optimization problem (14). Such a new constraint would give a sub-optimal result, but one more compliant with the indications in Obamacare (2010).