1 Introduction

The global spread of COVID-19 developed into the most serious threat to health since World War II. This paper assesses the changes in health expenditure due to COVID-19 in a private Health Insurance Plan (henceforth HIP) and investigates whether policyholder behavior changed after the pandemic period. Other studies have addressed similar issues.

Banthin et al. (2020) study the effects of employment losses on health insurance coverage in the USA. The authors estimate that 48 million people would live in families with a worker experiencing a COVID-19-related job loss in the last three quarters of 2020. Of these, 10.1 million would lose the employer coverage tied to that job, and an estimated 32 percent of them would switch to another source of employer coverage through a family member.

In Hansen et al. (2021) potential inter-dependencies between the public health care system and the efficiency of lockdown measures are investigated. The results indicate that there is a trade-off between the stringency of the lockdown and the prevailing health expenditures. Less stringent lockdowns in countries with higher health expenditures had a similar impact on mortality as more stringent lockdowns in countries with lower health expenditures. They show that the effects of lockdown interventions were insignificant in developing countries with per capita health expenditure below the mean.

Bundorf et al. (2021) assess the extent to which insurance coverage changed between mid-April and December 2020 in the USA. They also examine differences by pre-pandemic family income, age, and ethnicity, motivated by evidence of the pandemic’s disproportionate labor market effects across groups. The results indicate that much of the overall decline in coverage took place within a short 3-month period early in the pandemic. While employer-sponsored coverage declined throughout 2020, the decline was more fully offset by increases in other sources later in the year.

Zuo and Zhai (2021) investigate the impact of China’s COVID-19 treatment policy on the sustainability of its Social Health Insurance (SHI), explore the influence of the policy on Wuhan’s system, and discuss the effects of an assumed equivalent emergency on SHI funds for five other provincial capital cities in China. The results support the integration of insurance schemes and provincial pooling; adjusting fund balances and establishing an emergency safety net are also advised.

This paper contributes to the literature from a different point of view by focusing on the effects of COVID-19 on health care expenditure before and after the pandemic, using data from an Italian private HIP. Our first research question is: “How does COVID-19 impact the policyholder behavior of private health services after lockdown measures?”

The answer to this question is based on measuring health care expenditure through the number of episodes (henceforth frequency component) and the payment for the episodes that occurred (henceforth severity component). The analysis has been carried out by observing the frequency and severity trends of the same insured persons during the pre-pandemic period 2017–2019 and the mid/post-pandemic period 2020–2021 for several medical-care services.

A standard method in the actuarial literature models insurance expenditure with a two-part model (Frees 2009; Frees et al. 2013, 2016). Two-part models express the expected expenditure as the product of the frequency estimate (number of episodes) and the severity estimate (cost per episode). Alternatively, the loss cost, also known as the ‘pure premium’, can be estimated with a single model by assuming a Tweedie distribution, which allows the pure premium to be modeled directly without two separate models. The Tweedie distribution is a special case of exponential dispersion models and is often used as a response distribution in GLMs. It admits a probability mass at zero, a property that makes it useful for modeling claims in the insurance industry. More generally, the model applies to other settings in which the data mix exact zeros with positive continuous values.

Hence, our second research question is: “Could a standard actuarial model, based on a Tweedie GLM, predict the effects of COVID-19 on the estimated expected loss costs?”

To this aim, we conduct a quantitative analysis by fitting a Tweedie GLM to our pre-pandemic data and then forecasting health care expenditure in the pandemic and post-pandemic periods; the model provides point estimates and confidence intervals for the loss variables of interest. Furthermore, the analysis considers not only calendar year and quarter but also policyholder features such as gender and age, to assess which gender-age risk profiles were most affected by the pandemic outbreak.

The rest of the paper is organized as follows: Sect. 2 introduces data and methods, Sect. 3 illustrates the results of a numerical application on a real insurance database, and Sect. 4 concludes the work.

2 Data and Methods

2.1 Data

Health care expenditure over the period 2017–2021 is taken from a private HIP operating in a specific region of Northern Italy. The longitudinal data track the same sample of 16,206 policyholders, whose health care expenditures are collected quarterly. Longitudinal data allow for the measurement of within-sample change over time.

Italian HIPs usually offer health care protection that supplements and complements the services provided by the National Health Service (NHS), which assures free health care for citizens. The health care services covered by the plan we analyzed are grouped as follows:

Table 1 Group of services (in alphabetic order) and annual number and amount of expenditure

These classes contain various medical-care episodes that can differ in type of service and in frequency and severity. Nevertheless, medical-care episodes in the same class of service (henceforth class or group) are considered homogeneous. As shown in Table 1, Specialist Medical Visits (SMV) is the highest-frequency group, while Rehabilitation & Physical Care (RPC) is the highest-severity group.

The input data consist of the number (counts) and cost (expenditures) of invoices per quarter and policyholder, distinguished by the group of services provided. The data set contains 66.69% females and 33.31% males, whereas the age distribution is quite similar for the two genders, as shown in Fig. 1:

Fig. 1 Gender and age distribution

It is worth noting that during the COVID-19 pandemic, Italy observed a hard lockdown period (Phase 1) between 9 March and 3 May 2020. Indeed, on 11 March, a Prime Ministerial Decree was published, nicknamed the “I Stay at Home Decree,” which provided for the suspension of everyday retail, commercial activities, educational activities, and catering services and prohibited gatherings of people in public places or places open to the public. The subsequent Phase 2 (4 May–14 June) gradually relaxed the previous containment measures, as the epidemic curve was in a downward phase. Phase 3 (15 June–7 October) and the subsequent phases consisted of coexistence with COVID-19, with containment measures loosened further, leading to a degree of “normality” in daily activities.

Using set notation, we denote by A the set of quarters before the pandemic (i.e., 2017–2019); hence, A contains the “Pre” pandemic data. We denote by B and C the sets of quarters during (i.e., 2020) and after (i.e., 2021) the pandemic, respectively; B and C therefore contain the “During” and “Post” pandemic data. We analyze the behavior over the full set (\(A \cup B \cup C\)) for each group of services.
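As an illustration of this partition, the following minimal Python sketch assigns each quarter of the observation window to one of the three sets. The function name `period_set` and the dictionary `sets` are hypothetical names introduced here, not part of the original analysis.

```python
def period_set(year: int) -> str:
    """Map a calendar year to the set label used in the text."""
    if 2017 <= year <= 2019:
        return "A"  # pre-pandemic quarters
    if year == 2020:
        return "B"  # quarters during the pandemic
    if year == 2021:
        return "C"  # post-pandemic quarters
    raise ValueError(f"year {year} is outside the 2017-2021 observation window")


# All (year, quarter) pairs of the observation window, grouped by set.
quarters = [(y, q) for y in range(2017, 2022) for q in (1, 2, 3, 4)]
sets = {"A": [], "B": [], "C": []}
for y, q in quarters:
    sets[period_set(y)].append((y, q))

# Number of quarters falling in each set.
print({label: len(members) for label, members in sets.items()})
```

Under this labeling, set A collects the twelve pre-pandemic quarters, while B and C collect four quarters each.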

2.2 Methodology: the frequency-severity model and the Tweedie regression

Many insurance datasets are characterized by information about how often episodes arise in addition to the corresponding size. A standard model in actuarial literature to analyze such data consists of the so-called frequency-severity (or two-part) model, which separately models: the number of episodes or frequency per unit of exposure (i.e., time, value) and the per episode amount or severity. The frequency-severity method uses historical data to estimate the expected number of episodes and the expected cost of each episode during a given period.

In our context, we follow the collective risk theory approach (see, for instance, Daykin et al. (1993)). Consider a generic group of services k and n policyholders, and let i index the generic policyholder (\(1\le i \le n\)).

The expenditure (or aggregate loss) of the ith policyholder can be seen as the sum of the single expenditures \(_kY_{ij}\) for the jth service requested in group k, as follows:

$$\begin{aligned} {}_{k}S_{i}=\sum _{j=1}^{{{}_{k}}N_{i}} {{}_{k}}Y_{ij}, \end{aligned}$$
(2.1)

where \(_kN_{i}\) is the random variable (r.v.) counting the episodes for the group of services k requested by the ith policyholder in a given period. To investigate the effect of the pandemic, we analyze the distribution of the number and cost of services (as in formula (2.1)) of each policyholder over the sets of periods (A, B, and C) defined previously. It is worth noting that if policyholder i makes no requests for the group of services k, then \(_kN_i=0\), which implies \(_kS_i=0\).

Assuming that the \(_{k}Y_{ij}\sim _{k}Y_{i}\) are i.i.d. with respect to j and independent of \(_{k}N_{i}\), we have:

$$\begin{aligned} \mathrm{E}(_{k}S_i)=\mathrm{E}(_{k}N_{i}) \mathrm{E}(_{k}Y_{i}),\,\,\,\ \mathrm{Var}(_{k}S_i)=\mathrm{E}(_{k}N_{i}) \mathrm{Var}(_{k}Y_{i}) +\mathrm{Var}(_{k}N_{i}) \mathrm{E}^2(_{k}Y_{i}). \end{aligned}$$
(2.2)

Finally, for the sake of simplicity, we drop the left subscript k and the subscript i. Suppose that \(N \sim Poisson(\lambda )\) and each \(Y \sim Gamma(\alpha , \theta )\), with \(\alpha \) and \(\theta \) the shape and scale parameters, respectively. By using iterated expectations, Eq. (2.2) can be expressed as:

$$\begin{aligned} \mathrm{E} (S)=\lambda \alpha \theta ~~~~\mathrm{and}~~~~\mathrm{Var} (S)=\lambda \alpha (1+\alpha )\theta ^2. \end{aligned}$$
(2.3)
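For completeness, the step from Eq. (2.2) to Eq. (2.3) can be written out using only the standard moments of the two distributions, namely \(\mathrm{E}(N)=\mathrm{Var}(N)=\lambda \), \(\mathrm{E}(Y)=\alpha \theta \) and \(\mathrm{Var}(Y)=\alpha \theta ^2\):

$$\begin{aligned} \mathrm{E}(S)&=\mathrm{E}(N)\,\mathrm{E}(Y)=\lambda \alpha \theta ,\\ \mathrm{Var}(S)&=\mathrm{E}(N)\,\mathrm{Var}(Y)+\mathrm{Var}(N)\,\mathrm{E}^2(Y)=\lambda \alpha \theta ^2+\lambda \alpha ^2\theta ^2=\lambda \alpha (1+\alpha )\theta ^2. \end{aligned}$$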

Now, define three parameters \(\mu \), \(\phi \), and p through the following reparametrization:

$$\begin{aligned} \lambda =\frac{\mu ^{2-p}}{\phi (2-p)},~~~~\alpha =\frac{2-p}{p-1},~~~~\theta =\phi (p-1)\mu ^{p-1}, \end{aligned}$$
(2.4)

It is easy to show that \(\mathrm{E}(S)=\mu \) and \(\mathrm{Var} (S)=\phi \mu ^{p}\) (see Kaas 2005; Tweedie 1956). As a consequence, \(S \sim Tw(\mu , \phi )\) is a Tweedie random variable with mean \(\mu >0\) and variance \(\phi \mu ^{p}\), where \(\phi >0\) is the so-called “dispersion parameter” and \(p \in (- \infty ,0] \cup [1,\infty )\) is the power parameter; the compound Poisson-gamma construction above corresponds to \(1<p<2\). The distribution belongs to the exponential dispersion family, allowing us to use the Tweedie distribution within GLMs to model \(\mathrm{E}(S_i)\) and \(\mathrm{Var}(S_i)\) for each ith risk profile. The Tweedie distribution for two-part data is one of the most widely used mixture distributions in insurance claims modeling (Frees 2014). Indeed, it allows us to model the aggregate loss or pure premium directly, without needing two different models for frequency and severity.
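The reparametrization in Eq. (2.4) can be checked numerically. The following Python sketch maps \((\mu , \phi , p)\) to \((\lambda , \alpha , \theta )\) and verifies that the moments in Eq. (2.3) reduce to \(\mu \) and \(\phi \mu ^{p}\). The function name and the parameter values are illustrative assumptions and are not estimates from the data.

```python
import math


def poisson_gamma_params(mu: float, phi: float, p: float):
    """Return (lam, alpha, theta) from Eq. (2.4); requires 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma case needs 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    alpha = (2.0 - p) / (p - 1.0)
    theta = phi * (p - 1.0) * mu ** (p - 1.0)
    return lam, alpha, theta


# Illustrative values only (hypothetical, not taken from the paper).
mu, phi, p = 50.0, 20.0, 1.5
lam, alpha, theta = poisson_gamma_params(mu, phi, p)

mean_s = lam * alpha * theta                       # E(S) from Eq. (2.3)
var_s = lam * alpha * (1.0 + alpha) * theta ** 2   # Var(S) from Eq. (2.3)
prob_zero = math.exp(-lam)                         # P(S = 0) = P(N = 0)

print("E(S) equals mu:          ", math.isclose(mean_s, mu))
print("Var(S) equals phi * mu^p:", math.isclose(var_s, phi * mu ** p))
print("probability mass at zero:", prob_zero)
```

Since each \(Y\) is strictly positive, \(S=0\) occurs only when \(N=0\), so the mass at zero is \(e^{-\lambda }\); this is the quantity that makes the distribution suitable for data with many zero-expenditure quarters.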

2.3 Model application

As previously stated, frequency-severity modeling is a standard actuarial approach to analyzing insurance datasets. However, the utilization of health-care services can be influenced by a set of explanatory variables, such as demographic and geographic factors. For example, for some services, the demand for health care increases with age. Similarly, gender can be treated as a proxy for inherited health and different habits in maintaining health. We collect these explanatory variables in the row vector \(\mathbf {x}_{i}\).

Since rating factors such as gender and age are fundamental in analyzing health expenditure, a univariate analysis for each risk class would require \(2\cdot (\omega -1)\) two-part models, where \(\omega -1\) is the maximum age. Even in such a simple case, a univariate approach for each homogeneous risk class is not feasible; a multivariate approach is more appropriate. A widely used approach in the industry is based on multivariate regression models such as the Generalized Linear Model (GLM), which has found extensive use in actuarial practice.

As the expenditure per single policyholder and group of services k has a probability mass at zero and a continuous, often right-skewed component for positive values, Tweedie regression is a valid candidate to predict such an expenditure.

As demonstrated in Kurz (2017), the Tweedie distribution fits health care cost data very well and provides a better fit, especially when the number of non-users is low and the correlation between users and non-users is high. The author notes that a common way to handle the mass of zeros is to use two-part or Tobit models, which makes the interpretation of the results more challenging. The Tweedie distribution thus provides an interesting solution to many statistical problems in health economic analyses.

The Tweedie regression is a particular GLM based on the assumption that the dependent variable \(_kS_i\) is Tweedie distributed and that its mean is related to a set of covariates \(\mathbf {x}_{i}\) through a linear predictor \(\mathbf {x}_{i} \cdot {\varvec{_k\beta }}\), with unknown coefficients \({\varvec{_k\beta }}\), and a link function g. The conditional expected expenditure and variance for the ith policyholder and kth group of services are given by:

$$\begin{aligned} _k\mu _i=\mathrm{E}\left[ _kS|\mathbf {x}_{i} \right] =\mathrm{E}\left[ _kS_i \right] =g^{-1}(\mathbf {x}_{i} \cdot _k {\varvec{\beta }}), \,\,\,\, _k\sigma _i ^2=\mathrm{Var} \left[ _kS|\mathbf {x}_{i} \right] =\mathrm{Var} \left[ _kS_i \right] =\phi \cdot _k\mu _i^p \end{aligned}$$
(2.5)

The estimates of the vector of regression parameters \({\varvec{_k\beta }}\), of the dispersion parameter \(\phi \), and of the power p are obtained by Maximum Likelihood Estimation (MLE) (see Frees 2014).

In our particular case, \(\mathbf {x}_{i}\) contains information about Gender “G” (M, F), Age “A” (from 0 to 70), Quarter “Q” (Q1, Q2, Q3 & Q4) and Year “Y” (2017, 2018 & 2019) of the ith observation. For a given group of services \(k \in \left\{ \mathrm{OH, SC, DC, RPC}\right\} \), we fit the following model:

$$\begin{aligned} \mathrm{E}\left[ _{k}S|\mathbf {x}_{i}\right] =e^{\left( {_{k}\beta _0}+{_{k}\beta _{Y}}\ln \left( Y_i\right) +{_{k}\beta _{G}}G_i+\sum _{w=1}^\delta {_{k}\beta _{A,w}}A_i^w+{_k\beta _{Q2}} Q2_i+{_k\beta _{Q3}} Q3_i+{_k\beta _{Q4}} Q4_i\right) } \end{aligned}$$
(2.6)

For the group of services SMV, we instead fit the following model:

$$\begin{aligned} \mathrm{E}\left[ _{SMV}S|\mathbf {x}_{i}\right] =e^{\left( {_{SMV}\beta _0}+\sum _{w=1}^\delta {{_{SMV}\beta _{Y,w}}}{Y_i}^w+{_{SMV}\beta _{G}}G_i+\sum _{w=1}^\delta {_{SMV}\beta _{A,w}}A_i^w+_{SMV}\beta _{Q2} Q2_i+_{SMV}\beta _{Q3} Q3_i+_{SMV}\beta _{Q4} Q4_i\right) } \end{aligned}$$
(2.7)

where \(\delta \) represents the degree of the polynomial used for the numerical variables (i.e., Year and Age). In this work, to avoid overparametrization, we set \(\delta =3\).

The model in Eq. (2.7) differs from the one in Eq. (2.6) only in the use of a polynomial function instead of a logarithmic function for the variable Year. This choice is motivated by a better fit of the model to the observed data.
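To make the structure of Eq. (2.6) concrete, the following Python sketch evaluates the expected expenditure for a single observation under a log link. All coefficient values are hypothetical, and the 0/1 coding of gender (1 for males) and the choice of Q1 as reference quarter are assumptions of this sketch, since the coding is not stated in the text.

```python
import math


def expected_expenditure(beta, year, gender_male, age, quarter, delta=3):
    """Evaluate Eq. (2.6): log link, ln(Year), polynomial of degree delta in Age.

    `beta` is a dict of hypothetical coefficients; the gender coding and
    the Q1 baseline are assumptions made for this sketch.
    """
    eta = beta["intercept"]
    eta += beta["year"] * math.log(year)
    eta += beta["gender"] * (1.0 if gender_male else 0.0)
    eta += sum(beta["age"][w - 1] * age ** w for w in range(1, delta + 1))
    if quarter in (2, 3, 4):          # Q1 is the reference level
        eta += beta[f"q{quarter}"]
    return math.exp(eta)              # inverse of the log link


# Hypothetical coefficients, chosen only to produce a plausible magnitude.
beta = {
    "intercept": -150.0,
    "year": 20.0,
    "gender": -0.2,
    "age": [0.05, -0.001, 0.00001],
    "q2": 0.10,
    "q3": -0.30,
    "q4": 0.05,
}
print(expected_expenditure(beta, year=2019, gender_male=False, age=40, quarter=2))
```

Because of the log link, each quarter coefficient acts multiplicatively on the expected expenditure: moving from Q1 to Q2 rescales the prediction by \(e^{\beta _{Q2}}\), all other covariates being equal.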

The model coefficients are estimated using the MASS and tweedie packages of the R statistical software program (R Core Team 2016).

3 Results

3.1 Frequency results

To assess whether COVID-19 has impacted policyholder behavior in our HIP, we measured the frequency and severity components over the sets of periods previously denoted as A, B, and C. Recalling that our longitudinal data refer to 16,206 policyholders tracked quarterly, Table 2 shows the quarterly average number of policyholders without any reimbursement request for each set and group, together with the quarterly average number of episodes and their expenditure.

Table 2 Database basic information: frequency of zero episodes and severity quarterly mean over set

At first sight, the analysis may suggest unchanged policyholder behavior if we compare only the number of policyholders without episodes in sets A and C. However, the overall number of medical-care episodes in each group shows a general decrease. The number of episodes in each class clearly declined during the COVID-19 pandemic, given the hard lockdown measures adopted by the Italian government during the first two quarters of 2020. It is less evident that, in the post-pandemic period, the number of episodes only slightly realigned to pre-pandemic data, especially for some groups. Indeed, comparing 2021 values with pre-pandemic years, the number of episodes shows a significant reduction; for example, SMV and RPC decrease by 17% and 23%, respectively, although the amount of claims is not reduced to the same extent.

A step forward in the analysis consists of a graphical comparison of the numbers and average expenditure per person and period, as proposed in Figs. 2 and 3. Each panel contains the chart of a group of services, with each set of periods examined shown separately.

Fig. 2 Number of episodes distribution by service

The kernel density functions of the frequency shown in Fig. 2 emphasize the differences among periods; indeed, sets B and C show less “heavy-tailed” distributions than set A. These outcomes are confirmed by the descriptive statistics reported in Table 3.

Table 3 Descriptive statistics: Frequency of episodes per policyholder among periods

The mean of set C seems to be realigned with pre-covid data, though high-frequency groups, such as DC (0.06) and SMV (0.19), are still lower than A values (0.07 and 0.23, respectively). This is probably due to the types of medical care episodes included in these classes that are not always strictly necessary and, therefore, can be postponed.

Moreover, positive skewness values indicate right-skewed data, that is, the right tail is long relative to the left tail. In addition, kurtosis above 3 indicates a “heavy-tailed” distribution. Significant skewness and kurtosis clearly indicate that the data are not normally distributed. Furthermore, set A exhibits heavier right tails in each group compared to the other sets, with some exceptions: OH and SMV show very similar distributions in sets A and C, whereas the remaining groups exhibit longer tails in set A.
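The two shape measures can be computed as in the following Python sketch, which uses the moment-based (population) definitions, so that a normal distribution has kurtosis 3. The estimator actually used for Table 3 is not stated in the text, and the sample of episode counts below is hypothetical.

```python
def skewness_kurtosis(xs):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical quarterly episode counts for 100 policyholders:
# most have no episode, a few have one or two, one has five.
counts = [0] * 90 + [1] * 7 + [2] * 2 + [5]
skew, kurt = skewness_kurtosis(counts)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

A sample dominated by zeros with a few large counts, as in this example, yields positive skewness and kurtosis well above 3, which is the pattern described above for the frequency data.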

Fig. 3 Average cost distribution by group of services

By comparing the severity charts in Fig. 3, where zero-episode data are excluded, we can state that there are no evident changes in shape before, during, and after COVID-19. This is expected, as the pandemic had a more significant impact on frequency than on expenditure. Among the most significant changes in severity, we observe a more pronounced leptokurtic shape in set A (e.g., DC, OH, and RPC), though without longer tails in any of the examined periods. The descriptive statistics in Table 4 confirm the increase in the mean and the decrease in kurtosis from set A to C.

Table 4 Descriptive statistics: Episode severity per policyholder among periods

The increase in the mean can be explained by the larger number of less expensive medical-care episodes in the pre-pandemic periods, whereas, as stated before, in the post-pandemic period episodes that are not strictly necessary, and probably less costly, can be postponed.

Lastly, to explore in more depth the frequency component, where policyholder behavior seems to have changed, we analyze the quarterly probability mass at zero, corresponding to the event of no episodes.

Fig. 4 Box plot of the quarterly no-episode frequency during period A. Filled circles and triangles, in different colors, represent the quarterly no-episode frequency during the pandemic and post-pandemic quarters, respectively

The box plot in Fig. 4 summarizes the distribution of the quarterly no-episode frequency observed during the twelve quarters included in set A, displaying its median and its first and third quartiles in the gray box. The dots and triangles refer to the four quarters in sets B and C, respectively. As expected, each group’s highest no-episode frequency is registered during Q2-2020 (i.e., during lockdown). However, the 2021 data show an alignment with the pre-pandemic data, as the triangles mostly lie between the first and third quartiles of set A. Therefore, we can conclude that in the post-pandemic period analyzed, COVID-19 appears to have had no significant effect on the behavior of policyholders with the private medical coverage provided by a HIP.
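The comparison underlying the box plot can be sketched in Python as follows: the quartiles of the twelve pre-pandemic quarterly no-episode frequencies define a reference band, and each later quarter is checked against it. The frequencies below are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the one used for Fig. 4 is not stated.

```python
from statistics import quantiles


def within_iqr(reference, value):
    """True if `value` lies between the first and third quartiles of `reference`."""
    q1, _median, q3 = quantiles(reference, n=4, method="inclusive")
    return q1 <= value <= q3


# Hypothetical share of policyholders with no episode in each quarter.
pre_pandemic = [0.86, 0.88, 0.85, 0.87, 0.86, 0.89,
                0.84, 0.88, 0.87, 0.90, 0.85, 0.88]   # twelve quarters of set A
post_pandemic = [0.87, 0.865, 0.86, 0.93]             # four quarters of set C

for quarter, freq in enumerate(post_pandemic, start=1):
    print(f"Q{quarter}-2021 inside the set-A interquartile band:",
          within_iqr(pre_pandemic, freq))
```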

3.2 Tweedie results

The second part of our analysis is devoted to assessing the capability of a standard actuarial approach, based on a Tweedie GLM, to predict the HIP expenditure during and after the pandemic. To this aim, we split our dataset into a training set, composed of set A, and a test set, composed of the union of sets B and C. The training set represents 65.13% of the number of services and 62.92% of the total expenditure, whereas the test set represents 34.87% and 37.08%, respectively. The analysis is carried out by fitting five independent Tweedie models, one for each group of services. As reported in Eqs. (2.6) and (2.7), we consider gender, age, quarter, and year as covariates; age and year are quantitative and continuous, whereas gender and quarter are categorical variables. It is worth noting that the parameters are estimated using only the observations in the training set and are then used to predict values on the test set.

The first stage of the calibration of the Tweedie GLM consists of selecting the optimal value of the power p, which, for our purposes, must lie between 1 and 2. We test a sequence of values of p between 1 and 2, fit a model for each, and select the value that maximizes the log-likelihood. Fig. 5 shows the resulting log-likelihood values, which display the typical inverted “U” shape.
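The grid search over p can be illustrated with a simplified, self-contained Python sketch. It is not the procedure run in R for this paper: it considers an intercept-only model (so that \(\mu \) is the sample mean), replaces the maximum likelihood estimate of \(\phi \) with the Pearson moment estimate, and evaluates the compound Poisson-gamma density by truncating its series at `n_max` terms. The data and all function names are hypothetical.

```python
import math


def tweedie_logpdf(y, mu, phi, p, n_max=200):
    """Log-density of the compound Poisson-gamma Tweedie law (1 < p < 2)."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    theta = phi * (p - 1) * mu ** (p - 1)
    if y == 0:
        return -lam                                  # log P(N = 0)
    # Log of each series term: Poisson(n; lam) * Gamma(y; n * alpha, theta).
    terms = [
        -lam + n * math.log(lam) - math.lgamma(n + 1)
        + (n * alpha - 1) * math.log(y) - y / theta
        - math.lgamma(n * alpha) - n * alpha * math.log(theta)
        for n in range(1, n_max + 1)
    ]
    top = max(terms)                                 # log-sum-exp for stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def profile_power(ys, grid):
    """Return (p, log-likelihood) for the grid value with the highest log-likelihood."""
    n = len(ys)
    mu = sum(ys) / n                                 # intercept-only mean
    best = None
    for p in grid:
        # Pearson moment estimate of the dispersion (a simplification).
        phi = sum((y - mu) ** 2 / mu ** p for y in ys) / (n - 1)
        loglik = sum(tweedie_logpdf(y, mu, phi, p) for y in ys)
        if best is None or loglik > best[1]:
            best = (p, loglik)
    return best


# Hypothetical quarterly expenditures: mostly zeros, a few positive amounts.
ys = [0, 0, 0, 0, 0, 0, 35.0, 0, 80.0, 0, 0, 120.0, 0, 0, 50.0, 0, 0, 0, 210.0, 0]
grid = [round(1 + 0.1 * k, 1) for k in range(1, 10)]   # 1.1, 1.2, ..., 1.9
best_p, best_loglik = profile_power(ys, grid)
print("selected p:", best_p, "log-likelihood:", best_loglik)
```

In a full analysis the regression coefficients and \(\phi \) would be re-estimated for every candidate p, and the grid would typically be finer than the nine values used here.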

Fig. 5 Maximum Likelihood Estimate

In Table 5, we show the parameter estimates and the significance measured by p-values.

Table 5 Output of Tweedie regression

The probability distribution of the r.v. \(_kS_h\) is estimated conditional on each risk class h, defined by combining the levels of the covariates considered here. All insured persons belonging to the same risk class share the same estimates of expected value and tail behavior.

Figure 6 shows the distribution of the expected expenditure \(E(_kS_h)\) (solid line) per age, group of services, and year, predicted via Tweedie GLM, compared to the observed values (solid points). Moreover, we plot the 99% confidence interval estimate of our predictions, by adding shadowed areas to our lines.

Fig. 6 Health care expenditure: a comparison of observed (solid points) and predicted value (solid line) with confidence interval (shadowed area) per Age, Group of services and Year

The graphs illustrate that the expenditure shape in each group remains essentially the same during and after the pandemic; that is, there is no evidence of a change in policyholder behavior by age. However, looking at the left column of the graphs (i.e., the year 2020), there is a reduction in expenditure in all groups, given the decrease in the frequency of episodes shown previously in Fig. 4 and Table 3. Moreover, the observed data (solid points) generally lie on or below the lower bound of the 99% confidence interval of the mean value. These values confirm the exceptional nature of the pandemic phenomenon observed in 2020. By contrast, the right column refers to the predicted values in 2021 by age, where the observed values lie mainly within the prediction area. By way of example, SMV, the group with the highest number of episodes and the second highest total expenditure (see Table 2), displays observed values closely in line with the predicted ones. Similar behavior can be seen in the SC and RPC groups, whereas for DC and OH the model seems to overpredict the observations slightly.

Since the proposed Tweedie model also provides predicted values by gender, quarter, and year, we compare in Fig. 7 the time series of the observed values with the predicted ones.

Fig. 7 Health care expenditure: time series of observed (solid points) and predicted value (solid line) with confidence interval (shadowed area) per Age, Gender, Group of services and Quarter

The outcomes provide a clear picture of the expenditure trend during and after the pandemic and of the model’s predictive capability. Panels (a) and (b) show the trend over different groups of services and quarters for females and males, respectively; for simplicity, the graphs limit the past data to 2019. There is no firm evidence of a difference between females and males in the observed data, which means that the medical services provided in each group are not gender-specific, as would be the case, for instance, with gynecology.

By contrast, the quarterly observed data, especially in the first two quarters of 2020, clearly show the model’s limits in predicting the effect of the lockdown measures adopted by the Italian government. For the remaining quarters, however, the model captures almost all the observed data. This demonstrates, on the one hand, the excellent predictive capacity of the model and, on the other hand, that after the government’s restrictive measures the situation, at least for the HIP, returned to normal.

The total expenditure predicted by the Tweedie model for the years included in the test data is summarized in Table 6.

Table 6 Total expenditure per Year, observed vs predicted values (in thousands)

The yearly expenditure predicted by the Tweedie model in each group of services shows that the 2021 values are generally included in the confidence interval, except for DC, where the cost lies beyond the lower bound by about 5%. In any case, the goal of the HIP is to estimate the overall expenditure accurately. In this context, the difference between the predicted and observed mean values is about 15% and, in any case, well within the confidence interval. This result confirms the ability of the Tweedie model to predict healthcare cost data.

4 Conclusion

This study investigates the effects of COVID-19 on the health care expenditure of an Italian private HIP, which provides five groups of medical-care services. We first analyze whether a relevant change in policyholder behavior is detected after the pandemic. To this aim, we adopt a basic actuarial approach based on a frequency-severity method and compare the pre-pandemic data with the data observed during and after the pandemic. Our results show no relevant evidence of a change in policyholder behavior. The only exception is observed during the pandemic, due to the lockdown measures adopted by the Italian government in the first two quarters of 2020. In 2021 the main effect was a slight decline in the number of episodes, with substantial stability of the average expenditure.

Then, we apply a Tweedie regression model, whose main advantage is that it directly models the distribution of the loss cost or pure premium, representing the total expenditure per policyholder. This distribution is particularly suitable given the large number of zero episodes we observe in the data for each group of services. This particular GLM is used to test the capacity to predict health care expenditure even under stressed conditions. We conduct a quantitative analysis comparing observed values with the point estimates and confidence intervals obtained from the regression model. Furthermore, the analysis considers basic policyholder features such as gender and age to assess whether some profiles were more affected by the pandemic. We highlight that, despite the presence of lockdown restrictions, the predictive model captures the observed values well, confirming that the post-pandemic situation returns to normal behavior. This conclusion is supported by the fact that the values observed in 2021 are generally within the confidence interval produced by the model fitted on pre-pandemic data. Furthermore, the differentiated analysis by gender and age confirms no clear evidence of a change in policyholder behavior in each group of services analyzed.

We conclude that using Tweedie GLM to examine the HIP expenditure is a valid forecasting method. Its dynamics over time confirm an excellent ability to detect potential behavior changes.