The run-off triangle data considered in this paper are from a large Canadian property and casualty insurance company. They consist of the cumulative paid losses and net earned premiums for six lines of automobile and home insurance business. Tables 13, 14, 15, 16, 17 and 18 in Appendix 2 show the paid losses for accident years 2003–12 inclusively for each of the six lines of business developed over at most ten years. To preserve confidentiality, all figures were multiplied by a constant. However, this is inconsequential because in order to account for the volume of business, the analysis focuses on the paid loss ratios, i.e., the payments divided by the net earned premiums.
Table 1 gives a descriptive summary of each line of business (LOB). There are five run-off triangles of personal and commercial auto lines with accident benefits and bodily injury coverages from three regions (Atlantic, Ontario and the West). Atlantic Canada consists of New Brunswick, Nova Scotia, Prince Edward Island and Newfoundland/Labrador; the West comprises Manitoba, Saskatchewan, Alberta, British Columbia, Northwest Territories, Yukon, and Nunavut. Given that Québec has a public plan for this section of auto insurance, business for that province is included only in the sixth triangle, which comprises the company’s country-wide Liability personal and commercial home insurance.
Table 1 Descriptive summary of six lines of business for a Canadian insurance company
Bodily injury (BI) coverage provides compensation to the insured if the latter is injured or killed through the fault of a motorist who has no insurance, or by an unidentified vehicle. The accident benefits (AB) coverage provides compensation, regardless of fault, if a driver, passenger, or pedestrian suffers injury or death in an automobile collision. Disability income is an insurance product that provides supplementary income when the accident results in a disability that prevents the insured from working at his/her regular employment. For this reason, AB disability income is considered separately from other AB. Finally, liability insurance covers an insured for his/her legal liability for injuries or damage to others.
Marginal GLMs for incremental loss ratios
For LOB \(\ell \in \{ 1,\ldots ,6 \} \), denote by \(Y_{ij}^{(\ell )}\) the incremental payment for the ith accident year and the jth development period, where \( i,j \in \{ 1,\ldots ,10\}\). Given that the earned premiums \(p_{i}^{(\ell )}\) vary with accident year i and line of business \(\ell \), it is convenient to model the loss ratios, defined by
$$\begin{aligned} X_{ij}^{(\ell )} = Y_{ij}^{(\ell )}/p_{i}^{(\ell )}. \end{aligned}$$
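As an illustration of this definition, the following sketch derives incremental loss ratios from a cumulative paid-loss triangle. The function name, the NaN convention for unobserved cells, and the toy figures are assumptions made for the example; they are not the company's data.

```python
import numpy as np

def incremental_loss_ratios(cum_paid, premiums):
    """Turn a cumulative paid-loss triangle into incremental loss ratios.

    cum_paid : (n, n) array; rows are accident years, columns are
               development periods, unobserved cells are NaN (assumed layout).
    premiums : length-n array of net earned premiums p_i.
    """
    cum_paid = np.asarray(cum_paid, dtype=float)
    premiums = np.asarray(premiums, dtype=float)
    # Y_ij: the first column is kept, later columns are successive differences.
    incremental = np.diff(cum_paid, axis=1, prepend=0.0)
    # X_ij = Y_ij / p_i: each row is divided by its accident-year premium.
    return incremental / premiums[:, None]

# Toy 3x3 triangle with made-up numbers.
cum = np.array([[100.0, 150.0, 160.0],
                [120.0, 190.0, np.nan],
                [90.0, np.nan, np.nan]])
prem = np.array([200.0, 240.0, 300.0])
X = incremental_loss_ratios(cum, prem)
```

Cells that are NaN in the cumulative triangle remain NaN in the output, so the observed part of the triangle keeps its shape.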
In Fig. 1, loss ratios \(X_{ij}^{(\ell )}\) for \(i=1,2\), \(j=1,\ldots , 11-i\) and \(\ell =1,\ldots ,6\) are shown. It is clear from the graph that the loss ratio depends on the development lag for every portfolio. By comparing the solid and dashed lines of the same color, one can also see that the accident year has an impact. In order to capture these patterns, we consider a regression model with two explanatory variables, i.e., accident year and development period. This is in line with the classical chain-ladder approach.
For LOB \(\ell \in \{ 1,\ldots ,6 \} \), let \(\kappa _{i}^{(\ell )}\) be the effect of accident year \(i \in \{1,\ldots , 10\}\) and \(\lambda _{j}^{(\ell )}\) be the effect of development period \(j \in \{1,\ldots ,10\}\). The systematic component for the \(\ell \)th line of business can then be written as
$$\begin{aligned} \eta _{ij}^{(\ell )} = \zeta ^{(\ell )} + \kappa _{i}^{(\ell )} + \lambda _{j}^{(\ell )}, \end{aligned}$$
where \(\zeta ^{(\ell )}\) is the intercept, and for parameter identification, we set \(\kappa _{1}^{(\ell )} = \lambda _{1}^{(\ell )} = 0\). There is no interaction term in this model, i.e., it is assumed that the effect of a given development period does not vary by accident year. While this assumption is hard to check, it is required to ensure that all parameters can be estimated from the 55 observations available.
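The parameter count behind this remark can be checked with a small sketch that builds the design matrix of the additive predictor on the observed cells. The column ordering (intercept, then accident-year effects, then development effects) is an arbitrary choice made for the example.

```python
import numpy as np

def design_matrix(n=10):
    """Design matrix of eta_ij = zeta + kappa_i + lambda_j on the observed
    cells (i + j <= n + 1), with kappa_1 = lambda_1 = 0 for identification."""
    cells = [(i, j) for i in range(1, n + 1) for j in range(1, n + 2 - i)]
    D = np.zeros((len(cells), 1 + 2 * (n - 1)))
    for row, (i, j) in enumerate(cells):
        D[row, 0] = 1.0                      # intercept zeta
        if i > 1:
            D[row, i - 1] = 1.0              # kappa_i in columns 1..n-1
        if j > 1:
            D[row, (n - 1) + (j - 1)] = 1.0  # lambda_j in columns n..2n-2
    return cells, D

cells, D = design_matrix(10)
print(D.shape)                    # (55, 19)
print(np.linalg.matrix_rank(D))   # 19
```

With ten accident years there are 19 regression parameters for 55 observations, and the matrix has full column rank. A full set of interaction terms would add another 81 parameters, which is more than the triangle can support.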
In their analysis of dependent loss triangles using copulas, Shi and Frees [37] use the log-normal and Gamma distributions for incremental claims. Their justification applies here as well. Following these authors, we consider the link
$$\begin{aligned} \mu _{ij}^{(\ell )}=\eta _{ij}^{(\ell )} \end{aligned}$$
for a log-normal distribution with mean \(\mu _{ij}^{(\ell )}\) and standard deviation \(\sigma ^{(\ell )}\) on the log scale. For the Gamma distribution, however, we use the log link, whose inverse is the exponential function, instead of the canonical inverse link in order to enforce positive means. When the Gamma distribution is selected, its scale and shape parameters are denoted by \(\beta _{ij}^{(\ell )}\) and \(\alpha ^{(\ell )}\), respectively, and it is assumed that
$$\begin{aligned} \beta _{ij}^{(\ell )}=\exp (\eta _{ij}^{(\ell )})/\alpha ^{(\ell )}. \end{aligned}$$
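As a quick check of this parametrization (standard facts about the Gamma distribution with shape \(\alpha ^{(\ell )}\) and scale \(\beta _{ij}^{(\ell )}\), not an additional result), the implied mean and variance are

$$\begin{aligned} \mathrm{E}(X_{ij}^{(\ell )}) = \alpha ^{(\ell )}\beta _{ij}^{(\ell )} = \exp (\eta _{ij}^{(\ell )}), \qquad \mathrm{var}(X_{ij}^{(\ell )}) = \alpha ^{(\ell )}(\beta _{ij}^{(\ell )})^{2} = \exp (2\eta _{ij}^{(\ell )})/\alpha ^{(\ell )}, \end{aligned}$$

so the mean is positive for every value of the linear predictor, and the coefficient of variation \(1/\sqrt{\alpha ^{(\ell )}}\) is the same in all cells of a given triangle.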
Log-normal and Gamma distributions were fitted to all lines of business by the method of maximum likelihood. Table 2 shows the corresponding values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). These criteria suggest the choice of the log-normal distribution for the first line of business and the Gamma distribution for all others. These choices of models are confirmed by the Kolmogorov–Smirnov goodness-of-fit test, whose p-values are also given in Table 2. No model is rejected at the 1 % level. Q–Q plots (not shown) of standardized residuals (defined below) provide visual confirmation that the selected models are adequate, although the fit for LOB 6 is borderline.
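To make the fitting and comparison step concrete, here is a minimal sketch of the log-normal case, where maximum likelihood reduces to least squares on the log scale, followed by the AIC and BIC formulas. The simulated data, the two-column design matrix and the function names are assumptions for illustration; the Gamma fit would require an iterative optimizer and is not shown.

```python
import numpy as np

def fit_lognormal(x, D):
    """ML fit of ln(x) ~ N(D @ b, sigma^2); returns (b, sigma, loglik)."""
    y = np.log(np.asarray(x, dtype=float))
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ b
    n = y.size
    sigma = np.sqrt(resid @ resid / n)       # ML estimate divides by n
    # Log-normal log-likelihood evaluated at the ML estimates.
    loglik = -y.sum() - n * np.log(sigma) - 0.5 * n * (np.log(2 * np.pi) + 1.0)
    return b, sigma, loglik

def aic_bic(loglik, k, n):
    """k = number of estimated parameters, n = number of observations."""
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Simulated example with 55 observations (not the paper's data).
rng = np.random.default_rng(0)
n = 55
D = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
x = np.exp(D @ np.array([-1.0, 0.5]) + 0.1 * rng.standard_normal(n))
b, sigma, loglik = fit_lognormal(x, D)
aic, bic = aic_bic(loglik, k=D.shape[1] + 1, n=n)  # +1 for sigma
```

For the triangles considered here, k would be 20 under either distribution (19 regression parameters plus one dispersion or shape parameter), so the two criteria rank the candidate distributions by their maximized log-likelihoods.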
Table 2 Fit statistics and goodness-of-fit test of marginals
Parameter estimates of the fitted models are given in Appendix 2 along with their standard errors. Using these values, one can estimate the total reserve of the portfolio by
$$\begin{aligned} \sum _{\ell =1}^{6}\sum _{i=2}^{10}\sum _{j=10-i+2}^{10}p_{i}^{(\ell )}\mathrm{E}(X_{ij}^{(\ell )}), \end{aligned}$$
where \(\mathrm{E}(X_{ij}^{(\ell )})\) is the projected unpaid loss ratio, and \(p_{i}^{(\ell )}\) is the premium earned in the corresponding accident year i. For \(\ell =1\), we have
$$\begin{aligned} \mathrm{E}(X_{ij}^{(1)}) = \exp \{\hat{\mu }_{ij}^{(1)}+(\hat{\sigma }^{(1)})^{2}/2\}, \end{aligned}$$
while for \(\ell >1\), \(\mathrm{E}(X_{ij}^{(\ell )}) = \hat{\beta }_{ij}^{(\ell )}\hat{\alpha }^{(\ell )}\). The estimated reserves of the six lines of business are given at the bottom of Table 19 in Appendix 2, along with those derived from the chain-ladder method, which is the industry’s benchmark. The two methods lead to similar results and total reserve estimates of $438,088 and $453,686, respectively.
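The reserve formula can be sketched for a single line of business as follows; the total is then the sum over the six lines. The function names and the toy inputs are hypothetical and serve only to show which cells enter the sum.

```python
import numpy as np

def future_cells(n):
    """Boolean mask of unobserved cells: i + j > n + 1 with 1-based indices."""
    i, j = np.indices((n, n)) + 1
    return (i + j) > (n + 1)

def lognormal_mean(mu, sigma):
    return np.exp(mu + sigma ** 2 / 2)

def gamma_mean(beta, alpha):
    return beta * alpha

def reserve(premiums, expected_ratio):
    """Sum of p_i * E(X_ij) over the unobserved cells of one triangle."""
    p = np.asarray(premiums, dtype=float)
    ex = np.asarray(expected_ratio, dtype=float)
    mask = future_cells(len(p))
    return float(np.sum(np.where(mask, p[:, None] * ex, 0.0)))

# Toy 3x3 example: constant projected loss ratio of 0.1 in every cell.
total = reserve([100.0, 200.0, 300.0], np.full((3, 3), 0.1))
print(total)
```

In the toy example the unobserved cells are (2,3), (3,2) and (3,3), so the reserve is 200 × 0.1 + 2 × 300 × 0.1 = 80. For a 10 × 10 triangle the mask selects 45 cells.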
Exploratory dependence analysis
One would expect intuitively that the AB, BI and liability claim payments are associated, as these coverages all involve compensation for injuries or damage to the insured or to others. One may also wonder whether there exist interactions between portfolios across regions. In order to account for such dependencies between \(d\ge 2\) triangles, Shi and Frees [37] propose to link the marginal GLMs through a copula. This approach involves expressing the joint distribution of the loss ratios in the form
$$\begin{aligned} \Pr (X_{ij}^{(1)}\le x_{ij}^{(1)},\ldots ,X_{ij}^{(d)}\le x_{ij}^{(d)}) = C\{ \Pr (X_{ij}^{(1)}\le x_{ij}^{(1)}), \ldots , \Pr (X_{ij}^{(d)}\le x_{ij}^{(d)})\}, \end{aligned}$$
where C is a d-variate cumulative distribution function with uniform margins on (0, 1).
In order to select a copula C that appropriately reflects the dependence in the data, it is best to rely on rank-based techniques, as they make it possible to separate the effect of the marginals from the dependence structure [14, 17].
To illustrate this point, consider first the graph displayed in the left panel of Fig. 2, which shows a scatter plot of the pairs \((X_{ij}^{(3)},X_{ij}^{(6)})\) with \(i,j\in \{1,\ldots ,10\}\) and \(j\le 11-i\). This graph suggests a strong, positive dependence between BI in Western Canada and country-wide liability; in particular, the Pearson correlation is 0.56. However, the pattern of points on this graph is induced by the systematic effects of the development lags and accident years. For example, the seven points in the lower left corner of the graph all correspond to development years 7–10. As these effects are already accounted for by the marginal GLMs, this graph is uninformative (not to say misleading) for the selection of C.
To get insight into the dependence structure, it is more relevant to consider the residuals from the GLMs. For LOB 1, (standardized) residuals of the log-normal regression model can be defined, for all \(i,j\in \{1,\ldots ,10\}\) and \(j\le 11-i\), as
$$\begin{aligned} \varepsilon ^{(1)}_{ij} = \{\ln (X^{(1)}_{ij})-\hat{\mu }_{ij}^{(1)} \}/\hat{\sigma }^{(1)}, \end{aligned}$$
while for LOB \(\ell \in \{2,\ldots ,6\}\), the fact that Gamma regression models were used leads us to set
$$\begin{aligned} \varepsilon _{ij}^{(\ell )}=X_{ij}^{(\ell )}/\hat{\beta }_{ij}^{(\ell )}. \end{aligned}$$
In this fashion, the vectors \((\varepsilon ^{(1)}_{ij},\ldots ,\varepsilon ^{(6)}_{ij})\) with \(i,j\in \{1,\ldots ,10\}\) and \(j\le 11-i\) form a pseudo-random sample from a distribution with copula C and margins approximately \(\mathscr{N}(0,1)\) for \(\ell =1\) and \(\mathscr{G}(\hat{\alpha }^{(\ell )},1)\) for \(\ell \in \{ 2,\ldots ,6\}\).
As an illustration, the middle panel of Fig. 2 shows a scatter plot of the pairs \((\varepsilon _{ij}^{(3)},\varepsilon _{ij}^{(6)})\). This graph suggests a form of positive dependence (Pearson’s correlation is 0.34), but the message is blurred by the effect of the Gamma marginals. As the goal is to select the copula C, which does not depend on the margins, it is preferable to plot the pairs of standardized ranks, as in the right panel of Fig. 2. For arbitrary \(i,j\in \{1,\ldots ,10\}\) and \(j\le 11-i\), the standardized rank of residual \(\varepsilon _{ij}^{(\ell )}\) is defined by
$$\begin{aligned} R_{ij}^{(\ell )} = \frac{1}{55+1} \sum _{i^*=1}^{10}\sum _{j^*=1}^{11-i^*} \mathbf {1}(\varepsilon _{i^*j^*}^{(\ell )} \le \varepsilon _{ij}^{(\ell )}), \end{aligned}$$
where, in general, \(\mathbf {1}(A)\) is the indicator function of the set A, and the division by 56 rather than 55 ensures that all standardized ranks lie strictly between 0 and 1.
Let \(C_n\) be the empirical distribution function of the vectors \((R_{ij}^{(1)},\ldots ,R_{ij}^{(d)})\), with \(i,j\in \{1,\ldots ,10\}\) and \(j\le 11-i\). It can be shown, under suitable conditions on the underlying copula C, that \(C_n\) is a consistent estimator thereof. Accordingly, the vectors of standardized ranks, which form the support of \(C_n\), are a reliable tool for copula selection, fitting and validation. In particular, all rank-based tests of bivariate or multivariate independence are based on \(C_n\).
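The residual and rank definitions above translate directly into code. In this sketch the residuals of one line of business are assumed to be stored as a flat array over the observed cells; the function names and the four toy values are illustrative only.

```python
import numpy as np

def lognormal_residuals(x, mu_hat, sigma_hat):
    """Standardized residuals {ln(x) - mu_hat} / sigma_hat (log-normal LOB)."""
    return (np.log(np.asarray(x, dtype=float)) - mu_hat) / sigma_hat

def gamma_residuals(x, beta_hat):
    """Residuals x / beta_hat (Gamma LOBs)."""
    return np.asarray(x, dtype=float) / beta_hat

def standardized_ranks(eps):
    """R_k = #{m : eps_m <= eps_k} / (n + 1) over the n observed cells."""
    eps = np.asarray(eps, dtype=float)
    counts = (eps[None, :] <= eps[:, None]).sum(axis=1)
    return counts / (eps.size + 1)

eps = np.array([0.3, -1.2, 2.0, 0.9])
R = standardized_ranks(eps)
print(R)   # ranks 2, 1, 4, 3 divided by n + 1 = 5
```

With 55 observed cells the divisor is 56, so the largest standardized rank is 55/56 and the smallest is 1/56.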
For example, the right panel of Fig. 2 shows the pairs of standardized ranks associated with the residuals from the West BI and the country-wide liability coverages. One can see from this graph that there is a residual dependence between these two portfolios. In particular, the correlation between these pairs is 0.40; this rank-based correlation is a consistent estimate of Spearman’s \(\rho \). Alternative copula-based measures of association between two variables are Kendall’s \(\tau \) and van der Waerden’s coefficient \(\Upsilon \). Thus one can test the null hypothesis of bivariate independence by checking whether the empirical values of these coefficients are significantly different from 0; see, e.g., [23]. Table 3 gives estimates of \(\rho \), \(\tau \) and \(\Upsilon \) for the pair \((\varepsilon ^{(3)},\varepsilon ^{(6)})\), along with the p-values of the corresponding tests; the null hypothesis of independence is rejected at the 1 % level in all cases.
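The three coefficients and their independence tests can be sketched as follows. The p-values rely on the usual large-sample normal approximations under independence (variance 1/(n−1) for \(\rho \) and \(\Upsilon \), and 2(2n+5)/{9n(n−1)} for \(\tau \)); whether these are the exact variants used for Table 3 is an assumption, and ties are assumed absent.

```python
import numpy as np
from statistics import NormalDist

def ranks(v):
    v = np.asarray(v, dtype=float)
    return (v[None, :] <= v[:, None]).sum(axis=1)

def spearman_rho(x, y):
    """Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

def kendall_tau(x, y):
    """(concordant - discordant) / (number of pairs)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    s = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float(s.sum() / (n * (n - 1)))  # each pair is counted twice in s

def van_der_waerden(x, y):
    """Pearson correlation of the normal scores of the standardized ranks."""
    n, inv = len(x), NormalDist().inv_cdf
    a = np.array([inv(r / (n + 1)) for r in ranks(x)])
    b = np.array([inv(r / (n + 1)) for r in ranks(y)])
    return float(np.corrcoef(a, b)[0, 1])

def two_sided_p(stat, null_var):
    z = abs(stat) / null_var ** 0.5
    return 2 * (1 - NormalDist().cdf(z))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)
rho, tau, ups = spearman_rho(x, y), kendall_tau(x, y), van_der_waerden(x, y)
p_rho = two_sided_p(rho, 1 / (n - 1))
p_tau = two_sided_p(tau, 2 * (2 * n + 5) / (9 * n * (n - 1)))
p_ups = two_sided_p(ups, 1 / (n - 1))
```

On this five-point toy sample, \(\rho = 0.8\) and \(\tau = 0.6\); the sample is far too small for the normal approximation to be trusted, and it is used only to show the mechanics.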
Table 3 Nonparametric tests of independence
Table 4 Empirical values of Kendall’s \(\tau \) for all pairs in the portfolio
The null hypothesis of multivariate independence between the six LOBs can also be assessed globally using rank tests based on d-variate generalizations of \(\rho \), \(\tau \) or \(\Upsilon \). In particular, the d-variate version of Kendall’s \(\tau \) is given, e.g., in [18], by
$$\begin{aligned} \tau _{d,n}=\frac{1}{2^{d-1}-1}\left\{ -1+\frac{2^d}{n(n-1)} \sum _{(i,j) \ne (i^*,j^*)}\mathbf {1}\left( \varepsilon _{i^*j^*}^{(1)}\le \varepsilon _{ij}^{(1)},\ldots ,\varepsilon _{i^*j^*}^{(d)}\le \varepsilon _{ij}^{(d)}\right) \right\} =0.035. \end{aligned}$$
Under the hypothesis of multivariate independence, \(\tau _{d,n}\) has mean 0, finite sample variance
$$\begin{aligned} \mathrm{var}(\tau _{d,n})=\frac{n(2^{2d+1}+2^{d+1}-4\times 3^d)+3^d(2^d+6)-2^{d+2}(2^d+1)}{3^d(2^{d-1}-1)^2n(n-1)}=1.59\times 10^{-4}, \end{aligned}$$
and its distribution is asymptotically Gaussian. The approximate p-value of the test is \(0.53~\%\), suggesting that the residuals are dependent. The most dependent pairs of variables can be identified from Table 4, where all values of \(\tau _{2,n}\) are displayed. Values shown in bold are those that would be significantly different from 0 at the 5 % level in a single pairwise test. Although this level must be interpreted with care due to the multiple comparison issue, the two largest values in Table 4 are still significantly different from 0 at the global 5 % level even when the very conservative Bonferroni correction is applied.
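The statistic and its null variance are easy to evaluate numerically. The sketch below implements the two displayed formulas and plugs in n = 55 and d = 6; the two-sided normal p-value is an assumption about how the test is carried out, and the rounded value 0.035 is used because the residuals themselves are not reproduced here.

```python
import numpy as np
from statistics import NormalDist

def tau_multivariate(E):
    """d-variate Kendall's tau; E has shape (n, d), one row per observed cell."""
    E = np.asarray(E, dtype=float)
    n, d = E.shape
    # dominated[a, b] is True when row b is componentwise <= row a.
    dominated = (E[None, :, :] <= E[:, None, :]).all(axis=2)
    count = dominated.sum() - n          # remove the n pairs with a == b
    return (-1 + 2 ** d * count / (n * (n - 1))) / (2 ** (d - 1) - 1)

def tau_null_variance(n, d):
    """Finite-sample variance of tau_{d,n} under multivariate independence."""
    num = (n * (2 ** (2 * d + 1) + 2 ** (d + 1) - 4 * 3 ** d)
           + 3 ** d * (2 ** d + 6) - 2 ** (d + 2) * (2 ** d + 1))
    den = 3 ** d * (2 ** (d - 1) - 1) ** 2 * n * (n - 1)
    return num / den

var = tau_null_variance(55, 6)
z = 0.035 / var ** 0.5
p = 2 * (1 - NormalDist().cdf(z))
print(var, p)
```

The variance evaluates to about 1.59 × 10⁻⁴, in agreement with the text, and the p-value obtained from the rounded statistic is slightly above one half of one percent. For d = 2 the variance formula reduces to the classical 2(2n+5)/{9n(n−1)} of the bivariate Kendall's \(\tau \).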
Table 5 Parameter estimates and goodness-of-fit test p-value
Given the presence of dependence, the challenge is then to select a copula that best reflects the association between the variables. Many parametric families of copulas are available; see, e.g., [27] or [30] for the definition and properties of the Clayton, Frank, Plackett and t copula families used subsequently. Given a class \(\mathscr{C}=\{C_\theta :\theta \in \Theta \}\) of d-dimensional copulas, a rank-based estimate \(\hat{\theta }\) of the dependence parameter \(\theta \) can be obtained from loss-triangle data by maximizing the pseudo log-likelihood
$$\begin{aligned} \mathscr{L}(\theta )=\sum _{i=1}^{10}\sum _{j=1}^{11-i}\ln \{c_\theta (R^{(1)}_{ij},\ldots , R^{(d)}_{ij})\}, \end{aligned}$$
where \(c_\theta \) is the density of \(C_\theta \). The consistency and asymptotic normality of estimators of this type were established in [15] under broad regularity conditions. The adequacy of the class \(\mathscr{C}\) can then be tested using the Cramér–von Mises statistic defined by
$$\begin{aligned} S_n=\int _{[0,1]^d} \left\{ C_n(u_1,\ldots ,u_d)-C_{\hat{\theta }}(u_1,\ldots ,u_d)\right\} ^2\mathrm{d}u_1\cdots \mathrm{d}u_d. \end{aligned}$$
The p-value of a test of the hypothesis \(\mathscr{H}_0: C\in \mathscr{C}\) based on the statistic \(S_n\) can be computed via a parametric bootstrap procedure described in [19]. Both the estimation and the goodness-of-fit procedures are available in the R package copula. For illustration, Table 5 shows the parameter estimates, standard deviation and the p-value of the goodness-of-fit test for four copula families fitted to the pairs of residuals \((\varepsilon ^{(3)},\varepsilon ^{(6)})\) from the West BI and country-wide Liability triangles. This suggests that the Clayton copula would be a poor choice for these data; given the small sample size, however, it does not seem possible to discriminate between the other three copula families on the basis of \(S_n\).
This model selection, fitting and validation procedure is standard and straightforward to implement in two dimensions. However, the canonical d-variate generalizations of bivariate copulas typically lack flexibility: they are exchangeable and/or their lower-dimensional margins are all of the same type. With six lines of business, these assumptions may be too restrictive. As one can see in Fig. 3, different pairs of residuals exhibit different types of association; this is also confirmed by the values of Kendall’s \(\tau \) reported earlier in Table 4. In particular, Ontario LOBs exhibit positive dependence, while the BI coverages for Ontario and the West are negatively associated.
The fact that many variables are positively dependent is due in part to exogenous common factors such as inflation and interest rates. Furthermore, strategic decisions can impact several portfolios, e.g., the acceleration of payments on all lines of the liability insurance sector could induce some dependence between West BI and country-wide liability. At a more basic level, the positive association between Ontario AB and BI can be explained by the fact that the same accident will often give rise to claims under both coverages. Finally, jurisprudence can play a role. For example, reforms were introduced in the Atlantic region to control BI costs; this may explain why LOB 1 is seemingly independent of all other lines of business.