Rank-based methods for modeling dependence between loss triangles

In order to determine the risk capital for their aggregate portfolio, property and casualty insurance companies must fit a multivariate model to the loss triangle data relating to each of their lines of business. As an inadequate choice of dependence structure may have an undesirable effect on reserve estimation, a two-stage inference strategy is proposed in this paper to assist with model selection and validation. Generalized linear models are first fitted to the margins. Standardized residuals from these models are then linked through a copula selected and validated using rank-based methods. The approach is illustrated with data from six lines of business of a large Canadian insurance company for which two hierarchical dependence models are considered, i.e., a fully nested Archimedean copula structure and a copula-based risk aggregation model.


Introduction
In Canada, the Own Risk and Solvency Assessment (ORSA) guideline from the Office of the Superintendent of Financial Institutions (OSFI) requires that insurance companies set internal targets for risk capital that are tailored to their consolidated operations. In order to relate risk to capital and consider their operations as a whole, insurers are encouraged to develop internal models for the aggregation of dependent risks. Similar regulations exist in many countries worldwide.
To comply with regulatory standards, property and casualty insurance companies have to hold reserves and risk capital relating to losses that are incurred but not yet paid. For each line of business, payments relating to past claims are usually structured in a run-off triangle whose rows correspond to the accident years and whose columns correspond to the development periods, i.e., the years since the accident occurred. In order to determine a reserve, one must forecast the payments that these ongoing claims will induce in future years, i.e., one must extend each triangle to a rectangle by predicting the missing entries.
Several nonparametric approaches are available for developing claims in a run-off triangle, most notably the chain-ladder method. In order to account for the dependence between triangles, multivariate extensions of this technique have been proposed, e.g., in [7,28,31,34,41]. These techniques account for dependence in the computation of reserves and their prediction errors but they do not provide the predictive distribution needed to obtain risk measures such as Value-at-Risk (VaR) or Tail Value-at-Risk (TVaR). Their use in the determination of risk capital is therefore limited.
Parametric approaches leading to the distribution of unpaid losses have been considered, e.g., in [1,8,12,29,36,37]. Models investigated in these articles incorporate dependence between lines of business and/or within calendar years of a line of business through Gaussian, Archimedean or Hierarchical Archimedean copulas. In these papers, the total reserve estimate in the presence of dependence is not equal to the sum of the marginal reserves estimated assuming independence. This is a by-product of the joint estimation of the marginal and dependence parameters, which relies heavily on the choice of multivariate model for the run-off triangles. An inadequate choice of dependence structure may then have a large, undesirable effect on the estimation of the reserves. This is particularly worrying given that this choice is typically based on very few data points (e.g., 55 observations for 10 accident years and 10 development periods). Tools are thus needed for assessing the dependence between run-off triangles and selecting an appropriate model.
In this paper, we address this inferential issue within the context of a multivariate extension of the pairwise model of [37], where the dependence between corresponding cells of different run-off triangles is described by a copula. We propose to use an alternative two-stage inference strategy, in which generalized linear models (GLMs) are first fitted to the margins, thereby fixing the estimates of the reserves. In the second step, standardized residuals from those models are linked through a dependence structure estimated using rank-based methods. This general approach has a long history in the copula modeling literature; see, e.g., [14] or [17] for reviews. When dealing with identically distributed data, rank-based methods are well-established tools for selecting, estimating and validating copulas. To our knowledge, however, these techniques have never been applied to run-off triangles.
To illustrate the proposed approach, we consider run-off triangles for six portfolios from a large Canadian property and casualty insurance company. These data are described in Sect. 2 and appended. In Sect. 2.1, GLMs with log-normal and Gamma distributions are fitted to the individual portfolios, and the properties of these two parametric families are exploited in Sect. 2.2 to define residuals that are suitable for a dependence analysis through ranks. Two different hierarchical approaches are then explored for modeling the dependence between the lines of business.
In Sect. 3, a nested Archimedean copula model is fitted, along the same lines as [1]. As this model imposes many constraints on the dependence structure and the choice of copulas, a more flexible approach considered in [4,11] is implemented in Sect. 4. Risk capital calculations and allocations for the two models are compared in Sect. 5, and Sect. 6 summarizes the pros and cons of these approaches. Appendix 1 contains density calculations for the nested Archimedean copula model, and the data (up to a multiplicative factor for confidentiality purposes) are provided in Appendix 2, along with parameter estimates of the marginal GLMs.

Data
The run-off triangle data considered in this paper are from a large Canadian property and casualty insurance company. They consist of the cumulative paid losses and net earned premiums for six lines of automobile and home insurance business. Tables 13-18 in Appendix 2 show the paid losses for accident years 2003-2012 inclusively for each of the six lines of business developed over at most ten years. To preserve confidentiality, all figures were multiplied by a constant. This is inconsequential, however, because the analysis focuses on the paid loss ratios, i.e., the payments divided by the net earned premiums, in order to account for the volume of business. Table 1 gives a descriptive summary of each line of business (LOB). There are five run-off triangles of personal and commercial auto lines with accident benefits and bodily injury coverages from three regions (Atlantic, Ontario and the West). Atlantic Canada consists of New Brunswick, Nova Scotia, Prince Edward Island and Newfoundland/Labrador; the West comprises Manitoba, Saskatchewan, Alberta, British Columbia, Northwest Territories, Yukon, and Nunavut. Given that Québec has a public plan for this section of auto insurance, business for that province is included only in the sixth triangle, which comprises the company's country-wide Liability personal and commercial home insurance.
Bodily injury (BI) coverage provides compensation to the insured if the latter is injured or killed through the fault of a motorist who has no insurance, or by an unidentified vehicle. The accident benefits (AB) coverage provides compensation, regardless of fault, if a driver, passenger, or pedestrian suffers injury or death in an automobile collision. Disability income is an insurance product that provides supplementary income when the accident results in a disability that prevents the insured from working at his/her regular employment. For this reason, AB disability income is considered separately from other AB. Finally, liability insurance covers an insured for his/her legal liability for injuries or damage to others.

Marginal GLMs for incremental loss ratios
For LOB \(\ell \in \{1, \ldots, 6\}\), denote by \(Y^{(\ell)}_{ij}\) the incremental payment for the \(i\)th accident year and the \(j\)th development period, where \(i, j \in \{1, \ldots, 10\}\). Given that the earned premiums \(p^{(\ell)}_i\) vary with accident year \(i\) and line of business \(\ell\), it is convenient to model the loss ratios, defined by

\[ X^{(\ell)}_{ij} = Y^{(\ell)}_{ij} / p^{(\ell)}_i . \]

In Fig. 1, loss ratios \(X^{(\ell)}_{ij}\) for \(i = 1, 2\), \(j = 1, \ldots, 11 - i\) and \(\ell = 1, \ldots, 6\) are shown. It is clear from the graph that the loss ratio depends on the development lag for every portfolio. By comparing the solid and dashed lines of the same color, one can also see that the accident year has an impact. In order to capture these patterns, we consider a regression model with two explanatory variables, i.e., accident year and development period. This is in line with the classical chain-ladder approach.
For LOB \(\ell \in \{1, \ldots, 6\}\), let \(\kappa^{(\ell)}_i\) be the effect of accident year \(i \in \{1, \ldots, 10\}\) and \(\lambda^{(\ell)}_j\) be the effect of development period \(j \in \{1, \ldots, 10\}\). The systematic component for the \(\ell\)th line of business can then be written as

\[ \eta^{(\ell)}_{ij} = \zeta^{(\ell)} + \kappa^{(\ell)}_i + \lambda^{(\ell)}_j , \]

where \(\zeta^{(\ell)}\) is the intercept, and for parameter identification, we set \(\kappa^{(\ell)}_1 = \lambda^{(\ell)}_1 = 0\). There is no interaction term in this model, i.e., it is assumed that the effect of a given development period does not vary by accident year. While this assumption is hard to check, it is required to ensure that all parameters can be estimated from the 55 observations available. In their analysis of dependent loss triangles using copulas, Shi and Frees [37] use the log-normal and Gamma distributions for incremental claims. Their justification applies here as well. Following these authors, we consider the link

\[ \mu^{(\ell)}_{ij} = \eta^{(\ell)}_{ij} \]

for a log-normal distribution with mean \(\mu^{(\ell)}_{ij}\) and standard deviation \(\sigma^{(\ell)}\) on the log scale. For the Gamma distribution, however, we use the exponential link instead of the canonical inverse link in order to enforce positive means. When the Gamma distribution is selected, therefore, its scale and shape parameters are respectively denoted by \(\beta^{(\ell)}_{ij}\) and \(\alpha^{(\ell)}\), and it is assumed that

\[ \beta^{(\ell)}_{ij} = \exp(\eta^{(\ell)}_{ij}) / \alpha^{(\ell)} , \]

so that the mean \(\alpha^{(\ell)} \beta^{(\ell)}_{ij} = \exp(\eta^{(\ell)}_{ij})\) is positive. Log-normal and Gamma distributions were fitted to all lines of business by the method of maximum likelihood. Table 2 shows the corresponding values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). These criteria suggest the choice of the log-normal distribution for the first line of business and the Gamma distribution for all others. These choices of models are supported by the Kolmogorov-Smirnov goodness-of-fit test, whose p-values are also given in Table 2. No model is rejected at the 1 % level. Q-Q plots (not shown) of standardized residuals (defined below) provide visual confirmation that the selected models are adequate, although the fit for LOB 6 is borderline.
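The mean structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names and the toy parameter values are hypothetical, the intercept, accident-year effect and development-period effect are combined additively as in the text, and the formulas exp(mu + sigma^2/2) for the log-normal mean and shape times scale for the Gamma mean are standard properties of these distributions.

```python
import math

def linear_predictor(zeta, kappa, lam, i, j):
    """Intercept + accident-year effect + development-period effect (1-based i, j)."""
    return zeta + kappa[i - 1] + lam[j - 1]

def lognormal_mean(mu, sigma):
    """Mean of a log-normal variable with log-scale mean mu and log-scale sd sigma."""
    return math.exp(mu + 0.5 * sigma ** 2)

def gamma_scale(eta, alpha):
    """Scale beta chosen so that the Gamma mean alpha * beta equals exp(eta)."""
    return math.exp(eta) / alpha

# Toy values for a 3 x 3 triangle (illustrative only, not estimates from the paper).
zeta, kappa, lam = -1.0, [0.0, 0.1, -0.2], [0.0, -0.5, -1.5]
eta_22 = linear_predictor(zeta, kappa, lam, 2, 2)
beta_22 = gamma_scale(eta_22, alpha=4.0)
```

With ten accident years and ten development periods, fixing one level of each effect leaves 1 + 9 + 9 location parameters plus one dispersion or shape parameter per line of business.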
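Parameter estimates of the fitted models are given in Appendix 2 along with their standard errors. Using these values, one can estimate the total reserve of the portfolio by

\[ \sum_{\ell=1}^{6} \sum_{i=2}^{10} \sum_{j=12-i}^{10} p^{(\ell)}_i \, \widehat{E}(X^{(\ell)}_{ij}) , \]

where \(\widehat{E}(X^{(\ell)}_{ij})\) is the fitted mean loss ratio, i.e., \(\exp\{\hat\mu^{(1)}_{ij} + (\hat\sigma^{(1)})^2/2\}\) for the log-normal model and \(\hat\alpha^{(\ell)} \hat\beta^{(\ell)}_{ij}\) for the Gamma models. Reserve estimates are reported in Table 19 in Appendix 2, along with those derived from the chain-ladder method, which is the industry's benchmark. The two methods lead to similar results and total reserve estimates of $438,088 and $453,686, respectively.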
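As a hedged sketch of the reserve calculation for one line of business, the following Python code sums premium times expected loss ratio over the unobserved cells of a 10 x 10 rectangle. The helper names `unpaid_cells` and `reserve`, the flat premiums and the decaying mean loss ratio are all hypothetical; only the index set (cells with i + j > 11) comes from the triangle layout described in the text.

```python
import math

def unpaid_cells(n):
    """Cells (i, j), 1-based, of the lower triangle: i + j > n + 1."""
    return [(i, j) for i in range(2, n + 1) for j in range(n + 2 - i, n + 1)]

def reserve(premiums, expected_ratio):
    """Sum of premium_i * E(X_ij) over the unobserved cells of one line of business."""
    n = len(premiums)
    return sum(premiums[i - 1] * expected_ratio(i, j) for i, j in unpaid_cells(n))

# Toy example: flat premiums and a hypothetical mean loss ratio decaying with the lag.
premiums = [100.0] * 10
toy_reserve = reserve(premiums, lambda i, j: 0.5 * math.exp(-0.8 * (j - 1)))
```

For n = 10 the lower triangle contains 45 cells, which complements the 55 observed cells of the upper triangle.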

Exploratory dependence analysis
One would expect intuitively that the AB, BI and liability claim payments are associated, as these coverages all involve compensation for injuries or damage to the insured or to others. One may also wonder whether there exist interactions between portfolios across regions. In order to account for such dependencies between \(d \ge 2\) triangles, Shi and Frees [37] propose to link the marginal GLMs through a copula. This approach involves expressing the joint distribution of the loss ratios in the form

\[ \Pr(X^{(1)}_{ij} \le x_1, \ldots, X^{(d)}_{ij} \le x_d) = C\{F^{(1)}_{ij}(x_1), \ldots, F^{(d)}_{ij}(x_d)\} , \]

where \(F^{(\ell)}_{ij}\) denotes the marginal distribution function of \(X^{(\ell)}_{ij}\) and \(C\) is a \(d\)-variate cumulative distribution function with uniform margins on (0, 1). In order to select a copula \(C\) that appropriately reflects the dependence in the data, it is best to rely on rank-based techniques, as they make it possible to separate the effect of the marginals from the dependence structure [14,17].
To illustrate this point, consider first the graph displayed in the left panel of Fig. 2, which shows a scatter plot of the pairs \((X^{(3)}_{ij}, X^{(6)}_{ij})\) with \(i, j \in \{1, \ldots, 10\}\) and \(i + j \le 11\). This graph suggests a strong, positive dependence between BI in Western Canada and country-wide liability; in particular, the Pearson correlation is 0.56. However, the pattern of points on this graph is induced by the systematic effects of the development lags and accident years. For example, the seven points in the lower left corner of the graph all correspond to development years 7-10. As these effects are already accounted for by the marginal GLMs, this graph is uninformative (not to say misleading) for the selection of \(C\).
To get insight into the dependence structure, it is more relevant to consider the residuals from the GLMs. For LOB 1, (standardized) residuals of the log-normal regression model can be defined, for all \(i, j \in \{1, \ldots, 10\}\) and \(i + j \le 11\), as

\[ e^{(1)}_{ij} = \{\ln(X^{(1)}_{ij}) - \hat\mu^{(1)}_{ij}\} / \hat\sigma^{(1)} , \]

while for LOB \(\ell \in \{2, \ldots, 6\}\), the fact that Gamma regression models were used leads to set

\[ e^{(\ell)}_{ij} = X^{(\ell)}_{ij} / \hat\beta^{(\ell)}_{ij} . \]

In this fashion, the vectors \((e^{(1)}_{ij}, \ldots, e^{(6)}_{ij})\) with \(i, j \in \{1, \ldots, 10\}\) and \(i + j \le 11\) form a pseudo-random sample from a distribution with copula \(C\) and margins approximately \(N(0, 1)\) for \(\ell = 1\) and \(G(\hat\alpha^{(\ell)}, 1)\) for \(\ell \in \{2, \ldots, 6\}\).
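The two residual definitions can be written as one-line functions. This is a minimal sketch with hypothetical function names, assuming the fitted log-scale mean and standard deviation (log-normal case) or the fitted Gamma scale are already available for each observed cell.

```python
import math

def lognormal_residual(x, mu_hat, sigma_hat):
    """Standardize on the log scale; approximately N(0, 1) if the model holds."""
    return (math.log(x) - mu_hat) / sigma_hat

def gamma_residual(x, beta_hat):
    """Divide by the fitted scale; approximately Gamma(alpha, 1) if the model holds."""
    return x / beta_hat
```

Both transformations are increasing in the loss ratio once the cell-specific parameters are fixed, so they remove the accident-year and development-period effects without altering the copula linking the lines of business.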
As an illustration, the middle panel of Fig. 2 shows a scatter plot of the pairs \((e^{(3)}_{ij}, e^{(6)}_{ij})\). This graph suggests a form of positive dependence (Pearson's correlation is 0.34), but the message is blurred by the effect of the Gamma marginals. As the goal is to select the copula \(C\), which does not depend on the margins, it is preferable to plot the pairs of normalized ranks, as in the right panel of Fig. 2 (ranks of residuals, West BI against country-wide Liability). For each \(\ell\), the standardized rank of the residual \(e^{(\ell)}_{ij}\) is defined by

\[ R^{(\ell)}_{ij} = \frac{1}{56} \sum_{i^*=1}^{10} \sum_{j^*=1}^{11-i^*} 1\big(e^{(\ell)}_{i^* j^*} \le e^{(\ell)}_{ij}\big) , \]

where, in general, \(1(A)\) is the indicator function of the set \(A\) and the division by 56 rather than 55 ensures that all standardized ranks lie strictly between 0 and 1.
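A minimal Python sketch of the rank standardization follows. The function name is hypothetical and ties are assumed absent, as they would be for continuous residuals; with n = 55 observed cells the divisor n + 1 equals 56.

```python
def standardized_ranks(values):
    """Rank of each value among all values, divided by n + 1 (no ties assumed)."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / (n + 1) for v in values]

# Three toy residuals: the ranks 2, 1, 3 are mapped to 2/4, 1/4, 3/4.
toy_ranks = standardized_ranks([0.3, -1.2, 2.0])
```

Dividing by n + 1 rather than n keeps the largest standardized rank below 1, which matters later because copula densities can be unbounded on the boundary of the unit cube.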
Let \(C_n\) be the empirical distribution function of the vectors \((R^{(1)}_{ij}, \ldots, R^{(d)}_{ij})\), with \(i, j \in \{1, \ldots, 10\}\) and \(i + j \le 11\). It can be shown, under suitable conditions on the underlying copula \(C\), that \(C_n\) is a consistent estimator thereof. Accordingly, the vectors of standardized ranks, which form the support of \(C_n\), are a reliable tool for copula selection, fitting and validation. In particular, all rank-based tests of bivariate or multivariate independence are based on \(C_n\).
For example, the right panel of Fig. 2 shows the pairs of standardized ranks associated with the residuals from the West BI and the country-wide liability coverages. One can see from this graph that there is a residual dependence between these two portfolios. In particular, the correlation between these pairs is 0.40; this rank-based correlation is a consistent estimate of Spearman's \(\rho\). Alternative copula-based measures of association between two variables are Kendall's \(\tau\) and van der Waerden's coefficient. Thus one can test the null hypothesis of bivariate independence by checking whether the empirical values of these coefficients are significantly different from 0; see, e.g., [23]. Table 3 gives estimates of \(\rho\), \(\tau\) and van der Waerden's coefficient for the pair \((e^{(3)}, e^{(6)})\), along with the p-values of the corresponding tests; the null hypothesis of independence is rejected at the 1 % level in all cases.
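The three rank-based coefficients can be computed with the Python standard library alone. The sketch below is illustrative and makes several assumptions: function names are hypothetical, no ties are present, Spearman's rho is taken as the Pearson correlation of the ranks, and van der Waerden's coefficient is taken as the Pearson correlation of the normal scores obtained by applying the standard normal quantile function to the ranks divided by n + 1.

```python
from statistics import NormalDist

def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """(concordant - discordant) pairs divided by the number of pairs."""
    n, s = len(x), 0
    for a in range(n):
        for b in range(a + 1, n):
            prod = (x[a] - x[b]) * (y[a] - y[b])
            s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))

def van_der_waerden(x, y):
    n, z = len(x), NormalDist()
    sx = [z.inv_cdf(r / (n + 1)) for r in ranks(x)]
    sy = [z.inv_cdf(r / (n + 1)) for r in ranks(y)]
    return pearson(sx, sy)
```

Because all three coefficients depend on the data only through the ranks, they are unchanged when either variable is transformed by an increasing function, which is why they are unaffected by the choice of margins.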
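The null hypothesis of multivariate independence between the six LOBs can also be assessed globally using rank tests based on \(d\)-variate generalizations of \(\rho\), \(\tau\) or van der Waerden's coefficient.
In particular, writing \(e_1, \ldots, e_n\) for the \(n = 55\) vectors of residuals, the \(d\)-variate version of Kendall's \(\tau\) is given, e.g., in [18], by

\[ \tau_{d,n} = \frac{1}{2^{d-1} - 1} \left\{ -1 + \frac{2^d}{n(n-1)} \sum_{r \ne s} 1(e_r \le e_s) \right\} , \]

where the inequality \(e_r \le e_s\) is understood componentwise. Under the hypothesis of multivariate independence, \(\tau_{d,n}\) has mean 0, finite sample variance and its distribution is asymptotically Gaussian. The approximate p-value of the test is 0.53 %, suggesting that the residuals are dependent. For the most dependent pairs, the values in Table 4 are still significantly different from 0 at the global 5 % level even when the very conservative Bonferroni correction is applied.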
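The multivariate coefficient can be sketched as follows, assuming the usual definition in which the proportion of ordered pairs of observations that are componentwise ordered is rescaled so that the statistic has mean 0 under independence and equals 1 for comonotone data. The function name is hypothetical and the data are toy vectors, not the residuals of the paper.

```python
def kendall_tau_d(rows):
    """d-variate Kendall's tau from a list of equal-length observation vectors."""
    n, d = len(rows), len(rows[0])
    # Count ordered pairs (a, b), a != b, with rows[a] <= rows[b] in every component.
    count = sum(
        1
        for a in range(n)
        for b in range(n)
        if a != b and all(u <= v for u, v in zip(rows[a], rows[b]))
    )
    return (-1 + 2 ** d * count / (n * (n - 1))) / (2 ** (d - 1) - 1)

comonotone = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
bivariate = [(1, 1), (2, 3), (3, 2), (4, 4)]
```

For d = 2 this reduces to the ordinary Kendall's tau, which gives a quick consistency check on any implementation.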
Given the presence of dependence, the challenge is then to select a copula that best reflects the association between the variables. Many parametric families of copulas are available; see, e.g., [27] or [30] for the definition and properties of the Clayton, Frank, Plackett and t copula families used subsequently. Given a class \(\mathcal{C} = \{C_\theta : \theta \in \Theta\}\) of \(d\)-dimensional copulas, a rank-based estimate \(\hat\theta\) of the dependence parameter \(\theta\) can be obtained from loss-triangle data by maximizing the pseudo log-likelihood

\[ \ell(\theta) = \sum_{i=1}^{10} \sum_{j=1}^{11-i} \ln c_\theta\big(R^{(1)}_{ij}, \ldots, R^{(d)}_{ij}\big) , \]

where \(c_\theta\) is the density of \(C_\theta\). The consistency and asymptotic normality of estimators of this type were established in [15] under broad regularity conditions. The adequacy of the class \(\mathcal{C}\) can then be tested using the Cramér-von Mises statistic defined by

\[ S_n = \sum_{i=1}^{10} \sum_{j=1}^{11-i} \big\{ C_n\big(R^{(1)}_{ij}, \ldots, R^{(d)}_{ij}\big) - C_{\hat\theta}\big(R^{(1)}_{ij}, \ldots, R^{(d)}_{ij}\big) \big\}^2 . \]

The p-value of a test of the hypothesis \(H_0 : C \in \mathcal{C}\) based on the statistic \(S_n\) can be computed via a parametric bootstrap procedure described in [19]. Both the estimation and the goodness-of-fit procedures are available in the R package copula.
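To make the maximum pseudo-likelihood idea concrete, here is a small Python sketch for the bivariate Frank family. It is not the routine of the R package copula: the function names are hypothetical, the density is the standard closed form of the bivariate Frank copula, and the maximization is a crude grid search over an assumed range of parameter values, chosen only to keep the example dependency-free.

```python
import math

def frank_log_density(u, v, theta):
    """Log-density of the bivariate Frank copula; theta = 0 is independence."""
    if abs(theta) < 1e-8:
        return 0.0
    a = -math.expm1(-theta)                      # 1 - exp(-theta)
    num = theta * a * math.exp(-theta * (u + v))
    den = a - math.expm1(-theta * u) * math.expm1(-theta * v)
    return math.log(num) - 2.0 * math.log(abs(den))

def pseudo_loglik(theta, pairs):
    """Sum of log-densities evaluated at pairs of standardized ranks."""
    return sum(frank_log_density(u, v, theta) for u, v in pairs)

def fit_frank(pairs, grid=None):
    """Grid-search maximizer of the pseudo log-likelihood (assumed range -20..20)."""
    if grid is None:
        grid = [k / 10 for k in range(-200, 201)]
    return max(grid, key=lambda t: pseudo_loglik(t, pairs))

# Toy rank pairs: perfectly increasing and perfectly decreasing patterns.
increasing = [(k / 6, k / 6) for k in range(1, 6)]
decreasing = [(k / 6, 1 - k / 6) for k in range(1, 6)]
```

In practice a numerical optimizer would replace the grid, and standard errors would come from the asymptotic theory cited in the text or from a bootstrap.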
For illustration, Table 5 shows the parameter estimates, standard deviation and the p-value of the goodness-of-fit test for four copula families fitted to the pairs of residuals \((e^{(3)}, e^{(6)})\) from the West BI and country-wide Liability triangles. This suggests that the Clayton copula would be a poor choice for these data; given the small sample size, however, it does not seem possible to discriminate between the other three copula families on the basis of \(S_n\). This model selection, fitting and validation procedure is standard and straightforward to implement in two dimensions. However, the canonical \(d\)-variate generalizations of bivariate copulas typically lack flexibility: they are exchangeable and/or their lower-dimensional margins are all of the same type. With six lines of business, these assumptions may be too restrictive. As one can see in Fig. 3, different pairs of residuals exhibit different types of association; this is also confirmed by the values of Kendall's \(\tau\) reported earlier in Table 4. In particular, Ontario LOBs exhibit positive dependence, while the BI coverages for Ontario and the West are negatively associated. The fact that many variables are positively dependent is due in part to exogenous common factors such as inflation and interest rates. Furthermore, strategic decisions can impact several portfolios, e.g., the acceleration of payments on all lines of the liability insurance sector could induce some dependence between West BI and country-wide liability. At a more basic level, the positive association between Ontario AB and BI can be explained by the fact that the same accident will often give rise to claims under both coverages. Finally, jurisprudence can play a role. For example, reforms were undertaken in the Atlantic region to control BI costs; this may explain why LOB 1 is seemingly independent from all other lines of business.
This model is such that if \((U_1, \ldots, U_{d+1})\) is distributed as \(C_d\), the copula linking variables \(U_j\) and \(U_k\) is Archimedean with generator \(\varphi_{k-1}\) for all \(j < k\). Because of condition (2), one must also have […]. Algorithms for generating data from \(C_d\) were given in [21,26]. Hofert and Mächler [22] also wrote the R package nacopula (now merged into copula) that can be used to simulate from fully nested Archimedean copulas in any dimension. Figure 4 depicts the fully nested Archimedean structure used to model the dependence between the residuals of the six lines of business. In this structure, copula \(C_1\) links the two components of the Ontario AB coverage. Their dependence with Ontario BI coverage is then incorporated at level 2. The West BI and the country-wide Liability coverages are then included at levels 3 and 4, respectively. Anti-ranks (i.e., the ranks of the negative residuals) had to be used at levels 3 and 4, because of the constraints imposed by (1) and the fact that the residuals for LOB 3 are negatively associated with LOB 2 and positively associated with LOB 6. Finally, the Atlantic BI coverage was included at the last step given its apparent lack of dependence with the other lines of business. This overall structure is in accordance with ratemaking practices, as the rating is typically performed on a territorial basis. One may thus expect the dependence between lines of business to be larger when they are from the same region than when they are not.
In what follows, it is assumed that for each \(k \in \{1, \ldots, 5\}\) and all \(t \in (0, 1)\),

\[ \varphi_k(t) = -\ln\left( \frac{e^{-t\theta_k} - 1}{e^{-\theta_k} - 1} \right) \]

for some \(\theta_k \in \mathbb{R}\). In other words, the nested copulas are taken to be from the Frank family, which spans all degrees of dependence between \(-1\) and 1, as measured by Kendall's \(\tau\). A rank-based estimate \(\hat\theta\) of the vector \(\theta = (\theta_1, \ldots, \theta_5)\) characterizing the dependence structure is then obtained by maximizing the pseudo-likelihood function, i.e., the product over the 55 observed cells of \(c_\theta\) evaluated at the vector of standardized ranks, with \(1 - R^{(\ell)}_{ij}\) in place of \(R^{(\ell)}_{ij}\) for the lines entered through anti-ranks, where \(c_\theta\) is the density of the fully nested Archimedean copula. As shown in Appendix 1, the evaluation of this density is straightforward but computationally intensive in high dimensions. Therefore, due to evidence that residuals for LOB 1 are independent from residuals for other LOBs, \(\theta_5\) was set equal to 0. The maximization of the pseudo-likelihood for the model with four levels leads to the parameter estimate \(\hat\theta = (2.693, 2.354, 1.782, 0.867)\). However, a 95 % confidence interval for \(\theta_4\) based on 1000 bootstrap replicates includes 0, which corresponds to independence in the Frank copula family. Accordingly, the dependence is significant only in the first three levels of the hierarchy. The parameters of the reduced model with \(\theta_4 = \theta_5 = 0\) were estimated once again by the maximum pseudo-likelihood method. This led to \(\hat\theta = (2.577, 2.233, 1.776)\), whose components are all significantly different from 0. Figure 5 shows the approximate distribution of \(\hat\theta_3\) (left), \(\hat\theta_2\) (middle), and \(\hat\theta_1\) (right) based on 10,000 bootstrap replicates. In that figure, the dashed blue lines represent 95 % confidence intervals for the parameters, none of which includes 0. To check for model adequacy, a random sample of size 500 from the fitted model was generated. A test of the hypothesis that the underlying copula of this sample is the same as that of the original data was then carried out using the rank-based procedure in [32].
The test statistic was computed with the R package TwoCop and led to an approximate p-value of 31 %, suggesting that the fit is not inadequate.
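The Frank generator and the way two levels are nested can be illustrated with a short Python sketch. It assumes the common construction in which a bivariate Archimedean copula is the inverse generator applied to the sum of the generators, and in which the outer copula takes the inner copula as one of its arguments; the function names are hypothetical, and the parameter values used in the example are simply the first two components of the reduced-model estimate quoted above.

```python
import math

def frank_generator(t, theta):
    """Frank generator -ln((exp(-theta t) - 1) / (exp(-theta) - 1)), for t in (0, 1]."""
    return -math.log(math.expm1(-theta * t) / math.expm1(-theta))

def frank_generator_inverse(s, theta):
    """Inverse of the Frank generator, for s >= 0."""
    return -math.log1p(math.exp(-s) * math.expm1(-theta)) / theta

def frank_copula(u, v, theta):
    """Bivariate Frank copula built from its generator."""
    s = frank_generator(u, theta) + frank_generator(v, theta)
    return frank_generator_inverse(s, theta)

def nested_frank_3(u1, u2, u3, theta_inner, theta_outer):
    """Two-level nesting: (u1, u2) linked first, then joined with u3."""
    inner = frank_copula(u1, u2, theta_inner)
    return frank_copula(inner, u3, theta_outer)

value = nested_frank_3(0.3, 0.6, 0.8, theta_inner=2.577, theta_outer=2.233)
```

Setting the third argument to 1 recovers the inner copula, while setting the second argument to 1 gives a Frank copula with the outer parameter, which shows that every pair joined at a given level shares the same degree of dependence.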
As an additional informal check, random samples of size 55 were drawn from the fitted 6-dimensional copula and compared visually to the empirical copula by looking at rank plots of selected pairs. Figure 6 shows one result from such a comparison of pairs (LOB 2, LOB \(\ell\)) with \(\ell \in \{3, 4, 5\}\) and (LOB 3, LOB 4). The rank plots derived from the residuals are in the top row, and those corresponding to the random sample are in the bottom row. The positive dependence between Ontario risks seems to be accurately captured by the model. Although the negative association between LOBs 2 and 3 is taken into account, one can see in the second column of Fig. 6 that negative dependence is induced between LOBs 3 and 4. This is an artifact of the dependence structure, which assumes from the start that the pairs \((-3, \ell)\), with \(\ell \in \{2, 4, 5\}\), have the same degree of association. Table 4 suggests that this is not the case. This issue could have been avoided by grouping LOB 2 and LOB 3 earlier in the structure, but at the expense of the overall fit of the model. A more flexible modeling approach is presented below.

Copula-based risk aggregation model
In this section, a hierarchical approach to loss triangle modeling is considered. It appears to have been originally proposed by Swiss reinsurance practitioners [9,35] but was formalized in [4]. Estimation and validation procedures for this class of models are described in [10,11], where rank-based clustering techniques are also proposed for selecting an appropriate structure. The model is defined using a tree comprising \(d - 1\) nodes, each of which has two branches. An example of such a structure is shown in the left panel of Fig. 7 (illustration of the tree structure and dendrogram for the copula-based aggregation model). At each node, a copula describes the dependence between the two components, which are then summed and viewed as a single risk in higher levels of the hierarchy. For example, \(C_{4,5}\) denotes the copula linking \(e^{(4)}\) and \(e^{(5)}\) and \(S_{4,5} = e^{(4)} + e^{(5)}\), while \(C_{2,\ldots,6}\) is the copula linking aggregated risks \(S_{2,3,6}\) and \(S_{4,5}\). A joint distribution for the \(d\) variables is then defined in terms of \(d - 1\) bivariate copulas and \(d\) marginal distributions under a conditional independence assumption. This assumption, which is reasonable in the present context, states that conditional on a sum at a given node, the descendants of that node are independent of the non-descendants. For additional details, see [4,11].
This strategy is simple to implement, as it builds on tools already available for bivariate copula selection, inference, and validation. Furthermore, the \(d - 1\) copulas in the model can be chosen freely, thereby providing great flexibility in the dependence structure. Moreover, hierarchical clustering techniques can be adapted to obtain an appropriate tree structure.
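The aggregation tree can be represented as nested pairs, with the sum at each node computed recursively. The sketch below is only an illustration of the data structure: the helper names are hypothetical, the tree shape is read off from the node labels mentioned in the text (LOBs 2 and 3 first, then LOB 6, then the pair 4 and 5, with LOB 1 joined last), and the leaf values are toy numbers rather than residuals.

```python
def leaves(tree):
    """Sorted tuple of the leaf labels under a node."""
    if isinstance(tree, int):
        return (tree,)
    return tuple(sorted(leaves(tree[0]) + leaves(tree[1])))

def node_sums(tree, values, out=None):
    """Return the sum at `tree` and a dict mapping each internal node to its sum."""
    if out is None:
        out = {}
    if isinstance(tree, int):
        return values[tree], out
    left, _ = node_sums(tree[0], values, out)
    right, _ = node_sums(tree[1], values, out)
    out[leaves(tree)] = left + right
    return left + right, out

# Binary tree with d = 6 leaves and d - 1 = 5 internal nodes.
tree = ((((2, 3), 6), (4, 5)), 1)
# Toy leaf values; a sign-flipped component would be entered with its sign changed.
values = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0, 6: 6.0}
total, sums = node_sums(tree, values)
```

Each internal node then needs one bivariate copula linking the sums of its two children, which is why d - 1 copulas suffice for d risks.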
As explained in [11], it is appealing to model first the risks that are the most dependent in some sense. In this paper, the distance based on Kendall's \(\tau\),

\[ D(e^{(\ell)}, e^{(k)}) = \sqrt{1 - \tau^2(e^{(\ell)}, e^{(k)})} , \]

is minimized at each step, so that the most strongly associated risks are merged first, to obtain the dendrogram displayed in the right panel of Fig. 7. Risks 2 and 3 are grouped in the first step. Given that they are negatively associated, it was deemed preferable to work with \(-e^{(3)}\), as was done in the previous section. Before selecting appropriate copulas for each aggregation step, Kendall and van der Waerden tests of independence were performed to see if the dependence is significant. The resulting p-values are shown in Table 6, where one can see that independence is rejected for the first four aggregation steps, but not at the last one. This is not surprising as the preliminary analysis of the data already suggested that the Atlantic BI line of business is not related to the others. Unlike the nested Archimedean copula model, the risk aggregation model captures the existing dependence between West BI and country-wide Liability lines, and includes the latter in the dependence analysis.
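The first clustering step can be sketched in Python as follows. The function names are hypothetical and the pairwise values of Kendall's tau are made-up illustrations, not the entries of Table 4; the point is only that the distance is smallest for the pair with the largest absolute tau, whatever its sign.

```python
import math

def tau_distance(tau):
    """Distance sqrt(1 - tau^2): 0 for perfect dependence, 1 for tau = 0."""
    return math.sqrt(1.0 - tau ** 2)

def first_merge(pairwise_tau):
    """Pair of risks with the smallest distance, i.e. the largest |tau|."""
    return min(pairwise_tau, key=lambda pair: tau_distance(pairwise_tau[pair]))

# Hypothetical pairwise Kendall's tau values keyed by (LOB, LOB).
toy_tau = {(2, 3): -0.45, (4, 5): 0.40, (3, 6): 0.29}
merged_first = first_merge(toy_tau)
```

After a merge, the two risks are replaced by their sum and the distances to the remaining risks are recomputed, which is repeated until a single node remains.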
Given that the independence hypothesis cannot be rejected at the last node, there are only four copulas to be fitted, namely \(C_{2,3}\), \(C_{2,3,6}\), \(C_{4,5}\) and \(C_{2,\ldots,6}\). Based on rank plots, tests of extremeness from [6] and goodness-of-fit tests based on the Cramér-von Mises distance \(S_n\), parametric families of bivariate copulas were selected and fitted by maximum pseudo-likelihood. The final choices are summarized in Table 7. The model validation technique described in [11] was used. It relies on a simulation algorithm proposed in [4] and validated in [25]. Based on a random sample of size 500 from the model, the test in [32] led to an approximate p-value of 52 %. Therefore, the null hypothesis that both samples are coming from the same copula cannot be rejected. This suggests that the selected hierarchical model is appropriate, and that the conditional independence assumption is reasonable. A visual check of the latter assumption confirms this finding.
Looking at Fig. 8, one can see that the pitfalls of the nested Archimedean copula model have been addressed: there is no negative dependence between LOBs 3 and 4, and the model induces positive dependence between LOBs 3 and 6. However, the extent of the association between Ontario AB and BI risks is not portrayed as vividly in the aggregation model as it was in the nested Archimedean copula model. Overall, the risk aggregation model provides a faithful description of the data.
Note that if desired, a modification of the tree structure would make it possible to account for the dependence between LOB 2 and the pair (LOB 4, LOB 5). In that case, however, the negative dependence between LOBs 2 and 3 would be masked.

Predictive distribution and risk capital
The goal of loss triangle modeling is to forecast the unpaid loss by completing the triangle into a rectangle. Insurance companies are interested in the expected unpaid loss (the reserve) but also in its standard deviation, and other risk measures defined in terms of a risk tolerance \(\kappa \in (0, 1)\) such as the Value-at-Risk (VaR) and the Tail Value-at-Risk (TVaR). In principle, these various measures could all be computed for the nested Archimedean copula model (Model I) and the risk aggregation model (Model II), given that they both specify a distribution for the total unpaid claims. As these distributions cannot be obtained explicitly through a convolution, however, all risk measures must be estimated by simulation. To obtain one realization of the total unpaid loss, one can proceed as follows.
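Simulation procedure
1. Simulate 45 observations from the dependence model.
2. Transform them into observations of the loss ratios \(X^{(\ell)}_{ij}\) for the lower triangle, i.e., accident years \(i \in \{2, \ldots, 10\}\) and development years \(j \in \{12 - i, \ldots, 10\}\), using the inverse marginal distributions.
3. Compute the unpaid loss for each line of business, \(X^{(\ell)} = \sum_{i=2}^{10} \sum_{j=12-i}^{10} p^{(\ell)}_i X^{(\ell)}_{ij}\), as well as the total unpaid loss \(S = X^{(1)} + \cdots + X^{(6)}\).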
Consistent estimates of the risk measures can be derived easily from \(n\) independent copies \(S_1, \ldots, S_n\) of the unpaid loss. Let \(F_n\) be the corresponding empirical distribution function. Then

\[ \widehat{\mathrm{VaR}}_\kappa(S) = \inf\{s : F_n(s) \ge \kappa\}, \qquad \widehat{\mathrm{TVaR}}_\kappa(S) = \frac{1}{n(1-\kappa)} \sum_{m=1}^{n} S_m \, 1\{S_m > \widehat{\mathrm{VaR}}_\kappa(S)\} . \]

Table 8 shows risk measures for the total unpaid loss based on 500,000 simulations for Models I and II. Given the GLMs fitted to the marginal distributions, one would expect an average total unpaid loss of $438,088; the small discrepancy between this value and the approximations is due to simulation. The risk measures are all smaller for Model I than for Model II. This is slightly surprising because Model II takes into account the negative dependence between LOBs 2 and 3; intuitively, one would thus expect more risk diversification under Model II than under Model I. Nevertheless, Model II is more conservative than Model I in the sense that it does not assume that LOB 6 is independent from the other lines of business. In addition, Model II is based in part on Plackett and \(t_2\) copulas, which exhibit tail dependence, whereas members of Frank's copula family in Model I do not. Insurance companies also have to determine capital allocations, i.e., the share of the risk capital to be allocated to each LOB. This exercise helps to identify the most and least profitable sectors of activity in a company. Capital allocation principles were first introduced in [38]; see [5] for a review. Here, TVaR-based capital allocations are used. If \(X^{(\ell)}\) is the unpaid loss for LOB \(\ell\), the capital allocated to that LOB is

\[ \mathrm{TVaR}_\kappa(X^{(\ell)}; S) = \frac{1}{1-\kappa} \left[ E\{X^{(\ell)} 1(S > \mathrm{VaR}_\kappa(S))\} + \beta_\kappa \, E\{X^{(\ell)} 1(S = \mathrm{VaR}_\kappa(S))\} \right] , \]

where \(\beta_\kappa = [F_S\{\mathrm{VaR}_\kappa(S)\} - \kappa] / \Pr\{S = \mathrm{VaR}_\kappa(S)\}\) if the denominator is strictly positive and 0 otherwise. This quantity can be estimated by

\[ \frac{1}{n(1-\kappa)} \sum_{m=1}^{n} X^{(\ell)}_m \, 1\{S_m > \widehat{\mathrm{VaR}}_\kappa(S)\} , \]

where \(X^{(\ell)}_1, \ldots, X^{(\ell)}_n\) are the \(n\) realizations of \(X^{(\ell)}\) corresponding to the realizations \(S_1, \ldots, S_n\).
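The empirical risk measures and the TVaR-based allocation can be sketched in Python. The function names are hypothetical, and the sketch assumes that the total loss has no atom at the VaR, so that only the realizations strictly above the empirical VaR contribute, and that n(1 - level) is an integer. The two toy lines of business are made up for illustration.

```python
import math

def empirical_var(sample, level):
    """Smallest order statistic s with empirical cdf F_n(s) >= level."""
    ordered = sorted(sample)
    k = max(math.ceil(level * len(ordered)), 1)
    return ordered[k - 1]

def empirical_tvar(sample, level):
    """Average excess mass above the empirical VaR, scaled by n (1 - level)."""
    var = empirical_var(sample, level)
    return sum(s for s in sample if s > var) / (len(sample) * (1 - level))

def tvar_allocation(lob_losses, totals, level):
    """Share of TVaR attributed to one line: its losses in the tail scenarios of the total."""
    var = empirical_var(totals, level)
    tail = sum(x for x, s in zip(lob_losses, totals) if s > var)
    return tail / (len(totals) * (1 - level))

# Toy simulated unpaid losses for two hypothetical lines of business.
lob_a = [float(v) for v in range(1, 11)]
lob_b = [2.0] * 10
total = [a + b for a, b in zip(lob_a, lob_b)]
```

By construction the allocations add up to the TVaR of the total, which is the property that makes this allocation rule convenient for comparing lines of business.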
In Table 9, TVaR-based capital allocations are shown for both models as well as for the "Silo" method, which is widespread in industry [2]. It is clear that the Silo method overestimates the total capital required as it implicitly assumes that risks are comonotonic, thereby preventing any form of diversification. The results for Models I and II are similar. While the capital allocations for LOBs 4 and 5 are higher in Model II than in Model I, they are lower for LOBs 2 and 3, highlighting the additional risk diversification that is possible in the presence of negative dependence.
The risk measures in Tables 8 and 9 could be used to set internal capital targets, but they do not incorporate parameter uncertainty, as the model is assumed to be correct. However, a parametric bootstrap can be used in order to quantify estimation error and to tackle potential model over-fitting; see, e.g., [37] or [39]. For the present purpose, it was assumed that the tree structure, the copula families, and the marginal distributions are given, except for their parameter values. The following procedure was then repeated a large number of times (10,000 times here) in order to obtain the approximate distribution of the unpaid loss, including parameter uncertainty.
Parametric bootstrap procedure
1. Simulate 55 observations from the dependence model, and transform them into observations of the loss ratios for the top triangle, i.e., all accident years \(i \in \{1, \ldots, 10\}\) and development years \(j \in \{1, \ldots, 11 - i\}\), using the inverse marginal distributions.
2. Fit the marginal GLMs (log-normal for LOB 1 and Gamma for LOBs 2-6).
3. Compute the residuals from the GLMs.
4. Fit the copula model to the ranks of the residuals obtained.
5. From this new model, simulate the total unpaid loss using the steps described under "Simulation procedure". The aggregate value is the simulated total unpaid loss.
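The logic of the five steps above can be sketched in Python on a deliberately simplified toy model. Everything specific here is an assumption made for illustration: two lines of business only, log-normal margins with constant parameters in place of the GLMs, a Gaussian copula estimated by inversion of Kendall's tau in place of the paper's hierarchical models, and a single future payment per line in place of the lower triangle. Only the sample size of 55 is taken from the text.

```python
import math
import random
from statistics import fmean, stdev

N_OBS = 55  # number of cells in the upper triangle, as in the text

def kendall_tau(u, v):
    """Kendall's tau; depends on the data only through their ranks."""
    n, s = len(u), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += math.copysign(1.0, (u[i] - u[j]) * (v[i] - v[j]))
    return 2.0 * s / (n * (n - 1))

def simulate(params, n, rng):
    """Steps 1 and 5: draw n pairs from a Gaussian copula with log-normal margins.
    exp(m + s*z) is the log-normal quantile function applied to Phi(z)."""
    (m1, s1), (m2, s2), rho = params
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        out.append((math.exp(m1 + s1 * z1), math.exp(m2 + s2 * z2)))
    return out

def fit(data):
    """Steps 2-4: fit the margins, form residuals, estimate the copula from them."""
    logs = [[math.log(row[k]) for row in data] for k in (0, 1)]
    margins = [(fmean(col), stdev(col)) for col in logs]
    resid = [[(v - m) / s for v in col] for col, (m, s) in zip(logs, margins)]
    rho = math.sin(math.pi / 2.0 * kendall_tau(resid[0], resid[1]))
    return margins[0], margins[1], rho

def bootstrap_totals(params, n_boot, seed=0):
    """Repeat steps 1-5 n_boot times and collect the simulated totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_boot):
        refit = fit(simulate(params, N_OBS, rng))  # steps 1-4
        x1, x2 = simulate(refit, 1, rng)[0]        # step 5: one future draw
        totals.append(x1 + x2)
    return totals

# Hypothetical "fitted" parameters: (mu, sigma) per line and a copula correlation.
fitted = ((0.0, 0.5), (1.0, 0.3), 0.4)
totals = bootstrap_totals(fitted, n_boot=200, seed=1)
print(fmean(totals), max(totals))
```

Because the parameters are re-estimated on every replicate before the future loss is drawn, the spread of `totals` reflects both process variability and estimation error, which is the point of the procedure.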
The results for the nested Archimedean copula model should be interpreted with caution, however, because the constraints on the dependence parameters in this model, and notably the fact that \(\hat{\theta}_2\) is close to \(\hat{\theta}_1\), may invalidate the parametric bootstrap [3]. Tables 10 and 11 show risk measures and capital allocations obtained with 10,000 bootstrap simulations, while Fig. 9 shows the predictive distribution obtained for Model I (left) and Model II (right). The risk measures in Table 10 are similar for both models and are much higher than those reported in Table 8; this highlights the importance of incorporating parameter uncertainty. Unsurprisingly, most of the increase in risk measures when including parameter uncertainty is due to the 6 × 20 = 120 marginal GLM parameters. Table 12 shows the risk measures obtained with the parametric bootstrap procedure without Step 4, i.e., the dependence parameters are fixed to their initial value estimated with the original data. The resulting risk measures are close to those found in Table 10, even though the uncertainty in the copula parameters is not accounted for when Step 4 is omitted. Finally, the figures in Table 11 are in line with those of Table 9. In particular, observe that Model II allocates less capital to LOB 6 than Model I, reflecting the fact that LOB 6 is related to LOBs 2 and 3 in Model II. In view of these results, the insurer might consider increasing the volume of LOB 3 to take better advantage of risk diversification.

Summary and discussion
In this paper, rank-based procedures were introduced for the selection, estimation and validation of dependence structures for run-off triangles of property and casualty insurance claim data. The approach was illustrated using data from six lines of business of a large Canadian insurance company. Two hierarchical approaches were considered for modeling the pairwise dependence between different lines of business, i.e., fully nested Archimedean copulas and a copula-based risk aggregation model. As simple and convenient as the nested Archimedean copula model may seem, its implementation raises more issues than one might anticipate initially. The success of this approach hinges on the choice of hierarchy and Archimedean generators at each of its levels. In principle, different Archimedean generators could be used throughout the structure, but the conditions required to ensure that the construction is valid are not always easy to verify. As there is no selection technique for generators, practitioners typically assume that they are all from the same parametric family \(\varphi_\theta\). In the latter case, conditions for the validity of the nested copula typically boil down to the constraint \(\theta_1 \ge \cdots \ge \theta_d\); see, e.g., [20].
As illustrated in the present paper, the use of the same generator throughout a fully nested Archimedean copula model has strong implications for the dependence structure. In particular, each variable is linked by the same bivariate copula to any variable appearing in a lower level of the hierarchy and, therefore, shares the same dependence characteristics with all of them in terms of symmetry, tail dependence, etc. In addition, the conditions stated in Eq. (1) are not only restrictive, but are also problematic for the parametric bootstrap. Indeed, when a bootstrap sample leads to unconstrained estimates \(\hat{\theta}_1, \ldots, \hat{\theta}_d\) such that \(\hat{\theta}_1 \ge \cdots \ge \hat{\theta}_d\) fails, one or more of the constrained parameter estimates end up being equal to 0. When this happens repeatedly, the dependence between the LOBs is underestimated. Thus, it remains unclear whether this model can be used in a parametric bootstrap procedure to obtain the predictive distribution of unpaid losses, because the underlying optimization problem is non-standard.
Working with the risk aggregation model allows one to avoid most of these issues. The tree structure can be determined using hierarchical clustering and the copulas can be chosen freely at each aggregation step. In addition, standard tools for bivariate copula selection, estimation, and validation are available. Moreover, the application of the parametric bootstrap to this context is standard, as there are no constraints on the parameters. Overall, the model provides greater flexibility and the dependence structure can be considerably more complex than what can be achieved with the nested Archimedean approach. However, the conditional independence assumption must be satisfied (at least approximately) and formal tools for checking this assumption remain to be developed. Another minor irritant is the fact that simulation from this model relies on the Iman-Conover reordering algorithm, which is efficient but not yet included in standard software; in contrast, sampling from the fully nested Archimedean copula is easily done with the R package copula.
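The reordering idea mentioned above is simple to sketch. The Python fragment below shows only the basic rank-reordering step on which Iman-Conover-type algorithms rest: each marginal sample is rearranged so that its ranks match those of a sample drawn from the target copula. The function names and the tiny data set are hypothetical, and the sketch is not the authors' implementation of the full hierarchical simulation procedure.

```python
def ranks(values):
    """Rank positions 0..n-1 of the entries (ties broken by order of appearance)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r

def reorder(marginal_samples, copula_samples):
    """Rearrange each marginal sample so that its ranks match the copula sample's.

    Both arguments are lists of columns of equal length; column k of the result
    contains the same values as column k of marginal_samples, in a new order.
    """
    out = []
    for marg, cop in zip(marginal_samples, copula_samples):
        sorted_marg = sorted(marg)
        out.append([sorted_marg[r] for r in ranks(cop)])
    return out

# Hypothetical example: two margins, three simulated scenarios.
margins = [[30, 10, 20], [1, 3, 5]]
copula = [[0.2, 0.9, 0.5], [0.8, 0.1, 0.4]]  # draws from the target copula

joint = reorder(margins, copula)
totals = [a + b for a, b in zip(*joint)]
print(joint)   # [[10, 30, 20], [5, 1, 3]]
print(totals)  # [15, 31, 23]
```

The margins are left untouched as multisets, so the marginal distributions of the simulated losses are preserved exactly, while the dependence between columns is inherited from the copula sample through its ranks.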
Perhaps the most significant limitation of the rank-based approach to risk aggregation modeling described here is that it can only be applied to data or residuals that are (at least approximately) identically distributed. Another requirement for this approach to make sense is that the sums that are linked by the copulas have the same number of components. This means that the risk aggregation model cannot be extended easily to include calendar year dependence, as Abdallah et al. [1] did using nested Archimedean copulas. Unfortunately, the latter approach is not amenable to estimation and validation procedures based on ranks, as there is then only one observation for each copula in the model.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The formulas are available from the authors upon request or can be derived through long but routine calculations facilitated by resorting to a symbolic calculator such as Maple or Mathematica.