1 Introduction

In risk management practice, there is an increasing interest in copula theory and its applications in high dimensions. One of the reasons is that vectors of risk-factor changes are typically high-dimensional and have to be adequately modeled; see [23, Chap. 2]. In high dimensions, the inherent model risk can be substantial. It is, thus, of interest to test whether an estimated or assumed (dependence) model is appropriate. One of our goals is, therefore, to present and explore goodness-of-fit tests in high dimensions for a class of copulas widely used in practice, namely Archimedean copulas. We also investigate the influence of the dimension on the conducted goodness-of-fit tests and address the problems that arise specifically in high dimensions.

It is clear that, especially in high dimensions, the exchangeability of Archimedean copulas becomes an increasingly strong assumption for certain applications. This point of criticism applies equally well to all exchangeable copula models, including the well-known homogeneous Gaussian or \(t\) copulas. However, note that these models are indeed applied in banks and insurance companies, typically in high dimensions, in order to introduce (tail) dependence to joint models for risks, as opposed to assuming (tail) independence. We therefore believe that it is important to investigate such models in high dimensions.

Archimedean copulas are copulas which admit the functional form

$$\begin{aligned} C({{\varvec{u}}})=\psi ({\psi ^{-1}}(u_1)+\dots +{\psi ^{-1}}(u_d)),\ {{\varvec{u}}}\in [0,1]^d, \end{aligned}$$
(1)

for an (Archimedean) generator \(\psi \), i.e., a continuous, decreasing function \(\psi :[0,\infty ]\rightarrow [0,1]\) which satisfies \(\psi (0)=1\), \(\psi (\infty )=\lim _{t\rightarrow \infty }\psi (t)=0\), and which is strictly decreasing on \([0,\inf \{t:\psi (t)=0\}]\). A necessary and sufficient condition under which (1) is indeed a proper copula is that \(\psi \) is \(d\) -monotone, i.e., \(\psi \) is continuous on \([0,\infty ]\), admits derivatives up to the order \(d-2\) satisfying \((-1)^k\psi ^{(k)}(t)\ge 0\) for all \(k\in \{0,\dots ,d-2\}\), \(t\in (0,\infty )\), and \((-1)^{d-2}\psi ^{(d-2)}(t)\) is decreasing and convex on \((0,\infty )\), see [20] or [22]. For reasons why Archimedean copulas are used in practice, see [9] or [19].
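To make the generator conditions concrete, here is a minimal sketch (our own, not from the paper) evaluating (1) for the Clayton family, whose generator \(\psi (t)=(1+t)^{-1/\theta }\) is completely monotone for \(\theta >0\) and hence \(d\)-monotone for every \(d\); the function names are ours.

```python
# A minimal sketch (not from the paper): evaluating the Archimedean
# copula (1) for the Clayton family. Its generator
# psi(t) = (1 + t)^(-1/theta) is completely monotone for theta > 0,
# hence d-monotone for every d, so (1) is a proper copula.
import numpy as np

def psi_clayton(t, theta):
    """Clayton generator psi(t) = (1 + t)^(-1/theta)."""
    return (1.0 + t) ** (-1.0 / theta)

def psi_inv_clayton(u, theta):
    """Inverse generator psi^{-1}(u) = u^(-theta) - 1."""
    return u ** (-theta) - 1.0

def clayton_copula(u, theta):
    """C(u) = psi(psi^{-1}(u_1) + ... + psi^{-1}(u_d)), cf. (1)."""
    u = np.asarray(u, dtype=float)
    return psi_clayton(np.sum(psi_inv_clayton(u, theta), axis=-1), theta)

# theta = 2/3 corresponds to Kendall's tau = theta / (theta + 2) = 0.25
print(clayton_copula(np.full(5, 0.5), theta=2/3))
```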

Goodness-of-fit techniques for copulas have only more recently gained interest; see, e.g., [5, 6, 8, 11–14], and references therein. Although usually presented in a \(d\)-dimensional setting, only some of these publications actually apply goodness-of-fit tests in more than two dimensions, including [5, 26] up to dimension \(d=5\) and [4] up to dimension \(d=8\). The common deficiency of goodness-of-fit tests for copulas in general, but also for the class of Archimedean copulas, is their limited applicability when the dimension becomes large. This is mainly due to the lack of a simple, or at least numerically accessible, form of the quantities involved as the dimension becomes large. Furthermore, parameter estimation usually becomes much more demanding in high dimensions; see [19].

As a general goodness-of-fit test, the transformation of [25] is well known. It is important to note that the inverse of this transformation leads to a popular sampling algorithm, the conditional distribution method; see, e.g., [10]. In other words, for a bijective transformation which converts \(d\) independent and identically distributed (“i.i.d.”) standard uniform random variables to a \(d\)-dimensional random vector distributed according to some copula \(C\), the corresponding inverse transformation may be applied to obtain \(d\) i.i.d. standard uniform random variables from a \(d\)-dimensional random vector following the copula \(C\). In this work, we apply this idea to goodness-of-fit testing based on a transformation originally proposed by [29] for sampling Archimedean copulas. With the recent work of [22], we obtain a more elegant proof of the correctness of this transformation under weaker assumptions. We then use the first \(d-1\) components to build a general goodness-of-fit test for \(d\)-dimensional Archimedean copulas. This complements goodness-of-fit tests based on the \(d\)th component, the Kendall distribution function; see, e.g., [13, 26], or [14]. Our proposed test can be interpreted as an Archimedean analog of goodness-of-fit tests based on Rosenblatt’s transformation for copulas in general, as it establishes a link between a sampling algorithm and a goodness-of-fit test. The appealing property of tests based on the inverse of the transformation of [29] for Archimedean copulas is that they are easily applied in any dimension, whereas tests based on Rosenblatt’s transformation, as well as tests based on the Kendall distribution function, are typically numerically challenging. The transformation can also be conveniently used for graphical goodness-of-fit testing, as recently advocated by [16].

This paper is organized as follows. In Sect. 2, commonly used goodness-of-fit tests for copulas in general are recalled. In Sect. 3, the new goodness-of-fit test for Archimedean copulas is presented. Section 4 contains details about the conducted simulation study. The results are presented in Sect. 5 and the graphical goodness-of-fit test is detailed in Sect. 6. Finally, Sect. 7 concludes.

2 Goodness-of-fit Tests for Copulas

Let \(\varvec{ X}=(X_{1},\dots ,X_{d})\), \(d\ge 2\), denote a random vector with distribution function \(H\) and continuous marginals \(F_{1},\dots ,F_{d}\), so that, by Sklar’s theorem, \(H(\varvec{x})=C(F_1(x_1),\dots ,F_d(x_d))\) for a unique copula \(C\). In a copula model for \(\varvec{ X}\), one would like to know whether \(C\) is well represented by a parametric family \(\fancyscript{C}_0=\{C(\cdot \,;\varvec{\theta }):\varvec{\theta }\in \varTheta \}\) where \(\varTheta \) is an open subset of \(\mathbb {R}^p\), \(p\in \mathbb {N}\). In other words, one would like to test the null hypothesis

$$\begin{aligned} H_{0}:C\in \fancyscript{C}_0 \end{aligned}$$
(2)

based on realizations of independent copies \(\varvec{ X}_i\), \(i\in \{1,\dots ,n\}\), of \(\varvec{ X}\). For testing \(H_0\), the (usually unknown) marginal distributions are treated as nuisance parameters and are replaced by their slightly scaled empirical counterparts, the pseudo-observations \(\varvec{U}_i=(U_{i1},\dots ,U_{id})\), \(i\in \{1,\dots ,n\}\), with

$$\begin{aligned} U_{ij}=\frac{n}{n+1}\hat{F}_{nj}(X_{ij}),\ i\in \{1,\dots ,n\},\ j\in \{1,\dots ,d\}, \end{aligned}$$
(3)

where \(\hat{F}_{nj}(x)=\frac{1}{n}\sum _{k=1}^n1\!\!1_{\{X_{kj}\le x\}}\) denotes the empirical distribution function of the \(j\)th data column (the data matrix consisting of the entries \(X_{ij}\), \(i\in \{1,\dots ,n\}\), \(j\in \{1,\dots ,d\}\)); see [14]. This approach yields rank-based pseudo-observations, which are interpreted as observations of \(C\) (despite the known issues with this interpretation; see Remark 1 below) and are, therefore, used for estimating \(\varvec{\theta }\) and testing \(H_{0}\).
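As a small illustration (our own sketch, not from the paper), the pseudo-observations (3) can be computed columnwise via ranks: since \(\hat{F}_{nj}(X_{ij})\) equals the rank of \(X_{ij}\) within column \(j\) divided by \(n\), the scaling in (3) amounts to dividing 1-based ranks by \(n+1\).

```python
# A small sketch of (3): rank-based pseudo-observations. Since
# F_hat_nj(X_ij) = rank(X_ij)/n within column j, the scaling n/(n+1)
# in (3) amounts to dividing the (1-based) ranks by n + 1.
import numpy as np

def pseudo_observations(X):
    """Componentwise scaled ranks U_ij = rank(X_ij) / (n + 1)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # double argsort yields 0-based ranks per column (no ties occur for
    # continuous marginals); add 1 for 1-based ranks
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    return ranks / (n + 1.0)

# Example: n = 150 observations of a d = 5 dimensional vector
rng = np.random.default_rng(1)
U = pseudo_observations(rng.normal(size=(150, 5)))
```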

In order to conduct a goodness-of-fit test, the pseudo-observations \(\varvec{U}_i\), \(i\in \{1,\dots ,n\}\), are usually first transformed to some variables \(\varvec{U}_i'\), \(i\in \{1,\dots ,n\}\), so that the distribution of the latter is known and sufficiently simple to test under the null hypothesis. For Rosenblatt’s transformation (see Sect. 2.1), \(\varvec{U}_i'\), \(i\in \{1,\dots ,n\}\), is also \(d\)-dimensional; for tests based on the Kendall distribution function (described in Sect. 2.2), it is one-dimensional; and for the goodness-of-fit approach we propose in Sect. 3, it is \((d-1)\)-dimensional. If not already one-dimensional, after such a transformation, \(\varvec{U}_i'\), \(i\in \{1,\dots ,n\}\), is usually mapped to one-dimensional quantities \(Y_i\), \(i\in \{1,\dots ,n\}\), such that the corresponding distribution \(F_Y\) is again known under the null hypothesis. So indeed, instead of (2), one usually considers some adjusted hypothesis \(H_0^*:F_Y\in \fancyscript{F}_0\) under which a goodness-of-fit test can easily be carried out in a one-dimensional setting. For mapping the variates to a one-dimensional setting, different approaches exist; see Sect. 2.2. Note that if \(H_0^*\) is rejected, so is \(H_0\).

Remark 1

As, e.g., [8] describe, there are two problems with the approach described above. First, the pseudo-observations \(\varvec{U}_i\), \(i\in \{1,\dots ,n\}\), are neither realizations of perfectly independent random vectors, nor do their components perfectly follow univariate standard uniform distributions. This affects the null distribution of the test statistic under consideration. All copula goodness-of-fit approaches suffer from these effects, since observations from the underlying copula are never directly observed in practice. A solution may be a bootstrap to access the exact null distribution; however, particularly in high dimensions, this is often time-consuming, especially for the goodness-of-fit tests suggested in the copula literature so far. Second, using estimated copula parameters additionally affects the null distribution.

2.1 Rosenblatt’s Transformation and a Corresponding Test

The transformation introduced by [25] is a standard approach for obtaining realizations of standard uniform random vectors \(\varvec{U}_i'\), \(i\in \{1,\dots ,n\}\), given random vectors \(\varvec{U}_i\), \(i\in \{1,\dots ,n\}\), from an absolutely continuous copula \(C\) which can then be tested directly or further mapped to one-dimensional variates for testing purposes. Consider a representative \(d\)-dimensional random vector \(\varvec{U}\sim C\). To obtain \(\varvec{U}'\sim \mathrm{U}[0,1]^d\) (i.e., a random vector with independent components, each uniformly distributed on \([0,1]\)), [25] proposed the transformation \(R:\varvec{U}\rightarrow \varvec{U}'\), given by

$$\begin{aligned} U_{1}'&=U_1,\\ U_{2}'&=C_2(U_{2}\,|\,U_{1}),\\&\vdots \\ U_{d}'&=C_d(U_{d}\,|\,U_{1},\dots ,U_{d-1}), \end{aligned}$$

where for \(j\in \{2,\dots ,d\}\), \(C_j(u_{j}\,|\,u_{1},\dots ,u_{j-1})\) denotes the conditional distribution function of \(U_{j}\) given \(U_{1}=u_{1},\dots ,U_{j-1}=u_{j-1}\). We denote this method for constructing goodness-of-fit tests by “\(R\)” in what follows.

Remark 2

Note that the inverse transformation \(R^{-1}\) of Rosenblatt’s transformation leads to the conditional distribution method for sampling copulas, see, e.g., [10]. This link gives rise to the general idea of using sampling algorithms based on one-to-one transformations to construct goodness-of-fit tests. This is done in Sect. 3 to construct a goodness-of-fit test for Archimedean copulas based on a transformation originally proposed by [29] for sampling random variates.

To find the quantities \(C_j(u_{j}\,|\,u_{1},\dots ,u_{j-1})\), \(j\in \{2,\dots ,d\}\), for a specific copula \(C\), the following connection between conditional distributions and partial derivatives is usually applied (under weak conditions); see [27, p. 20]. Assuming \(C\) admits continuous partial derivatives with respect to the first \(d-1\) arguments, one has

$$\begin{aligned} C_j(u_{j}\,|\,u_{1},\dots ,u_{j-1})=\frac{{{D}_{j-1,\dots ,1}} C^{(1,\dots ,j)}(u_{1}, \dots ,u_{j})}{{{D}_{j-1,\dots ,1}} C^{(1,\dots ,j-1)}(u_{1},\dots ,u_{j-1})},\ j\in \{2,\dots ,d\}, \end{aligned}$$
(4)

where \(C^{(1,\dots ,k)}\) denotes the \(k\)-dimensional marginal copula of \(C\) corresponding to the first \(k\) arguments and \({{D}_{j-1,\dots ,1}}\) denotes the mixed partial derivative of order \(j-1\) with respect to the first \(j-1\) arguments. For a \(d\)-dimensional Archimedean copula \(C\) with \((d-1)\)-times continuously differentiable generator \(\psi \), one has

$$\begin{aligned} C_j(u_{j}\,|\,u_{1},\dots ,u_{j-1})=\frac{\psi ^{(j-1)}\bigl (\sum _{k=1}^j{\psi ^{-1}}(u_k)\bigr )}{\psi ^{(j-1)}\bigl (\sum _{k=1}^{j-1}{\psi ^{-1}}(u_k)\bigr )},\ j\in \{2,\dots ,d\}. \end{aligned}$$
(5)

The problem when applying (4) or (5) in high dimensions is that the derivatives involved are usually quite difficult to access, the price one has to pay for such a general transformation. Furthermore, numerically evaluating the derivatives is often time-consuming and prone to errors.
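To illustrate (5) in one of the few cases where the derivatives are explicit, the following sketch (our own illustration, not code from the paper) implements Rosenblatt’s transformation for the Clayton family: with \(\psi (t)=(1+t)^{-1/\theta }\), one has \(\psi ^{(k)}(t)=(-1)^k(1+t)^{-(1/\theta +k)}\prod _{i=0}^{k-1}(1/\theta +i)\), so the signs and constants cancel in (5).

```python
# Illustration of (5) for the Clayton family (our own sketch; Clayton is
# one of the few families where all generator derivatives are explicit).
# With psi(t) = (1 + t)^(-1/theta) one has
#   psi^{(k)}(t) = (-1)^k (1 + t)^(-(1/theta + k)) * prod_{i<k}(1/theta + i),
# so the ratio in (5) reduces to a single power of (1 + s_j)/(1 + s_{j-1}).
import numpy as np

def rosenblatt_clayton(U, theta):
    """Rowwise map U ~ Clayton copula to U' ~ U[0,1]^d via (5)."""
    U = np.asarray(U, dtype=float)
    s = np.cumsum(U ** (-theta) - 1.0, axis=1)  # partial sums of psi^{-1}(u_k)
    Up = np.empty_like(U)
    Up[:, 0] = U[:, 0]
    for j in range(1, U.shape[1]):              # paper's j = 2, ..., d
        Up[:, j] = ((1.0 + s[:, j]) / (1.0 + s[:, j - 1])) ** (-(1.0 / theta + j))
    return Up
```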

Genest et al. [14] propose a test statistic based on the empirical distribution function of the random vectors \(\varvec{U}_i'\), \(i\in \{1,\dots ,n\}\). As an overall result, the authors recommend using a distance between the distribution under \(H_0\), which is standard uniform on \([0,1]^d\), and the empirical distribution, namely

$$\begin{aligned} S^B_{n,d}=n\int \limits _{[0,1]^d}(D_n(\varvec{u})-\varPi (\varvec{u}))^2\,\mathrm{d}\varvec{u}, \end{aligned}$$

where \(\varPi (\varvec{u})=\prod _{j=1}^d u_j\) denotes the independence copula and \(D_n(\varvec{u})=\frac{1}{n}\sum _{i=1}^n1\!\!1_{\{\varvec{U}_i'\le \varvec{u}\}}\) the empirical distribution function based on the random vectors \(\varvec{U}_i'\), \(i\in \{1,\dots ,n\}\). We refer to this test statistic as “\(S^B_{n,d}\)” in what follows.

2.2 Tests in a One-Dimensional Setting

In order to apply a goodness-of-fit test in a one-dimensional setting, one has to summarize the \(d\)-dimensional pseudo-observations \(\varvec{U}_i\) or \(\varvec{U}_i'\) via one-dimensional quantities \(Y_i\), \(i\in \{1,\dots ,n\}\), whose distribution is known under the null hypothesis. In what follows, some popular mappings achieving this task are described.

\(N_d\)::

Under \(H_0\), the one-dimensional quantities \(Y_i=F_{\chi _d^2}\big (\sum _{j=1}^d\varPhi ^{-1}(U_{ij}')^2\big )\), \(i\in \{1,\dots ,n\}\), should be i.i.d. according to a standard uniform distribution, where \({F_{\chi _d^2}}\) denotes the distribution function of a \(\chi ^2\) distribution with \(d\) degrees of freedom and \(\varPhi ^{-1}\) denotes the quantile function of the standard normal distribution. This transformation can be found, e.g., in [8] and is denoted by “\(N_d\)” in what follows.

\(K_C\)::

For a copula \(C\) let \(K_C\) denote the Kendall distribution function, i.e., \(K_C(t)=\mathbb {P}(C(\varvec{U})\le t)\), \(t\in [0,1]\), where \(\varvec{U}\sim C\), see [3] or [22]. Under \(H_0\) and if \(K_C\) is continuous, the random variables \(Y_i=K_C(C(\varvec{U}_i))\) should be i.i.d. according to a standard uniform distribution. This approach for goodness-of-fit testing will be referred to as “\(K_C\)”. Note that in this case, no multidimensional transformation of the data is performed beforehand.

\(K_\varPi \)::

One can also consider the random vectors \(\varvec{U}_i'\), \(i\in \{1,\dots ,n\}\), in conjunction with the independence copula, i.e., define \(\tilde{Y}_i=\prod _{j=1}^dU_{ij}'\), which, under \(H_0\), has distribution function \(K_{\varPi }(t)=t\sum _{k=0}^{d-1}\frac{1}{k!}(-\log t)^k \). The sample \(Y_i=K_{\varPi }(\tilde{Y}_i)\), \(i\in \{1,\dots ,n\}\), should then follow a standard uniform distribution. This approach is referred to as “\(K_\varPi \)”.

  In the approaches \(N_d\), \(K_C\), and \(K_\varPi \), we have to test the hypothesis that realizations of the random variables \(Y_i\), \(i\in \{1,\dots ,n\}\), follow a uniform distribution on the unit interval. This may be achieved in several ways; the following two approaches are applied in what follows.

\(\chi ^2\)::

Pearson’s \(\chi ^2\) test, see [24, p. 391], referred to as “\(\chi ^2\)” in what follows.

\(AD\)::

The so-called Anderson-Darling test, a specifically weighted Cramér-von Mises test, see [1, 2]. This method is referred to as “\(AD\)”.
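To fix ideas, here is a hedged sketch (our own, with numpy/scipy as an assumed toolset) of the maps \(N_d\) and \(K_\varPi \) together with the Anderson-Darling statistic for testing uniformity of the resulting \(Y_i\); in the simulation study below, \(p\)-values are obtained by a parametric bootstrap rather than from asymptotic tables, so only the statistic itself is needed.

```python
# A hedged sketch (numpy/scipy assumed) of the maps N_d and K_Pi and of
# the Anderson-Darling statistic for testing uniformity of the Y_i.
import numpy as np
from math import factorial
from scipy.stats import norm, chi2

def map_N(Up):
    """N_d: Y_i = F_{chi^2_d}( sum_j Phi^{-1}(U'_ij)^2 )."""
    return chi2.cdf(np.sum(norm.ppf(Up) ** 2, axis=1), df=Up.shape[1])

def map_K_pi(Up):
    """K_Pi: Y_i = K_Pi(prod_j U'_ij) with
    K_Pi(t) = t * sum_{k=0}^{d-1} (-log t)^k / k!."""
    d = Up.shape[1]
    t = np.prod(Up, axis=1)
    return t * sum((-np.log(t)) ** k / factorial(k) for k in range(d))

def anderson_darling(Y):
    """AD statistic for H0*: Y_i i.i.d. standard uniform."""
    Y = np.sort(np.asarray(Y, dtype=float))
    n = len(Y)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(Y) + np.log(1 - Y[::-1])))
```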

 

3 A Goodness-of-fit Test for Archimedean Copulas

The goodness-of-fit test we now present is based on the following transformation from [29] for generating random variates from Archimedean copulas. Note that we present a rather short proof of this interesting result, under weaker assumptions.

Theorem 1

(The main transformation) Let \(\varvec{ U}\sim C\), \(d\ge 2\), where \(C\) is an Archimedean copula with \(d\)-monotone generator \(\psi \) and continuous Kendall distribution function \(K_C\). Then \(\varvec{ U}'\sim \mathrm{U}[0,1]^d\), where

$$\begin{aligned} U_{j}'=\left( \frac{\sum _{k=1}^{j}{\psi ^{-1}}(U_{k})}{\sum _{k=1}^{j+1}{\psi ^{-1}}(U_{k})}\right) ^{j},\ j\in \{1,\dots ,d-1\},\ U_{d}'=K_C(C(\varvec{ U})). \end{aligned}$$
(6)

Proof

As shown in [22], \(({\psi ^{-1}}(U_1),\dots ,{\psi ^{-1}}(U_d))\) has an \(\ell _1\)-norm symmetric distribution with survival copula \(C\) and radial distribution \(F_R=\fancyscript{W}_d^{-1}[\psi ]\), where \(\fancyscript{W}_d[\cdot ]\) denotes the Williamson \(d\)-transform. Hence, \(({\psi ^{-1}}(U_1),\dots ,{\psi ^{-1}}(U_d))\underset{}{\overset{\text {{d}}}{=}}R\varvec{S}\), where \(R\sim F_R\) and \(\varvec{S}\sim \mathrm{U}(\{\varvec{x}\in \mathbb {R}_+^d\,|\,||\varvec{x}||_1=1\})\) are independent. For \(Z_{(0)}=0\), \(Z_{(d)}=1\), and \((Z_1,\dots ,Z_{d-1})\sim \mathrm{U}[0,1]^{d-1}\), it follows from [7, p. 207] that \(S_j\underset{}{\overset{\text {{d}}}{=}}Z_{(j)}-Z_{(j-1)}\), \(j\in \{1,\dots ,d\}\), independent of \(R\). This implies that \({\psi ^{-1}}(U_j)\underset{}{\overset{\text {{d}}}{=}}R(Z_{(j)}-Z_{(j-1)})\), \(j\in \{1,\dots ,d\}\), and hence that \(\varvec{ U}'\) is equal in distribution to \(\varvec{W}=\bigl ((Z_{(1)}/Z_{(2)})^1,\dots ,(Z_{(d-1)}/Z_{(d)})^{d-1},K_C(\psi (R))\bigr )\). Since \(K_C\) is continuous and \(\psi (R)\sim K_C\), \(K_C(\psi (R))\) is uniformly distributed on \([0,1]\). Furthermore, as a function of \(R\), \(K_C(\psi (R))\) is independent of \((W_1,\dots ,W_{d-1})\). It therefore suffices to show that \((W_1,\dots ,W_{d-1})\sim \mathrm{U}[0,1]^{d-1}\), a proof of which can be found in [7, p. 212].

The transformation \(T:\varvec{ U}\rightarrow \varvec{ U}'\) given in (6) can be interpreted as an analog of Rosenblatt’s transformation \(R\) specifically for Archimedean copulas. Both \(T\) and \(R\) map \(d\) random variables one-to-one to \(d\) random variables and can therefore be used in both directions, for generating random variates and for goodness-of-fit tests; the latter use of \(T\) is proposed in this paper. The advantage of this approach for obtaining the random variables (or their realizations in the form of given data) \(\varvec{ U}_i'\sim \mathrm{U}[0,1]^d\), \(i\in \{1,\dots ,n\}\), from \(\varvec{ U}_i\sim C\), \(i\in \{1,\dots ,n\}\), in comparison to Rosenblatt’s transformation lies in the fact that it is typically much easier to compute the quantities in (6) than to access the derivatives in (5). One can then proceed as for Rosenblatt’s transformation and use any of the transformations listed in Sect. 2.2 to transform \(\varvec{ U}_i'\), \(i\in \{1,\dots ,n\}\), to the one-dimensional quantities \(Y_i\), \(i\in \{1,\dots ,n\}\), for testing \(H_0^*\). A test involving the transformation \(T\) to obtain the random vectors \(\varvec{ U}_i'\sim \mathrm{U}[0,1]^d\), \(i\in \{1,\dots ,n\}\), is referred to as approach “\(T_d\)” in what follows.

Note that evaluating the transformation \(T\) might only pose difficulties for the last component \(U_{d}'\), the Kendall distribution function \(K_C\), whereas computing \(U_{j}'\), \(j\in \{1,\dots ,d-1\}\), is easily achieved for any Archimedean copula with explicit generator inverse. Furthermore, for large \(d\), evaluating \(K_C\) often becomes more and more complicated from a numerical point of view (see [18] for the derivatives involved), except for specific cases such as Clayton’s family, where all involved derivatives of \(\psi \) are directly accessible, see, e.g., [29], and therefore \(K_C\) can be computed directly via \(K_C(t)=\sum _{k=0}^{d-1}(-{\psi ^{-1}}(t))^k\psi ^{(k)}({\psi ^{-1}}(t))/k!\), see, e.g., [3] or [22]. Moreover, note that applying \(T_d\) to obtain the transformed data \(\varvec{ U}_i'\), \(i\in \{1,\dots ,n\}\), requires \(n\) evaluations of the Kendall distribution function \(K_C\), which can be computationally intensive, especially in simulation studies involving bootstrap procedures. With the informational loss inherent in the goodness-of-fit tests following the approaches addressed in Sect. 2.2 in mind, we therefore suggest omitting the last component \(T_d\) of \(T\) and considering only \(T_1,\dots ,T_{d-1}\), i.e., using the data \((U_{i1}',\dots ,U_{i,d-1}')\), \(i\in \{1,\dots ,n\}\), for testing purposes if \(d\) is large. This leads to fast goodness-of-fit tests for Archimedean copulas in high dimensions. A goodness-of-fit test based on omitting the last component of the transformation \(T\) is referred to as approach “\(T_{d-1}\)” in what follows.
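For concreteness, the following sketch (our own illustration; the function names are ours) computes the first \(d-1\) components of \(T\) for any Archimedean family with explicit \(\psi ^{-1}\), and the last component for Clayton, where \(K_C\) is available through the series above. For the fast test \(T_{d-1}\), one simply drops the last column.

```python
# Sketch of the proposed transformation T in (6) (our own code). The
# first d-1 components need only psi^{-1}; the last needs K_C, shown
# here for Clayton via the series
#   K_C(t) = sum_{k=0}^{d-1} (-psi^{-1}(t))^k psi^{(k)}(psi^{-1}(t)) / k!.
import numpy as np

def T_first_components(U, psi_inv):
    """U'_j = (S_j / S_{j+1})^j, j = 1,...,d-1, with S_j = sum_{k<=j} psi^{-1}(U_k)."""
    S = np.cumsum(psi_inv(U), axis=1)
    j = np.arange(1, U.shape[1])
    return (S[:, :-1] / S[:, 1:]) ** j

def K_C_clayton(t, theta, d):
    """Kendall distribution function for Clayton; all terms are positive
    since psi^{(k)}(x) = (-1)^k (1+x)^(-(1/theta+k)) prod_{i<k}(1/theta+i)."""
    x = np.asarray(t, dtype=float) ** (-theta) - 1.0   # psi^{-1}(t)
    a = 1.0 / theta
    out, coeff = np.zeros_like(x), 1.0                 # coeff = prod_{i<k}(a+i)/k!
    for k in range(d):
        out += coeff * x ** k * (1.0 + x) ** (-(a + k))
        coeff *= (a + k) / (k + 1)
    return out

def T_clayton(U, theta):
    """Full transformation T for Clayton, cf. (6)."""
    psi_inv = lambda u: u ** (-theta) - 1.0
    C_val = (1.0 + np.sum(psi_inv(U), axis=1)) ** (-1.0 / theta)  # C(U)
    last = K_C_clayton(C_val, theta, U.shape[1])                  # U'_d
    return np.column_stack([T_first_components(U, psi_inv), last])
```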

4 A Large-Scale Simulation Study

4.1 The Experimental Design

In our experimental design, focus is put on two features, the error probability of the first kind, i.e., if a test maintains its nominal level, and the power under several alternatives. To distinguish between the different approaches we use either pairs or triples, e.g., the approach “\((T_{d-1},N_{d-1},AD)\)” denotes a goodness-of-fit test based on first applying our proposed transformation \(T\) without the last component, then using the approach based on the \(\chi ^2_{d-1}\) distribution to transform the data to a one-dimensional setup, and then applying the Anderson-Darling statistic to test \(H_0^*\); similarly, “\((T_{d-1},S^B_{n,d-1})\)” denotes a goodness-of-fit test which uses the approach \(S^B_{n,d-1}\) for reducing the dimension and testing \(H_0^*\).

In the conducted Monte Carlo simulation, the following ten different goodness-of-fit approaches are tested:

$$\begin{aligned}&(T_{d-1},N_{d-1},\chi ^2),\ (T_{d-1},N_{d-1},AD),\ (T_{d-1},S^B_{n,d-1}),\ (K_C,\chi ^2),\ (K_C,AD),\nonumber \\&(T_d,N_d,AD),\ (T_d,K_\varPi ,AD),\ (T_d,S^B_{n,d}),\ (R,N_d,AD),\ (R,S^B_{n,d}). \end{aligned}$$
(7)

Similar to [14], we investigate samples of size \(n=150\) and parameters of the copulas such that Kendall’s tau equals \(\tau =0.25\). We work in \(d=5\) and \(d=20\) dimensions for comparing the goodness-of-fit tests given in (7). For every scenario, we simulate the corresponding Archimedean copulas of Ali-Mikhail-Haq (“A”), Clayton (“C”), Frank (“F”), Gumbel (“G”), and Joe (“J”), see, e.g., [15], as well as the Gaussian (“Ga”) and \(t\) copula with four degrees of freedom (“\(t_4\)”); note that we use one-parameter copulas (\(p=1\)) in our study only for simplicity. Whenever computationally feasible, \(N=1{,}000\) replications are used for computing the empirical level and power. In some cases, see Sect. 5, fewer than 1,000 replications had to be used. For all tests, the significance level is fixed at \(\alpha =5\,\%\). For the univariate \(\chi ^2\)-tests, ten cells were used.

Concerning the use of Maple, we proceed as follows. For computing the first \(d-1\) components \(T_1,\dots ,T_{d-1}\) of the transformation \(T\) involved in the first three and the sixth to eighth approaches listed in (7), Maple is only used if working under double precision in C/C++ leads to errors. By errors, we mean non-float values, including nan, -inf, and inf, as well as float values less than zero or greater than one. For computing the component \(T_d\), Maple is used to generate C/C++ code. To decrease runtime, the function is then hard-coded in C/C++, except for Clayton’s family, where an explicit form of all derivatives and hence of \(K_C\) is known, see [29]. The same holds for computing \(K_C\) for the approaches \((K_C,\chi ^2)\) and \((K_C,AD)\). For the approaches involving Rosenblatt’s transformation, a direct computation in C/C++ is possible for Clayton’s family, whereas again Maple’s code generator is used for all other copula families to obtain the derivatives of the generator. If there are numerical errors from this approach, we use Maple with a high precision for the computation. If Rosenblatt’s transformation produces errors even after computations in Maple, we disregard the corresponding goodness-of-fit test and use the remaining test results of the simulation for computing the empirical level and power.

Due to its well-known properties, we use the maximum likelihood estimator (“MLE”) to estimate the copula parameters, based on the pseudo-observations of the simulated random vectors \(\varvec{ U}_i\sim C\), \(i\in \{1,\dots ,n\}\). Besides building the pseudo-observations, note that parameter estimation may also affect the null distribution. This is generally addressed by using a bootstrap procedure to access the correct null distribution, see Sect. 4.2 below. Note that a bootstrap can be quite time-consuming in high dimensions, where even parameter estimation alone turns out to be computationally demanding. For the bootstrap versions of the goodness-of-fit approaches involving the generator derivatives, we were required to hard-code the derivatives in order to decrease runtime. Note that no such effort is needed for applying our proposed goodness-of-fit test \((T_{d-1},N_{d-1},AD)\), since it does not require access to the generator derivatives.

4.2 The Parametric Bootstrap

For our proposed approach \((T_{d-1},N_{d-1},AD)\) it is not clear whether the bootstrap procedure is valid from a theoretical point of view; see, e.g., [8] and [14]. However, empirical results, presented in Sect. 5, indicate the validity of this approach, described as follows.

  1.

    Given the data \(\varvec{ X}_i\), \(i\in \{1,\dots ,n\}\), build the pseudo-observations \(\varvec{ U}_i\), \(i\in \{1,\dots ,n\}\) as given in (3) and estimate the unknown copula parameter vector \(\varvec{\theta }\) by its MLE \(\hat{\varvec{\theta }}_n\).

  2.

    Based on \(\varvec{ U}_{i}\), \(i\in \{1,\dots ,n\}\), the given Archimedean family, and the parameter estimate \(\hat{\varvec{\theta }}_n\), compute the first \(d-1\) components \(U_{ij}'\), \(i\in \{1,\dots ,n\}\), \(j\in \{1,\dots ,d-1\}\), of the transformation \(T\) as in Eq. (6) and the one-dimensional quantities \(Y_i=\sum _{j=1}^{d-1}(\varPhi ^{-1}(U_{ij}'))^2\), \(i\in \{1,\dots ,n\}\). Compute the Anderson-Darling test statistic \(A_n=-n-\frac{1}{n}\sum _{i=1}^n(2i-1) [\log (F_{\chi ^2_{d-1}}(Y_{(i)}))+\log (1-F_{\chi ^2_{d-1}}(Y_{(n-i+1)}))]\).

  3.

    Choose the number \(M\) of bootstrap replications. For each \(k\in \{1,\dots ,M\}\) do:

    a.

      Generate a random sample of size \(n\) from the given Archimedean copula with parameter \(\hat{\varvec{\theta }}_n\) and compute the corresponding vectors of componentwise scaled ranks (i.e., the pseudo-observations) \(\varvec{ U}^*_{i,k}\), \(i\in \{1,\dots ,n\}\). Then, estimate the unknown parameter vector \(\varvec{\theta }\) by \(\hat{\varvec{\theta }}^*_{n,k}\).

    b.

      Based on \(\varvec{ U}^*_{i,k}\), \(i\in \{1,\dots ,n\}\), the given Archimedean family, and the parameter estimate \(\hat{\varvec{\theta }}^*_{n,k}\), compute the first \(d-1\) components \(U_{ij,k}^{\prime *}\), \(i\in \{1,\dots ,n\}\), \(j\in \{1,\dots ,d-1\}\), of the transformation \(T\) as in Eq. (6) and \(Y_{i,k}^*=\sum _{j=1}^{d-1}(\varPhi ^{-1}(U_{ij,k}^{\prime *}))^2\), \(i\in \{1,\dots ,n\}\). Compute the Anderson-Darling test statistic \(A_{n,k}^*=-n-\frac{1}{n}\sum _{i=1}^n(2i-1) [\log (F_{\chi ^2_{d-1}}(Y_{(i),k}^*))+\log (1-F_{\chi ^2_{d-1}}(Y_{(n-i+1),k}^*))]\).

  4.

    An approximate \(p\)-value for \((T_{d-1},N_{d-1},AD)\) is given by \(\frac{1}{M}\sum _{k=1}^M1\!\!1_{\{A_{n,k}^*>A_n\}}\).

The bootstrap procedures for the other approaches can be obtained similarly. For the bootstrap procedure using Rosenblatt’s transformation see, e.g., [14]. For our simulation studies, we used \(M=1{,}000\) bootstrap replications. Note that, together with the number \(N=1{,}000\) of test replications, the simulation studies are quite time-consuming, especially if parameters need to be estimated and if high dimensions are involved.
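Schematically, the bootstrap above may be coded as follows (a sketch under our assumptions; `fit_theta`, an MLE routine, and `rclayton`, a Clayton sampler, are hypothetical placeholders, and the helpers `pseudo_observations`, `T_first_components`, and `anderson_darling` reuse the sketches from earlier sections). Mapping \(Y_i\) through \(F_{\chi ^2_{d-1}}\) before computing the Anderson-Darling statistic is equivalent to evaluating \(F_{\chi ^2_{d-1}}\) inside the statistic as in Steps 2 and 3b, since the distribution function is increasing.

```python
# Schematic parametric bootstrap for (T_{d-1}, N_{d-1}, AD), a sketch
# under our assumptions: fit_theta (an MLE routine) and rclayton (a
# Clayton sampler) are hypothetical placeholders; pseudo_observations,
# T_first_components, and anderson_darling are the earlier sketches.
import numpy as np
from scipy.stats import norm, chi2

def ad_statistic_T(U, theta):
    """Steps 2/3b: first d-1 components of T, map through F_{chi^2_{d-1}},
    then the Anderson-Darling statistic."""
    psi_inv = lambda u: u ** (-theta) - 1.0     # Clayton, for illustration
    Up = T_first_components(U, psi_inv)
    Y = chi2.cdf(np.sum(norm.ppf(Up) ** 2, axis=1), df=U.shape[1] - 1)
    return anderson_darling(Y)

def bootstrap_pvalue(X, M=1000):
    n, d = X.shape
    U = pseudo_observations(X)                  # Step 1
    theta_hat = fit_theta(U)                    # Step 1 (hypothetical MLE)
    A_n = ad_statistic_T(U, theta_hat)          # Step 2
    exceed = 0
    for _ in range(M):                          # Step 3
        U_star = pseudo_observations(rclayton(n, d, theta_hat))   # Step 3a
        A_star = ad_statistic_T(U_star, fit_theta(U_star))        # Step 3b
        exceed += A_star > A_n
    return exceed / M                           # Step 4
```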

Applying the MLE in high dimensions is numerically challenging and time-consuming; see also [19]. Although our proposed goodness-of-fit test can be applied in the case \(d=100\), it is not easy to use the bootstrap described above in such high dimensions. For \(d=100\), we therefore investigate only the error probability of the first kind, similar to the case A addressed in [8]. For this, we generate \(N=1{,}000\) \(100\)-dimensional samples of size \(n=150\) with parameter chosen such that Kendall’s tau equals \(\tau =0.25\) and compute for each generated data set the \(p\)-value of the test \((T_{d-1},N_{d-1},AD)\) as before, however, this time with the known copula parameter. Finally, the number of rejections among the 1,000 conducted goodness-of-fit tests at the five percent level is reported. The results are given at the end of Sect. 5.

5 Results

We first present selected results obtained from the large-scale simulation study conducted for the 10 different goodness-of-fit approaches listed in (7). These results summarize the main characteristics found in the simulation study. As an overall result, we found that the empirical power against all investigated alternatives increases if the dimension gets large. As expected, so does runtime.

We begin by discussing the methods that show a comparably weak performance in the conducted simulation study, starting with those based on the test statistics \(S^B_{n,d-1}\) or \(S^B_{n,d}\) to reduce the dimension. Although they maintain the error probability of the first kind, the goodness-of-fit tests \((T_{d-1},S^B_{n,d-1})\), \((T_d,S^B_{n,d})\), and \((R,S^B_{n,d})\) show a comparably weak performance against the investigated alternatives, at least in our test setup as described in Sect. 4.1. For example, for \(n=150\), \(d=5\), and \(\tau =0.25\), the method \((T_d,S^B_{n,d})\) leads to an empirical power of 5.2 % for testing Clayton’s copula when the simulated copula is Ali-Mikhail-Haq’s, 11.5 % for testing the Gaussian copula on Frank copula data, 7.7 % for testing Ali-Mikhail-Haq’s copula on data from Frank’s copula, and 6.4 % for testing Gumbel’s copula on data from Joe’s copula. The methods \((T_{d-1},S^B_{n,d-1})\) and \((R,S^B_{n,d})\) behave similarly. We therefore do not further report on the methods involving \(S^B_{n,d-1}\) or \(S^B_{n,d}\) in what follows. The method \((T_d,K_\varPi ,AD)\) also shows a rather weak performance for both investigated dimensions and is therefore omitted. Since the results of \((K_C,\chi ^2)\) and \((K_C,AD)\), as well as those of \((T_{d-1},N_{d-1},AD)\) and \((T_{d-1},N_{d-1},\chi ^2)\), do not significantly differ, we only report the results based on the Anderson-Darling tests.

Now consider the goodness-of-fit testing approaches \((T_{d-1},N_{d-1},AD)\), \((K_C,AD)\), and \((T_d,N_d,AD)\). Recall that \((T_{d-1},N_{d-1},AD)\) is based on the first \(d-1\) components of the transformation \(T\) addressed in Eq. (6), \((K_C,AD)\) applies only the last component of \(T\), and \((T_d,N_d,AD)\) applies the whole transformation \(T\) in \(d\) dimensions, where all three approaches use the Anderson-Darling test for testing \(H_0^*\). The test results for the three goodness-of-fit tests with \(n=150\), \(\tau =0.25\), and \(d\in \{5,20\}\) are reported in Tables 1, 2, and 3, respectively. As mentioned above, we use a bootstrap procedure to obtain approximate \(p\)-values and test the hypothesis based on those \(p\)-values. We use \(N=1{,}000\) repetitions wherever possible; in all cases involving Joe’s copula as the \(H_0\) copula, only about 650 repetitions could be finished. As Tables 1 and 2 reveal, in many cases, \((T_{d-1},N_{d-1},AD)\) shows a larger empirical power than \((K_C,AD)\) (for both \(d\)), but the differences in either direction can be large (consider the case of the \(t_4\) copula when the true one is Clayton (both \(d\)) and the case of the Frank copula when the true one is Clayton (both \(d\))). Overall, when the true copula is the \(t_4\) copula, \((T_{d-1},N_{d-1},AD)\) performs well. Given the comparably simple numerical form of \((T_{d-1},N_{d-1},AD)\), this method can be quite useful. Interestingly, by comparing Table 1 with Table 3, we see that if the transformation \(T\) with all \(d\) components is applied, there is actually a loss in power for the majority of families tested (the cause of this behavior remains an open question). Note that in Table 2, for the case where the Ali-Mikhail-Haq copula is tested, the power decreases in comparison to the five-dimensional case. This might be due to numerical difficulties occurring when \(K_C\) is evaluated in this case, since the same behavior is visible for the method \((K_C,\chi ^2)\).

Table 1 Empirical power in % for \((T_{d-1},N_{d-1},AD)\) based on \(N=1{,}000\) replications with \(n=150\), \(\tau =0.25\), and \(d=5\) (left), respectively \(d=20\) (right)
Table 2 Empirical power in % for \((K_C,AD)\) based on \(N=1{,}000\) replications with \(n=150\), \(\tau =0.25\), and \(d=5\) (left), respectively \(d=20\) (right)
Table 3 Empirical power in % for \((T_d,N_d,AD)\) based on \(N=1{,}000\) replications with \(n=150\), \(\tau =0.25\), and \(d=5\) (left), respectively \(d=20\) (right)

Table 4 shows the empirical power of the method \((R,N_d,AD)\). In comparison to our proposed goodness-of-fit approach \((T_{d-1},N_{d-1},AD)\), the approach \((R,N_d,AD)\) performs worse overall. For \(d=5\), there are only two cases where \((R,N_d,AD)\) performs better than \((T_{d-1},N_{d-1},AD)\), namely testing the Ali-Mikhail-Haq copula when the true copula is \(t_4\) and testing Joe’s copula when the true one is Gumbel. In the high-dimensional case \(d=20\), only results for the Clayton copula were obtained. In this case, the actual number of repetitions for calculating the empirical power is approximately 500. For the cases of testing the Ali-Mikhail-Haq, Gumbel, Frank, or Joe copula, no reliable results were obtained, since only about 20 repetitions could be run in the runtime provided by the grid. This is due to the high-order derivatives involved in this transformation, which slow down computations considerably; see [19] for more details.

Table 4 Empirical power in % for \((R,N_d,AD)\) based on \(N=1{,}000\) replications with \(n=150\), \(\tau =0.25\), and \(d=5\) (left), respectively \(d=20\) (right)

Another aspect, especially in a high-dimensional setup, is numerical precision. In going from the low- to the high-dimensional case, we faced several problems during our computations. For example, the approach \((R,N_d,AD)\) shows difficulties in testing the \(H_0\) copula of Ali-Mikhail-Haq for \(d=20\). Even after applying Maple (with Digits set to 15; the default is 10), the goodness-of-fit tests indicated numerical problems. The numerical issues appearing in the testing approaches \((K_C,AD)\) and \((T_d,N_d,AD)\) when evaluating the Kendall distribution function were already mentioned earlier, e.g., in Sect. 4.1. In principle, one could be tempted to choose a (much) higher precision than standard double in order to obtain more reliable testing results. However, note that this significantly increases runtime. Under such a setup, applying a bootstrap procedure would not be possible anymore. In high dimensions, only the approaches \((T_{d-1},N_{d-1},AD)\) and \((T_{d-1},N_{d-1},\chi ^2)\) can be applied without facing computational difficulties with respect to precision and runtime.

Concerning the case \(d=100\), we checked whether the error probability of the first kind is maintained at the 5 % level. Following the procedure described at the end of Sect. 4.2, we obtained 4.6, 4.2, 5.0, 5.5, and 4.9 % for the families of Ali-Mikhail-Haq, Clayton, Frank, Gumbel, and Joe, respectively.

6 A Graphical Goodness-of-fit Test

A plot often provides more information than a single \(p\)-value, e.g., it can be used to determine where deviations from uniformity are located; see [16] who advocate graphical goodness-of-fit tests in higher dimensions. We now briefly apply the transformation \(T:\varvec{ U}\rightarrow \varvec{ U}'\) addressed in Theorem 1 to graphically check how well the transformed variates indeed follow a uniform distribution. Figures 1, 2, and 3 show scatter-plot matrices of 1,000 generated three-dimensional vectors of random variates which are transformed with \(T\) under various assumed models (the captions are self-explanatory). Since \(K_C\) is easily computed in three dimensions, we also use this last component of \(T\).
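As a hedged sketch of how such a scatter-plot matrix can be produced (our own illustration, not the code behind the figures; we sample from a Clayton copula via its Marshall-Olkin frailty representation and transform with the true model, reusing `T_clayton` from the sketch in Sect. 3, since Clayton’s \(K_C\) is explicit), this corresponds to the “correct family and parameter” case shown in Fig. 2 (right).

```python
# Graphical check (our own sketch): sample n = 1,000 points from a
# 3-dimensional Clayton copula via its Marshall-Olkin representation
# (frailty V ~ Gamma(1/theta), U_j = (1 + E_j/V)^(-1/theta)), transform
# with T under the true model, and inspect a scatter-plot matrix; the
# result should look uniform on the unit cube. T_clayton is reused from
# the sketch in Sect. 3.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n, d, theta = 1000, 3, 2.0                          # tau = theta/(theta+2) = 0.5
V = rng.gamma(shape=1.0 / theta, size=(n, 1))       # frailty
E = rng.exponential(size=(n, d))
U = (1.0 + E / V) ** (-1.0 / theta)                 # Clayton sample
Up = T_clayton(U, theta)                            # transform with true model

fig, axes = plt.subplots(d, d, figsize=(6, 6))
for i in range(d):
    for j in range(d):
        axes[i, j].scatter(Up[:, j], Up[:, i], s=2)
plt.show()
```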

Fig. 1

Data from a Gaussian (left) and \(t_4\) (right) copula with parameter chosen such that Kendall’s tau equals 0.5, transformed with a Gumbel copula with parameter such that Kendall’s tau equals 0.5. The deviations from uniformity are small but visible, especially in the corners of the different panels

Fig. 2

Data from a Clayton (left) and Gumbel (right) copula with parameter chosen such that Kendall’s tau equals 0.5, transformed with a Gumbel copula with parameter such that Kendall’s tau equals 0.5. The deviation from uniformity for the Clayton data is clearly visible. Since the Gumbel data is transformed with the correct family and parameter, the resulting variates are indeed uniformly distributed in the unit hypercube

Fig. 3

Data from a Gumbel copula with parameter chosen such that Kendall’s tau equals 0.5, transformed with a Gumbel copula with parameter such that Kendall’s tau equals 0.2 (left) and 0.8 (right), respectively. Deviations from uniformity are easily visible

7 Conclusion and Discussion

Goodness-of-fit tests for Archimedean copulas, also suited to high dimensions, were presented. The proposed tests are based on a transformation \(T\) whose inverse is known as a method for generating random variates. The tests can, therefore, be viewed as analogs of tests based on Rosenblatt’s transformation, whose inverse is also used for sampling (known as the conditional distribution method). The suggested goodness-of-fit tests proceed in two steps. In the first step, the first \(d-1\) components of \(T\) are applied. They provide a fast and simple transformation from \(d\) to \(d-1\) dimensions. This complements known goodness-of-fit tests using only the \(d\)th component of \(T\), the Kendall distribution function, which, however, require knowledge of the generator derivatives. In a second step, the \(d-1\) components are mapped to one-dimensional quantities, which simplifies testing. This second step is common to many goodness-of-fit tests, and hence any such test can be applied.

The power of the proposed testing approach was compared to that of other known goodness-of-fit tests in a large-scale simulation study. In this study, goodness-of-fit tests in comparably high dimensions were investigated. The computational effort (precision, runtime) involved in applying commonly known testing procedures turned out to be tremendous. The results obtained from these tests in higher dimensions have to be handled with care: numerical issues for the methods for which not all repetitions could be run without problems might have introduced a bias. Applying commonly known goodness-of-fit tests in higher dimensions requires (much) more work in the future, especially on the numerical side. Computational tools which systematically check for numerical inaccuracies and which are implemented following the paradigm of defensive programming might provide a solution here; see [17] for a first work in this direction.

In contrast, our proposed approach is easily applied in any dimension, and its evaluation requires only modest numerical precision. Due to the short runtimes, it could also be investigated with a bootstrap procedure, showing good performance in high dimensions. Furthermore, it easily extends to the multiparameter case. To reduce the effect of non-robustness with respect to the permutation of the arguments, one could randomize the order of the data dimensions, as is done for Rosenblatt’s transformation, see [4].

Finally, a graphical goodness-of-fit test was outlined. This is a rather promising field of research for high-dimensional data since, especially in high dimensions, none of the existing models fits perfectly, and so a graphical assessment of the parts (or dimensions) of the model which fit well and of those which do not is in general preferable to a single \(p\)-value.