1 Copula Density Estimation as an Inverse Problem

A copula is a multivariate distribution function of a \(d\)-dimensional random vector with uniformly distributed margins. Sklar’s theorem ensures that any joint multivariate distribution \(F\) of a \(d\)-dimensional vector \(\mathbf{{X}}=(X_1, \ldots , X_d)^T\) with margins \(F_j\) (\(j=1, \ldots , d\)) can be expressed as

$$\begin{aligned} F(x_1, \ldots , x_d) = \mathrm{C}(F_1(x_1), \ldots , F_d(x_d)) \quad \forall \mathbf{{x}}=(x_1, \ldots , x_d)^T \in \mathbb {R}^d \end{aligned}$$

where the copula is unique on \(\mathrm{range}(F_1) \times \cdots \times \mathrm{range}(F_d)\); that is, for continuous margins \(F_1, \ldots , F_d\) the copula \(\mathrm{C}\) is unique on the whole domain. Consequently, the copula contains the complete dependence structure of the random vector \(\mathbf{{X}}\). For a detailed introduction to copulas and their properties see, for example, [8], [9, Chap. 5], or [10]. In risk management, knowledge of the dependence structure is of paramount importance.

If the copula is sufficiently smooth, the copula density

$$\begin{aligned} \mathrm{c}(u_1, \ldots , u_d) = \frac{\partial ^d \mathrm{C}}{\partial u_1 \cdots \partial u_d} \end{aligned}$$
(1)

exists, and the density conveys the dependence structure in a more convenient way, because the graphs of different copulas usually look very similar and differ only slightly in their slopes. For this reason, the reconstruction of the copula density is an active field of research in finance and many other scientific disciplines. In practical tasks, the dependence structure of more than two random variables is of particular interest, that is, the dimension \(d\) is large. In nonparametric statistical estimation, kernel estimators are commonly used, but they often suffer from boundary bias. There are also spline- and wavelet-based approximation methods, but most of them are discussed only in the two-dimensional case. Likewise, in [12], the authors discuss a penalized nonparametric maximum likelihood method in the two-dimensional case. A detailed survey of the literature on nonparametric copula density estimation can be found in [6]. However, most nonparametric methods suffer from the curse of dimensionality, so that the numerical computations are feasible only in sufficiently low dimensions. Indeed, many authors consider only the two-dimensional case in nonparametric copula density estimation.

In this paper we develop an alternative approach based on the theory of inverse problems. The copula density (1) exists only for absolutely continuous copulas. In the statistical framework, the copula itself is not observable for a sample \(\mathbf{{X}}_1, \mathbf{{X}}_2, \ldots , \mathbf{{X}}_T\), but we can approximate it by the empirical copula

$$\begin{aligned} \hat{\mathrm{C}}(\mathbf{{u}})= \frac{1}{T} \sum \limits _{j=1}^T 1 \!\!\mathrm{1}_{\{\hat{\mathbf{{U}}}_j \le \mathbf{{u}} \}} = \frac{1}{T} \sum \limits _{j=1}^T \prod \limits _{k=1}^d 1 \!\!\mathrm{1}_{\{\hat{U}_{kj} \le u_k \}} \end{aligned}$$
(2)

of the margin-transformed pseudo-samples \(\hat{\mathbf{{U}}}_1, \hat{\mathbf{{U}}}_2, \ldots , \hat{\mathbf{{U}}}_T\) with \(\hat{U}_{kj} = \hat{F}_k(X_{kj})\), where

$$\begin{aligned} \hat{F}_k (x) = \frac{1}{T} \sum \limits _{j=1}^T 1 \!\!\mathrm{1}_{\{X_{kj} \le x \}} \end{aligned}$$

denotes the empirical margins. It is well known that the empirical copula converges uniformly to the copula (see [2]):

$$\begin{aligned} \max \limits _{\mathbf{{u}} \in [0,1]^d} \left| \mathrm{C}(\mathbf{{u}})-\hat{\mathrm{C}}(\mathbf{{u}}) \right| = \mathcal{{O}} \left( \frac{(\log \log T)^{\frac{1}{2}}}{T^{\frac{1}{2}}} \right) \quad \text {a.s.} \; \text { for } T \rightarrow \infty \end{aligned}$$
(3)
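
For illustration, a minimal sketch (in Python with NumPy, assuming a tie-free sample; the function names are ours) of how the pseudo-samples and the empirical copula (2) can be computed:

```python
import numpy as np

def pseudo_observations(X):
    """Margin-transformed pseudo-samples U_hat[j, k] = F_hat_k(X[j, k]),
    where F_hat_k is the empirical distribution function of the k-th margin.
    X is a (T, d) array of observations; ties are assumed not to occur."""
    T = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1   # 1-based ranks per column
    return ranks / T

def empirical_copula(U_hat, u):
    """Empirical copula (2): fraction of pseudo-samples with all components <= u."""
    return np.mean(np.all(U_hat <= np.asarray(u), axis=1))
```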

Therefore, we treat the empirical copula as a noisy representation \(\mathrm{C}^\delta = \hat{\mathrm{C}}\) of the unobservable copula. Estimating the density would require differentiating the empirical copula, which is obviously not smooth. However, every density satisfies the integral equation

$$\begin{aligned} \int \limits _0^{u_1} \cdots \int \limits _0^{u_d} \mathrm{c}(s_1,\ldots , s_d) \mathrm{d}s_1 \cdots \mathrm{d}s_d = \mathrm{C}(u_1, \ldots , u_d) \quad \forall \mathbf{{u}}=(u_1, \ldots , u_d)^T \in \varOmega = [0,1]^d \end{aligned}$$
(4)

which can be seen as a weak formulation of Eq. (1). In the following, we therefore consider the linear Volterra integral operator \(\mathrm{A}\in \fancyscript{L} \left( L^1(\varOmega ) , L^2(\varOmega ) \right) \) and solve the linear operator equation

$$\begin{aligned} \mathrm{A}\mathrm{c}= \mathrm{C}\end{aligned}$$
(5)

to find the copula density \(\mathrm{c}\). In the following, we assume attainability, which means \(\mathrm{C}\in \fancyscript{R}(\mathrm{A})\); hence we only consider copulas \(\mathrm{C}\in L^2(\varOmega )\) for which a solution \(\mathrm{c}\in L^1(\varOmega )\) exists.

The injective Volterra integral operator is well studied in the inverse problem literature. Even in the one-dimensional case, the problem is ill-posed because the inverse \(\mathrm{A}^{-1}\), which is the differential operator, is not continuous. Hence, solving Eq. (5) leads to numerical instabilities even if its right-hand side carries only a small data error. Because the solution is sensitive to small data errors, regularization methods to overcome this instability are discussed in the inverse problem literature. For a detailed introduction to regularization see, for example, [4, 13].

In Sect. 2 we discuss a discretization of the integral equation (4), and in Sect. 3 we illustrate the numerical instability that arises if we use the empirical copula instead of the exact one and discuss regularization methods for the discretized problem.

The basics of the numerical implementation of the problem, and in particular the details of the Kronecker multiplication, are presented in the authors' working paper [14]; a discussion of why the Petrov–Galerkin projection is not a simple counting algorithm is given in [15]. This paper gives a summary of the proposed method for the efficient computation of the right-hand side in larger dimensions and discusses in more detail the analytical aspects of the inverse problem and the reasons for the existence of the Kronecker structure.

2 Numerical Approximation

We discuss the numerical computation of the copula density \(\mathrm{c}\in \mathrm {X}= L^1(\varOmega )\) from a given copula \(\mathrm{C}\in \mathrm {Y}= L^2(\varOmega )\), which is in principle a numerical differentiation and, in higher dimensions, a very hard problem (see [1]). Moreover, in practical applications the measured data \(\mathrm{C}^\delta \) carry some noise \(\delta \) with \(\left\| \mathrm{C}- \mathrm{C}^\delta \right\| _{\mathrm {Y}} \le \delta \), and very often the data are not smooth enough, that is, \(\mathrm{C}^\delta \notin C^1(\varOmega )\) even if \(\mathrm{C}\in C^1(\varOmega )\), which leads to numerical instabilities making a straightforward numerical differentiation impossible.

For the sake of convenience, we write

$$\begin{aligned} \int \limits _{\mathbf{{0}}}^{\mathbf{{u}}} \mathrm{c}(\mathbf{{s}}) \mathrm{d} \mathbf{{s}} = \mathrm{C}(\mathbf{{u}}) \quad \forall \mathbf{{u}}=(u_1, \ldots , u_d)^T \in \varOmega = [0,1]^d \end{aligned}$$

as a short form of Eq. (4). We propose applying a Petrov–Galerkin projection (see [5]) for some discretization size \(h\) and consider the finite-dimensional approximation

$$\begin{aligned} \mathrm{c}_h(s) = \sum \limits _{j=1}^N c_j \phi _j(s)\,, \end{aligned}$$
(6)

where \(\varPhi = \lbrace \phi _1,\phi _2, \ldots , \phi _N \rbrace \) is a basis of the ansatz space \(V_h\). The vector of coefficients \(\mathbf{{c}}=(c_1, \ldots , c_N)^T \in \mathbb {R}^N\) is chosen such that

$$\begin{aligned} \int \limits _{\varOmega } {\int \limits _{\mathbf{{0}}}^{\mathbf{{u}}}} \mathrm{c}_h(\mathbf{{s}}) \mathrm{d}\mathbf{{s}} \psi (\mathbf{{u}}) \mathrm{d}\mathbf{{u}} = \int \limits _{\varOmega } \mathrm{C}(\mathbf{{u}}) \psi (\mathbf{{u}}) \mathrm{d}\mathbf{{u}} \quad \forall \psi \in \tilde{V}_h \,. \end{aligned}$$
(7)

It is sufficient to fulfill Eq. (7) for \(N\) linearly independent test functions \(\psi _i \in \tilde{V}_h\). This yields the system of linear equations

$$\begin{aligned} K \mathbf{{c}} = \mathbf{{C}} \end{aligned}$$
(8)

with right-hand side

$$\begin{aligned} C_i = \int \limits _\varOmega \mathrm{C}(\mathbf{{u}}) \psi _i(\mathbf{{u}}) \mathrm{d}\mathbf{{u}}, \quad i=1, \ldots , N \end{aligned}$$
(9)

and the \(N \times N\) matrix \(K\) with

$$\begin{aligned} K_{ij} = \int \limits _\varOmega \int \limits _{\mathbf{{0}}}^{\mathbf{{u}}} \phi _j(\mathbf{{s}}) \mathrm{d}\mathbf{{s}} \psi _i(\mathbf{{u}}) \mathrm{d}\mathbf{{u}} \,. \end{aligned}$$

If the exact copula is replaced by the empirical copula, we obtain a noisy representation \(\mathbf{{C}}^\delta \) with

$$\begin{aligned} C_i^\delta = \int \limits _\varOmega \hat{\mathrm{C}}(\mathbf{{u}}) \psi _i(\mathbf{{u}}) \mathrm{d}\mathbf{{u}}, \quad i=1, \ldots , N \end{aligned}$$
(10)

of the exact right-hand side \(\mathbf{{C}}\). A typical phenomenon of ill-posed inverse problems is that the numerically computed solution based on the noisy data (10) oscillates strongly unless a proper regularization is chosen. This problem is not caused by the numerical approximation, but rather by the discontinuity of the inverse operator. This will be illustrated in Sect. 3. Figure 3 shows the reconstructed density of the Student copula for exact data (9), whereas Fig. 5 shows it for different noise levels.

In principle, we can choose arbitrary ansatz functions \(\phi _j \in V_h\) and test functions \(\psi _i \in \tilde{V}_h\). However, keeping the curse of dimensionality in mind, we choose very simple ansatz functions such that the matrix \(K\) attains a very special structure, allowing us to solve (8) and to compute the approximated copula density also for higher-dimensional copulas. Obviously, the approximated density (6) is not smooth, and in order to obtain a smoother approximated copula \(\mathrm{C}_h\) with

$$\begin{aligned} \mathrm{C}_h (\mathbf{{u}}) = \int \limits _{\mathbf{{0}}}^{\mathbf{{u}}} \mathrm{c}_h(\mathbf{{s}}) \mathrm{d}\mathbf{{s}} \end{aligned}$$

we choose the test functions as integrated ansatz functions, such that the approximated copula

$$\begin{aligned} \mathrm{C}_h (\mathbf{{u}}) = \sum \limits _{j=1}^N c_j \psi _j (\mathbf{{u}}) \end{aligned}$$

is smoother than the approximated density.

We discretize the domain \(\varOmega \) by splitting each one-dimensional interval \([0,1]\) into \(n\) equal subintervals of length \(h=\frac{1}{n}\). Hence, we obtain \(N=n^d\) equal-sized hypercubes and call these elements \(e_1, \ldots , e_N\). We number the elements in a specific order, illustrated in Fig. 1, such that, passing to the \((d+1)\)-dimensional problem, the first \(n^d\) elements of the new problem have the same number and location as the elements of the \(d\)-dimensional problem.

Fig. 1 Discretization of the domain \(\varOmega =[0,1]^d\). a \(d=2\). b \(d=3\)
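
A possible numbering, consistent with this nesting property and with the Kronecker ordering of Theorem 1 below, is sketched here for illustration; the exact ordering used in [14] may differ, and the function names are ours:

```python
import numpy as np

def element_number(multi_index, n):
    """Element number of the hypercube with 1-based multi-index (i_1, ..., i_d),
    using the last coordinate as the most significant digit.  With this choice,
    the first n**d elements of the (d+1)-dimensional grid coincide with the
    elements of the d-dimensional grid, as required by the numbering of Fig. 1."""
    number = 0
    for ik in reversed(multi_index):
        number = number * n + (ik - 1)
    return number + 1

def lower_corner(i, n, d):
    """Lower corner b^i of element e_i under the same numbering."""
    h = 1.0 / n
    digits, j = [], i - 1
    for _ in range(d):
        digits.append(j % n)          # the first coordinate is the fastest digit
        j //= n
    return np.array(digits) * h
```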

We set \(N=n^d\) and choose the ansatz functions

$$\begin{aligned} \phi _j(\mathbf{{u}})= {\left\{ \begin{array}{ll} 1 \quad \quad \mathbf{{u}} \in e_j \\ 0 \quad \quad \text { otherwise} \end{array}\right. } \end{aligned}$$
(11)

and the test functions \(\psi _i\) as the integrated ansatz functions

$$\begin{aligned} \psi _i(\mathbf{{u}})= \int \limits _{\mathbf{{0}}}^{\mathbf{{u}}} \phi _i(\mathbf{{s}}) \mathrm{d}\mathbf{{s}}. \end{aligned}$$
(12)

In contrast to finite element discretizations, the system matrix \(K\) is not sparse, and the system size \(N=n^d\) grows exponentially with the dimension \(d\). A straightforward assembly and solution of the linear system (8) becomes impossible for usual discretizations \(n\). Even in the three-dimensional case, storing the system matrix for \(n=80\) requires approximately one terabyte, even when exploiting symmetry, and the computing times for assembling and solving such systems become enormous.

The choices (11) and (12) yield a structure of the \(N \times N\) system matrix \(K\), illustrated in Fig. 2, allowing us to solve (8) also for \(d>2\). The matrix plot shows that the \(n \times n\) system matrix of the one-dimensional case coincides with the upper left \(n \times n\) corner of the two- and three-dimensional matrices. Moreover, the other parts of the system matrices are scaled replications of the one-dimensional \(n \times n\) system matrix. This effect stems from a Kronecker factorization of the \(d\)-dimensional system matrix into \(d\) copies of the system matrix of the one-dimensional problem.

Fig. 2 Matrix plots of the system matrix \(K\) for \(n=4\) and different dimensions \(d\). a System matrix for \(d=1\). b System matrix for \(d=2\). c System matrix for \(d=3\)
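
For illustration, the one-dimensional system matrix can be assembled directly from the definitions (11) and (12): since \(\int_0^u \phi_j(s)\,\mathrm{d}s = \psi_j(u)\), in one dimension \(K_{ij} = \int_0^1 \psi_i(u)\, \psi_j(u)\, \mathrm{d}u\). The following sketch (simple quadrature in Python, not the authors' C++ implementation) assembles \({^{(1)}}K\) and reproduces the block structure of Fig. 2 via a Kronecker product:

```python
import numpy as np

def psi_1d(i, u, n):
    """One-dimensional test function psi_i(u) = int_0^u phi_i(s) ds:
    a ramp rising from 0 to h over the i-th subinterval [(i-1)h, ih]."""
    h = 1.0 / n
    return np.clip(u - (i - 1) * h, 0.0, h)

def assemble_K1(n, quad_points=2001):
    """One-dimensional system matrix K_ij = int_0^1 psi_i(u) psi_j(u) du,
    evaluated by trapezoidal quadrature (the integrand is piecewise quadratic)."""
    u = np.linspace(0.0, 1.0, quad_points)
    Psi = np.stack([psi_1d(i, u, n) for i in range(1, n + 1)])     # shape (n, quad)
    return np.trapz(Psi[:, None, :] * Psi[None, :, :], u, axis=2)  # shape (n, n)

# Kronecker replication of the one-dimensional matrix, cf. Fig. 2 and Corollary 1
K1 = assemble_K1(4)
K2 = np.kron(K1, K1)      # two-dimensional system matrix (d = 2)
K3 = np.kron(K1, K2)      # three-dimensional system matrix (d = 3)
```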

One important reason for this structure is that the chosen ansatz functions decompose into products of one-dimensional ansatz functions. In order to illustrate this, we consider the lower corner \(\mathbf{{b}}^i\) of the \(i\)th element and define the one-dimensional function

$$\begin{aligned} \phi _i^k = 1 \!\!\mathrm{1}_{\{[b^i_k, b^i_k+h]\}} \end{aligned}$$

This yields

$$\begin{aligned} \phi _i(\mathbf{{u}})=\prod \limits _{k=1}^d \phi _i^k(u_k) \end{aligned}$$
(13)

as well as

$$\begin{aligned} \psi _i(\mathbf{{u}})=\prod \limits _{k=1}^d \psi _i^k(u_k) \end{aligned}$$
(14)

with the one-dimensional test functions

$$\begin{aligned} \psi _i^k(u) = \int \limits _0^u \phi _i^k(s) \mathrm{d}s. \end{aligned}$$

We only formulate the main result allowing us to compute solutions of (8) also for higher dimensions \(d\). Details and proofs can be found in the working paper [14].

Theorem 1

The system matrix of the \((d+1)\)-dimensional case can be constructed from the one-dimensional and the \(d\)-dimensional system matrices:

$$\begin{aligned} {^{(d+1)}}K = {^{(1)}}K \otimes {^{(d)}}K \end{aligned}$$

Corollary 1

The system matrix \({^{(d)}}K\) is the \(d\)-fold Kronecker product of the \(n \times n\) matrix \({^{(1)}}K\)

$$\begin{aligned} {^{(d)}}K = {^{(1)}}K \otimes {^{(1)}}K \otimes \cdots \otimes {^{(1)}}K \end{aligned}$$
(15)

and the inverse system matrix of the \(d\)-dimensional problem is the \(d\)-fold Kronecker product of the inverse of the one-dimensional system matrix

$$\begin{aligned} {^{(d)}}K^{-1} = {^{(1)}}K^{-1} \otimes {^{(1)}}K^{-1} \otimes \cdots \otimes {^{(1)}}K^{-1} \,. \end{aligned}$$
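
As an illustration of Corollary 1, the inverse can be applied to a right-hand side without ever forming the \(N \times N\) matrix, by reshaping the right-hand side into a \(d\)-way array and solving with \({^{(1)}}K\) along each coordinate direction. The following sketch uses this standard Kronecker identity; it assumes that the element ordering of the right-hand side matches the Kronecker factorization and is not necessarily the algorithm implemented in [14]:

```python
import numpy as np

def solve_kronecker(K1, C, d):
    """Solve (K1 kron ... kron K1) c = C for a right-hand side C of length n**d
    by solving with the n x n matrix K1 along every mode of the reshaped array."""
    n = K1.shape[0]
    X = C.reshape((n,) * d)
    for axis in range(d):
        X = np.moveaxis(X, axis, 0)                       # bring current mode to the front
        X = np.linalg.solve(K1, X.reshape(n, -1))         # solve along this mode
        X = np.moveaxis(X.reshape((n,) * d), 0, axis)     # restore the axis order
    return X.reshape(-1)
```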

Following Corollary 1, we only have to assemble the one-dimensional system matrix \({^{(1)}}K\) of dimension \(n \times n\), compute its inverse \({^{(1)}}K^{-1}\), and exploit the Kronecker factorization to compute the solution \(\mathbf{{c}} = {^{(d)}}K^{-1} \mathbf{{C}}\) of (8). Details of the algorithm and an efficient Kronecker multiplication are given in [14]. The running time can be reduced further by suitable parallelization. In fact, the computation of the right-hand side (9) is the crucial part and is much more expensive than solving the linear system, because we have to evaluate \(N=n^d\) different \(d\)-dimensional integrals over the whole domain \(\varOmega \). Note that for our special choice of ansatz functions (6) we have

$$\begin{aligned} C_i = \int \limits _\varOmega \mathrm{C}(\mathbf{{u}}) \psi _i(\mathbf{{u}}) \mathrm{d}\mathbf{{u}} = \sum \limits _{l=i}^N 1 \!\!\mathrm{1}_{\lbrace b^l \ge b^i \rbrace } \int \limits _{e_l} \mathrm{C}(\mathbf{{u}}) \psi _i(\mathbf{{u}}) \mathrm{d}\mathbf{{u}} \,, \end{aligned}$$
(16)

which also reduces the numerical effort. In higher dimensions, the number of elements \(e_l\) contributing zero to the sum grows, such that using Eq. (16) instead of (9) improves the running times.

Table 1 Computing times using (16) for the independence copula

In the practically most relevant case, where the components of the right-hand side (10) are evaluated with the empirical copula (2), the numerical effort can be reduced radically, because the \(d\)-dimensional integral

$$\begin{aligned} \begin{array}{l} C_i^\delta = \int \limits _\varOmega \hat{\mathrm{C}}(\mathbf{{u}}) \psi _i( \mathbf{{u}}) \mathrm{d}\mathbf{{u}} = \frac{1}{T} \sum \limits _{j=1}^T \int \limits _{\varOmega } \prod \limits _{k=1}^d 1 \!\!\mathrm{1}_{\{\hat{U}_{kj} \le u_k \}} \psi _i^k(u_k) \mathrm{d}\mathbf{{u}} = \frac{1}{T} \sum \limits _{j=1}^T \prod \limits _{k=1}^d I_{ij}^k \end{array} \end{aligned}$$
(17)

decomposes into a product of \(d\) one-dimensional integrals

$$\begin{aligned} I_{ij}^k = \int \limits _0^1 1 \!\!\mathrm{1}_{\{\hat{U}_{kj} \le s \}} \psi _i^k(s) \mathrm{d}s = {\left\{ \begin{array}{ll} h(1-b_k^i)-\frac{1}{2}h^2 \,, &{} \hat{U}_{kj} < b_k^i\\ h(1-b_k^i)-\frac{1}{2}h^2 -\frac{1}{2}(\hat{U}_{kj}-b_k^i)^2 \,, &{} b_k^i \le \hat{U}_{kj} \le b_k^i+h\\ h(1-\hat{U}_{kj}) \,, &{} \hat{U}_{kj} > b_k^i + h \end{array}\right. } \end{aligned}$$

using Eqs. (13) and (14). In this case, the numerical effort is of order \(\mathcal{{O}} \left( N T d \right)\), which is an enormous improvement over \(\mathcal{{O}} \left( N 3^d T+\frac{N^2+N}{2}3^d \right) \) if the \(d\)-dimensional integrals (10) are computed numerically by a usual \(3^d\)-point Gauss formula. We want to point out that the computation of the right-hand side (10) for the empirical copula based on formula (17) is still possible for \(d=9\), whereas the computational effort for computing (16) for an arbitrary given copula \(\mathrm{C}\) is exorbitant, even if the discretization size \(n\) is chosen moderately. The numerical effort is illustrated in Table 1.
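
For illustration, the following sketch evaluates (17) with the closed-form one-dimensional integrals \(I_{ij}^k\) above, vectorized over samples and intervals (Python; the element ordering of the result is an assumption and must match the ordering used when solving (8)):

```python
import numpy as np

def rhs_empirical(U_hat, n):
    """Right-hand side (17) for the empirical copula.
    U_hat is a (T, d) array of pseudo-samples; the result has length N = n**d."""
    T, d = U_hat.shape
    h = 1.0 / n
    b = np.arange(n) * h                               # lower corners b_k^i of the subintervals
    C = np.ones((T, 1))
    for k in range(d):
        u = U_hat[:, k][:, None]                       # shape (T, 1)
        below  = h * (1.0 - b) - 0.5 * h**2            # case  U_hat < b
        middle = below - 0.5 * (u - b)**2              # case  b <= U_hat <= b + h
        above  = h * (1.0 - u)                         # case  U_hat > b + h
        I_k = np.where(u < b, below, np.where(u <= b + h, middle, above))  # (T, n)
        C = (C[:, :, None] * I_k[:, None, :]).reshape(T, -1)               # outer product over coordinates
    return C.mean(axis=0)                              # average over the T samples
```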

Note that, contrary to what might be expected, the vector \(\mathbf{{c}}=(c_1, \ldots , c_N)^T\) does not count the number of samples in the elements, even though the approximated solution \(\mathrm{c}_h\) is a piecewise constant function on the elements; the Petrov–Galerkin projection is not a simple counting algorithm (for more details see [15]).

2.1 Examples

In order to illustrate the computing times and approximation quality, we use the independence copula

$$\begin{aligned} \mathrm{C}(\mathbf{{u}})=\prod \limits _{k=1}^d u_k \end{aligned}$$

which has the exact solution \(\mathrm{c}(u)=1\). Please note that for this example we used the exact copula as the right-hand side without generating samples. Thus, there is no data noise, hence \(\delta =0\), which allows us to separate the approximation error from the ill-posedness resulting from the discontinuity of the inverse operator \(\mathrm{A}^{-1}\).

Many authors (see, for example, [11]) look at the integrated square error, which is the squared \(L^2\)-norm of the difference between the copula density and its approximation. For the independence copula, the integrated square error can easily be computed

$$\begin{aligned} \mathrm {ISE}(\mathrm{c}, \mathrm{c}_h) = \left\| \mathrm{c}- \mathrm{c}_h \right\| _{L^2(\varOmega )}^2 = \frac{1}{N} \left\| \mathbf{{c}} - (1,1, \ldots , 1)^T\right\| _{l^2}^2 \,. \end{aligned}$$

Actually, this error measure is unsuitable, because the natural space for densities is \(L^1\) rather than \(L^2\) (see [3]); therefore, we measure the difference in the \(L^1\)-norm, which can also be computed easily for the independence copula

$$\begin{aligned} \left\| \mathrm{c}- \mathrm{c}_h\right\| _{L^1(\varOmega )} = \int \limits _\varOmega \left| \mathrm{c}(\mathbf{{u}})-\mathrm{c}_h(\mathbf{{u}}) \right| \mathrm{d}\mathbf{{u}} = \frac{1}{N} \left\| \mathbf{{c}} - (1,1, \ldots , 1)^T \right\| _{l^1} \,. \end{aligned}$$
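
As a small end-to-end illustration (reusing psi_1d, assemble_K1, and solve_kronecker from the sketches above; the one-dimensional factors of (9) for the independence copula are evaluated by quadrature, so this is not the parallel implementation behind Table 1):

```python
import numpy as np

def independence_rhs(n, d, quad_points=2001):
    """Exact right-hand side (9) for the independence copula C(u) = u_1 * ... * u_d;
    copula and test functions factorize, so C_i = prod_k int_0^1 u psi_i^k(u) du."""
    u = np.linspace(0.0, 1.0, quad_points)
    J = np.array([np.trapz(u * psi_1d(i, u, n), u) for i in range(1, n + 1)])
    C = np.ones(1)
    for _ in range(d):
        C = np.multiply.outer(C, J).reshape(-1)        # same ordering as in rhs_empirical
    return C

n, d = 10, 3
K1 = assemble_K1(n)
c = solve_kronecker(K1, independence_rhs(n, d), d)     # coefficient vector of c_h
l1_error = np.mean(np.abs(c - 1.0))                    # L1 error, exact density is c = 1
```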

In Table 1, we give the following quantities for different discretization steps \(n\) in dimension \(1\) and dimension \(d\): the system size \(N=n^d\), the computing times \(t_{rhs}\) for assembling the right-hand side and \(t_{solve}\) for solving the system, the number of computing slaves \(s_{rhs}\), and the \(L^1\)-approximation errors. For the computation of the right-hand side, a parallel OpenMPI implementation with \(s_{rhs}\) computing slaves was used. For solving the system with the Kronecker factorization, a sequential C++ implementation was used. The exact computation of an ordinary right-hand side without using the product structure remains impossible for \(d \ge 5\); the corresponding times are estimated computing times. In summary, the example of the independence copula shows that, for exact data on the right-hand side, the approximation error is acceptable but grows with decreasing discretization size \(h = \frac{1}{n}\). We want to point out that this is a typical phenomenon of inverse problems, called “regularization by discretization”.

Table 2 Computing times using (17) for \(T=100{,}000\) samples
Fig. 3 Student copula, \(\rho = 0.5\), \(\nu =1\), \(n=50\). a Reconstructed density \(\mathrm{c}\). b Copula \(\mathrm{C}\)

Fig. 4 Frank copula, \(\theta = 4\), \(n=50\). a Reconstructed density \(\mathrm{c}\). b Copula \(\mathrm{C}\)

Fig. 5 Student copula density, \(\rho = 0.5\), \(\nu = 1\), \(n=50\). a \(T=1{,}000{,}000\). b \(T=100{,}000\). c \(T=10{,}000\). d \(T=1{,}000\)

If we consider the practically more relevant case that the empirical copula, generated from \(T\) independent samples of the independence copula, is used, we are faced with data noise \(\delta > 0\) and ill-posedness. Table 2 shows that the computation based on (17) is still possible for \(d \approx 10\). However, the approximation error increases with the dimension \(d\), which is a direct consequence of the ill-posedness, because the condition number of the system matrix \(K\) is the condition number of the one-dimensional system matrix \({^{(1)}}K\) raised to the power of \(d\).

Naturally, our proposed method works not only for the rather simple independence copula; it also works quite well for all typical copula families. The approximation error for noise-free right-hand sides is negligible. Figures 3 and 4 show the reconstructed densities for the Student and Frank copulas, using exact data for the right-hand side. In [14], numerical results for other copula families, such as the Gaussian, Gumbel, or Clayton copula, can also be found. However, ill-posedness is expected when empirical copulas are used and we are faced with data noise, which we discuss in the next section.

Fig. 6 Frank copula density, \(\theta = 4\), \(n=50\). a \(T=1{,}000{,}000\). b \(T=100{,}000\). c \(T=10{,}000\). d \(T=1{,}000\)

3 Ill-Posedness and Regularization

Note that in real problems the copula \(\mathrm{C}\) is not known and we only have noisy data (10) instead of (9). In order to illustrate the expected numerical instabilities, we have simulated \(T\) samples for each two-dimensional copula and present the nonparametric reconstructed densities using the Petrov–Galerkin projection with grid size \(n=50\). A typical property of ill-posed inverse problems is that the numerical instability decreases if the grid size \(n\) decreases, which can also be seen in Table 1. Therefore, we fix the grid size \(n=50\) and study the influence of the sample size \(T\).

Fig. 7 Regularized Student copula density, \(\rho = 0.5\), \(\nu = 1\), \(n=50\). a \(\alpha =0\), \(T=1{,}000\) samples. b \(\alpha =10^{-8}\), \(T=1{,}000\) samples. c \(\alpha =0\), \(T=10{,}000\) samples. d \(\alpha =10^{-8}\), \(T=10{,}000\) samples

Fig. 8 Regularized Frank copula density, \(\theta = 4\), \(n=50\). a \(\alpha =0\), \(T=1{,}000\) samples. b \(\alpha =10^{-8}\), \(T=1{,}000\) samples. c \(\alpha =0\), \(T=10{,}000\) samples. d \(\alpha =10^{-8}\), \(T=10{,}000\) samples

Because of (3), the data noise \(\delta \) increases if \(T\) decreases. Figures 5 and 6 show the expected ill-posedness appearing for decreasing sample size \(T\). Of course, these instabilities also occur for the other copula families, but we restrict the illustration here to these two examples. More examples can be found in [14].

To overcome the ill-posedness, an appropriate regularization of the discretized problem (8) is required. Figures 7 and 8 show the reconstructed copula densities for \(T=1{,}000\) and \(T=10{,}000\) samples using the well-known Tikhonov regularization. If the regularization parameter \(\alpha =0\) is chosen, there is no regularization; the left-hand side of the figures shows these unregularized solutions. The choice of the regularization parameter \(\alpha = 10^{-8}\) is naive and arbitrary and serves only as a demonstration of how the instability can be handled. A better parameter choice should improve the reconstructed densities. Discussing an appropriate parameter choice rule for Tikhonov regularization, as well as other regularization methods, is left for future work.

In order to avoid the complete assembly of the system matrix \(K\), which leads to very large systems for \(d>2\), we are interested in regularization methods that exploit the special structure (15). In particular, all regularization methods based on the singular value or eigenvalue decomposition of \(K\) can be handled easily, because the eigenvalue decomposition of the one-dimensional matrix \({^{(1)}}K = V \varLambda V^T\) leads to the eigenvalue decomposition of the system matrix

$$\begin{aligned} K = \left( V \otimes \cdots \otimes V \right) \left( \varLambda \otimes \cdots \otimes \varLambda \right) \left( V^T \otimes \cdots \otimes V^T \right) \,. \end{aligned}$$
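
For illustration, a sketch of Tikhonov regularization that exploits this factorized eigenvalue decomposition (assuming a symmetric \({^{(1)}}K\), as obtained from the choices (11) and (12), and an element ordering consistent with the Kronecker structure; the parameter \(\alpha \) is left to the user):

```python
import numpy as np
from functools import reduce

def tikhonov_kronecker(K1, C, d, alpha):
    """Tikhonov-regularized solution (K^T K + alpha I)^{-1} K^T C of K c = C,
    using K = (V kron ... kron V)(Lambda kron ... kron Lambda)(V^T kron ... kron V^T)."""
    n = K1.shape[0]
    lam, V = np.linalg.eigh(K1)                        # K1 = V diag(lam) V^T
    X = C.reshape((n,) * d)
    for axis in range(d):                              # apply V^T along every mode
        X = np.moveaxis(X, axis, 0)
        X = np.moveaxis((V.T @ X.reshape(n, -1)).reshape((n,) * d), 0, axis)
    L = reduce(np.multiply.outer, [lam] * d)           # eigenvalues of K as a d-way array
    X = X * L / (L**2 + alpha)                         # spectral Tikhonov filter
    for axis in range(d):                              # transform back with V
        X = np.moveaxis(X, axis, 0)
        X = np.moveaxis((V @ X.reshape(n, -1)).reshape((n,) * d), 0, axis)
    return X.reshape(-1)
```

With \(\alpha = 0\) this reduces to the unregularized solution shown in the left columns of Figs. 7 and 8.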

A typical property of Tikhonov regularization is that true peaks in the density are smoothed out. This effect appears in particular for the Student copula density. Hence, the reconstruction quality should improve if other regularization methods are used. In inverse problem theory, it is well known that Tikhonov regularization corresponds to an \(L^2\)-norm penalization of the regularized solutions. Therefore, \(L^1\) penalties or total variation penalties (see [7]) seem more suitable.

Furthermore, the approximated copula

$$\begin{aligned} {\mathrm{C}_h}(\mathbf{{u}}) = \int \limits _{\mathbf{{0}}}^{\mathbf{{u}}} {\mathrm{c}_h}(\mathbf{{s}}) \mathrm{d}\mathbf{{s}} = \sum \limits _{j=1}^N {c_j} {\psi _j}(\mathbf{{u}}) \end{aligned}$$

should satisfy the typical properties of copulas. For example, the requirement

$$\mathrm{C}_h(1,\ldots ,1) \overset{!}{=}1$$

yields the condition \(\sum _{j=1}^N c_j =1\) and the requirements

$$\mathrm{C}_h(1,\ldots , 1, u_k, 1, \ldots ,1) \overset{!}{=} u_k \quad k=1, \ldots , d$$

lead to additional conditions on the vector \(\mathbf {c}\), which altogether can be used to build problem-specific regularization methods.