Abstract
Vine copulas are a flexible class of dependence models consisting of bivariate building blocks and have proven to be particularly useful in high dimensions. Classical model distance measures require multivariate integration and thus suffer from the curse of dimensionality. In this paper, we provide numerically tractable methods to measure the distance between two vine copulas even in high dimensions. For this purpose, we consecutively develop three new distance measures based on the Kullback–Leibler distance, using the result that it can be expressed as the sum over expectations of KL distances between univariate conditional densities, which can be easily obtained for vine copulas. To reduce numerical calculations, we approximate these expectations on adequately designed grids, outperforming Monte Carlo integration with respect to computational time. For the sake of interpretability, we provide a baseline calibration for the proposed distance measures. We further develop similar substitutes for the Jeffreys distance, a symmetrized version of the Kullback–Leibler distance. In numerous examples and applications, we illustrate the strengths and weaknesses of the developed distance measures.
Notes
This includes, for example, C- and D-vines (Aas et al. 2009) having the same diagonal.
Since most copulas have an infinite value at the boundary of the unit cube, we usually restrict ourselves to \([\varepsilon ,1-\varepsilon ]^d\) for a small \(\varepsilon >0\).
All numerical calculations in this paper were performed on a Linux computer (8-way Opteron) with 32 cores (each with 2.6 GHz and 3.9 GB of memory).
References
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44, 182–198 (2009)
Acar, E.F., Genest, C., Nešlehová, J.: Beyond simplified pair-copula constructions. J. Multivar. Anal. 110, 74–90 (2012)
Bedford, T., Cooke, R.M.: Vines: a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)
Brechmann, E.C., Czado, C.: Risk management with high-dimensional vine copulas: an analysis of the Euro Stoxx 50. Stat. Risk Model. 30(4), 307–342 (2013)
Caflisch, R.E.: Monte Carlo and quasi-Monte Carlo methods. Acta Numer. 7, 1–49 (1998)
Cooke, R.M., Joe, H., Chang, B.: Vine regression. Resources for the Future Discussion Paper, pp. 15–52 (2015)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Hoboken (2012)
Dißmann, J., Brechmann, E.C., Czado, C., Kurowicka, D.: Selecting and estimating regular vine copulae and application to financial returns. Comput. Stat. Data Anal. 59, 52–69 (2013)
Do, M.N.: Fast approximation of Kullback–Leibler distance for dependence trees and hidden Markov models. IEEE Signal Process. Lett. 10(4), 115–118 (2003)
Haff, I.H., Aas, K., Frigessi, A.: On the simplified pair-copula construction—simply useful or too simplistic? J. Multivar. Anal. 101(5), 1296–1310 (2010)
Hershey, J.R., Olsen, P.A.: Approximating the Kullback Leibler divergence between Gaussian mixture models. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007, vol. 4, pp. IV–317. IEEE (2007)
Jeffreys, H.: An invariant form for the prior probability in estimation problems. In: Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 186, pp. 453–461. The Royal Society (1946)
Joe, H.: Multivariate Models and Multivariate Dependence Concepts. CRC Press, Boca Raton (1997)
Joe, H.: Generating random correlation matrices based on partial correlations. J. Multivar. Anal. 97(10), 2177–2189 (2006)
Joe, H.: Dependence Modeling with Copulas. CRC Press, Boca Raton (2014)
Killiches, M., Czado, C.: Block-maxima of vines. In: Dey, D., Yan, J. (eds.) Extreme Value Modelling and Risk Analysis: Methods and Applications, pp. 109–130. CRC Press, Boca Raton (2015)
Killiches, M., Kraus, D., Czado, C.: Examination and visualisation of the simplifying assumption for vine copulas in three dimensions. Aust. N. Z. J. Stat. (2016). doi:10.1111/anzs.12182
Kraus, D., Czado, C.: D-vine copula based quantile regression. Comput. Stat. Data Anal. 110C, 1–18 (2017)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Loaiza Maya, R.A., Gómez-González, J.E., Melo Velandia, L.F.: Latin American exchange rate dependencies: a regular vine copula approach. Contemp. Econ. Policy 33(3), 535–549 (2015)
McKay, M.D., Beckman, R.J., Conover, W.J.: Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979)
Morales-Nápoles, O.: Counting vines. In: Kurowicka, D., Joe, H. (eds.) Dependence Modeling: Vine Copula Handbook. World Scientific Publishing Co, Singapore (2011)
Nagler, T., Czado, C.: Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. J. Multivar. Anal. 151, 69–89 (2016)
Nelsen, R.: An Introduction to Copulas, 2nd edn. Springer Science+Business Media, New York (2006)
Panagiotelis, A., Czado, C., Joe, H.: Pair copula constructions for multivariate discrete data. J. Am. Stat. Assoc. 107(499), 1063–1072 (2012)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2017)
Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. 23(3), 470–472 (1952)
Schepsmeier, U.: Efficient information based goodness-of-fit tests for vine copula models with fixed margins. J. Multivar. Anal. 138, 34–52 (2015)
Schepsmeier, U., Stoeber, J., Brechmann, E.C., Graeler, B., Nagler, T., Erhardt, T.: VineCopula: statistical inference of vine copulas. R package version 2.1.1 (2017)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)
Stöber, J., Czado, C.: Pair copula constructions. In: Mai, J.-F., Scherer, M. (eds.) Simulating Copulas: Stochastic Models, Sampling Algorithms, and Applications. World Scientific, Singapore (2012)
Stöber, J., Joe, H., Czado, C.: Simplified pair copula constructions—limitations and extensions. J. Multivar. Anal. 119, 101–118 (2013)
Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max–min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65(1), 31–78 (2006)
Acknowledgements
The authors would like to thank the editor and an anonymous referee for their constructive comments and suggestions, which helped to improve the quality of the paper. The first author acknowledges financial support by a research stipend of the Technische Universität München. The third author is supported by the German Research Foundation (DFG Grant CZ 86/4-1). Numerical calculations were performed on a Linux cluster supported by DFG Grant INST 95/919-1 FUGG.
Appendices
Appendix 1: Proof of Proposition 1
From Eq. (2.3) we know that the vine copula density can be written as a product over the pair-copula expressions corresponding to the matrix entries. In Property 2.8 (ii), Dißmann et al. (2013) state that deleting the first row and column from a d-dimensional structure matrix yields a \((d-1)\)-dimensional trimmed structure matrix. Due to Property 2 from Definition 1, the entry \(m_{1,1}=1\) does not appear in the remaining matrix. Hence, we obtain the density \(c_{2:d}\) by taking the product over all pair-copula expressions corresponding to the entries in the trimmed matrix. Iterating this argument shows that the entries of the matrix \(M_k:=(m_{i,j})_{i,j=k+1}^d\), obtained by deleting the first k rows and columns from M, represent the density \(c_{(k+1):d}\). In general, we have
$$c_{j|(j+1):d}\left( u_j\,|\,{\mathbf {u}}_{(j+1):d}\right) =\frac{c_{j:d}\left( {\mathbf {u}}_{j:d}\right) }{c_{(j+1):d}\left( {\mathbf {u}}_{(j+1):d}\right) }.$$
The numerator and denominator can be obtained as the product over all pair-copula expressions corresponding to the entries of \(M_{j-1}\) and \(M_j\). Thus, \(c_{j|(j+1):d}\) is simply the product over the expressions corresponding to the entries from the first column of \(M_{j-1}\). This proves Eq. (2.4).
Appendix 2: Proof of Proposition 2
We will prove an even more general version of Proposition 2 that holds for arbitrary densities f and g:
$$\mathrm {KL}(f,g)=\sum _{j=1}^{d}{\mathbb {E}}_{{\mathbf {X}}_{(j+1):d}}\left[ \mathrm {KL}\left( f_{j|(j+1):d}\left( \cdot \,|\,{\mathbf {X}}_{(j+1):d}\right) ,\, g_{j|(j+1):d}\left( \cdot \,|\,{\mathbf {X}}_{(j+1):d}\right) \right) \right] ,$$
where \({\mathbf {X}}_{(j+1):d}\sim f_{(j+1):d}\) and \({(d+1)\!:\!d}:=\emptyset \). Proposition 2 then follows directly from this statement.
Recall that, using recursive conditioning, the density f can be decomposed as
$$f({\mathbf {x}})=\prod _{j=1}^{d} f_{j|(j+1):d}\left( x_j\,|\,{\mathbf {x}}_{(j+1):d}\right) ,$$
where conditioning on the empty set \({(d+1)\!:\!d}=\emptyset \) means taking the marginal density \(f_d\).
Thus, the Kullback–Leibler distance between f and g can be written in the following way:
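The decomposition proved here can be sanity-checked in a setting where every term is available in closed form. The following sketch (Python, standard library only; the bivariate Gaussian setting and all function names are our illustrative choice, not part of the paper) compares the direct Kullback–Leibler distance between two zero-mean bivariate Gaussian densities with the sum of the marginal KL term and the expected conditional KL term:

```python
import math

def kl_gauss2d(rho_f, rho_g):
    """Closed-form KL(f, g) for two zero-mean bivariate normal densities
    with unit variances and correlations rho_f and rho_g."""
    det_f, det_g = 1.0 - rho_f**2, 1.0 - rho_g**2
    trace = (2.0 - 2.0 * rho_f * rho_g) / det_g  # tr(Sigma_g^{-1} Sigma_f)
    return 0.5 * (trace - 2.0 + math.log(det_g / det_f))

def kl_gauss2d_chain(rho_f, rho_g):
    """The same distance via the decomposition: KL of the X2-margins plus
    the expectation (over X2 ~ f_2) of the KL between the conditional
    densities of X1 given X2."""
    # Both X2-margins are standard normal, so the marginal KL term is 0.
    var_f, var_g = 1.0 - rho_f**2, 1.0 - rho_g**2
    # X1 | X2 = x is N(rho * x, 1 - rho^2) under either model; taking the
    # expectation of the univariate Gaussian KL over x ~ N(0, 1) replaces
    # the squared mean difference by (rho_f - rho_g)^2.
    return (0.5 * math.log(var_g / var_f)
            + (var_f + (rho_f - rho_g)**2) / (2.0 * var_g)
            - 0.5)
```

Both routes agree exactly here because all conditional expectations have closed forms; for vine copulas the conditional densities are equally tractable, but the outer expectation must be approximated, which is what motivates the grid methods developed in the paper.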
Appendix 3: Number of vines with the same diagonal
Proposition 4
Let \(\varvec{\sigma }=\left( \sigma _1,\ldots ,\sigma _d\right) '\) be a permutation of \({1\!:\!d}\). Then, there exist \(2^{\left( {\begin{array}{c}d-2\\ 2\end{array}}\right) +d-2}\) different vine decompositions whose structure matrix has the diagonal \(\varvec{\sigma }\).
Proof
The number of vine decompositions whose structure matrix has the same diagonal \(\varvec{\sigma }\) can be calculated as the quotient of the number of valid structure matrices and the number of possible diagonals. Morales-Nápoles (2011) shows that there are \(\frac{d!}{2}\cdot 2^{\left( {\begin{array}{c}d-2\\ 2\end{array}}\right) }\) different vine decompositions. In each of the \(d-1\) steps of the algorithm for encoding a vine decomposition in a structure matrix (see Stöber and Czado 2012) we have two possible choices, so there are \(2^{d-1}\) structure matrices representing the same vine decomposition. Hence, there are in total \(\frac{d!}{2}\cdot 2^{\left( {\begin{array}{c}d-2\\ 2\end{array}}\right) }\cdot 2^{d-1}\) valid structure matrices. Further, there are d! different diagonals. Thus, for a fixed diagonal \(\varvec{\sigma }\) there exist
$$\frac{\frac{d!}{2}\cdot 2^{\left( {\begin{array}{c}d-2\\ 2\end{array}}\right) }\cdot 2^{d-1}}{d!}=2^{\left( {\begin{array}{c}d-2\\ 2\end{array}}\right) +d-2}$$
vine decompositions with this diagonal.
\(\square \)
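To make the counting argument concrete, here is a small helper (Python; the function names are ours) that reproduces the quotient from the proof:

```python
from math import comb, factorial

def vines_per_diagonal(d):
    """Number of vine decompositions whose structure matrix has a fixed
    diagonal: 2^(binom(d-2, 2) + d - 2), as stated in Proposition 4."""
    return 2 ** (comb(d - 2, 2) + d - 2)

def valid_structure_matrices(d):
    """d!/2 * 2^binom(d-2, 2) vine decompositions, each of which is
    encoded by 2^(d-1) different structure matrices."""
    return (factorial(d) // 2) * 2 ** comb(d - 2, 2) * 2 ** (d - 1)
```

Dividing `valid_structure_matrices(d)` by the `d!` possible diagonals recovers `vines_per_diagonal(d)`; already for d = 5 each diagonal admits 64 vine decompositions.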
Appendix 4: Proof of Proposition 3
Let \(\varepsilon >0\) and \(n\in {\mathbb {N}}\). To simplify notation, for \(j=1,\ldots ,d-1\) we define
Then, by definition
Since \(W_j\) is a discretization of \([\varepsilon ,1-\varepsilon ]^{d-j}\) with mesh going to zero for \(n\rightarrow \infty \), we have
Substituting \({\mathbf {w}}_{(j+1):d}=T^{-1}_{c_{(j+1):d}^f}\left( {\mathbf {u}}_{(j+1):d}\right) \) yields
since
with (upper triangular) Jacobian matrix
such that \(\mathop {}\!\mathrm {d}{\mathbf {w}}_{(j+1):d}= \det (J) \mathop {}\!\mathrm {d}{\mathbf {u}}_{(j+1):d}=c_{(j+1):d}^f\left( {\mathbf {u}}_{(j+1):d}\right) \mathop {}\!\mathrm {d}{\mathbf {u}}_{(j+1):d}\). Since we are only interested in the determinant of J, which is upper triangular with zeros below the diagonal, the determinant equals the product of the diagonal entries and the values above the diagonal are irrelevant here. Finally, using the fact that
we obtain
Appendix 5: Regarding Remark 2
1.1 Limit of the dKL
Let \(\varepsilon >0\) and \(n\in {\mathbb {N}}\). Again, for \(j=1,\ldots ,d-1\) we define
The contribution of \({\mathscr {D}}^u_{j,k}, j=1,\ldots ,d-1\), \(k{=}1,\ldots ,2^{d-j-1}\), to the dKL is given by
where \(\varvec{\omega }(t)={\mathbf {r}}+t {\mathbf {v}}({\mathbf {r}})\) with \({\mathbf {v}}(\cdot )\) as defined in Definition 4, \({\mathbf {r}}\in \left\{ 0,1\right\} ^{d-j}\) being a corner point of \({\mathscr {D}}^w_{j,k}\) and \(t_i=\varepsilon +(i-1)\frac{1-2\varepsilon }{n-1}\) for \(i=1,\ldots ,n\). Letting \(n\rightarrow \infty \) yields
Now, we further let \(\varepsilon \rightarrow 0\) and use the fact that \(\left\| {\dot{\varvec{\omega }}}(t)\right\| =\sqrt{d-j}\) to obtain
where we used the substitution \({\mathbf {u}}_{(j+1):d}:=T^{-1}_{c_{(j+1):d}^f}({\mathbf {w}}_{(j+1):d})\), \(\mathop {}\!\mathrm {d}{\mathbf {w}}_{(j+1):d}=c_{(j+1):d}^f({\mathbf {u}}_{(j+1):d})\mathop {}\!\mathrm {d}{\mathbf {u}}_{(j+1):d}\) (cf. Appendix 4) in the last line.
1.2 Tail transformation
In our empirical applications of the dKL, we have noticed that different vines tend to differ most in the tails of the distribution. Therefore, we increase the concentration of evaluation points in the tails of the diagonal by transforming the points \(t_i\), \(i=1,\ldots ,n\), via a suitable function \(\varPsi \). Hence, by substituting \(t=\varPsi (s)\) in Eq. (4.1) we obtain
We use its discrete pendant
where \(s_i=\varPsi ^{-1}(\varepsilon )+(i-1)\frac{\varPsi ^{-1}(1-\varepsilon )-\varPsi ^{-1}(\varepsilon )}{n-1}\) for \(i=1,\ldots ,n\). Regarding the choice of \(\varPsi \), all results in this paper are obtained using
with shape parameter \(a>0\), where \(\varPhi \) is the standard normal distribution function. Figure 5 shows the graph of \(\varPsi _a\) for different values of a. We see that larger values of a imply more points being transformed into the tails. Having tested different values for a, we found that \(a=4\) yields the best overall results. Therefore, we consistently use \(a=4\).
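Since the exact form of \(\varPsi _a\) is not reproduced here, the following sketch (Python, standard library only) illustrates the mechanism with \(\varPsi =\varPhi \), the standard normal distribution function, as a hypothetical stand-in: pushing an equidistant grid through a function that flattens out in the tails concentrates the evaluation points near 0 and 1.

```python
import math

def std_normal_cdf(x):
    """Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_quantile(p):
    """Phi^{-1}(p) by bisection (accurate enough for grid construction)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail_grid(n, eps, psi=std_normal_cdf, psi_inv=std_normal_quantile):
    """Evaluation points t_i = Psi(s_i), where s_1, ..., s_n are
    equidistant on [Psi^{-1}(eps), Psi^{-1}(1 - eps)].  Psi = Phi is a
    hypothetical stand-in for the paper's parameterized Psi_a."""
    lo, hi = psi_inv(eps), psi_inv(1.0 - eps)
    return [psi(lo + i * (hi - lo) / (n - 1)) for i in range(n)]
```

Compared with an equidistant grid on \([\varepsilon ,1-\varepsilon ]\), the spacing of `tail_grid` shrinks toward the boundary; this is the effect that the shape parameter a amplifies in \(\varPsi _a\).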
Appendix 6: Finding the diagonal with the highest weight
1.1 Procedure 1: Finding a starting value
The idea behind the following heuristic is that a diagonal has a higher weight if its points have high probability implied by the copula density. Hence, the diagonal should reflect the dependence structure of the variables. The unconditional dependence in a vine captures most of the total dependence and is easy to interpret. For example, if \(U_i\) and \(U_j\) are positively dependent (i.e., \(\tau _{i,j}>0\)) and \(U_j\) and \(U_k\) are negatively dependent (i.e., \(\tau _{j,k}<0\)), then it seems plausible that \(U_i\) and \(U_k\) are negatively dependent. This concept can be extended to arbitrary dimensions.
1. Take each variable to be a node in an empty graph.
2. Consider the last row of the structure matrix, which encodes the unconditional pair-copulas. Connect two nodes by an edge if the dependence of the corresponding variables is described by one of those copulas.
3. Assign a "+" to node 1.
4. As long as not all nodes have been assigned a sign, repeat:
   (a) For each node that has been assigned a sign in the previous step, consider its neighborhood.
   (b) If the root node has a "+," assign to the neighbor node the sign of the Kendall's \(\tau \) of the pair-copula connecting the root and neighbor node; otherwise, assign the opposite sign.
5. The resulting direction vector \({\mathbf {v}}\in \left\{ -1,1\right\} ^d\) has entries \(v_i\) which are 1 or \(-1\) if node i has been assigned a "\(+\)" or a "−," respectively.
Note that if we had assigned a "−" to node 1 in step 3, we would have ended up with \(-{\mathbf {v}}\) instead of \({\mathbf {v}}\), implying the same diagonal.
To illustrate the procedure from above, we consider a nine-dimensional example: Let \({\mathscr {R}}\) be a vine copula with density c, where the following (unconditional) pair-copulas are specified:
Now, we take an empty graph with nodes 1 to 9 and add an edge (i, j) if \(c_{i,j}\) is specified in Table 9. The result is a tree on the nodes 1 to 9 (see Fig. 6). We assign a "\(+\)" to node 1 and, since there are still nodes without a sign, consider its neighborhood \(\left\{ 2,3\right\} \). Since \(\tau _{1,2}<0\) and the root node 1 has been assigned a "\(+\)", node 2 gets a "−". Node 3 is assigned a "\(+\)". Next, we repeat this procedure for the neighborhoods of nodes 2 and 3. Iterating in this way until all nodes have been assigned a "\(+\)" or a "−", we obtain the assignment shown in Fig. 6. The resulting direction vector is given by \({\mathbf {v}}=(1,-1,1,1,-1,-1,-1,1,-1)'\).
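The sign propagation of Procedure 1 amounts to a breadth-first sweep over the tree of unconditional pairs. A minimal sketch (Python; the edge encoding and the five-dimensional test tree below are hypothetical, not the example from Table 9 and Fig. 6):

```python
from collections import deque

def assign_signs(d, tau_signs):
    """Direction vector v in {-1, 1}^d from the signs of the Kendall's
    taus of the unconditional pair-copulas.  `tau_signs` maps an edge
    (i, j) of the first tree to sign(tau_{i,j}) in {-1, +1}."""
    adj = {i: [] for i in range(1, d + 1)}
    for (i, j), s in tau_signs.items():
        adj[i].append((j, s))
        adj[j].append((i, s))
    signs = {1: 1}                 # step 3: node 1 gets a "+"
    queue = deque([1])
    while queue:                   # step 4: propagate outwards
        root = queue.popleft()
        for neighbor, s in adj[root]:
            if neighbor not in signs:
                # same sign as tau if the root is "+", else the opposite
                signs[neighbor] = signs[root] * s
                queue.append(neighbor)
    return [signs[i] for i in range(1, d + 1)]
```

For a hypothetical tree with edges (1, 2), (1, 3), (3, 4), (2, 5) and tau signs −, +, −, +, the sweep returns \({\mathbf {v}}=(1,-1,1,-1,-1)'\); starting with a "−" at node 1 would flip all entries and yield the same diagonal.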
1.2 Procedure 2: Local search for better candidates
Having found a diagonal through Procedure 1 (Appendix "Procedure 1: Finding a starting value"), we additionally perform the following steps to check whether there is a diagonal with even higher weight in the "neighborhood" of \({\mathbf {v}}\).
1. Consider a candidate diagonal vector \({\mathbf {v}}\in \left\{ 1,-1\right\} ^d\) with corresponding weight \(\lambda _c^{(0)}\).
2. For \(j=1,\ldots ,d\), calculate the weight \(\lambda _c^{(j)}\) corresponding to \({\mathbf {v}}_j\in \left\{ 1,-1\right\} ^d\), where \({\mathbf {v}}_j\) is equal to \({\mathbf {v}}\) with the sign of the jth entry reversed.
3. If \(\max _i \lambda _c^{(i)}>\lambda _c^{(0)}\), take \({\mathbf {v}}:={\mathbf {v}}_k\) with \(k={{\mathrm{arg max}}}_i \lambda _c^{(i)}\) as the new candidate for the (local) maximum.
4. Repeat steps 1–3 until a (local) maximum is found, i.e., until \(\max _i \lambda _c^{(i)}\le \lambda _c^{(0)}\).
Although there is no guarantee that this procedure finds the global maximum of the diagonal weights, it always terminates in a local maximum. Starting from the plausible choice of \({\mathbf {v}}\) produced by Procedure 1, it is highly likely that we end up with the "right" diagonal.
In step 2, the weights of numerous diagonals have to be calculated. For a fast determination of these weights, it is reasonable to approximate the integral in Eq. (3.13) by
where \(0<t_1<t_2<\ldots<t_n<1\) is an equidistant discretization of [0, 1].
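The flip-one-entry search of steps 1–4 is a standard greedy hill climb over \(\{-1,1\}^d\). A sketch (Python; `weight` stands in for the grid-approximated diagonal weight \(\lambda _c\), and the toy weight in the test is our own construction):

```python
def local_search(v, weight):
    """Steps 1-4: repeatedly move to the best one-flip neighbour until
    no flip increases the weight; returns a local maximum."""
    best = weight(v)                                    # lambda_c^(0)
    while True:
        # step 2: weights of all d one-flip neighbours
        neighbours = [v[:j] + [-v[j]] + v[j + 1:] for j in range(len(v))]
        weights = [weight(u) for u in neighbours]
        w_max = max(weights)
        if w_max <= best:                               # step 4: stop
            return v, best
        # step 3: the argmax neighbour becomes the new candidate
        v, best = neighbours[weights.index(w_max)], w_max
```

With a toy weight such as \(|{\mathbf {v}}'{\mathbf {t}}|\) for a hidden target \({\mathbf {t}}\), the search climbs to \(\pm {\mathbf {t}}\) within a few sweeps; with the diagonal weight it behaves analogously but, as noted above, offers no global-optimality guarantee.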
Cite this article
Killiches, M., Kraus, D. & Czado, C. Model distances for vine copulas in high dimensions. Stat Comput 28, 323–341 (2018). https://doi.org/10.1007/s11222-017-9733-y