
Selection of sparse vine copulas in high dimensions with the Lasso

Published in Statistics and Computing.

Abstract

We propose a novel structure selection method for high-dimensional (\(d > 100\)) sparse vine copulas. Current sequential greedy approaches for structure selection require calculating spanning trees in hundreds of dimensions and fitting the pair copulas and their parameters iteratively throughout the structure selection process. Our method exploits a connection between vine copulas and structural equation models. The latter can be estimated very quickly using the Lasso, even in very high dimensions, to obtain sparse models. Thus, we obtain a structure estimate independently of the chosen pair copulas and their parameters. Additionally, we define the novel concept of regularization paths for R-vine matrices, which relates the sparsity of the vine copula model, in terms of independence copulas, to a penalization coefficient in the structural equation models. We illustrate our approach with many numerical examples, including simulations and data applications in high dimensions, showing the superiority of our approach to other existing methods.


References


Acknowledgements

The first author acknowledges financial support by a research stipend of the Technische Universität München. The second author is supported by the German Research Foundation (DFG Grant CZ 86/4-1). Numerical calculations were performed on a Linux cluster supported by DFG Grant INST 95/919-1 FUGG.

Author information


Corresponding author

Correspondence to Dominik Müller.

Appendices

A Cross-validation for the Lasso

Assume a setup as introduced in Sect. 4. We divide the total data set of n observations into \(k > 1\) randomly chosen subsets \(K_1,\dots ,K_k\) such that \(\bigcup _{i=1}^k K_i = \{1,\dots ,n\}\). We obtain k training data sets \(S_{tr} = \{1,\dots ,n\} \setminus K_m\) and corresponding test data sets \(S_{te} = K_m\), \(m=1,\dots ,k\). Then, the coefficient vector \(\hat{\varvec{\varphi }}_\ell = \left( {\hat{\varphi }}_1^\ell ,\dots ,{\hat{\varphi }}_p^\ell \right) \in {\mathbb {R}}^p\) is estimated for various penalization coefficients \(\lambda _\ell ,\ \ell =1,\dots ,L\) on each of the k training sets. We then use these L coefficient vectors to predict, for each test data set, the values

$$\begin{aligned} {{\hat{y}}}_i^\ell = \sum _{j=1}^p~{\hat{\varphi }}_{j}^\ell x_{i,j},\ i \in K_m,\ m = 1,\dots ,k,\ \ell = 1,\dots ,L. \end{aligned}$$

For these values, we also know the true values \(y_i\), \(i \in K_m\), \(m=1,\dots ,k\). Thus, we can calculate the mean squared prediction error for this pair of training and test data:

$$\begin{aligned} \delta _m^\ell = \frac{1}{|K_m|} \sum _{i \in K_m}~\left( y_i - {\hat{y}}_i^\ell \right) ^2,\ m = 1,\dots ,k. \end{aligned}$$

Since we have k pairs of training and test data, we obtain an estimate for the prediction error for each of the L values of \(\lambda _\ell ,\ \ell =1,\dots ,L\) by averaging:

$$\begin{aligned} \varDelta _\ell = \frac{1}{k} \sum _{m = 1}^k~\delta _m^\ell ,\ \ell = 1,\dots ,L. \end{aligned}$$

Next, consider the dependence between \(\lambda _\ell ,\ \ell =1,\dots ,L\) and the corresponding error \(\varDelta _\ell \). A natural choice is to select \(\lambda = \lambda _\ell \) such that \(\varDelta _\ell \) is minimal in \(\left( \varDelta _1,\dots ,\varDelta _L\right) \); we denote this choice by \(\lambda _{min}^{CV}\). Alternatively, we choose the largest \(\lambda _\ell \) whose error \(\varDelta _\ell \) lies within one standard error of the minimum, denoted \(\lambda _{1se}^{CV}\). For details on both types of cross-validation, see Friedman et al. (2010) or Hastie et al. (2015, p. 13).
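The procedure above can be sketched in a few lines. The following is a minimal, self-contained illustration (not the authors' implementation): a plain coordinate-descent Lasso combined with the k-fold scheme, computing \(\delta_m^\ell\), \(\varDelta_\ell\), and both selection rules \(\lambda_{min}^{CV}\) and \(\lambda_{1se}^{CV}\). The simulated design and all variable names are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent Lasso for (1/(2n))*||y - X phi||^2 + lam*||phi||_1."""
    n, p = X.shape
    phi = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r_j = y - X @ phi + X[:, j] * phi[j]
            rho = X[:, j] @ r_j
            # soft-thresholding update for coordinate j
            phi[j] = np.sign(rho) * max(abs(rho) - lam * n, 0.0) / col_sq[j]
    return phi

def cv_lasso(X, y, lambdas, k=5, seed=0):
    """k-fold CV as in Appendix A: returns (lambda_min, lambda_1se, Delta)."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)           # K_1, ..., K_k
    delta = np.empty((k, len(lambdas)))      # delta_m^ell
    for m, K_m in enumerate(folds):
        train = np.setdiff1d(idx, K_m)       # {1,...,n} \ K_m
        for l, lam in enumerate(lambdas):
            phi_hat = lasso_cd(X[train], y[train], lam)
            y_hat = X[K_m] @ phi_hat         # predictions on the test fold
            delta[m, l] = np.mean((y[K_m] - y_hat) ** 2)
    Delta = delta.mean(axis=0)               # averaged prediction error
    se = delta.std(axis=0, ddof=1) / np.sqrt(k)
    l_min = int(np.argmin(Delta))
    # one-standard-error rule: largest lambda with error within Delta_min + se
    within = np.flatnonzero(Delta <= Delta[l_min] + se[l_min])
    l_1se = within[np.argmax(lambdas[within])]
    return lambdas[l_min], lambdas[l_1se], Delta

# hypothetical demo: sparse linear model with 3 active coefficients
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
phi_true = np.zeros(p)
phi_true[:3] = [1.5, -2.0, 1.0]
y = X @ phi_true + 0.5 * rng.standard_normal(n)
lambdas = np.geomspace(1e-3, 1.0, 10)
lam_min, lam_1se, Delta = cv_lasso(X, y, lambdas, k=5)
```

By construction \(\lambda_{1se}^{CV} \ge \lambda_{min}^{CV}\), since the minimizer always lies within one standard error of itself and the 1se rule picks the largest such \(\lambda\), yielding the sparser of the two models.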

B Additional results for the simulation study

See Figs. 9 and 10.

C Algorithms

(Algorithms are presented as figures in the published article.)


Cite this article

Müller, D., Czado, C. Selection of sparse vine copulas in high dimensions with the Lasso. Stat Comput 29, 269–287 (2019). https://doi.org/10.1007/s11222-018-9807-5
