Abstract
We propose a novel structure selection method for high-dimensional (\(d > 100\)) sparse vine copulas. Current sequential greedy approaches to structure selection require computing spanning trees in hundreds of dimensions and fitting the pair copulas and their parameters iteratively throughout the selection process. Our method exploits a connection between vine copulas and structural equation models. The latter can be estimated very fast using the Lasso, even in very high dimensions, to obtain sparse models. We thus obtain a structure estimate independently of the chosen pair copulas and parameters. Additionally, we introduce the novel concept of regularization paths for R-vine matrices, which relates the sparsity of the vine copula model, in terms of independence copulas, to a penalization coefficient in the structural equation models. We illustrate our approach with numerous numerical examples, including simulations and data applications in high dimensions that demonstrate its superiority over existing methods.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig8_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig9_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig10_HTML.gif)
References
Aas, K.: Pair-copula constructions for financial applications: a review. Econometrics 4(4), 43 (2016)
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44, 182–198 (2009)
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Proceedings of the Second International Symposium on Information Theory Budapest, pp. 267–281. Akademiai Kiado, Budapest (1973)
Andersson, S.A., Perlman, M.D.: Normal linear regression models with recursive graphical Markov structure. J. Multivar. Anal. 66, 133–187 (1998)
Bauer, A., Czado, C.: Pair-Copula Bayesian networks. J. Comput. Graph. Stat. 25(4), 1248–1271 (2016). https://doi.org/10.1080/10618600.2015.1086355
Bedford, T., Cooke, R.: Probability density decomposition for conditionally dependent random variables modeled by vines. Ann. Math. Artif. Intell. 32, 245–268 (2001)
Bedford, T., Cooke, R.: Vines—a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)
Bollen, K.A.: Structural Equations with Latent Variables, 1st edn. Wiley, Chichester (1989)
Brechmann, E., Czado, C., Aas, K.: Truncated regular vines in high dimensions with application to financial data. Can. J. Stat. 40, 68–85 (2012)
Brechmann, E.C., Joe, H.: Parsimonious parameterization of correlation matrices using truncated vines and factor analysis. Comput. Stat. Data Anal. 77, 233–251 (2014)
Brechmann, E.C., Schepsmeier, U.: Modeling dependence with C- and D-vine copulas: the R package CDVine. J. Stat. Softw. 52(3), 1–27 (2013), http://www.jstatsoft.org/v52/i03/
Dißmann, J., Brechmann, E., Czado, C., Kurowicka, D.: Selecting and estimating regular vine copulae and application to financial returns. Comput. Stat. Data Anal. 59(1), 52–69 (2013)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432 (2008). https://doi.org/10.1093/biostatistics/kxm045
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010), http://www.jstatsoft.org/v33/i01/
Frommlet, F., Chakrabarti, A., Murawska, M., Bogdan, M.: Asymptotic Bayes optimality under sparsity for generally distributed effect sizes under the alternative. Technical report, arXiv:1005.4753 (2011)
Gruber, L., Czado, C.: Bayesian model selection of regular vine copulas. Preprint, https://www.statistics.ma.tum.de/fileadmin/w00bdb/www/LG/bayes-vine.pdf (2015a)
Gruber, L., Czado, C.: Sequential Bayesian model selection of regular vine copulas. Bayesian Anal. 10, 937–963 (2015b)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity The Lasso and Generalizations. CRC Press, Boca Raton (2015)
Hoyle, R.H.: Structural Equation Modeling, 1st edn. SAGE Publications, Thousand Oaks (1995)
Joe, H.: Dependence Modeling with Copulas. Chapman & Hall/CRC, London (2014)
Kaplan, D.: Structural Equation Modeling: Foundations and Extensions, 2nd edn. SAGE Publications, Thousand Oaks (2009)
Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques, 1st edn. MIT Press, Cambridge, Massachusetts (2009)
Kurowicka, D., Cooke, R.: Uncertainty Analysis and High Dimensional Dependence Modelling, 1st edn. Wiley, Chichester (2006)
Kurowicka, D., Joe, H.: Dependence Modeling—Handbook on Vine Copulae. World Scientific Publishing Co., Singapore (2011)
Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34(3), 1436–1462 (2006). https://doi.org/10.1214/009053606000000281
Müller, D., Czado, C.: Representing sparse Gaussian DAGs as sparse R-vines allowing for non-Gaussian dependence. J. Comput. Graph. Stat. (2017). https://doi.org/10.1080/10618600.2017.1366911
Schepsmeier, U., Stöber, J., Brechmann, E.C., Graeler, B., Nagler, T., Erhardt, T.: VineCopula: Statistical Inference of Vine Copulas. R package version 2.0.6. https://github.com/tnagler/VineCopula (2016)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)
Stöber, J., Joe, H., Czado, C.: Simplified pair copula constructions-limitations and extensions. J. Multivar. Anal. 119, 101–118 (2013). https://doi.org/10.1016/j.jmva.2013.04.014
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005). https://doi.org/10.1111/j.1467-9868.2005.00503.x
Acknowledgements
The first author acknowledges financial support by a research stipend of the Technische Universität München. The second author is supported by the German Research Foundation (DFG Grant CZ 86/4-1). Numerical calculations were performed on a Linux cluster supported by DFG Grant INST 95/919-1 FUGG.
Appendices
A Cross-validation for the Lasso
Assume the setup introduced in Sect. 4. We divide the total data set of n observations into \(k > 1\) randomly chosen subsets \(K_1,\dots ,K_k\) such that \(\bigcup _{i=1}^k K_i = \{1,\dots ,n\}\). This yields k training data sets \(S_{tr} = \{1,\dots ,n\} \setminus K_m\) and corresponding test data sets \(S_{te} = K_m\), \(m=1,\dots ,k\). Then, the coefficient vector \(\hat{\varvec{\varphi }}_\ell = \left( {\hat{\varphi }}_1^\ell ,\dots ,{\hat{\varphi }}_p^\ell \right) \in {\mathbb {R}}^p\) is estimated for various penalization parameters \(\lambda _\ell ,\ \ell =1,\dots ,L\), on each of the k training sets. Now we use these L coefficient vectors to predict, for each test data set, the values \(\hat{y}_i^{\ell ,m} = \sum _{j=1}^p {\hat{\varphi }}_j^\ell x_{ij},\ i \in K_m\).
For these values, we also know the true values \(y_i\), \(i \in K_m\), \(m=1,\dots ,k\). Thus, we can calculate the mean squared prediction error for this pair of training and test data: \(MSPE_m^\ell = \frac{1}{|K_m|} \sum _{i \in K_m} \left( y_i - \hat{y}_i^{\ell ,m}\right) ^2\).
Since we have k pairs of training and test data, we obtain an estimate of the prediction error for each of the L values \(\lambda _\ell ,\ \ell =1,\dots ,L\), by averaging: \(\varDelta _\ell = \frac{1}{k} \sum _{m=1}^k MSPE_m^\ell \).
Next, consider the relationship between \(\lambda _\ell ,\ \ell =1,\dots ,L\), and the corresponding error \(\varDelta _\ell \). A natural choice is to select \(\lambda = \lambda _\ell \) such that \(\varDelta _\ell \) is minimal among \(\left( \varDelta _1,\dots ,\varDelta _L\right) \); we denote this by \(\lambda _{min}^{CV}\). Alternatively, we choose the largest \(\lambda _\ell \) whose error is within one standard error of the minimum, denoted \(\lambda _{1se}^{CV}\). For details on both cross-validation methods, see Friedman et al. (2010) or Hastie et al. (2015, p. 13).
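The k-fold procedure above can be sketched in plain NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the helper names (`lasso_cd`, `cv_lasso`) are hypothetical, and the Lasso fit is a basic coordinate-descent solver rather than the optimized routines of Friedman et al. (2010).

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator used in the coordinate-descent update."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Basic coordinate-descent Lasso, objective (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
    return beta

def cv_lasso(X, y, lambdas, k=5, seed=0):
    """k-fold CV over a grid of (increasing) lambdas; returns lambda_min, lambda_1se."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)                 # the sets K_1, ..., K_k
    errs = np.empty((k, len(lambdas)))
    for m, K_m in enumerate(folds):
        tr = np.setdiff1d(idx, K_m)                # training set {1,...,n} \ K_m
        for l, lam in enumerate(lambdas):
            beta = lasso_cd(X[tr], y[tr], lam)
            errs[m, l] = np.mean((y[K_m] - X[K_m] @ beta) ** 2)   # MSPE_m^l
    delta = errs.mean(axis=0)                      # Delta_l, averaged over folds
    se = errs.std(axis=0, ddof=1) / np.sqrt(k)     # standard error per lambda
    l_min = int(np.argmin(delta))
    # one-standard-error rule: most penalized lambda within 1 SE of the minimum
    l_1se = int(np.max(np.where(delta <= delta[l_min] + se[l_min])[0]))
    return lambdas[l_min], lambdas[l_1se], delta

# demo on synthetic sparse data
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 10))
beta_true = np.zeros(10)
beta_true[:3] = [1.5, -2.0, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(80)
lambdas = np.array([0.01, 0.05, 0.1, 0.5, 1.0])
lam_min, lam_1se, delta = cv_lasso(X, y, lambdas, k=5)
```

By construction, \(\lambda _{1se}^{CV} \ge \lambda _{min}^{CV}\) when the grid is increasing, so the one-standard-error rule always yields the sparser (or equally sparse) model.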
B Additional results for the simulation study
C Algorithms
![figure m](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Figm_HTML.gif)
![figure n](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fign_HTML.gif)
![figure o](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Figo_HTML.gif)
Müller, D., Czado, C. Selection of sparse vine copulas in high dimensions with the Lasso. Stat Comput 29, 269–287 (2019). https://doi.org/10.1007/s11222-018-9807-5