Abstract
We propose a novel structure selection method for high-dimensional (\(d > 100\)) sparse vine copulas. Current sequential greedy approaches to structure selection require computing spanning trees in hundreds of dimensions and fitting the pair copulas and their parameters iteratively throughout the selection process. Our method exploits a connection between vine copulas and structural equation models. The latter can be estimated very fast using the Lasso, even in very high dimensions, to obtain sparse models. We thus obtain a structure estimate independently of the chosen pair copulas and parameters. Additionally, we introduce the novel concept of regularization paths for R-vine matrices, which relates the sparsity of the vine copula model, in terms of independence copulas, to a penalization coefficient in the structural equation models. We illustrate our approach with numerous numerical examples, including simulations and data applications in high dimensions that demonstrate its superiority over existing methods.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig8_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig9_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fig10_HTML.gif)
References
Aas, K.: Pair-copula constructions for financial applications: a review. Econometrics 4(4), 43 (2016)
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44, 182–198 (2009)
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Proceedings of the Second International Symposium on Information Theory Budapest, pp. 267–281. Akademiai Kiado, Budapest (1973)
Andersson, S.A., Perlman, M.D.: Normal linear regression models with recursive graphical Markov structure. J. Multivar. Anal. 66, 133–187 (1998)
Bauer, A., Czado, C.: Pair-Copula Bayesian networks. J. Comput. Graph. Stat. 25(4), 1248–1271 (2016). https://doi.org/10.1080/10618600.2015.1086355
Bedford, T., Cooke, R.: Probability density decomposition for conditionally dependent random variables modeled by vines. Ann. Math. Artif. Intell. 32, 245–268 (2001)
Bedford, T., Cooke, R.: Vines—a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)
Bollen, K.A.: Structural Equations with Latent Variables, 1st edn. Wiley, Chichester (1989)
Brechmann, E., Czado, C., Aas, K.: Truncated regular vines in high dimensions with application to financial data. Can. J. Stat. 40, 68–85 (2012)
Brechmann, E.C., Joe, H.: Parsimonious parameterization of correlation matrices using truncated vines and factor analysis. Comput. Stat. Data Anal. 77, 233–251 (2014)
Brechmann, E.C., Schepsmeier, U.: Modeling dependence with C- and D-vine copulas: the R package CDVine. J. Stat. Softw. 52(3), 1–27 (2013), http://www.jstatsoft.org/v52/i03/
Dißmann, J., Brechmann, E., Czado, C., Kurowicka, D.: Selecting and estimating regular vine copulae and application to financial returns. Comput. Stat. Data Anal. 59(1), 52–69 (2013)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432 (2008). https://doi.org/10.1093/biostatistics/kxm045
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010), http://www.jstatsoft.org/v33/i01/
Frommlet, F., Chakrabarti, A., Murawska, M., Bogdan, M.: Asymptotic Bayes optimality under sparsity for generally distributed effect sizes under the alternative. Technical report, arXiv:1005.4753 (2011)
Gruber, L., Czado, C.: Bayesian model selection of regular vine copulas. Preprint, https://www.statistics.ma.tum.de/fileadmin/w00bdb/www/LG/bayes-vine.pdf (2015a)
Gruber, L., Czado, C.: Sequential Bayesian model selection of regular vine copulas. Bayesian Anal. 10, 937–963 (2015b)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity The Lasso and Generalizations. CRC Press, Boca Raton (2015)
Hoyle, R.H.: Structural Equation Modeling, 1st edn. SAGE Publications, Thousand Oaks (1995)
Joe, H.: Dependence Modeling with Copulas. Chapman & Hall/CRC, London (2014)
Kaplan, D.: Structural Equation Modeling: Foundations and Extensions, 2nd edn. SAGE Publications, Thousand Oaks (2009)
Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques, 1st edn. MIT Press, Cambridge, Massachusetts (2009)
Kurowicka, D., Cooke, R.: Uncertainty Analysis and High Dimensional Dependence Modelling, 1st edn. Wiley, Chichester (2006)
Kurowicka, D., Joe, H.: Dependence Modeling—Handbook on Vine Copulae. World Scientific Publishing Co., Singapore (2011)
Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34(3), 1436–1462 (2006). https://doi.org/10.1214/009053606000000281
Müller, D., Czado, C.: Representing sparse Gaussian DAGs as sparse R-vines allowing for non-Gaussian dependence. J. Comput. Graph. Stat. (2017). https://doi.org/10.1080/10618600.2017.1366911
Schepsmeier, U., Stöber, J., Brechmann, E.C., Graeler, B., Nagler, T., Erhardt, T.: VineCopula: Statistical Inference of Vine Copulas. R package version 2.0.6. https://github.com/tnagler/VineCopula (2016)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)
Stöber, J., Joe, H., Czado, C.: Simplified pair copula constructions-limitations and extensions. J. Multivar. Anal. 119, 101–118 (2013). https://doi.org/10.1016/j.jmva.2013.04.014
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005). https://doi.org/10.1111/j.1467-9868.2005.00503.x
Acknowledgements
The first author acknowledges financial support by a research stipend of the Technische Universität München. The second author is supported by the German Research Foundation (DFG Grant CZ 86/4-1). Numerical calculations were performed on a Linux cluster supported by DFG Grant INST 95/919-1 FUGG.
Appendices
A Cross-validation for the Lasso
Assume the setup introduced in Sect. 4. We divide the total data set of n observations into \(k > 1\) randomly chosen subsets \(K_1,\dots ,K_k\) such that \(\bigcup _{i=1}^k K_i = \{1,\dots ,n\}\). This yields k training data sets \(S_{tr} = \{1,\dots ,n\} \setminus K_m\) and corresponding test data sets \(S_{te} = K_m\), \(m=1,\dots ,k\). Then, the coefficient vector \(\hat{\varvec{\varphi }}_\ell = \left( {\hat{\varphi }}_1^\ell ,\dots ,{\hat{\varphi }}_p^\ell \right) \in {\mathbb {R}}^p\) is estimated for various penalization parameters \(\lambda _\ell ,\ \ell =1,\dots ,L\), on each of the k training sets. Now we use these L coefficient vectors to predict, for each test data set, the values \(\hat{y}_i^{\ell ,m} = \sum _{j=1}^p {\hat{\varphi }}_j^\ell x_{ij},\ i \in K_m\).
For these values, we also know the true values \(y_i\), \(i \in K_m\), \(m=1,\dots ,k\). Thus, we can calculate the mean squared prediction error for this pair of training and test data: \(MSPE_m^\ell = \frac{1}{|K_m|} \sum _{i \in K_m} \left( y_i - \hat{y}_i^{\ell ,m}\right) ^2\).
Since we have k pairs of training and test data, we obtain an estimate of the prediction error for each of the L values \(\lambda _\ell ,\ \ell =1,\dots ,L\), by averaging: \(\varDelta _\ell = \frac{1}{k} \sum _{m=1}^k MSPE_m^\ell \).
Next, consider the relationship between \(\lambda _\ell ,\ \ell =1,\dots ,L\), and the corresponding error \(\varDelta _\ell \). A natural choice is to select \(\lambda = \lambda _\ell \) such that \(\varDelta _\ell \) is minimal among \(\left( \varDelta _1,\dots ,\varDelta _L\right) \); we denote this by \(\lambda _{min}^{CV}\). Alternatively, we choose the largest \(\lambda _\ell \) whose error is within one standard error of the minimum, denoted \(\lambda _{1se}^{CV}\). For details on both cross-validation methods, see Friedman et al. (2010) or Hastie et al. (2015, p. 13).
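The k-fold procedure above can be sketched in plain NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the helper names (`lasso_cd`, `cv_lasso`) are hypothetical, and the Lasso fit is a basic coordinate-descent solver rather than the optimized routines of Friedman et al. (2010).

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator used in the coordinate-descent update."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Basic coordinate-descent Lasso, objective (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
    return beta

def cv_lasso(X, y, lambdas, k=5, seed=0):
    """k-fold CV over a grid of (increasing) lambdas; returns lambda_min, lambda_1se."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)                 # the sets K_1, ..., K_k
    errs = np.empty((k, len(lambdas)))
    for m, K_m in enumerate(folds):
        tr = np.setdiff1d(idx, K_m)                # training set {1,...,n} \ K_m
        for l, lam in enumerate(lambdas):
            beta = lasso_cd(X[tr], y[tr], lam)
            errs[m, l] = np.mean((y[K_m] - X[K_m] @ beta) ** 2)   # MSPE_m^l
    delta = errs.mean(axis=0)                      # Delta_l, averaged over folds
    se = errs.std(axis=0, ddof=1) / np.sqrt(k)     # standard error per lambda
    l_min = int(np.argmin(delta))
    # one-standard-error rule: most penalized lambda within 1 SE of the minimum
    l_1se = int(np.max(np.where(delta <= delta[l_min] + se[l_min])[0]))
    return lambdas[l_min], lambdas[l_1se], delta

# demo on synthetic sparse data
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 10))
beta_true = np.zeros(10)
beta_true[:3] = [1.5, -2.0, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(80)
lambdas = np.array([0.01, 0.05, 0.1, 0.5, 1.0])
lam_min, lam_1se, delta = cv_lasso(X, y, lambdas, k=5)
```

By construction, \(\lambda _{1se}^{CV} \ge \lambda _{min}^{CV}\) when the grid is increasing, so the one-standard-error rule always yields the sparser (or equally sparse) model.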
B Additional results for the simulation study
C Algorithms
![figure m](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Figm_HTML.gif)
![figure n](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Fign_HTML.gif)
![figure o](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11222-018-9807-5/MediaObjects/11222_2018_9807_Figo_HTML.gif)
Müller, D., Czado, C. Selection of sparse vine copulas in high dimensions with the Lasso. Stat Comput 29, 269–287 (2019). https://doi.org/10.1007/s11222-018-9807-5