Abstract
A new methodology for selecting a Bayesian network for continuous data outside the widely used class of multivariate normal distributions is developed. The ‘copula DAGs’ combine directed acyclic graphs and their associated probability models with copula C/D-vines. Bivariate copula densities introduce flexibility in the joint distributions of pairs of nodes in the network. An information criterion is studied for graph selection tailored to the joint modeling of data based on graphs and copulas. Examples and simulation studies show the flexibility and properties of the method.
References
Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence. Insur. Math. Econ. 44(2), 182–198 (2009)
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B., Csáki, F. (eds.) Second International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó, Budapest (1973)
Barber, D.: Bayesian Reasoning and Machine Learning. Cambridge University Press, Cambridge (2012)
Bauer, A., Czado, C., Klein, T.: Pair-copula constructions for non-Gaussian DAG models. Can. J. Stat. 40(1), 86–109 (2012)
Bedford, T., Cooke, R.M.: Probability density decomposition for conditionally dependent random variables modeled by vines. Ann. Math. Artif. Intell. 32(1–4), 245–268 (2001)
Bedford, T., Cooke, R.M.: Vines—a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)
Brechmann, E., Czado, C.: Risk management with high-dimensional vine copulas: an analysis of the Euro Stoxx 50. Stat. Risk Model. 30(4), 307–342 (2013)
Brechmann, E., Schepsmeier, U.: Modeling dependence with C- and D-vine copulas: the R package CDVine. J. Stat. Softw. 52(3), 1–27 (2013)
Brechmann, E.C., Czado, C., Aas, K.: Truncated regular vines in high dimensions with applications to financial data. Can. J. Stat. 40(1), 68–85 (2012)
Chickering, D.: Optimal structure identification with greedy search. J. Mach. Learn. Res. 3, 507–554 (2002)
Clarke, K.: Nonparametric model discrimination in international relations. J. Confl. Resolut. 47(1), 72–93 (2003)
R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2014)
Cox, D., Wermuth, N.: Multivariate Dependencies: Models, Analysis and Interpretation. Chapman & Hall/CRC, London (1996)
Czado, C.: Pair-copula constructions of multivariate copulas. In: Jaworski, P., Durante, F., Härdle, W., Rychlik, W. (eds.) Copula Theory and its Applications, pp. 93–109. Springer, Berlin (2010)
Czado, C., Gärtner, F., Min, A.: Analysis of Australian electricity loads using joint Bayesian inference of D-vines with autoregressive margins. In: Kurowicka, D., Joe, H. (eds.) Vine Copula Handbook, pp. 265–280. World Scientific Publishing, Singapore (2011)
Czado, C., Schepsmeier, U., Min, A.: Maximum likelihood estimation of mixed C-vines with application to exchange rates. Stat. Model. 12(3), 229–255 (2012)
Dißmann, J., Brechmann, E., Czado, C., Kurowicka, D.: Selecting and estimating regular vine copulae and application to financial returns. Comput. Stat. Data Anal. 59, 52–69 (2013)
Drton, M., Perlman, M.: A SINful approach to Gaussian graphical model selection. J. Stat. Plan. Inference 138(4), 1179–1200 (2008)
Elidan, G.: Copula Bayesian networks. In: Lafferty, J., Williams, C.K.I., Shawe-Taylor, J., Zemel, R., Culotta, A. (eds.) Proceedings of Advances in Neural Information Processing Systems 23 (NIPS 2010), pp. 559–567 (2010)
Elidan, G.: Lightning-speed structure learning of nonlinear continuous networks. J. Mach. Learn. Res. Proc. Track 22, 355–363 (2012)
Geiger, D., Verma, T., Pearl, J.: Identifying independence in Bayesian networks. Networks 20(5), 507–534 (1990)
Genest, C., Favre, A.: Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng. 12(4), 347–368 (2007)
Gijbels, I., Veraverbeke, N., Omelka, M.: Conditional copulas, association measures and their applications. Comput. Stat. Data Anal. 55(5), 1919–1932 (2011)
Hanea, A.M.: Non-parametric Bayesian belief nets versus vines. In: Kurowicka, D., Joe, H. (eds.) Vine Copula Handbook, Dependence Modeling, pp. 281–303. World Scientific Publishing, Singapore (2011)
Hanea, A.M., Kurowicka, D., Cooke, R.M., Ababei, D.A.: Mining and visualising ordinal data with non-parametric continuous BBNs. Comput. Stat. Data Anal. 54(3), 668–687 (2010)
Harris, N., Drton, M.: PC algorithm for nonparanormal graphical models. J. Mach. Learn. Res. 14, 3365–3383 (2013)
Heckerman, D., Geiger, D.: Learning Bayesian networks: a unification for discrete and Gaussian domains. In: Proceedings of Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, pp. 274–284 (1995)
Hobæk Haff, I.: Parameter estimation for pair-copula constructions. Bernoulli 19(2), 462–491 (2013)
Hofert, M., Kojadinovic, I., Maechler, M., Yan, J.: copula: Multivariate dependence with copulas. R package version 0.999-10 (2014)
Jalali, A., Ravikumar, P., Vasuki, V., Sanghavi, S.: On learning discrete graphical models using group-sparse regularization. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (2010)
Joe, H.: Families of \(m\)-variate distributions with given margins and \(m(m-1)/2\) bivariate dependence parameters. In: Rüschendorf, L., Schweizer, B., Taylor, M. (eds.) Distributions with Fixed Marginals and Related Topics, Lecture Notes-Monograph Series, vol. 28, Institute of Mathematical Statistics, pp. 120–141 (1996)
Kalisch, M., Bühlmann, P.: High-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8, 613–636 (2007)
Kalisch, M., Mächler, M., Colombo, D., Maathuis, M.H., Bühlmann, P.: Causal inference using graphical models with the R package pcalg. J. Stat. Softw. 47(11), 1–26 (2012)
Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge (2009)
Kurowicka, D., Cooke, R.: The vine copula method for representing high dimensional dependent distributions: applications to continuous belief nets. In: Yücesan, E., Chen, C.H., Snowdon, J.L., Charnes, J.M. (eds.) The Winter Simulation Conference, IEEE Press, Piscataway, pp. 270–278 (2002)
Kurowicka, D., Cooke, R.: Uncertainty Analysis with High Dimensional Dependence Modelling. Wiley, Chichester (2006)
Lauritzen, S.: Graphical Models. Oxford University Press, Oxford (1996)
Lee, J., Hastie, T.: Learning the structure of mixed graphical models. J. Comput. Graph. Stat. 24(1), 230–253 (2012)
Lichman, M.: UCI machine learning repository. University of California, School of Information and Computer Sciences, Irvine. http://archive.ics.uci.edu/ml (2013)
Liu, H., Lafferty, J., Wasserman, L.: The nonparanormal: semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10, 2295–2328 (2009)
Loh, P.L., Wainwright, M.J.: Structure estimation for discrete graphical models: generalized covariance matrices and their inverses. Ann. Stat. 41(6), 3022–3049 (2013)
Lucas, P.J.: Biomedical applications of Bayesian networks. In: Lucas, P.J.F., Gámez, J., Salmerón Cerdan, A. (eds.) Advances in Probabilistic Graphical Models, Studies in Fuzziness and Soft Computing, pp. 333–358. Springer, Berlin (2007)
Madsen, A.L., Kjærulff, U.B.: Applications of HUGIN to diagnosis and control of autonomous vehicles. In: Lucas, P.J.F., Gámez, J., Salmerón Cerdan, A. (eds.) Advances in Probabilistic Graphical Models, Studies in Fuzziness and Soft Computing, vol. 214, pp. 313–332. Springer, Berlin (2007)
Mari, D., Kotz, S.: Correlation and Dependence. Imperial College Press, London (2001)
Min, A., Czado, C.: Bayesian model selection for multivariate copulas using pair-copula constructions. J. Financ. Econ. 8(4), 511–546 (2010)
Min, A., Czado, C.: Bayesian model selection for D-vine pair-copula constructions. Can. J. Stat. 39(2), 239–258 (2011)
Morales Nápoles, O.: Bayesian belief nets and vines in aviation safety and other applications. PhD Thesis, Technische Universiteit Delft (2010)
Nelsen, R.B.: An Introduction to Copulas. Springer, Berlin (2006)
Okhrin, O., Ristig, A.: Hierarchical Archimedean copulae: the HAC package. J. Stat. Softw. 58(4), 1–20 (2014)
Peshkin, L., Pfeffer, A., Savova, V.: Bayesian nets in syntactic categorization of novel words. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Association for Computational Linguistics, vol. 2, pp. 79–81 (2003)
Schepsmeier, U., Stoeber, J., Brechmann, E.C., Graeler, B.: VineCopula: statistical inference of vine copulas. R package version 1.3 (2014)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Scutari, M.: Learning Bayesian networks with the bnlearn R package. J. Stat. Softw. 35(3), 1–22 (2010)
Sin, C., White, H.: Information criteria for selecting possibly misspecified parametric models. J. Econ. 71(1–2), 207–225 (1996)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 8, 229–231 (1959)
Smith, M., Min, A., Almeida, C., Czado, C.: Modeling longitudinal data using a pair-copula construction decomposition of serial dependence. J. Am. Stat. Assoc. 105, 1467–1479 (2010)
Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction and Search, 2nd edn. MIT Press, Cambridge (2000)
Vuong, Q.H.: Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57(2), 307–333 (1989)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
Yang, E., Ravikumar, P.K., Allen, G.I., Liu, Z.: Graphical models via generalized linear models. In: Bartlett, P., Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Proceedings of Advances in Neural Information Processing Systems (NIPS 2012), pp. 1367–1375 (2012)
Acknowledgments
We wish to thank the reviewers for their comments. We thank A. Hanea and D. Ababei for providing the software of their procedure. We acknowledge the support of the Fund for Scientific Research Flanders, KU Leuven grant GOA/12/14 and of the IAP Research Network P7/06 of the Belgian Science Policy. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Hercules Foundation and the Flemish Government—Department EWI.
Appendix: Technical details
Assumptions of Proposition 4.1 adapted from Sin and White (1996). For every node l in the graph, let \(q_{lk}(\cdot ,\varvec{\theta })=\log CV_{l,k}(\varvec{\theta }_{CV_l})-\log DV_{l,k}(\varvec{\theta }_{DV_l})\), define \(\tilde{q}_{lk}(\cdot ,\varvec{\theta })=\log DV_{l,k}(\varvec{\theta }_{DV_l})\) and \(\text {log-Lik}(\cdot ,\varvec{\theta };\text {node}_l)\equiv Q_{ln}(\cdot ,\varvec{\theta })=\sum _{k=1}^{n}q_{lk}(\cdot ,\varvec{\theta })\) with \(\varvec{\theta }=(\varvec{\theta }_{CV_l},\varvec{\theta }_{DV_l})\) and \(k=1,2,\ldots , n\). For ease of exposition we state general conditions that need to be satisfied by \(q_{lk}(\cdot ,\varvec{\theta }), \ \tilde{q}_{lk}(\cdot ,\varvec{\theta }), \ Q_{ln}(\cdot ,\varvec{\theta })\) and \(\varvec{\theta }\) for every model m.
Let \((\Omega ,\mathcal {F},P)\) be a complete probability space and \(\Theta \) be a compact subset of \(\mathbb {R}^{d}\) with \(d\in \mathbb {N}\). For all \(n\in \mathbb {N}\) let \(Q_{ln}:\Omega \times \Theta \rightarrow \mathbb {R}\) be such that:
(i) \(\forall \varvec{\theta }\in \Theta , \ Q_{ln}(\cdot ,\varvec{\theta })\) is \(\mathcal {F}\)-measurable.
(ii) \(\forall \omega \in A\in \mathcal {F}\) with \(P(A)=1, \ Q_{ln}(\omega ,\cdot )\) is continuously differentiable on \(\Theta \).
(iii) The expectation \(E(Q_{ln}(\cdot ,\varvec{\theta }))\) exists and defines a function which is continuously differentiable on \(\Theta \) and \(\bigtriangledown E(Q_{ln}(\cdot ,\varvec{\theta }))=E(\bigtriangledown Q_{ln}(\cdot ,\varvec{\theta }))\) where \(\bigtriangledown \) is the gradient operator.
(iv) The least false parameter defined by \(\varvec{\theta }_{0n}=\arg \sup _{\varvec{\theta }\in \Theta }\frac{1}{n}E(Q_{ln}(\cdot ,\varvec{\theta }))\) is interior to \(\Theta \) uniformly (in n).
(v) Given \(\epsilon >0\) there exists \(N_0(\epsilon )<\infty \) and \(\delta (\epsilon )>0\) such that \(\inf \{\min \{K_n^{*}(\varvec{\theta }):\varvec{\theta }\in \mathcal {N}_n^{*}(\epsilon )^{c}\},n>N_0(\epsilon )\} \equiv \delta (\epsilon )\), where \(K_n^{*}(\varvec{\theta })\equiv n^{-1}E(Q_{ln}(\cdot ,\varvec{\theta }_{0n}))-n^{-1}E(Q_{ln}(\cdot ,\varvec{\theta })), \ \mathcal {N}_n^{*}(\epsilon )^{c}\) is the compact complement of \(\mathcal {N}_n^{*}(\epsilon ) \equiv \mathcal {S}^{*}_n(\epsilon )\cap \Theta \) in \(\Theta \) and \(\mathcal {S}^{*}_n(\epsilon )\) is an open sphere centered at \(\varvec{\theta }_{0n}\) with fixed radius \(\epsilon \).
(vi) For P-almost all \(\omega , \ q_{lk}(\omega ,\cdot )\) is twice continuously differentiable as a function of \(\varvec{\theta }\), for \(k=1,2,\ldots \)
(vii) \(q_{lk}\) and \(\tilde{q}_{lk}\) satisfy a uniform weak law of large numbers (UWLLN) on \(\Theta \).
(viii) Each element of \(\bigtriangledown q_{lk}(\cdot ,\varvec{\theta }_{0n})\) satisfies a central limit theorem.
(ix) \(\exists \epsilon , \alpha >0\) such that for P-almost all \(\omega \) and for all n sufficiently large and for all \(\varvec{\theta } \in \mathcal {N}_n^{*}(\epsilon ), \det (n^{-1}\bigtriangledown ^{2}Q_{ln}(\omega ,\varvec{\theta }))\ge \alpha \), with \(\mathcal {N}^{*}_n(\epsilon )\) as in Assumption v.
(x) For all n sufficiently large and for all \(\varvec{\theta } \in \mathcal {N}_n^{*}(\epsilon ), E[n^{-1}\bigtriangledown ^2Q_{ln}(\cdot ,\varvec{\theta })]\) is \(\varvec{O}(1)\).
(xi) Each element of \(\bigtriangledown ^2 q_{ln}\) satisfies a UWLLN on \(\mathcal {N}^{*}_n(\epsilon )\).
We assume that the copula densities are such that the above conditions are satisfied. These are basic assumptions that guarantee that \(\hat{\varvec{\theta }}_n-\varvec{\theta }_{0n}=\varvec{O}_p(n^{-1/2})\) and \(Q_n(\cdot ,\hat{\varvec{\theta }}_n)-Q_n(\cdot ,\varvec{\theta }_{0n})=\varvec{O}_p(1)\). The asymptotic normality of \(\sqrt{n}(\hat{\varvec{\theta }}_n-\varvec{\theta }_{0n})\) for the models we consider has been shown in Hobæk Haff (2013).
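As a hedged sketch of how the second rate can be read off from the first (a standard argument, not a reproduction of the proof in Sin and White 1996), assume that \(\hat{\varvec{\theta }}_n\) is an interior maximizer of \(Q_n\), so that \(\bigtriangledown Q_n(\cdot ,\hat{\varvec{\theta }}_n)=\varvec{0}\) with probability tending to one, and let \(\bar{\varvec{\theta }}_n\) denote an intermediate point on the segment between \(\hat{\varvec{\theta }}_n\) and \(\varvec{\theta }_{0n}\) (both the interior-maximizer assumption and the symbol \(\bar{\varvec{\theta }}_n\) are introduced here for illustration only). A second-order Taylor expansion around \(\hat{\varvec{\theta }}_n\) then gives

\[
Q_n(\cdot ,\hat{\varvec{\theta }}_n)-Q_n(\cdot ,\varvec{\theta }_{0n})
= -\frac{n}{2}\,(\hat{\varvec{\theta }}_n-\varvec{\theta }_{0n})^{\top }
\left[ n^{-1}\bigtriangledown ^{2}Q_n(\cdot ,\bar{\varvec{\theta }}_n)\right]
(\hat{\varvec{\theta }}_n-\varvec{\theta }_{0n})
= n\cdot \varvec{O}_p(n^{-1/2})\cdot \varvec{O}_p(1)\cdot \varvec{O}_p(n^{-1/2})
= \varvec{O}_p(1).
\]

The middle factor is bounded in probability because of Assumptions x and xi, and the two outer factors are controlled by the \(\sqrt{n}\)-rate of \(\hat{\varvec{\theta }}_n\).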
1.1 Penalty conditions in Lemma 4 for the penalty in cDAG-IC
Proof
Define \({\varDelta }\widehat{\hbox {pen}}_{\mathrm{cDAG}} = \widehat{\hbox {pen}}_{\mathrm{cDAG}}^1(n,\hat{\varvec{\theta }}^1)- \widehat{\hbox {pen}}_{\mathrm{cDAG}}^2(n,\hat{\varvec{\theta }}^2)\). For (i) it holds that
The first equality holds due to Assumption vii.
For (ii) and (iii) it follows that
By the assumed positiveness of the penalty difference, the conditions hold. \(\square \)
Definition of ‘d-separation’ between \(\mathcal {X}\) and \(\mathcal {Y}\) by \(\mathcal {Z}\) (Barber 2012). For every node \(x \in \mathcal {X}\) and \(y \in \mathcal {Y}\), check every path \(\mathcal {U}\) between x and y (that is, a sequence of distinct adjacent nodes that starts in x and ends in y, where edges may be traversed regardless of the direction of the arrows). A path \(\mathcal {U}\) is blocked if there is a node w in \(\mathcal {U}\) such that either: (i) w is a collider on \(\mathcal {U}\) (both edges of the path at w are arrows pointing into w) and neither w nor any of its descendants is in \(\mathcal {Z}\), or (ii) w is not a collider on \(\mathcal {U}\) and w is in \(\mathcal {Z}\). If all such paths are blocked then the sets of nodes \(\mathcal {X}\) and \(\mathcal {Y}\) are d-separated by \(\mathcal {Z}\). If the sets of nodes \(\mathcal {X}\) and \(\mathcal {Y}\) are d-separated by \(\mathcal {Z}\), they are independent conditional on \(\mathcal {Z}\).
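The definition can be checked mechanically on small graphs. The following Python sketch is only an illustration and is not part of the authors' software: the DAG representation (a dictionary mapping each node to the set of its children) and all function names are assumptions made here. It enumerates every path between x and y in the skeleton of the graph, that is, ignoring arrow directions as in Barber (2012), and applies conditions (i) and (ii) to each interior node. It assumes that \(\mathcal {X}\), \(\mathcal {Y}\) and \(\mathcal {Z}\) are disjoint.

```python
from itertools import product


def descendants(dag, node):
    """Nodes reachable from `node` by following arrows (node itself excluded)."""
    seen, stack = set(), list(dag.get(node, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(dag.get(v, ()))
    return seen


def simple_paths(dag, start, goal):
    """Yield every simple path from start to goal, ignoring arrow direction."""
    nbrs = {}
    for u, children in dag.items():
        for v in children:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)

    def extend(path):
        if path[-1] == goal:
            yield path
            return
        for v in nbrs.get(path[-1], ()):
            if v not in path:
                yield from extend(path + [v])

    yield from extend([start])


def is_blocked(dag, path, Z):
    """Apply conditions (i) and (ii) to each interior node w of the path."""
    for prev, w, nxt in zip(path, path[1:], path[2:]):
        collider = w in dag.get(prev, ()) and w in dag.get(nxt, ())
        if collider:
            # (i): collider with neither w nor any descendant of w in Z
            if w not in Z and not (descendants(dag, w) & Z):
                return True
        elif w in Z:
            # (ii): non-collider that belongs to Z
            return True
    return False


def d_separated(dag, X, Y, Z):
    """True if every path between a node of X and a node of Y is blocked by Z."""
    Z = set(Z)
    return all(
        is_blocked(dag, path, Z)
        for x, y in product(X, Y)
        for path in simple_paths(dag, x, y)
    )


# Hypothetical example graph: a -> c <- b, c -> d
dag = {"a": {"c"}, "b": {"c"}, "c": {"d"}, "d": set()}
print(d_separated(dag, {"a"}, {"b"}, set()))   # True: the collider c blocks the path
print(d_separated(dag, {"a"}, {"b"}, {"d"}))   # False: d is a descendant of c
```

Path enumeration is exponential in the worst case, so this direct transcription is meant for small examples only; larger graphs would call for a reachability-based algorithm.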
Cite this article
Pircalabelu, E., Claeskens, G. & Gijbels, I. Copula directed acyclic graphs. Stat Comput 27, 55–78 (2017). https://doi.org/10.1007/s11222-015-9599-9
DOI: https://doi.org/10.1007/s11222-015-9599-9