Abstract
Causal network models are popular statistical tools for representing dependencies or causal relationships among variables in complex systems. Structural learning of causal networks is crucial for discovering causal knowledge and inferring causal effects. In this paper, we discuss structural learning of two types of graphical models: undirected graphs and directed acyclic graphs. We first introduce methods for learning undirected graphical models. We then discuss structural learning of directed acyclic graphs, focusing on the model space of causal networks, decomposition learning of structures from observational data, local structural learning approaches, and active learning for the optimal design of interventions.
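As a minimal sketch of what learning an undirected graphical model involves, the example below assumes multivariate Gaussian data and uses a classical covariance-selection style test: it inverts the covariance matrix, converts the precision matrix into partial correlations, and keeps the edge i - j when a Fisher z-test rejects zero partial correlation given all other variables. The function names are hypothetical, and this is not presented as the authors' method; penalized estimators such as the graphical lasso are the usual replacement when the number of variables is large relative to the sample size. The sketch assumes n > p + 1 so that the test statistic is defined.

```python
import math
import numpy as np


def normal_two_sided_pvalue(z):
    # P(|Z| > |z|) for a standard normal Z, via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2.0))


def gaussian_ug_edges(cov, n, alpha=0.01):
    """Hypothetical helper: edge set of an undirected Gaussian graphical model.

    cov   : p x p covariance (or correlation) matrix estimated from n samples
    n     : sample size behind cov (assumed n > p + 1)
    alpha : significance level used for each pairwise test
    """
    cov = np.asarray(cov, dtype=float)
    p = cov.shape[0]
    omega = np.linalg.inv(cov)              # precision (concentration) matrix
    d = np.sqrt(np.diag(omega))
    partial = -omega / np.outer(d, d)       # off-diagonal entries are partial correlations
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            # clip to keep the Fisher z-transform finite
            r = float(np.clip(partial[i, j], -0.999999, 0.999999))
            # Fisher z-statistic; the conditioning set has p - 2 variables
            z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - (p - 2) - 3)
            if normal_two_sided_pvalue(z) < alpha:
                edges.add((i, j))
    return edges


# Population covariance of the linear chain X0 -> X1 -> X2 with unit-variance noise
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.8, 0.0]])
A = np.linalg.inv(np.eye(3) - B)
print(gaussian_ug_edges(A @ A.T, n=100))
```

For the chain, X0 and X2 are independent given X1, so no X0 - X2 edge is reported. For a collider X0 -> X1 <- X2, the same test also reports X0 - X2, because conditioning on X1 induces dependence; this is one reason an undirected graph alone does not determine the directed acyclic graph.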
Acknowledgements
We would like to thank the Editor and Reviewers for valuable comments and suggestions. This research was supported by the 863 Program of China (2015AA020507), the 973 Program of China (2015CB856000) and NSFC (11331011, 11671020). The authors would like to thank Dr. Lan Liu for valuable discussions.
Additional information
Communicated by Brandon Malone.
Y. He and J. Jia have contributed equally.
About this article
Cite this article
He, Y., Jia, J. & Geng, Z. Structural learning of causal networks. Behaviormetrika 44, 287–305 (2017). https://doi.org/10.1007/s41237-017-0018-8
DOI: https://doi.org/10.1007/s41237-017-0018-8