Structural learning of causal networks

He, Yangbo; Jia, Jinzhu; Geng, Zhi

doi:10.1007/s41237-017-0018-8

Invited Paper
Published: 14 February 2017

Behaviormetrika, volume 44, pages 287–305 (2017)

Abstract

Causal network models are popular statistical tools for representing dependencies or causal relationships among variables in complex systems. Structural learning of causal networks is crucial for discovering causal knowledge and inferring causal effects. In this paper, we discuss structural learning of two types of graphical models: undirected graphs and directed acyclic graphs. We first introduce methods for learning undirected graphical models and then discuss structural learning of directed acyclic graphs. We focus on the model space of causal networks, decomposition learning of structures from observational data, local structural learning approaches, and active learning for the optimal design of interventions.

Acknowledgements

We would like to thank the Editor and Reviewers for their valuable comments and suggestions. This research was supported by the 863 Program of China (2015AA020507), the 973 Program of China (2015CB856000) and NSFC (11331011, 11671020). The authors would also like to thank Dr. Lan Liu for valuable discussions.

Author information

Authors and Affiliations

LMAM, School of Mathematical Sciences, Center for Statistical Science, Peking University, Beijing, 100871, China
Yangbo He, Jinzhu Jia & Zhi Geng

Corresponding author

Correspondence to Zhi Geng.

Additional information

Communicated by Brandon Malone.

Y. He and J. Jia have contributed equally.

About this article

Cite this article

He, Y., Jia, J. & Geng, Z. Structural learning of causal networks. Behaviormetrika 44, 287–305 (2017). https://doi.org/10.1007/s41237-017-0018-8

Received: 29 September 2016
Accepted: 19 January 2017
Published: 14 February 2017
Issue Date: January 2017
DOI: https://doi.org/10.1007/s41237-017-0018-8
