Structural learning of causal networks

  • Invited Paper
Behaviormetrika

Abstract

Causal network models are popular statistical tools for representing dependencies or causal relationships among variables in complex systems. Structural learning of causal networks is crucial for discovering causal knowledge and inferring causal effects. In this paper, we discuss structural learning of two types of graphical models: undirected graphs and directed acyclic graphs. We first introduce methods for learning undirected graphical models. We then discuss structural learning of directed acyclic graphs, focusing on the model space of causal networks, decomposition learning of structures from observational data, local structural learning approaches, and active learning for the optimal design of interventions.


Acknowledgements

We would like to thank the Editor and Reviewers for their valuable comments and suggestions. This research was supported by the 863 Program of China (2015AA020507), the 973 Program of China (2015CB856000), and NSFC (11331011, 11671020). The authors would also like to thank Dr. Lan Liu for valuable discussions.

Author information

Corresponding author

Correspondence to Zhi Geng.

Additional information

Communicated by Brandon Malone.

Y. He and J. Jia have contributed equally.

About this article

Cite this article

He, Y., Jia, J. & Geng, Z. Structural learning of causal networks. Behaviormetrika 44, 287–305 (2017). https://doi.org/10.1007/s41237-017-0018-8

  • DOI: https://doi.org/10.1007/s41237-017-0018-8
