Volume 44, Issue 1, pp 287–305

Structural learning of causal networks

Invited Paper


Causal network models are popular statistical tools for representing dependencies or causal relationships among variables in complex systems. Structural learning of causal networks is crucial for discovering causal knowledge and inferring causal effects. In this paper, we discuss structural learning of two types of graphical models: undirected graphs and directed acyclic graphs. We first introduce methods for learning undirected graphical models and then discuss structural learning of directed acyclic graphs. We focus on the model space of causal networks, decomposition learning of structures from observational data, local structural learning approaches, and active learning for the optimal design of interventions.
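One reason the model space of directed acyclic graphs deserves attention is that observational data alone generally cannot distinguish DAGs that encode the same conditional independencies. A standard characterization is that two DAGs over the same variables are Markov equivalent exactly when they share the same skeleton and the same v-structures. The following is a minimal sketch of that check, not a method taken from this paper; the function names, the edge-list representation, and the three toy graphs are illustrative assumptions.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected adjacencies of a DAG given as (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pas in parents.items():
        # Sorting the parents gives each unordered pair one canonical triple.
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Assumes both edge lists describe DAGs over the same variable set."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Hypothetical toy graphs on three variables.
chain = [("X", "Y"), ("Y", "Z")]      # X -> Y -> Z
fork = [("Y", "X"), ("Y", "Z")]       # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]   # X -> Y <- Z

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain and the fork share a skeleton and contain no v-structure, so they fall in the same equivalence class; the collider has the same skeleton but adds the v-structure at Y, so it does not.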


Causal network · Directed acyclic graph · Discover causes and effects · Structural learning



We would like to thank the Editor and Reviewers for their valuable comments and suggestions. This research was supported by the 863 Program of China (2015AA020507), the 973 Program of China (2015CB856000), and NSFC (11331011, 11671020). The authors would also like to thank Dr. Lan Liu for valuable discussions.



Copyright information

© The Behaviormetric Society 2017

Authors and Affiliations

  1. LMAM, School of Mathematical Sciences, Center for Statistical Science, Peking University, Beijing, China
