Exact estimation of multiple directed acyclic graphs

Statistics and Computing

Abstract

This paper considers structure learning for multiple related directed acyclic graph (DAG) models. Building on recent developments in exact estimation of DAGs using integer linear programming (ILP), we present an ILP approach for joint estimation over multiple DAGs. Unlike previous work, we do not require that the vertices in each DAG share a common ordering. Furthermore, we allow for (potentially unknown) dependency structure between the DAGs. Results are presented on both simulated data and fMRI data obtained from multiple subjects.
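For orientation, the single-DAG integer linear program that this line of work builds on (Jaakkola et al. 2010; Cussens 2011; Bartlett and Cussens 2013) can be sketched as follows. This is the standard "family variable" formulation. It is not the joint multi-DAG program developed in this paper. The notation is introduced here for illustration: a binary variable \(x_{i,S}\) indicates that vertex \(i\) takes candidate parent set \(S \in \mathcal{P}_i\), and \(s_i(S)\) is a decomposable local score.

$$\begin{aligned} \max _{x}\quad & \sum _{i \in V} \sum _{S \in \mathcal {P}_i} s_i(S)\, x_{i,S} \\ \text {s.t.}\quad & \sum _{S \in \mathcal {P}_i} x_{i,S} = 1 && \forall \, i \in V \\ & \sum _{i \in C} \; \sum _{S \in \mathcal {P}_i :\, S \cap C = \emptyset } x_{i,S} \ge 1 && \forall \, C \subseteq V,\ |C| \ge 2 \\ & x_{i,S} \in \{0,1\} \end{aligned}$$

The first constraint selects exactly one parent set per vertex. The cluster constraints require every set \(C\) of at least two vertices to contain some vertex whose chosen parents all lie outside \(C\), which is equivalent to the selected graph being acyclic. Because there are exponentially many clusters, solvers of this kind typically add them lazily as cutting planes (Cussens 2011; Bartlett and Cussens 2013).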

Figs. 1-8

References

  • Achterberg, T.: SCIP: solving constraint integer programs. Math Program Comput 1(1), 1–41 (2009)

  • Bartlett, M., Cussens, J.: Advances in Bayesian network learning using integer programming. In: Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, pp. 182–191 (2013)

  • Berg, J., Järvisalo, M., Malone, B.: Learning optimal bounded treewidth Bayesian networks via maximum satisfiability. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics 33, pp. 86–95 (2014)

  • Chickering, D.M.: Optimal structure identification with greedy search. J. Mach. Learn. Res. 3, 507–554 (2003)

  • Costa, L., Smith, J.Q., Nichols, T., Cussens, J., Duff, E.P., Makin, T.R.: Searching multiregression dynamic models of resting-state fMRI networks using integer programming. Bayesian Anal., to appear (2015)

  • Cowell, R.G.: Efficient maximum likelihood pedigree reconstruction. Theor. Popul. Biol. 76, 285–291 (2009)

  • Cussens, J.: Maximum likelihood pedigree reconstruction using integer programming. In: Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB-10), Edinburgh (2010)

  • Cussens, J.: Bayesian network learning with cutting planes. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, pp. 153–160 (2011)

  • Danaher, P., Wang, P., Witten, D.M.: The joint graphical lasso for inverse covariance estimation across multiple classes. J. R. Stat. Soc. B 76(2), 373–397 (2014)

  • De Campos, C.P., Ji, Q.: Efficient structure learning of Bayesian networks using constraints. J. Mach. Learn. Res. 12, 663–689 (2011)

  • Ellis, B., Wong, W.H.: Learning causal Bayesian network structures from experimental data. J. Am. Stat. Assoc. 103(482), 778–789 (2008)

  • Friedman, N., Koller, D.: Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks. Mach. Learn. 50(1–2), 95–126 (2003)

  • Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)

  • Friston, K.J.: Functional and effective connectivity: a review. Brain Connect. 1(1), 13–36 (2011)

  • He, Y., Jia, J., Yu, B.: Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs. Ann. Stat. 41(4), 1742–1779 (2013)

  • Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)

  • Hill, S., Lu, Y., Molina, J., Heiser, L.M., Spellman, P.T., Speed, T.P., Gray, J.W., Mills, G.B., Mukherjee, S.: Bayesian inference of signaling network topology in a cancer cell line. Bioinformatics 28(21), 2804–2810 (2012)

  • Jaakkola, T., Sontag, D., Globerson, A., Meila, M.: Learning Bayesian network structure using LP relaxations. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pp. 358–365 (2010)

  • Lee, S.Y.: Structural Equation Modeling: A Bayesian Approach. Wiley, New York (2007)

  • Li, J., Wang, Z.J., Palmer, S.J., McKeown, M.J.: Dynamic Bayesian network modeling of fMRI: a comparison of group-analysis methods. Neuroimage 41(2), 398–407 (2008)

  • Loh, P.-L., Wainwright, M.J.: Structure estimation for discrete graphical models: generalized covariance matrices and their inverses. Ann. Stat. 41(6), 3022–3049 (2013)

  • Luis, R., Sucar, L.E., Morales, E.F.: Inductive transfer for learning Bayesian networks. Mach. Learn. 79(1–2), 227–255 (2010)

  • Mahajan, A.: Presolving mixed-integer linear programs. Wiley Encyclopedia of Operations Research and Management Science (2010)

  • Malone, B., Kangas, K., Järvisalo, M., Koivisto, M., Myllymäki, P.: Predicting the hardness of learning Bayesian networks. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence (2014)

  • Mechelli, A., Penny, W.D., Price, C.J., Gitelman, D.R., Friston, K.J.: Effective connectivity and intersubject variability: using a multisubject network to test differences and commonalities. Neuroimage 17(3), 1459–1469 (2002)

  • Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34(3), 1436–1462 (2006)

  • Nemhauser, G.L., Wolsey, L.A.: Integer and Combinatorial Optimization. Wiley, New York (1988)

  • Niculescu-Mizil, A., Caruana, R.: Inductive transfer for Bayesian network structure learning. In: Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, pp. 339–346 (2007)

  • Nie, S., Mauá, D.D., de Campos, C.P., Ji, Q.: Advances in learning Bayesian networks of bounded treewidth. Adv. Neur. In. 27, 2285–2293 (2014)

  • Oates, C.J., Mukherjee, S.: Joint structure learning of multiple non-exchangeable networks. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, pp. 687–695 (2014)

  • Oates, C.J., Korkola, J., Gray, J.W., Mukherjee, S.: Joint estimation of multiple networks from time course data. Ann. Appl. Stat. 8(3), 1892–1919 (2014a)

  • Oates, C.J., Carneiro da Costa, L., Nichols, T.: Towards a multi-subject analysis of neural connectivity. Neural Comput. 27, 151–170 (2015)

  • Oyen, D., Lane, T.: Leveraging domain knowledge in multitask Bayesian network structure learning. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence (2012)

  • Oyen, D., Lane, T.: Bayesian discovery of multiple Bayesian networks via transfer learning. In: Proceedings of the 13th IEEE International Conference on Data Mining, pp. 577–586 (2013)

  • Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE T. Knowl. Data En. 22(10), 1345–1359 (2010)

  • Parviainen, P., Farahani, H.S., Lagergren, J.: Learning bounded tree-width Bayesian networks using integer linear programming. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics 33, pp. 751–759 (2014)

  • Penfold, C.A., Buchanan-Wollaston, V., Denby, K.J., Wild, D.L.: Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks. Bioinformatics 28(12), i233–i241 (2012)

  • Peters, J., Mooij, J.M., Janzing, D., Schölkopf, B.: Identifiability of causal graphs using functional models. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, pp. 589–598 (2011)

  • Peters, J., Bühlmann, P.: Identifiability of Gaussian structural equation models with equal error variances. Biometrika 101, 219–228 (2014)

  • Queen, C.M., Smith, J.Q.: Multiregression dynamic models. J. R. Stat. Soc. B 55(4), 849–870 (1993)

  • Sheehan, N.A., Bartlett, M., Cussens, J.: Improved maximum likelihood reconstruction of complex multi-generational pedigrees. Theor. Popul. Biol. 97, 11–19 (2014)

  • Silander, T., Myllymäki, P.: A simple approach to finding the globally optimal Bayesian network structure. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, pp. 445–452 (2006)

  • Studený, M., Vomlel, J., Hemmecke, R.: A geometric view on learning Bayesian network structures. Int. J. Approx. Reason. 51(5), 578–586 (2010)

  • Studený, M., Haws, D.: On polyhedral approximations of polytopes for learning Bayesian networks. J. Algebraic Stat. 4(1), 59–92 (2013)

  • Sugihara, G., Kaminaga, T., Sugishita, M.: Interindividual uniformity and variety of the “Writing center”: a functional MRI study. Neuroimage 32(4), 1837–1849 (2006)

  • Thiesson, B., Meek, C., Chickering, D.M., Heckerman, D.: Learning mixtures of Bayesian networks. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 504–513 (1998)

  • Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65(1), 31–78 (2006)

  • Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K.: The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79 (2013)

  • Werhli, A.V., Husmeier, D.: Gene regulatory network reconstruction by Bayesian integration of prior knowledge and/or different experimental conditions. J. Bioinform. Comput. Biol. 6(3), 543–572 (2008)

  • Wolsey, L.A.: Integer Programming. Wiley, New York (1998)

  • Yajima, M., Telesca, D., Ji, Y., Müller, P.: Detecting differential patterns of interaction in molecular pathways. Biostatistics, kxu054 (2014)

  • Yuan, C., Malone, B.: Learning optimal Bayesian networks: a shortest path perspective. J. Artif. Intell. Res. 48, 23–65 (2013)

Acknowledgments

The authors are grateful to Dr. Ricardo Silva and two anonymous reviewers, whose feedback helped to improve the paper. CJO was supported by the Centre for Research in Statistical Methodology (CRiSM) EPSRC EP/D002060/1. JC was supported by the Medical Research Council (Project Grant G1002312). SM was supported by the UK Medical Research Council and is a recipient of a Royal Society Wolfson Research Merit Award. The authors are grateful to Lilia Carneiro da Costa and Tom Nichols, who collaborated in the analysis of fMRI data, and to Mark Bartlett, who provided technical support with GOBNILP. The authors also thank Diane Oyen and several other colleagues who provided feedback on an earlier draft.

Author information

Corresponding author

Correspondence to Chris J. Oates.

Electronic supplementary material

Supplementary material 1 (pdf 357 KB)

Appendix: Multiregression dynamic models

Multiregression dynamic models (MDMs) are a generalisation of Bayesian networks (BNs) that model time series data. Unlike BNs, MDMs are fully identifiable (i.e. the score equivalence classes are singletons). An MDM is a model for a multivariate time series that aims to identify the conditional independence structure among the variables over time (Queen and Smith 1993). In the MDM that we consider, the observable series \(\varvec{Y}_{1:P}^{(k)}(n)\) for subject k at time n is characterised by a contemporaneous DAG \(G^{(k)}\). Information is shared across time only through the evolution of the model parameters \(\varvec{\theta }_i^{(k)}(n)\), \(i = 1, \ldots , P\). Formally, the model is described by the following observation equations (14) and system equation (15):

$$Y_i^{(k)}(n) = \mathbf {Y}_{G_i^{(k)}}^{(k)}(n)^T \varvec{\theta }_i^{(k)}(n) + \epsilon _i^{(k)}(n) \tag{14}$$
$$\varvec{\theta }^{(k)}(n) = \varvec{\varGamma }^{(k)}(n)\, \varvec{\theta }^{(k)}(n-1) + \mathbf {w}^{(k)}(n) \tag{15}$$

where \(\varvec{\theta }^{(k)}(n)^T = (\varvec{\theta }_1^{(k)}(n)^T, \ldots , \varvec{\theta }_P^{(k)}(n)^T)\) is the concatenated parameter vector. The disturbance terms \(\epsilon _i^{(k)}(n) \sim N(0,V_i^{(k)}(n))\) and \(\mathbf {w}^{(k)}(n)\sim N (\mathbf {0},\mathbf {W}^{(k)}(n))\) are normally distributed, and the hyperparameters \(V_i^{(k)}(n)\), \(\varvec{\varGamma }^{(k)}(n)\) and \(\mathbf {W}^{(k)}(n)\) must be specified. The MDM can be viewed as a collection of nested univariate dynamic linear models, one for each variable \(Y_i^{(k)}\) regressed on its parents \(\mathbf {Y}_{G_i^{(k)}}^{(k)}\). Its parameters can therefore be estimated using Kalman filter recurrences over time (full details in the supplementary text).
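To make the Kalman filter step concrete, the sketch below forward-filters a single node of the model. This is one univariate dynamic regression of \(Y_i^{(k)}(n)\) on its parents, with known hyperparameters. It rests on several assumptions made for illustration:

* \(\varvec{\varGamma }^{(k)}\) and \(\mathbf {W}^{(k)}\) are block-diagonal across nodes and constant over time, so each node can be filtered on its own. Their node-i blocks appear as `G` and `W` in the code.
* \(V_i\) is a known constant.
* The initial parameters have a Gaussian prior \(N(\mathbf {m}_0, \mathbf {C}_0)\).

The function `mdm_node_filter` and all variable names are hypothetical. The accumulated one-step-ahead predictive log-density is shown as one plausible per-node quantity, not as the paper's exact scoring procedure.

```python
import numpy as np


def mdm_node_filter(y, F, G, W, V, m0, C0):
    """Kalman forward filter for one node of an MDM (hypothetical helper).

    y      : (T,) observations Y_i(n) for a single node i
    F      : (T, p) parent values Y_{G_i}(n), one row per time step
    G      : (p, p) system matrix for theta_i (assumed time-invariant block of Gamma)
    W      : (p, p) system noise covariance (assumed time-invariant block of W)
    V      : observation noise variance V_i (assumed known and constant)
    m0, C0 : prior mean and covariance of theta_i(0)
    Returns the filtered means of theta_i(n), shape (T, p), and the summed
    one-step-ahead predictive log-density of y.
    """
    m = np.asarray(m0, dtype=float)
    C = np.asarray(C0, dtype=float)
    means, loglik = [], 0.0
    for t in range(len(y)):
        # Predict: propagate the parameters through the system equation (15).
        a = G @ m
        R = G @ C @ G.T + W
        # One-step forecast of y[t] from the observation equation (14).
        f = F[t] @ a
        Q = F[t] @ R @ F[t] + V
        e = y[t] - f
        loglik += -0.5 * (np.log(2.0 * np.pi * Q) + e**2 / Q)
        # Update: condition theta_i(n) on the new observation.
        A = R @ F[t] / Q  # Kalman gain
        m = a + A * e
        C = R - np.outer(A, A) * Q
        means.append(m)
    return np.array(means), float(loglik)


# Usage on simulated data (hypothetical values). A static regression is the
# special case G = I, W = 0, where filtering reduces to recursive Bayesian
# linear regression.
rng = np.random.default_rng(0)
T, p = 200, 2
theta_true = np.array([2.0, -1.0])
F = rng.normal(size=(T, p))  # parent series Y_{G_i}(n)
y = F @ theta_true + rng.normal(scale=0.3, size=T)
means, loglik = mdm_node_filter(
    y, F, G=np.eye(p), W=np.zeros((p, p)), V=0.09,
    m0=np.zeros(p), C0=10.0 * np.eye(p),
)
print(means[-1], loglik)
```

With `W = 0` and `G = I`, the filter reproduces the batch posterior of a static Bayesian regression, which makes a convenient check. A nonzero `W` lets the regression coefficients drift over time, as the system equation (15) allows.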

About this article

Cite this article

Oates, C.J., Smith, J.Q., Mukherjee, S. et al. Exact estimation of multiple directed acyclic graphs. Stat Comput 26, 797–811 (2016). https://doi.org/10.1007/s11222-015-9570-9

  • DOI: https://doi.org/10.1007/s11222-015-9570-9
