Abstract
This paper considers structure learning for multiple related directed acyclic graph (DAG) models. Building on recent developments in exact estimation of DAGs using integer linear programming (ILP), we present an ILP approach for joint estimation over multiple DAGs. Unlike previous work, we do not require that the vertices in each DAG share a common ordering. Furthermore, we allow for (potentially unknown) dependency structure between the DAGs. Results are presented on both simulated data and fMRI data obtained from multiple subjects.
References
Achterberg, T.: SCIP: solving constraint integer programs. Math Program Comput 1(1), 1–41 (2009)
Bartlett, M., Cussens, J.: Advances in Bayesian network learning using integer programming. In: Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, pp. 182–191 (2013)
Berg, J., Järvisalo, M., Malone, B.: Learning optimal bounded treewidth Bayesian networks via maximum satisfiability. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics 33, pp. 86–95 (2014)
Chickering, D.M.: Optimal structure identification with greedy search. J. Mach. Learn. Res. 3, 507–554 (2003)
Costa, L., Smith, J.Q., Nichols, T., Cussens, J., Duff, E.P., Makin, T.R.: Searching multiregression dynamic models of resting-state fMRI networks using integer programming. Bayesian Anal., to appear (2015)
Cowell, R.G.: Efficient maximum likelihood pedigree reconstruction. Theor. Popul. Biol. 76, 285–291 (2009)
Cussens, J.: Maximum likelihood pedigree reconstruction using integer programming. In: Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB-10), Edinburgh (2010)
Cussens, J.: Bayesian network learning with cutting planes. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, pp. 153–160 (2011)
Danaher, P., Wang, P., Witten, D.M.: The joint graphical lasso for inverse covariance estimation across multiple classes. J. R. Stat. Soc. B 76(2), 373–397 (2014)
De Campos, C.P., Ji, Q.: Efficient structure learning of Bayesian networks using constraints. J. Mach. Learn. Res. 12, 663–689 (2011)
Ellis, B., Wong, W.H.: Learning causal Bayesian network structures from experimental data. J. Am. Stat. Assoc. 103(482), 778–789 (2008)
Friedman, N., Koller, D.: Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks. Mach. Learn. 50(1–2), 95–126 (2003)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
Friston, K.J.: Functional and effective connectivity: a review. Brain Connect. 1(1), 13–36 (2011)
He, Y., Jia, J., Yu, B.: Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs. Ann. Stat. 41(4), 1742–1779 (2013)
Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)
Hill, S., Lu, Y., Molina, J., Heiser, L.M., Spellman, P.T., Speed, T.P., Gray, J.W., Mills, G.B., Mukherjee, S.: Bayesian inference of signaling network topology in a cancer cell line. Bioinformatics 28(21), 2804–2810 (2012)
Jaakkola, T., Sontag, D., Globerson, A., Meila, M.: Learning Bayesian network structure using LP relaxations. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pp. 358–365 (2010)
Lee, S.Y.: Structural Equation Modeling: A Bayesian Approach. Wiley, New York (2007)
Li, J., Wang, Z.J., Palmer, S.J., McKeown, M.J.: Dynamic Bayesian network modeling of fMRI: a comparison of group-analysis methods. Neuroimage 41(2), 398–407 (2008)
Loh, P.-L., Wainwright, M.J.: Structure estimation for discrete graphical models: generalized covariance matrices and their inverses. Ann. Stat. 41(6), 3022–3049 (2013)
Luis, R., Sucar, L.E., Morales, E.F.: Inductive transfer for learning Bayesian networks. Mach. Learn. 79(1–2), 227–255 (2010)
Mahajan, A.: Presolving mixed-integer linear programs. Wiley Encyclopedia of Operations Research and Management Science (2010)
Malone, B., Kangas, K., Järvisalo, M., Koivisto, M., Myllymäki, P.: Predicting the hardness of learning Bayesian networks. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence (2014)
Mechelli, A., Penny, W.D., Price, C.J., Gitelman, D.R., Friston, K.J.: Effective connectivity and intersubject variability: using a multisubject network to test differences and commonalities. Neuroimage 17(3), 1459–1469 (2002)
Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34(3), 1436–1462 (2006)
Nemhauser, G.L., Wolsey, L.A.: Integer and Combinatorial Optimization. Wiley, New York (1988)
Niculescu-Mizil, A., Caruana, R.: Inductive transfer for Bayesian network structure learning. In: Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, pp. 339–346 (2007)
Nie, S., Mauá, D.D., de Campos, C.P., Ji, Q.: Advances in learning Bayesian networks of bounded treewidth. Adv. Neur. In. 27, 2285–2293 (2014)
Oates, C.J., Mukherjee, S.: Joint structure learning of multiple non-exchangeable networks. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, pp. 687–695 (2014)
Oates, C.J., Korkola, J., Gray, J.W., Mukherjee, S.: Joint estimation of multiple networks from time course data. Ann. Appl. Stat. 8(3), 1892–1919 (2014a)
Oates, C.J., Carneiro da Costa, L., Nichols, T.: Towards a multi-subject analysis of neural connectivity. Neural Comput. 27, 151–170 (2015)
Oyen, D., Lane, T.: Leveraging domain knowledge in multitask Bayesian network structure learning. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence (2012)
Oyen, D., Lane, T.: Bayesian discovery of multiple Bayesian networks via transfer learning. In: Proceedings of the 13th IEEE International Conference on Data Mining, pp. 577–586 (2013)
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE T. Knowl. Data En. 22(10), 1345–1359 (2010)
Parviainen, P., Farahani, H.S., Lagergren, J.: Learning bounded tree-width Bayesian networks using integer linear programming. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics 33, pp. 751–759 (2014)
Penfold, C.A., Buchanan-Wollaston, V., Denby, K.J., Wild, D.L.: Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks. Bioinformatics 28(12), i233–i241 (2012)
Peters, J., Mooij, J.M., Janzing, D., Schölkopf, B.: Identifiability of causal graphs using functional models. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, pp. 589–598 (2011)
Peters, J., Bühlmann, P.: Identifiability of Gaussian structural equation models with equal error variances. Biometrika 101, 219–228 (2014)
Queen, C.M., Smith, J.Q.: Multiregression dynamic models. J. R. Stat. Soc. B 55(4), 849–870 (1993)
Sheehan, N.A., Bartlett, M., Cussens, J.: Improved maximum likelihood reconstruction of complex multi-generational pedigrees. Theor. Popul. Biol. 97, 11–19 (2014)
Silander, T., Myllymäki, P.: A simple approach to finding the globally optimal Bayesian network structure. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, pp. 445–452 (2006)
Studený, M., Vomlel, J., Hemmecke, R.: A geometric view on learning Bayesian network structures. Int. J. Approx. Reason. 51(5), 578–586 (2010)
Studený, M., Haws, D.: On polyhedral approximations of polytopes for learning Bayesian networks. J. Algebraic Stat. 4(1), 59–92 (2013)
Sugihara, G., Kaminaga, T., Sugishita, M.: Interindividual uniformity and variety of the “Writing center”: a functional MRI study. Neuroimage 32(4), 1837–1849 (2006)
Thiesson, B., Meek, C., Chickering, D.M., Heckerman, D.: Learning mixtures of Bayesian networks. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 504–513 (1998)
Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65(1), 31–78 (2006)
Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K.: The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79 (2013)
Werhli, A.V., Husmeier, D.: Gene regulatory network reconstruction by Bayesian integration of prior knowledge and/or different experimental conditions. J. Bioinform. Comput. Biol. 6(3), 543–572 (2008)
Wolsey, L.A.: Integer Programming. Wiley, New York (1998)
Yajima, M., Telesca, D., Ji, Y., Müller, P.: Detecting differential patterns of interaction in molecular pathways. Biostatistics, kxu054 (2014)
Yuan, C., Malone, B.: Learning optimal Bayesian networks: a shortest path perspective. J. Artif. Intell. Res. 48, 23–65 (2013)
Acknowledgments
The authors are grateful to Dr. Ricardo Silva and two anonymous reviewers, whose feedback helped to improve the paper. CJO was supported by the Centre for Research in Statistical Methodology (CRiSM) EPSRC EP/D002060/1. JC was supported by the Medical Research Council (Project Grant G1002312). SM was supported by the UK Medical Research Council and is a recipient of a Royal Society Wolfson Research Merit Award. The authors are grateful to Lilia Carneiro da Costa and Tom Nichols who collaborated in the analysis of fMRI data and to Mark Bartlett who provided technical support with GOBNILP. The authors also thank Diane Oyen and several other colleagues who provided feedback on an earlier draft.
Appendix: Multiregression dynamical models
MDMs are a generalisation of BNs that model time series data and that, unlike BNs, are fully identifiable (i.e. the score equivalence classes are singletons). The MDM is a model for multivariate time series that aims to identify the conditional independence structure among the variables over time (Queen and Smith 1993). In the MDM that we consider, the multivariate model for the observable series \(\varvec{Y}_{1:P}^{(k)}(n)\) for subject \(k\) at time \(n\) is characterised by a contemporaneous DAG \(G^{(k)}\), with information shared across time only through the evolution of the model parameters \(\varvec{\theta }_{G_i^{(k)}}^{(k)}(n)\). Formally, the model is described by the following observation equations and system equations:
\[ Y_i^{(k)}(n) = \mathbf {F}_i^{(k)}(n)^T \varvec{\theta }_i^{(k)}(n) + \epsilon _i^{(k)}(n), \qquad i = 1, \ldots , P, \]
\[ \varvec{\theta }^{(k)}(n) = \varvec{\varGamma }^{(k)}(n)\, \varvec{\theta }^{(k)}(n-1) + \mathbf {w}^{(k)}(n), \]
where \(\mathbf {F}_i^{(k)}(n)\) is the vector of regressors formed from the contemporaneous values of the parents of node \(i\) in \(G^{(k)}\), and \(\varvec{\theta }^{(k)}(n)^T = (\varvec{\theta }_1^{(k)}(n)^T, \ldots , \varvec{\theta }_P^{(k)}(n)^T)\) is the concatenated parameter vector. Here the disturbance terms \(\epsilon _i^{(k)}(n) \sim N(0,V_i^{(k)}(n))\) and \(\mathbf {w}^{(k)}(n)\sim N (\mathbf {0},\mathbf {W}^{(k)}(n))\) are normally distributed, and the hyper-parameters \(V_i^{(k)}(n)\), \(\varvec{\varGamma }^{(k)}(n)\) and \(\mathbf {W}^{(k)}(n)\) must be specified. The equations of the MDM can be viewed as a collection of nested univariate linear models, so the parameters can be estimated using Kalman filter recurrences over time (full details in the supplementary text).
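To make the Kalman filter recurrences concrete, the sketch below filters a single node's dynamic regression. It is a minimal pure-Python illustration, not the authors' implementation. It assumes that the evolution matrix and system covariance are block-diagonal across nodes, so that each node can be filtered on its own. It also assumes that the hyper-parameters V, Gamma and W are known and constant over time (the source allows them to vary with n). The regressor vectors, called `F` here as an illustrative name, are taken to be already formed from the parents' contemporaneous values. The function names and the returned log predictive density are hypothetical choices made for this example.

```python
import math


# Minimal dense-matrix helpers on lists of lists (pure Python, no NumPy).
def matmul(A, B):
    """Return the matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def kalman_filter(ys, Fs, m0, C0, Gamma, W, V):
    """Filter Y(n) = F(n)^T theta(n) + eps(n), theta(n) = Gamma theta(n-1) + w(n),
    with eps ~ N(0, V) and w ~ N(0, W).

    Assumes V (scalar), Gamma and W (d x d) are known and constant in n.
    ys: observations; Fs: regressor vectors (length-d lists), one per time;
    m0, C0: prior mean (length d) and covariance (d x d) of theta(0).
    Returns the filtered means of theta(n) and the summed one-step-ahead
    log predictive density of the observations.
    """
    m = [[x] for x in m0]  # state mean as a d x 1 column
    C = C0
    means, log_pred = [], 0.0
    for y, F in zip(ys, Fs):
        # Evolve the prior: a = Gamma m, R = Gamma C Gamma^T + W.
        a = matmul(Gamma, m)
        R = matadd(matmul(matmul(Gamma, C), transpose(Gamma)), W)
        F_col = [[f] for f in F]
        # One-step forecast Y(n) ~ N(f, Q) with f = F^T a, Q = F^T R F + V.
        RF = matmul(R, F_col)
        f = matmul(transpose(F_col), a)[0][0]
        Q = matmul(transpose(F_col), RF)[0][0] + V
        # Update with the Kalman gain A = R F / Q (Q is a scalar here).
        e = y - f
        A = [row[0] / Q for row in RF]
        m = [[a[i][0] + A[i] * e] for i in range(len(A))]
        C = [[R[i][j] - A[i] * A[j] * Q for j in range(len(A))]
             for i in range(len(A))]
        log_pred += -0.5 * (math.log(2 * math.pi * Q) + e * e / Q)
        means.append([row[0] for row in m])
    return means, log_pred
```

As a sanity check, setting Gamma to the identity, W to zero and using a diffuse prior reduces the filter to sequential Bayesian linear regression. Under those settings, filtering the observations 1 + 2x on the regressors (1, x) for x = 0, ..., 4 drives the final filtered mean close to (1, 2). Because each observation is univariate, the forecast variance Q is a scalar, so no matrix inversion is needed at any step.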
About this article
Cite this article
Oates, C.J., Smith, J.Q., Mukherjee, S. et al. Exact estimation of multiple directed acyclic graphs. Stat Comput 26, 797–811 (2016). https://doi.org/10.1007/s11222-015-9570-9