Machine Learning, Volume 83, Issue 3, pp 355–419

Non-homogeneous dynamic Bayesian networks for continuous data

  • Marco Grzegorczyk
  • Dirk Husmeier
Article

Abstract

Classical dynamic Bayesian networks (DBNs) are based on the homogeneous Markov assumption and cannot deal with non-homogeneous temporal processes. Various approaches to relax the homogeneity assumption have recently been proposed. This paper combines a Bayesian network with conditional probabilities in the linear Gaussian family and a Bayesian multiple changepoint process, where the number and locations of the changepoints are sampled from the posterior distribution with MCMC. Our work improves on an earlier conference paper in four respects: it contains a comprehensive and self-contained exposition of the methodology; it discusses the problem of spurious feedback loops in network reconstruction; it contains a comprehensive comparative evaluation of the network reconstruction accuracy on a set of synthetic and real-world benchmark problems, based on a novel discrete changepoint process; and it suggests new and improved MCMC schemes for sampling both the network structures and the changepoint configurations from the posterior distribution. The latter study compares RJMCMC, based on changepoint birth and death moves, with two dynamic programming schemes that were originally devised for Bayesian mixture models. We demonstrate the modifications that have to be made to allow for changing network structures, and the critical impact that the prior distribution on changepoint configurations has on the overall computational complexity.

Keywords

Dynamic Bayesian networks; Non-homogeneity; Multiple changepoint process; Reversible jump Markov chain Monte Carlo (RJMCMC); Dynamic programming; Receiver operator characteristics (ROC) curve; Precision-recall (PR) curve; Gene regulatory network; Circadian regulation; Arabidopsis

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Department of Statistics, TU Dortmund University, Dortmund, Germany
  2. Biomathematics and Statistics Scotland (BioSS), JCMB, Edinburgh, UK
