Bayesian learning of Bayesian networks with informative priors

  • Nicos Angelopoulos
  • James Cussens


This paper presents and evaluates an approach to Bayesian model averaging where the models are Bayesian nets (BNs). A comprehensive study of the literature on structural priors for BNs is conducted. A number of prior distributions are defined using stochastic logic programs, and the Metropolis-Hastings MCMC algorithm is used to (approximately) sample from the posterior. We use proposals that are tightly coupled to the priors, which gives rise to cheaply computable acceptance probabilities. Experiments using data generated from known BNs have been conducted to evaluate the method. The experiments used six different BNs and varied the structural prior, the parameter prior, the Metropolis-Hastings proposal and the data size. Each experiment was repeated three times with different random seeds to test the robustness of the MCMC-produced results. Our results show that, with effective priors, (i) robust results are produced and (ii) informative priors improve results significantly.
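To illustrate why coupling the proposal to the prior can make acceptance probabilities cheap, the following is a minimal sketch of the simplest such case: the proposal draws a fresh model directly from the prior, so the prior and proposal terms cancel in the Metropolis-Hastings ratio and only a ratio of marginal likelihoods remains. This is not the authors' stochastic-logic-program proposal. The two-structure model space, the Beta(1, 1) parameter prior, and every function name and default value below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def log_ml_binary(counts, alpha=1.0):
    """Log marginal likelihood of binary counts (n0, n1) under a Beta(alpha, alpha) prior."""
    n0, n1 = counts
    return (math.lgamma(2 * alpha) - math.lgamma(2 * alpha + n0 + n1)
            + math.lgamma(alpha + n0) - math.lgamma(alpha)
            + math.lgamma(alpha + n1) - math.lgamma(alpha))


def log_marginal_likelihood(model, data):
    """Score a toy two-variable structure: 'empty' (X, Y independent) or 'edge' (X -> Y)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    score = log_ml_binary((xs.count(0), xs.count(1)))
    if model == "empty":
        return score + log_ml_binary((ys.count(0), ys.count(1)))
    # 'edge': one Beta-Bernoulli family for Y per value of its parent X
    for xv in (0, 1):
        y_given_x = [y for x, y in data if x == xv]
        score += log_ml_binary((y_given_x.count(0), y_given_x.count(1)))
    return score


def sample_prior(rng, p_edge=0.2):
    """Hypothetical structural prior: an edge is present with probability p_edge."""
    return "edge" if rng.random() < p_edge else "empty"


def mh_with_prior_proposal(data, n_iter=5000, seed=0):
    """Metropolis-Hastings over structures, proposing each new structure from the prior."""
    rng = random.Random(seed)
    current = sample_prior(rng)
    current_ll = log_marginal_likelihood(current, data)
    visits = Counter()
    for _ in range(n_iter):
        proposal = sample_prior(rng)
        proposal_ll = log_marginal_likelihood(proposal, data)
        # Prior and proposal terms cancel, leaving min(1, likelihood ratio).
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        visits[current] += 1
    # Visit frequencies approximate the posterior over structures.
    return {model: count / n_iter for model, count in visits.items()}
```

Calling `mh_with_prior_proposal` on a list of `(x, y)` pairs of 0/1 values returns approximate posterior probabilities for the two structures. In a realistic BN space a pure prior proposal would mix poorly, which is one reason more local, prior-coupled proposals are of interest.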


Keywords

Prior knowledge, Bayesian inference, Bayesian model averaging, Markov chain Monte Carlo, Loss functions, Stochastic logic programs

Mathematics Subject Classifications (2000)

68T05 68T27 




Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. School of Biological Sciences, University of Edinburgh, Edinburgh, UK
  2. Department of Computer Science & York Centre for Complex Systems Analysis, University of York, York, UK
