A Bayesian Approach to Causal Discovery

  • Chapter

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 194))

Abstract

We examine the Bayesian approach to the discovery of causal DAG models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov condition, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures—both quantitative and qualitative—can be made. Three, information from several models can be combined to make better inferences and to better account for modeling uncertainty. In addition to describing the general Bayesian approach to causal discovery, we review approximation methods for missing data and hidden variables, and illustrate differences between the Bayesian and constraint-based methods using artificial and real examples.
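The "weighing" of independence constraints that the abstract contrasts with categorical constraint-based decisions can be made concrete with a marginal-likelihood score. The sketch below is an illustration only, not the chapter's code: it computes the Cooper-Herskovits Bayesian-Dirichlet score with a uniform prior for two candidate structures over binary variables, then converts the scores into a graded posterior. All function and variable names (`bd_score`, `m_dep`, `p_dep`, the toy data) are our own.

```python
from math import lgamma, exp
from itertools import product

def bd_score(data, parents):
    """Log marginal likelihood log P(D | G) under the Cooper-Herskovits
    Bayesian-Dirichlet metric with a uniform prior (alpha_ijk = 1),
    for binary variables.

    data:    list of dicts mapping variable name -> 0/1
    parents: dict mapping each variable to its list of parents in G
    """
    total = 0.0
    for var, pa in parents.items():
        for cfg in product([0, 1], repeat=len(pa)):  # each parent configuration
            counts = [0, 0]
            for row in data:
                if tuple(row[p] for p in pa) == cfg:
                    counts[row[var]] += 1
            n_ij = sum(counts)
            # Gamma(2) / Gamma(2 + N_ij) * prod_k Gamma(1 + N_ijk) / Gamma(1)
            total += lgamma(2) - lgamma(2 + n_ij)
            total += sum(lgamma(1 + c) for c in counts)
    return total

# Toy data: Y nearly always copies X (16 agreeing rows, 2 disagreeing).
data = [{"X": x, "Y": x} for x in (0, 1)] * 8 + [{"X": 0, "Y": 1},
                                                 {"X": 1, "Y": 0}]

m_dep = bd_score(data, {"X": [], "Y": ["X"]})   # structure X -> Y
m_indep = bd_score(data, {"X": [], "Y": []})    # structure with no edge

# With equal structure priors, the posterior of X -> Y is graded:
p_dep = 1 / (1 + exp(m_indep - m_dep))
```

Where a constraint-based method would commit to a yes/no answer from an independence test on this small sample, `p_dep` shifts smoothly with the evidence, and the same posterior weights support the model averaging mentioned as the third advantage.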


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Heckerman, D., Meek, C., Cooper, G. (2006). A Bayesian Approach to Causal Discovery. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Machine Learning. Studies in Fuzziness and Soft Computing, vol 194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33486-6_1

  • DOI: https://doi.org/10.1007/3-540-33486-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30609-2

  • Online ISBN: 978-3-540-33486-6

  • eBook Packages: Engineering (R0)
