A Bayesian Approach to Causal Discovery

  • Chapter

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 194))

Abstract

We examine the Bayesian approach to the discovery of causal DAG models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov condition, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures—both quantitative and qualitative—can be made. Three, information from several models can be combined to make better inferences and to better account for modeling uncertainty. In addition to describing the general Bayesian approach to causal discovery, we review approximation methods for missing data and hidden variables, and illustrate differences between the Bayesian and constraint-based methods using artificial and real examples.
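The "weighing" of independence constraints that the abstract contrasts with categorical constraint-based decisions can be made concrete with a marginal-likelihood score. The sketch below is an illustration only, not the chapter's code: it computes the Cooper-Herskovits Bayesian-Dirichlet score with a uniform prior for two candidate structures over binary variables, then converts the scores into a graded posterior. All function and variable names (`bd_score`, `m_dep`, `p_dep`, the toy data) are our own.

```python
from math import lgamma, exp
from itertools import product

def bd_score(data, parents):
    """Log marginal likelihood log P(D | G) under the Cooper-Herskovits
    Bayesian-Dirichlet metric with a uniform prior (alpha_ijk = 1),
    for binary variables.

    data:    list of dicts mapping variable name -> 0/1
    parents: dict mapping each variable to its list of parents in G
    """
    total = 0.0
    for var, pa in parents.items():
        for cfg in product([0, 1], repeat=len(pa)):  # each parent configuration
            counts = [0, 0]
            for row in data:
                if tuple(row[p] for p in pa) == cfg:
                    counts[row[var]] += 1
            n_ij = sum(counts)
            # Gamma(2) / Gamma(2 + N_ij) * prod_k Gamma(1 + N_ijk) / Gamma(1)
            total += lgamma(2) - lgamma(2 + n_ij)
            total += sum(lgamma(1 + c) for c in counts)
    return total

# Toy data: Y nearly always copies X (16 agreeing rows, 2 disagreeing).
data = [{"X": x, "Y": x} for x in (0, 1)] * 8 + [{"X": 0, "Y": 1},
                                                 {"X": 1, "Y": 0}]

m_dep = bd_score(data, {"X": [], "Y": ["X"]})   # structure X -> Y
m_indep = bd_score(data, {"X": [], "Y": []})    # structure with no edge

# With equal structure priors, the posterior of X -> Y is graded:
p_dep = 1 / (1 + exp(m_indep - m_dep))
```

Where a constraint-based method would commit to a yes/no answer from an independence test on this small sample, `p_dep` shifts smoothly with the evidence, and the same posterior weights support the model averaging mentioned as the third advantage.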


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Heckerman, D., Meek, C., Cooper, G. (2006). A Bayesian Approach to Causal Discovery. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Machine Learning. Studies in Fuzziness and Soft Computing, vol 194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33486-6_1

  • DOI: https://doi.org/10.1007/3-540-33486-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30609-2

  • Online ISBN: 978-3-540-33486-6

  • eBook Packages: Engineering (R0)
