Abstract
We examine the Bayesian approach to the discovery of causal DAG models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov condition, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures—both quantitative and qualitative—can be made. Three, information from several models can be combined to make better inferences and to better account for modeling uncertainty. In addition to describing the general Bayesian approach to causal discovery, we review approximation methods for missing data and hidden variables, and illustrate differences between the Bayesian and constraint-based methods using artificial and real examples.
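The contrast drawn above, a categorical decision about independence versus a graded Bayesian weighting of structures, can be made concrete with a small sketch. The following Python fragment is illustrative only and is not taken from the chapter: it assumes two binary variables with simulated data and symmetric Dirichlet parameter priors (a BDeu-style score), and compares the binary verdict of a chi-square independence test against the posterior probability of the dependent structure X → Y under equal structure priors.

```python
# Minimal sketch (not the chapter's code): categorical test vs. Bayesian score.
# Data, priors, and thresholds below are hypothetical choices for illustration.
import math
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Simulate N samples in which Y weakly depends on X.
N = 200
x = rng.integers(0, 2, size=N)
y = np.where(rng.random(N) < 0.6, x, rng.integers(0, 2, size=N))

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of multinomial counts under a symmetric
    Dirichlet(alpha) prior, using the closed-form Dirichlet-multinomial."""
    counts = np.asarray(counts, dtype=float)
    a0 = alpha * counts.size
    return (math.lgamma(a0) - math.lgamma(a0 + counts.sum())
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))

# 2x2 contingency table of joint counts.
table = np.array([[np.sum((x == i) & (y == j)) for j in (0, 1)] for i in (0, 1)])

# Constraint-based style: a categorical accept/reject decision at a threshold.
_, p_value, _, _ = chi2_contingency(table, correction=False)
print("chi-square p-value:", p_value, "-> declare independent?", p_value > 0.05)

# Bayesian style: weigh the structures by marginal likelihood.
# m0: X and Y independent; m1: X -> Y (Y's distribution varies with X).
log_m0 = log_marginal(table.sum(axis=1)) + log_marginal(table.sum(axis=0))
log_m1 = log_marginal(table.sum(axis=1)) + sum(log_marginal(row) for row in table)
posterior_m1 = 1.0 / (1.0 + math.exp(log_m0 - log_m1))  # equal structure priors
print("P(X -> Y | data):", posterior_m1)
```

The point of the sketch is the final line: rather than a yes/no independence verdict at a fixed threshold, the Bayesian computation assigns each structure a weight, which is what permits the finer distinctions among models and the model averaging described in the abstract.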
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this chapter
Heckerman, D., Meek, C., Cooper, G. (2006). A Bayesian Approach to Causal Discovery. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Machine Learning. Studies in Fuzziness and Soft Computing, vol 194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33486-6_1
DOI: https://doi.org/10.1007/3-540-33486-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30609-2
Online ISBN: 978-3-540-33486-6