# A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships

## Abstract

This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.
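
The abstract does not spell out the tests involved, so the following is only a minimal sketch of the kind of reasoning a constraint-based method can perform, not the paper's own procedure. It assumes three binary variables W, X, and Y, and it assumes as background knowledge that W is not caused by any other measured variable. Under those assumptions, if W and X are dependent, X and Y are dependent, and W and Y are independent given X, the data are compatible with X causing Y. All function names, the fixed 0.05 significance level, and the example counts are illustrative assumptions.

```python
from collections import Counter

# Chi-square critical values at the 0.05 level for 1 and 2 degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991}


def chi_square_2x2(pairs):
    """Pearson chi-square statistic for a list of (a, b) pairs of 0/1 values."""
    n = len(pairs)
    counts = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = row[a] * col[b] / n if n else 0.0
            if expected > 0:
                stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat


def dependent(rows, i, j):
    """True if columns i and j are judged dependent at the 0.05 level."""
    return chi_square_2x2([(r[i], r[j]) for r in rows]) > CRITICAL_05[1]


def independent_given(rows, i, j, k):
    """True if independence of columns i and j given binary column k is not rejected.

    The statistic sums the 2x2 chi-square over both strata of k (2 degrees
    of freedom). Failing to reject independence is weaker than proving it.
    """
    stat = sum(
        chi_square_2x2([(r[i], r[j]) for r in rows if r[k] == value])
        for value in (0, 1)
    )
    return stat <= CRITICAL_05[2]


def suggests_x_causes_y(rows, w, x, y):
    """Check the three constraints; w is ASSUMED to have no measured cause."""
    return (
        dependent(rows, w, x)
        and dependent(rows, x, y)
        and independent_given(rows, w, y, x)
    )


def chain_example_rows():
    """Hypothetical counts laid out as (W, X, Y), consistent with W -> X -> Y."""
    counts = {
        (1, 1, 1): 120, (1, 1, 0): 40, (1, 0, 1): 10, (1, 0, 0): 30,
        (0, 1, 1): 30, (0, 1, 0): 10, (0, 0, 1): 40, (0, 0, 0): 120,
    }
    return [row for row, n in counts.items() for _ in range(n)]


if __name__ == "__main__":
    rows = chain_example_rows()
    print(suggests_x_causes_y(rows, w=0, x=1, y=2))
    print(suggests_x_causes_y(rows, w=0, x=2, y=1))
```

In the example data, W and Y are dependent overall but become independent once X is held fixed, so the check succeeds for the roles (W, X, Y) and fails when X and Y are swapped. A real application would need a significance test suited to the variables' actual types and sample sizes.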
