A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships

Data Mining and Knowledge Discovery

Abstract

This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.
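The abstract does not spell out which statistical tests the method uses, so the following is only a minimal sketch of how a constraint-based check of this kind might be programmed. It assumes binary (0/1) variables, a variable W that background knowledge says is not caused by X or Y, no selection bias or hidden confounding of the tested pair, and fixed 0.05 chi-square critical values. All function names and the toy data are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter

# Assumed significance thresholds (0.05 level); not taken from the paper.
CHI2_CRIT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom
CHI2_CRIT_DF2 = 5.991  # chi-square critical value, 2 degrees of freedom


def chi2_2x2(pairs):
    """Pearson chi-square statistic for a list of (a, b) pairs of 0/1 values."""
    n = len(pairs)
    if n == 0:
        return 0.0
    counts = Counter(pairs)
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            row = counts[(a, 0)] + counts[(a, 1)]
            col = counts[(0, b)] + counts[(1, b)]
            expected = row * col / n
            if expected > 0:
                stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat


def dependent(data, i, j):
    """True if columns i and j look marginally dependent."""
    return chi2_2x2([(r[i], r[j]) for r in data]) > CHI2_CRIT_DF1


def independent_given(data, i, j, k):
    """True if columns i and j look independent within both strata of column k."""
    stat = sum(
        chi2_2x2([(r[i], r[j]) for r in data if r[k] == v]) for v in (0, 1)
    )
    return stat <= CHI2_CRIT_DF2


def suggests_x_causes_y(data, w, x, y):
    """Constraint pattern: W~X dependent, X~Y dependent, W independent of Y given X.

    Under the assumptions stated above, this pattern is consistent with
    X causing Y; it is a heuristic sketch, not a proof of causation.
    """
    return (
        dependent(data, w, x)
        and dependent(data, x, y)
        and independent_given(data, w, y, x)
    )


def chain_example():
    """Toy rows (w, x, y) whose counts follow a chain W -> X -> Y exactly."""
    counts = {
        (1, 1, 1): 320, (1, 1, 0): 80, (1, 0, 1): 20, (1, 0, 0): 80,
        (0, 1, 1): 80, (0, 1, 0): 20, (0, 0, 1): 80, (0, 0, 0): 320,
    }
    return [row for row, c in counts.items() for _ in range(c)]


if __name__ == "__main__":
    rows = chain_example()
    print(suggests_x_causes_y(rows, w=0, x=1, y=2))  # columns in chain order
    print(suggests_x_causes_y(rows, w=0, x=2, y=1))  # roles of X and Y swapped
```

On the toy data, the check passes when the columns are given in chain order and fails when the roles of X and Y are swapped, because W and X remain dependent after conditioning on Y. Real databases would need tests suited to their variable types and sample sizes.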

Cite this article

Cooper, G.F. A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships. Data Mining and Knowledge Discovery 1, 203–224 (1997). https://doi.org/10.1023/A:1009787925236

  • DOI: https://doi.org/10.1023/A:1009787925236
