A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships

Cooper, Gregory F.

doi:10.1023/A:1009787925236

Published: June 1997

Volume 1, pages 203–224 (1997)

Data Mining and Knowledge Discovery

Abstract

This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.
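The abstract does not spell out the test itself, so the following is only a minimal sketch of how independence constraints of this kind might be checked, not the paper's own procedure. It assumes three binary variables W, X and Y, where background knowledge says W has no causes among the measured variables, and it treats the pattern "W and X dependent, X and Y dependent, W and Y independent given X" as support for "X causes Y" under the usual constraint-based assumptions (for example, no selection bias and independencies that reflect causal structure). The function names, the Pearson chi-square test, the 0.05 critical values and the toy table are all illustrative assumptions.

```python
from collections import Counter

# Chi-square critical values at the 0.05 level, keyed by degrees of freedom.
# (Assumed significance level; binary variables only.)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991}


def chi2_stat(pairs):
    """Pearson chi-square statistic for a list of (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def dependent(data, i, j):
    """True if binary columns i and j look marginally dependent (df = 1)."""
    return chi2_stat([(r[i], r[j]) for r in data]) > CHI2_CRIT_05[1]


def independent_given(data, i, j, k):
    """True if columns i and j look independent given binary column k.

    The statistic is summed over the two strata of column k (df = 2).
    """
    total = 0.0
    for value in (0, 1):
        total += chi2_stat([(r[i], r[j]) for r in data if r[k] == value])
    return total <= CHI2_CRIT_05[2]


def supports_x_causes_y(data, w, x, y):
    """Check the assumed three-variable pattern for columns w, x, y."""
    return (dependent(data, w, x)
            and dependent(data, x, y)
            and independent_given(data, w, y, x))


def chain_data():
    """Deterministic toy table for W -> X -> Y (hypothetical data).

    Each variable copies its parent in 80% of the rows; 100 rows in total.
    """
    rows = []
    for w in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                count = 2 * (4 if x == w else 1) * (4 if y == x else 1)
                rows.extend([(w, x, y)] * count)
    return rows


data = chain_data()
print(supports_x_causes_y(data, 0, 1, 2))  # pattern checked for "X causes Y"
print(supports_x_causes_y(data, 0, 2, 1))  # same check with X and Y swapped
```

On real data the test statistic, the significance level and the handling of sparse cells would all need more care than this sketch gives them; a statistics library would normally replace the hand-written chi-square code.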

Author information

Authors and Affiliations

Center for Biomedical Informatics, University of Pittsburgh, Suite 8084, Forbes Tower, Pittsburgh, PA, 15213
Gregory F. Cooper

About this article

Cite this article

Cooper, G.F. A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships. Data Mining and Knowledge Discovery 1, 203–224 (1997). https://doi.org/10.1023/A:1009787925236

Issue Date: June 1997
DOI: https://doi.org/10.1023/A:1009787925236
