
Causal Learning with Occam’s Razor

Published in Studia Logica.

Abstract

Occam’s razor directs us to adopt the simplest hypothesis consistent with the evidence. Learning theory provides a precise definition of the inductive simplicity of a hypothesis for a given learning problem. This definition specifies a learning method that implements an inductive version of Occam’s razor. As a case study, we apply Occam’s inductive razor to causal learning. We consider two causal learning problems: learning a causal graph structure that presents global causal connections among a set of domain variables, and learning context-sensitive causal relationships that hold not globally, but only relative to a context. For causal graph learning, Occam’s inductive razor directs us to adopt the model that explains the observed correlations with a minimum number of direct causal connections. For expanding a causal graph structure to include context-sensitive relationships, Occam’s inductive razor directs us to adopt the expansion that explains the observed correlations with a minimum number of free parameters. This is equivalent to explaining the correlations with a minimum number of probabilistic logical rules. The paper provides a gentle introduction to the learning-theoretic definition of inductive simplicity and the application of Occam’s razor for causal learning.
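The parameter-counting criterion mentioned in the abstract can be made concrete. The following sketch (illustrative only, not from the paper; variable names and the small example network are hypothetical) counts the free parameters of a discrete Bayesian network structure using the standard formula: each node with r values and q parent configurations contributes (r − 1) · q parameters to its conditional probability table.

```python
def free_parameters(domain_sizes, parents):
    """Number of free parameters of a discrete Bayesian network structure.

    domain_sizes: dict mapping each variable to its number of values.
    parents: dict mapping each variable to a list of its parents.
    Each node X with r values and q parent configurations contributes
    (r - 1) * q free parameters.
    """
    total = 0
    for var, r in domain_sizes.items():
        q = 1  # number of joint parent configurations
        for p in parents.get(var, []):
            q *= domain_sizes[p]
        total += (r - 1) * q
    return total

# Hypothetical example over three binary variables.
sizes = {"A": 2, "B": 2, "C": 2}

# The chain A -> B -> C: 1 + 2 + 2 = 5 free parameters.
chain = {"A": [], "B": ["A"], "C": ["B"]}
print(free_parameters(sizes, chain))   # 5

# The complete DAG A -> B, A -> C, B -> C: 1 + 2 + 4 = 7 free parameters.
full = {"A": [], "B": ["A"], "C": ["A", "B"]}
print(free_parameters(sizes, full))    # 7
```

On this count, a structure with fewer direct causal connections generally has fewer free parameters, which is the sense in which the edge-minimizing and parameter-minimizing versions of the razor agree for graph structures.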



Acknowledgements

This research was supported by an NSERC discovery grant to the author. Preliminary results were presented at the Center for Formal Epistemology at Carnegie Mellon University. The author is grateful to the audience at the Center for helpful comments.

Author information

Corresponding author: Oliver Schulte.

Additional information

Presented by Jacek Malinowski


About this article


Cite this article

Schulte, O. Causal Learning with Occam’s Razor. Stud Logica 107, 991–1023 (2019). https://doi.org/10.1007/s11225-018-9829-1
