Abstract
Occam’s razor directs us to adopt the simplest hypothesis consistent with the evidence. Learning theory provides a precise definition of the inductive simplicity of a hypothesis for a given learning problem. This definition specifies a learning method that implements an inductive version of Occam’s razor. As a case study, we apply Occam’s inductive razor to causal learning. We consider two causal learning problems: learning a causal graph structure that represents global causal connections among a set of domain variables, and learning context-sensitive causal relationships that hold not globally, but only relative to a context. For causal graph learning, Occam’s inductive razor directs us to adopt the model that explains the observed correlations with a minimum number of direct causal connections. For expanding a causal graph structure to include context-sensitive relationships, Occam’s inductive razor directs us to adopt the expansion that explains the observed correlations with a minimum number of free parameters. This is equivalent to explaining the correlations with a minimum number of probabilistic logical rules. The paper provides a gentle introduction to the learning-theoretic definition of inductive simplicity and the application of Occam’s razor to causal learning.
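As a rough illustration of the minimum-edge and minimum-parameter ideas in the abstract, the following sketch counts the free parameters of a discrete Bayesian network using the standard formula (each node contributes one parameter fewer than its cardinality for every joint configuration of its parents). The variable names and the two candidate structures are hypothetical examples, not taken from the paper.

```python
# Counting free parameters of a discrete Bayesian network: a structure
# with fewer direct causal connections typically needs fewer parameters,
# so the minimum-edge and minimum-parameter razors often agree.

from math import prod

def num_free_params(parents, card):
    """Free parameters of a discrete Bayesian network.

    parents: dict mapping each node to the list of its parents.
    card: dict mapping each node to its number of possible values.
    Each node contributes (card[v] - 1) parameters per joint
    configuration of its parents (prod over an empty list is 1).
    """
    return sum(
        (card[v] - 1) * prod(card[p] for p in ps)
        for v, ps in parents.items()
    )

card = {"X": 2, "Y": 2, "Z": 2}  # three binary variables (hypothetical)

# Chain X -> Y -> Z: two direct connections.
chain = {"X": [], "Y": ["X"], "Z": ["Y"]}
# Complete DAG X -> Y, X -> Z, Y -> Z: three direct connections.
complete = {"X": [], "Y": ["X"], "Z": ["X", "Y"]}

print(num_free_params(chain, card))     # 1 + 2 + 2 = 5
print(num_free_params(complete, card))  # 1 + 2 + 4 = 7
```

If both structures explain the observed correlations, the razor described above prefers the chain: fewer direct connections and fewer free parameters. Context-sensitive independencies would reduce the count further by collapsing parent configurations that share a distribution.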
Acknowledgements
This research was supported by an NSERC discovery grant to the author. Preliminary results were presented at the Center for Formal Epistemology at Carnegie Mellon University. The author is grateful to the audience at the Center for helpful comments.
Presented by Jacek Malinowski
Schulte, O. Causal Learning with Occam’s Razor. Stud Logica 107, 991–1023 (2019). https://doi.org/10.1007/s11225-018-9829-1