
Minds and Machines, Volume 23, Issue 2, pp 227–249

Replacing Causal Faithfulness with Algorithmic Independence of Conditionals

  • Jan Lemeire
  • Dominik Janzing
Article

Abstract

Independence of Conditionals (IC) has recently been proposed as a basic rule for causal structure learning. If a Bayesian network represents the causal structure, its Conditional Probability Distributions (CPDs) should be algorithmically independent. In this paper we compare IC with causal faithfulness (FF), which states that only those conditional independences that are implied by the causal Markov condition hold true. FF is a basic postulate in common approaches to causal structure learning. The common spirit of FF and IC is to reject causal graphs for which the joint distribution looks ‘non-generic’. The difference lies in the notion of genericity: FF sometimes rejects models just because one of the CPDs is simple, for instance if the CPD describes a deterministic relation. IC does not behave in this undesirable way. It only rejects a model when there is a non-generic relation between different CPDs although each CPD looks generic when considered separately. Moreover, it detects relations between CPDs that cannot be captured by conditional independences. IC therefore helps in distinguishing causal graphs that induce the same conditional independences (i.e., they belong to the same Markov equivalence class). The usual justification for FF implicitly assumes a prior that is a probability density on the parameter space. IC can be justified by Solomonoff’s universal prior, which assigns non-zero probability to those points in parameter space that have a finite description. In this way, it favours simple CPDs, and therefore respects Occam’s razor. Since Kolmogorov complexity is uncomputable, IC is not directly applicable in practice. We argue that it is nevertheless helpful, since it has already served as inspiration and justification for novel causal inference algorithms.
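The claim that FF can reject a model merely because one CPD is deterministic can be checked numerically. The sketch below is our own illustration, not an example from the paper: it assumes the chain X → Y → Z over binary variables, with Y = X XOR M and Z = Y XOR N for Bernoulli noise terms M and N (the function names `chain_joint`, `marginal` and `cmi` and the noise levels are hypothetical choices). It computes conditional mutual information exactly from the joint distribution. When P(Y|X) is noisy, the only conditional independence is X ⊥ Z | Y, as the causal Markov condition implies. When P(Y|X) is deterministic (Y = X), Y ⊥ Z | X also holds even though Y and Z are adjacent in the graph, so the distribution is unfaithful to the true causal graph. IC is not implemented here, since it rests on uncomputable Kolmogorov complexity.

```python
from itertools import product
from math import log2


def chain_joint(p_xy, p_yz):
    """Joint P(x, y, z) for the causal chain X -> Y -> Z (binary variables).

    Illustrative parameterization chosen for this sketch, not taken from the paper:
    X ~ Bernoulli(0.5); Y = X XOR M, M ~ Bernoulli(p_xy); Z = Y XOR N, N ~ Bernoulli(p_yz).
    Setting p_xy = 0 makes the CPD P(Y|X) deterministic (Y = X).
    """
    p = {}
    for x, m, n in product((0, 1), repeat=3):
        y = x ^ m
        z = y ^ n
        prob = 0.5 * (p_xy if m else 1 - p_xy) * (p_yz if n else 1 - p_yz)
        p[(x, y, z)] = p.get((x, y, z), 0.0) + prob
    return p


def marginal(p, idx):
    """Marginalize the joint onto the variables at positions idx (a tuple)."""
    m = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + prob
    return m


def cmi(p, a, b, c):
    """Conditional mutual information I(A;B|C) in bits; a, b, c are index tuples."""
    p_abc = marginal(p, a + b + c)
    p_ac = marginal(p, a + c)
    p_bc = marginal(p, b + c)
    p_c = marginal(p, c)  # for c == () this is {(): 1.0}
    total = 0.0
    for key, prob in p_abc.items():
        if prob == 0:
            continue  # zero-probability outcomes contribute nothing
        ka = key[:len(a)]
        kb = key[len(a):len(a) + len(b)]
        kc = key[len(a) + len(b):]
        total += prob * log2(prob * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


X, Y, Z = 0, 1, 2  # positions of the variables in each outcome tuple

for p_xy in (0.2, 0.0):
    p = chain_joint(p_xy, p_yz=0.1)
    print(f"p_xy={p_xy}: I(X;Z|Y)={cmi(p, (X,), (Z,), (Y,)):.3f} "
          f"I(Y;Z|X)={cmi(p, (Y,), (Z,), (X,)):.3f}")
```

With the noisy CPD, I(Y;Z|X) comes out clearly positive; with the deterministic CPD it is (numerically) zero, an extra independence that a faithfulness-based method would read as evidence against the edge Y → Z.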

Keywords

Causality, Causal learning, Bayesian networks, Kolmogorov complexity

Notes

Acknowledgments

We would like to thank the blind reviewers for helping to structure our exposition and make our ideas clear. We would also like to thank Patrik Hoyer for providing us with the example of Sect. "Both FF and IC are Sanity Checks of the Model Class". This work has partially been carried out within the framework of the Prognostics for Optimal Maintenance (POM) project (grant nr. 100031; http://www.pom-sbo.org) which is financially supported by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen).

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. ETRO Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
  2. FMI Department, Interdisciplinary Institute for Broadband Technology (IBBT), Ghent, Belgium
  3. MPI for Intelligent Systems, Tübingen, Germany
