From Dependency to Causality: A Machine Learning Approach

  • Gianluca Bontempi
  • Maxime Flauder
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


The relationship between statistical dependency and causality lies at the heart of all statistical approaches to causal inference. Recent results from the ChaLearn cause-effect pair challenge have shown that data-driven approaches can infer causal directionality with good accuracy even in Markov-indistinguishable configurations. This paper proposes a supervised machine learning approach to infer the existence of a directed causal link between two variables in multivariate settings with n > 2 variables. The approach relies on the asymmetry of some conditional (in)dependence relations between the members of the Markov blankets of two causally connected variables. Our results show that supervised learning methods may be successfully used to extract causal information from asymmetric statistical descriptors in distributions over n > 2 variables as well.
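The kind of asymmetry the abstract refers to can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' method or the D2C package: it assumes a linear Gaussian toy model with a link z_i → z_j, one extra cause c_i of z_i and one extra cause c_j of z_j, and it uses partial correlation as a stand-in for a conditional dependence measure. All variable and function names are hypothetical.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing out z.

    Assumption: linear Gaussian data, so a zero partial correlation
    is used here as a proxy for conditional independence.
    """
    Z = np.column_stack([np.ones_like(z), z])
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def simulate(n=20000, seed=0):
    """Toy structure (hypothetical): c_i -> z_i -> z_j <- c_j."""
    rng = np.random.default_rng(seed)
    c_i = rng.normal(size=n)                  # a cause of z_i
    c_j = rng.normal(size=n)                  # a second cause of z_j
    z_i = c_i + rng.normal(size=n)
    z_j = z_i + c_j + rng.normal(size=n)
    return c_i, c_j, z_i, z_j


c_i, c_j, z_i, z_j = simulate()

descriptors = {
    # The cause z_i versus the other cause of its effect:
    # marginally independent, dependent once the common effect z_j is given.
    "corr(z_i, c_j)": float(np.corrcoef(z_i, c_j)[0, 1]),
    "pcorr(z_i, c_j | z_j)": partial_corr(z_i, c_j, z_j),
    # The effect z_j versus a cause of its cause:
    # marginally dependent, independent once z_i is given.
    "corr(z_j, c_i)": float(np.corrcoef(z_j, c_i)[0, 1]),
    "pcorr(z_j, c_i | z_i)": partial_corr(z_j, c_i, z_i),
}

for name, value in descriptors.items():
    print(f"{name:>24s} = {value:+.3f}")
```

Swapping the roles of z_i and z_j changes which of the four quantities are close to zero, so the pattern carries information about the direction of the link. A supervised approach could feed descriptors of this kind, computed on pairs with known direction, to a classifier; the exact descriptors and classifier used in the chapter are not reproduced here.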


Causal inference, Information theory, Machine learning



This work was supported by the ARC project “Discovery of the molecular pathways regulating pancreatic beta cell dysfunction and apoptosis in diabetes using functional genomics and bioinformatics” funded by the Communauté Française de Belgique and the BridgeIRIS project funded by INNOVIRIS, Brussels Region. The authors wish to thank the editor and the anonymous reviewers for their insightful comments and remarks.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Machine Learning Group, Computer Science Department, ULB, Université Libre de Bruxelles, Brussels, Belgium
