From Dependency to Causality: A Machine Learning Approach
The relationship between statistical dependency and causality lies at the heart of all statistical approaches to causal inference. Recent results from the ChaLearn cause-effect pair challenge have shown that data-driven approaches can infer causal directionality with good accuracy even in Markov-indistinguishable configurations. This paper proposes a supervised machine learning approach to infer the existence of a directed causal link between two variables in multivariate settings with n > 2 variables. The approach relies on the asymmetry of some conditional (in)dependence relations between the members of the Markov blankets of two causally connected variables. Our results show that supervised learning methods may be successfully used to extract causal information from asymmetric statistical descriptors for distributions of n > 2 variables as well.
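The kind of asymmetry the abstract refers to can be illustrated with a small simulation. The sketch below is not the authors' method or the D2C package. It assumes a hypothetical linear Gaussian chain `c_i -> z_i -> z_j <- c_j` and uses partial correlation as a stand-in for the dependency descriptors. All variable and function names are illustrative assumptions. By d-separation, a cause `c_j` of the effect is marginally independent of `z_i` but becomes dependent once we condition on `z_j`, whereas a cause `c_i` of the cause is marginally dependent on `z_j` but independent given `z_i`.

```python
import numpy as np


def partial_corr(a, b, c):
    """Correlation between a and b after linearly adjusting for c."""
    r_ab = np.corrcoef(a, b)[0, 1]
    r_ac = np.corrcoef(a, c)[0, 1]
    r_bc = np.corrcoef(b, c)[0, 1]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def simulate(n=20000, seed=0):
    """Draw samples from the assumed toy graph c_i -> z_i -> z_j <- c_j."""
    rng = np.random.default_rng(seed)
    c_i = rng.normal(size=n)                # cause of the cause
    c_j = rng.normal(size=n)                # another cause of the effect
    z_i = c_i + rng.normal(size=n)          # putative cause
    z_j = z_i + c_j + rng.normal(size=n)    # putative effect (a collider)
    return c_i, z_i, z_j, c_j


def asymmetric_descriptors(c_i, z_i, z_j, c_j):
    """Four illustrative descriptors that differ between the two directions."""
    return {
        "cj_zi_marginal": np.corrcoef(c_j, z_i)[0, 1],
        "cj_zi_given_zj": partial_corr(c_j, z_i, z_j),
        "ci_zj_marginal": np.corrcoef(c_i, z_j)[0, 1],
        "ci_zj_given_zi": partial_corr(c_i, z_j, z_i),
    }


if __name__ == "__main__":
    for name, value in asymmetric_descriptors(*simulate()).items():
        print(f"{name}: {value:+.3f}")
```

In this toy setting the first and last descriptors should be close to zero while the middle two are clearly non-zero. Swapping the roles of `z_i` and `z_j` would not reproduce that pattern, and a supervised classifier could in principle exploit differences of this kind as features.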
Keywords: Causal inference, Information theory, Machine learning
This work was supported by the ARC project “Discovery of the molecular pathways regulating pancreatic beta cell dysfunction and apoptosis in diabetes using functional genomics and bioinformatics”, funded by the Communauté Française de Belgique, and by the BridgeIRIS project, funded by INNOVIRIS, Brussels Region. The authors wish to thank the editor and the anonymous reviewers for their insightful comments and remarks.