Learning Functional Causal Models with Generative Neural Networks

  • Olivier Goudet
  • Diviyan Kalainathan
  • Philippe Caillou
  • Isabelle Guyon
  • David Lopez-Paz
  • Michèle Sebag
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


We introduce a new approach to functional causal modeling from observational data, called Causal Generative Neural Networks (CGNN). CGNN uses neural networks to learn a generative model of the joint distribution of the observed variables by minimizing the Maximum Mean Discrepancy between generated and observed data. An approximate learning criterion is proposed so that the computational cost of the approach scales linearly with the number of observations. The performance of CGNN is studied in three experiments. First, CGNN is applied to cause-effect inference, where the task is to identify the best causal hypothesis out of “X → Y” and “Y → X”. Second, CGNN is applied to the problem of identifying v-structures and conditional independences. Third, CGNN is applied to multivariate functional causal modeling: given a skeleton describing the direct dependences in a set of random variables X = [X1, …, Xd], CGNN orients the edges in the skeleton to uncover the directed acyclic graph describing the causal structure of the random variables. On all three tasks, CGNN is extensively assessed on both artificial and real-world data and compares favorably with the state of the art. Finally, CGNN is extended to handle confounders, where latent variables are involved in the overall causal model.
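To make the training criterion concrete, the following is a minimal sketch of an empirical Maximum Mean Discrepancy between an observed and a generated sample. It is an illustration under stated assumptions, not the chapter's implementation: the Gaussian kernel with a single fixed bandwidth, the biased quadratic-cost estimator, and the names `gaussian_kernel` and `mmd2` are all choices made here. It does not implement the linear-cost approximation mentioned above.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(observed, generated, bandwidth=1.0):
    """Biased empirical estimate of the squared MMD (assumed estimator)."""
    k_oo = gaussian_kernel(observed, observed, bandwidth)
    k_gg = gaussian_kernel(generated, generated, bandwidth)
    k_og = gaussian_kernel(observed, generated, bandwidth)
    return k_oo.mean() + k_gg.mean() - 2.0 * k_og.mean()


# Hypothetical usage: a sample from the same distribution should score
# lower than a shifted one.
rng = np.random.default_rng(0)
observed = rng.normal(size=(200, 2))
print(mmd2(observed, rng.normal(size=(200, 2))))
print(mmd2(observed, rng.normal(size=(200, 2)) + 3.0))
```

In a generative setting, a quantity of this kind would serve as the loss that the generator's parameters are tuned to reduce; building the full kernel matrices is what makes this version quadratic in the number of observations.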


Generative neural networks · Causal structure discovery · Cause-effect pair problem · Functional causal models · Structural equation models



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Olivier Goudet (1), corresponding author
  • Diviyan Kalainathan (1)
  • Philippe Caillou (1)
  • Isabelle Guyon (2, 3)
  • David Lopez-Paz (4)
  • Michèle Sebag (1)
  1. Team TAU - CNRS, INRIA, Université Paris Sud, Université Paris Saclay, Paris, France
  2. INRIA, Université Paris Sud, Université Paris Saclay, Paris, France
  3. ChaLearn, Berkeley, USA
  4. Facebook AI Research, Menlo Park, USA
