Causal Discovery from Databases with Discrete and Continuous Variables

  • Elena Sokolova
  • Perry Groot
  • Tom Claassen
  • Tom Heskes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8754)


Bayesian Constraint-based Causal Discovery (BCCD) is a state-of-the-art method for robust causal discovery in the presence of latent variables. It combines probabilistic estimation of Bayesian networks over subsets of variables with a causal logic to infer causal statements. Currently, BCCD is limited to discrete or Gaussian variables. Most real-world data sets, however, contain a mixture of discrete and continuous variables. Here we extend BCCD to handle combinations of discrete and continuous variables, under the assumption that the relations between the variables are monotonic. To this end, we propose a novel method for the efficient computation of BIC scores for hybrid Bayesian networks. We demonstrate the accuracy and efficiency of our approach for causal discovery on simulated data as well as on real-world data from the ADHD-200 competition.
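To give a rough feel for how a BIC score could be computed when discrete and continuous variables are mixed and their relations are monotonic, here is a minimal Python sketch. It is not the method proposed in the paper, whose details the abstract does not give. It assumes, purely for illustration, that a Spearman rank correlation stands in for the ordinary correlation in a Gaussian BIC score, and that each variable is standardized to unit variance. All function names (`average_ranks`, `spearman`, `gaussian_bic`, `local_bic_scores`) and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gaussian_bic(residual_variance, n_samples, n_params):
    """BIC = maximized Gaussian log-likelihood - (k / 2) * log(n)."""
    loglik = -0.5 * n_samples * (math.log(2 * math.pi * residual_variance) + 1)
    return loglik - 0.5 * n_params * math.log(n_samples)


def local_bic_scores(child, parent):
    """Score 'child has no parent' against 'child has this parent'.

    Assumption (illustrative only): the child is standardized, so its
    marginal variance is 1 and its residual variance given one parent
    is 1 - rho^2, with rho the Spearman rank correlation.
    """
    n = len(child)
    rho = spearman(child, parent)
    residual = max(1.0 - rho ** 2, 1e-12)  # guard against log(0)
    no_parent = gaussian_bic(1.0, n, 1)         # variance only
    with_parent = gaussian_bic(residual, n, 2)  # one coefficient + variance
    return no_parent, with_parent


# Toy data: a discrete parent with three levels and a continuous child
# that increases monotonically with it.
parent = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2]
child = [0.1, 0.4, 1.2, 0.9, 4.0, 3.5, 0.3, 1.1, 3.8, 4.4]
no_parent, with_parent = local_bic_scores(child, parent)
print("BIC without parent:", no_parent)
print("BIC with parent:   ", with_parent)
```

On this toy data the score with the parent is the higher of the two, so the edge would be preferred. Working on ranks means the score depends only on the ordering of the values, which is why such a sketch can treat discrete and continuous variables alike when the relation is monotonic.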


Causal discovery, hybrid data, structure learning





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Elena Sokolova (1)
  • Perry Groot (1)
  • Tom Claassen (1)
  • Tom Heskes (1)
  1. Faculty of Science, Radboud University, Nijmegen, The Netherlands
