Causal Discovery from Medical Data: Dealing with Missing Values and a Mixture of Discrete and Continuous Data

  • Elena Sokolova
  • Perry Groot
  • Tom Claassen
  • Daniel von Rhein
  • Jan Buitelaar
  • Tom Heskes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9105)


Causal discovery is an increasingly popular method for data analysis in medical research. We consider two challenges that arise frequently when applying it to medical data: a mixture of discrete and continuous variables, and a substantial number of missing values. To the best of our knowledge, no existing method handles both challenges at the same time. We develop a new method that does, based on the assumptions that data are missing completely at random and that the variables follow a nonparanormal distribution. We demonstrate the validity of our approach on empirical data from a monetary incentive delay task. Our results may help to better understand the etiology of attention-deficit/hyperactivity disorder (ADHD).
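The two assumptions named in the abstract can be illustrated with a small sketch. It is not taken from the paper and is not necessarily the authors' procedure; it only shows one standard way the assumptions are used together. Under a nonparanormal (Gaussian copula) model, the correlation of the latent Gaussian variables can be recovered from Spearman's rank correlation as 2 sin(π/6 · ρ), and rank correlations work for ordinal as well as continuous variables. When values are missing completely at random, each pairwise correlation can be estimated from the rows where both variables are observed. All function names and the example data below are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_spearman(x, y):
    """Spearman's rho using only rows where both values are observed.

    Missing values are encoded as None (an assumption of this sketch).
    Dropping incomplete pairs relies on the MCAR assumption.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


def latent_correlation(x, y):
    """Map Spearman's rho to the latent Gaussian correlation."""
    return 2.0 * math.sin(math.pi / 6.0 * pairwise_spearman(x, y))


# Hypothetical data: an ordinal score and a continuous measurement.
severity = [1, 2, None, 3, 2, 1, 3, None]
reaction_time = [0.42, 0.51, 0.47, None, 0.55, 0.40, 0.66, 0.52]
print(latent_correlation(severity, reaction_time))
```

A correlation matrix assembled pair by pair in this way is not guaranteed to be positive semidefinite, so a practical pipeline may need a correction step before the matrix is passed to a structure-learning algorithm.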


Causal discovery; Missing data; Mixture of discrete and continuous data; ADHD





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Elena Sokolova (1)
  • Perry Groot (1)
  • Tom Claassen (1)
  • Daniel von Rhein (2)
  • Jan Buitelaar (2)
  • Tom Heskes (1)

  1. Faculty of Science, Radboud University, Nijmegen, The Netherlands
  2. Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
