Causal Discovery from Medical Data: Dealing with Missing Values and a Mixture of Discrete and Continuous Data
Causal discovery is an increasingly popular method for data analysis in the field of medical research. In this paper we consider two challenges in causal discovery that occur very often when working with medical data: a mixture of discrete and continuous variables and a substantial amount of missing values. To the best of our knowledge there are no methods that can handle both challenges at the same time. In this paper we develop a new method that can handle these challenges based on the assumption that data is missing completely at random and that variables obey a non-paranormal distribution. We demonstrate the validity of our approach for causal discovery for empiric data from a monetary incentive delay task. Our results may help to better understand the etiology of attention deficit-hyperactivity disorder (ADHD).
KeywordsCausal discovery Missing data Mixture of discrete and continuous data ADHD
Unable to display preview. Download preview PDF.
- 1.Abegaz, F., Wit, E.: Penalized EM algorithm and copula skeptic graphical models for inferring networks for mixed variables. Statistics in Medicine (2014)Google Scholar
- 2.Bach, F.R., Jordan, M.I.: Learning graphical models with Mercer kernels. In: Proceedings of the NIPS Conference, pp. 1009–1016 (2002)Google Scholar
- 3.Claassen, T., Heskes, T.: A Bayesian approach to constraint based causal inference. In: Proceedings of the UAI Conference, pp. 207–216 (2012)Google Scholar
- 4.Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pp. 1–38 (1977)Google Scholar
- 6.Friedman, N.: The bayesian structural EM algorithm. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp. 129–138. Morgan Kaufmann Publishers Inc. (1998)Google Scholar
- 8.Monti, S., Cooper, G.F.: Learning hybrid Bayesian networks from data. Technical Report ISSP-97-01, Intelligent Systems Program, University of Pittsburgh (1997)Google Scholar
- 9.Riggelsen, C., Feelders, A.: Learning bayesian network models from incomplete data using importance sampling. In: Proc. of Artificial Intelligence and Statistics, pp. 301–308 (2005)Google Scholar
- 10.Sokolova, E., Groot, P., Claassen, T., Heskes, T.: Causal discovery from databases with discrete and continuous variables. In: van der Gaag, L.C., Feelders, A.J. (eds.) PGM 2014. LNCS, vol. 8754, pp. 442–457. Springer, Heidelberg (2014)Google Scholar
- 11.von Rhein, D., Mennes, M., van Ewijk, H., Groenman, A.P., Zwiers, M.P., Oosterlaan, J., Heslenfeld, D., Franke, B., Hoekstra, P.J., Faraone, S.V.: et al. The NeuroIMAGE study: a prospective phenotypic, cognitive, genetic and MRI study in children with attention-deficit/hyperactivity disorder. Design and descriptives. European Child & Adolescent Psychiatry, 1–17 (2014)Google Scholar
- 12.Wang, H., Fazayeli, F., Chatterjee, S., Banerjee, A., Steinhauser, K., Ganguly, A., Bhattacharjee, K., Konar, A., Nagar, A.: Gaussian copula precision estimation with missing values. Biotechnology Journal 4(9) (2009)Google Scholar