MIDA: Multiple Imputation Using Denoising Autoencoders

  • Lovedeep Gondara
  • Ke Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10939)


Missing data is a significant problem impacting all domains. The state-of-the-art framework for minimizing missing data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions, and distributions. Evaluation on several real-life datasets shows that our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end-of-the-line analytics.
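To make the idea of multiple imputation with an overcomplete denoising autoencoder more concrete, the following is a minimal NumPy sketch rather than the model proposed here. Everything specific in it is an illustrative assumption: a single tanh hidden layer that is wider than the input, a 50% input corruption rate, mean filling as the starting point for missing entries, plain full-batch gradient descent, and the use of different random seeds to obtain the several completed datasets. The function names (`train_dae`, `reconstruct`, `multiply_impute`) and the parameter `extra_units` are hypothetical.

```python
import numpy as np


def train_dae(X, hidden, rng, epochs=200, lr=0.01, drop=0.5):
    """Fit a one-hidden-layer denoising autoencoder by full-batch gradient descent."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        keep = rng.random(X.shape) > drop   # randomly zero out about half the inputs
        Xc = X * keep                       # corrupted copy fed to the encoder
        H = np.tanh(Xc @ W1 + b1)           # hidden code (wider than the input)
        Y = H @ W2 + b2                     # reconstruction of the clean input
        G = 2.0 * (Y - X) / n               # gradient of the squared error w.r.t. Y
        GH = (G @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (Xc.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2


def reconstruct(X, params):
    """Run uncorrupted data through the trained autoencoder."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


def multiply_impute(X_missing, m=5, extra_units=7, seed=0):
    """Return m completed copies of X_missing (NaN marks a missing entry)."""
    miss = np.isnan(X_missing)
    col_mean = np.nanmean(X_missing, axis=0)
    X0 = np.where(miss, col_mean, X_missing)      # assumed starting fill: column means
    completed = []
    for i in range(m):
        rng = np.random.default_rng(seed + i)     # a fresh seed per imputation
        params = train_dae(X0, X0.shape[1] + extra_units, rng)
        recon = reconstruct(X0, params)
        # Keep observed values untouched; only missing cells take model output.
        completed.append(np.where(miss, recon, X_missing))
    return completed


# Small synthetic demo: 60 rows, 4 roughly standardized columns, about 20% missing.
rng = np.random.default_rng(42)
X_full = rng.normal(size=(60, 4))
X_full[:, 3] = X_full[:, 0] + 0.1 * rng.normal(size=60)
X_obs = X_full.copy()
X_obs[rng.random(X_obs.shape) < 0.2] = np.nan
imputations = multiply_impute(X_obs, m=5)
```

Each element of `imputations` is a completed copy of `X_obs`. In a multiple imputation workflow the downstream analysis would be run once per copy and the results pooled. The sketch assumes numeric, roughly standardized columns; categorical variables would need encoding first.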



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computing Science, Simon Fraser University, Burnaby, Canada
