Improving deep learning performance with missing values via deletion and compensation

  • Adrián Sánchez-Morales
  • José-Luis Sancho-Gómez
  • Juan-Antonio Martínez-García
  • Aníbal R. Figueiras-Vidal


Missing values in a dataset are one of the most common difficulties in real applications. Many machine learning techniques have been proposed in the literature to address this problem. In this work, the strong representation capability of stacked denoising auto-encoders is used to obtain a new method for imputing missing values based on two ideas: deletion and compensation. The method improves imputation performance by artificially deleting values in the input features and using them as targets in the training process. However, although the deletion of samples is shown to be highly efficient, it may cause an imbalance between the distributions of the training and test sets. To solve this issue, a compensation mechanism is proposed, based on a slight modification of the error function to be optimized. Experiments on several datasets show that deletion and compensation improve not only imputation but also classification in comparison with other classical techniques.
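The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the modified error function, so the per-entry weight `deleted_weight` is a hypothetical stand-in for the compensation term, and the function names, the zero fill for removed entries, and the deletion rate are all assumptions made for illustration. No auto-encoder is trained here; the corrupted input itself stands in for a network output.

```python
import numpy as np


def delete_values(x, observed, rate, rng):
    """Artificially delete a fraction `rate` of the observed entries.

    Returns the corrupted input (missing and deleted entries set to 0,
    an assumed fill value) and a boolean mask of the deleted entries.
    """
    deleted = observed & (rng.random(x.shape) < rate)
    x_in = np.where(observed & ~deleted, x, 0.0)
    return x_in, deleted


def weighted_mse(pred, target, observed, deleted, deleted_weight=1.0):
    """Mean squared error over entries whose true value is known.

    Artificially deleted entries are scaled by `deleted_weight`.
    This weighting is a hypothetical compensation, not the paper's formula.
    """
    weights = np.where(deleted, deleted_weight, 1.0) * observed
    sq_err = np.where(observed, (pred - target) ** 2, 0.0)
    return float(np.sum(weights * sq_err) / np.sum(weights))


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))              # toy data matrix
observed = rng.random(x.shape) > 0.2     # False marks genuinely missing values

# Deleted entries are hidden from the input but kept as training targets.
x_in, deleted = delete_values(x, observed, rate=0.3, rng=rng)

# Stand-in "prediction": the corrupted input itself.
loss = weighted_mse(x_in, x, observed, deleted, deleted_weight=0.5)
```

In this sketch, lowering `deleted_weight` reduces the influence of the artificially deleted entries on the loss, which is one plausible way to offset the shift that deletion introduces between training and test inputs.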


Missing values, Imputation, Classification, Deep learning



The work of A. R. Figueiras-Vidal has been partly supported by Grant Macro-ADOBE (TEC 2015-67719-P, MINECO/FEDER&FSE). The work of J.L. Sancho-Gómez has been partly supported by Grant AES 2017 (PI17/00771, MINECO/FEDER).



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Departamento de Tecnologías de la Información y las Comunicaciones, Universidad Politécnica de Cartagena, Cartagena, Spain
  2. Departamento de Teoría de la Señal y Comunicaciones, Universidad Carlos III de Madrid, Leganés, Spain
