Improving deep learning performance with missing values via deletion and compensation

  • Adrián Sánchez-Morales
  • José-Luis Sancho-Gómez
  • Juan-Antonio Martínez-García
  • Aníbal R. Figueiras-Vidal
IWINAC 2015

Abstract

Missing values are one of the most common difficulties in real-world datasets, and many machine-learning techniques have been proposed in the literature to address the problem. In this work, the strong representation capability of stacked denoising auto-encoders is exploited to obtain a new method of imputing missing values based on two ideas: deletion and compensation. The method improves imputation performance by artificially deleting values in the input features and using them as additional targets during training. However, although this artificial deletion proves highly effective, it can cause an imbalance between the distributions of the training and test sets. To solve this issue, a compensation mechanism is proposed based on a slight modification of the error function to be optimized. Experiments over several datasets show that deletion and compensation yield improvements not only in imputation but also in classification, in comparison with other classical techniques.
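The deletion-and-compensation idea from the abstract can be sketched in simplified form. The snippet below uses a single-hidden-layer denoising auto-encoder rather than the stacked architecture the paper builds on, and the function name `sdae_impute` and parameters `del_frac` (fraction of observed entries artificially deleted per epoch) and `comp` (per-entry loss weight on deleted entries) are illustrative assumptions; the weighting is one plausible reading of the "slight modification of the error function", not the authors' exact formulation.

```python
import numpy as np

def sdae_impute(X, mask, hidden=8, epochs=200, lr=0.05,
                del_frac=0.2, comp=2.0, seed=0):
    """Impute missing entries of X (mask True = observed) with a
    one-hidden-layer denoising auto-encoder.  During training, observed
    entries are artificially deleted (zeroed at the input) and used as
    reconstruction targets; the compensation weight `comp` up-weights
    their error to counter the induced train/test imbalance."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initial fill: column means over the observed entries.
    col_mean = np.where(mask, X, 0.0).sum(0) / np.maximum(mask.sum(0), 1)
    Xf = np.where(mask, X, col_mean)
    # Small random weights, linear output layer.
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        # Artificial deletion: zero a random subset of observed inputs.
        drop = mask & (rng.random((n, d)) < del_frac)
        Xin = np.where(drop, 0.0, Xf)
        H = np.tanh(Xin @ W1 + b1)
        Y = H @ W2 + b2
        # Per-entry error weights: observed = 1, deleted = comp, missing = 0.
        w = mask.astype(float) + (comp - 1.0) * drop
        err = (Y - Xf) * w / n          # gradient of weighted MSE w.r.t. Y
        gW2 = H.T @ err; gb2 = err.sum(0)
        dH = (err @ W2.T) * (1.0 - H**2)
        gW1 = Xin.T @ dH; gb1 = dH.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    # Impute: keep observed values, take network outputs at missing slots.
    Y = np.tanh(Xf @ W1 + b1) @ W2 + b2
    return np.where(mask, X, Y)
```

Setting `comp=1.0` disables compensation and recovers plain masked-loss training, so the two ideas can be ablated independently.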

Keywords

Missing values · Imputation · Classification · Deep learning

Notes

Acknowledgements

The work of A. R. Figueiras-Vidal has been partly supported by Grant Macro-ADOBE (TEC 2015-67719-P, MINECO/FEDER&FSE). The work of J. L. Sancho-Gómez has been partly supported by Grant AES 2017 (PI17/00771, MINECO/FEDER).

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Departamento de Tecnologías de la Información y las Comunicaciones, Universidad Politécnica de Cartagena, Cartagena, Spain
  2. Departamento de Teoría de la Señal y Comunicaciones, Universidad Carlos III de Madrid, Leganés, Spain