DAEMA: Denoising Autoencoder with Mask Attention

Tihon, Simon; Javaid, Muhammad Usama; Fourure, Damien; Posocco, Nicolas; Peel, Thomas

doi:10.1007/978-3-030-86362-3_19

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12891))

Included in the following conference series:

International Conference on Artificial Neural Networks

3269 Accesses
4 Citations

Abstract

Missing data is a recurrent and challenging problem, especially when using machine learning algorithms for real-world applications. For this reason, missing data imputation has become an active research area, in which recent deep learning approaches have achieved state-of-the-art results. We propose DAEMA (Denoising Autoencoder with Mask Attention), an algorithm based on a denoising autoencoder architecture with an attention mechanism. While most imputation algorithms use incomplete inputs as they would use complete data - up to basic preprocessing (e.g. mean imputation) - DAEMA leverages a mask-based attention mechanism to focus on the observed values of its inputs. We evaluate DAEMA both in terms of reconstruction capabilities and downstream prediction and show that it achieves superior performance to state-of-the-art algorithms on several publicly available real-world datasets under various missingness settings.

S. Tihon and M. U. Javaid—Equal contribution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://github.com/euranova/DAEMA.

References

van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–68 (2010)
Google Scholar
Camino, R.D., Hammerschmidt, C.A., State, R.: Improving missing data imputation with deep generative models. arXiv preprint arXiv:1902.10666 (2019)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Gondara, L., Wang, K.: MIDA: multiple imputation using denoising autoencoders. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 260–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_21
Chapter Google Scholar
Goodfellow, I.J., et al.: Generative adversarial nets. In: NIPS, pp. 2672–2680 (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein GANs. In: NIPS, pp. 5769–5779 (2017). http://papers.nips.cc/paper/7159-improved-training-of-wasserstein-gans
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (Poster) (2015)
Google Scholar
McCoy, J.T., Kroon, S., Auret, L.: Variational autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine 51(21), 141–146 (2018)
Article Google Scholar
Muzellec, B., Josse, J., Boyer, C., Cuturi, M.: Missing data imputation using optimal transport. In: International Conference on Machine Learning, pp. 7130–7140. PMLR (2020)
Google Scholar
Nazabal, A., Olmos, P.M., Ghahramani, Z., Valera, I.: Handling incomplete heterogeneous data using VAEs. Pattern Recogn. 107, 107501 (2020)
Article Google Scholar
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
MathSciNet MATH Google Scholar
Richardson, T.W., Wu, W., Lin, L., Xu, B., Bernal, E.A.: McFlow: Monte Carlo flow models for data imputation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14205–14214 (2020)
Google Scholar
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
Article MathSciNet Google Scholar
Seaman, S., Galati, J., Jackson, D., Carlin, J.D.: What is meant by “missing at random’’? Stat. Sci. 28, 257–268 (2013)
Article MathSciNet Google Scholar
Spinelli, I., Scardapane, S., Uncini, A.: Missing data imputation with adversarially-trained graph convolutional networks. Neural Netw. 129, 249–260 (2020)
Article Google Scholar
Stekhoven, D.J., Bühlmann, P.: MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118 (2012)
Article Google Scholar
Troyanskaya, O., et al.: Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525 (2001)
Article Google Scholar
Vaswani, A., et al.: Attention is all you need. In: NIPS, pp. 6000–6010 (2017). http://papers.nips.cc/paper/7181-attention-is-all-you-need
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)
Google Scholar
Wang, X., Li, A., Jiang, Z., Feng, H.: Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme. BMC Bioinform. 7(1), 1–10 (2006). https://doi.org/10.1186/1471-2105-7-32
Article Google Scholar
Wu, R., Zhang, A., Ilyas, I., Rekatsinas, T.: Attention-based learning for missing data imputation in HoloClean. In: Proceedings of Machine Learning and Systems, vol. 2, pp. 307–325 (2020)
Google Scholar
Yoon, J., Jordon, J., Schaar, M.: GAIN: missing data imputation using generative adversarial nets. In: International Conference on Machine Learning, pp. 5689–5698. PMLR (2018)
Google Scholar

Download references

Author information

Authors and Affiliations

EURA NOVA, Mont-St-Guibert, Belgium
Simon Tihon, Muhammad Usama Javaid, Damien Fourure, Nicolas Posocco & Thomas Peel

Authors

Simon Tihon
View author publications
You can also search for this author in PubMed Google Scholar
Muhammad Usama Javaid
View author publications
You can also search for this author in PubMed Google Scholar
Damien Fourure
View author publications
You can also search for this author in PubMed Google Scholar
Nicolas Posocco
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Peel
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Muhammad Usama Javaid .

Editor information

Editors and Affiliations

Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
iMotions A/S, Copenhagen, Denmark
Paolo Masulli
University of Tübingen, Tübingen, Baden-Württemberg, Germany
Sebastian Otte
Universität Hamburg, Hamburg, Germany
Stefan Wermter

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tihon, S., Javaid, M.U., Fourure, D., Posocco, N., Peel, T. (2021). DAEMA: Denoising Autoencoder with Mask Attention. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12891. Springer, Cham. https://doi.org/10.1007/978-3-030-86362-3_19

Download citation

DOI: https://doi.org/10.1007/978-3-030-86362-3_19
Published: 07 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86361-6
Online ISBN: 978-3-030-86362-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

DAEMA: Denoising Autoencoder with Mask Attention