Artificial Neural Networks to Impute Rounded Zeros in Compositional Data

Advances in Compositional Data Analysis

Abstract

Deep learning methods have become increasingly popular in recent years, but they have not yet arrived in compositional data analysis. Imputation methods for compositional data are typically applied to additive, centered, or isometric log-ratio representations of the data. In general, methods for compositional data analysis can only be applied to observed positive entries in a data matrix. Therefore, missing values and measurements below a detection limit have to be imputed first. In this paper, a new method for imputing rounded zeros based on artificial neural networks (ANNs) is presented and compared with conventional methods. We also investigate whether a log-ratio representation of the data is relevant when ANNs are used for imputation. The results show that ANNs are competitive with, or even outperform, conventional methods when imputing rounded zeros in data sets of moderate size, and that they deliver better results when data sets are large. Log-ratio transformations within the ANN imputation procedure nevertheless help to improve the results. This shows that the theory of compositional data analysis, and respecting the properties of compositional data, remains very important in the age of deep learning.
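To illustrate why log-ratio methods require strictly positive entries, the following minimal Python sketch computes centered and additive log-ratio coordinates with NumPy and shows what happens when a part is recorded as zero. It is not the chapter's method: the function names `closure`, `clr`, and `alr` and the example compositions are assumptions made only for illustration.

```python
import numpy as np

def closure(x):
    """Rescale the parts of a composition so that they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def alr(x):
    """Additive log-ratio: log of each part relative to the last part."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

# Hypothetical three-part compositions (illustrative values only).
composition = np.array([0.2, 0.3, 0.5])
with_zero = np.array([0.0, 0.4, 0.6])  # first part below a detection limit

clr_ok = clr(composition)    # finite coordinates that sum to zero
alr_ok = alr(composition)    # two finite coordinates

# log(0) is -inf, so the coordinates of a composition with a zero are
# no longer finite; this is why rounded zeros are imputed beforehand.
with np.errstate(divide="ignore", invalid="ignore"):
    clr_bad = clr(with_zero)
```

In this sketch, `clr_ok` is unchanged when `composition` is multiplied by a constant, which reflects the scale invariance of log-ratio coordinates, whereas `clr_bad` contains no usable values until the zero is replaced by a positive number.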

Acknowledgements

I would like to thank Peter Filzmoser and Karel Hron for the many collaborations and the fruitful discussions on compositional data analysis, imputation, and rounded zeros. Furthermore, my thanks go to Eric Grunsky and Peter Filzmoser, as well as to one anonymous reviewer, for their constructive and helpful comments on the initial submission.

Author information

Corresponding author

Correspondence to Matthias Templ.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this chapter

Templ, M. (2021). Artificial Neural Networks to Impute Rounded Zeros in Compositional Data. In: Filzmoser, P., Hron, K., Martín-Fernández, J.A., Palarea-Albaladejo, J. (eds) Advances in Compositional Data Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-71175-7_9
