Artificial Neural Networks to Impute Rounded Zeros in Compositional Data

Templ, Matthias

doi:10.1007/978-3-030-71175-7_9

Abstract

Deep learning methods have become increasingly popular in recent years, but they have not yet arrived in compositional data analysis. Imputation methods for compositional data are typically applied to additive, centered, or isometric log-ratio representations of the data. In general, methods for compositional data analysis can only be applied to observed positive entries of a data matrix, so missing values and measurements below a detection limit must be imputed first. In this paper, a new method for imputing rounded zeros based on artificial neural networks (ANNs) is presented and compared with conventional methods. We also ask whether a log-ratio representation of the data is relevant when ANNs are used for imputation. The results show that ANNs are competitive with, or even outperform, conventional methods when imputing rounded zeros in data sets of moderate size, and that they deliver better results when data sets are large. Log-ratio transformations within the ANN imputation procedure nevertheless help to improve the results. This shows that the theory of compositional data analysis, and the fulfillment of all its properties, remains very important in the age of deep learning.
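To make the role of log-ratio representations concrete, the following is a minimal sketch in plain Python of the centered log-ratio (clr) transformation and of why a rounded zero blocks it. It is not the chapter's ANN imputation method. The function names, the example composition, the detection limit of 0.05, and the replacement factor 0.65 are all illustrative assumptions.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(p) for p in parts]  # raises ValueError if a part is 0
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_inverse(coords):
    """Map clr coefficients back to a composition that sums to 1."""
    return closure([math.exp(c) for c in coords])

# Hypothetical 3-part composition; the third part was recorded as 0 because
# it fell below an assumed detection limit.
detection_limit = 0.05
observed = [0.70, 0.30, 0.0]

try:
    clr(observed)
    zero_blocks_clr = False
except ValueError:
    zero_blocks_clr = True  # log(0) is undefined

# Placeholder replacement (an assumption, NOT the ANN method of the chapter):
# substitute a value strictly between 0 and the detection limit.
replaced = [x if x > 0 else 0.65 * detection_limit for x in observed]
coords = clr(closure(replaced))
```

Once the zero is replaced by a positive value below the detection limit, the clr coefficients exist and sum to zero, and `clr_inverse(coords)` recovers the closed composition. A model-based imputation would choose the replacement value from the other parts instead of using a fixed fraction of the detection limit.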

Acknowledgements

I would like to thank Peter Filzmoser and Karel Hron for the many collaborations and the fruitful discussions on compositional data analysis, imputation, and rounded zeros. Furthermore, my thanks go to Eric Grunsky and Peter Filzmoser, as well as to one anonymous reviewer, for their constructive and helpful comments on the initial submission.

Author information

Authors and Affiliations

Institute of Data Analysis and Process Design, Zurich University of Applied Sciences, Rosenstrasse 3, CH-8401, Winterthur, Switzerland
Matthias Templ

Authors

Matthias Templ

Corresponding author

Correspondence to Matthias Templ.

Editor information

Editors and Affiliations

Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria
Peter Filzmoser
Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czech Republic
Karel Hron
Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain
Josep Antoni Martín-Fernández
Biomathematics and Statistics Scotland, Edinburgh, UK
Javier Palarea-Albaladejo

About this chapter

Cite this chapter

Templ, M. (2021). Artificial Neural Networks to Impute Rounded Zeros in Compositional Data. In: Filzmoser, P., Hron, K., Martín-Fernández, J.A., Palarea-Albaladejo, J. (eds) Advances in Compositional Data Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-71175-7_9

DOI: https://doi.org/10.1007/978-3-030-71175-7_9
Published: 02 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71174-0
Online ISBN: 978-3-030-71175-7
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
