Abstract
Deep learning methods have become increasingly popular in recent years, but they have not yet arrived in compositional data analysis. Imputation methods for compositional data are typically applied to additive, centered, or isometric log-ratio representations of the data. In general, methods for compositional data analysis can only be applied to observed positive entries in a data matrix. Therefore, one tries to impute missing values or measurements that fall below a detection limit. In this paper, a new method for imputing rounded zeros based on artificial neural networks (ANNs) is presented and compared with conventional methods. We also ask whether a log-ratio representation of the data is relevant when ANNs are used for imputation. The results show that ANNs are competitive with, or even outperform, conventional methods when imputing rounded zeros in data sets of moderate size, and that they deliver better results when data sets are large. Log-ratio transformations within the ANN imputation procedure nevertheless help to improve the results. This shows that the theory of compositional data analysis, and the fulfillment of all its properties, remains very important in the age of deep learning.
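To make the role of the log-ratio representations concrete, the following is a minimal sketch of the centered log-ratio (clr) transformation and its inverse, using only the Python standard library. It is not the chapter's implementation; the function names `closure`, `clr`, and `clr_inverse` and the example values are assumptions chosen for illustration.

```python
import math


def closure(x, total=1.0):
    """Rescale the parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]


def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    if any(v <= 0 for v in x):
        # A (rounded) zero has no logarithm, so it must be imputed first.
        raise ValueError("clr is only defined for strictly positive parts")
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def clr_inverse(z, total=1.0):
    """Map clr coordinates back to a composition with the given total."""
    return closure([math.exp(v) for v in z], total)


# Illustrative three-part composition (hypothetical values).
composition = closure([10.0, 30.0, 60.0])
coords = clr(composition)        # clr coordinates sum to zero
recovered = clr_inverse(coords)  # round trip returns the composition
```

Calling `clr` on a composition that contains a zero raises an error in this sketch, which mirrors why rounded zeros have to be replaced by positive values before any log-ratio method can be applied.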
Acknowledgements
I would like to thank Peter Filzmoser and Karel Hron for the many collaborations and the fruitful discussions on compositional data analysis, imputation, and rounded zeros. Furthermore, my thanks go to Eric Grunsky and Peter Filzmoser, as well as to one anonymous reviewer, for their constructive and helpful comments on the initial submission.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Templ, M. (2021). Artificial Neural Networks to Impute Rounded Zeros in Compositional Data. In: Filzmoser, P., Hron, K., Martín-Fernández, J.A., Palarea-Albaladejo, J. (eds) Advances in Compositional Data Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-71175-7_9
DOI: https://doi.org/10.1007/978-3-030-71175-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71174-0
Online ISBN: 978-3-030-71175-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)