Abstract
Compositional data carry relative information. Hence, their statistical analysis has to be performed on coordinates with respect to a log-ratio basis. Frequently, the modeler is required to back-transform the model estimates to express them in the original units, such as euros, kg or mg/liter. Approaches for recovering original units need to be formally introduced and their properties explored. Here, we formulate and analyze the properties of two procedures: a simple approach consisting of adding a residual part to the composition, and an approach based on the use of an auxiliary variable. Both procedures are illustrated using a geochemical data set where the original units are recovered when spatial models are applied.
Acknowledgments
This research has been funded by the project “CODAMET” (Ministerio de Ciencia, Innovación y Universidades; Ref: RTI2018-095518-B-C21).
Appendices
Appendix 1
Let f(T) be the function
$$\begin{aligned} f({T})=\frac{{T}}{\sum _{j=1}^D Gx_j +Gr}, \end{aligned}$$
where \(Gx_j\) is the geometric mean of the j-th part over the n samples and \(Gr\) is the geometric mean of the residuals \({\text {Res}}_{i} = {T} - \sum _{j=1}^D x_{ij}\). Then the following properties hold:
- \(f({T})>1\) for any total T. To prove this property, one can use the well-known inequality between the geometric and arithmetic means,
$$\begin{aligned} \sum _{j=1}^D Gx_j +Gr \le \sum _{j=1}^D \left( \frac{1}{n}\sum _{i=1}^n x_{ij}\right) + \frac{1}{n}\sum _{i=1}^n {\text {Res}}_{i} , \end{aligned}$$
where equality holds only for constant series, which is not the case in our context. Therefore,
$$\begin{aligned} \sum _{j=1}^D Gx_j +Gr < \frac{1}{n} \sum _{i=1}^n \left( \sum _{j=1}^D x_{ij} + {\text {Res}}_{i}\right) = {T}, \end{aligned}$$
and hence \(f({T})>1\).
- \(\lim _{{T} \rightarrow +\infty } f({T}) = 1\). For any \({T}>0\), the expression
$$\begin{aligned} f({T})=\frac{{T}}{\sum _{j=1}^D Gx_j +Gr}=\frac{{T}}{\sum _{j=1}^D Gx_j + \left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n} } \end{aligned}$$
is equal to
$$\begin{aligned} f({T})=\frac{1}{\frac{\sum _{j=1}^D Gx_j}{{T}} + \left( \prod _{i=1}^n \left( 1 - \frac{\sum _{j=1}^D x_{ij}}{{T}}\right) \right) ^{1/n} }, \end{aligned}$$
where \(\lim _{{T} \rightarrow +\infty } \frac{\sum _{j=1}^D Gx_j}{{T}} = 0\) and \(\lim _{{T} \rightarrow +\infty } \frac{\sum _{j=1}^D x_{ij}}{{T}} = 0\). Consequently, the first term of the denominator tends to 0 and the second tends to 1, so \(f({T})\) tends to 1.
- f(T) is a monotonically decreasing function. To prove this behavior, one can prove that the function \(g({T})=1/f({T})\) is monotonically increasing. The derivative \(g'({T})\) is equal to
$$\begin{aligned} g'({T}) &= \frac{\frac{1}{n}\left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) {T}}{{T}^2}\\ &\quad -\frac{\sum _{j=1}^D Gx_j + \left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}}{{T}^2}, \end{aligned}$$
where, because the inequality between the geometric and arithmetic means gives \(\sum _{j=1}^D Gx_j + Gr < {T}\), it holds that
$$\begin{aligned} g'({T}) &> \frac{\frac{1}{n}\left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) {T}-{T}}{{T}^2} \\ &= \frac{\frac{1}{n}\left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) -1}{{T}}. \end{aligned}$$
Because the term \(\left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}\) is the geometric mean of the residuals and the term \(\frac{1}{n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) \) is the inverse of their harmonic mean, and the geometric mean is never smaller than the harmonic mean, the numerator is nonnegative. Therefore, \(g'({T})>0\).
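The three properties of f(T) can also be checked numerically. The following is a minimal sketch, not part of the original derivation: the data matrix `x` is randomly generated (hypothetical values, n = 50 samples and D = 4 parts), and the helper names `geometric_mean` and `f` are chosen here for illustration only. It evaluates f(T) on an increasing grid of totals, all larger than the largest row sum so that every residual is positive.

```python
import numpy as np


def geometric_mean(values, axis=0):
    """Geometric mean, computed through logarithms for numerical stability."""
    return np.exp(np.mean(np.log(values), axis=axis))


def f(total, x):
    """f(T) = T / (sum_j Gx_j + Gr), with residuals Res_i = T - sum_j x_ij."""
    residuals = total - x.sum(axis=1)
    if np.any(residuals <= 0):
        raise ValueError("the total must exceed every row sum")
    gx = geometric_mean(x, axis=0)   # one geometric mean per part (column)
    gr = geometric_mean(residuals)   # geometric mean of the residuals
    return total / (gx.sum() + gr)


# Hypothetical data in original units: n = 50 samples, D = 4 parts.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=(50, 4))

# Increasing grid of totals, each above the largest row sum.
largest_row_sum = x.sum(axis=1).max()
totals = largest_row_sum * np.array([1.01, 1.5, 2.0, 10.0, 100.0, 1e4, 1e6])
values = np.array([f(t, x) for t in totals])

for t, v in zip(totals, values):
    print(f"T = {t:14.2f}   f(T) = {v:.8f}")
```

Under these assumptions, the printed values should stay above 1, decrease as T grows, and approach 1 for very large totals, in agreement with the three properties.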
Appendix 2
Let T be a fixed total that is as large as needed, a “big T.” Consequently, for \(i=1,2,\ldots ,n\), the residual \({\text {Res}}_i\) is as large as needed, that is, \({T}\gg \sum _{j=1}^D x_{ij}\). For \(i=1,2,\ldots ,n\), it holds that
Consequently, because \({T}\gg \sum _{j=1}^D x_{ij}\), it holds that
Cite this article
Martín-Fernández, J.A., Egozcue, J.J., Olea, R.A. et al. Units Recovery Methods in Compositional Data Analysis. Nat Resour Res 30, 3045–3058 (2021). https://doi.org/10.1007/s11053-020-09659-7
DOI: https://doi.org/10.1007/s11053-020-09659-7