Units Recovery Methods in Compositional Data Analysis

Natural Resources Research

Abstract

Compositional data carry relative information. Hence, their statistical analysis has to be performed on coordinates with respect to a log-ratio basis. Frequently, the modeler is required to back-transform the estimates obtained from the model so that they are expressed in the original units, such as euros, kg or mg/liter. Approaches for recovering the original units need to be formally introduced and their properties explored. Here, we formulate and analyze the properties of two procedures: a simple approach that consists of adding a residual part to the composition, and an approach based on the use of an auxiliary variable. Both procedures are illustrated using a geochemical data set where the original units are recovered when spatial models are applied.
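
As a rough illustration of the first procedure, the following Python sketch appends a residual part to a small data set, moves to log-ratio coefficients and back, and rescales by the total to return to the original units. The data values, the total `T`, the function names, and the use of clr coefficients in place of coordinates with respect to a particular basis are all assumptions made for brevity; this is not the authors' implementation.

```python
import numpy as np


def add_residual(x, total):
    """Append the residual part Res_i = total - sum_j x_ij to each row."""
    x = np.asarray(x, dtype=float)
    res = total - x.sum(axis=1, keepdims=True)
    if np.any(res <= 0):
        raise ValueError("total must exceed every row sum")
    return np.hstack([x, res])


def clr(parts):
    """Centered log-ratio coefficients of each row."""
    logp = np.log(parts)
    return logp - logp.mean(axis=1, keepdims=True)


def clr_inv(coefs, total):
    """Back-transform clr coefficients and rescale so each row sums to total."""
    e = np.exp(coefs)
    return total * e / e.sum(axis=1, keepdims=True)


# Hypothetical concentrations (e.g. mg/kg) and a hypothetical total.
x = np.array([[12.0, 30.0, 5.0],
              [8.0, 45.0, 2.0]])
T = 1e6

full = add_residual(x, T)           # composition with D + 1 parts
coefs = clr(full)                   # modeling would happen at this stage
recovered = clr_inv(coefs, T)[:, :-1]  # drop the residual, keep original units
```

In this sketch no model is fitted between the forward and inverse transforms, so `recovered` simply reproduces `x`; in practice the estimates produced by the model would be back-transformed instead.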

Acknowledgments

This research has been funded by the project “CODAMET” (Ministerio de Ciencia, Innovación y Universidades; Ref: RTI2018-095518-B-C21).

Author information

Corresponding author

Correspondence to J. A. Martín-Fernández.

Appendices

Appendix 1

Let f(T) be the function

$$\begin{aligned} f(T)=\frac{T}{\sum _{j=1}^D Gx_j +Gr}, \end{aligned}$$

then it holds

  • \(f({T})>1\), for any total T. To prove this property, one can use the well-known inequality between the geometric and arithmetic means

    $$\begin{aligned} \sum _{j=1}^D Gx_j +Gr \le \sum _{j=1}^D \left( \frac{1}{n}\sum _{i=1}^n x_{ij}\right) + \frac{1}{n}\sum _{i=1}^n {\text {Res}}_{i} , \end{aligned}$$

    where the equality holds only for a constant series, which is not the case in our context. Therefore,

    $$\begin{aligned} \sum _{j=1}^D Gx_j +Gr < \frac{1}{n} \sum _{i=1}^n \left( \sum _{j=1}^D x_{ij} + {\text {Res}}_{i}\right) = {T}. \end{aligned}$$
  • \(\lim _{{T} \rightarrow +\infty } f({T}) = 1\). For any \({T}>0\), the expression

    $$\begin{aligned} f({T})=\frac{{T}}{\sum _{j=1}^D Gx_j +Gr}=\frac{{T}}{\sum _{j=1}^D Gx_j + \left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n} }, \end{aligned}$$

    is equal to

    $$\begin{aligned} f({T})=\frac{1}{\frac{\sum _{j=1}^D Gx_j}{{T}} + \left( \prod _{i=1}^n \left( 1 - \frac{\sum _{j=1}^D x_{ij}}{{T}}\right) \right) ^{1/n} }, \end{aligned}$$

    where \(\lim _{{T} \rightarrow +\infty } \frac{\sum _{j=1}^D Gx_j}{{T}} = 0\) and \(\lim _{{T} \rightarrow +\infty } \frac{\sum _{j=1}^D x_{ij}}{{T}} = 0\). Hence, the denominator tends to \(0+1=1\), and so does \(f({T})\).

  • f(T) is a monotonically decreasing function. To prove this behavior, one can prove that the function \(g({T})=1/f({T})\) is a monotonically increasing function. The derivative function \(g'({T})\) is equal to

    $$\begin{aligned} g'({T})= & {} \frac{\frac{1}{n}\left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) {T}}{{T}^2}\\&-\frac{\sum _{j=1}^D Gx_j + \left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}}{{T}^2}, \end{aligned}$$

    where, by the inequality between the geometric and arithmetic means used in the first property, \(\sum _{j=1}^D Gx_j +Gr < {T}\), so that

    $$\begin{aligned} g'({T})> & {} \frac{\frac{1}{n}\left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) {T}-{T}}{{T}^2} \\= & {} \frac{\frac{1}{n}\left( \prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) \right) ^{1/n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) -1}{{T}}. \end{aligned}$$

    The term \(\prod _{i=1}^n \left( {T} - \sum _{j=1}^D x_{ij}\right) ^{1/n}\) is the geometric mean of the residuals, and the term \(\frac{1}{n}\left( \sum _{i=1}^n \frac{1}{{T} - \sum _{j=1}^D x_{ij}} \right) \) is the inverse of the harmonic mean of the residuals. Because the geometric mean is never smaller than the harmonic mean, their product is at least 1 and the numerator is non-negative. Combined with the strict inequality above, \(g'({T})>0\), so \(g({T})\) is increasing and \(f({T})=1/g({T})\) is monotonically decreasing.
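
The three properties above can be checked numerically. The following Python sketch evaluates \(f({T})\) on an increasing grid of totals for a synthetic data set; the log-normal data, the random seed, the grid of totals, and the helper names `gmean` and `f` are assumptions chosen only for illustration, with `Gx_j` and `Gr` taken as the column-wise geometric means of the parts and of the residuals.

```python
import numpy as np


def gmean(v, axis=0):
    """Geometric mean along the given axis (all values must be positive)."""
    return np.exp(np.mean(np.log(v), axis=axis))


def f(total, x):
    """f(T) = T / (sum_j Gx_j + Gr), with Res_i = T - sum_j x_ij."""
    res = total - x.sum(axis=1)
    return total / (gmean(x, axis=0).sum() + gmean(res))


# Synthetic positive data: n = 50 samples, D = 4 parts (illustrative only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.0, sigma=0.8, size=(50, 4))

# Totals must exceed the largest row sum so that every residual is positive.
t_min = x.sum(axis=1).max()
totals = t_min * np.array([1.01, 2.0, 10.0, 100.0, 1e4, 1e6])
values = np.array([f(t, x) for t in totals])

for t, v in zip(totals, values):
    print(f"T = {t:14.2f}   f(T) = {v:.8f}")
```

On such a grid the printed values are expected to stay above 1, to decrease as the total grows, and to approach 1, in agreement with the three properties.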

Appendix 2

Let T be a fixed total, chosen as large as needed (a “big T”). Consequently, for \(i=1,2,\ldots ,n\), the residual \({\text {Res}}_i\) is also as large as needed, that is, \({T}\gg \sum _{j=1}^D x_{ij}\). For \(i=1,2,\ldots ,n\), it holds that

$$\begin{aligned} \sqrt{\frac{D}{D+1}}\cdot \ln \frac{{{\text {Res}}_i}}{m_i}= & {} \sqrt{\frac{D}{D+1}}\cdot \ln {{{\text {Res}}_i}}-\sqrt{\frac{D}{D+1}}\cdot \ln {m_i} \\= & {} \sqrt{\frac{D}{D+1}}\cdot \ln \left( {{T}- \sum _{j=1}^D x_{ij}}\right) -\sqrt{\frac{D}{D+1}}\cdot \ln {m_i} \\= & {} \sqrt{\frac{D}{D+1}}\cdot \ln \left( {T}\cdot \left( 1-\frac{ \sum _{j=1}^D x_{ij}}{{T}}\right) \right) -\sqrt{\frac{D}{D+1}} \cdot \ln {m_i} \\= & {} \sqrt{\frac{D}{D+1}}\cdot \ln {T}+\sqrt{\frac{D}{D+1}}\cdot \ln \left( 1-\frac{ \sum _{j=1}^D x_{ij}}{{T}}\right) \\&-\sqrt{\frac{D}{D+1}}\cdot \ln {m_i}. \end{aligned}$$

Consequently, because \({T}\gg \sum _{j=1}^D x_{ij}\), the term \(\ln \left( 1-\frac{ \sum _{j=1}^D x_{ij}}{{T}}\right) \) is close to zero, and it holds that

$$\begin{aligned} \sqrt{\frac{D}{D+1}}\cdot \ln \frac{{{\text {Res}}_i}}{m_i}\approx \sqrt{\frac{D}{D+1}}\cdot \ln {T}-\sqrt{\frac{D}{D+1}}\cdot \ln {m_i}. \end{aligned}$$
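
The quality of this approximation can be inspected with a short Python sketch. It compares the exact expression with the approximation for two totals; the data values, the totals, and the function names are hypothetical, and \(m_i\) is assumed here to be the geometric mean of the \(D\) parts of sample \(i\), which this appendix does not state explicitly.

```python
import numpy as np


def log_ratio_exact(x, total):
    """sqrt(D/(D+1)) * ln(Res_i / m_i), with Res_i = T - sum_j x_ij."""
    d = x.shape[1]
    m = np.exp(np.log(x).mean(axis=1))  # assumed: geometric mean of the parts
    res = total - x.sum(axis=1)
    return np.sqrt(d / (d + 1)) * np.log(res / m)


def log_ratio_approx(x, total):
    """sqrt(D/(D+1)) * (ln T - ln m_i), the large-T approximation."""
    d = x.shape[1]
    m = np.exp(np.log(x).mean(axis=1))  # same assumption on m_i as above
    return np.sqrt(d / (d + 1)) * (np.log(total) - np.log(m))


# Hypothetical data in original units, with D = 3 parts.
x = np.array([[12.0, 30.0, 5.0],
              [8.0, 45.0, 2.0]])

# Largest absolute gap between exact and approximate values per total.
errors = {}
for total in (1e3, 1e6):
    gap = np.abs(log_ratio_exact(x, total) - log_ratio_approx(x, total))
    errors[total] = gap.max()
    print(f"T = {total:.0e}   max |exact - approx| = {errors[total]:.2e}")
```

The gap equals \(\sqrt{D/(D+1)}\cdot \left| \ln \left( 1-\sum _{j=1}^D x_{ij}/{T}\right) \right| \), so it shrinks roughly in proportion to \(1/{T}\) as the total grows.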

Cite this article

Martín-Fernández, J.A., Egozcue, J.J., Olea, R.A. et al. Units Recovery Methods in Compositional Data Analysis. Nat Resour Res 30, 3045–3058 (2021). https://doi.org/10.1007/s11053-020-09659-7
