
Weighted Symmetric Pivot Coordinates for Compositional Data with Geochemical Applications

Mathematical Geosciences

Abstract

Negative correlations between elements, molecules, or minerals can indicate a variety of geochemical processes, such as ion exchange, incongruent mineral precipitation/dissolution, and redox reactions. However, compositional data (data composed of relative parts) can also exhibit negative correlations simply because one part displaces another; for example, adding sodium chloride to a solution lowers the concentration of all ions other than sodium and chloride. Beyond this practical problem, the question is more general: how should relationships between components be addressed in data carrying relative information? Symmetric pivot coordinates were developed for this purpose; they allow both positive and negative correlations between two parts of a composition to be identified in terms of their relative dominance over the other parts. To do so, the symmetric pivot coordinate approach aggregates all of the logratios involving the two parts of interest. This may be undesirable when some parts suffer from data quality problems, because those parts contribute to the coordinate with the same weight as parts with good data quality. As a remedy, a new method of weighted symmetric pivot coordinates, focusing on pairwise associations, is proposed. In this approach, variables with large logratio variances are down-weighted to suppress their effect on the remaining variables, as is also demonstrated in a small simulation study. Finally, the weighted symmetric pivot coordinates are applied to chemistry data from a series of wastewater samples from oil and gas wells producing from the lower Eagle Ford Group in the U.S. Gulf Coast Basin. In particular, strong negative correlations between ions are examined using this method to reveal processes that occur as a function of depth, including clay diagenesis, de-dolomitization, kerogen maturation, and sulfate and carbonate mineral saturation.
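The displacement effect described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the article: the three "ions", their distributions, and the sample size are hypothetical choices made only for illustration. Absolute amounts are generated independently, yet after closure to relative shares the sodium share and the calcium share become strongly negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical number of samples

# Hypothetical absolute amounts, drawn independently of each other.
# Sodium varies a lot (as if NaCl were added in varying quantities);
# calcium and magnesium stay nearly constant.
na = rng.lognormal(mean=0.0, sigma=1.0, size=n)
ca = rng.lognormal(mean=0.0, sigma=0.1, size=n)
mg = rng.lognormal(mean=0.0, sigma=0.1, size=n)

# Closure: express every sample as relative shares that sum to 1.
amounts = np.column_stack([na, ca, mg])
shares = amounts / amounts.sum(axis=1, keepdims=True)

# Correlation of the independent absolute amounts (close to zero)
r_abs = np.corrcoef(na, ca)[0, 1]
# Correlation of the closed shares (strongly negative, purely from closure)
r_rel = np.corrcoef(shares[:, 0], shares[:, 1])[0, 1]

print(f"absolute Na vs Ca: {r_abs:+.3f}")
print(f"relative Na vs Ca: {r_rel:+.3f}")
```

The negative correlation in the second line arises without any chemical interaction between the simulated parts, which is the situation logratio-based coordinates are designed to handle.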



Acknowledgements

The authors gratefully acknowledge the support of the Czech Science Foundation (GA19-01768S) and the U.S. Geological Survey Energy Resources Program. Helpful comments on an earlier draft of the manuscript were provided by Nicholas Gianoutsos (U.S. Geological Survey). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Author information

Corresponding author

Correspondence to Karel Hron.

Appendix: Explicit Formulas for Weighted Symmetric Pivot Coordinates


Unlike the weighted pivot coordinates, the weighted symmetric pivot coordinates do not admit straightforward explicit formulas. For the third coordinate \(w_3^s\), a fairly simple logcontrast vector (up to a normalizing constant) can still be obtained:

$$\begin{aligned} \mathbf {w}_{3}^s=A (\underbrace{0,\ldots ,0}_{D-4},w_{3,D-3}^s,w_{3,D-2}^s,w_{3,D-1}^s,w_{3,D}^s)' \end{aligned}$$

For \(D>5\), the coefficients are as follows:

$$\begin{aligned}&w_{3,D-3}^s=\alpha _{D-2}(\beta _{D-1}-\beta _{D})-\alpha _{D-1}(\beta _{D-2}-\beta _{D})+\alpha _{D}(\beta _{D-2}-\beta _{D-1}),\\&w_{3,D-2}^s=-\,[\alpha _{D-3}(\beta _{D-1}-\beta _{D})-\alpha _{D-1}(\beta _{D-3}-\beta _{D})+ \alpha _{D}(\beta _{D-3}-\beta _{D-1})],\\&w_{3,D-1}^s= \alpha _{D-3}(\beta _{D-2}-\beta _{D})-\alpha _{D-2}(\beta _{D-3}-\beta _{D})+\alpha _{D}(\beta _{D-3}-\beta _{D-2}),\\&w_{3,D}^s= -\,[\alpha _{D-3}(\beta _{D-2}-\beta _{D-1})-\alpha _{D-2}(\beta _{D-3}-\beta _{D-1})+\alpha _{D-1}(\beta _{D-3}-\beta _{D-2})]. \end{aligned}$$

For a 4-part or 5-part composition, some weights are replaced according to the following scheme:

$$\begin{aligned}&D=4: \alpha _{D-3}\rightarrow -1,\ \alpha _{D-2}\rightarrow \gamma ,\ \beta _{D-3}\rightarrow \gamma ,\ \beta _{D-2}\rightarrow -1,\\&D=5: \alpha _{D-3}\rightarrow \gamma ,\quad \beta _{D-3}\rightarrow \ -1. \end{aligned}$$
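The four coefficient formulas for \(w_3^s\) have the algebraic form of signed 3×3 cofactors of the array with rows \((\alpha _{D-3},\ldots ,\alpha _{D})\), \((\beta _{D-3},\ldots ,\beta _{D})\), and \((1,1,1,1)\). Whatever \(\alpha\) and \(\beta\) denote in the main text (they are not defined in this excerpt), this structure implies that the coefficients sum to zero and are orthogonal to both four-element weight vectors. The following is a minimal numerical check of that property; the function name and the input values are hypothetical and chosen only for illustration.

```python
import numpy as np

def w3_coefficients(alpha4, beta4):
    """Unnormalized (w_{3,D-3}, w_{3,D-2}, w_{3,D-1}, w_{3,D}) from the explicit formulas.

    alpha4 and beta4 hold (alpha_{D-3}, ..., alpha_D) and (beta_{D-3}, ..., beta_D),
    in that order. The normalizing constant A is not applied.
    """
    a0, a1, a2, a3 = alpha4
    b0, b1, b2, b3 = beta4
    return np.array([
        a1 * (b2 - b3) - a2 * (b1 - b3) + a3 * (b1 - b2),
        -(a0 * (b2 - b3) - a2 * (b0 - b3) + a3 * (b0 - b2)),
        a0 * (b1 - b3) - a1 * (b0 - b3) + a3 * (b0 - b1),
        -(a0 * (b1 - b2) - a1 * (b0 - b2) + a2 * (b0 - b1)),
    ])

# Arbitrary illustrative inputs (not values from the article)
alpha4 = np.array([0.3, -1.2, 0.5, 0.4])
beta4 = np.array([-0.7, 0.2, 0.9, -0.4])

w = w3_coefficients(alpha4, beta4)
print("coefficients:       ", w)
print("sum of coefficients:", w.sum())      # zero up to rounding: a logcontrast
print("inner product, alpha:", w @ alpha4)  # zero up to rounding
print("inner product, beta: ", w @ beta4)   # zero up to rounding
```

For \(D=4\) or \(D=5\), the substitutions listed above would be applied to the inputs before calling the function; \(\gamma\) is not defined in this excerpt, so that case is not exercised here.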

For the fourth coordinate \(w_4^s\), considering the logcontrast vector

$$\begin{aligned} \mathbf {w}_{4}^s = A (\underbrace{0,\ldots ,0}_{D-5},w_{4,D-4}^s,w_{4,D-3}^s,w_{4,D-2}^s,w_{4,D-1}^s,w_{4,D}^s)' \end{aligned}$$

yields, after tedious computation, the following coefficients:

$$\begin{aligned}
w_{4,D-4}^s ={}& -\,2\sum_{i=D-3}^{D-1}\,\sum_{j=i+1}^{D} \alpha_i\alpha_j\left\{ 2\beta_i\beta_j + \sum_{\substack{k=D-3\\ k\ne i,j}}^{D} \beta_k(\beta_k-\beta_i-\beta_j) \right\} \\
& -\,2\sum_{i=D-2}^{D}\,\sum_{\substack{k=D-3\\ k\ne i}}^{D}\,\sum_{\substack{l=k\\ l\ne i}}^{D} (-1)^{\delta_{kl}}\alpha_i\beta_k\beta_l,\\
w_{4,p}^s ={}& 2\alpha_{D-4}\alpha_p\sum_{\substack{i=D-3\\ i\ne p}}^{D}\,\sum_{\substack{j=i\\ j\ne p}}^{D} (-1)^{\delta_{ij}}\beta_i\beta_j \\
& +\,\alpha_p \sum_{\substack{i=D-3\\ i\ne p}}^{D}\,\sum_{\substack{k=D-3\\ k\ne i,p}}^{D} \alpha_i(\beta_k-\beta_i)(\beta_k-\beta_{D-4}) \\
& +\,\sum_{\substack{i=D-3\\ i\ne p}}^{D-1}\,\sum_{\substack{j=i+1\\ j\ne p}}^{D} \alpha_i\alpha_j\left[ \beta_{D-4}(2\beta_p-\beta_i-\beta_j)+2\beta_i\beta_j-\beta_p(\beta_i+\beta_j)\right] \\
& +\,\alpha_{D-4} \sum_{\substack{i=D-3\\ i\ne p}}^{D}\,\sum_{\substack{k=D-3\\ k\ne i,p}}^{D} \alpha_i\left[ \beta_k(\beta_k-\beta_i-\beta_p)+2\beta_i\beta_p\right],\\
& \quad p=D-3,D-2,D-1,D,
\end{aligned}$$

where

$$\begin{aligned} \delta _{kl}= {\left\{ \begin{array}{ll} 0 &{} \quad k\ne l\\ 1 &{}\quad k=l. \end{array}\right. } \end{aligned}$$
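As a reading aid (not part of the original derivation), the sign factor \((-1)^{\delta _{kl}}\) only separates the diagonal terms from the off-diagonal ones in the double sums above. For any index set \(S\), here the indices \(D-3,\ldots ,D\) with the excluded ones removed, the double sum expands as

$$\begin{aligned} \sum _{k\in S}\,\sum _{\substack{l\in S\\ l\ge k}} (-1)^{\delta _{kl}}\beta _k\beta _l = \sum _{\substack{k,l\in S\\ k<l}} \beta _k\beta _l - \sum _{k\in S}\beta _k^2 = \frac{1}{2}\left[ \Big (\sum _{k\in S}\beta _k\Big )^2 - 3\sum _{k\in S}\beta _k^2\right] . \end{aligned}$$

The last equality follows from \(\big (\sum _{k}\beta _k\big )^2=\sum _{k}\beta _k^2+2\sum _{k<l}\beta _k\beta _l\).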

For \(D=5\) or \(D=6\), some weights are replaced according to the following scheme:

$$\begin{aligned}&D=5: \alpha _{D-4}\rightarrow -1,\ \alpha _{D-3}\rightarrow \gamma ,\quad \beta _{D-4}\rightarrow \gamma ,\ \beta _{D-3}\rightarrow -1,\\&D=6: \alpha _{D-4}\rightarrow \gamma ,\quad \beta _{D-4}\rightarrow -1. \end{aligned}$$

Due to the complexity of the logcontrast coefficients, explicit formulas for the coordinates \(w_5^s,\ldots ,w_{D-1}^s\) are omitted. In practice, they are evaluated numerically using Eq. (21).
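Eq. (21) is not reproduced in this excerpt, so the sketch below is not that formula. It is a generic numerical stand-in based on two assumptions read off the explicit formulas above: the \(k\)-th logcontrast vector has \(D-k-1\) leading zeros and zero-sum coefficients, and it is orthogonal to the earlier vectors. Under these assumptions the vector is determined up to sign and can be found as a null-space direction via a singular value decomposition. The function name `next_logcontrast` and the starting vectors are hypothetical.

```python
import numpy as np

def next_logcontrast(previous, k, D):
    """Unit vector with D-k-1 leading zeros and zero sum, orthogonal to `previous`.

    previous: list of k-1 earlier logcontrast vectors of length D.
    The sign of the result is arbitrary.
    """
    m = k + 1  # number of trailing entries allowed to be nonzero
    # Constraint rows on the trailing m entries: zero sum and orthogonality.
    rows = [np.ones(m)] + [np.asarray(v, dtype=float)[D - m:] for v in previous]
    C = np.vstack(rows)  # shape (k, k+1): one null direction if full row rank
    _, _, vt = np.linalg.svd(C)
    w = np.zeros(D)
    w[D - m:] = vt[-1]  # last right-singular vector spans the null space of C
    return w

# Hypothetical starting vectors: two arbitrary zero-sum vectors for D = 6
D = 6
rng = np.random.default_rng(1)
w1 = rng.normal(size=D)
w1 -= w1.mean()
w2 = rng.normal(size=D)
w2 -= w2.mean()

w3 = next_logcontrast([w1, w2], k=3, D=D)
w4 = next_logcontrast([w1, w2, w3], k=4, D=D)
print("w3:", np.round(w3, 4))
print("w4:", np.round(w4, 4))
```

In this construction the earlier vectors need not be orthonormal; only their trailing entries enter the constraints, because the leading entries of the new vector are fixed at zero.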

Cite this article

Hron, K., Engle, M., Filzmoser, P. et al. Weighted Symmetric Pivot Coordinates for Compositional Data with Geochemical Applications. Math Geosci 53, 655–674 (2021). https://doi.org/10.1007/s11004-020-09862-5

  • DOI: https://doi.org/10.1007/s11004-020-09862-5
