Abstract
Negative correlations between elements, molecules, or minerals can indicate a variety of geochemical processes, such as ion exchange, incongruent mineral precipitation/dissolution, and redox reactions. However, compositional data (those composed of relative parts) can also exhibit negative correlations simply due to displacement of one part by another, such as the addition of sodium chloride lowering the concentration of all ions other than sodium and chloride in a solution. Apart from this practical problem, the question is more general: how to address the relationships between components in data carrying relative information. For this purpose, symmetric pivot coordinates were developed, which allow both positive and negative correlations between two parts of a composition to be identified in terms of their relative dominance over the other parts. Accordingly, the symmetric pivot coordinate approach aggregates all of the logratios involving the two parts of interest. This may not be desirable when some of the remaining parts suffer from data quality problems, because such parts contribute the same weight to the coordinate as parts with good data quality. As a remedy, a new method of weighted symmetric pivot coordinates focusing on pairwise associations is proposed. In this approach, variables with large logratio variances are down-weighted to suppress their effect on the remaining variables, which is also demonstrated in a small simulation study. Finally, the weighted symmetric pivot coordinates are applied to chemistry data from a series of wastewater samples from oil and gas wells producing from the lower Eagle Ford Group in the U.S. Gulf Coast Basin. In particular, strong negative correlations between ions are examined using this method to reveal processes that occur as a function of depth, including clay diagenesis, de-dolomitization, kerogen maturation, and sulfate and carbonate mineral saturation.
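The displacement effect described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions and not the method of the paper: the three parts, their lognormal distributions, the sample size, and the function name `closure_correlations` are all hypothetical choices made for illustration. The absolute amounts are drawn independently, so any strong correlation seen after rescaling to proportions is produced by the closure alone.

```python
import numpy as np


def closure_correlations(n=2000, seed=0):
    """Return (raw_corr, closed_corr) between a variable part and a stable part.

    All distributions below are illustrative assumptions, not data from the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical absolute amounts, drawn independently of each other.
    na = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # highly variable part
    ca = rng.lognormal(mean=0.0, sigma=0.1, size=n)  # nearly constant part
    mg = rng.lognormal(mean=0.0, sigma=0.1, size=n)  # nearly constant part
    amounts = np.column_stack([na, ca, mg])

    # Closure: rescale every sample so that its parts sum to 1.
    parts = amounts / amounts.sum(axis=1, keepdims=True)

    # Pearson correlation between the first two parts, before and after closure.
    raw_corr = np.corrcoef(amounts[:, 0], amounts[:, 1])[0, 1]
    closed_corr = np.corrcoef(parts[:, 0], parts[:, 1])[0, 1]
    return float(raw_corr), float(closed_corr)


raw, closed = closure_correlations()
print(f"correlation of absolute amounts: {raw:+.3f}")
print(f"correlation of closed parts:     {closed:+.3f}")
```

In this setup the correlation of the absolute amounts should stay close to zero, while the closed parts should be clearly negatively correlated, because every increase in the variable part displaces the others. Logratio-based coordinates such as those discussed here are designed to avoid reading this artifact as a geochemical process.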
Acknowledgements
The authors gratefully acknowledge support from the Czech Science Foundation (GA19-01768S) and the U.S. Geological Survey Energy Resources Program. Helpful comments on an earlier draft of the manuscript were provided by Nicholas Gianoutsos (U.S. Geological Survey). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Appendix: Explicit Formulas for Weighted Symmetric Pivot Coordinates
Unlike for the weighted pivot coordinates, explicit formulas for the weighted symmetric pivot coordinates are not straightforward to derive. For the third coordinate \(w_3^s\), a fairly simple logcontrast vector (up to a normalizing constant)
is still obtained. For \(D>5\), the coefficients are as follows
For a 4-part or 5-part composition, some weights are replaced according to this scheme
For the fourth coordinate \(w_4^s\), however, the derivation requires considerable computational effort; by considering the logcontrast vector
the following coefficients are derived
where
For \(D=5\) or \(D=6\), some weights are replaced according to the following scheme
Due to the complexity of the logcontrast coefficients, explicit formulas for the coordinates \(w_5^s,\ldots ,w_{D-1}^s\) are omitted. In practice, they are evaluated numerically using Eq. (21).
Hron, K., Engle, M., Filzmoser, P. et al. Weighted Symmetric Pivot Coordinates for Compositional Data with Geochemical Applications. Math Geosci 53, 655–674 (2021). https://doi.org/10.1007/s11004-020-09862-5
DOI: https://doi.org/10.1007/s11004-020-09862-5