Abstract
In practice it is often sensible to give different weights to the variables involved in a multivariate data analysis, and the same holds for compositional data, that is, multivariate observations carrying relative information. Weights can be applied to better accommodate differences in the quality of the measurements, the occurrence of zeros and missing values, or more generally to highlight specific features of compositional parts. The characterisation of compositional data as elements of a Bayes space, which is a natural generalisation of the ordinary Aitchison geometry, enables the definition of a formal framework to implement weighting schemes for the parts of a composition. This is formally achieved by considering a reference measure in the Bayes space alternative to the common uniform measure via the well-known chain rule. Unweighted centred logratio (clr) coefficients and isometric logratio (ilr) coordinates then allow us to express compositions in real space equipped with (unweighted) Euclidean geometry. The resulting elements of real space generated by the clr coefficients or ilr coordinates are invariant to the scale of the original compositions, but the actual scale of the weights matters. In this work, these formal developments are presented and used to introduce a general approach for weighting parts in compositional data analysis. Its practical use is demonstrated on simulated and real-world data sets in the context of the earth sciences.
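The two invariance statements above can be checked numerically with a minimal sketch. The abstract does not state the formulas, so the following are assumptions of this illustration: the weighted clr coefficient of part i is taken as ln(x_i/p_i) minus the p-weighted mean of these log values, and the unweighted clr coefficient is obtained by multiplying it by sqrt(p_i). The function names `weighted_clr` and `unweighted_clr` and the example values are hypothetical.

```python
import math

def weighted_clr(x, p):
    """Assumed weighted clr: ln(x_i / p_i) centred by its p-weighted mean."""
    log_y = [math.log(xi / pi) for xi, pi in zip(x, p)]
    s_p = sum(p)  # total mass of the weights (reference measure)
    centre = sum(pi * li for pi, li in zip(p, log_y)) / s_p
    return [li - centre for li in log_y]

def unweighted_clr(x, p):
    """Assumed unweighted clr: weighted clr rescaled by sqrt(p_i), so that
    the ordinary Euclidean inner product can be used on the result."""
    return [math.sqrt(pi) * ci for pi, ci in zip(p, weighted_clr(x, p))]

# Hypothetical three-part composition and weights (illustrative values only).
x = [0.2, 0.3, 0.5]
p = [1.0, 2.0, 0.5]

u = unweighted_clr(x, p)
u_scaled_x = unweighted_clr([10.0 * v for v in x], p)  # rescale the composition
u_scaled_p = unweighted_clr(x, [4.0 * w for w in p])   # rescale the weights
```

Under these assumptions, `u_scaled_x` coincides with `u`, whereas `u_scaled_p` equals `u` multiplied by sqrt(4) = 2. Rescaling the composition therefore leaves the coefficients unchanged, while rescaling the weights does not. With all weights equal to one, `weighted_clr` reduces to the ordinary clr coefficients.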
Availability of data and materials
Data are available from the authors on request.
Author information
Contributions
KH and AM conceived this research; JPA and PF designed the experiments and provided the data sets and interpretations; KH wrote the first draft of the paper; and KH, AM, JPA, PF, RT and JJE all participated in its revision.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
K.H., P.F. and R.T. gratefully acknowledge the support of the Czech Science Foundation (GACR), GA 19-01768S; J.P.-A. was partly supported by the Scottish Government’s Rural and Environment Science and Analytical Services Division and the Spanish Ministry of Economy and Competitiveness [Ref: RTI2018-095518-B-C21].
About this article
Cite this article
Hron, K., Menafoglio, A., Palarea-Albaladejo, J. et al. Weighting of Parts in Compositional Data Analysis: Advances and Applications. Math Geosci 54, 71–93 (2022). https://doi.org/10.1007/s11004-021-09952-y
DOI: https://doi.org/10.1007/s11004-021-09952-y