Statistical comparisons of heavy metal pollutants between seven regions of the Polish exclusive economic zone

  • Original Article

Environmental Earth Sciences

Abstract

This paper addresses three intractable difficulties associated with the statistical analysis of compositional data, such as percentages or ppm. These are: (1) such data do not follow multivariate normal distributions, which renders standard parametric statistical tests and estimation procedures inappropriate; (2) the covariance/correlation coefficients between specific pairs of components are determined in whole or in part by the presence or absence of other components; and (3) the negative bias property, that is, for each component at least one of its covariances with the other components, and therefore at least one correlation, must be negative, so the remaining correlations are prevented from ranging freely between −1 and +1. It follows that correlation coefficients formed from compositional data are not only not absolute but also frequently spurious. Standard multivariate procedures based on them are unreliable, and intrinsic associations between components inferred in particular from strong positive correlations are potentially false. In a 2009 paper, it was reported that 59 surface sediment samples from 7 regions in the Polish exclusive economic zone had been chemically analyzed for 16 elements. Enrichment factors together with crude correlation coefficients between selected elements were presented. All these quantities were computed from the initial raw compositional data resulting from the chemical analyses. In this paper, a statistical procedure is presented that is distinctly different from the enrichment factor computations based on the same raw compositional data. The procedure generates a log-ratio measure of the abundance of each element in each of the seven regions, thus enabling comparisons of relative levels of pollution between the regions.
Although the two techniques are quite unrelated, it is shown that, in general, extremely high or low measures of relative abundance in the regions are associated with correspondingly high or low values of the enrichment factors reported for the same regions in the 2009 paper. That is, the statistical analysis confirms the enrichment factor data in identifying the most to the least polluted regions. In an additional analysis, the residue term was excluded from each sediment sample by rescaling the 16 element concentrations to sum to 100%, thus forming 59 residue-free sub-compositions. Crude correlation coefficients were computed for pairs of elements in these sub-compositional data. These revealed that certain correlations reported in the 2009 paper for the same pairs of elements, which were based on the initial raw data, were not only inconsistent but sometimes also contradictory. Such contradictions imply that intrinsic geochemical element associations inferred in that paper from such correlations were false.
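Difficulties (2) and (3) can be illustrated numerically. The following Python sketch uses synthetic, randomly generated data, not the Polish EEZ samples: 59 hypothetical samples, each with three element-like parts plus one dominant residue part, are closed to 100%. Because every closed row sums to a constant, each row of the covariance matrix sums to zero, so every part must have at least one negative covariance. The residue is then dropped and the three parts re-closed, mimicking the residue-free sub-compositions. The helper names (`close`, `cov`, `corr`) are illustrative only.

```python
import random

def close(row, total=100.0):
    """Rescale positive parts so that they sum to `total` (closure)."""
    s = sum(row)
    return [total * v / s for v in row]

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    """Pearson correlation coefficient."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

# Synthetic raw amounts: three independent "elements" plus a dominant residue.
random.seed(1)
raw = [[random.lognormvariate(0.0, 0.5) for _ in range(3)]
       + [random.lognormvariate(3.0, 0.1)] for _ in range(59)]

full = [close(r) for r in raw]          # 4-part compositions summing to 100%
cols = list(zip(*full))
C = [[cov(a, b) for b in cols] for a in cols]

# Negative bias: sum_j cov(x_i, x_j) = cov(x_i, 100) = 0 for every part i,
# so the off-diagonal covariances in row i sum to -var(x_i) < 0.
row_sums = [sum(row) for row in C]
min_offdiag = [min(C[i][j] for j in range(4) if j != i) for i in range(4)]

# Residue-free sub-composition: drop the residue and re-close the three parts.
sub = [close(r[:3]) for r in full]
sub_cols = list(zip(*sub))
print("corr(part 0, part 1), full composition:", corr(cols[0], cols[1]))
print("corr(part 0, part 1), sub-composition: ", corr(sub_cols[0], sub_cols[1]))
```

The two printed correlations generally differ, because the correlation between the same pair of parts depends on which other parts are present in the composition.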

References

  • Aitchison J (1982) The statistical analysis of compositional data (with discussion). J Roy Statist Soc B 44:139–177

  • Aitchison J (1986) The statistical analysis of compositional data. Chapman and Hall, London, p 416

  • Aitchison J (1997) The 1-h course in compositional data or compositional data analysis is simple. In: Vera Pawlowsky-Glahn (ed) Proceedings of IAMG’97, Part 1:3–35

  • Alaoui AM, Choura M, Maanan M, Zourarah B, Robin M, Conceicao MF, Andrade C, Khalid M, Carruesco C (2010) Metal fluxes to the sediments of the Moulay Bousselham Lagoon, Morocco. Environ Earth Sci 61:275–286

  • Chayes F (1960) On correlation between variables of constant sum. J Geophys Res 65:4185–4193

  • Chayes F (1962) Numerical correlation and petrographic variation. J Geology 70:440–552

  • Chayes F (1983) Detecting nonrandom associations between proportions by tests of remaining space variables. Math Geol 15:197–206

  • Ciszewski D, Czajka A, Blazej S (2008) Rapid migration of heavy metals and 137Cs in alluvial sediments, Upper Odra River valley, Poland. Environ Geol 55:1577–1586

  • Du P, Xue N, Liu L, Li F (2008) Distribution of Cd, Pb, Zn, and Cu and their chemical speciations in soils from a peri-smelter area in northeast China. Environ Geol 55:205–213

  • Egozcue JJ, Pawlowsky-Glahn V (2005) Groups of parts and their balances in compositional data analysis. Math Geol 37:795–828

  • Full WE, Ehrlich R, Klovan JE (1981) EXTENDED QMODEL: objective definition of external end members in the analysis of mixtures. Math Geol 13:331–344

  • Glasby GP, Szefer P, Geldon J, Warzocha J (2004) Heavy-metal pollution of sediments from Szczecin Lagoon and the Gdansk Basin, Poland. Sci Total Environ 330:249–269

  • Hoodaji M, Tahmourespour A, Amini H (2010) Assessment of copper, cobalt and zinc contaminations in soils and plants of industrial area in Esfahan city (in Iran). Environ Earth Sci 61:1353–1360

  • Imbrie J, Van Andel TH (1964) Vector analysis of heavy mineral data. Geol Soc Am Bull 76:1131–1156

  • Kim Y, Kim B, Kim K (2010) Distribution and speciation of heavy metals and their sources in Kumho River sediment, Korea. Environ Earth Sci 60:943–952

  • Kljakovic-Gaspic Z, Bogner D, Ujevic I (2009) Trace metals (Cd, Pb, Cu, Zn and Ni) in sediment of the submarine pit Dragon ear (Soline Bay, Rogoznica, Croatia). Environ Geol 58:751–760

  • Labare MP, Butkus MA, Riegner D, Schommer N, Atkinson J (2004) Evaluation of lead movement from the abiotic to biotic at a small-arms firing range. Environ Geol 46:750–754

  • Leinen M, Pisias N (1984) An objective technique for determining end-member compositions and for partitioning sediments according to their sources. Geochim Cosmochim Acta 48:47–62

  • Machender G, Dhakate R, Prasanna L, Govil PK (2011) Assessment of heavy metal contamination in soils around Balanagar industrial area, Hyderabad, India. Environ Earth Sci 63:945–953

  • Miesch AT (1976) Q-mode factor analysis of compositional data. Comput Geosci 1:147–159

  • Palmer MJ, Douglas GB (2008) A Bayesian statistical model for end member analysis of sediment geochemistry, incorporating spatial dependences. Appl Statist 57:313–327

  • Pawlowsky-Glahn V, Egozcue JJ (2006) Compositional data and their analysis: an introduction. Geological Society, London, Special Publications 264:1–10

  • Pearson K (1897) Mathematical contributions to the theory of evolution. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc Roy Soc 60:489–498

  • Poulos SE, Dounas CG, Alexandrakis G, Koulouri P, Drakopoulis P (2009) Trace metal distribution in sediments of northern continental shelf of Crete Island, Eastern Mediterranean. Environ Geol 58:843–857

  • Reimann C, de Caritat P (2005) Distinguishing between natural and anthropogenic sources for elements in the environment: regional geochemical surveys versus enrichment factors. Sci Total Environ 337:91–107

  • Renner RM (1993) The resolution of a compositional dataset into mixtures of fixed source compositions. Appl Statist 42:615–631

  • Renner RM (1995) The construction of extreme compositions. Math Geol 27:485–497

  • Renner RM, Glasby GP, Szefer P (1998) End-member analysis of heavy-metal pollution in surficial sediments from the Gulf of Gdansk and the Southern Baltic Sea off Poland. Appl Geochem 13:313–318

  • Selig U, Leipe T (2008) Stratigraphy of nutrients and metals in sediment profiles of two dimictic lakes in North-Eastern Germany. Environ Geol 55:1099–1107

  • Sindern S, Lima RFS, Schwarzbauer J, Petta RA (2007) Anthropogenic heavy metal signatures for the fast growing urban area of Natal (NE Brazil). Environ Geol 52:731–737

  • Sundararajan M, Natesan U (2010) Geochemistry of core sediments from Mullipallam creek, South East coast of India. Environ Earth Sci 61:947–961

  • Szczucinska AM, Siepak M, Ziola-Frankowska A, Marciniak M (2010) Seasonal and spatial changes of metal concentrations in groundwater outflows from porous sediments in the Gryzyna-Grabin Tunnel Valley in western Poland. Environ Earth Sci 61:921–930

  • Szefer P, Glasby GP, Pempkowiak J, Kaliszan R (1995) Extraction studies of heavy-metal pollutants in surficial sediments from the southern Baltic Sea off Poland. Chem Geol 120:111–126

  • Szefer P, Szefer K, Glasby GP, Pempkowiak J, Kaliszan R (1996) Heavy-metal pollution in surficial sediments from the southern Baltic Sea off Poland. J Environ Sci Health A31:2723–2754

  • Szefer P, Glasby GP, Kusak A, Szefer K, Jankowska H, Wolowicz M, Ali AA (1998) Evaluation of anthropogenic influx of metallic contaminants into Puck Bay, southern Baltic. Appl Geochem 13:293–304

  • Szefer P, Glasby GP, Stüben D, Kusak A, Geldon J, Berner Z, Neumann T, Warzocha J (1999) Distribution of selected heavy metals and rare earth elements in surficial sediments from the Polish sector of the Vistula Lagoon. Chemosphere 39:145–158

  • Szefer P, Glasby GP, Geldon J, Renner RM, Björn E, Snell J, Frech W, Warzocha J (2009) Heavy-metal pollution of sediments from the Polish exclusive economic zone, southern Baltic Sea. Environ Geol 57:847–862

  • Tylmann W, Golebiewski R, Wozniak PP, Czarnecka K (2007) Heavy metals in sediments as evidence for recent pollution and quasi-estuarine processes: an example from Lake Druzno, Poland. Environ Geol 53:35–46

  • Wakida FT, Lara-Ruiz D, Temores-Pena J, Rodriguez-Ventura JG, Diaz C, Garcia-Flores E (2008) Heavy metals in sediments of the Tecate River, Mexico. Environ Geol 54:637–642

  • Yalcin MG, Narin I, Soylak M (2008) Multivariate analysis of heavy metal contents of sediments from Gumusler creek, Nigde, Turkey. Environ Geol 54:1155–1163

Acknowledgments

Dr Subhasch Shetty of the Whangarei Base Hospital and the University of Auckland School of Medicine provided professional support and encouragement without which this paper would not have been completed. Dr Piotr Szefer of the Medical University of Gdansk was extensively engaged in the preparation of the paper. His initiating, stimulating and critical remarks undoubtedly resulted in great improvement of the first version. However, in spite of his great contribution, P.S. had objections to the structure of the manuscript as well as to the ways some data were presented and interpreted.

Author information

Correspondence to Ross M. Renner.

Appendix

Perturbations

Let \( x_{0} + y_{0} = 1 \) be a two-part composition where \( x_{0} \) is the initial proportion of some element X in a region (e.g., Fe), and \( y_{0} \) is the proportion of Y, the entire remainder. Subsequently, due to either natural attrition or augmentation, the initial proportions \( x_{0} \) and \( y_{0} \) are perturbed by factors \( a_{1} \) and \( b_{1} \), respectively. The new proportions \( x_{1} \) and \( y_{1} \) of X and Y are related to \( x_{0} \) and \( y_{0} \) since, by definition, \( x_{1} \propto a_{1} x_{0} \) and \( y_{1} \propto b_{1} y_{0} \). However, \( x_{1} + y_{1} = 1 \) necessarily; hence,

$$ x_{ 1} = a_{ 1} x_{0} /\left( {a_{ 1} x_{0} + b_{ 1} y_{0} } \right) \, \;{\text{and}}\;\;y_{ 1} = b_{ 1} y_{0} /\left( {a_{ 1} x_{0} + b_{ 1} y_{0} } \right), $$

Forming the log-ratio of \( x_{1} \) and \( y_{1} \), the denominator \( (a_{1} x_{0} + b_{1} y_{0}) \) cancels, and

$$ { \log }\left( {x_{ 1} /y_{ 1} } \right) = { \log }\left( {a_{ 1} x_{0} /b_{ 1} y_{0} } \right), $$
$$ { \log }\left( {x_{ 1} /y_{ 1} } \right) = { \log }\left( {x_{0} /y_{0} } \right) + { \log }\left( {a_{ 1} /b_{ 1} } \right) $$

If over time there were n sequential perturbations \( \left( {a_{ 1} ,b_{ 1} } \right), \, \left( {a_{ 2} ,b_{ 2} } \right), \ldots , \, \left( {a_{\text{n}} ,b_{\text{n}} } \right), \) then by the same algebra, the second log-ratio is given by,

$$ { \log }\left( {x_{ 2} /y_{ 2} } \right) = { \log }\left( {x_{ 1} /y_{ 1} } \right) + { \log }\left( {a_{ 2} /b_{ 2} } \right) $$

Substituting in this for \( \log(x_{1}/y_{1}) \) from above,

$$ { \log }\left( {x_{ 2} /y_{ 2} } \right) = { \log }\left( {x_{0} /y_{0} } \right) + { \log }\left( {a_{ 1} /b_{ 1} } \right) + { \log }\left( {a_{ 2} /b_{ 2} } \right) $$

Similarly for \( \log(x_{3}/y_{3}) \), \( \log(x_{4}/y_{4}) \), and so on, up to the n-th log-ratio, which is given by,

$$ { \log }\left( {x_{\text{n}} /y_{\text{n}} } \right) = { \log }\left( {x_{0} /y_{0} } \right) + { \log }\left( {a_{ 1} /b_{ 1} } \right) + { \log }\left( {a_{ 2} /b_{ 2} } \right) + \cdots + { \log }\left( {a_{\text{n}} /b_{\text{n}} } \right) $$
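The additivity of the perturbation effects on the log-ratio scale can be checked numerically. This is a minimal Python sketch, assuming an illustrative initial proportion \( x_{0} = 0.02 \) and 50 hypothetical perturbation pairs \( (a_{k}, b_{k}) \) drawn uniformly between 0.9 and 1.1 (neither taken from the study). The composition is re-closed after every perturbation, and the final log-ratio is compared with \( \log(x_{0}/y_{0}) \) plus the sum of the \( \log(a_{k}/b_{k}) \).

```python
import math
import random

def perturb(x, y, a, b):
    """Apply factors (a, b) to a two-part composition and re-close it to sum to 1."""
    s = a * x + b * y
    return a * x / s, b * y / s

def apply_perturbations(x0, factors):
    """Apply a sequence of (a_k, b_k) perturbations starting from (x0, 1 - x0)."""
    x, y = x0, 1.0 - x0
    for a, b in factors:
        x, y = perturb(x, y, a, b)
    return x, y

random.seed(0)
x0 = 0.02  # hypothetical initial proportion of element X
factors = [(random.uniform(0.9, 1.1), random.uniform(0.9, 1.1)) for _ in range(50)]

xn, yn = apply_perturbations(x0, factors)
lhs = math.log(xn / yn)
rhs = math.log(x0 / (1.0 - x0)) + sum(math.log(a / b) for a, b in factors)
print(lhs, rhs)  # equal up to floating-point rounding
```

The re-closing denominators never appear in `rhs`, which is exactly the cancellation used in the derivation.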

That is, \( \log(x_{n}/y_{n}) \) is formed from the initial \( \log(x_{0}/y_{0}) \) plus the sum of these n effects. Assuming n is sufficiently large, and given certain mild regularity conditions, the Central Limit Theorem of probability theory predicts that the distribution of a family of observations like \( \log(x_{n}/y_{n}) \) will approximately follow the bell-shaped curve of the normal distribution. Denoting an observed proportion of X by x and that of the remainder Y by y, we have \( y = 1 - x \). The log-ratio value, u, for x is \( u = \log[x/(1 - x)] \). For a percentage, \( v = 100x \), we have \( u = \log[v/(100 - v)] \), which defines the transformation applied to all element concentrations in the statistical analysis. Assuming all such values x are the results of a continuous history of natural perturbations, we treat the log-ratios as normally distributed, or approximately so. In any event, the t statistic based on that assumption is robust to moderate departures from normality.
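The transformation \( u = \log[v/(100 - v)] \) and its inverse are short enough to write directly. In this Python sketch the natural logarithm is assumed (the base is not stated in the text, and a different base only rescales u), the function names are hypothetical, and concentrations reported in ppm are first converted to percentages (1 ppm = 0.0001%).

```python
import math

def log_ratio_pct(v):
    """u = log[v / (100 - v)] for a percentage v with 0 < v < 100 (natural log)."""
    if not 0.0 < v < 100.0:
        raise ValueError("percentage must lie strictly between 0 and 100")
    return math.log(v / (100.0 - v))

def pct_from_log_ratio(u):
    """Inverse transform: v = 100 / (1 + exp(-u))."""
    return 100.0 / (1.0 + math.exp(-u))

def ppm_to_pct(ppm):
    """Convert ppm to percent (1 ppm = 0.0001 %)."""
    return ppm / 1.0e4

# Hypothetical values: 3.7 % of a major element, 45 ppm of a trace element.
u_major = log_ratio_pct(3.7)
u_trace = log_ratio_pct(ppm_to_pct(45.0))
print(u_major, u_trace, pct_from_log_ratio(u_major))
```

Because the remainder 100 − v is dominated by the residue, trace elements map to large negative values of u, and 50% maps to u = 0.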

The linear model and the t statistic

The linear model adopted for the seven regional means of the log-ratios of each element is as follows. Denoting the mean log-ratio of an element in region i by \( \mu_{i} ,\,i = 1, 2, \ldots , 7, \) the pooled mean μ for the element is defined to be \( \mu = \left( \mu_{1} + \mu_{2} + \cdots + \mu_{7} \right)/7 \). By setting each regional mean to \( \mu_{i} = \mu + \alpha_{i} \), each \( \alpha_{i} \) measures the deviation of \( \mu_{i} \) from the pooled mean μ. Adding these seven equations, it follows necessarily that \( \alpha_{1} + \alpha_{2} + \cdots + \alpha_{7} = 0 \). This is a sum-to-zero constraint. A further assumption is that the variability in all the measurements of an individual element can be accounted for by a constant variance, \( \sigma^{2} \) (the homoscedastic assumption). The parameters \( \mu ,\alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{7} \) are estimated by ordinary least squares regression, such that the estimates all obey the equations above. Consequently, unless they are all zero, the estimates of the deviations must include both positive and negative values, indicating abundances above and below the pooled mean. If \( \hat{a}_{i} \) denotes the estimate of each \( \alpha_{i} \), then a t statistic, \( T_{i} = \left( \hat{a}_{i} - \alpha_{i} \right)/s_{i} \), is defined for any given \( \alpha_{i} \), where \( s_{i} \) is the so-called standard error of the estimate. Assuming an underlying normal distribution of log-ratios, the expected value of \( T_{i} \) is zero, but its observed value, which varies with \( \hat{a}_{i} \), is assessed according to whether it falls inside or outside a range of probable values for the t statistic. In this study, it was assumed (the null hypothesis) that each \( \alpha_{i} = 0 \), implying that for each element the regional averages \( \mu_{i} \) were all equal to the pooled average μ. When \( \alpha_{i} = 0 \), the expression for the observed t statistic becomes \( T_{i} = \left( \hat{a}_{i} - 0 \right)/s_{i} \), or \( T_{i} = \hat{a}_{i}/s_{i} ,\; i = 1, 2, \ldots , 7 \).
It follows that an extremely large positive or negative value of \( \hat{a}_{i} \) will result in a large positive or negative value of \( T_{i} \). In that case, the assumption that the true deviation \( \alpha_{i} = 0 \) becomes improbable. Representative two-tailed probabilities for the t statistic with 52 (= 59 − 7) degrees of freedom are set out in Table 2. For example, the second column of Table 2 defines the 5% “significance level” for this study. A frequentist interpretation of Prob(|T| > 2) ≈ 5% is that about 95% of values of the t statistic lie between −2 and +2, so an observed value outside that range is “unlikely”, with roughly a 1 in 20 chance. A 5% significance level is conventional in many fields and is therefore included here. Tail probabilities are very close to zero for |T| > 4.5; that is, to two decimal places, 100.00% of the values of T lie in the range −4.5 to 4.5. Observed values outside that range are extremely rare unless \( |\alpha_{i}| \gg 0 \), indicating a large deviation from the pooled mean.
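Two-tailed probabilities of the kind set out in Table 2 can be computed without printed tables. This Python sketch (standard library only) uses the identity \( \Pr(|T| > t) = I_{\nu/(\nu + t^{2})}(\nu/2, 1/2) \) for Student's t with ν degrees of freedom, evaluating the regularized incomplete beta function I by the usual continued-fraction (modified Lentz) method. The function names are hypothetical; in practice a statistics package would normally be used instead.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_tailed(t, df):
    """Prob(|T| > t) for Student's t distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

for t in (1.0, 2.0, 2.5, 3.0, 4.5):
    print(f"Prob(|T| > {t}) with 52 df = {t_two_tailed(t, 52):.5f}")
```

For 52 degrees of freedom the 5% critical value is about 2.007, which is why |T| = 2 corresponds to a tail probability of approximately, rather than exactly, 5%.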

Statistical notes

In the equations below, \( \mathbf{y} \) is the 59 × 1 column vector of the log-ratios of the element concentrations, \( \mathbf{X} \) is the 59 × 7 incidence (or design) matrix, and \( \mathbf{b} \) is the 7 × 1 vector of estimates for μ and six of the \( \alpha_{i} \) (the sum-to-zero constraint determines the seventh), given by \( \mathbf{b} = \left( \mathbf{X}^{\text{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\text{T}} \mathbf{y} \); \( \left( \mathbf{y} - \mathbf{Xb} \right) \) is the 59 × 1 vector of residuals.

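The estimate for \( \sigma^{2} \) is given by \( s^{2} = \frac{1}{n - p}\left[ \left( \mathbf{y} - \mathbf{Xb} \right)^{\text{T}} \left( \mathbf{y} - \mathbf{Xb} \right) \right] \), where n = 59 and p = 7. The diagonal element of the 7 × 7 matrix \( \left( \mathbf{X}^{\text{T}} \mathbf{X} \right)^{-1} \) corresponding to \( \hat{a}_{i} \) is denoted by \( k_{i} \), and so the standard error of the estimate associated with region i is given by \( s_{i} = s\sqrt{k_{i}} \). For the seventh region, whose deviation is fixed by the constraint as \( \hat{a}_{7} = -(\hat{a}_{1} + \cdots + \hat{a}_{6}) \), the variance is the sum of all elements of the 6 × 6 block of \( s^{2}\left( \mathbf{X}^{\text{T}} \mathbf{X} \right)^{-1} \) belonging to \( \hat{a}_{1}, \ldots, \hat{a}_{6} \).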
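The estimation steps in these notes can be sketched in plain Python. This is an illustrative implementation, not the author's code: it assumes effect (sum-to-zero) coding of the 59 × 7 design matrix, with the seventh region coded −1 in every deviation column, and it uses synthetic log-ratios for hypothetical region sizes rather than the study data. It returns \( \hat{\mu} \), the seven \( \hat{a}_{i} \), their standard errors \( s_{i} \) (including the constrained seventh one), and \( T_{i} = \hat{a}_{i}/s_{i} \).

```python
import math
import random

def design_row(region, k=7):
    """Effect-coded row: intercept, then k-1 deviation columns.
    Regions 0..k-2 get a 1 in their own column; region k-1 gets -1 in all of them."""
    row = [1.0] + [0.0] * (k - 1)
    if region < k - 1:
        row[1 + region] = 1.0
    else:
        row[1:] = [-1.0] * (k - 1)
    return row

def invert(m):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(r) + [1.0 if i == j else 0.0 for j in range(n)] for i, r in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def fit_regional_deviations(y, regions, k=7):
    """OLS fit of y = mu + alpha_region + error subject to sum(alpha) = 0."""
    X = [design_row(g, k) for g in regions]
    n, p = len(y), k
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    K = invert(XtX)                                      # (X^T X)^{-1}
    b = [sum(K[i][j] * Xty[j] for j in range(p)) for i in range(p)]
    resid = [y[r] - sum(X[r][j] * b[j] for j in range(p)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)             # estimate of sigma^2
    mu = b[0]
    alphas = b[1:] + [-sum(b[1:])]                       # 7th deviation from the constraint
    se = [math.sqrt(s2 * K[i][i]) for i in range(1, p)]  # s_i = s * sqrt(k_i)
    # Var(-(a_1 + ... + a_6)) = s^2 * (sum of the 6 x 6 block of (X^T X)^{-1})
    se.append(math.sqrt(s2 * sum(K[i][j] for i in range(1, p) for j in range(1, p))))
    t = [a / s for a, s in zip(alphas, se)]
    return mu, alphas, se, t, s2

# Synthetic example: hypothetical region sizes summing to 59, random log-ratios.
sizes = [9, 8, 9, 8, 9, 8, 8]
regions = [g for g, m in enumerate(sizes) for _ in range(m)]
random.seed(42)
y = [random.gauss(-3.0 + 0.2 * g, 0.5) for g in regions]

mu, alphas, se, t, s2 = fit_regional_deviations(y, regions)
for i, (a, s, ti) in enumerate(zip(alphas, se, t), start=1):
    print(f"region {i}: a_hat = {a:+.3f}, s_i = {s:.3f}, T_i = {ti:+.2f}")
```

With this coding, \( \hat{\mu} \) equals the unweighted average of the seven regional means and each \( \hat{a}_{i} \) equals the regional mean minus \( \hat{\mu} \), which matches the definitions of μ and \( \alpha_{i} \) in the linear model.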

About this article

Cite this article

Renner, R.M. Statistical comparisons of heavy metal pollutants between seven regions of the Polish exclusive economic zone. Environ Earth Sci 67, 987–997 (2012). https://doi.org/10.1007/s12665-012-1542-1

  • DOI: https://doi.org/10.1007/s12665-012-1542-1
