Introduction

One of the main challenges in geochemical data analysis is to describe and model time–space changes in the composition of environmental media (e.g., water, soil and stream sediment) and to link them to the dynamics of natural or anthropic drivers. Compositional changes are often driven by thermodynamic and kinetical laws and can modify a given natural matrix in a transient or permanent way (Bethke, 2008). In this framework, most natural systems are open and classified as dissipative. This means they contain subsystems that permanently fluctuate, in an apparently stationary way, until fluctuations become strong enough to move the system to a new state, often unexpectedly (Kondepudi & Prigogine, 1998; Shvartsev, 2009; Roduner & Radhakrishnan, 2016). Earth history has been marked by occasional and unexpected shifts in some characteristics, for example, temperature and climatic conditions, able to generate alternative states characterized by dynamic regimes at different spatial and temporal scales (Kleidon, 2010; Heinze et al., 2021; Li et al., 2021). Gradual changes in some underlying conditions (drivers) can bring a system closer to a catastrophic bifurcation point (tipping point) causing a loss of resilience. Even small perturbations may therefore trigger an abrupt shift to an alternative system state. In this context, fractal structures in time and space offer an optimal nonlinear means to dissipate energy gradients developing spontaneous emergent properties, as typical of complex dissipative systems (Seely & Macklem, 2012). Thus, the presence of scale-invariant structures are examples of how the increasing energy dissipation and entropy production create order, structure, and complexity, as demonstrated in spatial patterns of geochemical data (Zuo & Wang, 2016). The identification of early warning signals (or leading indicators) represents a fundamental challenge to limit restoration costs or irreparable damages to ecosystems and to understand if a critical threshold is being approached in a gradual or a critical (sudden) way (Weinan et al., 2021). Several investigations have been undertaken on the mathematical properties of such indicators and on the identification of tools able to highlight the governing (additive or multiplicative) dynamics of a system (van Rooij et al., 2013). The presence of modularity and heterogeneity appears to favor adaptive capacity and gradual changes, while connectivity and homogeneity increase the resistance to change, thus opening the path to sudden critical transitions (Scheffer et al., 2012). It is evident that the nature of the interconnections among the parts of a system is fundamental to understanding its behavior. In our case, the parts that can interact, simultaneously or at a different hierarchical scale, are the single variables of the composition and the whole is the composition itself, a condition replicated for all geological samples. However, when geochemical data are analyzed by using several mathematical and statistical tools, it is important to recognize that concentrations are relative data (%, ppm, mg/L and similar measurement units) and that the investigation of the variance–covariance structure of the whole cannot ignore the presence of induced relationships (Chayes, 1960). In fact, it is not sufficient to apply some multivariate procedure to capture the joint behavior of the whole, since compositional data do not pertain to a Euclidean space. Implications of using an incorrect sample space are evident especially when comparing confidence regions in a constrained space with the real ones (Buccianti & Magli, 2011). Hence, it is necessary to consider the principles of the compositional data analysis (CoDA, Aitchison 1982) and work in the simplex sample space by using its algebraic structure (the so-called “stay in the simplex approach”) or transforming data to move variables from the simplex to the Real space, (“log-ratio approach”). In this research, the “stay in the simplex approach” was adopted by focusing on the use of the perturbation operator to investigate compositional changes from source to mouth in surficial waters and stream sediments of the Tiber river (central Italy). Measuring the “nearness” or “farness” between two compositions requires the use of a distance/difference measure, and the results may depend on how this distance is measured. For this reason, testing the performance of different statistical methods to trace the evolution of a geochemical composition is of critical importance. Starting from a \(n \times p\) data matrix (n number of cases, p number of variables or parts) unclosed perturbation factors were estimated considering the difference with respect to a selected reference composition, thus obtaining a new \(n\times p\) matrix. The cumulative sum of these factors for each composition (n row sums) was then compared with n values of the robust Mahalanobis distance (RMD), calculated in a compositional context, and with n geometric means. The latter, representing the compositional barycenter of the composition (row barycenter), assumes in CoDA a central role in the log-ratio approach, tracing the interactions among the parts inside each composition (Pawlowsky-Glahn et al., 2015). The goal of this research is to assess the potential of these techniques for their applications to environmental monitoring, with special attention to the role of the geometric mean as a potential quick monitoring index.

Materials and Methods

Overview on the Tiber River and Its Catchment

The Tiber river was chosen as a study area due to the extensive knowledge acquired from previous studies (Gozzi et al., 2019) on the different sources of riverine variability (i.e., natural, seasonal, anthropogenic). A deep understanding of the catchment processes is important in order to evaluate the performance of the tested methods. The Tiber river is the second-largest Italian river in terms of catchment size (ca. 17,375 km2), second only to the Po river. It contributes, with a mean annual discharge of about 240 m3/s, to almost 20% of the riverine inputs into the Tyrrhenian Sea. The basin has a mean altitude of 520 m and a climate regime from sub-coastal to marine with a mean annual precipitation of about 1200 mm (Bagnini et al., 2005). During its 409 km course from the Mt. Fumaiolo to the sea, the Tiber river is fed by several major tributaries: Paglia-Chiani and Treia on the right bank, Chiascio-Topino, Nera-Velino and Aniene on the left (Fig. 1).

Figure 1
figure 1

Location of the sampling points along the Tiber river together with a simplified scheme depicting the geology and water circulation in the catchment

A torrential and turbulent regime characterizes the upper stretch of the river, whereas a slow and meandering flow dominates after the confluence with the Nera. The natural variability of the catchment is linked to its heterogeneous morphological, geological and hydrogeological setting which defines sub-basins with important differences in terms of lithology, discharge rate, summer base flow and winter run-off (Boni et al., 1986). The upper basin is dominated by impermeable flysch sediments and surface run-off processes prevail over infiltration, leading to an irregular outflow regime (Fig. 1). Conversely, the SE sector is characterized by Apennine limestones which ensure high infiltration and a more stable outflow with reduced seasonal variations. Potassic and ultrapotassic volcanic rocks prevail to the SW, with intermediate hydrogeological features compared to the previous ones.

Anthropic pressures are significantly high due to the presence of important urban centers (e.g., Rome), industrial activities and diffuse pollution from agricultural areas, representing 53% of the total land use. The discharge distribution of the Tiber river has been significantly impacted by the construction of multiple hydropower dams and weirs (Fig. 1). A decrease in the amount of suspended load is being observed over the last decades (Iadanza & Napolitano, 2006) together with a reduction of sediment supply to the coast. This is strongly contributing to the erosion of the beach at Ostia, the coastal environs of Rome (Ruggiero, 2020).

Geochemical Dataset

The geochemical dataset consists of: (i) 38 water samples collected from 19 sampling sites along the river course in different hydrological conditions; and, (ii) 17 stream sediments taken from the river bank at the same locations, except for two points at which it was not possible to collect the samples due to safety reasons (Fig. 1). Water samples refer, with a good approximation, to high and low flow rate conditions, even if distributed over two years (2017–2018), as highlighted by Gozzi et al. (2021). All major cations and ions were considered for a total of 10 chemical variables with concentrations expressed in mg/L. For stream sediments, 8 major and 8 trace elements were selected for their high variability and environmental interest. Concentrations were reported in % and ppm for major and trace elements, respectively. In CoDA, it is a good practice to consider all parts of the composition in the same units if a joint analysis is planned. In our case, major and trace elements were analyzed separately; therefore, the use of common measurement units is not compulsory. The only two values below the lower detection limit in the water dataset were replaced by 2/3rds of the detection limit, while none are found for the stream sediment dataset. In the presence of a significant number of values below the detection limit, the use of imputation methods such as those described by Templ et al. (2016) is suggested. More detailed information about sampling and analytical methods is reported in Gozzi (2020).

Computational Methods

Perturbation Operator

Since the sample space associated with a D-part composition is the unit simplex:

$$\begin{aligned} S^D = \{{[x_1,\ldots ,x_D)]: x_i > 0\;(i = 1,\ldots ,D),\; x_1+\ldots +x_D = 1}\} \end{aligned}$$
(1)

in order to trace compositional changes, the perturbation is the group operation able to counterpart translation in real space (classical Euclidean statistics) and rotation on the sphere (circular statistics).

Given two D-part compositions \(\mathbf {x}, \mathbf {y} \in S^D\), \(\mathbf {x} \oplus \mathbf {y}\) is defined by:

$$\begin{aligned} \mathbf {x} \oplus \mathbf {y} = \frac{[x_1y_1,\ldots ,x_Dy_D]}{(x_1y_1+\ldots +x_Dy_D)}=C [x_1y_1,\ldots ,x_Dy_D] \end{aligned}$$
(2)

where the “closure” operator C standardizes the perturbation vector by dividing by the sum of its components so that the sum is equal to unity. Following Aitchison and Ng (2005), the operation \(\oplus\) defines an Abelian group on the simplex with identity \(e=(1/D)[1,\ldots ,1]\) and inverse \(x^{-1}=C[x_1,\ldots ,x_D]\). This mathematical property is fundamental to characterize the operation that changes a D-part composition \(\mathbf {x}\) into a D-part composition \(\mathbf {y}\), thus answering the geochemical question: what is the perturbation \(\mathbf {p}\) such that a composition \(\mathbf {x}\) is transformed, by the action of several environmental and anthropic factors, in a composition \(\mathbf {y}\)? The answer is given by the inverse operation, also called perturbation difference by analogy with standard operations in real space (Pawlowsky-Glahn et al., 2015):

$$\begin{aligned} \mathbf {p} = \mathbf {y} \ominus \mathbf {x}= C\left[ \frac{y_1}{x_1},\ldots , \frac{y_D}{x_D}\right] \end{aligned}$$
(3)

The perturbation, together with the powering operation, analogous to scalar multiplication in real space, both defined in \(S^D\), satisfies the requirements for operations of a vector space for which an inner product, a norm and a distance can also be defined (Billheimer et al., 2001; Pawlowsky-Glahn & Egozcue, 2001). Consequently, any measure of difference between compositions \(\mathbf {x}\) and \(\mathbf {y}\) must be expressible in terms of perturbations. In geochemistry, the perturbation approach finds easy application where an initial specimen of composition \(\mathbf {x}_{\mathbf {0}}\) is subjected to a sequence of perturbations \(\mathbf {p}_{\mathbf {1}},\ldots ,\mathbf {p}_{\mathbf {n}}\) to reaching its current state \(\mathbf {x}_{\mathbf {n}}\). For example, compositional changes can be traced for river water and stream sediment chemistry starting from the source to the sink following the landscape morphology. The results can be interpreted in the light of the dynamics of geochemical processes, thus discovering areas experiencing a major rate of change or more resilient zones. The outcomes could be useful for both monitoring environmental changes, in the light of climate changes modifying erosion rates and hydrological cycle, and applied environmental geochemical studies. Perturbations may be computed sequentially after ranking the data according to a spatiotemporal sequence or a physical-chemical criterion (Gozzi et al., 2020; Sauro Graziano et al., 2020). Alternatively, they can be determined with respect to a reference composition with cases previously ordered following some criteria (e.g., spring distance, conductivity, elevation, or other landscape parameters). Common examples of reference compositions are those of spring water for rivers or the average Earth crust for stream sediments.

Perturbation can be represented as: (i) a composition expressed in percentage (%), as in Eq.  3; (ii) multiplicative non-closed factors by considering ratios of Eq.  3 without the closure application; and (iii) percentage of increase\decrease. A detailed description of the use of these modalities is presented in Egozcue and Pawlowsky-Glahn (2011).

In this work, perturbations are represented as multiplicative non-closed factors (Pfs) by considering the ratios \(\left[ \frac{y_1}{x_1},\ldots , \frac{y_D}{x_D}\right]\) of Eq.  3 without applying the closure operator C. For every factor, \({x_i}\) represents each term of the reference composition, the choice of which is discussed in Section “The Reference Composition: a Central Issue”, while \({y_i}\) the terms of the sample composition from the ordered sequence along the river course. Since Pfs take into account the ratios \({y_i}/ x_i\) for \(i=1,\ldots ,D\), the closer the latter are to 1, for each D-part, the more alike the two compositions are. On the contrary, if a ratio is \(\gg 1\) or \(\ll 1\), this indicates, respectively, an excess or a diminution of the involved variable compared to the benchmark. The resulting Pfs, representing the contribution of every single term of the composition with respect to the reference, are reported with the letter “p” followed by the considered chemical component (e.g., \(\text{p}\mathrm {HCO}_{3}^{-}\), \(\text{p}\mathrm {Ca}^{2+}\)). These compositional deviations monitored by the Pfs might be the sentinels of major environmental modifications or stressors. For this reason, Pfs were compared with the variations of potential drivers within the corresponding catchment (e.g., lithotypes and land-use). The latter were derived from Copernicus Land Monitoring Service (2019) and ISPRA Ambiente (2017), respectively, and estimated according to the procedure described in Gozzi et al. (2021). Lithotypes and land use changes are here expressed in terms of drainage area differences between each point and the previous one along the source-to-mouth sequence.

Ultimately, all Pfs of a D-part composition were summed for each row j of the data matrix as in Eq.  4 and reported in the text as \(\sum\)Pfs.

$$\begin{aligned} \sum _{i=1}^D Pf_j = p_{j1}+p_{j2}+\ldots +p_{jD} \quad \text{ for }\quad j=1,\ldots ,n \end{aligned}$$
(4)

This operation was done in order to obtain a comprehensive measure to trace the downstream evolution of the chemical composition of water and stream sediment samples. Pfs are here reported as multiplicative non-closed factors and their denominators do not contain common parts. Therefore, they are positive real numbers and the sum by row of these factors could be considered a proper operation in the positive Real space. This quantity, representing an interesting cumulative information about the compositional change, will be compared with the other tested measures illustrated in the following Section.

Comparative Methods: Mahalanobis Distance and Geometric Mean

The Mahalanobis distance is a distance measure which accounts for the variance–covariance matrix of the data (Mahalanobis, 1936). Its robust version calculated through the minimum covariance determinant (MCD) estimator (Rousseeuw & Van Driessen, 1999; Filzmoser & Hron, 2008; Filzmoser et al., 2012) is defined as:

$$\begin{aligned} \mathrm {RMD}_{i}=\sqrt{(\mathbf {x}_i-\varvec{\mu }_{\mathrm{MCD}}) \varvec{\Sigma }^{-1}_{\mathrm{MCD}}(\mathbf {x}_i-\varvec{\mu }_{\mathrm{MCD}})^\top } \quad \text{ for }\quad i=1,\ldots ,n \end{aligned}$$
(5)

where \(\mathbf{x} _1,\ldots ,\mathbf{x} _n\) are the observations in the p-dimensional space having center \(\varvec{\mu }_{\mathrm{MCD}}\) and covariance \(\varvec{\Sigma }_{\mathrm{MCD}}\), these last calculated from the sample data \({\overline{\mathbf {x}}}\) and \(\mathbf {s}\), respectively. To reliably estimate the location and scatter from which distances are computed, a certain number of compositions is required, and a single reference sample (or case) is not sufficient. This number depends on how many variables and cases are present in the dataset. Distances were computed in R with the package robustbase (Maechler et al., 2018) after having transformed the data in pivot coordinates (Filzmoser et al., 2018), a special case of the isometric log-ratio coordinates, calculated by means of the function pivotCoord within the R package robCompositions (Templ et al., 2011). Therefore, \(\varvec{\mu }_{\mathrm{MCD}}\), distances to \(\varvec{\mu }_{\mathrm{MCD}}\) and \(\varvec{\Sigma }_{\mathrm{MCD}}\) are here estimated from the pivot coordinates.

The geometric mean may represent an alternative measure of location to the mean when data of a given geochemical variable are skewed and near-lognormal, but it is equally susceptible to outliers (Rock 1988). As pointed out by Reimann and Filzmoser (2000), the geometric mean can be used, but it may be problematic in some cases. Rock (1988) highlights that its use requires caution since it vanishes or becomes imaginary in the event of zero or negative values in the data. This occurs in geochemistry with all standardized data, values lower than a detection limit or with real negative data (e.g., stable isotopes values). In CoDA, the geometric mean is well known since it enters the calculation of the commonly applied centered and isometric log-ratios transformations (Aitchison, 1986; Egozcue et al., 2003). In this framework, it is usually calculated not for the values of a given variable but for a composition or its subsets; in other words, the calculus is performed on the rows of a data matrix. The geometric mean is also used to describe, with bar plots, differences between groups in compositional datasets (Martín-Fernández et al., 2015). To summarize, when computed for data belonging to the same variable, it could measure, under some assumptions, the univariate barycenter of the values. However, the geometric mean \(\mathrm {g}(\mathbf {x}\)) can also be computed in a multivariate framework for a D-part chemical composition \(\mathbf {x} =x_{1},x_{2},\ldots ,x_{D}\), as follows:

$$\begin{aligned} \mathrm {g}(\mathbf {x}) = \root D \of {x_{1}\cdot x_{2} \cdots x_{D}}. \end{aligned}$$
(6)

In this case, \(\mathrm {g}(\mathbf {x}\)) is calculated for the entire composition of each sample from data pertaining to different chemical variables. What does this measure represent from a geochemical perspective if calculated in a multivariate context? Similarly, it is expected that it will measure the compositional barycenter of the entire composition. The main difference is that, in this case, the barycenter is not univariate but multivariate and the effects of multiple variables are taken into consideration altogether. To better answer this question, the geometric mean of water and sediment compositions was compared with both \(\sum\)Pfs and RMD. This was done with the aim of gaining a greater understanding of its geochemical meaning, its relation with single variable fluctuations and with other multivariate measures.

The consistency of the three different approaches, \(\mathrm {g}(\mathbf {x}\)), \(\sum\)Pfs and RMD, was cross-checked, after having removed outliers, by means of H-scatterplots (at lag 0) created using the R package: Astsa (Stoffer, 2021). All the considered measures represent potential tools to monitor the spatial or temporal evolution of a composition, taking into consideration the interaction among its constituent parts. By comparing these measures, we expect to verify if all these quantities equally capture the fundamental compositional changes downriver and, if differences are found, the potential influences will be examined.

The Reference Composition: A Central Issue

The choice of the reference composition is critical since it influences the nature and magnitude of the monitored changes. To capture all variations, the objective is to identify a suitably homogeneous composition not too far from the environmental conditions of the target samples. As concerns rivers, the spring water composition is often considered as a benchmark to detect downstream hydro-geochemical variations linked to water–rock interaction processes as well as various anthropogenic pressures. However, its composition might significantly differ from that of streams since it flows directly from an aquifer, not being affected by the multiple interactions with the surficial environment. This is the case of the Tiber source which, with a total dissolved solids (TDS) of about 250 mg/L, is compositionally inhomogeneous. This is mainly due to the high and low relative concentrations of \(\mathrm {NO}_{3}^{-}\) and \(\mathrm {K}^{+}\), respectively. For this reason, the mean composition of 14 samples collected over the two-year period of study from the upper reaches of the catchment and having the lowest TDS (310–510 mg/L) was chosen as a reference from which Pfs are calculated. These include 3 samples collected from the main course and 11 from tributaries, characterized by a low variability and absence of outlying observations. In the presence of outliers and therefore a significant difference between the mean and median value, the use of the latter as estimator for the reference is recommended. The resulting composition represents a local pristine surficial water of calcium-bicarbonate type with a low content of the other ions (Table 1a).

Table 1 (a) Reference water composition from which unclosed perturbation factors are calculated; summary statistics of unclosed perturbation factors for each term of the composition in flood (b) and drought (c) periods. Median values \(<1\), \(>1\), \(>1.5\) and \(>2\) are highlighted in green, yellow, orange and red colors, respectively

The same waters were used to estimate the robust compositional center and covariance matrix from which RMDs were calculated (Gozzi et al., 2019). The results measure how each observation differs from the benchmark condition, taking into account the multivariate information. In this way, Pfs and RMDs were computed starting from common reference compositions, thus helping to compare the potentials of the two methods in disclosing compositional changes.

A similar approach was used for stream sediments by considering 15 homogeneous and uncontaminated samples, 3 from the main course and 12 from tributaries. The final reference compositions for major and trace elements are reported in Tables 2a and 3a.

Table 2 (a) Reference stream sediment composition for major elements from which unclosed perturbation factors are calculated; (b) summary statistics of the unclosed perturbation factors for each element of the composition
Table 3 (a) Reference stream sediment composition for trace elements from which unclosed perturbation factors are calculated; (b) summary statistics of unclosed perturbation factors for each element of the composition

Results

Compositional Perturbations in River Water

The medians of the Pfs exceed 1 for most of the variables, indicating an increased ion content with respect to the reference (Table 1b–c). The sole exceptions are \(\text{p}\mathrm {Ca}^{2+}\), \(\text{p}\mathrm {HCO}_{3}^{-}\) and \(\text{p}\mathrm {Mg}^{2+}\) showing medians ranging from 0.83 to 0.94. The higher values are found for \(\text{p}\mathrm {NO}_{3}^{-}\) in flood periods (3.46) and p\(\mathrm {F}^{-}\), p\(\mathrm {Cl}^-\), p\(\mathrm {SO}_{4}^{2-}\) in dry conditions (2.39, 1.95 and 1.92, respectively). Pfs’ variability significantly increases in the season of minimum flow as well as the skewness of the distributions, with p\(\mathrm {NH}_{4}^{+}\) showing the highest maximum value. The curves represented in Figure 2a, c highlight the evolution of \(\text{p}\mathrm {NO}_{3}^{-}\) and p\(\mathrm {NH}_{4}^{+}\) from the source to the mouth. These species are handled separately from the rest of the composition owing to their large range and link to anthropogenic processes.

Figure 2
figure 2

Unclosed perturbation factors (Pfs), ordered from source to mouth, for (a) \(\mathrm {NO}_{3}^{-}\) and (c) \(\mathrm {NH}_{4}^{+}\) in the Tiber river waters; the horizontal black dashed line marks the reference value. (b) Differences in drainage areas characterized by forests and seminatural areas, agricultural areas, artificial surfaces and (d) water bodies and pastures (Copernicus Land Monitoring Service 2019); the single area is calculated as difference between that of each point and the previous one along the sequence; the vertical gray lines indicate potential associations between the considered Pfs and land use variations

After a peak at the spring, \(\text{p}\mathrm {NO}_{3}^{-}\) starts to significantly increase at a distance of 57 km. High perturbations are monitored down to 57 km with values up to 9.5 reached in the dry period. At 270 km, when the Nera-Velino river system enters the main course, Pfs decrease, reaching a value similar to the reference in summer. Comparing the increments of drainage area referring to different land uses (Fig. 2b) with \(\text{p}\mathrm {NO}_{3}^{-}\) fluctuations, several matches are noticed especially with forests and agricultural areas. On the contrary, Pfs of \(\mathrm {NH}_{4}^{+}\) are overall more restrained (Fig. 2c). High values occur mainly in the upper river stretch and at 203 km, downstream from the Corbara reservoir. At this place, a great spike shows up achieving a value of 17. A decrease is also found in this case at 270 km, after which Pfs remain below the benchmark until the river mouth. Potential associations of these changes with the increasing area covered by pastures and water bodies are inferred from Figure 2d.

The curves depicting the downstream evolution of \(\text{p}\mathrm {Ca}^{2+}\), \(\text{p}\mathrm {HCO}_{3}^{-}\) and \(\text{p}\mathrm {Mg}^{2+}\) confirmed their stable pattern with low changes with respect to the reference in both the sampling conditions (Fig. 3a, b). Conversely, the Pfs of the other species show a higher variability downstream from 221 km. In the upper-middle course, these variables do not contribute as much to the compositional change. Minor perturbation increments in this area are mainly due to \(\mathrm {K}^{+}\) and \(\mathrm {F}^{-}\). Values much lower than 1 are detected at the source. This fact, together with the high \(\text{p}\mathrm {NO}_{3}^{-}\) (Fig. 2a), confirmed the spring to be compositionally far from the main river. Corresponding to the input from the Paglia-Chiani river system at 221 km, important deviations take place primarily in terms of \(\mathrm {K}^{+}\), \(\mathrm {F}^{-}\) and \(\mathrm {SO}_{4}^{2-}\) with increased perturbations in the dry period. Pfs of \(\mathrm {Cl}^-\) and \(\mathrm {Na}^{+}\) are affected as well, but higher values are recorded downriver at 270 km, followed by a further fluctuation at 327 and a peak to the mouth solely in summer (Fig. 3a, b). The shifting points in the river water match fairly well the different drained lithotypes (Fig. 3c). In fact, the major compositional changes appear associated with a greater lithological complexity marked by the overlapping fluctuations of Fig. 3c (e.g., 221 and 270 km). Particularly noteworthy is the perfect correspondence of p\(\mathrm {F}^{-}\) with the increasing areas underlain by the volcanic rocks.

Figure 3
figure 3

Unclosed perturbation factors (Pfs), ordered from source to mouth, for major elements in the Tiber river waters in (a) drought and (b) flood conditions; the horizontal black dashed line marks the reference value. (c) Differences in drainage areas characterized by different classes of lithotypes (ISPRA Ambiente 2017); the single area is calculated as difference between that of each point and the previous one along the sequence; the vertical gray lines indicate potential associations between Pfs and lithological variations

Compositional Perturbations in Stream Sediments

Major Elements

The medians of the Pfs for major elements slightly exceed 1 for most of the variables (Table 2b). Values lower than 1 are only found for pCa and pP (0.81 and 0.88, respectively). Overall, the medians are smaller compared to those of river waters, with no values over 1.3. Similarly, data variability is also reduced, with the highest range belonging to pNa and pMg (Table 2b). Unlike the Pfs of the waters, those of stream sediments are characterized by multiple fluctuations along almost the whole river course (Fig. 4a).

Figure 4
figure 4

Unclosed perturbation factors (Pfs), ordered from source to sink, for major (a) and trace elements (b) in the Tiber river sediments; the horizontal black dashed line marks the reference value. (c) Differences in drainage areas characterized by different classes of lithotypes; the single area is calculated as difference between that of each point and the previous one along the sequence; the vertical gray lines indicate potential associations between Pfs and lithological variations

Notable increments of Mg, Ca and Fe with respect to the reference affect the primary sediments supplied from headwater streams. Downstream, the most relevant perturbation mainly involves Na, which almost matches the increment of drainage area featured by flysch sediments. As the lithological heterogeneity of the watershed increases, the Pfs of multiple elements show peaks at selected locations. The main increases are monitored for: (i) pMn at 203 km; (ii) pMg, pFe and pK at 221 km; (iii) pNa at 270 km; (iv) pFe, pK and pMn at 303 km, and (v) pP from 366 km to the river mouth. The increments of pFe and pK appear well related to the presence of volcanic rocks (Fig. 4c). Moreover, it is worth noting that most of the perturbations are not maintained downstream but instead appear sourced locally, not showing any increasing or decreasing pattern along the river course. Only pNa and pFe seem to partially preserve their perturbation rate in the early-medium and lower river stretch, respectively.

Trace Elements

Pfs of trace elements are generally lower compared to the major ones, showing medians below 1 for all elements except for Ba, Rb and Cr (Table 3b). The highest range is detected for pCr, which exhibits a significant increase from 57 to 104 km, where it reaches a maximum value of 3.29 (Fig. 4b). The pPb shows the second widest range, with four great spikes at 15, 135, 221 and 366 km. Further notable deviations from the reference are detected for: (i) pSr between 14–26 km, (ii) pCu at 42 km; and (iii) pNi at 162 km. In the lower reaches, Rb and Cr show higher perturbations in correspondence of volcanic lithotypes, thus following the same behavior as pFe and pK. Overall, no trends are found during the source to sink propagation but rather trace element perturbations seem limited to short-range extents. Additionally, unlike major elements, trace element Pfs seem to flatten from 245 km to the river mouth.

Comparison among Monitoring Indices

River Waters

The comparison between the \(\sum\)Pfs with the other methods (i.e., the row geometric mean and the RMD) reveals, for the water chemistry, a good consistency between the different indices (Fig. 5).

Figure 5
figure 5

Comparison between the geometric mean of the composition \((\text{g}(\mathbf {x}))\), the sum of unclosed perturbation factors \((\sum\text{Pfs})\) and the robust Mahalanobis distance (RMD) of the river water composition in flood (a–c) and drought periods (d–f). The profiles are fitted using the smoothing cubic spline function, and predicted values are depicted with rainbow colors for an easier comparison among the same sections of the curves. The black dots represent the actual values

The main points of compositional changes along the river course are well exhibited in all cases, with only a few differences. During the high-flow period, these crucial points are located in the upper river course at 15 and 42 km, and in the lower stretch at 221 and 270 km, with an additional variation at 327, only identified by the RMD. The green sections of the curves (79–185 km) are generally characterized by the absence of significant compositional variations. In the dry period, the magnitude of these changes increases with a more relevant variation detected between 327 and 366 km and spike at the river mouth. It is worth noting that the \(\sum\)Pfs seems to emphasize the peak at 221 km, with respect to g(\(\mathbf {x})\) and RMD, which instead give more prominence to that located at 270 km.

The h-scatterplots represented in Figure 6a–f confirm the consistency of the methods showing moderately high cross-correlations values, ranging from a minimum of 0.72 to a maximum of 0.94.

Figure 6
figure 6

H-scatterplot at lag 0 relating each other the indices \((\text{g}(\mathbf {x})\), RMD and \(\sum\text{Pfs})\) of the water composition in flood (a–c) and drought (d–f) periods. The cross-correlation value is reported in the top-right legend, and the red line represents the lowess fit of each scatterplot

Stream Sediments

The statistical indices computed for major elements show common changing points (i.e., those at 15, 42, 221 and 366 km) with river waters (Fig. 7).

Figure 7
figure 7

Comparison between the geometric mean of the composition \((\text{g}(\mathbf {x}))\), the sum of unclosed perturbation factors \((\sum\text{Pfs})\) and the robust Mahalanobis distance (RMD) of the stream sediment composition for the major (a–c) and trace elements (d–f). The profiles are fitted using the smoothing cubic spline function, and predicted values are depicted with rainbow colors for an easier comparison among the same sections of the curves. The black dots represent the actual values

The major difference is the presence of an additional spike at 303 km and the absence of that at 270 km. The performance of the \(\sum\)Pfs and g(\(\mathbf {x})\) looks similar, as confirmed by a cross-correlations value of 0.88 (Fig. 8a).

Figure 8
figure 8

H-scatterplot at lag 0 relating each other the different indices \((\text{g}(\mathbf {x})\), RMD and \(\sum\text{Pfs})\) of the stream sediment composition for major (a–c) and trace elements (d–f). The cross-correlation value is reported in the top-right legend and the red line represents the lowess fit of each scatterplot

On the contrary, the RMD shows a different pattern, especially in the upper-middle course with displacement of some peaks and a lower correlation with the other indices. This inconsistency is even more pronounced if the trace element compositions are considered, with cross-correlations values close to zero (Fig. 8e–f). One of the main divergences of the RMD is the great spike at 104 km (Fig. 7f) not depicted by the other parameters. The \(\sum\)Pfs and g(\(\mathbf {x})\) of trace elements seem to have a similar behavior, even though the correlation index indicates a weaker association (0.45) between them. Their pattern also agrees with that of major elements, with the sole exception of a greater magnitude of the fluctuations and a more marked spike located around 135–162 km.

It is worth noting that pronounced changes in stream sediment composition and especially, those highlighted by \(\sum\)Pfs and g(\(\mathbf {x})\), occur most commonly downstream from dams (Fig. 1). This is verified for the sample at 42 km located downflow to the Montedoglio dam and similarly for those located after the Corbara (203 km) and Alviano (221 km) dams, respectively. These represent the largest artificial reservoirs along the river course, having a crucial role in curtailing sediment transport to the sea (Tacconi et al., 2020). The same occurs for stream sediment samples at 303 and 366 km, situated downstream of the Nazzano and Castel Giubileo weirs, respectively.

Discussion

Downstream Propagation of Geochemical Signals in the Tiber River

Pfs seem to track quite well the compositional fluctuations in the water chemistry from the source to the river mouth, with a good agreement with the results obtained in previous studies by means of different methods (Panichi et al., 2005; Gozzi et al., 2021; 2019). This is particularly useful when the focus is on the monitoring of a specific contaminant or toxic element. An example for the Tiber river is represented by the analysis of \(\text{p}\mathrm {NO}_{3}^{-}\) (Fig. 2a, b). The pressure of forests and agricultural areas appear to be responsible for its high values, by supplying \(\mathrm {NO}_{3}^{-}\) from anthropic and natural sources (i.e., synthetic fertilizers or NO2 from biological decay, respectively). The most prominent instances occur especially during the dry period, as also highlighted by Gozzi et al., (2021). Conversely, the volume of water from the Nera-Velino river system (270 km), draining Appennine limestones, causes a strong dilution of \(\mathrm {NO}_{3}^{-}\) contamination, as demonstrated by the lowering of values at both sampling times. The same effect also occurs for p\(\mathrm {NH}_{4}^{+}\), whose spikes in the upper course highlight a potential link with the presence of pastures, representing a possible source of \(\mathrm {NH}_{4}^{+}\) through animal waste decomposition. The huge perturbation located just below the Corbara damn and some matches of the increases with the presence of water bodies (Fig. 2b, c) suggest the potential role of reservoirs and dams in rising \(\mathrm {NH}_{4}^{+}\) concentration downstream. The latter, heavily modifying the hydraulic and sedimentation conditions, reshapes the distributions of microorganisms, affecting the nitrogen retention and transformation in rivers (Xia et al., 2018; Maavara et al., 2020). The effects of water impoundment, such as lack of turbulence and mixing, thermal stratification and anoxic conditions, may contribute, especially with bottom-release dams, in increasing ammonia in downstream waters (Symons et al., 1965; Beutel 2006).

Hence, the perturbation operator highlights the changes to which the investigated variables are subject, making it possible to link them with driving agents of variation. This facilitates the identification of those species which are more resilient to environmental changes, with relevant implications for water system monitoring plans. The stable patterns shown by \(\text{p}\mathrm {Ca}^{2+}\), \(\text{p}\mathrm {HCO}_{3}^{-}\) and \(\text{p}\mathrm {Mg}^{2+}\) most likely depict this condition, showing an efficient response in the face of the sharp increase of carbonatic lithotypes. As highlighted by Gozzi (2020) for European rivers, this state is favored by the rapid dissolution reaction of carbonates compared to silicates which maintain river waters permanently in near-saturation conditions.

By contrast, the other species are significantly perturbed due to elements inputs from surface runoff and groundwater circulation stemming from physicochemical weathering of the existing lithotypes. The three spikes of p\(\mathrm {F}^{-}\) well demonstrate the leaching of the volcanic districts of Mt. Amiata (Paglia river), Mts. Sabatini and Cimini (Treia river) and Albani hills, respectively. Differently, \(\mathrm {K}^{+}\) is perturbed only in correspondence of the first volcanic district, where also p\(\mathrm {SO}_{4}^{2-}\) increase as a result of local mixing process with \(\mathrm {SO}_{4}^{2-}\)-rich ground waters (e.g., Minissale 2004, and references therein). The perturbation of \(\mathrm {K}^{+}\) at 42 km in the flood period could be linked to the use of fertilizers or to the absence of uptakes by vegetation in winter when plants are dormant (Berner & Berner, 1996). Groundwater circulation dissolving gypsum, anhydrite and halite from Triassic evaporites of the Narni-Amelia aquifer system (Nera sub-basin) (Frondini et al., 2012) explain the considerable perturbation values of \(\mathrm {Cl}^-\) and \(\mathrm {Na}^{+}\).

As a result, these compounds appear less resilient as confirmed by the persistence of their perturbations downstream. Possibly, this is due to their conservative nature in the aqueous phase which prevents precipitation from the water column or participation in biochemical reactions and to the lack of strong dilution processes.

The downstream propagation of the water geochemical footprints is sensitive, at least for conservative species, to the effects of upstream signals, thus increasing the magnitude of overall perturbations in the lower river course (Fig. 5b, e). This condition is not exhibited in stream sediments for which upstream perturbations seem to have scarce effects downstream with no monitored source-to-sink trend. This might indicate that the sediment routing system is disrupted or intermittent, thus affecting the transfer of sediments particles and dissolved solids to the sea (Brandt, 2000; Armitage et al., 2013; Allen, 2017). A low propagation of the geochemical footprints from source regions could imply a higher stream sediment vulnerability to geochemical threats at local scale due to the lack or short-range compositional homogenization. Dams most likely play a significant role in the observed behavior by trapping sediments and reducing the natural blending of stream sediments from lithologically different watersheds during their seaward movement. This could explain why some of the major compositional variations are measured just downstream from dams.

The most relevant perturbation in the stream sediment composition involves Na, which perfectly monitors the increments of catchment area drained by flysch as a consequence of silicate weathering processes. Cr shows a high perturbation due to the denudation of the occasional ophiolitic formations outcropping in the high Tiberine valley (Bortolotti, 1961, 1962), while K, Rb and Fe are perturbed in correspondence to the presence of volcanic rocks. The high values of pCa, pSr and pMg detected between 15 and 26 km are likely related to the presence of calcareous sandstones and marls in the upstream area (Conti et al., 2016). Unexpectedly, no compositional changes in terms of these elements are observed after Apennine limestones were drained. This might be due to the lack of sediment transfer to the Tiber from the Nera whose course is interrupted and impacted by several weirs and dams constructed for hydropower purpose (Panichi et al., 2005; Iadanza & Napolitano, 2006). The high variability of pPb, characterized by 4 great spikes, may be predominantly due to anthropogenic lead contamination since no significant natural mineralizations are known in the areas concerned. Possible sources are represented by industrial emissions, paint production, coal combustion, fertilizers and pesticides (e.g., Komárek et al., 2008; Bur et al., 2009, and references therein). The perturbation at 366 km is most likely relative to Rome’s urban and industrial development; nevertheless, volcanic rocks with elevated Pb-background might provide additional Pb inputs to the lower course (Reimann et al., 2012). Delile et al. (2017), by studying the sedimentary profile in the Ostia harbor, show that lead water pipes were the primary source of lead pollution in the city’s runoff, reflecting the main stages of ancient Rome’s urbanization.

Potential of Monitoring Indices

The good consistency between the \(\sum\text{Pfs}\) with \(\text{g}(\mathbf {x})\) and RMD indicates that all indices operate well in tracking compositional changes. This appears generally true if the variables having a high variability are removed from the analysis and treated separately, as in the example proposed for river waters (Fig. 2a, c). Otherwise, the risk is that their high variability will mask other variations occurring on a smaller scale, though they are equally important (Gozzi et al., 2019). This drawback primarily affects the calculus of RMD which accounts for the variance–covariance matrix of the data. This fact is supported by the example of stream sediments (Fig. 7) where all variables were considered, including those having high variance. In this case, the RMD differs considerably from the other two indices and more extensively for the trace element composition. The reason may be found in the high variability of Na, Mg, for major elements, and Pb and Cr, for traces, distorting the variance–covariance structure of the data and consequently the distance measure. This seems to occur in spite of the minimization of the influence of outlying observations due to the application of a robust method. A higher number of cases compared to the number of variables would increase the stability of the covariance matrix, ensuring a greater consistency of RMD with the other indices. Therefore, in the presence of variables having a mixed natural/anthropogenic origins, not treatable separately, and a low number of cases, the \(\sum\)Pfs and g(\(\mathbf {x})\) could be a more reliable alternative to monitor the overall geochemical fluctuations of a composition. The few discrepancies between these two methods, mainly consisting in differences of peak magnitudes, likely depend on: (1) the choice of the reference composition from which unclosed perturbations factors are calculated; and (2) the major influence that low values have on the geometric mean, by reducing it. The high cross-correlation values of about 0.9 among the two indices, for the major element composition of both media, suggest that the reference composition was correctly selected. Diversely, the presence of low values could have played a role in determining the poorer agreement of the two approaches for trace element composition.

Therefore, the geometric mean could represent a simple and effective monitoring index to quickly assess comprehensive compositional changes in major element composition where low values are rare. From the obtained results, g(\(\mathbf {x})\) reflects the \(\sum\)Pfs of the single variables calculated by working in the simplex. This means that it is able to account for the perturbations experienced by the composition, representing a sort of multivariate barycenter. Its power, with respect to other more robust estimates such as the median, is that it keeps track of the interactions among the constituents and these last play a pivotal role in the understanding of the system resilience. A first explorative framework for a given geochemical system can be thereby realized. The source-to-sink pattern of g(\(\mathbf {x})\) (e.g., being it stable or cumulative) enables a deeper understanding of the inner workings of the river system and how the geochemical footprints are transmitted or buffered along topographic gradients. This awareness might have multiple implications for environmental monitoring, enabling a quick assessment of where or when a given medium is perturbed as a result of natural and/or anthropogenic drivers.

Conclusions

The application of the perturbation operator by staying in the simplex has led to capture the relevant source-to-mouth compositional changes in the surficial waters and stream sediments of the Tiber river. They mainly originate from water–rock interaction and denudation processes within the corresponding sub-catchments and from pollution sources linked to anthropic activities. The results revealed that the geochemical footprints of stream sediments, unlike those of waters, are poorly transmitted downstream. This suggests a low resilience of this medium to geochemical threats at a local scale due to the disruption or intermittency of the sediment routing system, also enhanced by the anthropic activities (i.e., dams and weirs).

The comparison of the sum of unclosed perturbation factors with the geometric mean of the compositions reveals both indexes providing consistent results, especially for major elements, where low values are rarer. In this case, the geometric mean reflects, with a good approximation, the sum of the unclosed perturbation factors of each composition, thus constituting a simple and effective monitoring index of the complex relationships among the involved parts. Conversely, the robust Mahalanobis distance occasionally diverges from the other two measures due to the instability of the covariance matrix; therefore, its application is only recommended for larger datasets.

In conclusion, with the awareness of the importance of the correct choice of the reference composition for the calculus of perturbations and the influence of low values on the geometric mean, the use of these methods for environmental monitoring and assessment appears promising. They offer new insights into the inner workings of the river system, its short or long compositional memory, its connectivity and thus its resilience to environmental pressures at catchment scale. Particularly, the use of the geometric mean of a composition as a quick monitoring index of comprehensive compositional variations could find broad applications in environmental studies to uncover time–space changes.