1 Introduction

Soil forming processes and ecosystem services provided by the pedosphere are central to the Critical Zone (Lin 2010; Banwart 2011). Soil erosion reduces soil quality by reducing soil depth, degrading soil structure, and reducing organic carbon and nutrient contents. In addition to these on-site effects, increased sediment delivery due to accelerated soil erosion can lead to pollution and eutrophication of downstream water bodies (Zamparas and Zacharias 2014; Yang et al. 2017). Moreover, high sedimentation rates reduce dam and reservoir storage capacity, compromising water supply and hydroelectric power generation (Hu et al. 2009; Zhao et al. 2017). These off-site consequences of soil erosion are often experienced at significant distances downstream. Knowledge of sediment transport processes and identifying the origin of sediments in river catchments is therefore necessary to understand, predict, and remediate off-site erosion impacts.

Sediment fingerprinting techniques are often used to identify sediment sources within a catchment. As the properties of the material being transported through river networks essentially reflect biogeochemical processes occurring in the Critical Zone (Amundson et al. 2007), the fingerprinting approach is based on the similarity of physical or biogeochemical properties between target sediment and their potential upstream sources (Klages and Hsieh 1975; Yu and Oldfield 1989; Walling and Woodward 1995; Collins et al. 1996). The relative source contribution is estimated through parameter optimization of mass balance un-mixing models, which are typically either stochastically solved in a Monte Carlo simulation (Collins et al. 2013; Wilkinson et al. 2015; Tiecher et al. 2016) or in a Bayesian framework (Cooper et al. 2014; Cooper and Krueger 2017).

Although many different sediment properties have been used to identify sources, sediment elemental composition has been commonly used in fingerprinting studies to distinguish source contributions according to land use (Collins et al. 2010; Voli et al. 2013; Cooper et al. 2015; Pulley et al. 2017), geological units (Olley and Caitcheon 2000; Wilkinson et al. 2013; Laceby and Olley 2015), and, less frequently, soil classes (Evrard et al. 2013; Lepage et al. 2016; Le Gall et al. 2017). In addition to aiding catchment management, Koiter et al. (2013b) argue that the information obtained in such studies can be used to understand the underlying processes that regulate sediment transport and generate the individual geochemical signatures within sources.

Large catchments present particular problems for fingerprinting studies. The long distances between potential upstream sources and the catchment outlet often lead to increased residence times, which may intensify fluvial sorting processes and particle size selectivity (Koiter et al. 2013a, b). Moreover, large catchments often have a diversity of land uses, parent materials, and soil classes. In these settings, a land use-based source apportionment may be unsuitable for geochemical fingerprinting, due to within land use soil variability (Pulley et al. 2017). In such cases, lithological and/or confluence-based source stratifications might be more effective (Collins et al. 2017). While lithology has been proven to be a main control of sediment geochemistry in catchments with contrasting felsic/mafic geological units (Laceby et al. 2015), pedogenetic processes may provide an important insight to source signal development in catchments with less dissimilar parent materials, as demonstrated by Bajard et al. (2017). These processes might be particularly relevant in tropical environments, where intense weathering-leaching may have considerable influence on soil, and ultimately, sediment properties.

The selection of sediment geochemical properties prior to modeling has received much attention in fingerprinting studies, and recent work has called into question the validity of widely used statistical approaches (Smith et al. 2018). To address this, Koiter et al. (2013a) and Laceby et al. (2015) have proposed a combination of statistical and process/knowledge-based methods, which broadens the possible interpretations of modeling estimates. Ideally, fingerprint properties should be conceptually relatable to upstream processes regarding sediment transport and geochemical source signals (Koiter et al. 2013a). Given that the soil is an important regulator of these processes, pedological knowledge can offer valuable information regarding geochemical tracer selection.

Furthermore, understanding the relationship between sediment particle size and elemental concentration is imperative to improve the knowledge of sediment tracer predictability (Laceby et al. 2017). Fluvial processes typically have a sorting effect on sediment particles, which usually decrease in median grain size with travelled distance as a result of selective transportation and deposition (Walling et al. 2000). Given that soil elemental composition is strongly related to particle size, transport selectivity can affect geochemical fingerprinting properties (Koiter et al. 2013b). Moreover, different processes regulate sediment transport in varying size fractions. While coarser fractions have a greater interaction with channel bed, finer loads are controlled primarily by catchment sediment supply and are therefore less influenced by river transport capacity (Walling and Collins 2016). Hence, sediment source contributions can display contrasting patterns across different size fractions (Haddadchi et al. 2016). Although the influence of particle size on sediment source signals is widely recognized, relatively few studies have focused on tracing different particle size fractions (e.g., Motha et al. 2002; Hatfield and Maher 2009; Haddadchi et al. 2016).

The evaluation of sediment fingerprinting approaches is crucial to enable informed decision-making based on modeled source apportionments. However, gathering independent data to test the outputs of fingerprinting models is problematic, as reliable alternative techniques to quantify source contributions (i.e., suspended sediment yield measurements from multiple sub-catchments or source unit end-members) can be operationally complex and expensive (Collins et al. 2017). Therefore, artificial mixtures with known proportions of sediment source groups have been increasingly used to, at the very least, test the accuracy of un-mixing model estimates (Haddadchi et al. 2014; Sherriff et al. 2015; Pulley et al. 2017; Cooper and Krueger 2017). With this approach, the robustness of the models is assessed by a comparison of calculated source contributions and known mixture proportions (Haddadchi et al. 2014).

The goal of this research is to develop a tributary tracing technique that incorporates pedological knowledge of tropical soil formation/erosion processes into sediment source apportionment and tracer selection across multiple particle size fractions. The study is conducted in the Ingaí River basin (~ 1200 km²), Brazil, which is geologically and pedologically heterogeneous. We compare a knowledge-based element tracer selection to a statistical methodology, both of which are evaluated against a set of artificial mixtures. While others have incorporated knowledge-based criteria into the selection of fingerprinting properties, our approach is the first to be comprehensively grounded in pedological reasoning, highlighting the role of soils as regulators of the processes leading to source signal development. Multiple particle size fractions are analyzed to understand the relationship between particle size and source signal, as well as their interaction with fluvial transport processes. The outcomes of this research will help develop appropriate strategies for sediment fingerprinting and management in tropical environments, while also contributing to our knowledge of processes affecting sediment geochemistry and transport across different particle sizes.

2 Materials and methods

2.1 Catchment description

The Ingaí River basin (~ 1200 km²) is located within the upper Grande River basin, in southeastern Brazil (Fig. 1c). The Ingaí River is formed by sources in the Mantiqueira mountain range and flows into the Capivari River, which is dammed near its confluence with the Grande River, at the Funil hydroelectric power plant reservoir. Altitude ranges from approximately 1780 m in the headwaters to 900 m at the catchment outlet. The predominant climate type according to Köppen’s climatic classification is humid subtropical with dry winters and warm summers (Cwb) with an average annual precipitation of ~ 1500 mm (Hijmans et al. 2005; Alvares et al. 2013).

Fig. 1 Geological (a) and pedological (b) maps of the Ingaí River basin, Brazil (c). S1: upper catchment; S2: mid catchment; S3: lower catchment. Adapted from CODEMIG–CPRM (2014) and FEAM (2010)

The Ingaí River basin is set upon old surfaces, mostly made of metamorphosed Proterozoic and Archean rocks (Fig. 1a). The upper catchment is dominated by both paragneiss (38%) and orthogneiss (32%) (CODEMIG-CPRM 2014) (Table 1). The remaining area contains biotite-schists of the same formation as the paragneiss, though with a less intense metamorphic facies. Although the main soil class is Paleudult (48%), there are also areas of Hapludoxes (20%) and Ustorthents (16%) (FEAM 2010) (Fig. 1b). Land use consists mainly of extensive, minimally managed, pastures (64%), found on the slightly more fertile blocky structured Paleudults (Fig. 2a). Erosion is typically only evident where cattle trails create preferential water pathways that tend to evolve to rills and small gullies. Also, cropland located on steep slopes in the absence of soil conservation practices often results in isolated erosion hotspots.

Table 1 Percentage area distribution of soil classes, lithological units, and land use in the Ingaí River basin and source groups

Fig. 2 (a) Characteristic landscape of the upper catchment. (b) Gully erosion formed in the intermediate region of the Ingaí basin. (c) Shallow soils derived from the quartzitic/mica-schistic ridges of the lower catchment

In the mid catchment, the relief is gentler and the river valley widens enough to generate some clastic Quaternary sediment deposits (CODEMIG-CPRM 2014). The surface is again very old, with a predominance of orthogneiss (65%). Cropland is more widespread, despite the major occurrence of Dystrudepts (54%), which are shallow and highly erodible soils. Gullies are a common feature, often associated with degraded pastures and unpaved roads, some of which have been used since colonial times in the early eighteenth century (Fig. 2b).

In the lower area of the catchment, the Ingaí River crosses a Proterozoic ridge formation dominated by quartzite, mica-schist, and phyllite (CODEMIG-CPRM 2014). These same rocks establish the northern boundary of the watershed. The steeper slopes contain Ustorthents and rock outcrops (46%) (FEAM 2010). Soils are very shallow because of naturally high erosion rates, which remove the surface soil layer before pedogenetic processes take place at greater depths (Resende et al. 2014) (Fig. 2c). The environment restricts agriculture to eucalyptus stands and extensive cattle grazing. In addition, mine pits for commercial quartzite exploration are found in the region. In the last decade, some of these mines have been fined or had their activities suspended due to irregularities regarding deforestation and waste disposal (Borges 2011; G1 Sul de Minas 2016). The remaining area of the lower catchment is dominated by biotite-schist, metagraywacke, and orthogneiss (Table 1), upon which Hapludoxes (54%) have developed, favored by the gentler landscape. These soils have the most intense agricultural use in the watershed: soybean followed by maize and wheat or oats are a common no-till crop rotation scheme.

Accordingly, three geographical source units were established: (i) the upper catchment (S1), composed predominantly of Paleudults derived from gneiss; (ii) the mid catchment (S2), where Dystrudepts are widespread and are also developed from a gneissic parent material; and (iii) the lower catchment (S3), composed of a mixture of Ustorthents, which occur in association with quartzite/phyllite/mica-schist ridge formations, and Hapludoxes, which are found on gentler slopes formed above biotite-schist and metagraywacke bedrocks. These three geographical source end-members are modeled as the potential contributors of target sediment sampled at the catchment outlet.

2.2 Sampling design and sample collection

A tributary sampling design (Laceby et al. 2015; Le Gall et al. 2016; Vale et al. 2016) was utilized within the catchment hydrological network to stratify potential sediment sources based on contributing area soil classes and their underlying parent material (Fig. 1). In the Ingaí catchment, the heterogeneity of lithotypes and soil classes makes it difficult to sample sources directly. The basic foundation of our approach is that a set of tributaries can be considered a specific spatial sediment source. Tributary tracing designs do not rely on hillslope connectivity assumptions, given that source samples are retrieved from the riverine system. Moreover, potential particle size selectivity during sediment transport is restricted to in-stream processes (Laceby et al. 2017).

Sediment sampling was conducted from July 2017 to February 2018. Composite samples were collected from lag deposits, which consisted of sediment drapes located on riverbanks or floodplains formed as water level receded after recent floods (Laceby and Olley 2015; Theuring et al. 2015). The uppermost sediment layer (1–2 cm) was scraped with a non-metallic trowel. Each sample was composed of approximately 15 scrapes. In total, 69 source samples (S1: n = 29; S2: n = 21; S3: n = 19) and 10 target sediment samples from the catchment outlet were collected.

2.3 Laboratory analysis

Samples were oven dried at 60 °C before being dry sieved into three particle size fractions: 2–0.2 mm, 0.2–0.062 mm, and < 0.062 mm. Sediment elemental composition was determined by X-ray fluorescence (XRF), using a portable XRF spectrometer equipped with a 50 kV/100 μA X-ray tube. XRF technology has been increasingly used for quantifying soil geochemistry, given that it provides a non-destructive method with rapid results and no chemical waste generation (Ribeiro et al. 2017; Silva et al. 2017). The analysis allows for the quantification of the following 45 elements and oxides: Ag, Al2O3, As, Au, Ba, Bi, CaO, Cd, Ce, Cl, Co, Cr, Cu, Fe, Hf, Hg, K2O, La, MgO, Mn, Mo, Nb, Ni, P2O5, Pb, Pd, Pt, Rb, Rh, S, Sb, Se, SiO2, Sn, Sr, Ta, Th, Ti, Tl, U, V, W, Y, Zn, Zr. Each sample was measured in triplicate, and the average element concentration was used. Elements below detection limits in all tributary source samples were excluded from subsequent analyses (Electronic Supplementary Material, Table S1). P2O5 was not considered as a possible tracer due to potential biogeochemical transformations during transport in aquatic environments (Koiter et al. 2013b; Cooper et al. 2015; Sherriff et al. 2015). Unfortunately, the portable XRF spectrometer broke down near the end of the analyses. Accordingly, for the intermediate particle size fraction, one source sample from the mid catchment and two catchment outlet samples were not analyzed.

2.4 Artificial mixtures

To test the accuracy and precision of the un-mixing models, a set of 10 artificial mixtures with different known relative source contributions were produced for each sediment size fraction (Table 2). Sub-samples of equal mass were retrieved from each of the individual dried/sieved composite samples. The sub-samples from the same source units were then combined in a source pool, which was later used to create mixtures with known source mass proportions. Elemental composition of the artificial mixtures was used to solve the un-mixing models as if the artificial mixtures comprised the outlet target sediment. Similar approaches to model testing have been adopted by Cooper et al. (2014), Haddadchi et al. (2014), and Pulley et al. (2017).
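The composition expected for an artificial mixture under the linear mass-balance assumption can be sketched as follows. This is a minimal Python illustration rather than the R scripts supplied with the paper; the function name `mixture_composition` and the pooled concentrations are hypothetical and do not come from the study.

```python
def mixture_composition(source_pools, proportions):
    """Expected element concentrations of a mixture, assuming linear mixing by mass."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("source mass proportions must sum to 1")
    elements = next(iter(source_pools.values())).keys()
    return {
        el: sum(proportions[s] * source_pools[s][el] for s in source_pools)
        for el in elements
    }


# Hypothetical pooled concentrations (arbitrary units), NOT data from the study.
pools = {
    "S1": {"Fe": 60.0, "SiO2": 400.0},
    "S2": {"Fe": 40.0, "SiO2": 500.0},
    "S3": {"Fe": 20.0, "SiO2": 700.0},
}

# One mixture with known source mass proportions (25% S1, 25% S2, 50% S3).
mix = mixture_composition(pools, {"S1": 0.25, "S2": 0.25, "S3": 0.50})
```

Comparing such expected values with the measured composition of a physical mixture is one way to check whether an element mixes linearly before it is used in the un-mixing model.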

Table 2 Artificial mixtures with known source contributions used for model evaluation

2.5 Element selection

In this study, widely used statistical procedures for tracer selection were compared to a process-based methodology, in which prior knowledge of soil geochemistry is used to identify elements that are expected to provide source discrimination. For the statistical approach, a commonly used three-step method for element selection was employed. First, box plots were used to evaluate whether element concentrations in target samples plotted within the mixing polygon defined by element concentrations in individual source types. Elements in target sediments with a range of variation plotting outside the source ranges were excluded, as tracer properties outside mixing polygons violate numerical modeling assumptions and may lead to spurious results (Collins et al. 2013). The box-plot range of variation is defined by the whiskers, which extend from the 25th and 75th percentiles to the most extreme values within 1.5 times the interquartile range (IQR). The use of these ranges helps to select elements which are well bounded by the distributions of the mixing polygon. If only minimum and maximum values are taken into account, element distributions from target sediments may plot outside all but potentially one of the source samples. This would bias the un-mixing model solutions in the Monte Carlo simulation, which samples parameter values from data distributions. Second, elements within the source range were grouped by source and tested for normality with a Shapiro-Wilk test. When the null hypothesis that the data come from a normal distribution was rejected (p < 0.05), the elements were analyzed with a Kruskal-Wallis H test; otherwise, they were analyzed with an ANOVA. Third, elements that provided significant discrimination between sources (p < 0.05) were analyzed with a forward step-wise linear discriminant analysis (LDA) (niveau = 0.1) in order to select a minimum set of variables that maximizes source discrimination (Collins et al. 2010). All statistical analyses were performed with R software (R Development Core Team 2017). Packages MASS (Venables and Ripley 2002) and klaR (Weihs et al. 2005) were used for the multivariate analyses.
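The box-plot range test described above can be sketched in a few lines. This is a Python approximation under stated assumptions, not the authors' R code: the helper names `whisker_range` and `passes_range_test` are hypothetical, quartiles are computed with the standard library's inclusive method (which can differ slightly from the hinges used by R box plots), and the source range is taken as the envelope of the whiskers of all source groups.

```python
from statistics import quantiles


def whisker_range(values, k=1.5):
    """Lowest and highest observations lying within k * IQR of the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


def passes_range_test(target, sources, k=1.5):
    """True if the target whiskers fall within the envelope of the source whiskers."""
    t_low, t_high = whisker_range(target, k)
    ranges = [whisker_range(v, k) for v in sources.values()]
    s_low = min(r[0] for r in ranges)
    s_high = max(r[1] for r in ranges)
    return s_low <= t_low and t_high <= s_high


# Hypothetical concentrations of one element, NOT data from the study.
sources = {
    "S1": [10.0, 12.0, 14.0, 16.0],
    "S2": [20.0, 22.0, 24.0, 26.0],
    "S3": [30.0, 32.0, 34.0, 36.0],
}
kept = passes_range_test([15.0, 18.0, 21.0, 24.0], sources)     # within the sources
dropped = passes_range_test([35.0, 38.0, 41.0, 44.0], sources)  # enriched target
```

Elements that pass this screen would then proceed to the normality and discrimination tests of the second and third steps.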

The knowledge-based approach to element selection essentially relies on the interpretation of the theoretical source apportionment and sampling design. While the upper and mid catchment areas have a similar parent material, soil classes may provide an adequate stratification: Paleudults from the upper area are more weathered-leached than Dystrudepts from the mid catchment, which means that the former are deeper and have a higher clay content and a higher residual concentration of Al and Fe oxides than the latter (Kämpf and Curi 2012). The lower catchment provides more of a challenge, given that the soil map presents an association of Ustorthents and Hapludoxes. However, a greater tributary density is associated with shallow headwaters (Fig. 1), which allows us to assume that sediments from the lower area will have a greater connection to the least weathered-leached soils in the catchment. Hence, it is expected that this sediment source will be characterized by much higher contents of SiO2, due to minimal desilication. Mica-inherited K2O may also be found in greater quantities than would be expected in the mid and upper areas. Ti and Zr are some of the most resistant elements in the soil (Marques et al. 2004; Koiter et al. 2013a) and would be expected to occur at reduced concentrations in the lower catchment sediments, reflecting the younger parent material and the underdeveloped soils. Accordingly, Al2O3, Fe, K2O, SiO2, Ti, and Zr were proposed as potential knowledge-based tracers. Elements from target samples plotting outside the source range of variation were excluded from modeling, similarly to the statistical approach, for each sediment particle size fraction. The selected knowledge-based tracers were also analyzed with an LDA to compare the reclassification accuracy of the element selection methods.

2.6 Modeling

Source contributions were estimated by minimizing the sum of squared residuals (SSR) of the mass balance un-mixing model:

$$ SSR=\sum \limits_{i=1}^n{\left[\left({C}_i-\sum \limits_{s=1}^m{P}_s{S}_{si}\right)/{C}_i\right]}^2 $$

where n is the number of elements used for modeling, Ci is the concentration of element i in the target sediment, m is the number of sources, Ps is the optimized relative contribution of source s, and Ssi is the concentration of element i in source s. Optimization constraints were set to ensure that source contributions Ps were non-negative and that their sum equaled 1.
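The objective function and its constraints can be illustrated with a short Python sketch. The `ssr` function follows the equation above term by term; the solver, however, is a swapped-in technique: a brute-force grid search over the three-source simplex (hypothetical helper `unmix_grid`) replaces the constrained optimizer used in the study, and all concentrations are invented for the example.

```python
def ssr(props, target, sources):
    """Sum of squared relative residuals of the mass-balance un-mixing model."""
    total = 0.0
    for i, c_i in enumerate(target):
        predicted = sum(p * src[i] for p, src in zip(props, sources))
        total += ((c_i - predicted) / c_i) ** 2
    return total


def unmix_grid(target, sources, step=0.01):
    """Exhaustive search over P1 + P2 + P3 = 1 with Ps >= 0 (three sources only)."""
    n = round(1 / step)
    best = None
    for a in range(n + 1):
        for b in range(n + 1 - a):
            props = (a / n, b / n, (n - a - b) / n)
            score = ssr(props, target, sources)
            if best is None or score < best[0]:
                best = (score, props)
    return best[1], best[0]


# Hypothetical concentrations of three elements in S1, S2, S3 (NOT study data).
sources = [
    [60.0, 400.0, 5.0],
    [40.0, 500.0, 8.0],
    [20.0, 700.0, 2.0],
]
# Target built as an exact 25/25/50 mixture of the three sources.
target = [35.0, 575.0, 4.25]

props, score = unmix_grid(target, sources)
```

Dividing each residual by Ci keeps elements with large absolute concentrations, such as SiO2, from dominating the fit. The grid search only scales to a handful of sources, which is why a constrained nonlinear optimizer is the more practical choice for repeated Monte Carlo runs.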

The un-mixing model was solved by a Monte Carlo simulation with 2500 iterations. In each iteration, target and source element concentrations were sampled from a multivariate normal distribution, which preserves correlations between variables (Cooper et al. 2014). Prior to modeling the multivariate distributions, element concentrations were log transformed to ensure a near normal distribution and to avoid possible negative concentration values. During the Monte Carlo simulation, element concentrations were back-transformed by an exponential function. R packages foreach (Calway et al. 2017) and Rsolnp (Ghalanos and Theussl 2015) were used to script the simulations and the optimization functions, respectively. Modeling results are presented as the median and the IQR of possible un-mixing model solutions based on the Monte Carlo simulations. The IQR is a more adequate measure of variability for highly skewed data than the standard deviation, as it is not influenced by extreme values (Sainani 2012). Local optimization functions typically produce heavily skewed data, as some model realizations lead to best fit scenarios where one source provides 100% of the sediments and others 0% (Cooper et al. 2014). Accordingly, the IQR may provide a more informative representation of parameter distributions than broader confidence intervals.
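The log-transform, sampling, and back-transform steps, together with the median and IQR summary, can be sketched as below. This Python example makes a simplifying assumption that should be stated loudly: each element is drawn independently from a univariate normal distribution on the log scale, so it does not preserve the correlations between elements that the multivariate normal sampling described above maintains. The helper names and the concentrations are hypothetical.

```python
import math
import random
from statistics import mean, median, quantiles, stdev


def sample_concentration(values, rng):
    """Draw one concentration: normal on the log scale, back-transformed with exp."""
    logs = [math.log(v) for v in values]
    return math.exp(rng.gauss(mean(logs), stdev(logs)))


def summarize(draws):
    """Median and interquartile range (25th and 75th percentiles) of the draws."""
    q1, _, q3 = quantiles(draws, n=4, method="inclusive")
    return median(draws), (q1, q3)


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical Fe concentrations in one source group, NOT data from the study.
fe_s1 = [52.0, 61.0, 58.0, 66.0, 49.0]

# 2500 iterations, matching the number of Monte Carlo runs reported above.
draws = [sample_concentration(fe_s1, rng) for _ in range(2500)]
med, (q1, q3) = summarize(draws)
```

Because the exponential of any real number is positive, sampling on the log scale cannot produce negative concentrations, which is the practical reason for the transformation.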

Model accuracy was evaluated against artificial mixtures according to their Mean Absolute Error (MAE):

$$ MAE=\sum \limits_{s=1}^m\frac{\mid {X}_s-{P}_s\mid }{m} $$

where m is the number of sources, Xs is the known proportion of source s on the artificial mixture, and Ps is the median of modeled relative contribution of source s. Sediment geochemical data and R un-mixing model scripts are included as Electronic Supplementary Material.
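As a worked example of the MAE definition, the following Python sketch evaluates one artificial mixture. The function name and the modeled medians are hypothetical and are not results from the study.

```python
def mean_absolute_error(known, modeled):
    """Mean absolute difference between known and modeled source contributions (%)."""
    if known.keys() != modeled.keys():
        raise ValueError("known and modeled contributions must cover the same sources")
    return sum(abs(known[s] - modeled[s]) for s in known) / len(known)


# Known mixture proportions and hypothetical modeled medians, both in percent.
known = {"S1": 25.0, "S2": 25.0, "S3": 50.0}
modeled = {"S1": 31.0, "S2": 13.0, "S3": 56.0}

mae = mean_absolute_error(known, modeled)  # (6 + 12 + 6) / 3
```

Averaging this value over all mixtures of a size fraction gives a single error figure per model and fraction.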

3 Results

3.1 Element selection and source analysis

Of all the 45 analyzed elements, 19 (42%) were below detection limit on all source samples for the coarse (2–0.2 mm) and intermediate (0.2–0.062 mm) fractions, whereas 13 (29%) elements were not detected for the fine fraction (< 0.062 mm) (Electronic Supplementary Material, Table S1). Of the detected elements, only 13 (52%) plotted within the mixing polygons for the coarse fraction, mainly because of higher element concentrations in the outlet target sediments (all element selection results are displayed in Table 3). Concentrations of major (e.g., K2O and CaO) and trace elements (e.g., Y and Sr) were enriched in the outlet sediment when compared to source samples. For the intermediate and fine fractions, 22 (88%) and 30 (97%) elements plotted within the source mixing polygons, respectively.

Table 3 Selected elements for modeling after each step of the statistical procedure for each size fraction

Of the elements plotting within the mixing polygon for the coarse and intermediate fractions, five (38%) and six (27%) elements, respectively, failed to provide significant discrimination between sources according to the Kruskal-Wallis H test (or ANOVA for normally distributed elements) (Electronic Supplementary Material, Tables S2-S4). For the fine fraction, only four elements (13%) failed to reject the null hypothesis of the employed statistical tests (Electronic Supplementary Material, Table S5).

The forward step-wise LDA selected four elements for modeling the coarse fraction (Fe, Cl, SiO2, and V), which were able to correctly reclassify only 64% of the samples according to a cross-validation (Fig. 3). For the intermediate fraction, nine elements (Al2O3, CaO, Fe, K2O, Mo, Ti, V, Y, and Zn) selected by the LDA correctly reclassified 84% of the samples. For the fine fraction, eight selected elements (Al2O3, Ba, Ce, K2O, Nb, Pb, Y, and Zr) yielded 90% reclassification accuracy.

Fig. 3 LDA bi-plots of source reclassification using the selected elements from the statistical approach. Ellipses represent 90% confidence intervals

For the knowledge-based elements proposed for modeling (Al2O3, Fe, K2O, SiO2, Ti, and Zr), only Al2O3, Fe, SiO2, Ti, and Zr plotted within the mixing polygon for the coarse fraction. No elements were outside the source range for the intermediate fraction, and therefore all proposed elements were used in the un-mixing model. For the fine fraction, a depletion of Fe contents on target samples led to the exclusion of this element from analysis.

The LDA reclassification accuracy was on average 9% lower for the knowledge-based element selection method in comparison to the statistical approach, which could be expected. However, a similar trend of increasing accuracy was observed with a decrease of particle size, as the percentage of correctly reclassified samples ranged from 58% for the coarse fraction to 78% and 80% on the intermediate and fine fractions, respectively.

Overall, the behavior of the knowledge-based proposed elements for all size fractions was in accordance with the anticipated scenario used to stratify sediment sources in the Ingaí catchment: sediments from catchment headwaters (S1) are derived from more weathered-leached soils (mainly Paleudults), with a higher residual concentration of Fe, Al2O3, Ti, and Zr (Fig. 4). Samples from the lower catchment (S3) display decreased Fe, Al2O3, Ti, and Zr contents and a higher concentration of SiO2, which is consistent with these sediments having been generated from younger soils (mainly Ustorthents). Samples from the mid catchment (S2), where Dystrudepts are the main soil class, have intermediate concentrations of the discussed elements in comparison to S1 and S3. Also as expected, K2O contents were higher overall in S3 samples, except for the coarse fraction.

Fig. 4 Scatter plots of the knowledge-based proposed elements for each sediment size fraction. S1: upper catchment; S2: mid catchment; S3: lower catchment

3.2 Artificial mixtures and model evaluation

The comparison between modeled source contributions and actual mixture proportions demonstrates that modeling the coarse fraction yielded the poorest results, with an MAE of 23.8% for the statistical variable selection model (M1) and 17.8% for the knowledge-based variable selection model (M2) (Table 4 and Fig. 5). For the intermediate fraction, model error decreased from 22.6% for M1 to 10.9% for M2. Results from the fine fraction had the lowest errors when both models are considered, and the most similar performance between M1 (MAE = 12.9%) and M2 (MAE = 11.8%).

Table 4 Mean absolute errors (MAE) of the statistical variable selection model (M1) and the knowledge-based variable selection model (M2) for the three analyzed sediment size fractions

Fig. 5 Scatter plots of known and modeled source contributions of the artificial mixtures for each sediment size fraction. S1: upper catchment; S2: mid catchment; S3: lower catchment; M1: statistical element selection model; M2: knowledge-based element selection model

Considering all size fractions, models were more effective at estimating the source contributions of artificial mixtures 1–4 (MAE = 9.8%), in which source proportions varied from 25 to 50%. Results from artificial mixtures 5–10, in which source proportions ranged from 0 to 75%, had increased error (MAE = 21.2%). Overall, the models had a greater difficulty distinguishing contributions from S2 (MAE = 18.25%) than from S1 (MAE = 15.0%) and S3 (MAE = 16.9%). Such behavior is particularly evident for the fine fraction, where the MAE of M2 decreased from 15.5% on S2 to 7.2% on S3.

3.3 Model results for the Ingaí catchment

Source proportions estimated by M1 and M2 for the coarse fraction are highly uncertain, as demonstrated by the wide prediction intervals in Fig. 6, and no inference can be made based on these data. Moreover, considering the median source proportion estimates, the models display contrasting results: M1 indicates that target sediments are derived mainly from S2 (median = 40%; IQR = 0–87%), whereas M2 signals a higher contribution from S1 (median = 39%; IQR = 5–76%) (Electronic Supplementary Material, Table S6).

Fig. 6 Box plots of estimated source contributions based on the 2500 iterations of the Monte Carlo simulations. (a) Coarse fraction (2–0.2 mm). (b) Intermediate fraction (0.2–0.062 mm). (c) Fine fraction (< 0.062 mm). S1: upper catchment; S2: mid catchment; S3: lower catchment; M1: statistical element selection model; M2: knowledge-based element selection model

Results from the Monte Carlo simulations again demonstrate a high degree of uncertainty for the intermediate fraction source apportionments, which contrast between models. For the intermediate fraction, M1 estimates that contributions to outlet sediments are dominated by S2 (median = 57%, IQR = 8–100%), whereas M2 estimates reveal a greater contribution from S3 (median = 60%, IQR = 0–94%) and S1 (median = 16%, IQR = 0–61%).

For the fine fraction, the simulation results display much narrower source apportionment estimates. M1 indicates that contributions from S1 (median = 0%, IQR = 0–3%) and S2 (median = 0%, IQR = 0–18%) are negligible, with target sediments being almost completely derived from S3 (median = 93%, IQR = 71–100%). M2 results are nearly identical, estimating that S3 (median = 96%, IQR = 77–100%) is again the dominant source, with insignificant contributions from S1 (median = 0%, IQR = 0–2%) and S2 (median = 0%, IQR = 0–11%).

4 Discussion

Source signal development in the Ingaí catchment is controlled primarily by pedogenetic processes, which display different degrees of expression across particle sizes. Such behavior was reflected throughout this research, starting with the elements identified by XRF analysis. Fewer elements were detected for the coarse and intermediate fractions (Table 3), which could be expected, since trace elements are retained in greater quantities in finer particles (Antoniadis et al. 2017). Moreover, a greater proportion of detected elements for the coarse and intermediate fractions were outside the source range. We deliberately avoided using the term conservative behavior to describe this process, as we do not have evidence that the elements failing to plot within source range were depleted or enriched during sediment transport due to biogeochemical mechanisms or to changes in physical properties, including grain size distributions. Nevertheless, the greater number of elements plotting outside source mixing polygons, particularly for coarse sediments, may indicate that there has been particle size selectivity occurring during mobilization, transportation, and deposition processes or there could be a missing/unsampled source of coarse material near the catchment outlet (Smith and Blake 2014; Laceby et al. 2015).

By comparing the composition of target and source samples, it can be observed that, unlike the source sediments, in which Al2O3 increased with decreasing particle size, the highest Al2O3 contents in the catchment outlet target sediments were associated with the intermediate and coarse size fractions (Fig. 7). Moreover, the coarse fraction had the highest Fe and the lowest SiO2 concentrations, which is also inconsistent with the tributary source sample patterns. Within soils derived from the same parent material, elements found in stable clay minerals (e.g., Al2O3 and Fe) usually occur in greater residual concentrations in finer particles, as demonstrated by Silva et al. (2018). Conversely, SiO2 decreases with decreasing particle size, due to desilication and the lower stability of quartz in the clay fraction (Fontes 2012). The higher concentration of Al2O3 and Fe for the coarse and intermediate fractions of the target sediments may therefore suggest that these fractions have received a greater contribution of sediments derived from a contrasting parent material compared to the sources influencing the fine fraction. Such parent material is likely to have been un- or under-sampled, which may explain the number of elements plotting outside the source range for the coarser sediments.

Fig. 7 Al2O3, Fe, and SiO2 contents in source (S1: upper catchment; S2: mid catchment; S3: lower catchment) and target (T) sediments

Results from the analyses of variance and the LDA also demonstrate contrasting patterns in the geochemical composition of sediments across particle sizes. Fewer elements provided statistical discrimination between sources for the coarse and intermediate sediments than for the fine fraction, according to the employed tests (i.e., ANOVA or Kruskal-Wallis) (Table 3). These results demonstrate that the source stratification was more effective for the fine fraction, likely because the geochemical source signal in the Ingaí catchment is mainly associated with pedogenetic processes (e.g., desilication, residual accumulation of Al and Fe in pedogenetic oxides). These processes are more clearly expressed in finer, more weathered and leached particles, particularly in clay minerals (Kämpf et al. 2012). Conversely, the coarser particles may be more representative of the parent material (Curi and Kämpf 2012), which is less contrasting among the sources in the catchment. These findings may also be reflected in the poor reclassification accuracy of the LDA for the coarse fraction (64% and 58% for the statistical and knowledge-based approaches, respectively) when compared to the intermediate (84% and 78%, respectively) and fine fractions (90% and 80%, respectively). Interestingly, for all size fractions the forward step-wise LDA selected elements that were also proposed by the knowledge-based approach (SiO2 and Fe for the coarse, Al2O3, Fe, K2O, and Ti for the intermediate, and Al2O3, K2O, and Zr for the fine). This demonstrates that these elements provide both statistical and pedological discrimination between sources.

The model evaluation against artificial mixtures corroborates the lack of source discrimination for the coarse fraction, for which the MAE of both the statistical (M1; 23.8%) and knowledge-based (M2; 17.8%) models is higher than what is usually reported in similar studies (e.g., Haddadchi et al. 2014; Pulley et al. 2017; Cooper and Krueger 2017). For the intermediate fraction, although M2 yielded the lowest MAE (10.9%) among the analyzed models and particle sizes, high errors were again associated with M1 (22.6%). In contrast, M1 (MAE = 12.9%) and M2 (11.8%) performed similarly for the fine sediments, and greater confidence can be ascribed to model predictions for this fraction.
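As a minimal sketch of how an MAE of this kind can be computed, the example below compares predicted and known source proportions for a set of artificial mixtures. It assumes that absolute errors are averaged over all sources and all mixtures, which may differ from the exact formulation used in this study; the function name and the mixture proportions are hypothetical.

```python
def mean_absolute_error(predicted, known):
    """Mean absolute error (in percentage points) over all mixtures and sources.

    predicted, known: lists of mixtures, each a list of source proportions (%).
    """
    if len(predicted) != len(known):
        raise ValueError("predicted and known must contain the same number of mixtures")
    errors = []
    for pred_mix, known_mix in zip(predicted, known):
        if len(pred_mix) != len(known_mix):
            raise ValueError("each mixture must have the same number of sources")
        errors.extend(abs(p - k) for p, k in zip(pred_mix, known_mix))
    return sum(errors) / len(errors)


# Hypothetical proportions (%) for three sources; not data from the study.
known = [[100, 0, 0], [0, 100, 0], [50, 25, 25]]
predicted = [[80, 10, 10], [10, 70, 20], [40, 30, 30]]

mae = mean_absolute_error(predicted, known)
print(round(mae, 1))  # 13.3
```

Under this definition, an MAE of about 13% means that the predicted contribution of each source is, on average, about 13 percentage points away from its known proportion in the mixture.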

The modeling results for the coarse fraction of the catchment outlet target sediments again demonstrate poor source discrimination, given the uncertainty of the estimates (Fig. 6). Moreover, the relative contributions from the upper and mid catchment predicted by the models seem unlikely considering the results for the fine fraction, which indicate with little uncertainty that target outlet sediments are derived almost entirely from the lower catchment. As coarser material is often transported as bed or saltating bed load, at slower rates than the finer wash load (Collins and Walling 2016), proximal sources are usually the major contributors of coarse sediment particles (Haddadchi et al. 2016). Therefore, the contributions attributed to the mid and upper catchment for the coarse fraction are more likely to have been derived from other downstream sources, probably in close proximity to the outlet sediment sampling location, with a soil parent material similar to that of the mid and upper regions of the watershed.

Similarly, modeling the intermediate fraction indicated a considerable, although also very uncertain, contribution from the mid and upper catchment for both models (Fig. 6). Again, such contributions seem unlikely to represent sediment dynamics in the catchment, and a missing or under-sampled source located close to the catchment outlet might be biasing model predictions.

The sediments identified by the un-mixing models as derived from the upper and mid catchment may instead originate from a strip of orthogneiss located near the outlet of the Ingaí River (Fig. 1). This lithotype comprises only 3% of the lower catchment, and a single composite sample was retrieved from a tributary draining the area. The concentrations of Al2O3 (13.9%), Fe (3.6%), and SiO2 (37.0%) for the coarse sediments from this particular sample differed from the average concentrations of these elements in the other lower catchment samples (Al2O3 = 6.1%, Fe = 2.2%, SiO2 = 51.6%). The sample concentrations are, however, similar to the average contents of Al2O3 (13.8%), Fe (4.3%), and SiO2 (34.0%) for the coarse fraction of the target outlet sediments. Nevertheless, this interpretation of the modeling results remains speculative, and the most important inference from the data is that the spatial scale of the source stratification was not appropriate for fingerprinting the coarse and intermediate size fractions.
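The similarity described above can be made explicit with a short calculation using the coarse-fraction concentrations quoted in the text. The sum of absolute differences used here is only an illustrative measure chosen for this sketch, not a method applied in the study, and it cannot overcome the fact that the orthogneiss area is represented by a single composite sample.

```python
# Coarse-fraction concentrations (%) as quoted in the text.
target_outlet = {"Al2O3": 13.8, "Fe": 4.3, "SiO2": 34.0}
orthogneiss_sample = {"Al2O3": 13.9, "Fe": 3.6, "SiO2": 37.0}
lower_catchment_mean = {"Al2O3": 6.1, "Fe": 2.2, "SiO2": 51.6}


def total_abs_difference(a, b):
    """Sum of absolute concentration differences (percentage points) over shared elements."""
    return sum(abs(a[element] - b[element]) for element in a)


d_orthogneiss = total_abs_difference(target_outlet, orthogneiss_sample)
d_lower = total_abs_difference(target_outlet, lower_catchment_mean)

print(round(d_orthogneiss, 1))  # 3.8
print(round(d_lower, 1))        # 27.4
```

By this simple measure, the orthogneiss sample is considerably closer to the outlet sediments than the mean of the other lower catchment samples, which is consistent with, but does not prove, the interpretation given above.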

In contrast to the coarser fractions, the source contributions estimated for the fine sediments are consistent among the employed models (Fig. 6). The similarity between model results increases the confidence in the predictions, which are also corroborated by the small errors of the estimated source proportions of the artificial mixtures. Moreover, the results fit with our understanding of erosion and sediment transport dynamics in the catchment.

According to model predictions, the fine sediments collected at the watershed outlet are almost entirely derived from the lower catchment. These sediments are primarily associated with the shallow and underdeveloped Ustorthents from the quartzitic/mica-schistic ridges within the lower catchment, as demonstrated by the higher SiO2 and K2O contents and the lower Al2O3, Fe, Ti, and Zr concentrations. This Entisol region is erosion prone: the solum is shallow and the C horizon lies directly below the A horizon, decreasing water infiltration and increasing runoff propensity (Araújo 2006). These soils are also located on steep slopes and have elevated contents of silt and fine sand in relation to clay (Curi et al. 1990). Hence, a large sediment supply from these soils in the lower catchment is plausible. Furthermore, the lower catchment is much closer to the Ingaí River outlet than the mid and upper areas. Fine sediments originating from these upstream sources have a greater probability of being stored on floodplains and in lower-gradient sections.

Results reported by Le Gall et al. (2017) also show that the contribution of fine sediments from farther upstream sources in large catchments is minor, at least considering the sediments that effectively reach the catchment outlet. Such results must be interpreted with caution, as fingerprinting the origin of outlet sediments does not necessarily represent overland and fluvial transport processes elsewhere in the catchment (Koiter et al. 2013a). These considerations might be particularly important in large watersheds, where sediment yield components are likely to be subjected to a variety of travel times and transport energies, which will also vary with particle size (Parsons 2011), as illustrated by our results.

The Ingaí River drains approximately 60% of the Capivari River basin, which is estimated to supply over 480,000 t year−1 of sediment to the Funil hydroelectric power plant reservoir (Batista et al. 2017). Accordingly, fine sediment from the Ingaí River may contribute significantly to reservoir sedimentation. Soil conservation practices targeting the lower Ingaí Entisols may therefore help minimize fine sediment delivery to the Funil reservoir. However, according to RUSLE-based estimates (Batista et al. 2017), average erosion rates were highest in the mid catchment area, which suggests that the areas with the highest erosion rates are not necessarily the main sources of the sediment reaching the outlet. Therefore, future research should monitor erosion dynamics across multiple scales and different particle size fractions in the Ingaí catchment. Ultimately, it is important to understand how different Critical Zone processes regulate sediment connectivity throughout the catchment in order to help target the implementation of best management practices that limit the deleterious off-site effects of soil erosion.

Overall, our results demonstrate that source stratification and geochemical element selection for sediment fingerprinting can be carried out based on the knowledge of pedogenetic processes that develop source signals in tropical soils. However, such an approach might be less effective for coarse sediment particles, particularly if parent material has few geochemical contrasts. In this sense, a soil-based source stratification might be more powerful for fine sediment fingerprinting than a geological approach, given that pedogenetic processes and soil forming factors other than parent material are also able to generate contrasting source signals, particularly in tropical soils. Nevertheless, for modeling coarse sediment provenance, a geological source stratification may be more appropriate, as pronounced lithological dissimilarity might dominate the source signal generation for coarse material.

The comparison between the element selection methods demonstrated that the commonly used three-step statistical approach does not necessarily yield more accurate model predictions, which is corroborated by the results of Smith et al. (2018). However, a valuable outcome of using both methods is that different model predictions can be compared. If similar results are achieved with a different set of tracers, greater confidence can be ascribed to model estimates (Laceby et al. 2015).

A significant advantage of a knowledge-based element selection is that subsequent modeling results are more easily related to known source characteristics. In the knowledge-based approach, processes occurring in the Critical Zone that drive source signal development, erosion, and sediment transport can be analyzed jointly. This contributes to a more comprehensive understanding of these processes, and generates multiple lines of evidence to corroborate or falsify model assumptions and predictions. The use of the knowledge-based approach encourages researchers to understand the fundamental Critical Zone processes driving erosion and sediment geochemistry across multiple scales. This increased understanding of fundamental processes is instrumental in improving catchment sediment management strategies, particularly in erosion-prone tropical environments.

5 Conclusions

In this research, the pedological knowledge of tropical soils was incorporated into source stratification and geochemical element selection in a fingerprinting study across three particle size fractions. Our approach provided source discrimination for the fine and intermediate size fractions, as demonstrated by the comparison of the un-mixing model estimates and artificial mixture proportions. However, the source stratification was unable to provide sufficient geochemical discrimination for the coarse sediments. This probably stems from the fact that pedogenetic processes are the main drivers of geochemical contrast and source discrimination between fine sediment sources, whereas geological background may be more likely to drive these contrasts for the coarser material. Model evaluation against the artificial mixtures also indicated that the commonly used three-step statistical approach to variable selection may not always provide the most accurate estimates.

The spatial scale of the source stratification was, however, unable to represent the coarse and intermediate size sediment dynamics in the catchment, which seem to be controlled by very proximal sources, at least at the temporal scale of the analysis. Hence, different field sampling approaches might be necessary to model specific size fractions in the Ingaí catchment, and potentially in other catchments.

For the fine sediments, both the knowledge-based and the statistical methods of geochemical element selection yielded very similar results: Ustorthents from the lower catchment ridges are by far the main source of the sediment reaching the Ingaí River outlet. The consistent model results increase confidence in the predictions. Moreover, the knowledge-based method facilitates the interpretation of the results, as the selected fingerprinting properties can be explicitly related to upstream processes regarding source signals and behavioral characteristics of the soils comprising each end-member source. This enhanced interpretation of fingerprint models provides a framework for an integrated assessment of Critical Zone dynamics, linking soil and parent material geochemistry to soil erosion and sediment transport processes in river catchments.

The source stratification procedure and the knowledge-based element selection for sediment fingerprinting described in this study have potential to improve sediment management strategies across Brazil and around the world. This approach would be particularly useful in large catchments where soil parent materials have similar geochemistry, and source signal development of fine sediments is controlled by pedogenetic processes. Ultimately, understanding the fundamental pedogenetic processes driving the formation of source signatures will likely aid in the management of the dominant Critical Zone processes driving erosion in Brazil and in other tropical regions where intense weathering-leaching leads to unique expressions of soil forming processes.