Less is more? A novel method for identifying and evaluating non-informative tracers in sediment source mixing models

Accelerated soil erosion poses a global hazard to soil health. Understanding soil and sediment behaviour through sediment fingerprinting enables the monitoring and identification of areas with high sediment delivery. Land-use specific sediment source apportionment is increasingly determined using the Bayesian mixing model MixSIAR with compound-specific stable isotopes (CSSI). Here, we investigate tracer selection for CSSIs of fatty acids (FAs), using a novel method to identify non-informative tracers and assess their effect on model performance. To evaluate CSSI tracer selection, mathematical mixtures were generated using source soils (n = 28) from the Rhine catchment upstream of Basel (Switzerland). Using the continuous ranked probability (CRP) skill score, MixSIAR’s performance was evaluated for 11 combinations of FAs and 15 combinations of FAs with δ15N as a mixing line offset tracer. A novel scaling and discrimination analysis (SDA) was also developed to identify tracers with non-unique mixing spaces. FA-only tracer combinations overestimated pasture contributions while underestimating arable contributions. When compared to models with only FA tracers, utilizing δ15N to offset the mixing line resulted in a 28% improvement in the CRP skill score. δ15N + δ13C FA26 was the optimal tracer set, resulting in a 62% model improvement relative to δ15N + all δ13C FAs. The novel SDA method demonstrated that δ13C FA tracers have a non-unique mixing space and thus behave as non-informative tracers. Importantly, the inclusion of non-informative tracers decreased model performance. These results indicate that MixSIAR did not handle non-informative CSSI tracers effectively. Accordingly, it may be advantageous to remove non-informative tracers, and where feasible, all combinations and permutations of tracers should be assessed to optimize tracer selection.
Application of these tracer selection steps can help improve and advance the performance of sediment fingerprinting models and ultimately aid in improving erosion mitigation and management strategies.


Introduction
Accelerated soil erosion and sedimentation is a widely recognized global problem that reduces water quality and agricultural output (Bakker et al. 2004; Issaka and Ashraf 2017).
Comprehensive and economically feasible mitigation plans are required to reduce accelerated soil erosion, and to be effective they need to be founded on the accurate identification of erosion sources (Collins and Walling 2004; Walling 2005; Owens et al. 2016).
Sediment source fingerprinting helps identify and apportion the main erosion sources in a catchment. Tracing sediments back to their original sources (e.g., soils under different land uses) provides a direct, field-based approach with the potential to identify the relative contribution of different soil erosion sources to the sediment transported downstream in various waterways (Collins et al. 1996; Gibbs 2008; Cooper et al. 2015). The technique uses various properties of the soils and sediments as fingerprints to differentiate the main erosion sources (Collins et al. 1997a, 2001; Walling 2013; Smith et al. 2018). To fingerprint sediment sources effectively, properties need to discriminate between sources and either remain constant through detachment, transport and deposition or vary in a measurable and predictable way (Motha et al. 2002; Koiter et al. 2013; Belmont et al. 2014; García-Comendador et al. 2023). Sediment fingerprinting has been applied to a wide range of fluvial sediment types, including lacustrine sediment cores (le Gall et al. 2016; Lavrieux et al. 2019), flood plains (Pulley et al. 2015; Kemper et al. 2022), dam reservoir samples (Nosrati et al. 2011; Ben Slimane et al. 2013), and riverine systems (Collins et al. 2001; Bispo et al. 2020; Upadhayay et al. 2018b). A wide range of parameters have been employed to trace sediment sources, including radionuclides, elemental geochemistry (Batista et al. 2019), compound-specific stable isotopes (Hirave et al. 2021), colour (Martínez-Carreras et al. 2010), diffuse reflectance infrared Fourier transform spectroscopy, ultraviolet-visible absorbance (Lake et al. 2022), and eDNA (Evrard et al. 2019).
For more information on the sediment fingerprinting technique, see the published reviews (Haddadchi et al. 2013; Koiter et al. 2013; Owens et al. 2016; Collins et al. 2017, 2020). Sediment source apportionment is generally determined by unmixing sediment and soil fingerprints, typically using linear equations (Collins et al. 1997b). The reliability of the model outputs depends on the mixing model used, the number of sources, the number of tracers and the dominant source contributing to the sediment load (Vale et al. 2022). Specifically, the accuracy of the apportionment increases when the primary source is well discriminated, regardless of the discrimination of the other sources (Vale et al. 2022). Bayesian models (e.g., MixSIAR; Stewart et al. 2015; Stock et al. 2018) and frequentist models (Collins et al. 1997a; Pulley and Collins 2018) have the potential to provide robust and reliable erosion source information fundamental to targeting sediment management practices (Cooper and Krueger 2017; Evrard et al. 2022; Xu et al. 2022).
The compound-specific stable isotopes (CSSI) of plant-derived biomarkers such as fatty acids (FA) and n-alkanes have been used to apportion sediment source contributions from different land uses (Gibbs 2008; Alewell et al. 2016; Upadhayay et al. 2018b; Lavrieux et al. 2019; Hirave et al. 2021). The CO₂ fixation routes (C₃, C₄, or CAM) of plants generate distinct isotopic patterns, with the effect being more pronounced in C₄ plants than in C₃ plants (Reiffarth et al. 2016). Even though they are not species specific, CSSI values can further distinguish between some plant groups, for example, angiosperms and gymnosperms (Collister et al. 1994; Chikaraishi et al. 2004). Additionally, biological and environmental factors (e.g., altitude and rainfall patterns) can induce variation in the isotopic value (Reiffarth et al. 2016).
The exclusion of short, medium, and non-saturated FAs helps reduce the uncertainty related to input from non-terrestrial plant-derived FAs (Alewell et al. 2016; Reiffarth et al. 2016; Lavrieux et al. 2019). Ultimately, the 13C FA fingerprint of the sediment mixture is determined by source mixing proportions and two parameters in each source: FA concentration and δ13C FA (Upadhayay et al. 2018a). Therefore, the non-linear mixing of the isotopic tracers in the mathematical mixtures requires the incorporation of FA concentration dependency. Additionally, the use of FA isotopes as tracers requires transforming the unmixing of isotopic values into the unmixing of sediment. Using the concentration dependency of FA isotopes incorporates this transformation into the model and therefore requires no additional post hoc organic matter corrections (Alewell et al. 2016).
Fingerprinting with FA CSSIs has limitations, as source values regularly plot along a linear mixing line (Alewell et al. 2016; Lavrieux et al. 2019). Importantly, having sources plot along a mixing line can result in modelled contributions from the central source(s) being misclassified as a contribution from the sources located at the mixing line endpoints. The misclassification has previously been resolved by grouping sources, with the subsequent apportionment occurring only between the two grouped sources (Alewell et al. 2016), with the drawback of not being able to distinguish between three or more sources.
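The misclassification risk can be illustrated with a minimal sketch. It assumes a single tracer, simple linear mixing without concentration weighting, and hypothetical source values placed so that arable sits exactly midway between forest and pasture; none of the numbers are the study's data.

```python
# Hypothetical single-tracer values (per mil) for three sources on a mixing line.
# Illustrative numbers only, not the study's data.
sources = {"forest": -33.0, "arable": -35.0, "pasture": -37.0}

def mix(proportions):
    """Linear (not concentration-weighted) mixture of the source values."""
    return sum(p * sources[name] for name, p in proportions.items())

# Two very different source apportionments ...
only_arable = {"forest": 0.0, "arable": 1.0, "pasture": 0.0}
no_arable = {"forest": 0.5, "arable": 0.0, "pasture": 0.5}

# ... produce the same mixture value, so the tracer cannot tell them apart.
print(mix(only_arable), mix(no_arable))
```

Because both apportionments give the same tracer value, a model fed only tracers of this kind has no information with which to separate a central-source contribution from a blend of the two end members.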
The highly correlated δ13C of FA tracers and the resultant linear mixing line are a product of the biosynthesis of very long chain fatty acids (VLCFA, FA22:0–FA30:0). The elongation of long chain fatty acids (LCFA, FA16:0–FA20:0) to VLCFA proceeds by a cyclic four-step process of condensation, reduction, dehydration, and reduction (Erdbrügger and Fröhlich 2020). The elongation of FAs occurs with a δ13C depletion with increasing alkyl length (Chikaraishi et al. 2004). As this relationship is assumed to be similar for VLCFA of different plant groups, this may result in the δ13C FA tracers of different alkyl lengths having a non-unique mixing space and possibly acting as non-informative tracers.
CSSI tracers have been combined with geochemical tracers in an attempt to improve the discrimination between different land covers using non-land-use specific tracers (Lizaga et al. 2022). As geochemical tracers are not land-use specific, they require significant geological differences between land uses and low variability of lithology within land uses (Blake et al. 2012; Hancock and Revill 2013). δ15N has been used previously as a tracer for land-use-specific sediment source apportionment (Papanicolaou 2003; Fox and Papanicolaou 2007; Mukundan et al. 2010). However, the conservativeness of δ15N is questionable (Laceby et al. 2017). Here, we nonetheless use δ15N to expand the δ13C FA mixing line to a mixing polygon for mathematical mixtures (also known as virtual mixtures and artificial mixtures). When investigating model behaviour using mathematical mixtures, the conservativeness of tracers is less relevant, as sediment tracer values are calculated from source soil values and are not subject to degradation and possible isotopic fractionation effects.
Mathematical mixtures using non-concentration dependent tracers (e.g., geochemical tracers) were reported to be equivalent to laboratory mixtures. Although mathematical mixtures do not fully represent what happens during mixing processes in nature (e.g., signal degradation, tracer alteration in case of non-conservativeness, isotope fractionation, particle size transport selectivity), they are fundamental to understanding and evaluating model performance, and characterizing the uncertainty of the unmixing process (Haddadchi et al. 2014; Batista et al. 2019; Vale et al. 2022).
Currently, there is a limited application of mathematical mixtures to concentration-dependent tracers, in which tracer values of the mixture (e.g., isotopic signatures) are dependent on another parameter in source soils (e.g., the concentration of isotopic tracer). Until recently, the validation of concentration-dependent unmixing models has been reliant on the generation of a small number of time-consuming laboratory mixtures (Bravo-Linares et al. 2018) or on over-simplification by removing the concentration dependency and assuming that isotopic tracers mix linearly (Collins et al. 2019; Bahadori et al. 2019). However, concentration-dependent mathematical mixtures have recently been explored and utilised (Lizaga et al. 2022; Vale et al. 2022) for investigating model parameters.
The deficiency of suitable evaluation tools and metrics for CSSI tracer selection steps has resulted in the legacy of two commonly used assessments: a Kruskal-Wallis test to optimize model performance and a polygon/boxplot range test to identify non-conservative tracers. When using a large number of tracers (e.g., geochemical tracers), linear discrimination analysis (LDA) has also been applied to reduce the suite of tracers to an optimal number with maximum discrimination (Gellis and Noe 2013; Walling 2013; Laceby et al. 2015). The LDA tracer reduction step is not commonly included when using CSSIs due to the limited number of CSSI tracers relative to the number of sources being discriminated. Upadhayay et al. (2018b) used LDA with CSSI tracers to remove bulk 13C from the VLCFA tracer suite, as bulk 13C did not improve source LDA reclassification. Lizaga et al. (2021) also used LDA as a tracer selection step for mixtures from different time points in an attempt to optimise tracer selection for each mixture. An argument for not including LDA when using MixSIAR is the hypothesis that the covariance structure of MixSIAR effectively handles conservative non-informative tracers, resulting in a null or a beneficial impact (Smith et al. 2018). The model's output should accurately reflect real-world scenarios, without being reduced in the interest of enhancing model performance. An additional argument for maximising the number of tracers is to reduce the potential influence of possible non-conservativeness within the tracer set.
Land use-specific sediment source apportionment with CSSIs has been determined with all tracers that pass the two prerequisites without further validation of tracer selection. Consensus ranking (Lizaga et al. 2020) and consistent tracer selection (Latorre et al. 2021) methods have been recently applied to remove non-conservative tracers and tracers which have non-consistent results. Others have argued that tracer selection should be made on a robust biophysical-chemical foundation (Laceby et al. 2015; Batista et al. 2019), with adjustments to the tracer set aimed at supporting the reliability and accuracy of the model.
We hypothesize that the relationship between δ13C depletion and increasing alkyl length is similar for all land uses. If this is true, the mixing space for each FA tracer may be seen as a direct isometry translation of each other (i.e., every point of the mixing shape is moved the same distance and in the same direction), resulting in additional FA tracers having non-unique mixing spaces and being non-informative. This effect may result in the inclusion of additional δ13C FA tracers being seen as essentially comparable to the addition of non-informative clone tracers (i.e., an exact copy of a tracer included as an additional tracer). In this study, clone tracers are used as a standard example of tracers which have identical mixing spaces and therefore can potentially be seen as non-informative tracers. In particular, clone tracers are used to determine the capacity of the mixing model to handle non-informative tracers. The comparison of model performance using additional FA tracers and non-informative clone tracers helps to quantify the information gained by using an additional FA tracer.
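Why a pure translation leaves the relative arrangement of the sources unchanged can be seen from the position of each source between the lowest and highest source values. As a sketch, write x_s for the value of one tracer in source s and assume a second tracer is shifted by a constant c for every source (the symbols x_s, c and r_s are introduced here for illustration only):

```latex
x'_s = x_s + c \quad \forall s
\;\Longrightarrow\;
r'_s = \frac{x'_s - \min_k x'_k}{\max_k x'_k - \min_k x'_k}
     = \frac{(x_s + c) - (\min_k x_k + c)}{(\max_k x_k + c) - (\min_k x_k + c)}
     = \frac{x_s - \min_k x_k}{\max_k x_k - \min_k x_k}
     = r_s
```

Under that assumption, the second tracer places every source in the same relative position as the first.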
Further evaluation and optimization of CSSI tracer selection in sediment source fingerprinting research is critical to increase the reliability of apportionment estimates and, as a result, the development of appropriate sediment management practices. In this study, we present the results for all combinations of δ13C FA (n = 11) and FA tracer sets including δ15N (n = 15) using concentration-dependent mathematical mixtures. Using a novel scaling and discrimination analysis (SDA), non-informative tracers that have a non-unique mixing space are identified. Importantly, we test the hypothesis that the covariance structure of MixSIAR can adequately handle non-informative tracers using clone tracers.

Site description and sampling
The study was conducted using source soils from the Rhine catchment upstream of Basel (10,687 km²) (draining northern Switzerland and parts of southern Baden-Württemberg, Germany) and downstream of the large lakes (i.e., Konstanz, Zürich, Hallwil, Sempach, and Biel) (Fig. 1). Land use within the catchment area was mainly classified as arable land (28%), mixed forest (20%), and pasture (13%).
The Basel Rhine catchment was divided into four sub-catchments: the Birs catchment, the Aare catchment, a Rhine catchment downstream of the Aare entering the Rhine, and a Rhine catchment upstream of the Aare entering the Rhine (Fig. 1). Each sub-catchment contained 3-8 sites of the major land-use classes: arable, pasture, and forest. With the aid of a connectivity model by Borselli et al. (2008), land-use specific sample locations within each sub-catchment were selected based on their high likelihood of contributing suspended sediment to the watercourses. To reduce analytical costs while maintaining the representativeness of the source samples, composite samples were generated from four individual samples located 2 m apart using a soil extraction cylinder (diameter 5.5 cm, length 5 cm). As suggested by Laceby et al. (2017) and Evrard et al. (2022), the size fraction of source soils analysed (< 100 µm) was selected based on particle size analysis of flood sediment from a wider research project. Information on the sediment collection and size analysis is included in Online Resource 1.

Laboratory analysis
Lipids were extracted from 0.5-1.5 g of soil using CH₂Cl₂:MeOH (9:1 v/v) in an Accelerated Solvent Extractor (Dionex ASE 350) with the addition of FA19:0 as an internal standard. The total lipid extract was separated into polar, neutral, and acidic fractions using solid-phase extraction on aminopropyl bonded silica as described in Jacob et al. (2005). The acidic fractions were eluted using 1% formic acid in diethyl ether on a pre-acidified column. The acidic fraction was subsequently methylated at 60 °C for 1 h using 1 mL of 14% BF₃ in MeOH. Fatty acid methyl esters were extracted from the solution by agitating it four times with 2 mL hexane in the presence of 1 mL of 0.1 M KCl. The δ13C FA isotopic ratio was measured using a Trace 1310 GC instrument interfaced online through a GC-Isolink II to a Conflo IV and Delta V Plus isotope ratio mass spectrometer (Thermo Fisher Scientific) as described by Lavrieux et al. (2019). Nitrogen concentrations and δ15N values for source soils were measured by Flash EA (Thermo Finnigan Delta plus XP mass spectrometer coupled with Flash EA 1112 series elemental analyser supplied by Thermo Finnigan, Waltham, MA, USA). Carbon and nitrogen stable isotope ratios were reported in delta notation, per mil deviation from Vienna Pee Dee Belemnite (VPDB) and atmospheric nitrogen (AIR) respectively.

Mathematical mixtures
Mathematical mixtures were generated using the mean stable isotopic ratio and mean concentration of tracers (i.e., bulk N and FA24, FA26, FA28 and FA30) for arable (n = 10), pasture (n = 7), and forest (n = 11) soil samples. Proportions of source contributions (0 to 100%) were created using a random number generator sampling from a Dirichlet distribution, with the condition that the sum of source proportions must equal 1. The Python script used to generate mathematical mixtures is appended in Online Resource 2 (an Excel version of the mathematical mixture formulation is appended as Online Resource 3). To ensure evenly distributed mixing proportions of each source, 150 mathematical mixtures were generated (the mean of each source proportion of 150 mixtures was ~ 33%). Concentration-dependent mathematical mixtures were generated as shown in Eq. (1).
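A standard concentration-weighted mixing form that is consistent with the symbol definitions given for Eq. (1) is shown here as an assumption about that equation, not as the authors' exact notation (M_t is a label introduced here for the mixture value of tracer t):

```latex
M_t = \frac{\sum_{S=1}^{S_O} P_S \, C_{S,t} \, V_{S,t}}{\sum_{S=1}^{S_O} P_S \, C_{S,t}}
\qquad \forall\, t \in T
```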
Where V is the mean isotopic value of the tracer t and C refers to the mean concentration, for all (∀) tracers t in a set (∈) of tracers T in source S. S_O refers to the number of sources and P refers to the known proportions of the mathematical mixtures.
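The generation of concentration-dependent mixtures can be sketched as follows. This is not the script from Online Resource 2: it assumes a flat Dirichlet distribution (all parameters equal to 1, which the text does not specify), a concentration-weighted average as the mixing rule, and, for illustration, the all-FA source means reported in the source tracer section. The function names are hypothetical.

```python
import random

def dirichlet(n_sources, rng):
    """Draw proportions from a flat Dirichlet by normalising Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_sources)]
    total = sum(draws)
    return [d / total for d in draws]

def mixture_value(props, conc, values):
    """Concentration-weighted mixture value for one tracer (assumed mixing rule)."""
    weights = [p * c for p, c in zip(props, conc)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Source order: arable, pasture, forest (illustrative all-FA means).
conc_fa = [7.7, 11.9, 19.4]      # mean FA concentration, ug per g
d13c_fa = [-35.0, -36.2, -33.4]  # mean d13C FA, per mil

# 150 mixtures: known proportions plus the resulting mixture tracer value.
mixtures = []
for _ in range(150):
    p = dirichlet(3, rng)
    mixtures.append((p, mixture_value(p, conc_fa, d13c_fa)))

equal = mixture_value([1 / 3] * 3, conc_fa, d13c_fa)
simple_mean = sum(d13c_fa) / 3
print(round(equal, 2), round(simple_mean, 2))
```

With equal source proportions, the concentration-weighted value is pulled towards the forest signature relative to the simple mean, because forest soils carry the highest FA concentration. This is the non-linearity that concentration dependency introduces.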

End member mixing model
Mathematical mixtures were modelled using the open-source MixSIAR R package. MixSIAR was run with concentration dependency, utilising the concentration of both FAs and N and thereby transforming the unmixing of isotopes into the unmixing of sediment/mixtures. Therefore, an organic matter correction was not applied post hoc, to prevent a secondary transformation. Priors were set to uninformative and all MixSIAR runs used the same model parameters: chains = 3, chain length = 3,000,000, thin = 500, burn = 2,700,000 with a 'very long' run time. The convergence of the mixing model was assessed using the Gelman-Rubin diagnostic, with model output being rejected if variables scored > 1.0. The R script used for all models is appended in Online Resource 4. While the 'residual x process' error structure has been applied appropriately to multiple mixture samples (Cooper et al. 2015; Smith et al. 2018; Upadhayay et al. 2018b; Blake et al. 2018; Vale et al. 2022), the high cost and processing time of CSSI analysis have likely led to the predominant use of the 'process only' error structure with single mixture samples (Gateuille et al. 2019; Reiffarth et al. 2019; Liu and Han 2021). As such, a single sample of each mixture proportion was unmixed in MixSIAR using the 'process only' error structure, in which the variation in the mixtures is assumed to be fully dependent on the variation in the sources (Smith et al. 2018).
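As a small worked example, the run settings above imply the following number of retained posterior draws. This assumes the usual JAGS convention that the burn-in is counted within the chain length and that thinning is applied to the post-burn-in iterations.

```python
# MCMC settings as reported in the text.
chains = 3
chain_length = 3_000_000
burn_in = 2_700_000
thin = 500

# Assumed convention: keep every `thin`-th iteration after the burn-in.
kept_per_chain = (chain_length - burn_in) // thin
total_kept = chains * kept_per_chain

print(kept_per_chain, total_kept)  # 600 1800
```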

Model evaluation
The probabilistic output of MixSIAR should be evaluated using probabilistic metrics rather than deterministic metrics such as mean absolute error. The continuous ranked probability score (CRPS) (Matheson and Winkler 1976) is a generalization of the mean absolute error toward a probabilistic perspective and can be thought of as the total displacement needed to move the output distribution density to the observed single outcome or known mixture proportion. CRPS is negatively orientated, with smaller values equating to better model performance. A perfect score of 0 represents the entire density of the output placed exactly on the outcome value. Deviation from the perfect score results from a lower density of probability around the observed value. CRPS has provided a useful metric to account for both accuracy and precision of unmixing models and has been suggested to be particularly applicable for model comparison and tracer selection analysis. CRPS was calculated using the Python package 'properscoring' and is used to report on individual model performance.
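The behaviour of CRPS can be sketched with the standard empirical (ensemble) estimator, written here in plain Python as a stand-in for the 'properscoring' package used in the study. The posterior samples and the known proportion below are hypothetical.

```python
def crps_ensemble(samples, observed):
    """Empirical CRPS: mean |x - y| minus half the mean pairwise |x - x'|."""
    n = len(samples)
    distance_to_obs = sum(abs(x - observed) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return distance_to_obs - spread

known = 0.30  # hypothetical known source proportion of a mathematical mixture

print(crps_ensemble([0.30, 0.30, 0.30], known))  # all density on the known value
print(crps_ensemble([0.20, 0.30, 0.40], known))  # centred but less precise
print(crps_ensemble([0.50, 0.50, 0.50], known))  # precise but inaccurate
```

The first case scores 0, the second a small positive value (about 0.022), and the third reduces to the absolute error of 0.2, which shows how the score penalises both imprecision and inaccuracy.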
The CRPS of all tracer combinations are further used to evaluate the accuracy of using LDA for tracer selection. Using the R package 'klaR', the model performance of the tracer selection by a stepwise forward variable selection using the Wilks' lambda criterion (niveau = 0.1) is compared to the empirically selected optimal tracer combination with the lowest CRPS.
Model comparisons are then evaluated by the continuous ranked probability skill score (CRP skill score) shown in Eq. (2) (Pedro et al. 2018). The CRP skill score is a comparative metric of the accuracy and precision of two model outputs:
$$\text{CRP skill score} = 1 - \frac{\mathrm{CRPS}_m}{\mathrm{CRPS}_{ref}} \qquad (2)$$
where CRPS_m and CRPS_ref are the CRPS of the new model (the model compared, e.g., δ15N + all δ13C FAs) and the reference model (the model compared against, e.g., all δ13C FAs only), respectively.
Negative CRP skill score values indicate that the new model does not outperform the reference model, as the new model requires more displacement of output distribution density onto the known value than the reference model. A value of one indicates that the new model has a perfect skill score compared to the reference model (Pedro et al. 2018).
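A minimal sketch of the skill score calculation follows. The function name and the CRPS values are hypothetical and chosen only to show the sign convention.

```python
def crp_skill_score(crps_new, crps_ref):
    """CRP skill score = 1 - CRPS_m / CRPS_ref (positive means the new model is better)."""
    if crps_ref <= 0:
        raise ValueError("reference CRPS must be positive")
    return 1.0 - crps_new / crps_ref

# Hypothetical mean CRPS values for a new and a reference model.
print(crp_skill_score(0.10, 0.20))  # new model halves the CRPS
print(crp_skill_score(0.25, 0.20))  # new model is worse: negative score
print(crp_skill_score(0.00, 0.20))  # perfect new model: score of one
```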

Tracer selection and prediction bias analysis (PBA)
The unmixing performance of ideal tracers should be independent of the source contribution, as contribution-dependent model performance is an indication of prediction bias. Predictive bias and dominant source effects on model output have been previously recognised and suggested to be an effect of poor tracer source discrimination (Vale et al. 2022). The hypothesis that FA tracers have similar and non-informative mixing spaces implies that additional FA tracers provide minimal additional source discrimination information. If the hypothesis is true, predictive bias will be decreased by reducing the number of tracers with non-unique mixing spaces (e.g., FA tracers), as any added source uncertainty is removed. To assess whether apportionment estimates occur with predictive bias, known source proportions of mathematical mixtures are plotted against the model performance (CRPS) for each source (prediction bias analysis, PBA). Two tracer sets (δ15N + all δ13C FAs and δ15N + δ13C FA26) were used to illustrate the effect of reducing the number of tracers on predictive bias.

Non-informative tracers – scaling and discrimination analysis (SDA)
MixSIAR uses the relative source-sediment-source positions for unmixing. Therefore, tracers whose mixing spaces differ only by a direct isometry translation can be seen mathematically as identical and potentially non-informative. To evaluate whether δ13C FA tracers have non-unique mixing spaces and are direct isometry translations of each other, a novel scaling and discrimination analysis (SDA) was developed. Scikit-learn's MinMaxScaler (Pedregosa et al. 2011) was used to scale tracer values between 0 and 1 across all sources. Scaling retains the relative location, shape, and distribution of the sources for each tracer, enabling comparison of relative source locations between tracers. A Kruskal-Wallis H-test (p < 0.05) was used to evaluate the similarity between the relative source locations of the scaled tracers. Scaled tracers showing a non-significant difference in source locations will consequently have mixing spaces which are direct isometry translations and can therefore be seen as non-informative.
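The two SDA steps can be sketched in plain Python. Several assumptions are made here and are not taken from the study: the test is applied per source to a pair of tracers (two groups, so the chi-square approximation has one degree of freedom), scaling is done by hand rather than with scikit-learn, and all tracer values are hypothetical. With only four values per group the chi-square approximation is rough.

```python
import math

def min_max_scale(tracer):
    """Scale one tracer to [0, 1] across all sources (same idea as MinMaxScaler)."""
    flat = [v for values in tracer.values() for v in values]
    lo, hi = min(flat), max(flat)
    return {src: [(v - lo) / (hi - lo) for v in vals] for src, vals in tracer.items()}

def average_ranks(values):
    """Return 1-based ranks (ties share the mean rank) and the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_term, i = [0.0] * len(values), 0.0, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    return ranks, tie_term

def kruskal_two_groups(a, b):
    """Kruskal-Wallis H and chi-square p-value (1 degree of freedom) for two groups."""
    n = len(a) + len(b)
    ranks, tie_term = average_ranks(a + b)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a**2 / len(a) + r_b**2 / len(b)) - 3 * (n + 1)
    correction = 1.0 - tie_term / (n**3 - n)
    h = max(0.0, h) / correction if correction > 0 else 0.0
    return h, math.erfc(math.sqrt(h / 2.0))

def sda(tracer_a, tracer_b, alpha=0.05):
    """Per source: do the scaled positions differ significantly between two tracers?"""
    a, b = min_max_scale(tracer_a), min_max_scale(tracer_b)
    return {src: kruskal_two_groups(a[src], b[src])[1] < alpha for src in a}

# Hypothetical values (per mil), chosen only to illustrate the behaviour.
fa24 = {"forest": [-32.0, -32.5, -33.0, -33.5],
        "arable": [-34.0, -34.5, -35.0, -35.5],
        "pasture": [-36.0, -36.5, -37.0, -37.5]}
fa26 = {src: [v - 1.0 for v in vals] for src, vals in fa24.items()}  # pure translation
d15n = {"forest": [-0.5, 0.0, 0.5, 1.0],
        "arable": [5.5, 6.0, 6.5, 7.0],
        "pasture": [3.5, 4.0, 4.5, 5.0]}

print(sda(fa24, fa26))  # no source differs: the translated tracer looks non-informative
print(sda(fa24, d15n))  # every source differs: the tracer has a distinct mixing space
```

In this toy case the translated tracer is flagged as having the same relative source locations, while the δ15N-like tracer, which orders the sources differently, is flagged as distinct.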

Non-informative tracers – clone tracer analysis
To evaluate MixSIAR's effectiveness in modelling tracers with non-unique mixing spaces, δ13C FA26 was utilized as a non-informative clone tracer (an exact copy of a tracer used as an additional tracer). This clone tracer was then added three times to the δ15N + δ13C FA26 tracer set. Each addition of FA26 was evaluated individually by CRPS to identify the effect of additional non-informative tracers. The comparison of model performance using additional FA tracers and non-informative clone tracers is then used to quantify the information gained by using an additional FA tracer.

Source tracer values
To reduce errors associated with input from non-terrestrial plant-derived FAs, only VLCFAs (FA22:0–FA30:0) (hereafter referred to as FAs) were considered (Alewell et al. 2016; Reiffarth et al. 2016; Upadhayay et al. 2017; Lavrieux et al. 2019). Forest sources contained the highest concentration of FAs (mean: 19.4 µg g⁻¹, SD: 5.0 µg g⁻¹) and the most δ13C enriched isotopic values for all FA tracers (mean δ13C: -33.4 ‰, SD: 1.3 ‰) (Tables 1 and 2, Fig. 2). Pasture sources had the most δ13C depleted isotopic values for all FAs (mean δ13C: -36.2 ‰, SD: 1.4 ‰) and mid-ranged FA concentrations (mean: 11.9 µg g⁻¹, SD: 1.9 µg g⁻¹). Arable sources contained the lowest concentration of FAs (mean: 7.7 µg g⁻¹, SD: 1.1 µg g⁻¹) and mid-ranged FA isotopic values (mean δ13C: -35.0 ‰, SD: 1.7 ‰). The δ13C FA values for these land uses are similar to previous findings in fresh biomass (Chikaraishi et al. 2004) and soils from the same land use classification in similar geographic and climate regions (Alewell et al. 2016; Lavrieux et al. 2019; Hirave et al. 2021). The δ15N value of soil reflects the isotopic signature of nitrogen inputs, outputs and internal processes of the system (Amundson et al. 2003). δ15N values ranged from a mean of 6.3 ‰ (SD 0.9 ‰) in arable land to 4.0 ‰ (SD 0.9 ‰) in pasture and 0.0 ‰ (SD 1.6 ‰) in forest soil. Nitrogen concentrations ranged from a mean of 0.5 mg g⁻¹ (SD 0.1 mg g⁻¹) in pasture to 0.4 mg g⁻¹ (SD 0.1 mg g⁻¹) in forest and 0.3 mg g⁻¹ (SD 0.1 mg g⁻¹) in arable soil (Tables 1 and 2, Fig. 2). The δ15N values are comparable to previous results for similar land uses (Fox and Papanicolaou 2007; Mukundan et al. 2010). Source tracer distributions are similar to those in the literature (Fox and Papanicolaou 2007; Mukundan et al. 2010; Alewell et al. 2016; Lavrieux et al. 2019; Hirave et al. 2021). As such, we found the samples representative of their land use classification and therefore suitable for the mathematical mixture analysis of this study. However, we suggest that further source soil sampling should be done for the reliable unmixing of real suspended sediment. The full source value data set is appended in Online Resource 5.

Source discrimination and mixing line origins
The discriminative power of the isotopic tracers between each possible source pair was tested before MixSIAR modelling. Tracers significantly discriminated between source pairs in 93% of all tracer and source-pair comparisons (Kruskal-Wallis, p < 0.05). Only δ13C FA24:0 did not discriminate between arable and forest sources (Fig. 2). All land uses displayed similar δ13C depletion with increasing alkyl chain length (x̄ = -1.1 ‰ δ13C per two additional carbon atoms, SD = 0.13 ‰) (Fig. 3).
The results are consistent with the literature, which suggests a depletion of up to -2.7 ‰ in C₃ plants from FA24:0 to FA32:0 (Agrawal et al. 2014; and references therein). The small variation between sources in the δ13C depletion with alkyl chain length (SD = 0.13) suggests that the δ13C depletion during FA elongation is not land-use dependent. The similar depletion of the mean δ13C FAs with increasing alkyl length (forest: R² = 0.999, arable: R² = 0.952 and pasture: R² = 0.995) (Fig. 3) results in a mixing line for all FA tracers (Fig. 4), with the isotopic values of forest and pasture located at either end of the mixing line.

The mixing line problem
The FA mixing line illustrated in Fig. 4 is present in δ13C FA sediment fingerprinting studies with a similar land-use classification of sediment sources (Alewell et al. 2016; Lavrieux et al. 2019). The linear mixing line problem is not confined to isotopic tracers. Colour (Barthod et al. 2015) and geochemical tracers (Bouchez et al. 2011) have also presented a linear mixing line. The similar alkyl length-δ13C relationship of the different land uses is a result of the same mechanistic process of FA elongation for all land uses. Interestingly, this effect is not observed in all reported cases of arable, pasture and forest land uses (Upadhayay et al. 2020; Lizaga et al. 2021). Deviation from this relationship and the absence of a mixing line could indicate that the previous land use or crop type contained a higher concentration of a specific FA, which is now more present in the current soil compared to other legacy FAs. Conservative tracers, such as FAs, can persist in the soil after a change in land management (Upadhayay et al. 2020). Swales and Gibbs (2020) demonstrated that isotopic shifts occur during a land use transition, and therefore, past land use management should be considered when grouping source soils. This legacy effect has the potential to increase the uncertainty in source distributions and reduce source discrimination and subsequent unmixing performance. However, the legacy effect can potentially be used beneficially for fingerprinting if sources are grouped by their crop cycle rather than the current crop.

Evaluation of mathematical mixtures
All possible permutations and combinations of δ 13 C FA tracers (n = 11) were evaluated using 150 concentration-dependent mathematical mixtures (Fig. 5A). Results demonstrated a general increase in CRPS of all sources as the number of δ 13 C FA tracers is increased (2 FAs mean CRPS: 0.165, 3 FAs: 0.188, 4 FAs: 0.195). A summary of all tracer combination CRPS is appended in the Online Resource 1. The elevated errors for the arable source contributions (mean CRPS: 0.260) are probably directly related to the location of arable FAs in the mixing space, resulting in the underreporting of arable sources with their contributions likely being misclassified as pasture. Misclassification of the arable source contribution potentially results in the overestimation of pasture contributions, as suggested by pasture having the second highest CRPS value (mean CRPS: 0.209) for all δ 13 C FA tracers sets. Strong discrimination of forest sources resulted in a relatively low CRPS value (mean CRPS: 0.058) for all sets of tracers. Forest apportionment estimates were relatively independent of the number of δ 13 C FA tracers, suggesting that any additional source-based Fig. 3 All land uses displayed a similar δ 13 C depletion with increasing alkyl chain length (mean = -1.1 ‰ δ 13 C per two additional carbon atoms) indicating the δ 13 C enrichment during FA elongation is not land-use dependent. Uncertainty is depicted with 95% confidence intervals uncertainty induced by additional δ 13 C FAs was out-weighed by the beneficial source discrimination gained. This collaborates with the iso plots that display low source uncertainty of the forest using any δ 13 C FA tracer (Fig. 4). Although, our findings differ from those of Vale et al. (2022), who reported that the forest source apportionment had the highest mean absolute error (MAE) among all sources. Both studies demonstrate that the sources with a higher source discrimination have increased model performance. 
This indicates that the ability to discriminate between sources is likely a crucial factor in model performance. Nonetheless, catchment-specific apportionment validation is necessary, as source discrimination is highly variable even among similar land use groups. Overall, the performance of the model depends more on the number of δ13C FA tracers than on the selection of individual tracers, owing to mixing space similarities.
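As an illustration of what a concentration-dependent mathematical mixture involves, the following minimal sketch computes the tracer value of a mixture as the concentration-weighted mean of the source values. The exact mixture-generation procedure is not described in this section, so the weighting formula, the function name `mix_tracer` and all numeric values are assumptions made for illustration only.

```python
from typing import Sequence


def mix_tracer(proportions: Sequence[float],
               concentrations: Sequence[float],
               deltas: Sequence[float]) -> float:
    """Concentration-weighted tracer value of a mixture for one tracer.

    proportions: known source proportions (expected to sum to 1)
    concentrations: tracer concentration of each source (e.g. FA content)
    deltas: isotope value of each source (e.g. d13C in per mil)
    """
    # Each source contributes in proportion to p_i * C_i, not p_i alone.
    weights = [p * c for p, c in zip(proportions, concentrations)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one source must contribute tracer mass")
    return sum(w * d for w, d in zip(weights, deltas)) / total


# Hypothetical source means (forest, pasture, arable); not data from the study.
d13c_fa26 = [-35.5, -32.8, -31.9]
fa26_conc = [40.0, 25.0, 10.0]
known_proportions = [0.2, 0.3, 0.5]

print(round(mix_tracer(known_proportions, fa26_conc, d13c_fa26), 2))
```

Because the weights include concentration, a source with a high FA content pulls the mixture towards its own isotope value more strongly than its sediment proportion alone would suggest.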
Fig. 4 Iso plots of δ15N and δ13C FA with colours indicating land use type. δ13C FA tracers present the mixing line problem that occurred using FA tracers only. It is unlikely that there will be a perfect 1:1 mixing line when there are multiple samples for each source. Nonetheless, the central location of one source consistently between two other source end members will create challenges (i.e. central source(s) being misclassified as a contribution from endpoint sources) during the modelling process. The addition of δ15N expands the mixing line/space to more of a mixing polygon that provides the source discrimination necessary for more accurate and less uncertain model results

Fig. 5 Mean CRPS of all possible permutations and combinations of A δ13C FA tracers (n = 11) and B δ15N and all δ13C FA tracers (n = 15). Tracer sets were evaluated using 150 concentration-dependent mathematical mixtures with CRPS (a higher CRPS indicates lower performance)

Including δ15N to offset the mixing line for all combinations of tracers (n = 15) increased the performance of the model for all sources (FA combinations mean CRPS: 0.175, δ15N + FA combinations mean CRPS: 0.091) (Fig. 5B). Importantly, pasture and arable source apportionment estimates decreased in performance with additional δ13C FA tracers (pasture mean CRPS: δ15N + 1 FA: 0.073, 2 FAs: 0.102, 3 FAs: 0.161, 4 FAs: 0.184; arable mean CRPS: δ15N + 1 FA: 0.079, 2 FAs: 0.109, 3 FAs: 0.172, 4 FAs: 0.184), suggesting that any beneficial source discrimination by additional δ13C FA tracers is outweighed by the increase in source-based uncertainty. Further evidence supporting these results is provided by the iso plots, which depict a large intersection between the arable and pasture source groups for all δ13C FA tracers (Fig. 4). Consequently, the mixing space shifts from a mixing line to a mixing polygon with the inclusion of δ15N, reducing pasture-arable misclassification.
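The tracer set counts reported here (n = 11 and n = 15) can be reproduced by enumerating subsets of the FA tracers. The sketch below assumes four δ13C FA tracers (FA24, FA26, FA28 and FA30, the chain lengths named in the text), that FA-only sets contain at least two tracers, and that δ15N sets contain at least one FA; the identifiers are hypothetical labels rather than names used in the study.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Hypothetical labels for the four d13C FA tracers referred to in the text.
FA_TRACERS = ["d13C_FA24", "d13C_FA26", "d13C_FA28", "d13C_FA30"]


def tracer_sets(tracers: Sequence[str],
                min_size: int,
                offset: Optional[str] = None) -> List[Tuple[str, ...]]:
    """All subsets of `tracers` with at least `min_size` members.

    If `offset` is given (e.g. d15N), it is prepended to every subset.
    """
    prefix = (offset,) if offset is not None else ()
    sets = []
    for k in range(min_size, len(tracers) + 1):
        for combo in combinations(tracers, k):
            sets.append(prefix + combo)
    return sets


fa_only = tracer_sets(FA_TRACERS, min_size=2)                    # FA-only sets
with_d15n = tracer_sets(FA_TRACERS, min_size=1, offset="d15N")   # d15N + FA sets

print(len(fa_only), len(with_d15n))  # 11 15
```

Each returned tuple would then be passed to the mixing model as one candidate tracer set, which is why the number of model runs grows quickly as tracers are added.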

Tracer selection by the analysis of all combinations
The benefit of using δ15N as a mixing line offset tracer is presented in Fig. 6, with the solid line indicating a perfect fit (i.e., the estimated proportion equals the known proportion). Figure 6A highlights the inaccurate apportionment obtained with the δ13C FA-only tracer set, which underestimates the arable contribution and overestimates the pasture contribution. Again, the inaccuracy can be attributed to the central location of arable in the mixing space for all δ13C FA tracers, causing an underestimation of arable contributions as they are misclassified as pasture contributions. This centralised source location challenge and the resulting misclassification have been presented previously by Alewell et al. (2016) and Lavrieux et al. (2019).
The inclusion of the δ15N tracer reduced the overestimation of pasture and the underestimation of the contribution of arable sources (Fig. 6B). Using the CRP skill score as a comparative model performance metric, the expansion of the mixing space using δ15N yielded a mean 18% (median 22%) improvement in the CRP skill score compared to using only δ13C FAs. The improvement of the model output using δ15N as an additional tracer was expected, given the expansion of the linear δ13C FA mixing line into a more suitable mixing polygon.
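To make the metric concrete, the sketch below estimates the CRPS from posterior samples of a source proportion and converts two CRPS values into a skill score. It uses the standard sample-based CRPS estimator and the usual skill score form, 1 - CRPS_model / CRPS_reference. Treating the δ13C FA-only model as the reference is an assumption here, as are the function names and example values; the exact calculation used in the study is not specified in this section.

```python
from typing import Sequence


def crps_samples(samples: Sequence[float], observed: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    samples: posterior draws of one source proportion
    observed: known proportion of that source in the mathematical mixture
    """
    n = len(samples)
    accuracy = sum(abs(x - observed) for x in samples) / n
    # Mean absolute difference between all pairs of draws (spread term).
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


def crp_skill_score(crps_model: float, crps_reference: float) -> float:
    """Relative improvement over a reference; positive means the model is better."""
    return 1.0 - crps_model / crps_reference


# Hypothetical posterior draws for one source with a known proportion of 0.5.
draws = [0.42, 0.48, 0.51, 0.55, 0.60]
print(crps_samples(draws, 0.5))
print(crp_skill_score(crps_model=0.05, crps_reference=0.10))
```

For a single point estimate the CRPS reduces to the absolute error, which is why it can be read on the same scale as the source proportions.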
When examining all potential tracer combinations, δ15N + δ13C FA26 (Fig. 5C) had the best model performance of all permutations, with a mean 16% (median 62%) improvement compared to δ15N + all δ13C FAs (Fig. 5B). The offset between mean and median is a result of the model predicting low contributions for all arable mixture proportions; at these low arable contributions, the model is likely correct for the wrong reasons. The increase in the accuracy and uncertainty of estimated source apportionment using δ15N + δ13C FA26 (Fig. 6C) results in the known source proportions being bracketed by the estimated values. The increase in uncertainty of the model compared to using δ15N + all δ13C FA tracers suggests that a reduction in the number of δ13C FA tracers increases the number of possible solutions to the unmixing equation.
Although LDA is commonly used to optimise the power of discrimination when handling a large number of tracers, it is not regularly used for CSSI tracer selection. The accuracy of using LDA for tracer selection was investigated with a stepwise forward variable selection using the Wilks' lambda criterion (niveau = 0.1), which selected δ15N + δ13C FA24 + δ13C FA26 as the optimal tracer set (LDA reclassification score 89%). Interestingly, the model performance of the LDA-selected tracers decreased by 24% compared to δ15N + δ13C FA26. The poor performance of the LDA-selected tracers may be attributable to the mixing model's inclusion of concentration dependency, which is ignored by the LDA.

Tracer selection and prediction bias analysis (PBA)
Predictive bias and the impact of the dominant source on model output have been identified previously in sediment fingerprinting and described as a product of the source discrimination (Vale et al. 2022). Ideal tracers should have enough discrimination power to give null predictive bias; however, this is not the case with real tracers. To assess whether predictive bias effects are reduced by the removal of tracers that have non-unique mixing spaces, the known source proportions of the mathematical mixtures were plotted against the model performance (CRPS) for each source (predictive bias analysis, PBA) (Figs. 7 and 8). PBA of δ15N + all δ13C FA and δ15N + δ13C FA26 was used to illustrate the effect of reducing source uncertainty by removing non-informative tracers.
PBA of the δ15N + all δ13C FA tracer set illustrates the decrease in arable and pasture performance with increasing arable contribution (Fig. 7). The extremely similar and linear relationship between arable and pasture CRPS is strong evidence for the misclassification of arable and pasture, as the model underestimates contributions from arable and overestimates contributions from pasture. The clear discrimination of the forest source for all tracers (Fig. 4) resulted in the performance of the forest estimates being unaffected by different source contributions. The PBA of the δ15N + δ13C FA26 tracer set depicts a reduction in the linear regression slope, indicating a reduction in predictive bias effects (Fig. 8). It can be assumed that this is the result of a reduction in source uncertainty when using a single FA tracer. The PBA highlights the balance between the source uncertainty error and the discriminative information gained by additional tracers.
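The PBA comparison rests on the slope of a linear regression of CRPS against the known source proportion. A minimal sketch of that slope calculation is given below, using ordinary least squares. The regression method, the function name and the example values are assumptions for illustration and are not taken from the study.

```python
from typing import Sequence, Tuple


def ols_slope_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical example: known arable proportion of each mixture and the
# CRPS of the arable estimate under two tracer sets (not study data).
known_arable = [0.0, 0.25, 0.5, 0.75, 1.0]
crps_all_fa = [0.02, 0.10, 0.17, 0.26, 0.33]
crps_single_fa = [0.03, 0.04, 0.06, 0.07, 0.09]

slope_all, _ = ols_slope_intercept(known_arable, crps_all_fa)
slope_single, _ = ols_slope_intercept(known_arable, crps_single_fa)
print(slope_all, slope_single)
```

A slope near zero means performance does not depend on how much of that source is present, whereas a steep slope indicates that the error grows with the source's contribution.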

Identifying non-informative tracers by scaling and discrimination analysis (SDA)
The balance of source discrimination and source uncertainty of tracers is regularly assessed using boxplots and a Kruskal-Wallis test (Fig. 2). However, tracers are not independent factors; they act simultaneously in the mixing model. The current approach to tracer selection is to test whether individual tracers can distinguish between sources. As an alternative, we investigated whether it is possible to distinguish between FA tracers based on their mixing space.

The majority of δ13C FA tracers (94%) had significantly different source distributions (p < 0.05) (except δ13C FA28-FA30 in arable) (Fig. 9A, Table 3, Left). This can lead to the assumption that each tracer has valuable information for the model. However, the difference between the absolute source distributions of each tracer (distance of source distribution from 0) is caused by each FA tracer being depleted by approximately -1.1 ‰ δ13C per two additional carbon atoms (similarly shown by Chikaraishi et al. 2004) (Fig. 3). Considering that MixSIAR uses relative (source-source) tracer values rather than the absolute tracer value, tracers that demonstrate modification of all source distributions by direct isometric translation (i.e., every point/source of the mixing shape is moved the same distance in the same direction) can be considered mathematically non-unique in terms of mixing space.
To provide an alternative and more robust line of evidence for the non-unique mixing spaces of FA tracers, the tracer values were scaled between 0 and 1 across all sources. Scaling retains the relative location, shape, and distribution of the sources. The scaled value of δ15N was shown to be significantly different (p < 0.05) from all δ13C FAs for all land uses (Fig. 9B, Table 3, Right). In contrast, only 17% of FA tracers had significant differences between any of the scaled source values. Pasture had no significant differences between any scaled δ13C FA. Forest and arable had significant differences between only two pairs and one pair of scaled tracers, respectively (forest: δ13C FA26-FA30:0 and δ13C FA28-FA30; arable: δ13C FA24-FA30) (Fig. 9B). The minimal but present uniqueness of mixing space for 20% of the FAs in the forest source can be assumed to be caused by a more biodiverse FA input, while the uniqueness of 10% of FAs in arable sources could be a result of the legacy tracer signal from crop rotation (Upadhayay et al. 2020).

δ13C depletion during the FA elongation processes appears to be similar for all land uses, with any land-use-specific isotopic variation during FA elongation being negligible when compared to the intra-source variability. The linear relationship between δ13C and FA alkyl length causes FA tracers to be direct isometry translations of each other. Consequently, there are minimal significant differences between the relative source locations of each FA tracer (Fig. 9B), and the mixing space can be thought of as non-unique for all δ13C FA tracers. The similarities between scaled source values for all tracers are illustrated in Fig. 10. δ15N is shown to have non-translation transformations of the mixing shape compared to the FAs. The similarities in the mixing shape for all FAs indicate that direct isometry translation is present between different FAs, making multiple FA tracers non-unique and non-informative.
Considering MixSIAR uses the relative source-sediment-source positions for un-mixing, any modification of the mixing space by only direct isometry translation has a null effect on the mixing space. Therefore, any tracer with a mixing space that is a direct isometry translation of another tracer can be seen as almost identical and either one of the tracers is non-informative.
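The scaling step of the SDA can be sketched as follows: each tracer is min-max scaled to the range 0 to 1 across all sources, after which a tracer that is a pure translation of another yields identical scaled values. The sketch compares scaled source means only and does not reproduce the significance testing reported in Table 3. The function names and all numeric values are hypothetical, with the second FA tracer constructed as the first shifted by -1.1 ‰.

```python
from typing import Dict, List

Tracer = Dict[str, List[float]]  # source name -> tracer values of its samples


def min_max_scale(tracer: Tracer) -> Tracer:
    """Scale one tracer to [0, 1] using the min and max across all sources."""
    values = [v for vals in tracer.values() for v in vals]
    low, span = min(values), max(values) - min(values)
    if span == 0:
        raise ValueError("tracer has no variation across sources")
    return {src: [(v - low) / span for v in vals] for src, vals in tracer.items()}


def max_scaled_mean_difference(a: Tracer, b: Tracer) -> float:
    """Largest difference in scaled source means between two tracers."""
    sa, sb = min_max_scale(a), min_max_scale(b)
    return max(abs(sum(sa[s]) / len(sa[s]) - sum(sb[s]) / len(sb[s])) for s in sa)


# Hypothetical source values; not data from the study.
fa26 = {"forest": [-36.0, -35.2], "pasture": [-33.1, -32.4], "arable": [-32.0, -31.0]}
# A tracer that is a direct translation of fa26 (shifted by -1.1 per mil).
fa28 = {src: [v - 1.1 for v in vals] for src, vals in fa26.items()}
# A tracer with a different arrangement of sources.
d15n = {"forest": [1.0, 1.5], "pasture": [6.0, 6.8], "arable": [4.0, 4.5]}

print(max_scaled_mean_difference(fa26, fa28))  # effectively zero
print(max_scaled_mean_difference(fa26, d15n))  # clearly non-zero
```

In this toy case the translated tracer is indistinguishable from the original after scaling, while the δ15N-like tracer keeps a different relative arrangement of sources, which is the property the SDA is designed to detect.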

Non-informative tracers: clone tracer analysis
To assess MixSIAR's performance when using tracers with identical mixing spaces, a non-informative clone tracer (an exact copy of a tracer used as an additional tracer) was used as a direct approach to test non-informative tracer behaviour. Three sequential additions of the clone tracer δ13C FA26 were made to the δ15N + δ13C FA26 tracer set (Fig. 11A). Increasing the number of clone tracers decreased the model performance (δ15N + 1 × δ13C FA26 CRPS: 0.034, 2 × FA26: 0.181, 3 × FA26: 0.199, 4 × FA26: 0.202). This effect can be attributed to the lack of any additional beneficial information when using tracers with non-unique mixing spaces, whilst the source uncertainty error induced by adding multiple clone tracers is propagated. In this study, these results disagree with the notion that MixSIAR handles non-informative tracers sufficiently (Smith et al. 2018).
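Constructing the clone tracer sets amounts to duplicating one tracer column under new names. The sketch below shows one way this could be done for a table held as a dictionary of columns; the data layout, the `add_clones` helper and the column names are assumptions for illustration and do not describe how the study prepared its MixSIAR inputs.

```python
from typing import Dict, List

Table = Dict[str, List[float]]  # tracer name -> one value per sample


def add_clones(table: Table, tracer: str, n_clones: int) -> Table:
    """Return a copy of `table` with `n_clones` exact copies of `tracer` added."""
    if tracer not in table:
        raise KeyError(f"unknown tracer: {tracer}")
    cloned = {name: list(values) for name, values in table.items()}
    for i in range(1, n_clones + 1):
        # Each clone carries identical values, so it adds no new information.
        cloned[f"{tracer}_clone{i}"] = list(table[tracer])
    return cloned


# Hypothetical two-tracer table (d15N + d13C FA26); not data from the study.
base = {"d15N": [1.2, 6.1, 4.3], "d13C_FA26": [-35.5, -32.8, -31.9]}

# 1x, 2x, 3x and 4x FA26, i.e. zero to three clones added to the base set.
clone_sets = [add_clones(base, "d13C_FA26", k) for k in range(4)]
print([sorted(s) for s in clone_sets])
```

Each element of `clone_sets` would then be modelled separately, so any change in CRPS between them can only come from the duplicated tracer.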
Fig. 8 The prediction bias analysis of δ15N and δ13C FA26 shows how the model's performance for each source is impacted by different source proportions, with the contribution of each source plotted against the mean model performance (where higher CRPS values indicate lower performance). By comparing the linear regression slope to Fig. 7, it is clear that there is a decrease in predictive bias with fewer non-informative tracers

Optimizing model performance requires balancing the new beneficial source discrimination against the additional source uncertainty brought to the model by each additional tracer. Figure 11A indicates that when using a clone tracer, the source uncertainty is propagated until the addition of a fourth tracer. The difference in CRPS between clone tracers and different FAs was used as a measure of information gain when using additional FA tracers. The mean CRPS of different FA tracer combinations with the same number of tracers displayed a similar trend to that observed when adding clone tracers (δ15N + 1 × δ13C FA CRPS: 0.060, 2 × FA: 0.081, 3 × FA: 0.124, 4 × FA: 0.138) (Fig. 11B). Small non-translation modifications of FA mixing spaces resulted in the CRPS using additional different FA tracers being generally lower (mean 22%) than with additional clone tracers. Therefore, from a mathematical perspective, different FAs are not completely non-informative. However, from a practical perspective, additional FA tracers are essentially non-informative, as any beneficial information gained is outweighed by the error added from the propagation of source uncertainty.
Indeed, this approach is highly experimental, and it is unlikely that truly identical tracers will be encountered in the field. Nonetheless, this method demonstrates that non-informative tracers can add bias to a model, as additional FA tracers may bring limited additional information for unmixing. When using different FAs, our results demonstrate that the error gained from mixing space translation effects outweighs the information gained from non-translation modification. This, however, may not be the case for all catchments and tracers.
An intriguing area of investigation is how the balance of source discrimination and tracer mixing space similarities affects model performance. CSSIs of FAs have a relatively narrow range of possible source values (ca. 10-40 ‰) compared to other tracers (e.g., geochemistry). When tracers with a higher degree of source discrimination, but identical mixing spaces, are modelled, the propagation of source uncertainty may be outweighed, potentially resulting in improved model performance.

Conclusion
Using mathematical mixtures, the addition of δ15N to expand the CSSI FA mixing line improved the model by 22% compared to using only δ13C FAs. The evaluation of possible combinations of tracers indicated that δ15N + δ13C FA26 was the optimal tracer set and had a 62% improvement compared to δ15N + all δ13C FAs. LDA tracer selection is regularly used in the literature to select the optimal suite of tracers to increase model performance. However, in this case, the tracers selected by the LDA did not provide the optimal tracer selection. Additional δ13C FA tracers had a negative influence on model performance, indicating that increasing the number of conservative tracers does not necessarily result in improved performance, as previously suggested when using a Bayesian framework. However, reducing the number of tracers will increase the influence of any non-conservative tracers. As mathematical mixtures, by definition, do not contain non-conservative tracers, the potential influence of non-conservative tracers needs careful consideration when apportioning sediment sources. Our results indicated a reduction of predictive bias when using a single FA tracer. Using a novel SDA test, additional FAs were shown to have non-unique mixing spaces. Considering that MixSIAR uses the relative source-sediment-source positions for un-mixing, any tracer that exhibits a non-unique mixing space can be seen as non-informative. Using a clone tracer to evaluate MixSIAR's handling of non-informative tracers provided strong evidence that MixSIAR handles tracers with non-unique mixing spaces insufficiently. In particular, model performance decreased when using additional FA tracers as well as clone tracers.
Fig. 11 Comparison between model performance using A additional clone tracers (δ13C FA26) and B additional different δ13C FA tracers added to the δ15N + δ13C FA26 tracer set. The mean CRPS of all tracer combinations with the same number of tracers is used for additional FA tracers to improve the representativeness of results (higher CRPS indicates lower performance)

Land-use-specific sediment source apportionment using FA CSSIs requires a supplemental offset tracer that is not dependent on the C3-C4 discrimination pathway. Since a single FA CSSI had the best performance with an additional offset tracer, an alternative single tracer to FA CSSIs that uses the C3-C4 discrimination pathway for source discrimination, such as bulk isotopes, may be more accessible and have similar unmixing performance. However, the conservativeness and unmixing performance of these tracers need to be explored further; the latter can be evaluated confidently by using mathematical mixtures. Even though adding δ15N as a tracer in this study outperformed the combination of several FA CSSIs, δ15N may be prone to isotopic fractionation during the degradation of molecules and thus may not meet the requirement of a conservative tracer under real-world conditions, where molecules are subject to transport and possible degradation. δ15N may be useful in scenarios where the beneficial information gained by improving source discrimination outweighs any effect of modification or fractionation of the tracer during sediment mobilization, transport and deposition processes. Here, we capitalized on the availability of δ15N data (which is analysed simultaneously with bulk δ13C) to demonstrate the utility of additional tracers that have an alternative mixing space.
In fingerprinting applications, additional tracer selection steps should be considered, including: 1) checking the uniqueness of tracer mixing spaces by SDA, with the removal of tracers that show non-unique mixing spaces, and 2) where feasible, analysing all combinations and permutations of tracers using mathematical mixtures to further optimize tracer selection. Although computationally intensive, the latter can help identify the optimal tracer suite for modelling. Although applied here to FA CSSI and δ15N tracers, this method is potentially appropriate for broader application to identify non-informative tracers. This includes multiple fingerprinting parameters (e.g. fallout radionuclides, spectra and geochemical tracers) in which collinearity of tracers is not uncommon. However, we suggest further exploration of mathematical mixtures to determine the effect of different error structures on model performance and the validity of organic matter or particle size corrections. We anticipate that the use of mathematical mixtures and tracer combinations as a decisive tracer selection step will enable a wider range of applications for sediment fingerprinting, improve our knowledge of the dynamics of soil and sediment in the environment, and enhance soil erosion mitigation techniques.