Garnet major-element composition as an indicator of host-rock type: a machine learning approach using the random forest classifier

The major-element chemical composition of garnet provides valuable petrogenetic information, particularly in metamorphic rocks. When facing detrital garnet, information about the bulk-rock composition and mineral paragenesis of the initial garnet-bearing host-rock is absent. This prevents the application of chemical thermo-barometric techniques and calls for quantitative empirical approaches. Here we present a garnet host-rock discrimination scheme that is based on a random forest machine-learning algorithm trained on a large dataset of 13,615 chemical analyses of garnet that covers a wide variety of garnet-bearing lithologies. Considering the out-of-bag error, the scheme correctly predicts the original garnet host-rock in (i) > 95% concerning the setting, that is either mantle, metamorphic, igneous, or metasomatic; (ii) > 84% concerning the metamorphic facies, that is either blueschist/greenschist, amphibolite, granulite, or eclogite/ultrahigh-pressure; and (iii) > 93% concerning the host-rock bulk composition, that is either intermediate–felsic/metasedimentary, mafic, ultramafic, alkaline, or calc–silicate. The wide coverage of potential host rocks, the detailed prediction classes, the high discrimination rates, and the successfully tested real-case applications demonstrate that the introduced scheme overcomes many issues related to previous schemes. This highlights the potential of transferring the applied discrimination strategy to the broad range of detrital minerals beyond garnet. For easy and quick usage, a freely accessible web app is provided that guides the user in five steps from garnet composition to prediction results including data visualization.


Introduction
Garnet is one of the most useful minerals in Earth sciences providing key information on mantle, metamorphic, metasomatic, and igneous processes (e.g., Baxter et al. 2013).
Due to its widespread occurrence and its wide compositional range that mainly depends on pressure, temperature, and host-rock composition, garnet major-element chemistry became a well-established tool in sedimentary provenance analysis (e.g., Krippner et al. 2014) and economic exploration campaigns (e.g., Hardman et al. 2018). Apart from discriminating different source regions, the potential capability of extracting petrogenetic host-rock information in terms of geological setting, metamorphic conditions, and composition by solely considering garnet single grains is of particular significance in interdisciplinary research linking sedimentary, metamorphic, tectonic, and geodynamic processes (e.g., Schönig et al. 2018a).
For instance, our knowledge about the evolution of metamorphism and plate tectonic regimes through time is mainly based on the preserved crystalline rock record (e.g., Brown and Johnson 2018; Holder et al. 2019), which is increasingly incomplete with increasing age (e.g., Goodwin 1996). To overcome this issue, detrital zircon is commonly used and has been shown to provide important information on eroded crustal rocks, enabling a more reliable global-scale reconstruction of continental growth and igneous suites throughout Earth history (e.g., Dhuime et al. 2017, and references therein). Similar approaches to reconstruct metamorphic conditions and/or plate tectonic regimes via the detrital record are absent, but garnet major-element chemistry provides a potential tool to tackle this issue. However, to link detrital garnet composition with petrogenetic host-rock information, a robust statistical model with reliable prediction success is required.
Communicated by Daniela Rubatto.
Since the first application of detrital garnet major-element chemistry to constrain different source regions (Morton 1985), several petrogenetic schemes have been developed to discriminate specific garnet host rocks (Teraoka et al. 1998; Schulze 2003; Mange and Morton 2007; Aubrecht et al. 2009; Suggate and Hall 2014; Hardman et al. 2018; Tolosana-Delgado et al. 2018). Until 2018, all schemes used strict and subjective compositional fields to discriminate garnet source rocks by considering a subset of available variables. Krippner et al. (2014) and Suggate and Hall (2014) demonstrated that this approach leads to high misclassification rates due to large compositional overlaps of garnets from various source rocks. Notably, although garnet growth stages are often preserved by compositional zoning (e.g., Tracy et al. 1976; Tracy 1982), the composition (i) does not always pinpoint unique growth conditions, (ii) may be influenced by progressive fractionation of the reactive bulk-rock composition, and (iii) may be modified by post-growth diffusion at high temperatures (e.g., Caddick et al. 2010).
Recently, the use of robust multivariate statistics has been shown to significantly improve classification results (Hardman et al. 2018; Tolosana-Delgado et al. 2018). While Hardman et al. (2018) focus on the discrimination of mantle-versus-crustal garnet, Tolosana-Delgado et al. (2018) consider five major garnet host-rock types including amphibolite-, granulite-, and eclogite-facies metamorphic rocks, igneous rocks, and ultramafic rocks. The capabilities of extracting petrogenetic information from the detrital garnet record have further been improved by considering trace elements (Čopjaková et al. 2005; Hong et al. 2020), mineral inclusions (e.g., Schönig et al. 2019, 2020; Baldwin et al. 2021), U-Pb geochronology (e.g., Seman et al. 2017; Millonig et al. 2020), as well as Sm-Nd geochronology (Maneiro et al. 2019). However, all these novel techniques require a wealth of experience, caution, equipment, and effort. In contrast, major-element chemistry is routinely applied and enables efficient screening of a statistically significant number of grains. Although multivariate schemes are already highly advanced, these may be improved by (i) enlarging the database, (ii) including previously unconsidered host-rock types, (iii) considering host-rock composition, (iv) refining prediction classes, and (v) applying statistical classification methods with enough flexibility to disentangle the strongly overlapping compositional signatures of garnet types. Machine-learning algorithms are particularly suitable to tackle multivariate discrimination tasks, as demonstrated by prediction models that are based on bulk-rock chemistry (Petrelli and Perugini 2016; Petrelli et al. 2017; Ren et al. 2019), glass composition (Bolton et al. 2020), as well as single-grain chemistry (Itano et al. 2020). In addition, random forest regression led to an improvement of barometric predictions for majoritic garnet inclusions in diamond that crystallised at P > 5 GPa (Thomson et al. 2021), implying that the algorithm may prove suitable for developing a model to discriminate detrital garnet sourced from a potentially broad range of host rocks.
Here we present a new garnet major-element discrimination scheme for host-rock prediction developed by a machine learning approach using the random forest algorithm in classification mode (Breiman 2001). It is based on a large database compiled from the literature (n = 13,615), which was used to train and test the discrimination model simultaneously. The new scheme enables garnet discrimination into host-rock setting (mantle versus metamorphic versus igneous versus metasomatic), metamorphic facies (blueschist/greenschist versus amphibolite versus granulite versus eclogite/ultrahigh-pressure), and composition (intermediate-felsic/metasedimentary versus mafic versus ultramafic versus alkaline versus calc-silicates). Besides providing a more detailed classification than any previous scheme, we highlight the usefulness of our scheme by demonstrating the much higher discrimination success and interpretability of results compared to others. We introduce a freely accessible web application, allowing users to easily apply the discrimination scheme to their own data without the need for programming expertise. Finally, application examples emphasize the potential to reflect (i) varying pressure-temperature conditions during garnet growth, (ii) the geological framework of catchments, and (iii) provenance shifts through time.

Database
The compiled chemical garnet database (Supplementary Information 1, https://rodare.hzdr.de/record/1220) includes 13,615 observations of eight oxides commonly analysed in lab routines: SiO₂, TiO₂, Al₂O₃, Cr₂O₃, FeOtotal, MnO, MgO, and CaO (in wt%). Data compilation results from a comprehensive literature survey and benefitted from several published databases, most notably Grütter et al. (2004), Krippner et al. (2014), Suggate and Hall (2014), and Hardman et al. (2018). Original references have been cross-checked to promote database quality. If any of the eight oxides is not reported for a particular observation, that value is handled as not available (NA). As most standard routines use a detection limit between 0.02 and 0.03 wt%, we chose 0.03 wt% as the threshold, and all values below it are handled as below detection limit (BDL).

Model development
For developing the garnet discrimination scheme, the random forest machine-learning algorithm (Breiman 2001) is applied in classification mode. A description of the principle of creating a random forest classification model is given in Supplementary Information 2. For a more detailed and mathematically based explanation, the reader is referred to the original work of Breiman (2001) and reviews treating this topic (Boulesteix et al. 2012; Ziegler and König 2014; Belgiu and Drăgut 2016; Biau and Scornet 2016).
Data processing, calculations, and plotting were performed using the statistical software R (R Core Team 2020). Packages used include 'compositions' (van den Boogaart and Tolosana-Delgado 2008) for compositional data analysis, 'dplyr' (Wickham et al. 2020) for data wrangling, 'ggtern' (Hamilton and Ferry 2018) for display, 'magrittr' (Bache and Wickham 2014) for readability of complex code, and 'randomForest' (Liaw and Wiener 2002) for the calculations of the forest itself.
Two models are developed, called 'setting and metamorphic facies' and 'composition'. For the 'setting and metamorphic facies' model, main groups and groups of Table 1 have partially been merged into seven classes, namely garnet of (i) mantle rocks (class MA, includes main group MA), (ii) blueschist-/greenschist-facies metamorphic rocks (class BS/GS, includes groups BS and GS), (iii) amphibolite-facies metamorphic rocks (class AM, includes group AM), (iv) granulite-facies metamorphic rocks (class GR, includes group GR), (v) eclogite-/ultrahigh-pressure-facies metamorphic rocks (class EC/UHP, includes groups EC and UHP), [...] (Table 2).
Each observation in the database includes eight variables in the form of oxide wt% (see "Database" section). These observations have been acquired by many operators since major-element analysis became a standard analytical tool. Thus, observations in the database cover a wide range of analytical systems, analysed oxides, calibrations, operating conditions, and data processing techniques. This mainly results in discrepancies in the total wt% of the eight oxides considered for discrimination, making the amount of the whole (or total sum by sample) a non-informative quantity. In addition, the full set of chemical components is rarely analysed, so the amount of each component is constrained by that whole. Thus, only relative changes are relevant (e.g., van den Boogaart and Tolosana-Delgado 2008). To tackle this issue and to get rid of spurious anti-correlations (Chayes 1960), the natural logarithms of ratios between all pairs of oxides are used as variables instead of the single values (Aitchison 1986). The usage of log-ratios is a mathematically elegant transformation that enables the use of standard unconstrained multivariate statistics (Aitchison and Egozcue 2005). This approach increases the total number of variables from eight measured oxides to 28 pairwise log-ratios.
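The pairwise log-ratio transformation can be illustrated with a minimal Python sketch. This is not the authors' R implementation (their function is given in Supplementary Information 3); the function name, the oxide ordering within each ratio, and the example analysis are assumptions made purely for illustration.

```python
import math
from itertools import combinations

# The eight oxides named in the "Database" section.
OXIDES = ["SiO2", "TiO2", "Al2O3", "Cr2O3", "FeOtotal", "MnO", "MgO", "CaO"]

def pairwise_log_ratios(analysis):
    """Return ln(a/b) for every unordered pair of oxides.

    Which oxide ends up in the numerator is an arbitrary choice here
    (the one listed first in OXIDES); swapping only flips the sign.
    """
    return {
        f"ln({a}/{b})": math.log(analysis[a] / analysis[b])
        for a, b in combinations(OXIDES, 2)
    }

# Hypothetical analysis in wt% (illustrative values, not from the database).
garnet = {"SiO2": 37.5, "TiO2": 0.05, "Al2O3": 21.0, "Cr2O3": 0.04,
          "FeOtotal": 30.0, "MnO": 1.5, "MgO": 4.0, "CaO": 5.5}
ratios = pairwise_log_ratios(garnet)

# Rescaling all oxides by a common factor (a different analytical total)
# leaves every log-ratio unchanged.
rescaled = {oxide: value * 0.97 for oxide, value in garnet.items()}
ratios_rescaled = pairwise_log_ratios(rescaled)
```

Eight oxides give 8 × 7 / 2 = 28 ratios, and the rescaling check shows why the analytical total carries no information once log-ratios are used.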
Besides the advantages, this introduces a difficulty when handling values BDL, that is < 0.03 wt%. The log-ratios with values BDL in the numerator and/or denominator can potentially span a wide range of values. Considering a log-ratio that contains a value BDL in the numerator with a detection limit DL, the log-ratio of numerator and a denominator with value x is always < ln(DL × x⁻¹). Conversely, a value BDL in the denominator with a detection limit DL leads to a log-ratio > ln(x × DL⁻¹), with x the value of the numerator. Thus, log-ratios with a value BDL in the numerator or denominator have been replaced by the minimum of that log-ratio in the database minus one or the maximum plus one, respectively (see pair-wise log-ratio function in Supplementary Information 3). This approach ensures that values BDL are treated in the same way as fully observed values, while maintaining and making use of this information. Those log-ratios with values BDL in both the numerator and the denominator are replaced by NA. All missing values (those that involve an NA and those that involve two BDLs) are treated by the 'na.roughfix' function, which first replaces missing values by the median of non-missing values, trains a random forest model, computes the proximity matrix between samples, and refines the missing values by replacing them with the weighted median of non-missing values (using the proximity values as weights).
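The replacement rule for values BDL can be sketched as follows. This is a hedged Python illustration, not the function from Supplementary Information 3: the names `observed_range` and `log_ratio_with_bdl` are hypothetical, NA is represented by `None`, and it is assumed here that the database minimum and maximum are taken over fully observed log-ratios only.

```python
import math

DL = 0.03  # detection-limit threshold in wt%, as chosen in the text

def observed_range(pairs):
    """Min and max of ln(num/den) over pairs where both values are observed."""
    values = [math.log(n / d) for n, d in pairs
              if n is not None and d is not None and n >= DL and d >= DL]
    return min(values), max(values)

def log_ratio_with_bdl(num, den, observed_min, observed_max):
    """Log-ratio with the BDL rules described above; None stands for NA."""
    if num is None or den is None:
        return None                  # an oxide was not analysed
    num_bdl, den_bdl = num < DL, den < DL
    if num_bdl and den_bdl:
        return None                  # both BDL: ratio is undetermined
    if num_bdl:
        return observed_min - 1.0    # true value lies below ln(DL / den)
    if den_bdl:
        return observed_max + 1.0    # true value lies above ln(num / DL)
    return math.log(num / den)

# Hypothetical (numerator, denominator) pairs in wt% for one log-ratio.
pairs = [(0.10, 2.0), (0.50, 1.0), (0.05, 4.0)]
lo, hi = observed_range(pairs)
```

With these bounds, `log_ratio_with_bdl(0.01, 2.0, lo, hi)` falls one unit below every fully observed value, so a tree split can still separate BDL observations from measured ones.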
The method parameters of the random forest were chosen in a two-step procedure. The 'sampsize' parameter was set by a discretionary approach. This parameter controls the number of observations taken from each class for each tree by random sampling. As random forest is optimized for creating discrimination models that have the highest overall classification success, classes that include more observations are often better classified compared to those where fewer observations are available (Supplementary Information 2). The aim of the garnet discrimination model was to balance the classification success rates for the individual classes as well as possible. Therefore, the 'sampsize' for classes containing a higher number of observations was reduced (Table 2). This results in a slightly lower classification success for the entire database, but more balanced success rates for the individual classes (Chen et al. 2004). The other parameters were set by formal exhaustive cross-validation. Both models have been computed in five iterations with all combinations of 'ntree' between 200 and 6,000 (step size = 200), 'mtry' between 1 and 12 (step size = 1), and 'nodesize' between 1 and 3 (step size = 1). Those parameter values giving on average the lowest OOB (out-of-bag) error have been chosen for the final models (Table 2).
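To give a feel for the size of this parameter search, the grid can be enumerated in a few lines of Python. This is only a sketch: `select_parameters` and `toy_oob_error` are hypothetical names, and the toy error function is an arbitrary stand-in for actually training a forest and reading its OOB error, so its "best" setting says nothing about the values in Table 2.

```python
from itertools import product
from statistics import mean

NTREE = range(200, 6001, 200)   # 200, 400, ..., 6000
MTRY = range(1, 13)             # 1 ... 12
NODESIZE = range(1, 4)          # 1 ... 3
N_ITER = 5                      # iterations per combination

grid = list(product(NTREE, MTRY, NODESIZE))

def select_parameters(oob_error, grid=grid, n_iter=N_ITER):
    """Return the (ntree, mtry, nodesize) with the lowest mean OOB error.

    oob_error(ntree, mtry, nodesize, seed) must train one forest and
    return its OOB error; it is supplied by the caller.
    """
    def average_error(params):
        return mean(oob_error(*params, seed) for seed in range(n_iter))
    return min(grid, key=average_error)

def toy_oob_error(ntree, mtry, nodesize, seed):
    """Arbitrary stand-in used only to make the sketch executable."""
    return abs(mtry - 5) * 0.01 + nodesize * 0.001 + 1.0 / ntree

best = select_parameters(toy_oob_error)
```

The grid holds 30 × 12 × 3 = 1080 combinations, so five iterations correspond to 5400 trained forests per model.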

Performance of the 'setting and metamorphic facies' model
The 'setting and metamorphic facies' model predicts the correct class out of seven classes for > 88% of all observations included in the database based on the OOB error (Table 3).
Notably, the predicted classes are more detailed and useful in terms of petrogenetic information compared to the most frequently used scheme after Mange and Morton (2007), which includes prediction classes that do not point to a specific host-rock type (Supplementary Information 2).
To take full and quick advantage of the classification regarding provenance, the voting results are shown in two separate graphical schemes for 'setting' and 'metamorphic facies', each representing the four ternary faces of a tetrahedron (Fig. 1). The 'setting' scheme discriminates garnet sourced from MA, IG, MS, and metamorphic rocks (MM) based on the votes for each class. MM is represented by the maximum vote of the four metamorphic classes, that is BS/GS, AM, GR, and EC/UHP. The decision to take the maximum vote instead of the sum of votes is based on two major points. First, taking the sum of votes artificially introduces classification results that are based on a much larger 'sampsize' for training the random forest model for MM (1011 + 1200 + 1011 + 1100 = 4322) compared to MA (1011), IG (1011), and MS (826; see Table 2). Consequently, the balancing introduced by the 'sampsize' argument (see "Model development" section) is lost, resulting in higher classification success rates for MM at the expense of the three other classes. Second, taking the sum of votes entangles the two different questions to be answered by a single classification model, which is impermissible.
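The difference between taking the maximum and the sum of the metamorphic votes can be made concrete with a small Python sketch. The function names and the vote fractions are hypothetical and chosen only to show a case where the two rules disagree.

```python
METAMORPHIC = ("BS/GS", "AM", "GR", "EC/UHP")

def setting_votes(votes):
    """Collapse seven-class vote fractions into the four 'setting' classes.

    MM is represented by the maximum vote among the metamorphic classes,
    as described above, not by their sum.
    """
    return {
        "MA": votes["MA"],
        "IG": votes["IG"],
        "MS": votes["MS"],
        "MM": max(votes[c] for c in METAMORPHIC),
    }

def predicted_setting(votes):
    """Setting class with the highest collapsed vote."""
    collapsed = setting_votes(votes)
    return max(collapsed, key=collapsed.get)

# Hypothetical vote fractions for one garnet analysis (illustrative only).
votes = {"MA": 0.00, "IG": 0.40, "MS": 0.00,
         "BS/GS": 0.15, "AM": 0.25, "GR": 0.10, "EC/UHP": 0.10}

by_max = predicted_setting(votes)
summed_mm = sum(votes[c] for c in METAMORPHIC)
```

Here the maximum rule assigns the grain to IG, because no single metamorphic class outvotes IG, whereas summing the four metamorphic votes would push it into MM.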
Votes of the 'setting' scheme are shown as kernel density maps in four ternary plots to represent each setting class (Fig. 1a). The voting result of each individual observation from the database is plotted in only one of the four ternary plots, that is, the plot representing the three highest votes. The vast majority of the votes plot close to the apexes of the corresponding correct class. Only very minor overlaps occur for MM versus MA, MM versus IG, and MM versus MS. This is evident in the corresponding majority vote bar plot showing that on average > 95% are assigned to their correct class (Fig. 1a).
Considering solely the discrimination of MA versus crustal garnet (MM, IG, and MS), MA are correctly classified in 96% and crustal garnet in 99% of cases, giving a class average of 97%. Thus, the prediction success exceeds that of the graphical mantle-versus-crustal garnet discrimination after Hardman et al. (2018), which is 95% based on the presented database (Supplementary Information 2). In the developed model, MA-M is the only mantle subgroup exceeding 2% of the observations misclassified as MM, and EC-M as well as UHP-M are the only crustal subgroups exceeding 2% of the observations misclassified as MA (Fig. 1b). However, even these challenging groups show high success rates: 93% for MA-M, 97% for EC-M, and 96% for UHP-M.
Garnets of class MS are correctly identified in > 97% and only minor amounts (< 3%) are misclassified as MM (Fig. 1a, b). In contrast, metamorphic calc-silicates that formed at high temperature, which are subgroups AM-CS and GR-CS, are often misclassified as MS. Besides their chemical similarity, this is caused by the underrepresentation of observations from metamorphic calc-silicates compared to mafic and intermediate-felsic/metasedimentary host-rock compositions, in particular for AM (Table 1). In addition to calc-silicates, some of the alkaline igneous garnets (7%) are misclassified as MS.
IG garnet is correctly classified in > 93% (Fig. 1a). The highest success rates are given for subgroups IG-IF and IG-A with 94% (6% misclassified as MM) and 93% (7% misclassified as MS), respectively. IG-M shows a slightly lower success with 86%, which still represents a well-discriminated subgroup. Notably, very low amounts of garnet from other subgroups are misclassified as IG, with AM-IF/S and EC-IF/S being the only subgroups exceeding 2% misclassification as IG (Fig. 1b). Garnets of the setting class MM are correctly classified in > 96% (Fig. 1a), with 15 of the 17 metamorphic subgroups listed in Table 1 being correctly assigned in > 95%. Subgroups showing lower success rates are restricted to calc-silicates (GR-CS and AM-CS; Fig. 1b).
Votes of the 'metamorphic facies' scheme are as well shown as kernel density maps in four ternary plots to represent each metamorphic class (Fig. 1c). The plot includes all observations giving the highest votes for MM in the 'setting' scheme. The voting result of each individual observation is solely plotted in the ternary diagram representing the three highest votes. Compared to the 'setting' scheme, the spread in votes and overlaps are more pronounced but the maxima are clearly located close to the apexes of the corresponding class. The barplot in Fig. 1c shows that classification success rates of the individual classes range from 83 to 87%, giving an average of > 84% correctly classified garnets. Beyond the additional and separate prediction of garnet sourced from BS/GS, classification clearly improved compared to the scheme of Tolosana-Delgado et al. (2018), which shows an average classification success rate of 65% for metamorphic classes (Supplementary Information 2). The barplots in Fig. 1d show that (i) the correct subgroups constitute the vast majority of observations assigned to the individual metamorphic classes, except for several of the CS subgroups; (ii) garnet from IF/S host-rock composition is better classified for GR, AM, and BS/GS subgroups than for EC/UHP subgroups and vice versa; and (iii) misclassifications are mainly restricted to adjacent classes in P-T space, that is BS/GS and GR with AM and EC/UHP, respectively, while AM and EC/UHP share borders with all other classes (see also barplot in Fig. 1c). Points (i) and (ii) can be related to the distribution of observations from the individual subgroups (Table 1), but point (iii) implies a rather continuous change in garnet composition with changing P-T conditions, which is well known from metamorphic petrology and reflected by the votes (Supplementary Information 2).
[Figure caption fragment] Corresponding classification success rates based on the majority vote are shown as barplots (note break in scale at 75%). Proportions of observations from each class occurring in the individual ternary diagrams are given in small rectangular boxes within the ternary diagrams of (a) and (c). The proportions of observations from individual subgroups assigned to the prediction classes are shown as barplots for the 'setting' (b) and 'metamorphic facies' (d) scheme (see Table 1 for abbreviations). Bars representing misclassifying votes are highlighted by frames that are colour coded according to the true class.

Performance of the 'composition' model
The 'composition' model predicts the correct class out of five for > 92% of all observations included in the database based on the OOB error estimate (Table 4).
Votes of the 'composition' scheme are shown as kernel density maps in four ternary plots to represent four of the five classes (Fig. 2a). Because alkaline garnet has the lowest number of observations, is rarely misclassified, and misclassifications are restricted to class CS (Table 4), class A is excluded from the plots. Thus, the plots include all observations for which class A does not receive the highest vote. As in the preceding plots, the voting result for each individual observation from the database is plotted in only one of the four ternary plots, which is the plot representing the three highest votes. The vast majority of the votes plot close to the apexes of the corresponding correct class. However, overlaps occur for M versus IF/S and M versus UM. This is highlighted in the corresponding majority vote bar plot showing that on average > 93% are assigned to their correct class (Fig. 2a).
The barplots in Fig. 2b show that the correct subgroups constitute the vast majority of observations assigned to the individual composition classes. In particular, garnet classified as CS and UM is well represented, and only rarely does garnet of subgroups IG-A and MA-M receive the highest votes for CS or UM, respectively. Garnets classified as IF/S and M are dominated by the correct subgroups, too. However, up to almost 16% of the M subgroups are assigned to IF/S and of the IF/S subgroups to M. Conspicuously, EC-IF/S (50%), BS-M (50%), and BS-CS (100%, 60% in M, 40% in IF/S) show high misclassifications.

Understanding the models
The two presented garnet discrimination models consist of 3400 and 3200 trees (Table 2), each based on a different bootstrapped random sample and deeply grown without pruning leading to between 1619 and 1949 decision nodes for each tree of the 'setting and metamorphic facies' model and between 1033 and 1269 nodes for the 'composition' model. Thus, following the decision process in detail is not feasible, giving random forest models a kind of 'black box' character.
To understand the basic discrimination decisions performed by the models, the importance of variables is explored in three ways: (i) considering the mean decrease in Gini impurity (Supplementary Information 2) for individual variables, that is the weighted average of the decrease in Gini impurity between parent and child nodes for splits on that variable in all trees of the trained forest; (ii) considering the mean decrease in accuracy for individual variables, that is the decrease of prediction accuracy on the OOB test set when the values of an individual variable are permuted; and (iii) considering the increase in misclassification rates for individual classes when individual oxides are removed from the database for training the forest. Note that the exclusion of each oxide means that all seven log-ratios that include this oxide are removed. Finally, the most important variables to discriminate individual classes and subgroups are retraced and their origin is discussed.
[Figure caption fragment] ... Venables and Ripley (2002). Corresponding classification success rates based on the majority vote are shown as barplots (note break in scale at 75%). Proportions of observations from each class occurring in the individual ternary diagrams are given in small rectangular boxes. b Proportions of observations from individual subgroups assigned to the prediction classes shown as barplots (see Table 1 for abbreviations). Bars that represent misclassifying votes are highlighted by colour coded frames according to the true class.
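The Gini-based importance in (i) builds on the impurity decrease achieved by a single split. A minimal Python sketch of that quantity is given below; the function names and the class labels in the example are illustrative assumptions, and the forest-level importance additionally accumulates these decreases over all splits on a variable and averages over trees.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class fractions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# Hypothetical node with four mantle (MA) and four metamorphic (MM) garnets.
parent = ["MA"] * 4 + ["MM"] * 4

# A split that separates the classes perfectly ...
perfect = gini_decrease(parent, ["MA"] * 4, ["MM"] * 4)

# ... and one that leaves one misplaced observation on each side.
partial = gini_decrease(parent,
                        ["MA", "MA", "MA", "MM"],
                        ["MA", "MM", "MM", "MM"])
```

The perfect split removes the full parent impurity of 0.5, while the partial split removes only 0.125, which is why variables that produce purer child nodes rank higher.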

Variable importance of the 'setting and metamorphic facies' model
The highest variable importance based on the mean decrease in Gini impurity and accuracy is given by log-ratios involving MgO. This particularly includes ln(FeOtotal × MgO⁻¹), followed by ln(SiO₂ × MgO⁻¹) and ln(Al₂O₃ × MgO⁻¹) (Fig. 3a). All three show a high spread for observations of the dataset with distinct ranges for the individual classes (Fig. 3b). Notably, values of these ratios decrease from metasomatic garnet (MS), through low-temperature metamorphic (BS/GS and AM) and igneous garnet (IG), to high-grade metamorphic garnet (GR and EC/UHP), with mantle garnet (MA) showing the lowest values.
The next important set of log-ratios includes CaO versus Al₂O₃, SiO₂, MgO, MnO, and FeOtotal (Fig. 3a). This is dominated by the importance of discriminating class MS, which shows higher CaO values than the other classes (Fig. 3b). Another major difference between class MS and all other classes is the high range of values for ln(SiO₂ × Al₂O₃⁻¹) for MS compared to the very tight range of the other classes, which is also visible in the values for ln(SiO₂ × FeOtotal⁻¹) and ln(Al₂O₃ × FeOtotal⁻¹). In contrast to class MS, garnets of class GR and to a lesser extent IG show high values for log-ratios including CaO in the denominator, even higher than those for the other metamorphic classes, making log-ratios including CaO useful to separate garnet of classes GR and IG. Besides MS and GR, ln(MgO × CaO⁻¹) is particularly important for the discrimination of IG versus BS/GS, which are in most other respects compositionally similar (Fig. 3b).
Behind the log-ratios mainly dictated by MgO and CaO along with FeOtotal, a set of log-ratios including MnO becomes next most important (Fig. 3a). Remarkably, values for log-ratios with MnO in the denominator versus MgO, Al₂O₃, SiO₂, and FeOtotal show a reverse trend to log-ratios with MgO in the denominator (Fig. 3b). Values successively increase from lower-temperature metamorphic and igneous garnet, over higher-temperature metamorphic garnet, to high-grade metamorphic garnet, and finally mantle garnet. Distinct is the behaviour of metasomatic garnet showing higher similarities with low-grade metamorphic garnet for ln(MnO × MgO⁻¹) and ln(Al₂O₃ × MnO⁻¹), and higher similarities with high-grade metamorphic garnet for ln(SiO₂ × MnO⁻¹) and ln(FeOtotal × MnO⁻¹).
Considering the mean decrease in Gini impurity and accuracy of the entire tree ensemble, the individual log-ratios including TiO₂ or Cr₂O₃ are placed in a subordinate role for creating high-purity splits. However, some specific log-ratios are important for splitting individual classes. In particular, ln(TiO₂ × CaO⁻¹) is essential for purifying the discrimination of class IG versus metamorphic classes. The variables ln(Cr₂O₃ × FeOtotal⁻¹), ln(Cr₂O₃ × MnO⁻¹), ln(Cr₂O₃ × CaO⁻¹), ln(TiO₂ × FeOtotal⁻¹), and ln(TiO₂ × MnO⁻¹) are important for discriminating MA versus EC/UHP (Fig. 3).
In addition to the exploration of variable importance based on the decrease in Gini impurity and accuracy, Fig. 4 shows the implications for misclassification rates when individual oxides are excluded for the development of a random forest 'setting and metamorphic facies' discrimination model. Misclassification rates for all classes show the highest increase when MnO is excluded, followed by TiO₂ (strongly influencing the correct classification of IG), MgO (with notable contribution to the quality of BS/GS and MS reclassification), CaO, Al₂O₃, FeOtotal, SiO₂, and Cr₂O₃ in decreasing order of global importance (Fig. 4, bold grey line). The higher importance of MnO and TiO₂ compared to MgO, CaO, and FeOtotal is contrary to the importance order based on the mean decrease in Gini impurity (Fig. 3). This indicates that MnO and TiO₂ either gain their importance by considering many log-ratios including MnO and TiO₂ or by being specifically important for the discrimination of individual classes, or a combination of both. A detailed description of consequences for individual classes by excluding individual oxides is provided in Supplementary Information 2.

Variable importance of the 'composition' model
The highest variable importance based on the mean decrease in Gini impurity and accuracy is given by log-ratios with CaO versus Al₂O₃, SiO₂, FeOtotal, MnO, and MgO (Fig. 5a). Calc-silicate (CS) and alkaline (A) garnet show low values for all of these ratios and thus can be discriminated from all other groups (Fig. 5b). Mafic garnet (M) shows higher values than CS and A, but lower values than intermediate-felsic/metasedimentary (IF/S), enabling the creation of high-purity splits. In addition, the ratios with CaO versus FeOtotal, as well as MgO, are useful to separate ultramafic (UM) and M garnet.
The next important set of log-ratios includes MgO versus SiO₂, Al₂O₃, FeOtotal, and MnO (Fig. 5a). These ratios not only create high-purity splits but also strongly increase the model accuracy. Like log-ratios with CaO, the ratios with MgO allow the separation of most CS and A garnet (high values) from M and IF/S garnet (intermediate values), and UM garnet (low values) (Fig. 5b).
Besides ln(FeOtotal × CaO⁻¹) and ln(FeOtotal × MgO⁻¹), other ratios that include FeOtotal like ln(SiO₂ × FeOtotal⁻¹), ln(Al₂O₃ × FeOtotal⁻¹), and ln(FeOtotal × MnO⁻¹) show only intermediate importance for the decrease in Gini impurity. Nevertheless, they are highly important for the accuracy of the model (Fig. 5a). By contrast, log-ratios that include Cr₂O₃ also show intermediate importance for the purity of splits, but much lower importance for the model accuracy. However, these ratios are obviously useful to separate UM garnet (Fig. 5b). The log-ratios including TiO₂ are placed in a subordinate role for the mean decrease in Gini impurity and accuracy (Fig. 5a). By contrast, excluding TiO₂ when developing the 'composition' discrimination model leads to the highest increase in misclassification rates (owing to its dramatic influence in the correct reclassification of alkaline garnets), followed by CaO, MgO, MnO, Cr₂O₃, FeOtotal, SiO₂, and Al₂O₃ in decreasing order of importance (Fig. 6). A detailed description of consequences for individual classes by excluding individual oxides is provided in Supplementary Information 2.
[Figure caption fragment] ... (see Table 1 for abbreviations). For some variables, numerator or denominator are often below detection limit, resulting in similar values for median and quantiles (lines overlap).

Origin of main discriminators
The discrimination of classes in the 'setting and metamorphic facies' as well as the 'composition' model is complex, and for sufficient separation, individual subgroups have to be considered individually. A detailed exploration of main discriminators for individual subgroups is provided in Supplementary Information 2. Here we focus on the most important class discriminators.
Garnet of class MS in the 'setting and metamorphic facies' model shows the highest classification success (Table 3), reflecting its distinct chemical composition. MS garnet is CaO rich, MgO poor, and shows a broad range in Al2O3 contents with an average lower than that of the other classes (Fig. 7a). The distinct composition is strongly related to the formation environment. Although skarn garnet can form in a wide range of settings and protolith lithologies, by far the majority is associated with igneous activity that leads to contact metamorphism of carbonates by heat supply and infiltrating metasomatic fluids at depths < 12 km (e.g., Meinert 1992). Garnet mainly forms at the prograde anhydrous stage together with clinopyroxene, both being high in Ca2+ owing to its availability in the protolith. Furthermore, high oxygen fugacity enables the stabilization of andradite (end-member composition: Ca3Fe2Si3O12), which involves the substitution of Al3+ by Fe3+ compared to grossular (Ca3Al2Si3O12), resulting in a garnet solid solution rich in grossular-andradite (e.g., Zhang and Saxena 1991). In contrast, clinopyroxene crystallizes mainly as a solid solution between diopside (CaMgSi2O6) and hedenbergite (CaFeSi2O6), and thus much of the available Mg2+ and Fe2+ is incorporated in clinopyroxene (Bin and Barton 1988). Exceptions are skarns that formed under low oxygen fugacity conditions, like those associated with tungsten and tin mineralization, where some garnet populations are rich in Al3+ (e.g., Zhang and Saxena 1991; Meinert 1992). However, considering recently published compositions of garnet from these reduced environments (Duan et al. 2020; Im et al. 2020), > 87% are correctly classified as MS, and the remainder are mainly classified as GR garnet of CS composition.
Garnet of class MA shows the second highest classification success in the 'setting and metamorphic facies' model (Table 3). In particular, the comparatively high content of MgO and Cr2O3 and the low content of FeOtotal and MnO in mantle rocks compared to crustal rocks represent the most important difference (Fig. 7b). In addition, a high importance of TiO2 is observed for the discrimination of some subgroups (Supplementary Information 2). The enrichment of Cr in MA garnet is mainly related to the lithophile behaviour of Cr, resulting in the accumulation of Cr in mantle mineral phases like chromium spinel during partial melting of the upper mantle (e.g., Matrosova et al. 2020), while spinel is replaced by garnet at greater depths (e.g., Klemme et al. 2009). The higher TiO2 content of MA garnet is probably related to the higher formation temperatures, resulting in increasing solubility of Ti in garnet (Aulbach 2020). Notably, caution is required for some rare UHP-IF/S garnets that can have extremely high MgO contents (Chopin 1984), leading to misclassification as MA garnet.
Compared to MS and MA garnet, the discrimination of IG versus MM garnet is more challenging. Only IG garnets of alkaline composition are very distinct (Supplementary Information 2). Otherwise, the higher content of MnO and the lower content of MgO as well as CaO in IG garnet represent the main discriminators (Fig. 7c). This is related to the much higher abundance of garnet in felsic igneous rocks compared to alkaline or mafic igneous rocks. High MnO contents reflect the abundant crystallization from highly fractionated Al- and Mn-rich magmas (e.g., Dahlquist et al. 2007), enabling garnet growth at pressures as low as 3 kbar (e.g., Green 1977). The low-pressure formation conditions also agree with low CaO contents, and Ca-rich garnet in igneous rocks only occurs in deeply emplaced intrusions (e.g., Anderson et al. 2008). In addition, at crystallization temperatures of felsic melts, Mg partitions into the melt, resulting in low-Mg garnet (Green 1977). However, this superordinate trend is only sufficient to separate parts of class IG, and subpopulations have to be considered individually. Particularly notable is the classification improvement given by the higher content of TiO2 in IG garnet compared to MM garnet. Besides the relevance for garnet of alkaline igneous rocks, which is very rich in Ti (e.g., Huggins et al. 1977), TiO2 is highly important for garnet from intermediate-felsic igneous rocks (Supplementary Information 2). Understanding the partitioning of Ti in garnet of igneous versus metamorphic systems is not straightforward, but the observed preferred incorporation of Ti into igneous biotite (Samadi et al. 2021) may also apply to garnet.
With regard to the discrimination of individual MM classes, the most important variables to separate BS/GS from other metamorphic classes include MgO and MnO (Fig. 7d). The low content of MgO agrees with the low-temperature formation conditions and the many exchange thermometers that imply increasing Fe × (Fe + Mg)^-1 with decreasing temperature (Reverdatto et al. 2019, and references therein). High MnO contents are consistent with the increasing stability field of garnet to lower P-T conditions with increasing MnO content of the protolith. Thus, the earliest grown garnet typically shows the highest MnO contents (e.g., Carlson 1989) and removes MnO from the effective bulk composition leading to an up-temperature shift of garnet stability and bell-shaped Mn-zoning patterns (Evans 2004).
For the discrimination of GR from AM and EC/UHP garnet, the combination of high MgO and low CaO is most important (Fig. 7e). This highlights the fundamentals of Fe-Mg exchange thermometry (decreasing Fe × (Fe + Mg)^-1 with increasing temperature) and the garnet-aluminosilicate-plagioclase-quartz geo-thermo-barometer based on the higher stability of anorthite at high-temperature/low-pressure (low Ca in garnet) compared to the higher stability of grossular + aluminosilicate + quartz at low-temperature/high-pressure (e.g., Ghent 1976; Koziol and Newton 1988). Similarly, the best separation of AM and EC/UHP is observed by considering the higher values of FeOtotal versus MgO (lower temperature) and MnO versus CaO (lower pressure) for AM (Fig. 7f).
Concerning composition, class CS shows the highest discrimination success, followed by A (Table 4). CS and A garnet are well separated from all other classes by their higher CaO content (Fig. 8a), reflecting the high host-rock CaO content. The high TiO2 content of A garnet allows its discrimination from CS garnet (Fig. 8b). Furthermore, UM garnet separates from IF/S and M garnet by its higher MgO and Cr2O3 content in combination with its lower FeOtotal content (Fig. 8c), in line with the element availability defined by the host-rock composition as well as the high-temperature formation conditions of mantle rocks. For the discrimination of IF/S and M garnet, ln(FeOtotal × MgO^-1) and ln(MnO × CaO^-1) are the most important variables, both being higher for IF/S than for M garnet (Fig. 8d). This agrees with the often higher CaO content of M host rocks and with the stabilization of garnet at lower temperatures by increased host-rock MnO contents, which are mainly observed in IF/S protoliths.

Sensitivity and applicability
One of the major aims in sedimentary provenance analysis is the reconstruction of source rock assemblages of sediments and the climatic-physiographic conditions under which they formed (von Eynatten and Dunkl 2012). The main potential of garnet single-grain analysis lies in reflecting variations in metamorphic conditions of rocks located in the source region, in particular when other characteristic minerals are lost due to their lower mechanical and/or chemical stability (Morton 1985). Thus, to represent a robust tool, predictions of the introduced garnet discrimination scheme should (i) be sensitive to changes in P-T conditions, (ii) reflect catchment specific host-rock characteristics, and (iii) be able to identify provenance shifts occurring in sedimentary successions or between samples.

Sensitivity to P-T changes
The sensitivity to variations in P-T conditions during garnet growth is tested by comparing the proposed discrimination scheme predictions for different garnet growth zones and/or different garnet populations from samples that record multistage garnet growth based on geo-thermo-barometric investigations. Schantl et al. (2019) studied garnet-bearing granulites from the Moldanubian Zone in Lower Austria by a combination of rutile thermometry, biotite breakdown reactions, Ti content of biotite, garnet-aluminosilicate-plagioclase-quartz and amphibole-plagioclase thermo-barometry as well as thermodynamic modelling. For a sample set of five felsic granulites, the authors deduced 810-820 °C and 16-25 kbar for garnet core formation (eclogite facies), after which the core composition was entirely reset by diffusion during decompression under increasing temperature to 1000-1050 °C and 15-17 kbar (eclogite-granulite-facies transition). Subsequent fast decompression cooling to 770 °C and 8 kbar (granulite-amphibolite-facies transition) led to diffusional modification of the rim composition and minor modification of the core composition, as well as the formation of a second garnet generation (Fig. 9a, red path in P-T facies diagram for MORB composition). This P-T evolution is well reflected by the arithmetic mean votes for garnet core (n = 5), garnet rim (n = 5), and retrograde garnet (n = 2, only observed in one sample) compositions (Fig. 9a, red path in ternary vote plots). Core compositions received the highest votes for EC/UHP, being five times higher than for AM and GR; rim compositions received the highest votes for GR followed by EC/UHP; and retrograde garnet compositions show the highest votes for GR followed by slightly lower votes for AM, both being more than two times higher than for EC/UHP.
Giuntoli et al. (2018) investigated complexly zoned garnets from micaschists of the Sesia Zone in the Western Italian Alps. Thermodynamic modelling of garnet growth zones records two orogenic cycles, most completely preserved by sample '1249'. The pre-Alpine metamorphic event gives 730 °C and 6 kbar for garnet core formation (high-temperature amphibolite facies), followed by isobaric cooling to 620 °C and 6 kbar recorded by the first rim (amphibolite facies) (Fig. 9a, pale-blue path in P-T facies diagram). Alpine metamorphism at 620 °C and 16 kbar led to diffusional re-equilibration of the outermost rim and in proximity to fractures, denoted as the second rim (eclogite facies). Growth of a third rim resulted from a subsequent temperature increase to 660 °C at almost isobaric conditions of 15 kbar (eclogite facies). This complex path is well reflected by the discrimination scheme votes (Fig. 9a, pale-blue path in ternary vote plots). For the high-temperature amphibolite-facies core formation, predictions are highest for class AM followed by GR, with very minor votes for BS/GS and EC/UHP. Votes for the composition of the first rim that formed under isobaric cooling within the amphibolite facies stay highest for AM, but the second highest votes switch to BS/GS, reflecting the colder conditions. The strong increase in pressure recorded by the second and third rim results in dominant votes for EC/UHP.
[Figure caption fragment: … Gini impurity (left) to lowest (right). Lower diagram shows corresponding variable values for individual classes as 25% quantile, median, and 75% quantile (see Table 1 for abbreviations).]
[Figure caption fragment: … (see Table 1 for abbreviations). Data shown as kernel density estimate maps.]
Such an increase in votes for EC/UHP is also observed for the increased pressure conditions from the innermost garnet core (545 °C and 15 kbar, blueschist-eclogite-facies transition) to the outermost core (560 °C and 21 kbar, eclogite facies) of a chloritoid-bearing micaschist studied by Negulescu et al. (2018) (Fig. 9a, yellow path). It is important to note that in all three P-T paths tested, the host-rock composition for stages outside the eclogite facies is correctly predicted as IF/S, but growth stages within the eclogite facies are misclassified as M. This highlights the difficulty of assigning the correct composition class to subgroup EC-IF/S (cf. Figure 2b).
Li et al. (2021) studied mafic granulite lenses from the South Altyn Orogen in West China by thermodynamic modelling including garnet and plagioclase isopleth thermobarometry. The authors observed three garnet generations, whereby conditions of 600-655 °C and 15.8-19.2 kbar (eclogite facies) are recorded by the core of garnet one, followed by garnet one mantle growth at 920 °C and 36.2 kbar (ultrahigh-pressure eclogite facies), formation of garnet two from 820 °C and 17.8 kbar (eclogite facies) to 826 °C and 11.5 kbar (high-pressure granulite facies), and rim formation of garnet three at 826-735 °C and 11.5-8.7 kbar (high-pressure granulite-amphibolite-facies transition) (Fig. 9a, dark blue path and white points in P-T facies diagram). While the formation of garnet one is correctly predicted as EC/UHP by the discrimination scheme, decompression is not well reflected, and although votes for GR increase, they stay highest for EC/UHP (Fig. 9a, dark blue path and white points in ternary vote plots). This agrees with the issue of the metamorphic class prediction for garnet of subgroup GR-M (cf. Figure 1d).
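The comparison of growth zones above rests on arithmetic mean votes per zone. A minimal Python sketch of that averaging step follows; the function names and the vote fractions for two core analyses are hypothetical and do not reproduce values from the cited studies.

```python
def mean_votes(vote_rows):
    """Arithmetic mean of per-analysis vote fractions, class by class."""
    classes = list(vote_rows[0])
    n = len(vote_rows)
    return {cls: sum(row[cls] for row in vote_rows) / n for cls in classes}


def top_class(votes):
    """Class that received the highest (mean) vote."""
    return max(votes, key=votes.get)


# Hypothetical vote fractions of the forest for two garnet core analyses.
core_analyses = [
    {"BS/GS": 0.02, "AM": 0.10, "GR": 0.14, "EC/UHP": 0.74},
    {"BS/GS": 0.04, "AM": 0.16, "GR": 0.12, "EC/UHP": 0.68},
]

core_mean = mean_votes(core_analyses)
print(core_mean)
print("Highest mean vote:", top_class(core_mean))
```

Averaging votes rather than counting class assignments keeps the information on how close the second and third ranked classes are, which is what the ternary vote plots display.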

Reflecting catchment specific garnet host-rock characteristics
To test whether the discrimination scheme reflects changes in the relative abundance of different garnet-bearing host rocks contributing erosional material to sediments, a modern sand case study with a well-known geological framework of the catchments is chosen. We focus on the discrimination between blueschist/greenschist and eclogite facies sources, an issue that is not tackled by previous schemes. Krippner et al. (2015) studied detrital garnets from tributaries draining the western Hohe Tauern window in the central Eastern Alps, Austria. The Dorfertal stream section was sampled by seven modern sand samples, here renamed as A-G (Fig. 9b). Along the profile, the stream drains (i) the Venediger nappe that preserves few pre-Alpine eclogites (e.g., von Quadt et al. 1997) and underwent Alpine peak metamorphic conditions of ~ 550 °C and ≥ 10 kbar mainly corresponding to the blueschist facies (Selverstone 1993), (ii) the 'Eclogite Zone' that records peak conditions of ~ 630 °C and ~ 25 kbar corresponding to the eclogite facies (Hoschek 2007), and (iii) the Glockner nappe that underwent metamorphism at 400-500 °C and ≤ 7 kbar mainly corresponding to the greenschist facies (Selverstone 1993). This drainage path is well represented by the discrimination scheme predictions, showing that (i) garnets of samples A-C collected within the Venediger nappe are dominantly assigned to a BS/GS origin, (ii) an increasing amount of EC/UHP-assigned garnet is observed once the 'Eclogite Zone' is entered (samples D and E), which becomes highest directly downstream of the 'Eclogite Zone' (sample F), and (iii) the proportion of EC/UHP-assigned garnet is diluted further downstream in sample G of the Glockner nappe (Fig. 9b, barplot).
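The barplots referred to above summarize, for each sand sample, the share of grains assigned to each class. The following Python sketch shows that tally under stated assumptions: the function name and the grain assignments for two samples are hypothetical and are not the data of Krippner et al. (2015).

```python
from collections import Counter


def class_proportions(assignments):
    """Share of grains per predicted class within one sample."""
    counts = Counter(assignments)
    n = len(assignments)
    return {cls: count / n for cls, count in counts.items()}


# Hypothetical class assignments of detrital garnet grains in two samples.
samples = {
    "A": ["BS/GS"] * 7 + ["AM"] * 2 + ["EC/UHP"] * 1,
    "F": ["BS/GS"] * 3 + ["AM"] * 1 + ["EC/UHP"] * 6,
}

proportions = {name: class_proportions(grains) for name, grains in samples.items()}
for name, shares in proportions.items():
    print(name, shares)
```

Because proportions depend on the number of grains counted, small samples give coarse steps (here 10% per grain), which is one reason why a statistically meaningful number of grains per sample matters.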

Detecting provenance shifts
To test the capability of the new garnet discrimination scheme regarding the identification of shifts in provenance, a multi-method provenance study of Cretaceous sedimentary rocks from the Northern Calcareous Alps is chosen (von Eynatten and Gaupp 1999). Based on framework petrography, heavy mineral analysis, and the chemical composition of amphiboles, white mica, tourmaline, garnet, and chloritoid, the authors distinguished two contrasting source regions. In particular, the occurrence of glaucophane and phengite indicates the contribution of erosional material from low-temperature/high-pressure metamorphic rocks located in a north-western source area, which are absent in sediments sourced from the south-east. Re-evaluation of the detrital garnet data by the introduced discrimination scheme strongly supports this observation. Garnet grains from the south-eastern source are dominantly classified as AM (~ 56%), while class assignments to BS/GS, GR, and EC/UHP are each < 11% (Fig. 9c). By contrast, garnet grains sourced from the north-west are dominantly assigned to EC/UHP (~ 34%), with significant but, compared to the south-eastern source, much lower proportions of AM (~ 22%) and much higher proportions of BS/GS (~ 23%). This is further highlighted by the arithmetic mean votes that shift from the AM-GR-EC/UHP diagram (39% AM, 12% GR, 14% EC/UHP, 11% BS/GS) for the south-eastern source to the AM-BS/GS-EC/UHP diagram (24% AM, 8% GR, 31% EC/UHP, 19% BS/GS) for the north-western source (Fig. 9c, black arrow).
Fig. 9 a … Bucher and Frey (2002), modified from Schönig et al. (2018b). Vote results for individual growth zones by the introduced discrimination scheme are shown in ternary diagrams. b Sensitivity to reflect catchment-specific characteristics tested on an example case study of Krippner et al. (2015). Simplified geological map of the Hohe Tauern area shows the sampling locations. Class assignments of garnets from individual samples by the introduced discrimination scheme are shown as barplots. c Sensitivity to identify shifts in provenance tested on an example case study by von Eynatten and Gaupp (1999). Individual votes as well as arithmetic mean votes for garnet grains of sediments sourced from southeast and northwest are shown in the ternary 'metamorphic facies' plot. Barplot shows the corresponding class assignment (see Table 1 for abbreviations)
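The shift between ternary diagrams can be reproduced from the mean votes quoted above. The Python sketch below assumes that the diagram is defined by the three metamorphic classes with the highest mean votes and that these are renormalized to sum to one for plotting; both points are our reading of the text rather than a documented procedure, and the function name is hypothetical. Votes for the remaining classes of the model are not listed in the text and are therefore omitted.

```python
def ternary_coordinates(votes, n_axes=3):
    """Keep the n_axes classes with the highest votes and renormalize them."""
    top = sorted(votes, key=votes.get, reverse=True)[:n_axes]
    total = sum(votes[cls] for cls in top)
    return {cls: votes[cls] / total for cls in top}


# Arithmetic mean votes as quoted in the text (fractions of all votes).
south_east = {"AM": 0.39, "GR": 0.12, "EC/UHP": 0.14, "BS/GS": 0.11}
north_west = {"AM": 0.24, "GR": 0.08, "EC/UHP": 0.31, "BS/GS": 0.19}

se_ternary = ternary_coordinates(south_east)
nw_ternary = ternary_coordinates(north_west)

print("South-east axes:", sorted(se_ternary))
print("North-west axes:", sorted(nw_ternary))
```

With the quoted values, the south-eastern source falls into the AM-GR-EC/UHP diagram and the north-western source into the AM-BS/GS-EC/UHP diagram, in line with the description above.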

The 'garnetRF' web application
To enable a user-friendly application of the garnet random forest discrimination scheme without the need for installing software and having programming expertise, we developed a freely and worldwide accessible web application called 'garnetRF v1.1'. The app was developed using the 'shiny' R package of Chang et al. (2021) and the 'shinydashboard' package of Chang and Ribeiro (2018). Used packages further include those mentioned in the "Model development" section as well as 'readxl' (Wickham and Bryan 2019) and 'openxlsx' (Schauberger and Walker 2020) for importing spreadsheet files, as well as 'colourpicker' (Attali 2020) and 'cowplot' (Wilke 2020) for visualisation.
The 'garnetRF' app is standalone and users are guided by an 'instructions' tab. By five quick steps, users get from their garnet major-element data to discrimination scheme results and corresponding plots:

Conclusion
A large database including 13,615 observations of chemical garnet major-element analyses is compiled from the literature. Observations are subdivided into 23 petrogenetic subgroups with regard to host-rock type, metamorphic facies, and composition. With this database, a new discrimination scheme is developed that aims at predicting the original host-rock of detrital garnet grains. To focus on the most substantial information in terms of sedimentary provenance, database subgroups are merged into seven classes to predict the 'setting and metamorphic facies' of the garnet-bearing host-rock as well as five classes to predict the 'composition'. For both prediction issues, the random forest classification machine-learning algorithm is applied. The final discrimination scheme is easily applicable via a provided web app. Considering the out-of-bag error, the scheme is able to correctly predict (i) the host-rock setting with a class average of > 95%, (ii) the host-rock metamorphic facies with a class average of > 84%, and (iii) the host-rock composition with a class average of > 93%.
Success rates for individual classes included in setting, metamorphic facies, and composition predictions differ ≤ 5% from the class average, emphasizing substantial discrimination balance. However, balancing between individual subgroups merged in the prediction classes is much less pronounced. Particularly notable are higher misclassification rates in (i) setting prediction for metamorphic calc-silicate garnet (misclassified as metasomatic) and mafic igneous garnet (misclassified as metamorphic), (ii) metamorphic facies prediction for intermediate-felsic/metasedimentary eclogite/ultrahigh-pressure facies garnet (misclassified as granulite, blueschist/greenschist, and amphibolite facies) and mafic granulite-facies garnet (misclassified as eclogite/ultrahigh-pressure facies), and (iii) composition prediction for mafic blueschist/greenschist facies as well as mafic igneous garnet (misclassified as intermediate-felsic/metasedimentary) and intermediate-felsic/metasedimentary eclogite/ultrahigh-pressure garnet (misclassified as mafic).
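The 'class average' and the deviation of individual classes from it can be derived from a confusion matrix, as in the following Python sketch. The function names and the counts are hypothetical and only illustrate the calculation; they are not the out-of-bag results of the presented models.

```python
def class_success_rates(confusion):
    """Per-class success rate from confusion[true_class][predicted_class] counts."""
    return {
        true_cls: row.get(true_cls, 0) / sum(row.values())
        for true_cls, row in confusion.items()
    }


def class_average(rates):
    """Unweighted mean of the per-class success rates."""
    return sum(rates.values()) / len(rates)


# Hypothetical counts for three composition classes (rows: true class).
confusion = {
    "CS": {"CS": 97, "A": 1, "M": 2},
    "A": {"CS": 3, "A": 95, "M": 2},
    "M": {"CS": 4, "A": 6, "M": 90},
}

rates = class_success_rates(confusion)
average = class_average(rates)
max_deviation = max(abs(rate - average) for rate in rates.values())

print(rates)
print(f"class average = {average:.2f}, max deviation = {max_deviation:.2f}")
```

Unlike overall accuracy, the unweighted class average gives every class the same weight regardless of how many observations it contains, so poorly represented classes are not masked by large ones.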
Detailed exploration of the discrimination models reveals that decisions mainly follow characteristic partitioning trends that are the building blocks of geo-thermo-barometry. The strength of the developed model is the step-wise consideration of many of those element ratios in a randomized way that leads to a significant increase in classification success without compromising generalization. This procedure also uses individual decision lines for subpopulations occurring within the prediction classes, which seems to be the main reason for high classification success rates for the host-rock metamorphic facies without knowledge of the host-rock composition and vice versa. In addition, model exploration uncovers the high potential of minor variations in TiO2 to strongly increase prediction accuracy, underlining the importance of including TiO2 in the standard protocol of in situ chemical analysis of garnet.
Application examples to crystalline rocks that record multiple stages of garnet growth, to modern sand samples from tributaries draining variable lithologies, and to sedimentary successions that received erosional material from different sources demonstrate that the discrimination scheme is sensitive enough to identify (i) variations in P-T conditions, (ii) catchment-specific host-rock characteristics, and (iii) shifts in provenance. Although the scheme is designed for provenance applications and relies on a statistically significant number of grains analysed, it may also be useful in crystalline rock petrology to quickly narrow down sample sets to the most interesting samples for more detailed study.

Outlook and call on the community
Although garnet host-rock predictions are clearly enhanced compared to previous discrimination schemes, there is still potential for improvement, in particular with regard to robustness and balance within individual classes. One issue is the underrepresentation of some individual groups/subgroups, especially BS, GS, EC/UHP-IF/S, IG-M, and low-temperature MM-CS. These are also the groups/subgroups that show the lowest classification success. Another issue is the separate treatment of 'setting and metamorphic facies' and 'composition'.
We suggest that a combined model with a prediction of detailed classes and derivation of the most important information is the best future direction. For example, class GR could be split into GR-IF/S, GR-M, and GR-CS, and after prediction, the maximum vote of the three classes defines the GR prediction, similar to the approach for MM in the presented setting scheme. In the same way, the maximum vote of all mafic classes, that would be MA-M, IG-M, BS/GS-M, AM-M, GR-M, and EC/UHP-M, defines the prediction for M. This would also allow all subgroups to be balanced during the training phase of model development.
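The proposed derivation of a parent-class prediction from detailed classes can be sketched as follows in Python. The grouping of GR follows the example in the text; the function name, the AM subclasses, and all vote values are hypothetical.

```python
def aggregate_by_max(votes, groups):
    """Parent-class vote = maximum vote among its detailed subclasses."""
    return {
        parent: max(votes[sub] for sub in members)
        for parent, members in groups.items()
    }


# Grouping of detailed classes into metamorphic facies classes.
# The GR split follows the text; the AM split is an assumed analogue.
facies_groups = {
    "GR": ["GR-IF/S", "GR-M", "GR-CS"],
    "AM": ["AM-IF/S", "AM-M"],
}

# Hypothetical votes of a combined model for one garnet analysis.
detailed_votes = {
    "GR-IF/S": 0.15, "GR-M": 0.40, "GR-CS": 0.05,
    "AM-IF/S": 0.30, "AM-M": 0.10,
}

facies_votes = aggregate_by_max(detailed_votes, facies_groups)
facies_prediction = max(facies_votes, key=facies_votes.get)
print(facies_votes, "->", facies_prediction)
```

The same detailed votes could be regrouped by composition (for example, GR-M and AM-M into M), so that one model yields both predictions.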
To reach this aim, the dataset has to be extended, in particular for underrepresented groups/subgroups mentioned. In many works, solely the most representative garnet compositions are reported, but it is clear that a much higher amount of data was acquired during the project. Thus, we want to encourage the community to contribute their supplementary as well as newly published garnet data with known host-rock type, metamorphic grade, and composition to the dataset. We also appreciate hints on references that are so far not included in the dataset. In addition, we welcome any kind of criticism or detected problems. Based on feedback and new data acquisition, we plan to submit an updated scheme within the next couple of years.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.