Introduction

Geothermobarometry, the science of measuring the equilibration pressure and temperature (PT) of mineral assemblages, has proven to be invaluable in the study of the Earth’s crust and upper mantle. Geothermobarometry is widely used to determine the PT paths and thermal histories of metamorphic and igneous rocks in the crust and mantle and has been used to reconstruct the stratigraphies of cratons (e.g., Griffin et al. 2004) and study the geochemical heterogeneities of the mantle (O'Reilly and Griffin 2006). Diamond explorers apply it to mantle xenoliths and single-grain xenocrysts to determine the diamond potential of kimberlite and lamproite pipes (Griffin and Ryan 1995). In recent years, geothermobarometry has been applied to mantle xenoliths to calibrate seismic tomography models and build mineral prospectivity maps (Hoggard et al. 2020). Amongst the most used geothermometers for garnet lherzolites, pyroxenites and eclogites are those based on Fe–Mg exchange reactions between mantle silicates such as garnet, clinopyroxene, orthopyroxene, and olivine (Fe–Mg exchange geothermometry). The two most reliable Fe–Mg geothermometers are based on exchange reactions between garnet-clinopyroxene and garnet-orthopyroxene. These geothermometers have been widely employed to estimate mantle and deep crustal temperatures, especially in assemblages lacking co-existing orthopyroxene and clinopyroxene, which precludes the use of two-pyroxene solvus geothermometry. Geothermometers based on Fe–Mg exchange equilibria between garnet and clinopyroxene have been particularly important for studying the PT histories of mantle and crustal eclogites, and inclusions in diamond, whereas geothermometers based on Fe–Mg exchange equilibria between garnet and orthopyroxene have been an important tool for constraining the PT of clinopyroxene-free assemblages within the lower crust and upper mantle, such as harzburgites, charnockites and granulites. 
Efforts to achieve increasingly accurate and consistent PT estimates have resulted in multiple calibrations for the same set of Fe–Mg exchange geothermometers over the past 50 years. Improvements have been achieved through integrating corrections for major and minor elements such as Mn, Mg and Ca (Ellis and Green 1979; Ai 1994; Ravna 2000), recalibrating existing empirical calibrations using experimental methods (Harley 1984), incorporating new thermodynamic data (Wu and Zhao 2007), and conducting new experiments at extended pressures, temperatures, and compositions (Brey and Köhler 1990; Ai 1994; Sudholz et al. 2021b). The reliability of Fe–Mg exchange geothermometers has been evaluated in several studies over the past four decades. Early evaluations by Carswell and Gibb (1980) and Finnerty and Boyd (1984) used PT calculations on natural samples to assess the precision and accuracy of calibrations by Råheim and Green (1974), Mori and Green (1978), O’Neill and Wood (1979), Ellis and Green (1979) and Ganguly (1979) amongst others. Carswell and Gibb (1987) tested Fe–Mg calibrations against experimentally derived mineral compositions between ca. 30 and 45 kbar and 950 and 1450 °C. Nimis and Grütter (2010) used PT estimates from the Nickel and Green (1985) and Taylor (1998) geothermobarometers on natural peridotite and pyroxenite xenoliths to test the Fe–Mg exchange geothermometer calibrations by O’Neill and Wood (1979), Harley (1984), Ravna (2000), and Wu and Zhao (2007). Although these evaluations have helped identify inconsistencies and mutual relationships between various geothermobarometer calibrations, they have not significantly improved our overall understanding of the true precision and accuracy of Fe–Mg exchange geothermometers. 
A detailed understanding of the sources of error for Fe–Mg exchange geothermometers is still lacking, as is a consensus on the ‘best-choice’ calibrations for mantle lithologies, including garnet-bearing harzburgites, wehrlites, websterites and eclogites.

In this study, we present the results from a detailed audit of eight commonly used Fe–Mg exchange geothermometers using an experimental database containing over 300 well equilibrated garnet-bearing peridotite, pyroxenite and eclogite experiments. We limit our evaluation to Fe–Mg exchange geothermometers that are commonly employed by petrologists to study the upper mantle, which includes calibrations by Nimis and Grütter (2010), O’Neill and Wood (1979), Harley (1984), Ai (1994), Krogh (1988), Ravna (2000), Ellis and Green (1979), and Powell (1985). Our experimental database includes a variety of natural and synthetic, fertile, and refractory compositions, and covers a PT range of 10–70 kbar and 850 to > 1650 °C. Our experimental dataset has been used to recognize PT intervals and compositions that result in imprecise and inaccurate T estimates. We identify the limitations of different geothermometer calibrations with the overall aim of improving the precision and accuracy of mantle geothermobarometry. To improve the consistency in T estimates between different calibrations, and to allow for more accurate and precise PT estimates on a wider range of compositions, we have recalibrated the garnet-clinopyroxene and garnet-orthopyroxene Fe–Mg exchange geothermometers by multiple regression analysis. We have recalibrated each geothermometer using the chemical parameters employed in previous calibrations (e.g., \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) and \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\): Ellis and Green 1979; Ai 1994; \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\): Harley 1984); however, we demonstrate considerable improvements in precision, accuracy, and utility through the use of extended P, T, and compositional ranges in our calibration.

Experimental database

Our experimental database comprises over 300 experiments from 25 experimental studies (Table 1; see Appendix 1). Experiments contain a variety of lherzolite (garnet-olivine-clinopyroxene-orthopyroxene), harzburgite (garnet-olivine-orthopyroxene), wehrlite (garnet-olivine-clinopyroxene), websterite (garnet-orthopyroxene-clinopyroxene) and eclogite phase assemblages. These experiments were selected because they contain a variety of phase assemblages that were synthesized across a range of experimental pressures and temperatures using complex starting compositions (see below). The PT of the experiments range from 10 to 70 kbar and 760 to > 1650 °C but most were conducted between 15 and 40 kbar and 1000‒1400 °C. Most experimental assemblages were synthesised from bulk starting materials containing commonly analysed major and minor oxides (SiO2, TiO2, Al2O3, Cr2O3, FeO, MnO, MgO, NiO, CaO, Na2O, K2O). Complex starting materials reflect a variety of natural and synthetic, fertile, and refractory compositions, including MORB Pyrolite, Hawaiian Pyrolite, Cr-doped Pyrolite, Na-doped Pyrolite, KLB-1 and others (see Ringwood 1962; Hart and Zindler 1986; Green and Falloon 1998; Green 2015; Sudholz et al. 2021a). The \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{olv}}\) (Mg/(Mg + Fe)) varied between < 0.85 and > 0.95, highlighting fertile and refractory compositions for experimental run products. Eclogite experiments were also synthesized from bulk starting materials containing commonly analysed major and minor oxides. Eclogite compositions include EC1 and EC2 (Yaxley and Brey 2004), Ca-rich MORB (GA1 and GA2) (Spandler et al. 2008; Kiseeva et al. 2012), and hydrous subducted basaltic compositions (e.g., Elazar et al. 2019).

Table 1 References for peridotite, pyroxenite and eclogite experiments used in study. Refer to Appendix 1

The experimental assemblages used in this study were typically produced from synthesis experiments, that is, equilibrium was approached from one direction only. However, to maintain an appropriate level of data quality and to ensure only well-equilibrated experiments were used, we applied extensive data quality protocols to our dataset. As a first pass filter, we did not include experiments where the analysed oxide totals lay above 102 wt% or below 98 wt%. This filtering bracket is conventionally used to filter experiments used for the calibration of mantle geothermobarometers (e.g., Sudholz et al. 2021a). For Fe–Mg exchange geothermometers we did not include garnet with cation normalized totals above 8.05 cpfu or below 7.95 cpfu (calculated based on 12 oxygen anions in garnet). This filter is more restrictive than the protocols outlined in Nimis and Grütter (2010) who opted not to filter based on garnet cation abundances for their evaluation of mantle geothermometers. We have not included clinopyroxene and orthopyroxene with cation normalized totals above 4.02 cpfu or below 3.98 cpfu (calculated based on 6 oxygen anions in pyroxene). This filter has been successfully used for clinopyroxene-based geothermobarometers (Sudholz et al. 2021a; Ziberna et al. 2016). A slightly less restrictive cation filter for eclogitic clinopyroxene in hydrous experiments was required (3.95–4.05 cpfu). As a final filter for the attainment of equilibrium, we have excluded experiments that returned T estimates that varied by more than \(\pm\)250 °C from Taylor (1998) two-pyroxene solvus T estimates. This filter could only be applied to experiments containing both pyroxene species. The TA98 geothermometer was chosen because it is widely regarded as the most precise and accurate calibration for garnet-bearing peridotites and pyroxenites (Nimis and Grütter 2010).
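The filtering steps described above can be summarised as a short screening routine. The following Python sketch is illustrative only: the `Run` record, its field names and the use of a single oxide total per run are assumptions made for the example, and only the numerical brackets are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """One experimental run. All field names are hypothetical."""
    oxide_total_wt: float                 # analysed oxide total (wt%)
    grt_cations: float                    # garnet cations per 12 oxygens
    cpx_cations: Optional[float] = None   # clinopyroxene cations per 6 oxygens
    opx_cations: Optional[float] = None   # orthopyroxene cations per 6 oxygens
    hydrous_eclogite: bool = False        # relaxed clinopyroxene bracket applies
    t_calc_c: Optional[float] = None      # Fe-Mg exchange T estimate (deg C)
    t_ta98_c: Optional[float] = None      # Taylor (1998) two-pyroxene T (deg C)


def passes_filters(run: Run) -> bool:
    """Return True if the run survives every quality filter described in the text."""
    # First-pass filter on analysed oxide totals (98-102 wt%).
    if not 98.0 <= run.oxide_total_wt <= 102.0:
        return False
    # Garnet cation totals (7.95-8.05 cpfu on 12 oxygens).
    if not 7.95 <= run.grt_cations <= 8.05:
        return False
    # Pyroxene cation totals (3.98-4.02 cpfu on 6 oxygens); eclogitic
    # clinopyroxene from hydrous experiments uses the wider 3.95-4.05 bracket.
    cpx_lo, cpx_hi = (3.95, 4.05) if run.hydrous_eclogite else (3.98, 4.02)
    if run.cpx_cations is not None and not cpx_lo <= run.cpx_cations <= cpx_hi:
        return False
    if run.opx_cations is not None and not 3.98 <= run.opx_cations <= 4.02:
        return False
    # Equilibrium check against the two-pyroxene solvus T; it can only be
    # applied when both estimates exist (i.e., both pyroxenes are present).
    if run.t_calc_c is not None and run.t_ta98_c is not None:
        if abs(run.t_calc_c - run.t_ta98_c) > 250.0:
            return False
    return True
```

In this sketch a run lacking one of the pyroxenes simply skips the checks that do not apply to it, which mirrors the statement that the two-pyroxene filter could only be applied to experiments containing both pyroxene species.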

For all geothermobarometry calculations, we have assumed total Fe is equal to Fe2+, which is common practice for most applications to experimental and natural assemblages. Most of the peridotite experiments in our database were run using a graphite furnace experimental assembly and thus under relatively reducing conditions (near the CCO buffer) which are unlikely to have resulted in significant oxidation of Fe2+ in the run products. Although ferric iron contents can be approximated using stoichiometry in garnet and pyroxene, it has been shown that this approach is inaccurate when applied to garnet and clinopyroxene (Canil and O’Neill 1996), and estimates more in line with pyroxene geothermometry are obtained by treating all Fe as Fe2+. Additionally, results from Matjuschkin et al. (2014) suggest that consideration of ferric Fe contents makes little difference to the performance of two-pyroxene solvus and garnet-clinopyroxene Fe–Mg exchange geothermometry. The results from Nimis et al. (2015) suggest that the Fe3+ partition coefficient for garnet-orthopyroxene equilibria (\({\mathrm{D}}_{{\mathrm{Fe}}^{3+}}^{\mathrm{opx}/\mathrm{grt}}\)) shows no obvious relationship with T but increases with decreasing P and with increasing Na content in orthopyroxene. Nimis et al. (2015) suggest that T estimates calculated from garnet-orthopyroxene Fe–Mg exchange geothermometry may not be robust if total Fe is treated as Fe2+; therefore, future work is required to properly evaluate and account for the effects of ferric iron on Fe–Mg exchange geothermometry, particularly for garnet-olivine and garnet-orthopyroxene.

Equilibration PT for experiments was calculated using eight mineral geothermometers based on Fe–Mg exchange equilibria. A list of the geothermobarometer calibrations tested in this study (and their abbreviations) is given in Table 2. Although more than 50 Fe–Mg exchange geothermometers are available for garnet-bearing peridotites and eclogites, we have restricted our evaluation to eight calibrations. Our selection is based on usage and applicability. We have focused our evaluation on calibrations that are currently used in studies on the upper mantle. For all PT calculations, we used the PTEXL spreadsheet (maintained by T. Stachel, University of Alberta) (https://cms.eas.ualberta.ca/team-diamond/downloads/). The experimental P was used in all geothermometry calculations that required a P term. This approach allowed us to independently test each type of geothermometer without introducing unnecessary biases. Discussion on iterative combinations of geothermometers is beyond the scope of this study.

Table 2 Abbreviations for the geothermobarometer calibrations used in study

Results

O’Neill and Wood (1979) (OW79)

The O’Neill and Wood (1979) (OW79) garnet-olivine Fe–Mg exchange geothermometer is one of few mantle geothermometers suitable for harzburgitic (clinopyroxene-free) assemblages. Limitations of this calibration on Fe3+-bearing garnet have been discussed in previous studies including Canil and O’Neill (1996), Nimis and Grütter (2010) and Matjuschkin et al. (2014). For this study, OW79 reproduced experimental T with a mean (\(\mu\)) and median (M) difference \(\left(\Delta \mathrm{T}={\mathrm{T}}_{\mathrm{exp}}-{\mathrm{T}}_{\mathrm{calc}}\right)\) of 12.29 °C and 13.34 °C respectively (Table 3). The standard deviation (\(\sigma\)) and standard error (SE) for \(\Delta \mathrm{T}\) were 147.95 °C and 13.34 °C respectively (Table 3). The standard deviation of \(\Delta \mathrm{T}\) was calculated using the formula \(\sigma =\sqrt{\frac{\Sigma {({x}_{i}-\mu )}^{2}}{N}},\) where \({x}_{i}\) is the value for \(\Delta \mathrm{T}\) for each experiment, \(\mu\) is the mean value for \(\Delta \mathrm{T}\) (population mean), and N is the number of experiments used in the test. Standard error (SE) was calculated using the formula SE = \(\frac{\sigma }{\sqrt{N}}\), where \(\sigma\) is the value for standard deviation, and N is the number of experiments used in the test. This statistical approach is the most logical for testing the ability of a geothermometer to reproduce experimental T consistently across a large test dataset. For OW79 T estimates, a large scatter was observed across the entire range in T (Fig. 1) and P, although the geothermometer’s performance was not dependent on variation in experimental P (Fig. 2). Large values for \(\Delta \mathrm{T}\) were associated with changes in lnKd (\(\mathrm{lnKd}=\mathrm{ln}\left(\frac{{\mathrm{X}}_{\mathrm{Fe}}^{\mathrm{grt}}/{\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}}{{\mathrm{X}}_{\mathrm{Fe}}^{\mathrm{olv}}/{\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{olv}}}\right)\)) (Fig. 3). For lnKd values between 0.4 and 0.8, OW79 returned T estimates within ± 50 °C of experimental T. 
However, outside of this range, T estimates deviated from experimental values by up to ± 300 °C (Fig. 3). All T estimates were underestimated on garnet-olivine pairs with lnKd values above 0.8 and overestimated on pairs with values below 0.4 (Fig. 3). The magnitude of both underestimation and overestimation increased systematically such that, for a lnKd of 0.2, T was overestimated by more than 300 °C, although the calibration is known to be less reliable on FeO-rich olivine. The systematic bias associated with lnKd is likely related to limitations in the working compositional range of the geothermometer. This feature is probably an artifact inherited from the limited compositional range used during experimental calibration (i.e., \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) = 0.90). The lack of correlation between \(\Delta \mathrm{T}\) and \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{olv}}\) suggests that errors in the OW79 geothermometer are unlikely to be related to Fe loss in the experimental runs.
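The descriptive statistics reported for each calibration can be reproduced with a few lines of code. The sketch below is a minimal illustration, assuming paired lists of experimental and calculated temperatures; the function name is hypothetical, the standard deviation is the population form given in the text, and the standard error is taken as the conventional \(\sigma /\sqrt{N}\).

```python
import math
import statistics


def delta_t_stats(t_exp, t_calc):
    """Summarise delta T = T_exp - T_calc for paired temperature lists (deg C)."""
    deltas = [e - c for e, c in zip(t_exp, t_calc)]
    n = len(deltas)
    mu = statistics.fmean(deltas)            # mean difference
    median = statistics.median(deltas)       # median difference
    sigma = statistics.pstdev(deltas, mu)    # population standard deviation
    se = sigma / math.sqrt(n)                # standard error of the mean
    return {"n": n, "mean": mu, "median": median, "sigma": sigma, "se": se}
```

Positive values of the mean or median indicate that a calibration underestimates experimental T on average, and negative values indicate overestimation, following the sign convention for \(\Delta \mathrm{T}\) used throughout.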

Table 3 Descriptive statistics for geothermobarometers used in study. Refer to Table 2 for abbreviations
Fig. 1 Comparison of experimental T versus \(\Delta\)T (T experimental − T calculated) for Fe–Mg exchange geothermometers. All experiments are on peridotite compositions

Fig. 2 Comparison of experimental P versus \(\Delta\)T (T experimental − T calculated) for Fe–Mg exchange geothermometers. All experiments are on peridotite compositions

Fig. 3 Comparison of lnKd versus \(\Delta\)T (T experimental − T calculated) for Fe–Mg exchange geothermometers. Filled diamonds are peridotite composition experiments. Open circles are eclogite composition experiments

Ellis and Green (1979) (EG79)

The garnet-clinopyroxene Fe–Mg exchange geothermometer of Ellis and Green (1979) (EG79) is widely used to study the PT histories of peridotitic and eclogitic mantle xenoliths. This calibration has been critiqued and refined by several authors over the past four decades, most notably by Powell (1985), Krogh (1988), Ai (1994), Ravna (2000), and Nimis and Grütter (2010). For this study, EG79 reproduced the T of peridotitic experiments with a mean and median \(\Delta \mathrm{T}\) of − 41.14 °C and − 54.04 °C respectively (Table 3). The standard deviation in \(\Delta \mathrm{T}\) was 96.89 °C and the standard error was 8.07 °C. Between 1000 and 1400 °C, EG79 overestimated experimental T by 50 to 200 °C (Fig. 1). Temperature was only underestimated for experiments completed above 1500 °C (Fig. 1). The magnitude of underestimation and overestimation of EG79 T estimates was not associated with changes in experimental P (Fig. 2). Underestimation of T in peridotitic experiments was common for lnKd values below 0.50, whereas overestimation was common above lnKd of 0.60 (Fig. 3). Changes in \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) and \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) did not adversely affect the performance of the geothermometer (Fig. 4). When applied to garnet-clinopyroxene pairs synthesized from eclogite experiments, the EG79 calibration provided reliable T estimates across most of the experimental T range (Fig. 5). For these experiments, EG79 reproduced experimental T with a mean \(\Delta \mathrm{T}\) of − 18.59 °C and a median \(\Delta \mathrm{T}\) of − 11.78 °C. The standard deviation and standard error for \(\Delta \mathrm{T}\) were 97.44 °C and 6.94 °C, respectively. Increases in \(\Delta \mathrm{T}\) were not correlated with increasing experimental P; however, T was underestimated for a limited number of eclogite experiments with lnKd values above 2 as well as for garnet containing Mg below 0.1 (Fig. 3). 
The EG79 calibration reproduced experimental T within ca. ± 200 °C across a range of \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) and \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) compositions for eclogite experiments (Fig. 4) except for a few experiments with very low \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) (< 0.20) where T was overestimated (Fig. 4). The EG79 calibration also performed consistently across a large range of Jd compositions in eclogite experiments (Fig. 4), although overestimation of T was evident for several high Jd experiments (Jd ≥ 0.55) (Fig. 4).

Fig. 4 Comparison between \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\), \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) and Jd content in clinopyroxene versus \(\Delta\)T (T experimental − T calculated) for EG79, PO85, KR00, and A94 calibrations of the clinopyroxene-garnet Fe–Mg exchange geothermometer

Fig. 5 Comparison of experimental T versus \(\Delta\)T, and experimental P versus \(\Delta\)T for Fe–Mg exchange geothermometers. All experiments are on eclogite compositions

Krogh (1988) (KR88)

The Krogh (1988) calibration of the garnet-clinopyroxene Fe–Mg exchange geothermometer (KR88) is commonly used to study the PT histories of mantle eclogites (e.g., Carswell et al. 1997). The performance of KR88 on our experimental dataset was similar to that of EG79 (see above) and Powell (1985) (see below) (Fig. 1). The KR88 calibration reproduced the experimental T of our dataset with a mean and median value for \(\Delta \mathrm{T}\) of − 42.01 °C and − 47.57 °C respectively (Table 3). The standard deviation and standard error for \(\Delta \mathrm{T}\) were 92.54 °C and 8.14 °C respectively (Table 3). KR88 underestimated T above 1400 °C and returned higher T estimates (relative to experimental T) between 1000 °C and 1350 °C (Fig. 1). The value for \(\Delta \mathrm{T}\) did not correlate with experimental P; however, larger values for \(\Delta \mathrm{T}\) were recorded for pressures above 50 kbar (Fig. 2). This relationship indicates a systematic underestimation of T at elevated P. Erroneous T estimates for KR88 were not correlated with lnKd but overestimation of T was common for lnKd < 1 (Fig. 3). \(\Delta \mathrm{T}\) for KR88 is negatively correlated with \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\). For \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) below 0.1, \(\Delta \mathrm{T}\) was typically > 100 °C, and for \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) above 0.15, \(\Delta \mathrm{T}\) was typically below − 100 °C. For garnet with Cr contents above 0.05 cpfu, the value for \(\Delta \mathrm{T}\) was typically between − 50 and − 100 °C. A positive correlation exists between \(\Delta \mathrm{T}\) and Al contents in garnet: for Al contents below 0.95, the value for \(\Delta \mathrm{T}\) was typically between − 25 and − 250 °C, and for Al above 0.98, \(\Delta \mathrm{T}\) was typically between 50 and 250 °C. 
When applied to eclogitic garnet-clinopyroxene pairs, the KR88 calibration returned T estimates with a mean and median \(\Delta \mathrm{T}\) of − 41.81 °C and − 28.48 °C. The \(\Delta \mathrm{T}\) had a standard deviation of 113.21 °C.

Ravna (2000) (KR00)

The most recent revision of the garnet-clinopyroxene Fe–Mg exchange geothermometer was proposed by Ravna (2000) to address compositional issues by including terms for \({\mathrm{X}}_{\mathrm{Mn}}^{\mathrm{cpx}}\) and \({\mathrm{X}}_{\mathrm{Mn}}^{\mathrm{grt}}\). The calibration was made using a large natural and experimental dataset comprising peridotitic and granulitic compositions. For this study, KR00 reproduced T of peridotite experiments with a mean and median \(\Delta \mathrm{T}\) similar to those of EG79 (μ =  − 43.83 °C, M =  − 37.76 °C), but with slightly lower precision (σ = 114.55 °C, SE = 9.57 °C) (Table 3). Large values for \(\Delta \mathrm{T}\) occurred across most of the experimental PT range (Fig. 1). Systematic overestimation of T was observed above 40 kbar (Fig. 2). At high P (50‒70 kbar), experimental T was overestimated, commonly by up to 200 °C (Fig. 2). Poor performance of KR00 at high P was associated with elevated concentrations of Mg on the M2 cation site in clinopyroxene. We found that experimental T was overestimated on samples containing clinopyroxene with M2 site Mg above 0.35. Overestimation of T was common for experiments with lnKd values below 0.45, whereas underestimation was common above lnKd of 1 (Fig. 3). Increases in \(\Delta \mathrm{T}\) did not correlate with \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) or \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) in peridotite experiments (Fig. 4). When applied to eclogitic garnet-clinopyroxene pairs, the KR00 calibration returned T estimates with a mean and median \(\Delta \mathrm{T}\) of − 40.12 °C and − 10.98 °C. The \(\Delta \mathrm{T}\) had a standard deviation and standard error of 131.80 °C and 9.39 °C respectively. Erroneous T estimates occurred for experiments with lnKd values below 0.8 and above 1.5 (Fig. 3). 
Overestimation of experimental T was also common for garnet with \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) above 0.30 and for several high jadeite (Jd) eclogite experiments (Fig. 4).

Powell (1985) (PO85)

The Powell (1985) (PO85) calibration reproduced T of peridotite experiments with a mean and median \(\Delta \mathrm{T}\) of − 38.56 °C and − 49.05 °C respectively (Table 3). The standard deviation (96.56 °C) and standard error (8.01 °C) for \(\Delta \mathrm{T}\) show similar precision to the EG79 calibration. For experimental T of 1000–1400 °C, PO85 returned T estimates up to 100–200 °C above experimental T (Fig. 1). Experimental T was only underestimated for experiments completed above 1600 °C (Fig. 1). The calculated T for PO85 was lower than the experimental T between 20 and 50 kbar (Fig. 2), whereas, above 50 kbar, PO85 returned T estimates that were closer to experimental T (Fig. 2). Variation in \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) and \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) did not strongly affect the PO85 geothermometer, although underestimation of T was common for several very low \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) experiments (Fig. 4). When applied to garnet-clinopyroxene pairs synthesized from eclogite experiments, the PO85 calibration returned T estimates that generally agreed with experimental T (Fig. 5). Temperature estimates aligned closely with most EG79 estimates on eclogitic compositions (Fig. 5). The PO85 calibration reproduced experimental T with a mean and median \(\Delta \mathrm{T}\) of − 13.74 °C and − 6.69 °C respectively. The standard deviation and standard error for \(\Delta \mathrm{T}\) were 100.54 °C and 7.16 °C respectively. Variation in experimental P did not adversely affect the performance of the PO85 calibration on eclogitic compositions (Fig. 5). Overestimation of T was common for experiments with lnKd values below 1 (Fig. 3) except for very low values of lnKd (< 0.45) where T was underestimated (Fig. 3). Underestimation was observed for several very low lnKd experiments (< 0.40) (Fig. 3). 
The PO85 calibration performed well across a range of compositions for \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) and \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\); however, overestimation was observed on several high \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) experiments (> 0.35) (Fig. 4).

Ai (1994) (A94)

The updated garnet-clinopyroxene Fe–Mg exchange geothermometer calibration of Ai (1994) (A94) addresses several issues associated with earlier versions of the geothermometer by Powell (1985), Krogh (1988) and Ellis and Green (1979) by incorporating a wider range of \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) and lnKd. In this study, the A94 geothermometer reproduced the T of peridotite experiments with a mean and median \(\Delta \mathrm{T}\) of − 33.35 °C and − 17.15 °C (Table 3). Large standard deviation (112.34 °C) and standard error (9.81 °C) for A94 highlight poor precision on peridotitic compositions (Table 3). Highly scattered T estimates were returned for experiments between 1000 and 1400 °C (Fig. 1). Across this T interval, the value for \(\Delta \mathrm{T}\) was typically more than 200 °C (Fig. 1). Severe overestimation of experimental T was systematic with increasing P (Fig. 2). Overestimation of experimental T increased from ca. 50 °C at 35 kbar to over 200 °C at 70 kbar (Fig. 2). Compositionally, the A94 calibration performed poorly on experiments with lnKd above 1 and below 0.6 (Fig. 3). The \(\Delta \mathrm{T}\) increased with increasing lnKd (Fig. 3). The A94 calibration performed poorly when applied to garnet-clinopyroxene pairs synthesised from eclogite experiments (Fig. 5). The A94 calibration returned T estimates with a mean and median \(\Delta \mathrm{T}\) of − 72 °C and − 36.56 °C. The standard deviation and standard error for \(\Delta \mathrm{T}\) were 146.27 °C and 10.42 °C, respectively; these values demonstrate poor precision for the calibration on eclogitic compositions. Erroneous T estimates were closely associated with increasing P (Fig. 5). Above 40 kbar, A94 returned T estimates 50–200 °C above the reported experimental value (Fig. 5). Overestimation of T at elevated P appears to be systematic, with T commonly underestimated at very low P (15 kbar). 
Erroneous T estimates for A94 were also observed in eclogite composition experiments with lnKd values below 1 and above 1.7 (Fig. 3). Overestimation of T was very common below lnKd values of 1 (Fig. 3).

Harley (1984) (SH84)

The Harley (1984) (SH84) calibration of the garnet-orthopyroxene Fe–Mg geothermometer is widely used for studies of granulites and peridotites. Systematic errors in the SH84 geothermometer at high and low T were suggested by Nimis and Grütter (2010) in their review of mantle geothermometers. As a result, the formulation of this calibration was revised using a natural sample dataset (see Nimis and Grütter 2010). For this study, SH84 reproduced experimental T accurately and precisely between 1000 and 1400 °C (Fig. 1) and across the entire experimental P range (Fig. 2). The SH84 calibration reproduced experimental T with a mean and median \(\Delta \mathrm{T}\) of 32.96 °C and 23.35 °C respectively and a standard deviation and standard error for \(\Delta \mathrm{T}\) of 87.30 and 7.45 °C respectively (Table 3). Larger values for \(\Delta \mathrm{T}\) occurred at high (> 1400 °C) experimental T, which indicates a systematic overestimation of T at elevated experimental T (Fig. 1). Underestimation of T was common on garnet-orthopyroxene pairs with lnKd values above 1 (Fig. 3). The value for \(\Delta \mathrm{T}\) did not correlate with \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\).

Nimis and Grütter (2010) (NG10)

The Nimis and Grütter (2010) garnet-orthopyroxene Fe–Mg exchange geothermometer (NG10) is the most recent revision of the garnet-orthopyroxene geothermometer. This calibration was made using a natural xenolith database using PT estimates from the Taylor (1998) two-pyroxene solvus geothermometer and Nickel and Green (1985) Al-in-orthopyroxene solubility geobarometer. In this study, NG10 reproduced experimental T with a mean and median \(\Delta \mathrm{T}\) of 16.4 °C and 18.3 °C respectively (Table 3). The standard deviation and standard error for \(\Delta \mathrm{T}\) were 127.1 °C and 10.53 °C respectively (Table 3) (Fig. 1). Experimental T was underestimated below 30 kbar by ca. 50‒100 °C and overestimated above 50 kbar by up to 200 °C (Fig. 2). Overestimation of experimental T at elevated experimental P was systematic, increasing from 50 to 100 °C at 50 kbar to ca. 150‒200 °C at 70 kbar (Fig. 2). For lnKd values below 0.5, T was overestimated by 100–300 °C (Fig. 3). Above distribution coefficient values of 1, T was underestimated by ca. 100–200 °C (Fig. 3).

Garnet-clinopyroxene Fe–Mg exchange geothermometry

We have demonstrated that current calibrations of the clinopyroxene-garnet Fe–Mg exchange geothermometer have shortcomings that limit the reliability of T estimates on mantle-derived xenoliths and inclusions in diamond, particularly for peridotite and pyroxenite compositions. We have also demonstrated that current calibrations are unable to reliably reproduce the experimental T for both eclogitic and peridotitic experiments. To improve this important geothermometer we have recalibrated it across an extended P (15‒70 kbar), T (850‒1750 °C), and compositional range using 193 peridotite, pyroxenite and eclogite synthesis experiments from the dataset described above. The wide compositional range of our calibration dataset is shown in the range of \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) (0.08–0.37), \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) (0.25–0.82), lnKd (0.24–1.84), and Jd (0–0.64). This compositional range covers refractory to fertile peridotites from both cratonic and non-cratonic settings and shallow and deep almandine-rich eclogites. Eclogite experiments have been taken from Yaxley and Brey (2004), Nakamura and Hirajima (2005), Spandler et al. (2008), Kiseeva et al. (2012), Litasov et al. (2014), Konzett et al. (2008) and Elazar et al. (2019). Using multiple linear regression to solve for lnKd (y), we found the following expression best reproduced the experimental T of our dataset:

$${\mathrm{T}}_{\mathrm{Fe}-\mathrm{Mg}}^{\mathrm{grt}-\mathrm{cpx}}\,(^{\circ}\mathrm{C})=\frac{3356.34}{-0.008\times \mathrm{P}\,(\mathrm{kbar})+0.259\times {\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}+0.914\times {\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}-0.159\times {\mathrm{Jd}}^{\mathrm{cpx}}+\mathrm{ln}\left({\mathrm{Kd}}_{\mathrm{Fe}-\mathrm{Mg}}^{\mathrm{grt}-\mathrm{cpx}}\right)+1.265}-273$$
(1)

where \({\text{X}}_{{{\text{Ca}}}}^{{{\text{grt}}}} =\) \(\frac{{{\text{Ca}}}}{{\left( {{\text{Ca}} + {\text{Fe}} + {\text{Mg}}} \right)}}\), \({\text{X}}_{{{\text{Mg}}}}^{{{\text{grt}}}} =\) \(\frac{{{\text{Mg}}}}{{\left( {{\text{Ca}} + {\text{Fe}} + {\text{Mg}}} \right)}}\), \({\text{Jd}}^{{{\text{cpx}}}} = {\text{Na}} - {\text{Cr}} - 2 \times {\text{Ti}}\), and \({\text{Kd}}_{{{\text{Fe}} - {\text{Mg}}}}^{{{\text{grt}} - {\text{cpx}}}} =\) \(\frac{{\left( {{\text{Fe}}_{{{\text{grt}}}} \times {\text{Mg}}_{{{\text{cpx}}}} } \right)}}{{({\text{Fe}}_{{{\text{cpx}}}} \times {\text{Mg}}_{{{\text{grt}}}} )}}\), with all elements calculated on the basis of 12 oxygen anions in garnet and 6 oxygen anions in clinopyroxene. All Fe is treated as Fe2+ (Fe2+ = total Fe).
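As an illustration of how Eq. (1) may be evaluated, the following minimal Python sketch computes T from cation proportions and an assumed pressure. The function name `t_grt_cpx`, the dictionary layout, and the mineral compositions are hypothetical and are not taken from the calibration dataset; only the coefficients and the variable definitions come from Eq. (1).

```python
import math


def t_grt_cpx(p_kbar, grt, cpx):
    """Evaluate Eq. (1): garnet-clinopyroxene Fe-Mg exchange temperature in deg C.

    grt: cations per 12 oxygens, keys "Ca", "Fe", "Mg" (all Fe as Fe2+).
    cpx: cations per 6 oxygens, keys "Fe", "Mg", "Na", "Cr", "Ti".
    """
    total = grt["Ca"] + grt["Fe"] + grt["Mg"]
    x_ca = grt["Ca"] / total
    x_mg = grt["Mg"] / total
    jd = cpx["Na"] - cpx["Cr"] - 2.0 * cpx["Ti"]
    kd = (grt["Fe"] * cpx["Mg"]) / (cpx["Fe"] * grt["Mg"])
    denominator = (
        -0.008 * p_kbar
        + 0.259 * x_ca
        + 0.914 * x_mg
        - 0.159 * jd
        + math.log(kd)
        + 1.265
    )
    return 3356.34 / denominator - 273.0


# Hypothetical compositions, for illustration only.
example_grt = {"Ca": 0.40, "Fe": 0.60, "Mg": 2.00}
example_cpx = {"Fe": 0.08, "Mg": 0.90, "Na": 0.15, "Cr": 0.05, "Ti": 0.01}
print(round(t_grt_cpx(40.0, example_grt, example_cpx), 1))
```

Because P enters the denominator with a negative coefficient, a higher assumed P yields a higher calculated T for the same mineral pair, which is why the geothermometer must be paired with an independent P estimate or a geobarometer.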

In our calibration, we have followed the recommendations of Ellis and Green (1979) and Ai (1994) by including terms for \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) and \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\). To account for a wider range of compositions and additional P effects we have incorporated a third term for jadeite concentration in clinopyroxene (\(\mathrm{Jd}=\mathrm{Na}-\mathrm{Cr}-2\times \mathrm{Ti}\)). These terms are expected to help account for non-idealities in Fe–Mg exchange between garnet and clinopyroxene. Our updated calibration is an empirical rather than a thermodynamic expression; however, as will be shown in greater detail below, it reproduces T for experimental and natural datasets with precision and accuracy superior to those of thermodynamically calibrated expressions. Our updated empirical approach benefits from a considerably larger experimental range in P, T, and composition, which allows the calibration to be applied more reliably to a diverse set of lithologies. Additionally, the new Jd correction improves the performance of the updated calibration on eclogitic and peridotitic compositions. The range in lnKd, Jd, \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) and \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) in our updated calibration permits application to peridotite, pyroxenite and eclogite compositions. Equation (1) was fit to 193 experiments with a multiple R of 0.96 and R2 of 0.93. Equation (1) reproduced the experimental T of eclogite and peridotite experiments from our calibration dataset with a mean and median \(\Delta\) T of −1.45 °C and −0.75 °C respectively (Fig. 6a). The superior accuracy and precision of our calibration are highlighted in Fig. 6a, which compares the calculated and experimental T for the calibration dataset and shows \(\Delta\) T versus experimental T.
Application to independent experiments and natural datasets is outlined below. The lower mean and median \(\Delta\) T demonstrate a considerable increase in accuracy of the geothermometer relative to the original versions by Ellis and Green (1979), Powell (1985), Ai (1994) and Ravna (2000). The lower standard deviation (\(\sigma =55.87 ^\circ \mathrm{C}\)) and standard error (SE = 4.15 °C) for \(\Delta\) T also demonstrate improvements in precision. Our updated calibration has a maximum estimated uncertainty of \(\pm 75 ^\circ \mathrm{C}\). This value was determined from the standard deviation of \(\Delta\) T for our updated calibration, which returned a value of 61 °C. Although a more rigorous approach for determining the estimated uncertainty using error propagation (e.g., Sudholz et al. 2021a) is desirable, the different experimental and analytical methods used for the experiments in our dataset preclude such an approach. The \(\pm 75 ^\circ \mathrm{C}\) uncertainty is validated below through application to independent experiments that were not included in our calibration dataset and to natural xenolith datasets. Our updated calibration performs reliably across a large T (840–1820 °C) and P (20–70 kbar) range for our calibration dataset. The value for \(\Delta\) T \(\left(\Delta \mathrm{T}={\mathrm{T}}_{\mathrm{exp}}-{\mathrm{T}}_{\mathrm{SUD22\,CPX-GRT}}\right)\) does not show any systematic correlation with experimental P or T. There is also no correlation of lnKd or \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) with \(\Delta\) T, which confirms the reliability of the calibration for both eclogite and peridotite compositions. A ready-to-use version of this calibration is available in Appendix 2.
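The summary statistics used in this section (mean, median, standard deviation and standard error of \(\Delta\) T) can be computed as in the minimal Python sketch below. It assumes the sample (n − 1) standard deviation and SE = σ/√n; the text does not state which estimator was used, so this choice is an assumption. The function name `delta_t_summary` and the temperatures are hypothetical.

```python
import math
import statistics


def delta_t_summary(t_exp, t_calc):
    """Summarise Delta T = T_exp - T_calc for paired temperatures (deg C)."""
    deltas = [e - c for e, c in zip(t_exp, t_calc)]
    n = len(deltas)
    sd = statistics.stdev(deltas)  # sample standard deviation (assumed estimator)
    return {
        "n": n,
        "mean": statistics.fmean(deltas),
        "median": statistics.median(deltas),
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
    }


# Hypothetical experimental and calculated temperatures, for illustration only.
summary = delta_t_summary([1000, 1100, 1200, 1300], [990, 1120, 1190, 1310])
print(summary)
```

With this sign convention, a negative mean \(\Delta\) T indicates that the calibration overestimates the experimental T on average.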

Fig. 6

Comparison of experimental T versus \(\Delta\) T for our updated calibration of the garnet-clinopyroxene and garnet-orthopyroxene Fe–Mg exchange geothermometers

Evaluation using alternative additional experiments: garnet-clinopyroxene Fe–Mg exchange

To further evaluate our updated calibration, we have calculated the equilibration T for 147 additional independent garnet-clinopyroxene pairs synthesised in peridotite and eclogite experiments completed between 850 and 1750 °C and 15–100 kbar. These experiments were taken from an independent database which was not included in our calibration dataset for the garnet-clinopyroxene geothermometer. Experiments in this test dataset were taken from Wallace and Green (1991), Pintér et al. (2021), Girnis et al. (2011), Ardia et al. (2012), Tuff and Gibson (2007), Kessel et al. (2015), Sokol et al. (2016), Pertermann and Hirschmann (2003), Wang et al. (2010), Shatskiy et al. (2020), Mallik and Dasgupta (2012) and Rapp and Watson (1995). These experiments were not included in our experimental calibration because our existing dataset already covered a sufficiently wide range in P, T and composition, so including them added no extra benefit. The filtering of this dataset followed the cation and oxide protocols outlined in the methodology section. The large compositional range (eclogitic to peridotitic) of this test dataset is highlighted in the range of \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) (0.08–0.41), Jd (−0.02 to 0.64), \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) (0.12–0.81) and \(\mathrm{ln}{\mathrm{ Kd}}_{\mathrm{Fe}-\mathrm{Mg}}^{\mathrm{grt}-\mathrm{cpx}}\) (0.24–1.84). These compositions are comparable to the ranges used in the experimental calibration dataset (see above) and therefore provide a suitable test for our calibration. Our updated calibration reproduced the experimental T of the test dataset with a mean and median \(\Delta \mathrm{T}\) of 2.14 °C and 10.54 °C respectively. For peridotite experiments, the mean and median \(\Delta \mathrm{T}\) were 11.40 °C and 14.15 °C respectively and for eclogite experiments, these values were 7.63 °C and 2.98 °C.
Low values for mean and median highlight superior accuracy for our updated geothermometer compared with earlier versions for both eclogite and peridotite compositions (Fig. 7). The standard deviation for \(\Delta \mathrm{T}\) for the entire dataset was 100.75 °C. For peridotite and eclogite experiments these values were 92.47 °C and 108.60 °C respectively. Analysis of our experimental data indicates that the primary source for large \(\Delta \mathrm{T}\) may be unusually high FeO contents in clinopyroxene: such compositions are uncommon in natural mantle xenolith suites. We, therefore, recommend caution when calculating equilibration PT on clinopyroxene with Fe above 0.35 cpfu. The wider pressure range (15–70 kbar) and incorporation of a jadeite correction for clinopyroxene have resolved shortcomings associated with P as observed in the Ai (1994) and Ravna (2000) calibrations (see above for discussion) (Fig. 7). Temperature was accurately reproduced at very high (> 50 kbar) and low (< 20 kbar) P for eclogite and peridotite compositions (Fig. 7). Our updated calibration also performed reliably between \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) values of 0.15–0.80 (Fig. 7). This range in \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) covers a compositional spectrum from almandine-rich eclogite garnet to pyrope-rich peridotite garnet. No systematic bias in the performance of our calibration was recognized across the \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\) range of experiments (Fig. 7). Our calibration performed well on jadeite concentrations between 0 and 0.65 (Fig. 7). Our calibration returned reliable T estimates across a range of \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) values from 0.1 to 0.4 and performed well across a range of MnO concentrations (0–1.4 wt%). As discussed above, an overestimation of T was recorded for clinopyroxene with FeO contents above 10 wt%. 
We found that corrections for this parameter (i.e., \({\mathrm{X}}_{\mathrm{Fe}}^{\mathrm{cpx}}\)) reduced the precision and accuracy of the calibration on peridotitic compositions. As a result, we have not included a correction for \({\mathrm{X}}_{\mathrm{Fe}}^{\mathrm{cpx}}\).
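The compositional limits discussed in this section can be turned into a simple screening step before Eq. (1) is applied. The sketch below is one possible way to do this, not part of the published calibration: the ranges are those reported for the calibration dataset (P, \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\), \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\), lnKd, Jd) and the 0.35 cpfu caution threshold for Fe in clinopyroxene, while the function name `screening_flags` and the dictionary keys are hypothetical.

```python
# Ranges of the garnet-clinopyroxene calibration dataset as reported in the text.
CALIBRATION_RANGES = {
    "P_kbar": (15.0, 70.0),
    "X_Ca_grt": (0.08, 0.37),
    "X_Mg_grt": (0.25, 0.82),
    "lnKd": (0.24, 1.84),
    "Jd_cpx": (0.0, 0.64),
}
FE_CPX_CAUTION_CPFU = 0.35  # caution threshold for Fe in clinopyroxene


def screening_flags(values, fe_cpx_cpfu):
    """Return a list of warnings; an empty list means no flag was raised."""
    flags = []
    for name, (low, high) in CALIBRATION_RANGES.items():
        value = values[name]
        if not low <= value <= high:
            flags.append(f"{name}={value} outside calibration range {low}-{high}")
    if fe_cpx_cpfu > FE_CPX_CAUTION_CPFU:
        flags.append(f"Fe in cpx {fe_cpx_cpfu} cpfu exceeds {FE_CPX_CAUTION_CPFU}")
    return flags


# Hypothetical sample, for illustration only.
sample = {"P_kbar": 45.0, "X_Ca_grt": 0.13, "X_Mg_grt": 0.67, "lnKd": 1.2, "Jd_cpx": 0.08}
print(screening_flags(sample, fe_cpx_cpfu=0.08))
```

A flag does not mean that the T estimate is wrong, only that the pair lies outside the range over which the calibration was fitted or in the high-Fe clinopyroxene field where larger \(\Delta \mathrm{T}\) was observed.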

Fig. 7

Comparison between \(\Delta\) T for our updated calibration of the clinopyroxene-garnet Fe–Mg exchange geothermometer against experimental T, experimental P, lnKd, \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\), and Jd. Experiments used in this figure were taken from an independent dataset comprised of garnet and clinopyroxene pairs synthesised from eclogitic (open circle) and peridotitic (filled diamond) compositions. See text for discussion

To alleviate potential biases associated with inter-laboratory differences in experimental pressure, temperature, and analytical conditions, we have calculated the temperature of garnet-clinopyroxene equilibration using garnet-clinopyroxene pairs taken from the Green et al. (2014) experimental dataset. These experiments were conducted and measured using the same internally consistent methods. The analyses of garnet and clinopyroxene for these experiments are given in Appendix 1. Quality filtering of the silicate analyses within this dataset followed the protocols outlined in the Methodology section. The experiments in this dataset span 840–1450 °C and 20–60 kbar. Our updated calibration of the garnet-clinopyroxene Fe–Mg exchange geothermometer performed reliably across the entire PT range of the dataset (Fig. 8). The mean and median values for \(\Delta \mathrm{T}\) \(\left(\Delta \mathrm{T}={\mathrm{T}}_{\mathrm{exp}}-{\mathrm{T}}_{\mathrm{calc}}\right)\) were −33 °C and −21 °C. The standard deviation for \(\Delta \mathrm{T}\) was 62 °C. The agreement between calculated and experimental T for our updated calibration is shown in Fig. 8. The value for \(\Delta \mathrm{T}\) did not show any systematic correlation with experimental T or P (Fig. 8).

Fig. 8

Evaluation of the updated garnet-clinopyroxene and garnet-orthopyroxene Fe–Mg exchange geothermometers on internally consistent experiments from Green et al. (2014). See Appendix 1

In summary, we have improved the state of garnet-clinopyroxene Fe–Mg exchange geothermometry by incorporating a wider compositional, pressure and temperature range of experiments used in the calibration. We have found that corrections for \({\mathrm{X}}_{\mathrm{Mg}}^{\mathrm{grt}}\), \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) and Jd were important in improving the versatility of the geothermometer across peridotitic and eclogitic compositions. An independent test of our updated calibration on peridotite and eclogite experiments using a separate experimental dataset demonstrates improvements in precision and accuracy relative to all earlier versions of the geothermometer.

Garnet-orthopyroxene Fe–Mg exchange geothermometry

The garnet–orthopyroxene Fe–Mg exchange geothermometer is an important tool for measuring the equilibration PT of refractory harzburgites that make up the metasomatized and often diamond-bearing roots of cratons. In this study, as well as in previous work by Nimis and Grütter (2010), it has been demonstrated that existing iterative combinations that calculate T using the garnet–orthopyroxene Fe–Mg exchange geothermometer have shortcomings that hinder their confident application to natural samples. Although recent attempts have been made to improve this important geothermometer (e.g., Nimis and Grütter 2010), we have demonstrated that these methods also have shortcomings that prohibit confident application to natural datasets. To improve this geothermometer we have recalibrated it across an extended P (16–70 kbar), T (850–1525 °C), and compositional range (ln \(\mathrm{Kd}=\) 0.52–1.29) using 166 peridotite and pyroxenite synthesis experiments completed in a variety of natural and synthetic, fertile and refractory experimental starting compositions. Using multiple linear regression with lnKd as the dependent variable (y), we found that the following expression best reproduced the experimental T for our dataset:

$${\text{T}}_{{{\text{Fe}} - {\text{Mg}}}}^{{{\text{grt}} - {\text{opx}}}} { (}^{ \circ } {\text{C)}} = { }\frac{1851.85}{{\left( {\left( { - 0.007 \times {\text{P }}\left( {{\text{kbar}}} \right)} \right) + \left( { - 1.83 \times {\text{X}}_{{{\text{Ca}}}}^{{{\text{grt}}}} } \right) + \left( {{\text{ln}}\left( {{\text{ Kd}}_{{{\text{Fe}} - {\text{Mg}}}}^{{{\text{grt}} - {\text{opx}}}} } \right) + 1.08} \right)} \right)}} - 273{ }$$
(2)

where \({\text{X}}_{{{\text{Ca}}}}^{{{\text{grt}}}} =\) \(\frac{{{\text{Ca}}}}{{\left( {{\text{Ca}} + {\text{Fe}} + {\text{Mg}}} \right)}}\) and \({\text{Kd}}_{{{\text{Fe}} - {\text{Mg}}}}^{{{\text{grt}} - {\text{opx}}}} =\) \(\frac{{\left( {{\text{Fe}}_{{{\text{grt}}}} \times {\text{Mg}}_{{{\text{opx}}}} } \right)}}{{({\text{Fe}}_{{{\text{opx}}}} \times {\text{Mg}}_{{{\text{grt}}}} )}}\), with all elements calculated on the basis of 12 oxygen anions in garnet and 6 oxygen anions in orthopyroxene. All Fe is treated as Fe2+ (Fe2+ = total Fe).
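A minimal Python sketch of Eq. (2) is given below for illustration. The Kd term is taken as the garnet–orthopyroxene distribution coefficient defined above; the function name `t_grt_opx`, the dictionary layout and the compositions are hypothetical and do not come from the calibration dataset.

```python
import math


def t_grt_opx(p_kbar, grt, opx):
    """Evaluate Eq. (2): garnet-orthopyroxene Fe-Mg exchange temperature in deg C.

    grt: cations per 12 oxygens, keys "Ca", "Fe", "Mg" (all Fe as Fe2+).
    opx: cations per 6 oxygens, keys "Fe", "Mg".
    """
    x_ca = grt["Ca"] / (grt["Ca"] + grt["Fe"] + grt["Mg"])
    kd = (grt["Fe"] * opx["Mg"]) / (opx["Fe"] * grt["Mg"])
    denominator = -0.007 * p_kbar - 1.83 * x_ca + math.log(kd) + 1.08
    return 1851.85 / denominator - 273.0


# Hypothetical compositions, for illustration only.
example_grt = {"Ca": 0.35, "Fe": 0.55, "Mg": 2.10}
example_opx = {"Fe": 0.18, "Mg": 1.80}
print(round(t_grt_opx(50.0, example_grt, example_opx), 1))
```

As with Eq. (1), the calculated T depends on the assumed P, so the expression is normally solved together with a geobarometer.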

Equation (2) is an empirical rather than a thermodynamic calibration; however, as will be shown below, the ability of our updated calibration to reproduce the experimental T accurately and precisely for a range of experimental and natural compositions indicates that this approach is appropriate. As discussed above, our updated empirical approach benefits from a considerably larger experimental range in P, T, and composition. The large range of these parameters allows the calibration to be more reliably applied to a diverse set of lithologies.

Equation (2) reproduced the experimental temperatures of our calibration dataset with a mean and median \(\Delta \mathrm{T}\) of −4.39 °C and −10.64 °C respectively. The lower mean and median highlight an increase in accuracy of the geothermometer relative to the earlier versions proposed by Harley (1984) and Nimis and Grütter (2010). The lower standard deviation \((\sigma =\) 82.37 °C) and standard error (SE = 6.39 °C) also demonstrate an improvement in precision. The accuracy and precision of our calibration are highlighted in Fig. 6b, which compares the calculated and experimental T for the calibration dataset and shows \(\Delta\) T versus experimental T. Our updated calibration has a maximum estimated uncertainty of \(\pm\) 100 °C. As for our updated garnet–clinopyroxene geothermometer, the estimated uncertainty for this geothermometer was determined from the standard deviation of \(\Delta\) T, which returned a value of 83 °C. Although a more rigorous approach for determining the estimated uncertainty using error propagation (e.g., Sudholz et al. 2021a) is desirable, the nature of our updated calibration does not permit such an approach. When applied to our independent dataset of experiments, Eq. (2) performed reliably across a large range in experimental P (30–70 kbar) and T (900–1400 °C). Large compositional variations in FeO and MgO do not show any systematic correlation with \(\Delta\) T \(\left(\Delta \mathrm{T}={\mathrm{T}}_{\mathrm{exp}}-{\mathrm{T}}_{\mathrm{SUD22\,OPX-GRT}}\right).\) Larger positive values for \(\Delta \mathrm{T}\) are slightly more common for samples containing orthopyroxene with very low Ca (< 0.03 cpfu). Nonetheless, the value for \(\Delta \mathrm{T}\) for these compositions remained within \(\pm\) 150 °C. Changes in the concentration of FeO, MgO and CaO in garnet did not adversely affect the reliability of T estimates.
When applied to internally consistent experiments from Green et al. (2014) (see Appendix 1), our updated calibration of the garnet–orthopyroxene Fe–Mg exchange geothermometer reproduced experimental T with a mean and median value for \(\Delta\) T \(\left(\Delta \mathrm{T}={\mathrm{T}}_{\mathrm{exp}}-{\mathrm{T}}_{\mathrm{SUD22\,OPX-GRT}}\right)\) of −10 °C and −13 °C respectively. The standard deviation for \(\Delta \mathrm{T}\) was 73 °C. The value for \(\Delta\) T did not show any systematic correlation with experimental P or T (Fig. 6). A ready-to-use version of this calibration is available in Appendix 3.

Evaluation as an iterative geothermobarometer

To evaluate our updated calibrations of the clinopyroxene-garnet and orthopyroxene-garnet Fe–Mg exchange geothermometers we have iteratively calculated the equilibration PT for > 250 peridotite and pyroxenite xenoliths from the Solomon Islands (Ishikawa et al. 2004), Siberian Craton (Udachnaya; Yaxley et al. 2012; Doucet et al. 2013; Agashev et al. 2013), Slave Craton (Ekati/Diavik; Menzies et al. 2004; Yaxley et al. 2017; Creighton et al. 2010), Kaapvaal Craton (Cullinan; Viljoen et al. 2009; Janney et al. 2010) and Chidliak (Kopylova et al. 2019). This set of samples was chosen because it covers a diverse range of garnet, clino- and orthopyroxene chemistries and tectono-magmatic settings (on- and off-craton, kimberlite- and basalt-hosted xenoliths, etc.). Equilibration PT was solved using our updated calibrations in conjunction with the Nickel and Green (1985) Al-in-orthopyroxene geobarometer. For comparison, we also calculated PT using the Taylor (1998) two-pyroxene solvus geothermometer in conjunction with the Nickel and Green (1985) Al-in-orthopyroxene geobarometer. Details of the xenolith samples used, their references and PT estimates are provided in Appendix 4. The updated calibration of the clinopyroxene-garnet Fe–Mg exchange geothermometer returned T estimates that were typically within \(\pm\) 100 °C of those calculated using the Taylor (1998) (TA98) calibration. The mean and median differences in T estimates between the two calibrations (\(\Delta \mathrm{T}={\mathrm{T}}_{\mathrm{SUD22\,CPX-GRT}}-{\mathrm{T}}_{\mathrm{TA}98}\)) were 45.56 °C and 42.96 °C (Fig. 9). The standard deviation for \(\Delta \mathrm{T}\) was 97.10 °C, which is consistent with our estimated uncertainty on T estimates for the updated calibration. These statistical parameters confirm the improved reliability of our updated calibration in comparison with earlier versions.
Figure 9 shows that there is no systematic relationship between \(\Delta \mathrm{T}\) and temperatures estimated using the TA98 pyroxene solvus thermometer, which suggests that our updated calibration can be reliably applied to mantle xenoliths across a T range of at least 700–1400 °C. Discrepancies between our updated calibration and TA98 are not correlated with NG85 P estimates (not shown). Our updated calibration reproduced the T of TA98 across a range of garnet, clinopyroxene and orthopyroxene chemical compositions (Fig. 9). T estimates from the two calibrations are also similar across a large range of CaO compositions in garnet (Fig. 9). It is only for garnet with CaO below 4.5 wt% that the value for \(\Delta\) T consistently increases above +100 °C. Our updated calibration reproduces TA98 T estimates across a range of clinopyroxene Al2O3 concentrations (1–6 wt%) (Fig. 9). For clinopyroxene with Al2O3 below 1 wt%, the value for \(\Delta\) T decreases systematically. The most negative values for \(\Delta\) T occur for clinopyroxene with Al2O3 contents of < 0.5 wt%. Our updated calibration of the orthopyroxene-garnet Fe–Mg exchange geothermometer also returned T estimates that were typically within \(\pm\) 100 °C of T[TA98] estimates (Fig. 9). The mean and median differences in T estimates between the two calibrations (\(\Delta \mathrm{T}={\mathrm{T}}_{\mathrm{SUD22\,OPX-GRT}}-{\mathrm{T}}_{\mathrm{TA}98}\)) were −44.55 °C and −44.56 °C (Fig. 9). The standard deviation for \(\Delta \mathrm{T}\) was 126.80 °C. Our updated calibration performed reliably across a range of garnet and orthopyroxene compositions (Fig. 10). No systematic correlation was observed between \(\Delta\) T and the concentration of FeO, MnO, MgO and CaO in garnet. The reliability of T estimates across a range of CaO contents in garnet is likely attributable to the correction for \({\mathrm{X}}_{\mathrm{Ca}}^{\mathrm{grt}}\) (Eq. 2).
No correlation was observed between \(\Delta\) T and the concentration of MgO and FeO in orthopyroxene. Increases in SiO2 and NiO in orthopyroxene show a relationship with larger (negative) values for \(\Delta \mathrm{T}\) (not shown).
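Because the geothermometers depend on P and the geobarometer depends on T, the equilibration PT discussed in this section is obtained by iteration. The following Python sketch shows one common way to do this, a simple fixed-point iteration; it is an assumption about the general procedure rather than a description of the exact scheme used here. The function `solve_pt` is hypothetical, and the two toy linear functions stand in for a real geothermometer and geobarometer purely to demonstrate convergence; they are not Eq. (1), Eq. (2) or the Nickel and Green (1985) expression.

```python
def solve_pt(thermometer, barometer, p_start_kbar=40.0, tol_kbar=0.01, max_iter=100):
    """Iterate T = thermometer(P) and P = barometer(T) until P stabilises.

    thermometer: callable taking P (kbar) and returning T (deg C).
    barometer: callable taking T (deg C) and returning P (kbar).
    Returns (T, P) at convergence.
    """
    p = p_start_kbar
    for _ in range(max_iter):
        t = thermometer(p)
        p_new = barometer(t)
        if abs(p_new - p) < tol_kbar:
            return thermometer(p_new), p_new
        p = p_new
    raise RuntimeError("P-T iteration did not converge")


# Toy placeholder functions, for illustration only (not real calibrations).
def toy_thermometer(p_kbar):
    return 800.0 + 5.0 * p_kbar


def toy_barometer(t_c):
    return 0.03 * t_c + 10.0


t_eq, p_eq = solve_pt(toy_thermometer, toy_barometer, p_start_kbar=20.0)
print(round(t_eq, 1), round(p_eq, 2))
```

In practice the thermometer callable would wrap Eq. (1) or Eq. (2) for a fixed mineral pair, and the barometer callable would wrap the chosen geobarometer for the same sample.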

Fig. 9

Comparison between PT estimates solved iteratively for our updated calibrations of clinopyroxene-garnet Fe–Mg exchange geothermometer and orthopyroxene-garnet Fe–Mg exchange geothermometer with the Taylor (1998) two-pyroxene solvus geothermometer for published data on selected peridotite and pyroxenite xenoliths. All T estimates were calculated iteratively using P estimates from the Nickel and Green (1985) Al-in-orthopyroxene geobarometer. References and sample numbers for xenolith data are provided in Appendix 4

Fig. 10

Comparison between T[SUD22CPX-GRT]-T[TA98] and T[SUD22OPX-GRT]-T[TA98] for several compositional variables taken from the natural dataset discussed in Fig. 9 and Appendix 4. See text for discussion. Symbols as in Fig. 9

Conclusion

The reliability of eight Fe–Mg exchange geothermobarometers for garnet-bearing peridotites and pyroxenites has been examined using an experimental database comprising 300 published experiments completed between 10 and 70 kbar and 850 to > 1650 °C. All calibrations reproduce experimental T with reasonable accuracy but varying degrees of precision. Shortcomings in these Fe–Mg exchange geothermometers were typically associated with variations in \({\mathrm{Kd}}_{\mathrm{Fe}-\mathrm{Mg}}^{\mathrm{A}-\mathrm{B}}\), M1 and M2 site X2+ cation concentrations, and pressure. To improve the state of mantle geothermometry we have presented revised calibrations of (1) the garnet-clinopyroxene Fe–Mg exchange geothermometer and (2) the garnet-orthopyroxene Fe–Mg exchange geothermometer. Both geothermometers were recalibrated across an extended pressure, temperature and compositional range using chemical parameters similar to those defined in the experimental studies of Ellis and Green (1979) and Harley (1984). Our updated calibrations resolve several issues with earlier calibrations, including poor performance at higher P and compositional limitations. We demonstrate improvements in precision and accuracy through application to published experimental and natural peridotites, pyroxenites and eclogites. Future work on geothermobarometry for mantle peridotite and pyroxenite should focus on (1) further refining the olivine-garnet Fe–Mg geothermometer; (2) evaluating the effects of variations in ferric iron content on mineral exchange equilibria at different P and T; and (3) evaluating the effects of H2O, K and Na on the performance of geothermobarometers for orthopyroxene and clinopyroxene.