Abstract
The composition and origin of Earth’s earliest continental crust remain enigmatic due to the absence of Hadean (>4 Ga) age rocks. Here we address this question by using machine learning to examine the provenance of the 4.4–3.3 Ga Jack Hills zircons, which constitute the best archive of Earth’s earliest continental crust. Our results reveal that although some Jack Hills zircons may be derived from tonalite-trondhjemite-granodiorite series rocks, which were common during the Archean (4–2.5 Ga), most (as high as ~70%) are sourced from igneous (I-) and sedimentary (S-) type granites. This finding provides clear evidence for rocks other than the tonalite-trondhjemite-granodiorite suite in the Earth’s earliest continental crust. Considering that I- and S-type granites are typical of modern convergent plate margins, the presence of a high proportion of Jack Hills zircons from these rocks supports the operation of a horizontal, mobile-lid tectonic regime in the early Earth.
Introduction
Rocks of Hadean age (>4 Ga) are lacking from the Earth’s rock archive. Much of our knowledge regarding the Earth’s earliest history has been gleaned from the geochemical features of the physicochemically resistant mineral zircon, which occurs as detrital grains in metasedimentary rocks from the Jack Hills (JH), Western Australia1,2,3,4,5,6. Deciphering the source rocks of JH zircons is thus critically important in establishing the composition and tectonic affiliation of the Earth’s earliest crust2,7,8, as well as the potential for initial terrestrial habitability9,10. Studies to date have argued for mafic source rocks11,12, impact melts13,14, and felsic source rocks8,15,16,17. However, the elevated δ18O in many JH zircons7,17,18 and the predicted high source melt SiO2 contents of the JH zircons19,20 are not consistent with a dominant mafic source rock origin. The possibility of an impact melt sheet origin was also subsequently ruled out due to the noticeable distinctions between JH zircons and those from rocks at the Sudbury impact crater16,19,21. Although there is an increasing consensus for the derivation of the JH zircons from felsic melts in a continental setting, the exact source rocks remain disputed. A large number of studies22,23,24,25 have suggested that the Hadean continental crust should have compositions comparable to granitoids of the tonalite–trondhjemite–granodiorite (TTG) series (generally produced by melting and/or crystallization of a basaltic source26). This seems reasonable based on the subsequent dominance of TTGs in the Archean (4.0–2.5 Ga) continental crust26,27. This argument has also been justified by comprehensive zircon Hf isotope studies28, recent thermodynamic modeling29, and calculated model melts based on Ti-calibrated zircon/melt partition coefficients30.
Others have proposed the formation of the JH zircons in near-H2O saturated metaluminous (i.e., I-type) and/or peraluminous (i.e., S-type) magmas commonly seen in modern convergent plate margins6,17,31, rather than in TTG magmas. The supporting evidence includes the low crystallization temperature indicated by Ti-in-zircon thermometry (the opposite of what is expected from TTG magmas)32,33 and mineral inclusion assemblages that are indicative of I- and S-type granitoids15,17. However, none of the above evidence is unequivocal. The calculated crystallization temperature is highly sensitive to the choice of the TiO2 activity, and using a low TiO2 activity (e.g., 0.4–0.5 versus 1) for JH zircons will return a temperature range similar to that of TTG magmas29. Meanwhile, whether the observed inclusions are primary or not is debated34. Furthermore, even among the studies that advocate derivation from I- and S-type magmas, controversy extends to whether S-type source rocks dominated the felsic portion17, or I-type source rocks prevailed in the Earth’s earliest continental crust16. The predominance of muscovite and quartz inclusions (accounting for nearly three-quarters of inclusions in Hadean JH zircons), if indeed primary, is more consistent with derivation from an S-type dominated magma source15,17. In contrast, aluminum and phosphorus contents in JH zircons are argued to suggest mainly I- rather than S-type protoliths8,16,35. However, neither Al nor P proxies can effectively identify TTG zircons8,16, and thus TTGs remain a possible protolith of the JH zircons.
To address the composition of the Earth’s earliest continental crust and its possible tectonic significance, in this study we establish a machine learning method that can distinguish detrital zircons derived from TTGs, I-type granites and S-type granites; we then apply this method to identify the source rocks of the JH zircons. Our results show that most JH zircons (as high as almost 70%) are from I- and S-type granites, rather than TTGs. This finding runs counter to the general view and carries important information for the style of Earth’s earliest tectonic regimes.
Results and discussion
A machine learning method for distinguishing source rocks of detrital zircons
To provide solid constraints on the Hadean continental crust, a better understanding of the source rock information recorded by zircon is required. In this study, we compiled a zircon trace element dataset from a variety of source rocks. This set included 3168 published zircon analyses from I-type rocks, 2056 from S-type rocks, and 808 from TTGs (see Methods), which were all the data available to the authors at the time of writing. This offers a unique opportunity to investigate the relationship between the trace element geochemistry of zircon and its provenance, which in turn can be used to decipher the likely geochemical source of detrital grains for which a connection with the original source is not preserved. Theoretically, S-type granites are more reduced than I-type granites36 and also probably TTGs (as both I-type granites and TTGs are derived from igneous source rocks). Most TTGs should also be more depleted in the HREE than S-type rocks because of the widely accepted derivation of the dominant TTG groups (i.e., medium- and high-pressure groups that account for ~80% of global TTGs) from a garnet-bearing mafic source26, despite some claims to the contrary37. Thus, it is expected that S-type zircons can be distinguished from I-type and TTG zircons according to Ce/Ce* and Eu/Eu*, which are indicators of magma oxidation state38, whereas most TTG zircons should have lower HREE contents (e.g., Yb) than zircons from S-type rocks. In practice, however, distinguishing source rock composition using zircon geochemistry is complicated39, as indicated by the noticeable overlap in the chondrite-normalized rare earth element (REE) patterns (Fig. 1a–c and Supplementary Fig. 1), and in the Ce/Ce* versus Eu/Eu* diagram of zircons from those rock types (Fig. 1d).
The reason for such overlaps is that the trace element chemistry of zircon is affected by many different variables in addition to parental melt composition (e.g., temperature, pressure, oxygen fugacity, and competition from other minerals40,41,42). Differentiating the relative significance (and thus further deconvolving the effects) of these variables is extremely challenging, especially for detrital grains that lack a direct link with the source rock from which they were derived39. Therefore, although parental melt composition may act as a first-order control on the trace element composition of zircons that crystallized from it, the relationship between zircon compositions and their parental magma may not be as intuitive as we have expected.
In this study, we therefore applied machine learning (ML) technology to relate the trace element geochemistry of zircon to provenance (Fig. 2). Compared with traditional classification methods that are based on single elements8,16 or on binary and/or triangular diagrams (where generally only a couple of elements are utilized)39, the advantage of ML is that it can effectively utilize more features and capture complex nonlinear relationships among large datasets43,44,45. This approach promises to achieve a much higher level of classification accuracy than the previous methods. Moreover, ML learns the classification features by itself without being explicitly programmed, and thus the internal, complex relationships within the data can be discovered algorithmically without the requirement for pre-existing knowledge. To acquire the best classification models to identify the granitic sources from which detrital zircons are derived, we applied three common supervised ML approaches, namely, Support Vector Machine (SVM), Random Forest (RF), and Multilayer Perceptron (MLP), and compared their prediction performance. Details about zircon selection criteria, data curation, and modeling procedures are presented in the Methods section and Supplementary Table 1. It should be noted that our ML classifiers can only output three types of source rocks. This may be problematic when these classifiers are used to identify the provenance of zircons that may come from source rocks other than I-type, S-type and TTG rocks. Thus, a preliminary study is needed to mitigate such concerns before our ML classifiers are used. However, as discussed in the Introduction, previous studies have demonstrated that the source rocks of JH zircons should be overwhelmingly I-type rocks, S-type rocks and/or TTGs, with other source rocks (if any) in the minority. Thus, an ML classifier trained on I-type, S-type and TTG zircons is appropriate for provenance studies of the JH zircons.
Seventeen features—including 11 REEs (Ce, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, and Lu), Th, U and 4 derived trace element ratios (Th/U, U/Yb, Ce/Ce* and Eu/Eu*)—were used for the ML algorithms. Th and U concentrations were corrected for radioactive decay since the time of crystallization (see Methods). These 17 features were selected because (1) they are routinely analyzed in many laboratories and are more commonly reported in the literature; and (2) they have been shown to be useful in discriminating zircon provenance39,41,46, despite some claims to the contrary11,47. Moreover, our statistical analysis work has indicated that although none of these selected elements and/or ratios is able to independently identify all three types of zircons, each can distinguish at least one zircon type from the rest (Supplementary Fig. 2). For example, most S-type zircons can be distinguished by lower Ce and higher Tb; most I-type zircons can be distinguished by higher Th/U and higher Ce/Ce*; and most TTG zircons can be distinguished by much lower Th and U.
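The derived ratios in the feature list can be illustrated with a minimal sketch. It assumes concentrations in ppm and C1-chondrite normalizing values from Sun and McDonough (1989); neither the normalization scheme nor the function and field names come from the text. Ce/Ce* is left out because it depends on how Ce* is estimated when La and Pr are unreliable, which is not specified here.

```python
import math

# C1-chondrite normalizing values in ppm (Sun & McDonough, 1989).
# Assumption: the text does not state which reference values were used.
CHONDRITE_PPM = {"Sm": 0.153, "Eu": 0.058, "Gd": 0.2055}


def derived_ratios(analysis):
    """Return Th/U, U/Yb and Eu/Eu* for one zircon analysis (values in ppm)."""
    sm_n = analysis["Sm"] / CHONDRITE_PPM["Sm"]
    eu_n = analysis["Eu"] / CHONDRITE_PPM["Eu"]
    gd_n = analysis["Gd"] / CHONDRITE_PPM["Gd"]
    return {
        "Th/U": analysis["Th"] / analysis["U"],
        "U/Yb": analysis["U"] / analysis["Yb"],
        # Eu anomaly: normalized Eu over the geometric mean of its neighbours
        "Eu/Eu*": eu_n / math.sqrt(sm_n * gd_n),
    }


# Hypothetical analysis, for illustration only (not from the compiled dataset)
example = {"Th": 120.0, "U": 240.0, "Yb": 480.0,
           "Sm": 3.06, "Eu": 0.58, "Gd": 16.44}
ratios = derived_ratios(example)
```

In this made-up analysis, Eu/Eu* well below 1 indicates a negative Eu anomaly, the kind of signal the text associates with magma oxidation state.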
For each model, the individual metrics for each fold of the tenfold cross-validation are reported in Supplementary Table 2, with the average performance metrics for the test set after the tenfold cross-validation summarized in Supplementary Fig. 3 and Supplementary Table 3. According to the performance metrics of the test set, all three trained ML algorithms perform well in identifying source rocks of zircons, with an overall accuracy of 0.88 for SVM, 0.84 for RF, and 0.87 for MLP (Supplementary Table 3). In the trained SVM and MLP models, the precision for each type of zircon is higher than 0.82; in the RF model, the individual precision is also high (0.82–0.87) except for TTG zircon (0.79). Moreover, the three trained ML models are characterized by high AUC values (0.967 for SVM, 0.965 for RF, and 0.968 for MLP). All the above results confirm that the trained models perform very well in predicting zircon types. To investigate how the models had learned input-output relationships, we used an explainable artificial intelligence approach (SHAP48). As described in detail in the Methods section, a SHAP value is calculated for each feature of each zircon type during the training process. The amplitude of the SHAP value reflects how important a feature is for a certain zircon type, while the sign of the SHAP value reflects whether the feature has a positive or negative contribution to the zircon type, in other words why it is important.
By comparing with the statistical analysis pattern (Supplementary Fig. 2), the SHAP summary plots indicate that the relationship between input and output was captured plausibly (Supplementary Figs. 4–6). For example, Th/U was captured as the most important feature in distinguishing I-type zircons for all three models (Supplementary Figs. 4–6); for I-type zircons high Th/U inputs (red) produce high SHAP values and therefore have a strong positive influence on the model output, whereas for S-type and TTG zircons, low Th/U inputs (blue) produce high SHAP values and therefore have a strong positive influence on the model output. This corresponds with our basic understanding derived from the statistical analysis result, where I-type zircons are visually characterized by noticeably higher Th/U than other types of zircons (Supplementary Fig. 2). The other four most important features in distinguishing I-type zircons are Tb, Eu, Yb and Lu for the SVM model (Supplementary Fig. 4); Eu/Eu*, Th, Ce/Ce* and Ce for the RF model (Supplementary Fig. 5); and Tb, Lu, Eu/Eu* and Th for the MLP model (Supplementary Fig. 6). Ce and Eu, as well as the derived ratios (Ce/Ce* and Eu/Eu*) are important in distinguishing S-type zircons (Supplementary Figs. 4–6). Th and Th/U are of importance in distinguishing TTG zircons in the RF and MLP models (Supplementary Figs. 5, 6), whereas Dy and Tm are the most important features in the SVM model (Supplementary Fig. 4). Again, these all correspond well with what has been seen from the statistical analysis result (Supplementary Fig. 2), despite the slight distinctions among models in the relative importance of different features. While we now know the input-output relationships in each trained model, their geological significance, for example, why most I-type zircons are characterized by much higher Th/U than other zircon types49, is still unclear and further research is merited.
Plausibility checks by two case studies
It has been suggested that a model’s stated performance may not accurately reflect its performance post-deployment because of, for example, overfitting50 and black-box effects of the used ML methods51. Thus, before applying these trained models to provenance studies of the JH zircons, we first evaluated them using the 150–50 Ma detrital zircons from the Gangdese magmatic belt in southern Tibet and 3600–2700 Ma detrital zircons from the Western Dharwar Craton, southern India52. None of the Phanerozoic Gangdese detrital zircon grains belongs to the TTG zircon population and previous studies have demonstrated that the 150–50 Ma batholith in the Gangdese magmatic belt (and thus detrital zircons in this area with the same age span) is predominantly I-type53. In contrast, the Western Dharwar Craton detrital zircons should dominantly be of TTG origin54. Thus, these detrital zircons provide two ideal examples to test the plausibility of these ML models in distinguishing the provenance of real-world detrital zircons. The provenance results predicted by the three models are shown in Supplementary Fig. 7. It can be seen that the three ML methods give very similar results for each case. Most of the detrital grains (616 of 733 analyses in SVM, 594 in RF, and 586 in MLP) from the Gangdese magmatic belt are classified into the population from I-type rocks with only a few (2–7%) wrongly classified into the TTG population (Supplementary Fig. 7a), whereas TTGs are identified as the dominant source rocks of the Western Dharwar Craton detrital zircons (53 of 65 analyses in SVM, 46 in RF and 50 in MLP; Supplementary Fig. 7b). These model results are consistent with our basic understanding of local geology.
Overall, the metrics derived from the three test sets in this study (one test set held out during training and the two external test sets above) consistently affirm the robustness of the three trained classifiers. Despite this, in the Gangdese case (Supplementary Fig. 7a), the trained RF model returns a greater proportion of zircons incorrectly classified as TTG (7%) than the SVM and MLP models (both 2%), indicating the relatively low performance of the RF model in distinguishing TTG zircons. This is also reflected in the confusion matrix, where the accuracy of the trained RF model in identifying TTG zircons (0.79) is lower than that of the other two models (both 0.88; see Supplementary Fig. 3 and Supplementary Table 3). Considering that correctly distinguishing the detrital zircons of TTG origin is of particular importance for this study, only the trained SVM and MLP models are used for the provenance studies of the JH zircons.
The provenance of the Jack Hills zircons
We compiled a high-quality JH detrital zircon database comprising 666 published trace element analyses52. The classification results based on the trained SVM and MLP models are shown in Fig. 3. The two ML models give a very consistent zircon-type distribution pattern with time, further indicating the high reliability of the results. According to the SVM classifier, 36% of the 666 compiled JH grains are from I-type rocks, 33% from S-type rocks, and 31% from TTG rocks (Fig. 3a). The MLP classifier gives a very similar result, with 32% of grains classified into the TTG population (Fig. 3b). Figure 3c further shows that the JH zircons derived from I- and S-type source rocks dominate over those from TTGs except in the Hadean. During 4.2–4.0 Ga, the proportion of JH zircons derived from TTGs is broadly consistent with, and only locally higher than, that from I- and S-type rocks. The same pattern is also observed during 4.4–4.2 Ga, although only 12 grains in the 666 JH data (accounting for less than 2%) give ages older than 4.2 Ga, and thus the zircon sources for this timeframe are less constrained. The above source rock pattern for the JH zircons may be flawed to some degree by a preservation bias inherent in using detrital zircons. However, in the absence of a natural selection mechanism that preferentially excludes zircons formed from TTG magmas, TTGs are unlikely to have contributed noticeably more to the Hadean JH zircon population than is recorded here. Overall, the JH continental crust pattern is different from typical Archean continental crust, where TTGs account for an overwhelming proportion (>80%) of felsic rocks55.
Implications for the early Earth
Our study shows that the JH continental crust—which probably represents Earth’s earliest continental crust—was not predominantly composed of TTGs. On the contrary, it encompasses a high proportion of I- and S-type rocks (in the Hadean and especially in the Archean) that are commonly richer in K2O. The variety of granitic sources for the JH zircons is noticeably different from the typical Archean continental crust. The record derived from the surviving Archean crust suggests that TTGs should constitute more than 80% of the felsic portion55, while potassic granitoids (including typical I- and S-type rocks) appear later in Earth’s history, locally after 3.2 Ga and globally by the end of the Archean56,57,58,59.
Notably, comprehensive trace element analyses have not been conducted for Hadean zircons from terrestrial localities beyond the JH region. This hinders to some degree a robust comparison of JH zircons with other Hadean populations, which, in turn, makes it unclear how representative the JH crust is of the Hadean world60. Nonetheless, we can obtain some clues from the 4.02 Ga Idiwhaa tonalitic gneiss (ITG) within the Acasta Gneiss Complex in Canada. This rock is the oldest well-preserved terrestrial rock unit61. The whole-rock trace-element systematics of the ITG are markedly different from those of average Archean TTGs. TTGs are generally characterized by noticeably depleted HREE and indistinct Eu anomalies (due to the involvement of garnet in melting or magma fractionation)62; in contrast, the REE pattern of the ITG shows little fractionation of LREEs from HREEs and pronounced negative Eu anomalies (probably due to noticeable plagioclase fractionation)62. Thus, these lines of evidence, combined with those from JH zircons, collectively support the notion that Hadean continental crust was composed of a more diverse suite of granitoids than just the TTGs that predominate in typical Archean crust.
What, then, can be said about the early Earth? The diverse assemblage of I- and S-type granitoids and TTGs in Earth’s earliest continental crust must in part reflect the tectonic setting. The geochemical diversity of Archean TTGs has generally been ascribed to two classes of geodynamic model: subduction models and plateau-like models. The subduction models involve plate tectonics and the dominance of horizontal forces, suggesting that TTGs were produced by partial melting of a subducting slab63,64,65,66. The plateau-like models instead suggest the formation of TTGs near the base of thick, plateau-like basaltic crust in non-plate tectonic regimes37,67,68,69. Many numerical modeling studies have suggested that the hotter mantle conditions in the Hadean Earth than in the present-day Earth (as indicated by geological and geochemical data70,71) may not allow continuous subduction and thus have supported the formation of TTGs under plateau-like vertical tectonic regimes72. The ITG, though compositionally different from TTGs, has been proposed to be nearly identical in composition to some intermediate rocks from Iceland, a modern-day plateau setting formed via a mantle plume62. This seemingly supports the proposal of vertical tectonic settings for the early Earth.
However, for occurrences where a plateau-like setting (whether or not formed via vertical tectonics) is proposed, few I- and S-type granites have been reported22,26. In contrast, in modern-day environments, I- and S-type granites mainly occur in convergent plate margin settings17. This is consistent with recent geochemical modeling and B and Ca isotope studies19,30,73,74, in which a modern continental arc-like setting is proposed for the Hadean Earth. It is noted that modern continental arcs are generally characterized by an overwhelming majority of I- over S-type rocks, whereas the proportion of S-type source rocks inferred from JH zircons is only slightly lower than that of I-type source rocks (Fig. 3). The relatively higher proportion of peraluminous zircons in the JH region than in modern arcs, if not resulting from a preservation bias, could be explained by the moderate- to high-pressure fractionation of hydrous mafic material (although the most volumetrically important way to produce strongly peraluminous JH melts might still be through melting of weathered sediments), which could produce alumina-rich melts and thus a higher proportion of peraluminous zircons than is found in typical modern arcs8. Lateral motion of the Hadean and early Archean lithosphere and its recycling into the mantle do not necessarily suggest ‘modern-style’ plate tectonics58,75,76, but are consistent with a more general mobile lid environment, which may take many forms59,77,78.
Overall, our work shows that the dominant components of Earth’s earliest continental crust are consistent with some form of lateral motion of the lithosphere and are difficult to produce with models based purely on (plume-driven) vertical tectonics. If the earliest TTGs inferred from the JH zircons were indeed formed by plateau-like models as argued for their Archean counterparts37,67,68,69, then the most plausible scenario for the early Earth is that, as in modern Earth, the Hadean continental crust was organized into two different tectonic regimes63. The major difference is probably that in the Hadean Earth, vertical tectonics dominated over a modern plate-tectonic-like regime79. This is consistent with many arguments that the plate tectonic regime on Earth was unlikely to have commenced synchronously, but rather began locally and progressively became more widespread58,59,77,80,81. Additional work is needed to illustrate how these different tectonic styles can be reconciled with each other in the early Earth.
Methods
Data compilation and feature selection
Zircon compositions from I-type, S-type and TTG rocks worldwide, as well as detrital zircon compositions from the Gangdese magmatic belt, the Western Dharwar Craton, and the JH region, were compiled from over 140 references (ca. 14,500 analyses) available to the authors at the time of writing52. Previous studies have shown that adakitic rocks, which are mainly characterized by high whole-rock Sr/Y and La/Yb ratios82, represent a special end member of I-type rocks that share many similar geochemical features with Archean TTGs67. To equip the ML models with the ability to distinguish this special I-type end member from TTG zircons, 4600 zircon analyses from adakitic rocks were compiled into the I-type dataset. Eleven REEs (Ce, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, and Lu), Th, U, Th/U, U/Yb, Ce/Ce* and Eu/Eu* were used as features for all ML algorithms. Due to radioactive decay, the measured Th and U concentrations are lower than those at the time of crystallization. This is especially true for the Archean TTG zircons and the JH zircons. Thus, both Th and U (and hence the derived Th/U and U/Yb) were corrected back to the time of crystallization. We did not include three REEs (La, Pr and Nd) in our ML models. La and Pr are present at very low levels in natural zircons and are generally close to or below the instrumental detection limits. Thus, they are missing from many zircon analyses, and even where they are reported, their contents may not be reliable. Although the Nd concentrations of magmatic zircons are generally far above the detection limit, our statistical analysis shows that the differences in Nd (e.g., the medians and interquartile ranges) among the three types of zircons are not as marked as those observed for other REEs (Supplementary Fig. 2). Some elements, such as Al, P, Sc, Hf and Y, which may also be useful in identifying the origin of zircon, were not used in this study. This is because many geochemical analyses of zircons did not report these elements, and thus excluding them allows us to use more published data.
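The decay correction for Th and U can be sketched with the standard radioactive decay law, N0 = N exp(λt). The decay constants (Steiger and Jäger, 1977) and the present-day 238U/235U ratio of 137.88 used below are assumptions, since the text does not state which values were adopted. The function name is hypothetical, and the small mass difference between the two U isotopes is ignored.

```python
import math

# Decay constants in 1/yr and the present-day 238U/235U atomic ratio.
# Assumption: these commonly used values are not stated in the text.
LAMBDA_TH232 = 4.9475e-11
LAMBDA_U238 = 1.55125e-10
LAMBDA_U235 = 9.8485e-10
U238_U235_TODAY = 137.88


def initial_th_u(th_ppm, u_ppm, age_ma):
    """Correct measured Th and U (ppm) back to the crystallization age (Ma)."""
    t = age_ma * 1.0e6  # age in years
    th0 = th_ppm * math.exp(LAMBDA_TH232 * t)
    # Split present-day U into its two long-lived isotopes (isotope mass
    # difference ignored), then restore each one separately.
    u238 = u_ppm * U238_U235_TODAY / (U238_U235_TODAY + 1.0)
    u235 = u_ppm - u238
    u0 = u238 * math.exp(LAMBDA_U238 * t) + u235 * math.exp(LAMBDA_U235 * t)
    return th0, u0


# Hypothetical 4.0 Ga grain, for illustration only
th0, u0 = initial_th_u(th_ppm=100.0, u_ppm=200.0, age_ma=4000.0)
th_u_measured = 100.0 / 200.0
th_u_initial = th0 / u0
```

Because U decays faster than Th, the corrected Th/U of an old grain is lower than the measured value, which is why the correction matters most for the Archean TTG and JH zircons.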
Treatment of missing values and data filtering
As described above, the ML modeling in this study requires at least 13 trace elements (11 REEs plus Th and U). However, not all of these 13 elements were determined in every analysis, causing gaps in the compiled database. This is inevitable because the data were compiled from different studies that used different analytical procedures, not all of which were capable of determining the full range of elements. For simplicity, we excluded the analyses that contained missing values for Th and U. In contrast, analyses with partially missing REE data were not excluded, since the missing values can be readily estimated from other REE concentrations using the method of Zhong et al.38. The use of this composite dataset, which comprises data from different sources, also requires quality control to handle outliers. Statistical outliers can be easily identified (e.g., using the standard deviation), but this relies on the data being normally distributed. However, geochemical data rarely exhibit normal or log-normal distributions83, indicating that statistical outliers may arise as a natural product of diverse geological processes84. Thus, in this study, we did not exclude statistical outliers, because discarding them may ultimately bias the models. Instead, our focus is on outliers resulting from analytical or human errors. Many studies have shown that zircon compositions (especially LREEs) are highly susceptible to contamination by accessory mineral inclusions85,86. The common accessory mineral inclusions include apatite, titanite, monazite, allanite and xenotime3,85, which are characterized by noticeably higher La contents than the host zircon85. To exclude such artifacts, we followed previous studies and used a selection criterion of La < 1 ppm87. After this filtering, the number of individual analyses decreased to 9050. A further 2193 analyses with noticeably discordant ages (discordance of more than 20%), which are in general related to alteration and/or metamorphism, were also discarded. For the JH zircons, grains with 207Pb/206Pb age < 3300 Ma were further discarded since they might have experienced noticeable Pb loss. After filtering, 3168 of 8500 zircon grains from I-type rocks, 2056 of 5350 from S-type rocks, 808 of 2305 from TTGs, 733 of 1494 from the Gangdese magmatic belt, 66 of 192 from the Western Dharwar Craton, and 666 of 905 from the JH region were retained. The statistical features of the compiled zircon data from the three source rocks are shown in Supplementary Fig. 2. As already mentioned, the three distinct zircon populations can be distinguished to a certain degree by each element and/or ratio. The datasets for I- and S-type rocks and TTGs were then randomly subdivided into training (80%) and test (20%) sets, each preserving the proportion of high and low values for a given element of the full dataset.
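The screening criteria above can be expressed as a small filter. This is a sketch only: the field names (`discordance_pct`, `age_207_206_ma`) and the function are hypothetical, and rejecting analyses that do not report La at all is an assumption, because the text gives only the La < 1 ppm criterion.

```python
def passes_filters(analysis, is_jack_hills=False):
    """Apply the screening criteria described in the text to one analysis."""
    # Th and U are both required features
    if analysis.get("Th") is None or analysis.get("U") is None:
        return False
    # La < 1 ppm screens out contamination by accessory mineral inclusions.
    # Assumption: an analysis without a reported La value is rejected.
    la = analysis.get("La")
    if la is None or la >= 1.0:
        return False
    # Discard ages that are more than 20% discordant
    if abs(analysis["discordance_pct"]) > 20.0:
        return False
    # Jack Hills grains only: drop 207Pb/206Pb ages below 3300 Ma
    if is_jack_hills and analysis["age_207_206_ma"] < 3300.0:
        return False
    return True


# Hypothetical records, for illustration only
analyses = [
    {"Th": 80.0, "U": 150.0, "La": 0.05, "discordance_pct": 3.0, "age_207_206_ma": 4100.0},
    {"Th": None, "U": 150.0, "La": 0.05, "discordance_pct": 3.0, "age_207_206_ma": 4100.0},
    {"Th": 80.0, "U": 150.0, "La": 2.40, "discordance_pct": 3.0, "age_207_206_ma": 4100.0},
    {"Th": 80.0, "U": 150.0, "La": 0.05, "discordance_pct": 35.0, "age_207_206_ma": 4100.0},
    {"Th": 80.0, "U": 150.0, "La": 0.05, "discordance_pct": 3.0, "age_207_206_ma": 3100.0},
]
kept = [a for a in analyses if passes_filters(a, is_jack_hills=True)]
```

Only the first record survives all four checks. The last record is rejected only when the Jack Hills age cut-off is switched on.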
Treatment of class imbalance problem
In this study, the compiled zircon analyses from different source rocks are imbalanced: the proportion of zircons from TTGs (13%) is noticeably lower than that from S-type rocks (34%), and both are in turn lower than that from I-type rocks (53%). Such class imbalance is a common problem in ML. Previous studies have demonstrated that in such a situation most classifiers may be biased towards the major classes and thus show poor classification rates for minor classes88. The common technique to solve this problem is oversampling the minority class or undersampling the majority class to produce a relatively class-balanced database. In this study, undersampling was used because our preliminary work showed that it performed better than oversampling according to the performance metrics. Specifically, we used the Tomek Links undersampling technique89. The advantage of Tomek Links is that the method does not aim to reach an absolute balance between different classes; rather, it focuses on removing the boundary values and the noise from the dataset and does not alter the rest of the dataset89. Thus, there is less chance of losing important information, which has been argued to be a common problem for undersampling90.
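The idea behind Tomek Links can be sketched from scratch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. Removing the non-minority member of each link cleans the class boundary without forcing the classes to equal size. The text does not say which implementation was used, so the function names, the Euclidean distance, and the toy data below are assumptions.

```python
def nearest_neighbour(i, points):
    """Index of the point closest to points[i] (squared Euclidean distance)."""
    best, best_d = None, float("inf")
    for j, p in enumerate(points):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(points[i], p))
        if d < best_d:
            best, best_d = j, d
    return best


def tomek_undersample(points, labels, minority_label):
    """Drop non-minority samples that belong to a Tomek link."""
    nn = [nearest_neighbour(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: mutual nearest neighbours with different class labels
        if nn[j] == i and labels[i] != labels[j] and labels[i] != minority_label:
            drop.add(i)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy two-feature data: one I-type point sits right next to a TTG point
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (2.0, 0.0), (2.1, 0.0)]
labels = ["I", "I", "I", "TTG", "TTG", "TTG"]
kept_points, kept_labels = tomek_undersample(points, labels, minority_label="TTG")
```

Here only the boundary I-type point at (1.0, 0.0) is removed. All TTG samples and the remaining I-type samples are left untouched.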
ML model training
In this study, three supervised ML methods, namely Support Vector Machine (SVM), Random Forest (RF), and Multilayer Perceptron (MLP), were used to determine statistical relationships between zircon trace element concentrations and their source rocks. A large number of studies have illustrated that these methods are robust in solving problems in geoscience43. All three algorithms were implemented in Python 3.7.0 using scikit-learn 0.23.2: sklearn.svm.SVC for SVM, sklearn.ensemble.RandomForestClassifier for RF, and sklearn.neural_network.MLPClassifier for MLP. To achieve the best performance, for each model we used a grid search with tenfold cross-validation to find the optimal hyper-parameters. Supplementary Table 1 lists the values of the main hyper-parameters used in this study for each model. For parameters that are not listed in Supplementary Table 1, default values were used.
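A grid search with tenfold cross-validation for the SVM might look like the sketch below. It runs on synthetic data standing in for the 17-feature, three-class zircon table. The parameter grid is illustrative and does not reproduce Supplementary Table 1, and the feature scaling step is an assumption that the text does not mention.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 17-feature, 3-class zircon dataset
X, y = make_classification(n_samples=600, n_features=17, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling inside the pipeline is refitted on each training fold, so the
# held-out fold never leaks into preprocessing (scaling is an assumption).
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [1, 10], "svc__gamma": ["scale", 0.01]}  # illustrative

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy of the best model, refitted on all training data, on the 20% test set
test_accuracy = search.score(X_test, y_test)
```

The same pattern applies to `RandomForestClassifier` and `MLPClassifier` by swapping the estimator and the grid keys.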
Model evaluation and interpretability
For each algorithm, the model obtained from the training dataset was then applied to the test dataset, and its performance was evaluated by various metrics, including the confusion matrix, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC)91. A confusion matrix presents the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), which can be used to calculate the overall accuracy and the precision for each type of zircon based on the following equations (Eqs. 1–2):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

The AUC provides a single measure of the overall model accuracy that is threshold independent. An AUC value of 0.5 indicates that the prediction is no better than random, whereas 1 indicates perfect prediction.
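The confusion-matrix metrics can be illustrated with a short pure-Python sketch. Overall accuracy is computed here as the share of analyses on the diagonal of the matrix, which is the usual multi-class reading. Precision for a class is its diagonal entry divided by its column total, i.e. TP / (TP + FP). The labels are made up and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all analyses that fall on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def precision(matrix, k):
    """TP / (TP + FP) for class k: diagonal entry over its column total."""
    predicted_k = sum(row[k] for row in matrix)
    return matrix[k][k] / predicted_k if predicted_k else 0.0


# Hypothetical labels, for illustration only
classes = ["I", "S", "TTG"]
y_true = ["I", "I", "I", "S", "S", "TTG", "TTG", "TTG"]
y_pred = ["I", "I", "S", "S", "S", "TTG", "I", "TTG"]

cm = confusion_matrix(y_true, y_pred, classes)
acc = overall_accuracy(cm)
prec = {c: precision(cm, k) for k, c in enumerate(classes)}
```

In this toy case one TTG grain is predicted as I-type, which lowers the I-type precision without affecting the TTG precision.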
Because ML models compute the importance of feature values internally, their results are often difficult to interpret without knowledge of the process that links input to output; in this sense the models behave like black boxes. To overcome this limitation, we applied SHapley Additive exPlanations (SHAP48) to estimate the importance of the studied features and to interpret and analyze the results (see Supplementary Figs. 4–6). Grounded in cooperative game theory, SHAP provides a reliable and consistent ranking of the unique, additive importance of each feature, and it also allows interactions between features in a model to be examined. A positive SHAP value indicates that a feature contributes positively to the prediction of the zircon type of interest, whereas a negative value indicates a negative contribution.
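The game-theoretic quantity that SHAP approximates can be illustrated by computing exact Shapley values for a tiny model. The sketch below is not the SHAP library and not the authors' workflow: the scoring function, the instance, and the all-zero baseline are hypothetical, and "missing" features are simply replaced by baseline values, which is a simplification of how SHAP treats absent features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for the prediction model(x)."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their actual value; the rest use the baseline.
        return model([x[k] if k in subset else baseline[k] for k in range(n)])

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical class score with two linear terms and one interaction."""
    return 2.0 * z[0] - 1.0 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]          # hypothetical instance
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point

phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the prediction and the baseline prediction (the additivity property), the second feature receives a negative value because it lowers the score, and the interaction term is shared equally between the first and third features.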
Data availability
Zircon data used in this study can be downloaded from https://doi.org/10.5061/dryad.zpc866tcm.
Code availability
To make the technique accessible and reproducible for other studies, the code necessary to reproduce the machine learning classifiers is available at https://github.com/ShihuaZhong/Machine-learning-zircon-classifiers.
References
Wilde, S. A., Valley, J. W., Peck, W. H. & Graham, C. M. Evidence from detrital zircons for the existence of continental crust and oceans on the Earth 4.4 Gyr ago. Nature 409, 175–178 (2001).
Harrison, T. M. The Hadean crust: evidence from >4 Ga zircons. Annu. Rev. Earth Planet. Sci. 37, 479–505 (2009).
Cavosie, A. J., Wilde, S. A., Liu, D., Weiblen, P. W. & Valley, J. W. Internal zoning and U–Th–Pb chemistry of Jack Hills detrital zircons: a mineral record of early Archean to Mesoproterozoic (4348–1576 Ma) magmatism. Precambrian Res. 135, 251–279 (2004).
Crowley, J. L., Myers, J. S., Sylvester, P. J. & Cox, R. A. Detrital zircon from the Jack Hills and Mount Narryer, Western Australia: evidence for diverse >4.0 Ga source rocks. J. Geol. 113 (2005).
Dunn, S. J., Nemchin, A. A., Cawood, P. A. & Pidgeon, R. T. Provenance record of the Jack Hills metasedimentary belt: Source of the Earth’s oldest zircons. Precambrian Res. 138, 235–254 (2005).
Borisova, A. Y. et al. Hadean zircon formed due to hydrated ultramafic protocrust melting. Geology 50, 300–304 (2021).
Trail, D. et al. Constraints on Hadean zircon protoliths from oxygen isotopes, Ti-thermometry, and rare earth elements. Geochem. Geophys. Geosyst. 8 (2007).
Ackerson, M. R., Trail, D. & Buettner, J. Emergence of peraluminous crustal magmas and implications for the early Earth. Geochem. Perspectives Lett. 17, 50–54 (2021).
Cawood, P. A., Hawkesworth, C. J. & Dhuime, B. The continental record and the generation of continental crust. Geol. Soc. Am. Bull. 125, 14–32 (2013).
Trail, D., Watson, E. B. & Tailby, N. D. The oxidation state of Hadean magmas and implications for early Earth’s atmosphere. Nature 480, 79–82 (2011).
Coogan, L. A. & Hinton, R. W. Do the trace element compositions of detrital zircons require Hadean continental crust? Geology 34, 633–636 (2006).
Kemp, A. I. S. et al. Hadean crustal evolution revisited: new constraints from Pb–Hf isotope systematics of the Jack Hills zircons. Earth Planet. Sci. Lett. 296, 45–56 (2010).
Marchi, S. et al. Widespread mixing and burial of Earth’s Hadean crust by asteroid impacts. Nature 511, 578–582 (2014).
Kenny, G. G., Whitehouse, M. J. & Kamber, B. S. Differentiated impact melt sheets may be a potential source of Hadean detrital zircon. Geology 44, 435–438 (2016).
Bell, E. A., Boehnke, P., Hopkins-Wielicki, M. D. & Harrison, T. M. Distinguishing primary and secondary inclusion assemblages in Jack Hills zircons. Lithos 234-235, 15–26 (2015).
Burnham, A. D. & Berry, A. J. Formation of Hadean granites by melting of igneous crust. Nat. Geosci. 10, 457–461 (2017).
Hopkins, M., Harrison, T. M. & Manning, C. E. Low heat flow inferred from >4 Gyr zircons suggests Hadean plate boundary interactions. Nature 456, 493–496 (2008).
Mojzsis, S. J., Harrison, T. M. & Pidgeon, R. T. Oxygen-isotope evidence from ancient zircons for liquid water at the Earth’s surface 4,300 Myr ago. Nature 409, 178–181 (2001).
Turner, S., Wilde, S., Wörner, G., Schaefer, B. & Lai, Y. J. An andesitic source for Jack Hills zircon supports onset of plate tectonics in the Hadean. Nat. Commun. 11, 1241 (2020).
Bell, E. A., Boehnke, P., Harrison, T. M. & Wielicki, M. M. Mineral inclusion assemblage and detrital zircon provenance. Chem. Geol. 477, 151–160 (2018).
Darling, J., Storey, C. & Hawkesworth, C. Impact melt sheet zircons and their implications for the Hadean crust. Geology 37, 927–930 (2009).
Reimink, J. R., Davies, J. H. F. L., Bauer, A. M. & Chacko, T. A comparison between zircons from the Acasta Gneiss Complex and the Jack Hills region. Earth Planet. Sci. Lett. 531, 115975 (2020).
Blichert-Toft, J. & Albarède, F. Hafnium isotopes in Jack Hills zircons and the formation of the Hadean crust. Earth Planet. Sci. Lett. 265, 686–702 (2008).
Bouvier, A.-S. et al. Li isotopes and trace elements as a petrogenetic tracer in zircon: insights from Archean TTGs and sanukitoids. Contrib. Mineral. Petrol. 163, 745–768 (2011).
Nutman, A. P. Comment on “Zircon thermometer reveals minimum melting conditions on earliest Earth” II. Science 311, 779 (2006); author reply 779.
Moyen, J.-F. & Martin, H. Forty years of TTG research. Lithos 148, 312–336 (2012).
Martin, H., Smithies, R. H., Rapp, R., Moyen, J. F. & Champion, D. An overview of adakite, tonalite–trondhjemite–granodiorite (TTG), and sanukitoid: relationships and some implications for crustal evolution. Lithos 79, 1–24 (2005).
Wang, Q. & Wilde, S. A. New constraints on the Hadean to Proterozoic history of the Jack Hills belt, Western Australia. Gondwana Res. 55, 74–91 (2018).
Laurent, O., Moyen, J.-F., Wotzlaw, J.-F., Björnsen, J. & Bachmann, O. Early Earth zircons formed in residual granitic melts produced by tonalite differentiation. Geology 50, 437–441 (2022).
Carley, T. L. et al. Zircon-modeled melts shed light on the formation of Earth’s crust from the Hadean to the Archean. Geology 50, 1028–1032 (2022).
Harrison, T. M. & Wielicki, M. M. From the Hadean to the Himalaya: 4.4 Ga of felsic terrestrial magmatism. Am. Mineralogist 101, 1348–1359 (2016).
Watson, E. B. & Harrison, T. M. Zircon thermometer reveals minimum melting conditions on earliest Earth. Science 308, 841–844 (2005).
Harrison, T. M., Watson, E. B. & Aikman, A. B. Temperature spectra of zircon crystallization in plutonic rocks. Geology 35, 635–638 (2007).
Rasmussen, B., Fletcher, I. R., Muhling, J. R., Gregory, C. J. & Wilde, S. A. Metamorphic replacement of mineral inclusions in detrital zircon from Jack Hills, Australia: Implications for the Hadean Earth. Geology 39, 1143–1146 (2011).
Trail, D., Tailby, N., Wang, Y., Mark Harrison, T. & Boehnke, P. Aluminum in zircon as evidence for peraluminous and metaluminous melts from the Hadean to present. Geochem. Geophys. Geosyst. 18, 1580–1593 (2017).
Chappell, B. & White, A. I- and S-type granites in the Lachlan Fold Belt. Trans. Roy. Soc. Edinb.: Earth Sci. 83, 1–26 (1992).
Smithies, R. H. et al. No evidence for high-pressure melting of Earth’s crust in the Archean. Nat. Commun. 10, 5559 (2019).
Zhong, S., Seltmann, R., Qu, H. & Song, Y. Characterization of the zircon Ce anomaly for estimation of oxidation state of magmas: a revised Ce/Ce* method. Mineral. Petrol. 113, 755–763 (2019).
Grimes, C., Wooden, J., Cheadle, M. & John, B. “Fingerprinting” tectono-magmatic provenance using trace elements in igneous zircon. Contrib. Mineral. Petrol. 170, 46 (2015).
Claiborne, L. L. et al. in Microstructural Geochronology: Planetary Records Down to Atom Scale (eds Moser, D. E. et al.) 1–33 (John Wiley & Sons, 2018).
Grimes, C. B. et al. Trace element chemistry of zircons from oceanic crust: a method for distinguishing detrital zircon provenance. Geology 35, 643–646 (2007).
Zhong, S., Li, S., Seltmann, R., Lai, Z. & Zhou, J. The influence of fractionation of REE-enriched minerals on the zircon partition coefficients. Geosci. Front. 12, 101094 (2021).
Bergen, K. J., Johnson, P. A., Hoop, M. V. D. & Beroza, G. C. Machine learning for data-driven discovery in solid Earth geoscience. Science 363, eaau0323 (2019).
Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science 349, 255–260 (2015).
Reichstein, M. et al. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204 (2019).
Belousova, E. A., Griffin, W., O’Reilly, S. Y. & Fisher, N. Igneous zircon: trace element composition as an indicator of source rock type. Contrib. Mineral. Petrol. 143, 602–622 (2002).
Hoskin, P. W. O. & Ireland, T. R. Rare earth element chemistry of zircon and its use as a provenance indicator. Geology 28, 627–630 (2000).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inform. Process. Syst. 30 (2017).
Kirkland, C. L., Smithies, R. H., Taylor, R. J. M., Evans, N. & McDonald, B. Zircon Th/U ratios in magmatic environs. Lithos 212-215, 397–414 (2015).
Reunanen, J. Overfitting in making comparisons between variable selection methods. J. Mach. Learn. Res. 3, 1371–1382 (2003).
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).
Zhong, S. Compiled trace element compositions for magmatic zircons from I-type granitoids, S-type granitoids and TTGs, and detrital zircons, Dryad, Dataset, https://doi.org/10.5061/dryad.zpc866tcm (2023).
Ji, W., Wu, F., Liu, C. & Chung, S. Geochronology and petrogenesis of granitic rocks in Gangdese batholith, southern Tibet. Sci. China Ser. D: Earth Sci. 52, 1240–1261 (2009).
Ranjan, S., Upadhyay, D. & Srikantappa, C. Eoarchean to Neoarchean crustal evolution of the Western Dharwar Craton, southern India: Clues from U-Pb-Hf isotope composition of detrital zircon. Precambrian Res. 371 (2022).
Polat, A. Growth of Archean continental crust in oceanic island arcs. Geology 40, 383 (2012).
Sizova, E., Gerya, T., Stüwe, K. & Brown, M. Generation of felsic crust in the Archean: A geodynamic modeling perspective. Precambrian Res. 271, 198–224 (2015).
Keller, C. B. & Schoene, B. Statistical geochemistry reveals disruption in secular lithospheric evolution about 2.5 Gyr ago. Nature 485, 490–493 (2012).
Cawood, P. A. et al. Geological archive of the onset of plate tectonics. Philos. Trans. A: Math Phys. Eng. Sci. 376 (2018).
Cawood, P. A. et al. Secular evolution of continents and the Earth system. Rev. Geophys. 60, e2022RG000789 (2022).
Cawood, P. A. & Hawkesworth, C. J. Continental crustal volume, thickness and area, and their geodynamic implications. Gondwana Res. 66, 116–125 (2019).
Reimink, J. R. et al. No evidence for Hadean continental crust within Earth’s oldest evolved rock unit. Nat. Geosci. 9, 777–780 (2016).
Reimink, J. R., Chacko, T., Stern, R. A. & Heaman, L. M. Earth’s earliest evolved crust generated in an Iceland-like setting. Nat. Geosci. 7, 529–533 (2014).
Adam, J., Rushmer, T., O’Neil, J. & Francis, D. Hadean greenstones from the Nuvvuagittuq fold belt and the origin of the Earth’s early continental crust. Geology 40, 363–366 (2012).
Nagel, T. J., Hoffmann, J. E. & Münker, C. Generation of Eoarchean tonalite-trondhjemite-granodiorite series from thickened mafic arc crust. Geology 40, 375–378 (2012).
Rapp, R. P., Shimizu, N. & Norman, M. D. Growth of early continental crust by partial melting of eclogite. Nature 425, 605–609 (2003).
Foley, S., Tiepolo, M. & Vannucci, R. Growth of early continental crust controlled by melting of amphibolite in subduction zones. Nature 417, 837–840 (2002).
Smithies, R. H. The Archaean tonalite–trondhjemite–granodiorite (TTG) series is not an analogue of Cenozoic adakite. Earth Planet. Sci. Lett. 182, 115–125 (2000).
Chowdhury, P. et al. Magmatic thickening of crust in non–plate tectonic settings initiated the subaerial rise of Earth’s first continents 3.3 to 3.2 billion years ago. Proc. Natl Acad. Sci. 118, e2105746118 (2021).
Johnson, T. E., Brown, M., Gardiner, N. J., Kirkland, C. L. & Smithies, R. H. Earth’s first stable continents did not form by subduction. Nature 543, 239–242 (2017).
Herzberg, C. & Gazel, E. Petrological evidence for secular cooling in mantle plumes. Nature 458, 619–622 (2009).
Condie, K. C., Aster, R. C. & van Hunen, J. A great thermal divergence in the mantle beginning 2.5 Ga: Geochemical constraints from greenstone basalts and komatiites. Geosci. Front. 7, 543–553 (2016).
Rozel, A. B., Golabek, G. J., Jain, C., Tackley, P. J. & Gerya, T. Continental crust formation on early Earth controlled by intrusive magmatism. Nature 545, 332–335 (2017).
Chowdhury, W., Trail, D. & Bell, E. Boron partitioning between zircon and melt: Insights into Hadean, modern arc, and pegmatitic settings. Chem. Geol. 551, 19763 (2020).
Antonelli, M. A. et al. Calcium isotope evidence for early Archaean carbonates and subduction of oceanic crust. Nat. Commun. 12, 2534 (2021).
Korenaga, J. Initiation and evolution of plate tectonics on Earth: theories and observations. Annu. Rev. Earth Planet. Sci. 41, 117–151 (2013).
Duarte, J. C. in Dynamics of Plate Tectonics and Mantle Convection (ed Duarte, J. C.) 1–600 (Elsevier, 2022).
Bauer, A. M. et al. Hafnium isotopes in zircons document the gradual onset of mobile-lid tectonics. Geochem. Perspectives Lett. 14, 1–6 (2020).
Capitanio, F. A., Nebel, O., Cawood, P. A., Weinberg, R. F. & Chowdhury, P. Reconciling thermal regimes and tectonics of the early Earth. Geology 47, 923–927 (2019).
Cawood, P. A. Metamorphic rocks and plate tectonics. Sci. Bull. 65, 968–969 (2020).
Hawkesworth, C., Cawood, P. A. & Dhuime, B. The evolution of the continental crust and the onset of plate tectonics. Front. Earth Sci. (Lausanne) 8 (2020).
Wan, B. et al. Seismological evidence for the earliest global subduction network at 2 Ga ago. Sci. Adv. 6, eabc5491 (2020).
Castillo, P. R. Adakite petrogenesis. Lithos 134-135, 304–316 (2012).
Reimann, C. & Filzmoser, P. Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for the statistical treatment of geochemical and environmental data. Environ. Geol. 39, 1001–1014 (2000).
Nathwani, C. L. et al. From long-lived batholith construction to giant porphyry copper deposit formation: petrological and zircon chemical evolution of the Quellaveco District, Southern Peru. Contrib. Mineral. Petrol. 176, 1–21 (2021).
Zhong, S., Feng, C., Seltmann, R., Li, D. & Qu, H. Can magmatic zircon be distinguished from hydrothermal zircon by trace element composition? The effect of mineral inclusions on zircon trace element composition. Lithos 314-315, 646–657 (2018).
Bindeman, I. N. et al. Field and microanalytical isotopic investigation of ultradepleted in 18O Paleoproterozoic “Slushball Earth” rocks from Karelia, Russia. Geosphere 10, 308–339 (2014).
Tang, M., Chu, X., Hao, J. & Shen, B. Orogenic quiescence in Earth’s middle age. Science 371, 728 (2021).
del Río, S., López, V., Benítez, J. M. & Herrera, F. On the use of MapReduce for imbalanced big data using Random Forest. Inform. Sci. 285, 112–137 (2014).
Kotsiantis, S., Kanellopoulos, D. & Pintelas, P. Handling imbalanced datasets: a review. GESTS Int. Trans. Comput. Sci. Eng. 30, 25–36 (2006).
Chawla, N. V. Data mining for imbalanced datasets: an overview. Data mining and knowledge discovery handbook, 875–886 (2009).
Narkhede, S. Understanding AUC-ROC curve. Towards Data Sci. 26, 220–227 (2018).
Sun, S.-S. & McDonough, W. F. Chemical and isotopic systematics of oceanic basalts: implications for mantle composition and processes. Geol. Soc. London Special Publ. 42, 313–345 (1989).
Acknowledgements
The authors thank the editor João Duarte for handling the manuscript. We also thank Dr. Hao Dong for help with Matlab codes. This study is financially supported by the Science and Technology Innovation Project of Laoshan Laboratory (LSKJ202204400), the Fundamental Research Funds for the Central Universities of China (202172002), the National Natural Science Foundation of China (42203066; 91958214), the Natural Science Foundation of Shandong Province (ZR2020QD027), the China Postdoctoral Science Foundation (2020T130621, 20180838), and the Australian Research Council (FL160100168).
Author information
Contributions
S.Z. and S.L. designed the study and prepared the manuscript with the help of Y.L., P.A.C., and R.S. All co-authors contributed to the interpretation presented in the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Earth & Environment thanks Christopher Lawley and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: João Duarte and Joe Aslin.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhong, S., Li, S., Liu, Y. et al. I-type and S-type granites in the Earth’s earliest continental crust. Commun Earth Environ 4, 61 (2023). https://doi.org/10.1038/s43247-023-00731-7