Introduction

Climate change associated with increasing atmospheric greenhouse gas concentrations affects many agricultural regions and threatens global crop production and food security1,2,3,4. In recent years, climate change has increased the risks of simultaneous failure of major crops for breadbaskets across the globe5 and projected climate change may push one-third of food production areas outside a “safe” climatic space6. Coincident with these factors, increasing population and food demand increase the difficulty of achieving the UN’s Sustainable Development Goal target of hunger eradication by 20307,8 and other health and environmental targets9.

Process-based biophysical crop models driven by climate data downscaled from multiple global climate models (GCMs) are often used to project crop yield changes and aid strategic decision-making for agriculture10,11,12. The Agricultural Model Intercomparison and Improvement Project (AgMIP)’s Global Gridded Crop Model Intercomparison (GGCMI) provides the essential data products needed to analyze global crop yield in response to climate change at global scales13,14,15. Projections based on the latest generation of GCMs, those contributing to the Coupled Model Intercomparison Project phase 6 (CMIP6)16, show climate change having a largely negative impact on crop yield in many breadbasket regions but also indicate some positive changes for individual crops, especially across higher latitudes3. However, there are large uncertainties in GCM–GGCM yield projections, both for the breadbasket regions2,17 and at the global gridded scale18,19. These include modeling uncertainty related to the structure and parameters of the GCMs and crop models20,21,22. Recent analyses of uncertainty due to GCM and GGCM at a global scale for different crops have been conducted3,23. These studies reported that GGCMs are often the dominant source of uncertainty in quantifying relative changes in crop yield, even though the GGCM-induced variance decreased in the latest CMIP6-based projections3 compared with an earlier GGCM ensemble-driven with CMIP5 climate projections18. Note that Müller et al.23 also found that the overall variance of a crop-model emulator ensemble increases when driven by CMIP6 in comparison with being driven by the previous CMIP5 generation of GCM climate projections, but the relative shares of GGCM-induced variance were similar between the CMIP5 and CMIP6 ensembles. Wang et al.22 found that the dominant source of uncertainty in wheat (Triticum aestivum L.) yield change was site-specific. A more comprehensive analysis of the sources of uncertainty at a gridded scale for four crops with a very large ensemble also showed that the exclusion of specific model combinations could lead to substantial differences in the representation of uncertainty23, raising the question of how much of the actual uncertainty can be represented by large ensembles.

Optimizing the sample size for GCM–GGCM projections depends on the level of uncertainty and the specific research question. However, the computational requirements of process-based crop models constrain the number of GCMs that can be used in harmonized crop model protocols, thereby influencing the uncertainty of yield projections24. The latest CMIP6-based GGCM Intercomparison Project (GGCMI) phase 3 projections provide the largest archive of process-based future crop yield projections as of today, with 60 combinations of 12 GGCMs and five GCMs3. The selection of five GCMs was a tradeoff between computational requirements and the representation of climate model uncertainty. Running GGCMs for additional GCMs would allow for more robust uncertainty attribution. However, the wide variation among the pool of models can obscure important information when relying solely on the ensemble mean. The effective use of large ensembles has received limited attention.

Several studies have established the minimum necessary ensemble size to effectively capture the uncertainty of models in different fields, with the aim of reducing redundancies and offering guidance for model selection21,24,25,26. For instance, Falconnier, et al.27 found that crop model ensemble skill increased with more maize (Zea mays L.) models considered in an ensemble in Sub-Saharan Africa. They reported that at least eight randomly selected models can ensure satisfactory accuracy in reproducing historical crop yield. However, this does not mean that this number of models is adequate for sampling uncertainty of future yield changes. Crop models that produce similar results for a historical time period can produce divergent results under future climate due to differences in model processes (e.g., temperature response, CO2 fertilization effects)22. McSweeney and Jones28 found that in order to cover the full spectrum of climate change projections for all regions, at least 13 GCMs needed to be considered. Ruane and McDermid29 found that using five GCMs with different basic classes (e.g., relatively cool/wet, cool/dry, middle, hot/wet, and hot/dry) could capture the uncertainty across the full ensemble of simulated future climate change at a study site. However, the guidance for creating representative subsets to accurately reflect the uncertainty of global scale crop yield projections is still not well defined. The uncertainty of these projections is highly sensitive to the selection of crop and climate models23, and there has not yet been a comprehensive study to quantify the impact of ensemble size (GGCM and GCM) selected through random or science-based methods on uncertainty in the agricultural sector.

Wheat, maize, rice (Oryza sativa L.), and soybean (Glycine max (L.) Merr.) provide two-thirds of the caloric intake of the human population30. In this study, we present a new framework for better designing ensembles and extracting more information from a large pool of models. The framework utilizes nine GGCM emulators driven by changes in growing-season climate from 32 CMIP6 GCMs31 and 12 GGCM phase 3 simulations driven by five GCMs3. Because our focus was on modeling uncertainties in the response of yields to climate change, we generated data for a single high greenhouse gas emissions scenario (SSP585) for the end of the 21st century. The study investigates the influences of ensemble configurations on crop yield projections and modeling uncertainty from GGCMs and GCMs under future climate change. We analyze the importance of model composition and ensemble size in determining the primary sources of uncertainty in crop yield projections, with the aim of offering valuable guidance for optimizing ensemble configurations in climate-crop modeling studies. We expect our analysis to enhance the accuracy of crop yield projections from a large ensemble, bridging the gap of notable ensemble uses and contributing to global food security.

Results

Yield projections by different ensembles

We explored changes in simulated crop yield across GCMs for wheat, maize, rice, and soybean between 1980–2010 and 2069–2099 for each GGCM emulator and the multi-model ensemble mean (Fig. S1). Results were generated by considering the CO2 fertilization effect. Without the CO2 fertilization effect, most GCM–GGCM combinations showed a decline in simulated yield, although one or two GGCMs simulated increases in maize, rice, and soybean yields. As for the CO2 fertilization effect, the wheat, rice, and soybean yield changes showed a stronger decrease signal compared with maize. In addition, crop yield changes with the CO2 fertilization effect had a larger difference among the models. Note that the uncertainty in the CO2 fertilization effect as implemented in the GGCMs is one of the major sources of the variance within the GGCM ensemble23. For instance, a large response to CO2 fertilization was observed in the emulator of the JULES model, while the emulator of EPIC-TAMU showed relatively weaker CO2 fertilization effects, especially for wheat and rice.

We used agglomerative-hierarchical clustering analysis (see the “Materials and methods” section) to divide yield projections (considering CO2 fertilization) from GCM–GGCM combinations into three clusters (clusters 1–3) on the basis of spatial patterns and magnitudes of percentage yield changes and ranked yield changes of cluster 1 to cluster 3 from most negative to most positive for the four respective crops (Figs. 1a and b, and S2a, b, respectively). The clustering tended to group results by GGCM, indicating that, in general, differences between GGCMs were responsible for differences between clusters. For instance, cluster 2 in Fig. 1a illustrates that EPIC-TAMU, GEPIC, PEPIC, and PROMET (around 44%) are clustered together due to similar yield changes, yet they include the projections from all 32 GCMs. This result suggests that in this cluster, the crop models are the primary factors influencing the projections of crop yield changes. However, it is worth noting that despite this crop model dominance in distinguishing ensemble members, different GCMs also contributed to the diversity of projections within each crop model group. This is evident as the associated GCMs in clusters 1 and 3 of Fig. 1a do not consistently group by crop models. For example, both cluster 1 and cluster 3 contain projections from pDSSAT, LPJ-GUESS, and LPJml, each driven by different GCMs. Since the spatial patterns of yield changes varied between clusters, the yield changes for cluster 1 (3) were not always ranked the most negative (positive) in a particular region of the world (Fig. 1c, d). For instance, we can see positive wheat yield responses in some regions (e.g., East Africa, south India) in cluster 1 and these regions had some negative yield changes in cluster 2 (Fig. 1c). In addition, some regions experienced yield losses in the future regardless of clusters, including clusters with a tendency towards increases in simulated yield. For example, wheat yield decreased in southern Brazil and southern Africa across three clusters (Fig. 1c), indicating that climate change will have negative effects on those areas.

Fig. 1: Hierarchical clustering of wheat and maize yield changes (between 2069–2099 and 1980–2010) under SSP585.
figure 1

The left panels of a and b are boxplots showing the change in global yield for each cluster and all models. Box boundaries indicate the 25th and 75th percentiles of crop yield change. The black line within each box indicates the median value. The right panels of a and b show (from outside to inside) the GGCM names, the simulated yields of each GGCM (bar plots), the different GCMs, the three clusters, and the cluster tree diagrams, respectively. The three clusters are highlighted with light red, green, and blue. The left panels of c and d are the average wheat and maize yield changes for each cluster. The numbers (n=) at the bottom left of each panel represent the number of ensemble members (GGCM × GCM) used for each cluster. The right panels of c and d are the yield changes averaged by latitude (blue line) and the smoothed relationship between latitude and yield change (black line). Note that the wheat yield is the combined spring wheat and winter wheat yield based on harvested area data. GGCM is global gridded crop model; GCM is global climate model.

We also generated cluster tree diagrams for the GGCM phase 3 simulation, which was improved with better inputs and an updated model version (Figs. S3S6). The spatial patterns of yield changes among the different clusters displayed large variations, similar to those in the GGCM emulator. However, the ensemble member distance in GGCM phase 3 was distinct from that of the GGCM emulator. For instance, in Fig. 1a, the ensemble members (GGCMs driven by GCMs) are mainly grouped by the GGCMs, with only pDSSAT, LPJ-GUESS, and LPJmL appearing concurrently in clusters 1 and 3. For GGCM phase 3, the wheat yield changes within cluster 1 were primarily grouped by two specific GCMs (IPSL-CM6A-LR and UKESM1-0-LL). This can be attributed to the fact that these two GCMs exhibit a higher equilibrium climate sensitivity (ECS)32,33, which leads to expected large yield losses by some climate-sensitive crop models. However, it is worth noting that some crop models (e.g., DNDC, PROMET, and LPJml) still fall in the same branch among the five GCMs (Fig. S3), indicating that different crop models have different levels of sensitivity to climate change.

We propose a novel approach for analyzing ensemble data by defining a distance metric between different ensemble members and reducing the large ensembles to three key patterns. This approach offers a deeper understanding of design ensemble configuration than relying solely on multi-model ensembles that may ultimately obscure results and overlook important details. Furthermore, our cluster scheme can potentially improve the harmonization of crop models and GCMs. For example, by identifying that sensitive crop models (e.g., those sensitive to warming) are driven by high ECS GCMs, we can anticipate that these models will experience a greater yield decline than other crop models (e.g., cluster 1 in Figs. S3 and  S6).

The uncertainty of different ensembles

Crop yield simulations are impacted by local environment and management practices (e.g., climate conditions and crop management) that further influence the source of uncertainty in yield change projections. Therefore, we used such information as inputs for the cluster analysis (another type of cluster analysis for classifying regions, see details in the “Materials and methods” section) to divide the global cropping region into twelve sub-regions for each crop (Fig. S7).

We explored the major source of modeling uncertainty for each subset (the combination of GGCM × GCM based on cluster analysis) for gridded, latitudinal, and regional scales (Fig. 2). For the entire ensemble of all GCMs and GGCMs, the crop model-induced variance share was larger than that of the GCMs (Fig. 2a, c, left panels). For different clusters, the dominant source of uncertainty for yield changes varied among different clusters, especially for maize, rice, and soybean (Figure S8). These results provide examples of how different model combinations can impact the sources of modeling of uncertainty. For example, the ensemble size of cluster 2 (n = 56 and nGGCM = 3, Fig. 1d) and cluster 3 (n = 64 and nGGCM = 2, Fig. 1d) for maize were similar, but the dominant source of uncertainty exhibited large differences at different spatial scales (Fig. 2c, d). For instance, GCM was the dominant source of uncertainty in Northern Europe, Eastern Asia, and the USA for cluster 2 (Fig. 2c). However, in cluster 3, GGCM was the dominant source of uncertainty in these regions. In addition, the ensemble size of cluster 2 (n = 72 and nGGCM = 4, Fig. S2) and cluster 3 (n = 85 and nGGCM = 3, Fig. S2) for rice had a similar number of GGCMs, but they show differences in the sources of uncertainties across sub-regions (Fig. S8). This is because, compared with cluster 3, the three GGCMs of cluster 2 had similar physiological processes (note that both EPIC-TAMU and GEPIC models were developed from EPIC). Several studies have investigated the source of uncertainty based on different sets of crop models19,22,34,35. However, the impact of model compositions, which can significantly influence the uncertainty of crop yield projections, has often been overlooked. For example, some studies that focused on the same region and used a similar number of crop models have shown varied results regarding the source of uncertainty35,36. Thus, we highlight the need to carefully select ensemble members in agricultural impact studies, given that different model compositions can lead to varying sources of uncertainty. In our study, although each cluster consisting of varying numbers of models, represents a unique potential scenario of yield changes under climate change. Such approaches can offer a more comprehensive understanding of the complexity and uncertainties involved in projecting crop yields under climate change.

Fig. 2: The proportion of uncertainty from GCM and GGCM in simulated wheat and maize yield changes in 2069–2099 under SSP585 compared with 1980–2010.
figure 2

The left panels of a and c are the spatial patterns of uncertainty proportions from GCMs and GGCMs at the gridded scale across the globe for different clusters (see Fig. 1) and all ensemble members for wheat and maize. The right panels of a and c show the proportion of uncertainty at different latitudes for wheat and maize. To obtain the uncertainty at different latitudes, we first calculated the yield change averaged by latitude for each ensemble member and then quantified the proportion of uncertainty across different latitudes using ANOVA. b and d show the sources of uncertainty at different sub-regions (see Fig. S7). GGCM is the global gridded crop model; GCM is the global climate model.

Impacts of model quantity on uncertainty

In order to reduce the quantity of simulations, some studies have ranked model performance and selected representative models for further analysis37,38,39, even though the selected models may not reflect the overall characteristics of the full ensemble. Here we quantify the uncertainty associated with the number of models, including nine GGCM emulators and 32 GCMs (Figs. 3 and  S9). The results indicated that the GGCM- or GCM-induced uncertainty proportion first increased with the number of models (GGCM or GCM) used, and then became stable when the number of models exceeded a certain value. We found that for GGCM-emulators, randomly choosing at least five to six crop models (six GGCMs for wheat, maize, and rice, five GGCMs for soybean) and 9–12 GCMs (nine GCMs for wheat, 10 GCMs for maize, 12 GCMs for rice and soybean) adequately captured the uncertainty of yield changes compared with the full model set. For GGCM phase 3 simulations, there were six to seven GGCMs (seven GGCMs for wheat and soybean, six GGCMs for maize and rice) that could reflect the overall uncertainty of yield change projections. The least ensemble size varied between regions (Figs. S10, S11 and Tables S3S4) but was within the range of four to six GGCMs and 6–14 GCMs for most regions. Also, the spatial patterns of GGCM-induced uncertainty and associated standard deviations changed little when the number of models was greater than approximately five (Figs. S12, S13).

Fig. 3: The proportion of uncertainty of yield changes (2069–2099 compared with 1980–2010, under SSP585) for different numbers of GGCMs used for wheat, maize, rice, and soybean.
figure 3

The numbers 2–9 (and 2–8) represent different numbers of GGCMs used, including all of the possible combinations for wheat and maize (rice and soybean). The boxplot shows the distribution of uncertainty due to GGCM selection under different numbers of GGCMs used. The dark and light blue shaded areas respectively represent the thresholds of ±2.5% and ±5% when using the contribution of uncertainty estimated from all GGCMs as the benchmark. Box boundaries indicate the 25th and 75th percentiles. The black line within each box indicates the median value. GGCM is a global gridded crop model; GCM is a global climate model. S3 is the scenario when one model was randomly selected from each cluster (total 3 crop models). S4 is the scenario when four models were selected ensuring that at least one model belonged to each cluster.

One question that arises is whether the minimum effective ensemble size can be further reduced through the use of specific ensemble strategies. To address this, we used the family tree of models to select a subset of GGCMs, ensuring that at least one model belonged to each cluster. Our results indicated that for both GGCM-emulators and Phase 3 simulations, using 3–4 GGCMs was sufficient to capture the overall variance induced by the GGCMs (Figs. 3 and  S14). However, this may not be the case for wheat yield projections in Phase 3 simulations (Fig. S14). The cluster tree for wheat yield (GGCM phase 3) did not always center around the GGCM, as demonstrated by cluster 1 of Fig. S3 which was mainly grouped by IPSL-CM6A-LR and UKESM1-0-LL. Generally, although our results indicated that a subset of GGCMs was sufficient to capture the overall variance, caution should still be exercised when using multi-model ensemble projections directly.

Proportion of uncertainty attributed to individual GGCMs

GGCMs were the dominant source of uncertainty in most of our study areas. Therefore, we here mainly focus on the uncertainty attributed to individual GGCMs. We tested the proportion of uncertainty attributed to individual models by consecutively removing each crop model used, and then comparing the difference of GGCM-induced uncertainty between the full model set and without the particular model (Figs. 4 and S15S17). For wheat, the CARAIB, PEPIC, and pDSSAT models added uncertainty in most regions, while the contributions of LPJ-GUESS and JULES varied among different regions. For example, the LPJ-GUESS model increased the GGCM-induced uncertainty share for wheat yield projections in Australia and decreased the GGCM-induced uncertainty share in southeast Asia. This variable result could be because LPJ-GUESS often is the outlier in strong CO2-fertilization responses31. PEPIC and, especially, pDSSAT increased the GGCM-induced uncertainty share for maize yield projections. This could be due to pDSSAT having a strong temperature response for maize31. JULES increased the GGCM-induced uncertainty share in the Southern Hemisphere and decreased the GGCM-induced uncertainty share mainly in the Northern Hemisphere (Fig. S15). For rice, PEPIC mainly contributed to GGCM-induced uncertainty shares in most regions, especially in Africa; JULES increased the GGCM-induced uncertainty share in Asia; PROMET and pDSSAT had large influences on GGCM-induced uncertainty shares, but the direction varied among different regions (Fig. S16). For soybean, JULES increased the GGCM-induced uncertainty share in almost all regions; CARAIB, PROMET, and pDSSAT also greatly influenced the GGCM-induced uncertainty share, but this varied between different regions (Fig. S17). These results demonstrated that each GGCM affected the uncertainty of model predictions across the globe to some degree, highlighting the importance of model selection at the local scale.

Fig. 4: Changes in the GGCM-induced uncertainty share contributed by individual GGCMs at the grid level for wheat in 2069–2099 under SSP585.
figure 4

The contribution was determined by sequentially removing one of the nine GGCMs from the ensemble of GGCMs at each grid point, and then comparing the GGCM-induced uncertainty share of projected yield changes between the full GGCM model set and the set without the particular model. A positive change indicates that including a particular model increased the GGCM-induced uncertainty share. A negative contribution change indicates that using this model decreased the GGCM-induced uncertainty share. GGCM is global gridded crop model.

Implications and limitations

Although the multi-model ensemble mean is a common summary statistic for yield projections, it has limited utility for determining adaptation strategies that are robust to a range of plausible future yield trajectories, and it may give a misleading result of little change where individual model combinations have changes of different signs. Thus, it is necessary to consider results from diverse collections of models. Our results can provide information for carefully designing GCM–GGCM ensembles that maximize the sampling of uncertainty. We focused on the SSP585 emissions trajectory to explore yield changes under a large climate change signal. Müller et al.23 found that the relative importance of different sources of uncertainty in yield projections was comparable between SSP126 and SSP585 for these GGCM emulators, while overall cross-ensemble variance was substantially larger under SSP585. Therefore, the results provided here are likely representative of other levels of climate change as well.

Understanding the inter-model distances and relationships using hierarchical clustering and family trees offers valuable insights into the similarities and differences among ensemble members, supporting the improvement and diversification of crop models and their responses to climate change40,41. For instance, by identifying and addressing any dependence among models, the overall uncertainty in the ensemble can be more robustly estimated, leading to more reliable yield projections. In addition, our study investigated the influence of ensemble size on uncertainties. We found that the number of models had a large influence on the represented uncertainty in different regions across the globe. For example, using 10 GCMs captured the GCM-induced uncertainty for maize yield change at the global scale (Fig. S9), but more GCMs were needed for the USA, Central Asia, and the southern part of South America (sub-regions 1, 3, 8, and 11) (Fig. S11 and Table S4).

The different yield changes for different ensemble members have important implications for the design of ensemble configurations (Figs. 1 and  S2S6). For example, the JULES simulations of soybean yields were distinct from results obtained with other models and made a large contribution to overall uncertainty, especially in South Asia and Central Europe. How such extreme behavior is dealt with may have a large effect on the results from the multi-model ensemble and corresponding uncertainty42. In these kinds of situations, some previous studies have excluded or down-weighted a given model from the ensemble in order to reduce the overall uncertainty22,43. Therefore, it is very important to carefully screen and evaluate the performance of the individual models for their robustness to simulate climate change impacts on crop yield before they are selected for use in a multi-model ensemble for the purpose of reducing uncertainty in climate change impact assessment44. Additionally, sensitive crop models driven by high ECS GCMs may also lead to extreme behavior (Figs. S3S6), while some GGCMs may not have significant influence from such high ECS GCMs. These results highlight the ongoing challenge of designing an optimal model ensemble. Our study proposes dividing ensemble members (crop models driven by GCMs) into groups based on their characteristics in order to provide a more effective way to select the most suitable ensemble members for a specific application. This strategy can efficiently represent the full ensemble and reduce redundancy.

Different methods have been developed to improve crop model performance and reduce overall modeling uncertainty, such as improving model structure45,46 or parameter optimization19,47. However, conducting this work at the global scale is challenging due to high computation costs and missing calibration target data. Recently, some studies have constrained the uncertainty of GGCMs by using the results of field experiments and statistical models48,49. These studies have suggested that using more detailed information from field experiments can improve model responses to climate warming and further reduce the overall uncertainty. Other factors such as CO2 fertilization are also likely to impact crop growth and bring uncertainty to yield projections. Several studies found that the CO2 fertilization effect was constrained by frequent extreme weather events50,51. However, such impacts have not been fully considered in current simulations, and consequently, these simulations have probably underestimated the yield loss and increased the overall uncertainty48. Therefore, to better understand the interacting impacts of climate change (e.g., heatwave, extreme precipitation, drought, and CO2 fertilization) on crop yields and thereby reduce uncertainty in climate change impact assessment, more controlled environment experiments with long time periods and large ranges of crop genotypes in a wider range of mega-environments are needed52,53.

We recognize several limitations in our study. Some limitations are related to the GGCMs. First, sources of uncertainty such as soil data and management options (e.g., fertilization rate)2,54,55,56,57 and the effects of pests and diseases were not considered by the GGCM simulations18. Second, model parameterization also influences the projections and affects uncertainty. For example, Xiong et al.19 analyzed the uncertainty resulting from parameterization using four parameterization strategies. While structural differences between GGCMs were accounted for in our study, uncertainty in parameter settings for individual GGCMs was not considered. Third, although some GGCMs can capture yield losses that occur under extreme temperature events (including heat and frost) and drought58, they underestimate the magnitude of yield losses due to extremely wet conditions17,59. This is compounded by a fourth limitation related to our use of the change factor method to generate future climate. This method does not fully sample the uncertainty due to changes in extreme climate events, or event changes related to within-season variability (emulators only driven by climate change during the crop growing season), as simulated by the GCMs. Furthermore, the emulator may not perfectly reproduce the raw GGCM simulation. Although we also used the GGCMI phase 3 dataset that used bias-adjusted daily GCM data to compare dominant sources of uncertainty with GGCM emulators, only five GCMs were currently available. Greater efforts are needed to better harmonize the processes of climate and crop models as we look towards closing in on a “certain” projection from a large ensemble.

Materials and methods

GGCM emulators (GGCMI phase 2)

Franke et al.31 developed emulators for many of the GGCMI phase 2 models to facilitate the use of numerous GCM–GGCM combinations to analyze climate change impacts on crop yields23. Crop model emulators have allowed larger sets of GCMs, for example, all CMIP6 GCMs, to be considered in assessments of uncertainties in yield predictions23. The emulators combined the advantages of both process-based and statistical crop models, and have already been widely used in previous studies60,61. The GGCM emulators can successfully capture crop yield responses to temperature, precipitation, CO2 concentration, and nitrogen change. In addition, GGCM emulators have the potential to project crop yields under future climate change31. The change factor method was used to generate future climate data for GGCM emulators. This approach uses future monthly rainfall and temperature changes from the GCMs to perturb historical daily data without changing the distribution of rainfall and temperature. The GGCMI Phase 2 models used in our study are not formally calibrated14,31. It is conceivable that a lack of calibration could influence the relative changes that are the basis of our study, but note that calibration can sometimes have negative impacts on model skill15.

We used the emulators to simulate crop yields at the grid scale for wheat, maize, rice, and soybean. Emulators for nine GGCMs (CARAIB, EPIC-TAMU, JULES, GEPIC, LPJ-GUESS, LPJmL, pDSSAT, PEPIC, and PROMET) for wheat and maize, and eight GGCMs (CARAIB, EPIC-TAMU, JULES, GEPIC, LPJmL, pDSSAT, PEPIC, and PROMET) for rice and soybean were available. The crop yield changes were simulated under both rainfed and irrigated conditions, and we aggregated the rainfed and irrigated yield based on the crop area data of Monfreda et al.62 for maize, rice, and soybean. Winter wheat and spring wheat were simulated individually; however, the data of Monfreda et al.62 did not distinguish the harvest areas of winter wheat and spring wheat. In this study, the harvest areas of winter wheat and spring wheat were distinguished by average temperatures and the length of the growing season according to Müller et al.23. Winter wheat yields and spring wheat yields were then combined.

The emulators used annual values of growing season mean temperature and total precipitation on a 0.5° × 0.5° global grid and atmospheric CO2 concentrations as climate inputs. Because we did not consider growing season adaptation, only the A0 (no growing season adaptation) setting was used in this study. The nitrogen application rate for each grid was the same as used by Elliott et al.13. In this study, we mainly focused on the crop yield change with CO2 fertilization. More detailed information can be found in Müller et al.23. Nitrogen application rate and the areas of rainfed and irrigated crops were assumed to be the same for our baseline and future time periods (1980–2010 and 2069–2099). We focused on the relative yield change (%) for each grid under climate change for model comparisons18.

GGCMI phase 3 crop modeling

The GGCMI phase 3 project3 involved five CMIP6 GCMs (GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0, and UKESM10-LL) and 12 GGCMs (ACEA, CROVER, CYGMA1p74, DSSAT-Pythia, EPIC-IIASA, ISAM, LandscapeDNDC, LPJmL, pDSSAT, PEPIC, PROMET, and SIMPLACE-LINTUL5). Daily GCM data were bias-adjusted and downscaled to a 0.5° × 0.5° global grid by the ISIMIP (www.isimip.org) framework63,64,65 prior to input to the GGCMs. These GCMs were selected to best sample the full CMIP6 ensemble in terms of the sensitivity of global mean temperature to increasing atmospheric carbon dioxide concentrations (i.e., equilibrium climate sensitivity) and other factors.

In this study, each model within the ensemble has been subjected to a benchmark performance evaluation for the historical period, ensuring that the models meet a certain level of quality3,14,31. We use the relative change approach, which focuses on the differences between baseline and future projections, accounting for proportional changes instead of absolute values. This method allows for a more comprehensive evaluation of the influence of ensemble configurations, such as model composition and size, on crop yield projections and uncertainties under future climate change scenarios.

Climate data

We downloaded monthly temperature and precipitation data from simulations of 32 GCMs under SSP585 from the CMIP6 archives (Table S1, https://esgf-node.llnl.gov/search/cmip6/). We resampled this data to the 0.5o grid using a bilinear interpolation method. We used the change factor method to generate data for GGCM emulators. Since our study only focuses on the changes in future multi-year average temperature (∆T) and precipitation (∆P) in relation to the baseline of the same GCM, no bias correction is necessary for the computation of ∆T and ∆P. For each GCM, simulated differences in growing season mean temperature and growing season total precipitation between 1980–2010 and 2069–2099 were applied to the baseline data from the AgMIP Modern-Era Retrospective Analysis for Research and Applications (AgMERRA). The global average atmospheric CO2 content was obtained from the International Institute for Applied Systems Analysis (https://iiasa.ac.at/). A CO2 concentration of 810 ppm was used for SSP585 in 2069–2099, and the baseline concentration was 360 ppm. The baseline yield for each GCM was the same for a given GGCM because the change factor method used AgMERRA data for the baseline period.

Two types of cluster analysis

Cluster analysis of crop-growing regions

We divided the global cropping regions into different sub-regions based on the characteristics of local crop productivity, climate conditions, and management10. Four kinds of environmental variables with the k-mean cluster analysis method were used to determine different sub-regions for wheat, maize, rice, and soybean (Table S2). To reflect the crop yield during the baseline period (1980–2010), we used ensembles of either nine or eight GGCMs to determine the average crop yield during the historical period. We selected mean temperature, solar radiation, precipitation, air relative humidity, and wind speed as the main climate factors for the k-mean analysis (Table S2). Temperature, water, and light are the main factors influencing the speed and rate of germination, emergence, and growth of crops66. The climate data we used were from AgMERRA, and are the same as the emulators’ inputs31. We also used latitude and longitude to represent the location information. Nitrogen application rate was also considered as a management option based on Müller et al.23. It is worth noting that the GGCM-emulators take into account management practices including water supply and nitrogen inputs. Given that precipitation is separately accounted for in the cluster analysis, within the framework of the GGCM-emulator projections, only nitrogen fertilizer input was represented as a distinct management variable. Finally, 12 sub-regions were determined for each crop.

Currently, there are several regionalizing plans available, including the IPCC’s climate zones67, global agroecological zones (AEZs), and global food production units (FPUs). Despite their existence, these plans may not be able to effectively capture the regional diversity of sources of uncertainty in crop yields. This is because of the differences in harvest area distributions and factors that influence crop yield.

Cluster analysis of future yield change

We used agglomerative-hierarchical clustering analysis41 to reveal the different ensemble configurations of the relative yield changes when CO2 fertilization was considered. The Canberra distance (D) was used to compute the initial cluster distance among ensemble members68,69. This approach was performed downstream (without any preselection), using the results of crop yield changes at every grid location. This allowed us to cluster ensemble members with high similarity in terms of spatial pattern and magnitude41. There were 288 members (9 GGCMs × 32 GCMs) for wheat and maize, and 256 members (8 GGCMs × 32 GCMs) for rice and soybean. We classified future yield change into three clusters to show three considerably dissimilar yield change predictions. Cluster 1 represented yield decreases, cluster 2 represented small changes in yield, and cluster 3 represented yield increases. Since crop yield is a common and key variable in crop modeling, the investigation of different variables might result in only minor variations in the outcomes.

Uncertainty analysis

ANOVA methodology

We used the analysis of variance (ANOVA) technique to quantify the uncertainty in crop yield changes for wheat, maize, rice, and soybean due to GGCMs and GCMs. The equations for the total sum of squares (SST) for a two-way ANOVA are:

$${{{{{\rm{SST}}}}}}={{{{{{\rm{SS}}}}}}}_{{{{{{\rm{GGCM}}}}}}}+{{{{{{\rm{SS}}}}}}}_{{{{{{\rm{GCM}}}}}}}+{{{{{{\rm{SS}}}}}}}_{{{{{{\rm{GGCM}}}}}}}* {{{{{\rm{GCM}}}}}}$$
(1)
$${{P}}_{{{{{{\rm{GGCM}}}}}}}( \% )={{{{{{\rm{SS}}}}}}}_{{{{{{\rm{GGCM}}}}}}}/{{{{{\rm{SST}}}}}}\times 100$$
(2)
$${{P}}_{{{{{{\rm{GCM}}}}}}}( \% )={{{{{{\rm{SS}}}}}}}_{{{{{{\rm{GCM}}}}}}}/{{{{{\rm{SST}}}}}}\times 100$$
(3)
$${{P}}_{{{{{{\rm{GGCM}}}}}}}* {{{{{\rm{GCM}}}}}}( \% )={{{{{{\rm{SS}}}}}}}_{{{{{{\rm{GGCM}}}}}}}* {{{{{\rm{GCM}}}}}}/{{{{{\rm{SST}}}}}}\times 100$$
(4)

where PGGCM is the uncertainty contribution due to GGCM, PGCM is the uncertainty contribution due to GCM, and PGGCM*GCM is the uncertainty contribution arising from interactive effects between GCMs and GGCMs. In this study, we use the mean proportion of uncertainty as it offers a straightforward and readily interpretable measure of central tendency.

Subsampling scheme

We used ANOVA to estimate how different sample sizes of models would affect the total uncertainty in yield change. For GGCMs, different combinations were selected as \({C}_{i}^{9}\) for wheat and maize, \({C}_{i}^{8}\) for rice and soybean, i varies from 2, 3, …,9 (or 8).

$$\overline{\rm {{SS{T}}}_{(i)}}=\frac{1}{n}\mathop{\sum }\limits_{j=1}^{n}({\rm {S{S}}}_{\rm {{GGCM}}(i)}+{\rm {S{S}}}_{\rm {{GCM}}}+{\rm {S{S}}}_{{\rm {GGCM}}(i)\times {\rm {GCM}}})\,$$
(5)
$${P}_{\rm {{GGCM}}(i)}( \% )=\frac{\frac{1}{n}{\sum }_{j=1}^{n}({\rm {S{S}}}_{{\rm {GGCM}}(i)})}{\overline{{\rm {SS{T}}}_{(i)}}}\times 100$$
(6)

where i is the number of the GGCM used (2–9 for wheat and maize, 2–8 for rice and soybean), n is the total number of \({C}_{i}^{9}\) for wheat and maize, \({C}_{i}^{8}\) for rice and soybean, j is 1,2, …, n.

The ANOVA analysis for the number of GCMs was similar to the analysis for the number of GGCMs. However, because the number of GCMs was 32, the combinations of \({C}_{i}^{32}\) would be more than thousands of millions when i was between 6 and 26. It was therefore not practical to use all combinations when subsampling GCMs. Consequently, we tested the uncertainty change under different sizes of random subsets and found that the uncertainty change tended to be stable when the size was around 100,000 (i ranging from six to 27) for each crop (Fig. S18). Therefore, we set the maximum size of subsampling GCM to be 100,000.

$$\overline{{\rm {SS{T}}}_{(i)}}=\left\{\begin{array}{c}\dfrac{1}{n}\mathop{\sum }\limits_{j=1}^{n}({\rm {S{S}}}_{{\rm {GGCM}}}+{\rm {SS}}_{{\rm {GCM}}(i)}+{\rm {SS}}_{\rm {{GGCM}}\times {\rm {GCM}}(i)})\,2\,\le \,i\,\le \,5\,{\rm {or}}\,i\ge 28\,\\ \dfrac{1}{100,000}\mathop{\sum }\limits_{j=1}^{100,000}({\rm {SS}}_{{\rm {GGCM}}}+{\rm {SS}}_{{\rm {GCM}}(i)}+{\rm {SS}}_{{\rm {GGCM}}\times {\rm {GCM}}(i)})\,6\,\le \,i\,\le \,27\end{array}\,\right.$$
(7)
$${P}_{{\rm {GCM}}(i)}( \% )=\left\{\begin{array}{c}\frac{\frac{1}{n}{\sum }_{j=1}^{n}({\rm {S{S}}}_{{\rm {GCM}}(i)})}{\overline{{\rm {SS{T}}}_{(i)}}}\times 100\,2\,\le \,i\,\le \,5\,{\rm {or}}\,i\ge 28\,\\ \frac{\frac{1}{100,000}{\sum }_{j=1}^{100,000}({\rm {S{S}}}_{{\rm {GCM}}(i)})}{\overline{{\rm {SS{T}}}_{(i)}}}\times 100\,6\le i\le \,27\end{array}\right.$$
(8)

where i is the number of GCMs used, n is the total number of \({C}_{i}^{32}\), j ranges from 1 to n

In this study, we defined the least number of GGCMs necessary to sample uncertainty to be when the GGCM-induced uncertainty was within ±2.5% of the uncertainty for the full ensemble of GGCMs24. However, because the proportion of GCM-induced uncertainty was lower than that for GGCM, we used a threshold of ±1.5% to select the minimum ensemble size for GCMs.