CMIP6 GCM ensemble members versus global surface temperatures

The Coupled Model Intercomparison Project (phase 6) (CMIP6) global circulation models (GCMs) predict equilibrium climate sensitivity (ECS) values ranging between 1.8 and 5.7 $^\circ$C. To narrow this range, we group 38 GCMs into low, medium and high ECS subgroups and test their accuracy and precision in hindcasting the mean global surface warming observed from 1980-1990 to 2011-2021 in the ERA5 T2m, HadCRUT5, GISTEMP v4, and NOAAGlobTemp v5 global surface temperature records. We also compare the GCM hindcasts to the satellite-based UAH MSU v6 lower troposphere global temperature record. We use 143 GCM ensemble averaged simulations under four slightly different forcing conditions, 688 GCM member simulations, and Monte Carlo modeling of the internal variability of the GCMs under three different model accuracy requirements. We found that the medium and high ECS GCMs run too hot up to over 95% and 97% of cases, respectively. The low ECS GCM group agrees best with the warming values obtained from the surface temperature records, ranging between 0.52 and 0.58 $^\circ$C. However, when comparing the observed and GCM hindcasted warming on land and ocean regions, the surface-based temperature records appear to exhibit a significant warming bias. If the satellite based UAH MSU lt record is accurate, actual surface warming from 1980 to 2021 may have been around 0.40 $^\circ$C (or less), i.e. up to about 30% less than what is reported by the surface based temperature records. The latter situation implies that even the low ECS models would have produced excessive warming from 1980 to 2021. These results suggest that the actual ECS may be relatively low, i.e. lower than 3 $^\circ$C or even less than 2 $^\circ$C. Therefore, the projected global climate warming over the next few decades could be moderate and probably not particularly alarming.

CO2 concentration could cause an increase in global surface temperature of about 1 °C. Therefore, only strong positive climate feedbacks could significantly increase the ECS above such a value, but their existence is still debated.
Constraining the ECS value is an urgent task of climatology. In fact, at least two-thirds of the CMIP6 GCMs could be severely defective. For example, if the models are grouped into low (1.5 < ECS ≤ 3.0 °C), medium (3.0 < ECS ≤ 4.5 °C) and high (4.5 < ECS ≤ 6.0 °C) sensitivity values and the actual ECS is, say, less than 3 °C, then the GCMs with ECS > 3 °C should be ignored. Therefore, it is very important that detailed evaluations of the models are carried out in order to determine if, where and how the models should improve both on a global scale (as proposed, for example, in this work) and on regional scales, as done in numerous other studies (e.g.: Heo et al., 2014; Seo et al., 2018, and many others).
Constraining ECS also has important policy implications because the expected warming for the 21st century depends on the value of the model's ECS (Grose et al., 2017; Scafetta, 2022): the higher the ECS, the greater the expected warming due to GHG emissions. For example, Huntingford et al. (2020) found that the wide ECS range of CMIP6 GCMs implies that at thermal equilibrium the global surface temperature could warm up between 1.0 °C and 3.3 °C above the preindustrial period even if anthropogenic emissions cease today.
Scientists have already wondered whether a strong response to greenhouse gases could be realistic (Voosen, 2019). Indeed, high ECS CMIP6 models have already been found to perform poorly (e.g.: Ribes et al., 2021; Scafetta, 2022; Tokarska et al., 2020; Zhu et al., 2020) while the medium and even the low ECS models are being carefully evaluated.
For example, Nijsse et al. (2020) derived that the most likely ECS interval should be 1.9-3.4 °C while alternative studies, often empirically based, have suggested that the actual ECS could be even lower, probably between 1 °C and 2.5 °C (e.g.: Lewis and Curry, 2018; Lindzen and Choi, 2011; Scafetta, 2013; Stefani, 2021; van Wijngaarden and Happer, 2020). Most GCMs seem to overestimate the observed surface warming since 1980 (Scafetta, 2021b, 2022) and also that observed in the global (McKitrick and Christy, 2020) and tropical troposphere (Mitchell et al., 2020), in particular at its top where the CMIP6 GCMs predict an unobserved hot spot (McKitrick and Christy, 2018). A similar situation also occurred with the previous CMIP3 and CMIP5 GCMs (Fu et al., 2011; Scafetta, 2012a, 2013). Actually, as Knutti et al. (2017) acknowledged, there is a dichotomy between the observed and modeled ECS as GCMs tend to favor sensitivity values at the top of the probable range, while several studies based on instrumentally recorded warming and some from paleoclimate favor values in the lower part of the range. Therefore, not only the models with high ECS, but also those with medium ECS should be and are being seriously questioned. Scafetta (2021a, 2022) showed that the performance of the GCMs improves as their ECS decreases and, in any case, the low ECS GCMs appear to be the best performing models. However, even low-ECS GCMs need further evaluation because biases in some regions (e.g. on land) could be offset by opposite biases in other regions (e.g. on ocean). Furthermore, serious uncertainties remain in the solar forcing and in the temperature records themselves (Connolly et al., 2021; D'Aleo, 2016). These uncertainties call into question the warming trend reported by the available climate records and, directly or indirectly, the models themselves.
Finally, the climate system seems to be regulated by various natural oscillations from the decadal to the millennial scales, which the GCMs are unable to reproduce, the presence of which would also imply low ECS values, probably between 1 and 2 °C (Scafetta, 2012a, 2013, 2021c).
Focusing on the performance of the CMIP6 GCMs, Scafetta (2022) proposed that the probable ECS range could be constrained by a statistical investigation of which GCM group (low, medium or high ECS) best reproduces the observed global surface warming between 1980-1990 and 2011-2021 as reported by ERA5-T2m (Hersbach et al., 2020; Simmons et al., 2021). The period 1980-2021 was chosen because it is optimally covered by all available climatic temperature records. Scafetta (2022) analyzed the "average" simulations provided by the Koninklijk Nederlands Meteorologisch Instituut (KNMI) Climate Explorer (Oldenborgh, 2020) of 38 CMIP6 GCMs with three shared socioeconomic pathways (SSP) emission scenarios, which also provided a partial evaluation of the internal variability of the models. The low ECS GCM group was found to be perfectly compatible, at least on a global scale, with the 2011-2021 warming relative to the 1980-1990 period. Conversely, both GCM groups with medium and high ECS showed warming trends that were too high.
A possible objection to the analysis proposed in Scafetta (2022) is that temperature records should be compared with actual members of the CMIP6 GCM ensemble instead of their ensemble averages because the unforced internal variability of the models produces different results due to uncertainties in the initial conditions as well as in the internal parameters of the models. This problem will be addressed in this paper considering that: 1. physical models, including the GCMs, should be accurate and precise (see Appendix B); 2. there are still open issues regarding the reliability of the available global surface temperature records.
In fact, theoretical models must reproduce observations within a reasonably small error. In our case, it should be evident that the poor precision of a GCM cannot be used as a pretext to justify its poor accuracy. For example, a low-precision model could produce a very wide range of different hindcasts due to its internal variability. In this situation, even if some of its hindcasts fit the observations, the result should still be considered unsatisfactory if the mean of the GCM set diverges too much from the actual data. Similarly, if an ECS GCM group produces a set of hindcasts that too sparsely encompass the observations, the ECS values that characterize that group should be considered unrealistic even though some of the models in the same group might perform better than others. In general, the accuracy, precision and ECS category of the GCMs must be evaluated simultaneously.
Furthermore, surface-based temperature records appear to exhibit non-climatic warming biases due to poorly corrected urban heat island effects or other local surface phenomena (e.g.: Connolly et al., 2021; D'Aleo, 2016; Scafetta, 2021a). To account for this problem, the satellite temperature measurements of the lower troposphere using Microwave Sounding Units (MSU) produced by the University of Alabama in Huntsville (UAH-MSU-lt v6) (Spencer et al., 2017) will also be analyzed.
UAH-MSU-lt is the temperature record that features the lowest global warming trend (about +0.13°C/decade) from 1980 to 2021 among all available global temperature records. According to GCM simulations, the troposphere is expected to warm up faster than the surface (up to a factor of 3) because greenhouse gases are expected to warm the atmosphere first (Mitchell et al., 2020). Consequently, the global warming trend of the troposphere estimated from satellite measurements should be further reduced to simulate the global warming trend at the surface. Here, these corrections are ignored and UAH-MSU-lt is assumed to represent the possible lowest limit for the global warming trend of the surface. Therefore, comparison with this satellite temperature record could help assess the presence of non-climatic warming bias in the surface temperature records, particularly on land where large contaminated areas appear to exist (cf. Scafetta and Ouyang, 2019;Scafetta, 2021a).
Indeed, preliminary analyses have shown that the land seems to have warmed too much and too quickly compared to the ocean (Scafetta, 2021a). Connolly et al. (2021) used data from rural stations only and showed that the warming of the Northern Hemisphere's land surface should be significantly lower than what is reported by the available surface-based temperature records based on both rural and urban stations. Watts (2022) examined the quality of the U.S. temperature stations from which official temperature records are obtained and concluded that approximately 96% of them could not meet the National Oceanic and Atmospheric Administration (NOAA) requirements for "acceptable placement" because they could be significantly contaminated by different heat sources. In general, the surface temperature records and the homogenization algorithms used to adjust them present several problems that may have exaggerated the warming. Thus, the integrity of the available global surface temperature records and, therefore, the ability to correctly determine the global warming trend of the 20th and 21st centuries should be questioned as well (Connolly et al., 2021; D'Aleo, 2016).
There is a different MSU record (Mears and Wentz, 2016), which shows warming trends that are more compatible with those presented by the surface-based temperature records. However, this alternative satellite-based record is not analyzed here because its results would overlap with those of the surface-based temperature records. In any case, adopting it in the present study may not be optimal because it only covers the latitude range from 70.0°S to 82.5°N and because it appears to perform worse than UAH-MSU-lt, which agrees better with the radiosonde temperature database.
Here, we significantly expand the analysis presented by Scafetta (2022) by testing 143 GCM average simulations and all 688 GCM member simulations available on the KNMI website against four surface-based global temperature records (ERA5-T2m, HadCRUT5, GISTEMP v4, NOAAGlobTemp v5) and the UAH-MSU-lt v6 satellite-based record. Since we wish to narrow the ECS range, we again group the models into three classes corresponding to low, medium and high ECS values, as proposed in Scafetta (2022). ECS GCM groups that produce systematically biased trends (e.g. too hot or too cold relative to the observed temperatures) should be questioned and not used for policy even though some simulations may appear to reproduce the observations. Finally, we compare the GCM hindcasts with observed land and ocean warming values to determine whether the surface-based records could be regionally biased and whether the ECS should be further constrained towards lower values.

Data and methods
We analyze the monthly reanalysis field near-surface air temperature (ERA5-T2m) record from 1980 to 2021 (Hersbach et al., 2020;Simmons et al., 2021). We repeat the same analysis using the HadCRUT5 (Morice et al., 2021), GISTEMP v4 (Lenssen et al., 2019), and NOAAGlobalTemp v5 (Zhang et al., 2019) global surface temperature records. Some of these records, however, may not cover the entire surface of the globe from 1980 to 2021. There are other global surface temperature records such as those proposed by the Japanese Meteorological Agency (JMA, Ishihara, 2006) and by the Berkeley Earth group (BE, Rohde and Hausfather, 2020), which will also be discussed briefly. For completeness, as explained in the Introduction, we add a comparison with the UAH-MSU-lt v6 temperature measurements (Spencer et al., 2017).
We also analyze all 143 "average" surface air temperature (tas) records and all 688 ensemble member records from 38 different CMIP6 GCMs downloadable from KNMI Climate Explorer. These simulations were produced using historical forcings (1850-2014) further extended up to 2100 with four different SSP scenarios: SSP1-2.6 (low GHG emissions), SSP2-4.5 (intermediate GHG emissions), SSP3-7.0 (high GHG emissions) and SSP5-8.5 (very high GHG emissions) (IPCC, 2021). These four scenarios are nearly indistinguishable until 2021. Thus, from 1850 to 2021 the four simulation sets can be considered independent assessments of the same models under nearly identical forcing conditions, which also helps to assess, to a first approximation, the internal variability of the models.
The 1980-2021 period was chosen to better evaluate the performance of the CMIP6 GCMs. This period is optimally covered by numerous climatic temperature records, including those based on satellite measurements. The latter are an alternative to the records based on land and oceanic measurements, which could be affected by various non-climatic biases that are difficult to eliminate (D'Aleo, 2016; Watts, 2022). In fact, going back in time from 1980 to 1850, the temperature records are affected by ever-larger uncertainties and uncovered areas, which makes evaluating the CMIP6 models even more difficult. A possible advantage of the present study is that the previous studies evaluating the performance of the CMIP6 models attempted to constrain the ECS by comparing GCM simulations only with surface climate records from 1850 to 2020 (Ribes et al., 2021) or from 1981 to 2014 (Tokarska et al., 2020), or even using uncertain paleoclimate records (Zhu et al., 2020) and concluded that only high-ECS models (ECS > 4.5 °C) could be excluded. However, there are open questions as to whether cooling adjustments applied to different Earth surface temperature records from 1850 to 1980 are justified (D'Aleo, 2016) and whether in more recent periods the global surface climate records are affected by non-climatic warming biases (Connolly et al., 2021; Scafetta, 2021a). These biases could have exaggerated the 20th century warming trend and incorrectly provided support for the medium-ECS GCMs.
The 1980-2021 warming for each record is calculated by evaluating the 2011-2021 average temperature anomaly relative to the 1980-1990 period. 11-year intervals are used to bypass biases due to interannual fluctuations such as those related to ENSO and the 11-year solar cycle. Then, we apply standard statistical tests to decide if and how the observed warming values for each of the temperature records are reproduced by the three ECS GCM groups.
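The warming metric described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `decadal_warming` and the synthetic linear series are assumptions introduced here, not the code or the data used in the study.

```python
import numpy as np

def decadal_warming(years, monthly_anomaly,
                    base=(1980, 1990), recent=(2011, 2021)):
    """Mean of the recent 11-year window minus mean of the base window.

    `years` holds the calendar year of each monthly value; both windows
    are inclusive of their end years.
    """
    years = np.asarray(years)
    values = np.asarray(monthly_anomaly, dtype=float)
    in_base = (years >= base[0]) & (years <= base[1])
    in_recent = (years >= recent[0]) & (years <= recent[1])
    return values[in_recent].mean() - values[in_base].mean()

# Synthetic example (not real data): a pure linear trend of 0.18 °C/decade.
months = np.arange(42 * 12)               # Jan 1980 .. Dec 2021
years = 1980 + months // 12               # calendar year of each month
time = 1980 + (months + 0.5) / 12.0       # mid-month decimal year
series = 0.018 * (time - 1980.0)          # anomaly in °C
warming = decadal_warming(years, series)
print(round(warming, 3))
```

For a purely linear series the result equals the trend multiplied by the 31-year separation between the centers of the two windows, which is a convenient sanity check before applying the function to an actual record.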
The ERA5-T2m global surface temperature average warming from 1980-1990 to 2011-2021 is estimated to be: The other temperature records give: HadCRUT5 (infilled data), $\Delta T_{mean} = 0.581$ °C; GISTEMP v4, $\Delta T_{mean} = 0.570$ °C; NOAAGlobalTemp v5, $\Delta T_{mean} = 0.523$ °C. HadCRUT5, GISTEMP, and ERA5-T2m give nearly identical warmings. We also observe that HadCRUT5 (non-infilled data) gives $\Delta T_{mean} = 0.549$ °C and HadCRUT4 gives $\Delta T_{mean} = 0.521$ °C. BE gives $\Delta T_{mean} = 0.591$ °C and JMA gives $\Delta T_{mean} = 0.557$ °C, which do not differ much from the above estimates. Thus, the available surface-based global temperature records indicate that the global surface warming from 1980-1990 to 2011-2021 was between 0.52 °C and 0.59 °C, or approximately between 0.50 °C and 0.60 °C, with an average of 0.56 °C. In contrast, the satellite-based UAH-MSU-lt v6 temperature record gives $\Delta T_{mean} = 0.402$ °C, suggesting that the actual 2011-2021 warming may have been even less than 0.40 °C because, as explained in the Introduction, according to the GCMs the temperature trend of the troposphere should be scaled down to make it compatible with the surface warming trend.
For the temperature records, since 1980 the error of the average over an 11-year period can be estimated to be very small, $\sigma_{95\%} \approx 0.01$ °C (see Appendix A), which represents about 2% of the warming from 1980-1990 to 2011-2021, and is less than the differences between the various temperature records.
As explained in the Introduction, the proposed analysis groups the CMIP6 GCMs into three subsets characterized by low (1.5 < ECS ≤ 3.0 °C), medium (3.0 < ECS ≤ 4.5 °C) and high (4.5 < ECS ≤ 6.0 °C) sensitivity values. This choice is based on the following heuristic considerations. The IPCC (2013) estimated that the ECS had a "likely" range of 1.5-4.5 °C. Wigley et al. (1997) also suggested the same interval, although the best-fit sensitivity was found to be 2.5 °C. This range can be heuristically divided into two equal parts: 1.5 < ECS ≤ 3.0 °C and 3.0 < ECS ≤ 4.5 °C. In 2013, the CMIP5 GCMs were used. However, the IPCC (2021) adopted the CMIP6 GCMs that extended the ECS range up to 6 °C so that an equally large third range, 4.5 < ECS ≤ 6.0 °C, could be added to the previous two. Zelinka et al. (2020) explained that the increased climate sensitivity in the CMIP6 models was caused by stronger positive cloud feedbacks due to decreased extratropical cloud cover and albedo that, however, might be questionable. Therefore, the interval 1.5 < ECS ≤ 3.0 °C collects the GCMs with ECS values most consistent with different empirical results, as discussed in the Introduction; the interval 3.0 < ECS ≤ 4.5 °C collects the other GCMs that the IPCC (2013) would also have considered acceptable; finally, the interval 4.5 < ECS ≤ 6.0 °C collects the GCMs included in the IPCC (2021) but which in 2013 the IPCC itself considered to predict an unlikely high ECS.
(2011-2021 minus 1980-1990) hindcasted by 38 CMIP6 GCMs mean simulations. The blue vertical lines represent the temperature change measured by HadCRUT5, ERA5-T2m, GISTEMP v4, NOAAGlobTemp v5, and UAH-MSU-lt v6 temperature records, respectively, with their 95% confidence interval. The three yellow boxes represent the ±1σ dispersion of the data referring to the low (1.5 < ECS ≤ 3.0 °C), medium (3.0 < ECS ≤ 4.5 °C) and high (4.5 < ECS ≤ 6.0 °C) ECS GCMs. See Table 1.
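The three ECS classes defined above can be expressed as a small helper. The function name `ecs_group` and the example values below are hypothetical and serve only to illustrate how the half-open intervals treat the boundary values 3.0 and 4.5 °C.

```python
def ecs_group(ecs):
    """Classify an ECS value (°C) as 'low', 'medium' or 'high'.

    Intervals are open on the left and closed on the right, as in the text:
    1.5 < ECS <= 3.0, 3.0 < ECS <= 4.5, 4.5 < ECS <= 6.0.
    Values outside 1.5-6.0 °C return None.
    """
    if 1.5 < ecs <= 3.0:
        return "low"
    if 3.0 < ecs <= 4.5:
        return "medium"
    if 4.5 < ecs <= 6.0:
        return "high"
    return None

# Hypothetical ECS values, for illustration only (not actual model values).
examples = {"model_A": 1.8, "model_B": 3.0, "model_C": 3.1, "model_D": 5.6}
groups = {name: ecs_group(value) for name, value in examples.items()}
print(groups)
```

A model with ECS exactly equal to 3.0 °C falls in the low group, and one with exactly 4.5 °C falls in the medium group.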
Figure 1 shows the GCM simulations (left) and their ensemble mean ± 1σ range (right) grouped according to the three GCM ECS sets with respect to the ERA5-T2m global surface temperature record (black, moving averages at 12 months). All records are temperature anomalies relative to the period 1980-1990. Figure 2 shows a similar comparison with respect to the HadCRUT, GISTEMP, NOAAGlobTemp and UAH-MSU-lt temperature records.

Analysis of the CMIP6 GCM simulations
Both figures show that as the ECS increases, the global surface warming predicted by the models also increases. However, only the low-ECS GCM group can be considered perfectly consistent with the surface-based global temperature records because it encloses them well within the ±1σ GCM range (yellow area).
Figures 1 and 2 also show that, compared to the satellite record, even the GCM group with low ECS seems to overestimate the observed warming. In fact, even for the low ECS GCM group from 2011 to 2021 the UAH-MSU-lt record is not well enclosed within the ±1σ model ensemble (yellow) area although a better agreement is found in the period 2015-2020. The latter was characterized by the significant El Niño warming events of 2015-2016 and 2020 (Appendix A, Figure 11). Therefore, the 2015-2020 warming relative to the period 2000-2014 could also be temporary (Scafetta, 2021c) and not related to the warming hindcasted by the models because it is clearly due to natural climatic fluctuations while the average warming produced by the models is due to anthropogenic forcing. From 2015 to 2022, in fact, a slight cooling trend is observed. From 2000 to 2014 the UAH-MSU-lt v6 record also clearly shows the so-called global warming "hiatus" or "pause" (IPCC, 2013). This decade-long lack of warming began to seriously question the GCMs, and various statistical solutions were proposed to circumvent the problem by referring to the fluctuations of the internal variability of the climate system (e.g. Meehl et al., 2011). Figures 1 and 2 also show that, at present, the "pause" appears missing or attenuated in the latest versions of the surface-based global temperature records.

3.1 Analysis of the GCM average simulations
Scafetta (2022) analyzed the average simulations of 38 GCMs using the historical + SSP2-4.5, SSP3-7.0, and SSP5-8.5 radiative forcing scenarios up to June 2021; the warming values for each model were collected in the table published there. Figure 3 graphically shows the results of the same analysis, which was updated to the whole year 2021 and also included the SSP1-2.6 simulations, compared to the temperature observations (green vertical lines). In total, 143 average records are analyzed.
For each ECS GCM group, the statistics are reported in Table 1. The result confirms that the GCM group with low ECS is perfectly compatible with the observed warming (Eq. 1) within the ±1σ range. In contrast, both GCM groups with medium and high ECS show warming biases. Moreover, as Scafetta (2022) already observed, Figure 3 also shows that none of the medium and high ECS models predict an average warming of less than 0.6 °C, which is above the warming reported by all global surface temperature records. This result suggests that models with ECS > 3 °C should be questioned at the 95% confidence level. Thus, by considering only the GCM ensemble averages for the four SSPs, the real ECS should be equal to or lower than 3 °C.

3.2 Analysis of the full range of the GCM ensemble members
Figure 4 shows in four panels the temperature variations (2011-2021 minus 1980-1990) of the 688 simulations of GCM ensemble members available per forcing set (Hist+SSP1-2.6, Hist+SSP2-4.5, Hist+SSP3-7.0 and Hist+SSP5-8.5; red dots) against the five temperature records (vertical lines). The figure visually confirms that the vast majority of ensemble member simulations produced by the GCM groups with medium and high ECS run too hot relative to all five temperature records.
To examine how observed warming values are placed within the distributions of possible GCM hindcasts for each of the three ECS groups, we count how many member simulations record temperatures colder or warmer than each of the five temperature records. Table 2 reports the results.
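The counting step described above can be sketched as follows. The helper name `fraction_warmer_colder` and the eight member values are hypothetical and are not taken from the KNMI collection; the observed value of 0.58 °C is used only as a plausible reference level.

```python
import numpy as np

def fraction_warmer_colder(member_warmings, observed):
    """Fractions of ensemble members that warm more, or less, than observed."""
    members = np.asarray(member_warmings, dtype=float)
    warmer = float(np.mean(members > observed))
    colder = float(np.mean(members < observed))
    return warmer, colder

# Hypothetical member hindcasts of the 1980-1990 to 2011-2021 warming (°C).
members = [0.45, 0.52, 0.61, 0.66, 0.70, 0.74, 0.81, 0.95]
warmer, colder = fraction_warmer_colder(members, observed=0.58)
print(f"warmer: {warmer:.0%}, colder: {colder:.0%}")
```

Because every member counts equally, a model that contributes many members dominates the fractions, which is the weighting issue that motivates the Monte Carlo approach used later in the text.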
The analysis confirms that the low-ECS models produce results that well enclose the 2011-2021 average temperatures obtained using the surface temperature records, which always fall within the statistical interval ±1σ (corresponding to the 16-84% probability interval) of the distribution of the GCM hindcasts. In contrast, 94-100% and 97-100% of hindcasts produced by the GCMs with medium and high ECS, respectively, are warmer than all five temperature records. Therefore, also considering the full range of the available CMIP6 GCM simulations, the GCMs with medium and high ECS run too hot. Thus, the actual ECS should be equal to or less than 3 °C.
However, 96% of the simulations from the low-ECS GCM group are warmer and only 4% cooler than the lower troposphere temperature record. So, once again, we find that if UAH-MSU-lt better reproduces the actual global warming from 1980-1990 to 2011-2021, the vast majority of the low-ECS GCM ensemble members would also be found to run too hot.
Table 3. Probability $P_{\Delta T<GCMs}$ and $P_{\Delta T>GCMs}$ that the warming hindcast from 1980-1990 to 2011-2021 for each ECS GCM ensemble is warmer or colder, respectively, than HadCRUT5, ERA5-T2m, GISTEMP v4, NOAAGlobTemp v5 and UAH-MSU-lt v6: see Figure 6.
(d, e, f) Boxplots of the probability density functions depicted in panels a, b, c, respectively, using double whiskers and boxes indicating the following probability levels: 2.5%, 16%, 25%, 50%, 75%, 84% and 97.5%.

3.3 Statistical modeling of the GCM unforced internal variability
Figure 5 shows the boxplots relating to the simulations shown in Figure 4 for each model. Again, the GCM group with low ECS is best centered around the surface-based observations indicated by the horizontal blue lines while the GCM groups with medium and high ECS exhibit systematic warming bias except for very few models. However, the dispersion of the boxplots varies greatly among the GCMs because the models are not physically equivalent to each other and, furthermore, probably because of the different number of simulations available for each model. In fact, the GCMs are represented unevenly in the KNMI collection because the number of simulations available for each GCM varies from 3 to 100 among the models: see Figure 5. Therefore, the statistics discussed in Section 3.2 may be skewed towards models with a larger number of available simulations because they will weigh more in the statistical test reported in Table 3. This problem could be solved by using a Monte Carlo strategy to simulate the spread of GCM hindcasts that could be associated with unforced internal variability. This exercise is proposed below.
Observations and GCM hindcasts: Modeled Global and Land Warmings (80°S:80°N)
It can be assumed that each GCM produces simulations distributed around a mean $\mu_m$ with a given standard deviation $\sigma_m$ characterizing its internal variability. We note that $\sigma_m$ should be assumed constant for all GCM averages because it could be interpreted as a "precision" requirement for GCMs. Indeed, GCM hindcasts should always agree with observations within an acceptable statistical uncertainty.
We propose three different options for $\sigma_m$ covering approximately the ranges of the GCM boxplots shown in Figure 5: $\sigma_H \approx 0.05$ °C (high precision), $\sigma_M \approx 0.10$ °C (medium precision), and $\sigma_L \approx 0.15$ °C (low precision). Figure 5 suggests that the high-precision option ($\sigma_H \approx 0.05$ °C) could be satisfied by most GCMs; it requires the model mean to be within ±0.1 °C (95% confidence interval) of the actual warming value. The 95% confidence range becomes ±0.2 °C for the medium-precision requirement ($\sigma_M \approx 0.10$ °C) and ±0.3 °C for the low-precision option ($\sigma_L \approx 0.15$ °C).
Appendix B shows that the interval ±0.1 °C (95% confidence), which corresponds to the high-precision option, $\sigma_H \approx 0.05$ °C, should be the preferred choice for the acceptable uncertainty related to the internal variability that should be required of the GCMs because it could be derived from the variability of the temperature records themselves. Figure 5 also shows that the low-precision option $\sigma_L \approx 0.15$ °C is only consistent with the EC-Earth3 GCM. The usefulness of this model should be questioned because it hindcasts 2011-2021 global surface warming values ranging between 0.5 °C and 1.2 °C with an average of 0.82 °C. This means that EC-Earth3 is both inaccurate and imprecise in hindcasting the global surface warming from 1980 to 2021.
Figure 6 shows the combined probability density functions (PDF) and the related boxplots derived from all the GCM means reported in Figure 3 and in Table 1 with the three precision requirements for the three ECS GCM groups compared to the warming levels obtained with the five adopted temperature records. The complementary Gaussian error function was used to evaluate the relative statistical position of the five actual warming values within each probability density function.
For each model mean $\mu_m$ and precision $\sigma$, the probability $P_m$ that the GCM hindcast is larger than the measured warming $\Delta T$ is
$$P_m = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\Delta T - \mu_m}{\sqrt{2}\,\sigma}\right).$$
Thus, the mean $P_{\Delta T<GCMs} = \frac{1}{N}\sum_{m=1}^{N} P_m$ across all models for each ECS GCM group gives the probability of obtaining simulations warmer than the reference temperature value. $P_{\Delta T<GCMs}$ can also be obtained by integrating the probability density functions shown in Figures 6a-6c from the green line to infinity or by using a Monte Carlo strategy, for example by generating 1000 random values from a Gaussian distribution with mean $\mu_m$ and standard deviation $\sigma$. The relevant statistics are shown in Table 3.
Figures 6a-6c show that the GCM group with low ECS (blue curves) always produces predictions well-centered on the observed warming for the four surface temperature records because their 2011-2021 values always fall within the ±1σ statistical interval (which corresponds to the 16%-84% probability range) of the GCM distributions for the high, medium, and low precision options, respectively. However, once again, if the actual 1980-2021 warming is given by UAH-MSU-lt, even the GCM group with low ECS seems to be biased towards too hot values in 95%, 91% and 85% of possible cases, respectively, for the three precision options (Table 3).
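The probability calculation described above can be sketched as follows, assuming that each hindcast is Gaussian with mean μ_m and a common standard deviation σ, so that the chance of exceeding the observed warming follows from the complementary error function. The function names and the three model means are hypothetical; the Monte Carlo variant with 1000 draws per model mean is included only as a cross-check of the analytic value.

```python
import math
import random

def prob_warmer(mu_m, sigma, delta_t):
    """P(X > delta_t) for X ~ Normal(mu_m, sigma), via erfc."""
    return 0.5 * math.erfc((delta_t - mu_m) / (math.sqrt(2.0) * sigma))

def group_prob_warmer(model_means, sigma, delta_t):
    """Average of the per-model probabilities over one ECS group."""
    probs = [prob_warmer(mu, sigma, delta_t) for mu in model_means]
    return sum(probs) / len(probs)

def monte_carlo_prob_warmer(model_means, sigma, delta_t, draws=1000, seed=0):
    """Same quantity estimated from `draws` Gaussian samples per model mean."""
    rng = random.Random(seed)
    hits = 0
    for mu in model_means:
        for _ in range(draws):
            if rng.gauss(mu, sigma) > delta_t:
                hits += 1
    return hits / (draws * len(model_means))

# Hypothetical model means (°C), for illustration only.
means = [0.55, 0.70, 0.85]
analytic = group_prob_warmer(means, sigma=0.10, delta_t=0.58)
sampled = monte_carlo_prob_warmer(means, sigma=0.10, delta_t=0.58)
print(f"analytic: {analytic:.3f}, Monte Carlo: {sampled:.3f}")
```

Averaging over model means rather than over individual members gives each model equal weight, regardless of how many simulations it contributes to the archive.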
The predictions of the medium (purple) and high (red) ECS GCM groups always show significant warming biases. Also, particularly for the GCM group with high ECS, the PDF appears to have two peaks, implying that the GCMs in this group are physically very different from each other because they produce very different warming hindcasts that are clustered around 0.8 °C and 1.2 °C; the warmest PDF peak is mostly due to the CanESM5 GCM.
For the high-precision requirement ($\sigma_H = 0.05$ °C), these two GCM groups produce results warmer than the observed values from a minimum of 98% to a maximum of 100% of their possible outputs, which is outside the 95% confidence interval. For the medium-precision option ($\sigma_M = 0.10$ °C), the medium and high GCM groups produce results warmer than the observed values from a minimum of 93% to a maximum of 100% of their possible outputs, which is at the limit of the 95% confidence interval. For the low-precision option ($\sigma_L = 0.15$ °C), the GCM groups with medium and high ECS produce warmer results than the four surface-based temperature records in 88% to 95% of cases. Conversely, 99% or more of the theoretical hindcasts of the GCM groups with high and medium ECS would be warmer than UAH-MSU-lt even for the low-precision option ($\sigma_L = 0.15$ °C).
The boxplots illustrated in Figures 6d-6f were obtained using the Monte Carlo strategy proposed above, which simulates 1000 randomly distributed outputs for each of the 143 model averages for each of the three precision options (for a total of 3 × 143,000 theoretical hindcasts). The three panels show that in all cases, with respect to the observed temperature values, the groups with medium and high ECS are well outside the 68% confidence interval (i.e. the ±1σ interval). Furthermore, the GCM groups with medium and high ECS indicate levels of warming that are respectively 30% and 50% greater than those actually observed and, consequently, their accuracy is rather low. The accuracy of the low-ECS GCM group is good compared to the surface-based temperature records, but it still reports average warming that is about 30% larger than that reported by the satellite temperature record. The whisker extension of the boxplots shows that the precision of the ECS groups varies from modest (±0.2 °C) for the low-ECS group under the high-precision option to very poor (±0.5 °C) for the high-ECS group under the low-precision option.
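The double-whisker boxplot levels mentioned above (2.5%, 16%, 25%, 50%, 75%, 84% and 97.5%) can be obtained from pooled Monte Carlo samples as in the following sketch. It assumes NumPy is available; the function name `pooled_percentiles` and the three model means are hypothetical and do not reproduce the values in the figures.

```python
import numpy as np

LEVELS = [2.5, 16, 25, 50, 75, 84, 97.5]

def pooled_percentiles(model_means, sigma, draws=1000, seed=0):
    """Draw `draws` Gaussian samples per model mean, pool them, and
    return the percentiles used for the double-whisker boxplots."""
    rng = np.random.default_rng(seed)
    means = np.asarray(model_means, dtype=float)
    # One row of samples per model mean, all with the same sigma.
    samples = rng.normal(loc=means[:, None], scale=sigma,
                         size=(means.size, draws))
    values = np.percentile(samples.ravel(), LEVELS)
    return dict(zip(LEVELS, values))

# Hypothetical model means (°C) with the high-precision option sigma = 0.05.
stats = pooled_percentiles([0.50, 0.60, 0.70], sigma=0.05)
for level, value in stats.items():
    print(f"{level:5.1f}%  {value:.3f} °C")
```

The 16% and 84% levels bracket the ±1σ range and the 2.5% and 97.5% levels bracket the 95% range of the pooled distribution, which is what the outer and inner whiskers are meant to convey.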

Testing the land versus the ocean warming
Surface-based temperature records imply that the GCM group with low ECS performs better than those with medium and high ECS, which suggests that the most likely ECS value should be equal to or lower than 3 °C. However, UAH-MSU-lt implies that even the low-ECS GCMs may perform quite poorly. The observed discrepancy between the surface and satellite temperature records may be due to the presence of various non-climatic warming biases in the surface temperature records (Connolly et al., 2021; D'Aleo, 2016; Scafetta, 2021a; Watts, 2022). This problem is now investigated by comparing the GCM hindcasts against the land and the ocean temperature observations.

Figures 7c and 7d show that ERA5-T2m and HadCRUT5 (infilled data) are global because they adopt interpolations of meteorological models to extend coverage also to data-sparse regions of the globe such as the poles and other uninhabited areas (large deserts and forests). Figures 7e and 7f show that the GISTEMP and NOAAGlobTemp records do not cover large areas; in particular, the polar regions are poorly represented. Figure 7b is characterized by lighter colors than the other temperature panels, which means that the UAH-MSU-lt temperature record shows less warming than the surface-based temperature records almost everywhere. All six panels in Figure 7 also show that the land area has warmed more than the ocean region. In any case, Figures 7c-7f show that the surface temperature records present a greater temperature difference between the land and the ocean regions. The visual comparison with the CMIP6 ensemble average simulation (Figure 7a) suggests the same general pattern but, in addition, the oceanic area appears slightly warmer than in all five temperature records. The temperature records also show extensive ocean areas where significant cooling is observed, such as around Antarctica, the eastern equatorial Pacific, the North Atlantic and a few other regions.
These cooling regions reveal interesting dynamic patterns that are not captured by the average simulation of the CMIP6 ensemble. These patterns are best emphasized in the areal t-test proposed in Scafetta (2022).

Table 4 also reports the ratios between the land and the ocean warming levels. The area 0°N:80°N shows that from 1980-1990 to 2011-2021 the surface temperature records warmed on average by about 0.32 ± 0.05 °C more than the satellite-based UAH-MSU-lt record, while in the area 80°S:80°N the surface-based records warmed on average by about 0.15 ± 0.02 °C more than the satellite record. A similar warming bias on land also appears in the Southern Hemisphere (60°S:0°S) because the surface-based temperature records show ocean warming averaging 0.05 ± 0.03 °C less than the satellite record while their land area warmed by 0.08 ± 0.03 °C more.

Figure 8 shows the results for each GCM (using the 143 GCM average simulations available for each SSP) for the latitudinal interval 60°S:80°N, which is optimally covered by all temperature records and includes all continents except Antarctica. The results are also reported in Tables 5, 6, and 7 and could be used to evaluate possible anomalous temperature trends on the continents. Figure 8d shows the relationship between average warming on the land and the ocean areas. The mean land/ocean ratio for the vast majority of the models is 1.75 ± 0.20, which lies between the results obtained for the surface temperature records (ranging from 1.95 to 2.32) and that of the satellite temperature record, which gives 1.51.
The results shown in Figure 8 can be interpreted as follows.
1) Figure 8b shows that the land surface temperature records are on average 0.4 °C warmer than the satellite-based one. On the contrary, Figure 8c shows that the surface-based ocean temperatures are on average up to a maximum of 0.1 °C warmer than the satellite ones.
2) Therefore, it can be assumed that on the ocean, the satellite-based temperature record is sufficiently compatible with the surface-based ones. If so, the large divergence observed on land between surface and satellite recordings could suggest that the land measurements are significantly contaminated by non-climatic warming biases, including those related to urbanization (cf.: Connolly et al., 2021; D'Aleo, 2016; Scafetta, 2021a; Watts, 2022).
3) A similar conclusion would also be indirectly supported by the GCM hindcasts, which show that the CMIP6 models are usually unable to correctly reconstruct the large land/ocean temperature ratio observed in the surface temperature records. In fact, the models give a land/ocean ratio equal to 1.75 ± 0.20, while the surface records give ratios between 1.95 and 2.32.
4) However, as the GCMs attempt to reconstruct the global surface warming of the surface temperature records even though they cannot adequately explain their large land/ocean warming ratio, the models could have calibrated internal parameters to obtain a compromise that attempts to approximate the global surface warming by simulating a warmer ocean and a cooler land than observed.
If point 4 above is correct, the reliability of the low-ECS GCMs should also be questioned. In fact, Figure 8d shows contradictory results regarding the low-ECS GCMs because some models agree better with the surface-based temperature records, a few others agree better with the satellite temperature record, while the rest report land/ocean ratios between the two levels, as the vast majority of the medium and high ECS models do.
We now assume that the GCMs' predicted land/ocean temperature ratio (average ratio = 1.75 ± 0.20) corresponds to the actual physical characteristics of the climate system and that the ocean warming of the surface records (on average, 0.43 ± 0.03 °C, see Table 4) is sufficiently accurate. If so, from 1980-1990 to 2011-2021 the land surface within the latitude interval 60°S:80°N should have warmed by 0.75 ± 0.1 °C (that is, 0.43 °C × 1.75) instead of the observed 0.93 ± 0.03 °C. If the hypothesis is correct, the spurious warming of the land surface due to uncorrected non-climatic warming biases could be quantified as approximately +0.2 °C. The proposed correction implies that global surface warming from 1980-1990 to 2011-2021 could be at least about 0.05 °C (∼10%) lower than what the surface-based records report, which further increases the warming bias of the medium and high-ECS GCMs observed in Figures 1-6.

(Table caption, beginning missing in the source) 1980-1990 to 2011-2021 over land+ocean (total), land, ocean, and land/ocean ratio. See Figure 8.
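The land correction estimated above can be checked with a few lines of arithmetic. The sketch below assumes that the 0.75 °C and 0.93 °C values refer to land warming and that the two uncertainties are independent and can be combined in quadrature; the latter is an assumption of this example, not a statement of the method actually used.

```python
import math

# Values quoted in the text (60°S:80°N, 1980-1990 to 2011-2021).
ocean_warming, ocean_err = 0.43, 0.03   # °C, surface records over the ocean
ratio, ratio_err = 1.75, 0.20           # land/ocean ratio hindcast by the GCMs
observed_land = 0.93                    # °C, observed by the surface records

# Land warming implied by the model ratio and the observed ocean warming.
expected_land = ocean_warming * ratio

# Assumption: independent relative errors combined in quadrature.
relative_err = math.sqrt((ocean_err / ocean_warming) ** 2 + (ratio_err / ratio) ** 2)
expected_land_err = expected_land * relative_err

# Possible spurious land warming in the surface records.
spurious = observed_land - expected_land

print(f"expected land warming: {expected_land:.2f} ± {expected_land_err:.2f} °C")
print(f"possible spurious land warming: {spurious:.2f} °C")
```

Under these assumptions the result is consistent with the 0.75 ± 0.1 °C and approximately +0.2 °C values given in the text.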
The results depicted in Figure 8 also help to better evaluate the individual GCMs. For example, Figure 5 suggests that three high-ECS models (CNRM-CM6-1-f2, CNRM-ESM2-1-f2 and CIESM) produce warming relatively close to that reported by the surface-based temperature records. However, Figure 8d indicates that the same models fail to reproduce the land/ocean temperature ratio of the same temperature records, showing significantly lower (CNRM) or higher (CIESM) values than reported. Therefore, it appears that in these GCMs the biases that occur in some regions are offset by opposite biases that occur in other regions.
The last four columns of Table 4 report the global (land+ocean) and land warming calculated assuming that the ocean warming of the temperature records is correct and that the land/ocean warming ratios hindcasted by the models are correct as well. The global estimate was calculated from the ocean and the land ones weighted by their relative area percentages within each latitudinal range. In particular, we found that for the Northern Hemisphere (0°N:80°N), the land could have warmed about 0.087 °C less than what is reported on average by HadCRUT5, ERA5-T2m, and GISTEMP. This bias roughly corresponds to the different warming estimated in Connolly et al. (2021) for the Northern Hemisphere land area by comparing the temperature records reconstructed by using both urban+rural stations and rural-only stations, which should present significantly mitigated non-climatic warming biases.

            Hist + SSP1-2.6   Hist + SSP2-4.5   Hist + SSP3-7.0   Hist + SSP5-8.5
1980-1990   0.29 ± 0.13       0.29 ± 0.14       0.30 ± 0.13       0.25 ± 0.13
2011-2021   0.68 ± 0.13       0.68 ± 0.15       0.70 ± 0.13       0.60 ± 0.14
2040-2060   1.02 ± 0.14       1.17 ± 0.17       1.32 ± 0.14       1.33 ± 0.14
2080-2100   1.02 ± 0.14       1.55 ± 0.17       2.20 ± 0.16       2.53 ± 0.16

Table 8 Low-ECS GCMs: global surface warming in the periods 1980-1990, 2011-2021, 2040-2060, and 2080-2100 using the simulations depicted in Figure 9. Original GCM simulations; (RF = 0.90) GCM simulations reduced by 10%; (RF = 0.67) GCM simulations reduced by 33%. The temperature anomaly is relative to the 1850-1900 average of the corresponding model set. The temperature records are baselined with the model simulations in 1980-1990.
In conclusion, the proposed land-ocean comparison suggests that the surface-based temperature records most likely exhibit non-climatic warming biases and that the actual global surface warming from 1980-1990 to 2011-2021 may have been approximately between 0.50 °C and 0.55 °C, which is approximately 10% lower than what is reported in Section 2. This means that the medium and high-ECS GCM groups are further confirmed to run too hot and that the low-ECS group performs slightly worse than concluded in Section 3 because the average warming of its hindcasts from 1980-1990 to 2011-2021 is approximately 0.6 °C (Table 1). However, if UAH-MSU-lt reproduces the global surface warming more accurately, the surface-based temperature records would exhibit a warming bias of up to 30% of the reported values. This would indicate that even the low-ECS GCMs run significantly too hot and need to be scaled down by 33% to reduce their mean warming from 0.6 °C to 0.4 °C, which is the warming reported by the satellite-based measurements. Indeed, further indirect evidence that the land surface temperature records could be affected by a significant warming bias is given by the divergence observed between instrumental and dendroclimatological proxy temperature records over the past 50 years, where the former show a warming trend significantly higher than the latter (Büntgen et al., 2021; Esper et al., 2018; Scafetta, 2021a).

Climate change expectations for the 21st century
Climate impacts several areas of economic and environmental importance and its changes may require the implementation of various adaptation policies. However, climate change could also adversely affect some of the Earth's climate systems such as in areas of water scarcity, coastal communities, natural ecosystems and others (IPCC, 2022). It is reasonable to assume that if climate change is too rapid and too significant, different areas could reach a point of vulnerability where adaptation will no longer be sufficient to avoid serious adverse effects. However, adaptation policies are much more affordable than mitigation ones, so the risks associated with possible future climate change should not be overestimated.
The IPCC (2021) used the CMIP6 GCMs and various scenarios of global socioeconomic change projected up to 2100 to produce hypothetical climate change storylines for the 21st century. Four SSP scenarios were studied here: SSP1-2.6 (low GHG emissions in which CO2 emissions are reduced to zero around 2075); SSP2-4.5 (intermediate GHG emissions in which CO2 emissions remain around the current rate until 2050, and then decrease without reaching net zero by 2100); SSP3-7.0 (high GHG emissions where CO2 emissions double by 2100); and SSP5-8.5 (very high GHG emissions where CO2 emissions triple by 2075).
The IPCC (2022) states that if the global surface temperature rises significantly above 2 °C over the next few decades compared to the pre-industrial period (1850-1900), adaptation policies may not be sufficient to reduce high risks related to climate change. Aggressive climate mitigation policies should therefore be implemented because the CMIP6 GCMs predict that the temperature will likely increase between 2 °C and 3 °C (compared to 1850-1900) by 2050 if anthropogenic greenhouse gas emissions are not significantly reduced as soon as possible.
However, in the previous sections we found that only the GCM group with low ECS, which is also the one predicting less warming, optimally reproduces the observed warming from 1980-1990 to 2011-2021 reported by the surface-based global temperature records. Therefore, its scenario forecasts for the 21st century should be preferred for policy. Furthermore, we also found that global warming from 1980-1990 to 2011-2021 reported by the surface temperature records may need to be reduced on average by about 10% assuming that the ocean warming is correct and that the correct land/ocean temperature ratio is the one predicted by the models. Finally, if UAH-MSU-lt better reproduces the actual warming from 1980-1990 to 2011-2021, even the simulations of the low-ECS GCMs would be running too hot and the warming they produce would need to be reduced by 33% to optimally accommodate the observations. Here, we show and discuss how the climate could change in the 21st century under the above assumptions. Figure 9 shows the simulations produced by the low-ECS GCMs from 1980 to 2100 using the historical + SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 scenarios in three conditions: panels a1-a8 show the original GCM simulations versus HadCRUT5; panels b1-b8 show the GCM simulations reduced by 10% versus NOAAGlobalTemp v5; and panels c1-c8 show the GCM simulations reduced by 33% versus UAH-MSU-lt v6.

Fig. 9
Low-ECS GCM simulations from 1980 to 2100 using historical + SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 scenarios: (a1-a8) original GCM simulations versus HadCRUT5; (b1-b8) GCM simulations reduced by 10% versus NOAAGlobalTemp v5; (c1-c8) GCM simulations reduced by 33% versus UAH-MSU-lt v6. The ordinates represent the temperature anomaly relative to the 1850-1900 average of the corresponding model set. The temperature records are baselined with the GCM simulations in 1980-1990.
The ordinates represent the temperature anomaly relative to the 1850-1900 average of the corresponding GCM set. The temperature records are baselined with the model simulations in 1980-1990. Table 8 reports the global surface warming forecasts produced by the low-ECS GCMs in the periods 1980-1990, 2011-2021, 2040-2060 and 2080-2100 in the same three conditions. The analysis shows that the expected warming of the low-ECS GCMs by 2040-2060 is close to 2 °C also for the SSP3-7.0 and SSP5-8.5 scenarios, which Hausfather and Peters (2020) described as "unlikely" and as "highly unlikely", respectively. However, if the surface temperature records contain a warming bias and, therefore, the GCM simulations need to be scaled down to better agree with the actual warming, the projected warming for 2040-2060 could be lower (or even significantly lower if UAH-MSU-lt v6 is correct) than 2 °C also for the SSP3-7.0 and SSP5-8.5 scenarios.

Table 9 Warming means and trends in various periods for HadCRUT3 (1850), HadCRUT4 (1850), HadCRUT5 (non-infilled data, 1850 and HadCRUT5 (filled data, 1850-2021) global surface temperature records. See Figure 10. Column headers: temperature anomaly (°C, relative to 1850-1900) and trend (°C/year) for the periods 1960-1980, 1980-1990, 1990-2000, 2000-2010, 2004-2014, 2011-2021, 2000-2014, and 2000-2021.
There is indirect evidence that the surface-based temperature reconstructions could be affected by non-climatic warming biases. In fact, compared to the 1850-1900 mean, the average warming of 1980-1990 is 0.54 °C for HadCRUT5 (infilled data), 0.48 °C for GISTEMP v4 (using the period 1880-1900) and 0.47 °C for NOAAGlobalTemp v5 (using the period 1880-1900). However, the low-ECS GCMs give a slightly lower 1980-1990 warming, which is 0.41 ± 0.20 °C by averaging all GCM simulations, although they better hindcast the 1980-2021 warming. In contrast, the medium and high ECS GCMs give 0.48 ± 0.27 °C and 0.47 ± 0.23 °C, respectively, which better fit the climate records; but then these same GCMs fail to hindcast the observed warming from 1980 to 2021.
The above findings suggest that climate change will likely be moderate over the next few decades and that adaptation policies should be sufficient to manage any adverse effects that may occur.
Furthermore, the warming hindcasted by the low-ECS models from 1850-1900 to 1980-1990 would be lower than 0.41 ± 0.20 °C if the climate simulations produced by them were to be scaled down. This evidence would suggest that the homogenization adjustments recently applied to climate data in an attempt to remove their non-climatic biases may have been inadequate and may have added or left spurious warming. For example, the continuous homogenization adjustments made to the surface-based temperature records during the last 10 years may have improperly cooled the raw temperature data of the past for many land stations (D'Aleo, 2016) and, simultaneously, may have improperly increased the warming trend from the 1970s to the present and, in particular, that of the period 2000-2021 (Connolly et al., 2021; Scafetta, 2021a; Watts, 2022).
In fact, the scientific literature has indicated the period 2000-2014 as a "hiatus" or "pause" in global warming (IPCC, 2013) because all surface and satellite climate records available before 2014 (e.g. HadCRUT3, which was discontinued in 2014, Brohan et al. 2006) showed more than a decade of relatively little change. Later, however, new versions of the surface temperature records were published (e.g. HadCRUT4 and later HadCRUT5 non-infilled and infilled data) and the 2000-2014 "pause" has gradually disappeared because, from one record version to the next, it has been replaced by an increasingly strong warming trend; e.g. the trend was 0.03 °C/decade for HadCRUT3, 0.08 °C/decade for HadCRUT4, 0.10 °C/decade for HadCRUT5 non-infilled data, and 0.14 °C/decade for HadCRUT5 infilled data. See Figure 10 and Table 9. Yet, the 2000-2014 global warming "hiatus" is still visible in the UAH-MSU-lt v6 record, which shows a 2000-2014 warming trend of 0.012 °C/decade (Figure 2).
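A decadal trend of the kind quoted above can be estimated with an ordinary least-squares line. The sketch below is only illustrative: the text does not state which fitting method was used, so the least-squares fit is an assumption, the function name `trend_per_decade` is hypothetical, and the annual series is synthetic rather than taken from any of the temperature records.

```python
import numpy as np

def trend_per_decade(years, anomalies):
    """Least-squares linear trend of a temperature series, in °C/decade."""
    slope_per_year, _intercept = np.polyfit(years, anomalies, 1)
    return 10.0 * slope_per_year

# Synthetic annual anomalies for 2000-2014 (NOT real data): a pure linear
# rise of 0.014 °C/year, i.e. 0.14 °C/decade.
years = np.arange(2000, 2015)
synthetic_anomalies = 0.45 + 0.014 * (years - 2000)

print(f"trend: {trend_per_decade(years, synthetic_anomalies):.3f} °C/decade")
```

Applying the same function to different versions of a record over the same 2000-2014 window would give the kind of version-to-version comparison discussed in the text.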

Conclusion
Here I tested how well the CMIP6 GCMs, grouped into low, medium and high ECS subgroups, hindcast the global surface temperature warming from 1980-1990 to 2011-2021 reported by four surface temperature records (ERA5-T2m, HadCRUT5, GISTEMP v4, and NOAAGlobTemp v5) and by the satellite-based UAH-MSU-lt v6 temperature record. The latter was used as the lowest possible estimate for the global surface temperature warming during the analyzed period. The rationale for adding a comparison with the lower troposphere temperature record is that surface temperatures could be affected by a significant non-climatic warming bias due, for example, to poorly corrected urban heat island effects and many other factors (Connolly et al., 2021; D'Aleo, 2016; Scafetta, 2021a; Watts, 2022). For example, indirect evidence for a significant warming bias, especially over land, may also be provided by the so-called "Divergence Problem", that is, the apparent decoupling between tree ring width chronologies and the rising temperature measurements starting from the 1970s (Büntgen et al., 2021; Esper et al., 2018; Scafetta, 2021a).
Using the 143 GCM mean simulations available for four different SSPs, all medium and high ECS models turn out to be warmer than observations. Using the 688 CMIP6 ensemble member simulations available, 94% to 100% of the simulations produced by GCMs with medium and high ECS hindcasted greater warming than the five temperature records. In contrast, the low-ECS models are statistically distributed around the observed warming values obtained from the four surface-based temperature records. However, if the UAH-MSU-lt record better represents the actual 2011-2021 warming, even the low-ECS GCM group would produce hindcasts that are too hot on average.
I also tested whether the internal variability of the models could produce results distributed around the observations. Its effect was modeled using three fixed precision options. Assuming high ($\sigma_H \approx 0.05$ °C) and medium ($\sigma_M \approx 0.10$ °C) precision, it was found that 98-100% and 92-98%, respectively, of all possible outputs from the medium and high ECS GCMs would be warmer than observations. Only the theoretical results produced by the low-ECS GCM group optimally agree with the surface-based temperature records. If the required model precision is quite low ($\sigma_L \approx 0.15$ °C), the medium and high ECS GCM groups would agree better with the data, but this agreement could still be quite unsatisfactory because 87% to 93% of their hindcasts (which is still well outside the ±1σ or 68% confidence interval) would still be too hot. In any case, the low precision option should be considered very unsatisfactory because it would allow the GCMs to deviate too much from the observations. Moreover, such poor precision would not seem consistent with the natural variability of the data, as argued in Appendix B. Figure 5 suggests that such a low precision could only occur for the EC-Earth3 GCM.
Figures 5 and 6 also show that very few GCMs with medium and high ECS could produce some simulations consistent with the actual temperature values. In particular, the two high-ECS CNRM models (Séférian et al., 2019) appear to perform better than the other models of the same group. However, as a group, the high-ECS models are physically incompatible with the low-ECS ones. Indeed, the internal parameters of the GCMs are carefully tuned to obtain results as acceptable as possible (Hourdin et al., 2017; Mauritsen et al., 2019). Therefore, the good performance of some isolated cases could hardly be used to validate the corresponding model since the tuning operations also risk masking fundamental physical problems and, therefore, the need for model and/or forcing improvements.
It was found that only the low-ECS GCM group agrees optimally with the surface-based temperature records because its full hindcast range well encompasses the actual warming values from 1980-1990 to 2011-2021. Therefore, since the three chosen ECS ranges should be considered large enough to be incompatible with each other, the GCM group with low ECS should be preferred to the other two, implying that the most likely ECS should be equal to or lower than 3 °C. This result confirms Scafetta (2022). In fact, the performance of the models seems to improve as the ECS decreases (Scafetta, 2021b).
However, the actual ECS could also be significantly lower than 3 °C if the UAH-MSU-lt record better represents the 2011-2021 surface warming. In fact, the satellite record shows that from 1980-1990 to 2011-2021 the global surface temperature may have warmed by about 0.40 °C, which is about 30% less than the 0.58 °C reported by ERA5-T2m, HadCRUT5, and GISTEMP v4. In this case, even the GCM group with low ECS would show poor accuracy in reproducing the temperature data because their average hindcast is about 0.60 °C. This means that the actual ECS could also be 33% lower than that which characterizes the low-ECS GCM group: that is, it could need to be reduced from 1.8-3.0 °C to 1.2-2.0 °C. This conclusion cannot be ruled out because: (1) the surface temperature records appear to be severely affected by non-climatic warming bias (Connolly et al., 2021; D'Aleo, 2016; Scafetta, 2021a; Watts, 2022), as the direct comparison between land and ocean warming proposed here also seems to confirm (Figure 8); (2) a number of independent studies have concluded that the ECS could be within such a low range (e.g.: Lewis and Curry, 2018; Lindzen and Choi, 2011; Scafetta, 2013; Stefani, 2021; van Wijngaarden and Happer, 2020).
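The scaling argument above can be reproduced numerically. The sketch below uses only the values quoted in the text and follows its simple proportional reasoning, in which the ECS range is rescaled by the same factor as the hindcast warming; the variable names are introduced here for illustration.

```python
# Warming from 1980-1990 to 2011-2021, as quoted in the text (°C).
satellite_warming = 0.40     # UAH-MSU-lt
surface_warming = 0.58       # ERA5-T2m, HadCRUT5, GISTEMP v4
low_ecs_hindcast = 0.60      # average hindcast of the low-ECS GCM group

# How much less warming the satellite record shows than the surface records.
deficit = 1.0 - satellite_warming / surface_warming

# Reduction factor needed to bring the low-ECS hindcast down to the satellite value.
reduction_factor = satellite_warming / low_ecs_hindcast

# Proportional rescaling of the low-ECS range (1.8-3.0 °C), as done in the text.
scaled_ecs_range = tuple(round(reduction_factor * ecs, 1) for ecs in (1.8, 3.0))

print(f"satellite deficit: {100 * deficit:.0f}%")
print(f"reduction factor: {reduction_factor:.2f}")
print(f"scaled ECS range: {scaled_ecs_range[0]}-{scaled_ecs_range[1]} °C")
```

The reduction factor of about 0.67 corresponds to the 33% reduction mentioned in the text.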
There is a third possibility which would also imply that the actual ECS should be relatively low. The climate system, in fact, appears to be also modulated by multidecadal and millennial natural oscillations such as those related to solar forcings and other astronomical ones, which are not reproduced by the GCMs (cf.: Scafetta, 2013, 2021c; Wyatt and Curry, 2014). Their presence implies that the ECS of GCMs should be at least halved (cf.: Loehle and Scafetta, 2011; Scafetta, 2012a, 2021c) and could vary approximately between 1.0 °C and 2.5 °C, as found by several independent studies (cf.: Lewis and Curry, 2018; Lindzen and Choi, 2011; Scafetta, 2013; Stefani, 2021; van Wijngaarden and Happer, 2020). If so, future climate warming and changes will be moderate and naturally oscillating (Scafetta, 2013, 2021c) and the rate of global surface warming should likely remain quite low until 2030-2040, when solar activity is expected to increase again due to its natural multi-decadal oscillations (Scafetta, 2012b; Scafetta and Bianchini, 2022, and several others).
In any case, even remaining within the theoretical framework of the CMIP6 GCMs, it should be concluded that only the low-ECS GCM group can be considered sufficiently validated by the global surface warming observed from 1980-1990 to 2011-2021. Therefore, only the 21st century climate projections produced by the low-ECS GCMs should be used for policy. For decades to come, these models predict more moderate warming than the GCM groups with medium and high ECS do for similar greenhouse gas emission scenarios. By 2050, projected warming is expected to be around 2 °C or less even for the worst greenhouse gas emission scenarios. This moderate warming should not be considered particularly alarming because the impact and risk assessments related to it are considered "moderate" assuming even low to no adaptation (IPCC, 2022). Furthermore, as surface-based temperature records are likely affected by warming biases and are characterized by natural oscillations that are not reproduced by the CMIP6 models, the global warming expected for the next few decades may be even more moderate than predicted by the low-ECS GCMs and could easily fall within a safe temperature range where climate adaptation policies will suffice. Therefore, aggressive mitigation policies aimed at rapidly and drastically reducing GHG emissions in order to avoid a too rapid rise in temperature do not seem justified, also because their costs seem to outweigh any realistic benefits (cf. Bezdek et al., 2019).
Appendix A: Evaluation of the error of the mean for temperature records
Computer simulations are made of pure numbers and their averages over a given period of time are error free. The uncertainty associated with their unforced internal variability is a different matter and will be discussed in Appendix B.
Conversely, the data points of the temperature records are affected by small statistical errors, which however are not always readily available as is the case with the ERA5-T2m record. Let's address the issue.
A generic time series $y_t$ with $t = 1, ..., N$ could be affected by Gaussian distributed uncertainties $\xi_t$ with zero mean and standard deviation $\sigma_\xi$ as

$$y_t = x_t + \xi_t,$$

where $x_t$ is the physical signal of the record. Its mean is

$$\bar{y} = \bar{x} \pm \bar{\sigma},$$

where $\bar{x}$ is the mean of the signal and $\bar{\sigma} = \sigma_\xi / \sqrt{N}$ is the error of the mean. It is important to note that $\sigma_\xi$ is the standard deviation of the uncertainties $\xi_t$, not that of the record $y_t$. For example, the ERA5-T2m 2011-2021 average (baselined to 1980-1990) is 0.578 °C, which was obtained by integrating on the globe the temperature variations that occurred in each cell of the surface grid worldwide. The standard deviation of the same record is $\sigma_{y,month} = 0.20$ °C using the monthly record and $\sigma_{y,year} = 0.18$ °C using the annual record. For random variables, the error of the mean does not depend on the time resolution of the record, that is, the monthly and yearly resolved records should give $\bar{\sigma}_{year} = \bar{\sigma}_{month}$. However, if we apply the equation $\bar{\sigma} = \sigma_y / \sqrt{N}$ from 2011 to 2021, for ERA5-T2m we get $\bar{\sigma}_{year} = 0.054$ °C (using N = 11 points) and $\bar{\sigma}_{month} = 0.017$ °C (using N = 132 points), respectively. This shows that from 2011 to 2021 the ERA5-T2m record is not composed of random variables fluctuating around an average value, but contains a physical signal.
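The mismatch between the yearly and monthly estimates quoted above follows directly from the formula for the error of the mean. A minimal check, using only the standard deviations and sample sizes given in the text (the function name `error_of_mean` is introduced here for illustration):

```python
import math

def error_of_mean(sigma, n):
    """Error of the mean of n independent values with standard deviation sigma."""
    return sigma / math.sqrt(n)

# ERA5-T2m, 2011-2021: standard deviations of the record as quoted in the text.
yearly = error_of_mean(0.18, 11)     # annual record, N = 11
monthly = error_of_mean(0.20, 132)   # monthly record, N = 12 * 11 = 132

print(f"yearly:  {yearly:.3f} °C")
print(f"monthly: {monthly:.3f} °C")
```

If the record were pure noise around a constant mean, the two numbers would agree; their ratio of about three is what indicates the presence of a physical signal.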
It can be assumed that the physical signal of ERA5-T2m is represented by the moving averages of the data at 13 months, 5 months or 3 months, while the residuals are the noise components that should be used to evaluate $\sigma_\xi$ and $\bar{\sigma} = \sigma_\xi / \sqrt{N}$. The three choices give $\bar{\sigma} = 0.01$ °C, $\bar{\sigma} = 0.006$ °C, and $\bar{\sigma} = 0.005$ °C, respectively, which suggests that the actual error of the 2011-2021 mean could be $\bar{\sigma} = 0.01$ °C or probably less. The first two examples of data decomposition are shown in Figure 11.
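The decomposition into a moving-average signal and a residual noise can be sketched as follows. This is a hedged illustration, not the exact procedure behind Figure 11: it assumes a centered moving average with an odd window, discards the incomplete windows at the edges (so the number of residuals is slightly smaller than the record length), and runs on a synthetic monthly series rather than on ERA5-T2m. The function name `residual_error_of_mean` is hypothetical.

```python
import numpy as np

def residual_error_of_mean(series, window):
    """Error of the mean estimated from the residuals of a centered moving average.

    Assumptions: `window` is odd, and edge points without a full window are dropped.
    """
    values = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    signal = np.convolve(values, kernel, mode="valid")   # smoothed "physical signal"
    half = window // 2
    residual = values[half:values.size - half] - signal  # noise component
    sigma_xi = residual.std(ddof=1)
    return sigma_xi / np.sqrt(residual.size)

# Synthetic 132-month record (NOT real data): trend + slow oscillation + noise.
rng = np.random.default_rng(0)
months = np.arange(132)
synthetic = 0.002 * months + 0.1 * np.sin(2 * np.pi * months / 40) + rng.normal(0, 0.1, 132)

for window in (13, 5, 3):
    print(f"{window}-month window: {residual_error_of_mean(synthetic, window):.4f} °C")
```

Shorter windows follow the data more closely and therefore leave smaller residuals, which is the qualitative behavior of the three estimates reported in the text.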

Fig. 11
Possible decomposition of the ERA5-T2m global surface temperature record (blue) into a signal (orange, made of 13 and 5 month moving averages) and a residual-noise (red) record that could be used to evaluate the error of the mean.
Alternatively, the statistical uncertainty associated with ERA5-T2m could be considered compatible with those explicitly provided by the other available global surface temperature records. In the case of the GISTEMP record, Lenssen et al. (2019) calculated that the resulting 95% uncertainties are near $\bar{\sigma}_{95\%,annual} \approx 0.05$ °C in the global annual mean for the last 50 years. HadCRUT5's global surface temperature record includes its 95% confidence interval estimate and, from 2011 to 2021, the uncertainties for the monthly and annual averages are $\bar{\sigma}_{95\%,monthly} \approx 0.05$ °C and $\bar{\sigma}_{95\%,annual} \approx 0.03$ °C, respectively. The Berkeley Earth land/ocean temperature record estimates $\bar{\sigma}_{95\%,monthly} \approx 0.042$ °C, $\bar{\sigma}_{95\%,annual} \approx 0.028$ °C, and $\bar{\sigma}_{95\%,decadal} \approx 0.022$ °C during the same period. Note that the error of the mean must decrease as the time scale increases. Therefore, adopting the equation $\bar{\sigma}_{95\%} = 1.96 \times \sigma_\xi / \sqrt{N}$, the probable error for the 2011-2021 mean could be about 0.01 °C or even smaller. In fact, using the above estimates, we obtain: 0.05/√11 = 0.015 °C, 0.05/√132 = 0.0043 °C, 0.03/√11 = 0.009 °C, 0.042/√132 = 0.004 °C, and 0.028/√11 = 0.008 °C, respectively, the mean of which is approximately 0.008 °C. Alternatively, the 95% uncertainty over the period 2011-2021 cannot be greater than about ±0.02 °C, as explicitly reported by the Berkeley Earth land/ocean temperature record for the ten-year scale.
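The five estimates and their mean quoted above can be verified with a short calculation. The labels attached to each pair are an interpretation of the order in which the values appear in the text.

```python
import math

# (95% uncertainty in °C, number of points N in 2011-2021), in the order used in the text.
estimates = [
    (0.05, 11),     # GISTEMP, annual
    (0.05, 132),    # HadCRUT5, monthly
    (0.03, 11),     # HadCRUT5, annual
    (0.042, 132),   # Berkeley Earth, monthly
    (0.028, 11),    # Berkeley Earth, annual
]

errors = [uncertainty / math.sqrt(n) for uncertainty, n in estimates]
mean_error = sum(errors) / len(errors)

for (uncertainty, n), err in zip(estimates, errors):
    print(f"{uncertainty}/sqrt({n}) = {err:.4f} °C")
print(f"mean: {mean_error:.3f} °C")
```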
Therefore, various methodologies suggest that the uncertainty of the temperature means in the 11-year period from 2011 to 2021 is very small, around ±0.01 °C at 95% confidence, and can be safely ignored as done, for example, in Scafetta (2022).

Appendix B: Provisional assessment of an acceptable internal variability of the models
Climate models are unable to accurately simulate temperature records due to various uncertainties. For example, merely varying the initial conditions yields different climatic trajectories that evolve around an ideal average with a certain variance.
Each GCM is characterized by its own internal variability. However, in the main text, we argued that such uncertainty could not be arbitrarily large because the models must be both accurate and precise. Figure 12 explains the concepts of "accuracy", which measures the proximity of the model hindcasts to the real value, and of "precision", which indicates the proximity of the same hindcasts to each other.
An acceptable range for the distribution of hindcasts related to the internal variability of the models could be assessed by observing that temperature fluctuations at time scales shorter than, for example, 15 years (which include the climatic oscillations of the ENSO and those of the 11-year solar cycle) may not be adequately predicted by the models. From the point of view of the models, the temperature fluctuations at those time scales could be considered stochastic and their standard deviation from the mean could represent the acceptable range that can be associated with the unforced internal variability of the models (cf. Knight et al., 2009). Figure 13 shows the 15-year high-pass filter curves of the HadCRUT5, ERA5-T2m, GISTEMP v4, NOAAGlobTemp v5, and UAH-MSU-lt v6 temperature records from 1980 to 2021. Using the standard deviation σ for each record, the error of the mean on 11-year intervals (e.g. from 2011 to 2021) at the 95% confidence is $\bar{\sigma}_{95\%} = 1.96 \times \sigma / \sqrt{11}$, that is: $\bar{\sigma}_{95\%} = 0.080$ °C for HadCRUT5; $\bar{\sigma}_{95\%} = 0.094$ °C for ERA5-T2m; $\bar{\sigma}_{95\%} = 0.079$ °C for GISTEMP v4; $\bar{\sigma}_{95\%} = 0.075$ °C for NOAAGlobTemp v5; and $\bar{\sigma}_{95\%} = 0.104$ °C for UAH-MSU-lt v6.
Based on the above assumptions, over an 11-year period, the uncertainty of the 2011-2021 warming compared to 1980-1990 could be estimated at approximately ±0.1 °C at the 95% confidence, which corresponds to the high precision option ($\sigma_H = 0.05$ °C) discussed in Section 3. The high precision option should not be interpreted as the actual dispersion produced by each GCM, which varies greatly from model to model, but only as the acceptable uncertainty that a GCM should exhibit in reproducing the warming from 1980-1990 to 2011-2021. In our case, a ±0.1 °C error would correspond to about ±17% of the actual warming from 1980 to 2021, which can be considered a reasonable error.
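The correspondence between the ±0.1 °C interval and the high precision option can be checked as follows. The observed warming of 0.58 °C is taken from the value reported in the main text for ERA5-T2m, HadCRUT5 and GISTEMP v4; using it as the reference for the relative error is an assumption of this sketch.

```python
# 95% half-width of the acceptable uncertainty, as estimated above (°C).
half_width_95 = 0.1

# Equivalent one-sigma value for a Gaussian distribution (95% ≈ ±1.96 sigma).
sigma_high_precision = half_width_95 / 1.96

# Relative size of the error with respect to the observed warming (assumed 0.58 °C).
observed_warming = 0.58
relative_error = half_width_95 / observed_warming

print(f"sigma: {sigma_high_precision:.3f} °C")
print(f"relative error: {relative_error:.2f} (about {100 * relative_error:.0f}%)")
```

The one-sigma value of about 0.05 °C matches the high precision option, and the ratio of about 0.17 corresponds to roughly 17% when expressed as a percentage.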

Conflict of Interest
The author declares no conflict of interest.