1 Introduction

In recent decades, the global temperature has been steadily rising. Between 2011 and 2020, the global average temperature increased by 1.09 °C compared to the preindustrial period (1850–1900), primarily due to human activities (IPCC 2021). The IPCC AR6 Working Group I pointed out that, irrespective of emission scenarios, the temperature rise is inevitably poised to exceed the 1.5 °C threshold within 2020 to 2040. In fact, as of July 2023, global temperatures have surpassed the preindustrial levels by approximately 1.5 °C, marking it as the highest global monthly average temperature in recorded human history (Climate Change Service 2023). Globally, human activities have resulted in an increased frequency and intensity of extreme weather events and climate incidents (Dong et al. 2020, 2022; Engdaw et al. 2023; Jiang et al. 2022; Lu et al. 2016; Madakumbura et al. 2021; Xu et al. 2022; Yin et al. 2017). This has escalated the risks of heatwaves, droughts, and floods, with profound implications for ecosystems, human health, and socioeconomic well-being (Fu and Wen 2002; Williams et al. 2015). As the warming trend continues to intensify, extreme climate events will pose critical threats to food and water security. Moreover, extreme events such as severe heatwaves, floods, droughts, and wildfires have become increasingly common in China. Beyond the threshold of temperature rise, the intensity and scope of these extreme climate events are bound to escalate further, with far-reaching consequences.

China’s climate is significantly influenced by the East Asian monsoon and the Qinghai-Tibet Plateau, resulting in large variability and a wide range of extreme weather and climate events, often leading to frequent disasters (Chen et al. 2023; Cui et al. 2019; Duan and Wu 2005; Liu et al. 2023). In recent years, China has experienced multiple record-breaking extreme climate events. For instance, in the summer of 2023, a historically rare heavy precipitation event occurred in most parts of North China, with the maximum accumulated rainfall reaching 1003 mm in Licheng County, Xingtai, Hebei Province (National Climate Center of China 2023). In the summer of 2022, an extremely prolonged heatwave event affected the central and eastern regions of China, with the overall intensity of the heatwave being the strongest since 1961 (Ma and Yuan 2023). In 2021, Zhengzhou, Henan Province, experienced an unprecedented extreme rainfall event, with a maximum daily precipitation of 624 mm, close to the annual average precipitation of the station (641 mm), resulting in significant loss of life and extensive economic damage (Zhang et al. 2023). In the future, anthropogenic climate change will increase the occurrence of extreme weather and climate events in China, heightening the country’s exposure to climate risks and leading to a lock-in effect (IPCC 2022). Therefore, it is essential to conduct high-precision estimations of the future evolution of extreme climate events in China and take proactive measures in advance.

Climate system models not only have the capability to simulate historical changes in global climate but also provide estimations of future variations, serving as crucial research tools for climate change detection and attribution. The previous CMIP5 models have demonstrated the ability to replicate the increasing spatial distribution of precipitation in China from northwest to southeast. However, they tend to underestimate precipitation in the coastal regions of southern China while overestimating precipitation in the northern part of the country and the Qinghai-Tibet Plateau (Chen et al. 2014). Additionally, they have shown good performance in simulating the climatic means and trends of precipitation extremes in China (Chen and Sun 2015). The simulations of precipitation extremes in the CMIP5 models have revealed wet deviations in western and northern China, accompanied by dry biases in southeastern China, as reported by Jiang et al. (2015). The sixth phase of the International Coupled Model Intercomparison Project (CMIP6) has witnessed the largest participation of models among all previous phases, providing critical data support for the IPCC-AR6 report. Compared to the climate models used in the previous phase, CMIP6 models feature higher resolutions, more complex and refined physical processes, and utilize the latest shared socioeconomic pathways (SSPs) for future projections. Compared to the CMIP5 models, the CMIP6 models exhibit substantial improvements in simulating the dry bias in southern China and show compelling enhancements in capturing the climatological characteristics of extreme precipitation events (Chen et al. 2020; Xu et al. 2021; Zhu et al. 2021). CMIP6 models also perform better in simulating extreme temperature events in China than CMIP5 models (Yang et al. 2023). 
However, they still encounter difficulties in reproducing the spatial patterns of certain extreme temperature indices, such as TX90P (days with maximum temperature exceeding the 90th percentile) and heatwave events (Hirsch et al. 2021; Zhu et al. 2020). The differences in simulations between the two generations of models are mainly attributed to the upgraded physical schemes in the CMIP6 models (Zhu et al. 2020). Zhu et al. (2021) found that the spatial distributions of extreme indices in the 21st century projected by the CMIP5 and CMIP6 models are generally consistent. The projected results of the CMIP6 models indicate a significant increase in the TXX index (annual maximum of the daily maximum temperature) over northern China, while the southern region is expected to experience the most pronounced increases in TX90P and WSDI (warm spell duration index) (Zhang et al. 2021). In the future, apart from a decreasing trend in the CDD (consecutive dry days) index, various regions across China are expected to experience a significant increase in total precipitation, maximum consecutive 5-day precipitation, and the number of heavy rainfall days. Moreover, the CMIP6 simulations also indicate a greater increase in total precipitation and extreme precipitation over the Yangtze River basin and areas near 40°N (Zhu et al. 2021).

Although the CMIP6 models have shown significant improvements in simulating performance compared to previous model generations, there is still considerable uncertainty in the simulations and projections (Wei et al. 2023; Xu et al. 2021). This primarily stems from the models’ inadequate representation of large-scale atmospheric circulation fields, deficiencies in physical parameterization schemes, challenges in accurately reproducing aerosol-radiation interactions and anthropogenic aerosol-cloud interactions’ radiative forcing, and uncertainties in emission scenarios (Fan et al. 2022; Huang et al. 2013; Van Vuuren et al. 2011; Zhang et al. 2021; Zhou et al. 2014). In addition, the coarse resolution of the models poses challenges in capturing local-scale climate responses and hinders their application at the regional level. Therefore, it is imperative to apply correction and downscaling techniques to the model output before utilizing it. Both dynamic and statistical downscaling techniques have been used extensively to refine the spatial resolution of climate simulations and to bridge the gap between coarse global climate models (GCMs) and local-scale climate information (Tang et al. 2016). However, the application of dynamical downscaling is often limited by its significant computational requirements and the complex thermodynamic equations and computational processes needed to generate high-resolution climate outputs (Han and Wei 2010; Xu et al. 2018). In contrast, statistical downscaling methods provide an alternative approach by establishing empirical relationships between GCM outputs and local observations of climate variables, bypassing the need for complex physical equations. This simplicity and effectiveness have made statistical downscaling a widely used technique in a variety of climate-related studies (Yao et al., 2019; Yang and Tang 2023), providing valuable insights into regional adaptation strategies at finer scales.

Given China’s diverse topography and climatic zones and its vast population, the nation is highly sensitive and vulnerable to climate extremes, which can have significant socio-economic and environmental repercussions (Wang et al. 2020). The motivation for this study therefore stems from the critical need to understand how future climate change may alter the occurrence and severity of extreme climate events across China. The National Aeronautics and Space Administration (NASA) has initiated a global daily data downscaling program, which has downscaled the historical and projection experiments of 21 CMIP5 models. This program provides high-resolution downscaled data at the global scale while also correcting biases in both historical and projection experiments. These downscaled data exhibit excellent modeling capabilities for extreme climate events over China, offering more climate change information at the regional scale and reducing the uncertainty range of extreme precipitation events (Chen et al. 2017; Zhou et al. 2018). Recently, NASA released the latest version of these data (NEX-GDDP-CMIP6), which is based on the downscaling and correction of daily values from CMIP6 models (Thrasher et al. 2022). The NEX-GDDP-CMIP6 models provide a valuable tool for assessing these changes. By evaluating the performance of these models in simulating historical extreme climate indices and projecting future changes, this research provides a robust basis for climate risk assessment and informs decision-making processes related to climate change adaptation and mitigation.

This paper is organized as follows: Sect. 2 describes the observations, NEX-GDDP-CMIP6 data, methods, and study area used in this study. Section 3 evaluates the simulations of the 26 NEX-GDDP-CMIP6 models and their multi-model ensemble (MME) and explores the future changes in climate extremes projected by the MME, together with the agreement among models. The conclusion and discussion are given in Sect. 4.

2 Materials and methods

2.1 Datasets

The CN05.1 dataset is a comprehensive collection of daily observations that have been acquired since 1961 from more than 2,400 ground meteorological stations scattered across various regions within China. Employing the anomaly approach, distinct climate variables’ climatic and anomaly fields were independently interpolated and subsequently integrated to generate gridded data (Wu and Gao 2013). The spatial resolution of this dataset is precisely defined as 0.25° × 0.25°. The daily maximum temperature and precipitation of this dataset were used to evaluate the climate models’ simulation capabilities.

The model data utilized in this study incorporate daily maximum temperature and precipitation data under the historical experiment of 26 CMIP6 models from the Global Daily Downscaled Projections dataset by the National Aeronautics and Space Administration (NASA), as well as the model projections under the SSP1-2.6, SSP2-4.5, and SSP5-8.5 emission scenarios (Thrasher et al. 2022). These scenarios represent a low-emissions “taking the green road” forcing, an intermediate “middle of the road” forcing and a high-emissions “fossil-fueled development” forcing, respectively. The spatial resolution of the dataset is 0.25° × 0.25° (Table 1).

The BCSD method utilized in NEX-GDDP-CMIP6, which integrates bias correction and spatial disaggregation, was applied to refine the CMIP6 historical and future projections. The core algorithm of bias correction remains consistent, but the model adapts scaling factors to the specific climatic characteristics of each region. This ensures that the model’s outputs align more closely with local climate records. Rather than using the same parameters statically for different regions, the model dynamically adjusts them based on local climatology and scaling factors derived from the bias-corrected GCM data and observational datasets. This dynamic adjustment allows the model to accurately represent the unique climatic conditions of each region in China. Capturing the complex topography and diverse climate systems across China is particularly important. This approach of resolution improvement and observational adjustment enhances the performance of the NEX-GDDP-CMIP6 compared to raw CMIP6 models. The dataset can be downloaded from the following website: https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp-cmip6.
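As a rough illustration of the bias-correction step only, the sketch below applies empirical quantile mapping, one common form of distribution-based correction, to a synthetic series. It is not the NEX-GDDP-CMIP6 implementation: the function name `quantile_map`, the number of quantiles, and the synthetic data are assumptions made for this example, and the spatial disaggregation step is omitted.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_values, n_quantiles=101):
    """Empirical quantile mapping (illustrative, hypothetical helper).

    Each model value is located within the historical model distribution
    and replaced by the observed value at the same quantile.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    model_q = np.quantile(model_hist, probs)  # model quantiles
    obs_q = np.quantile(obs_hist, probs)      # observed quantiles
    # Values outside the historical model range are clamped to the
    # lowest or highest observed quantile by np.interp.
    return np.interp(model_values, model_q, obs_q)

# Synthetic example: a model that is uniformly 2 degrees too warm.
obs = np.linspace(0.0, 10.0, 1001)
model = obs + 2.0
corrected = quantile_map(model, obs, model)
```

In this toy case the correction removes the constant warm bias. Series with many tied values (for example dry days in precipitation) would need extra handling that is not shown here.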

In line with the IPCC AR6 report, the reference period spanned from 1995 to 2014, while the projection period covered 2015 to 2100, with a specific focus on the long-term period of the 21st century (2081–2100).

Table 1 Information on the 26 climate models used in the study

2.2 Methods

In this study, we have employed a comprehensive evaluation approach to assess the performance of the NEX-GDDP-CMIP6 models. The models were evaluated using a set of 13 extreme climate indices, which were chosen for their ability to capture both temperature and precipitation extremes. The Taylor diagram was used to visualize the models’ performance in terms of spatial correlation and standard deviation ratios, while the Taylor skill score (TS) provided a quantitative measure of how well the models simulated the climatology of climate states. These metrics are particularly useful for evaluating the models’ ability to reproduce the climate extremes, which are critical for understanding potential climate change impacts.

We selected a total of 13 extreme climate indices as the primary focus, comprising 5 extreme temperature indices and 8 extreme precipitation indices (Table 2). To facilitate a concise summary of the results, the chosen indices were classified as follows: PTOT (total wet-day precipitation), SDII (simple daily intensity), RX1D (maximum 1-day precipitation), RX5D (maximum 5-day precipitation), and TXX (maximum of daily maximum temperature) were categorized as absolute indices; CDD (consecutive dry days) and CWD (consecutive wet days) were classified as duration indices; R20 (very heavy precipitation days), R50 (extremely heavy precipitation days), and T35 (number of high-temperature days) were designated as threshold-based indices; and TX90P (warm days), HWF (heat wave frequency), and HWD (heat wave days) were denoted as percentile indices. TXX, TX90P, CDD, CWD, R20, R50, RX1D, RX5D, SDII, and PTOT were established and defined by the Expert Team on Climate Change Detection and Indices (ETCCDI; http://etccdi.pacificclimate.org/list_27_indices.shtml; Klein et al. 2009), while T35, HWF, HWD, and R50 have also been widely applied in climate change research (Wu et al. 2023b; Hirsch et al. 2021; Sun et al. 2011; Guo et al. 2016).
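To make the index definitions concrete, the following sketch computes several of the indices for a single year of daily data at one grid point. It assumes the usual ETCCDI wet-day threshold of 1 mm, counts R20 and R50 as days with at least 20 mm and 50 mm, and counts T35 as days with a maximum temperature of at least 35 ℃. These thresholds and all function names are assumptions for illustration, since Table 2 is not reproduced here. The percentile-based indices (TX90P, HWF, HWD) require a multi-year baseline and are not covered.

```python
import numpy as np

WET_DAY_MM = 1.0  # assumed wet-day threshold (mm/day)

def longest_run(mask):
    """Length of the longest run of True values in a 1-D boolean array."""
    best = current = 0
    for flag in mask:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def precip_indices(pr):
    """Annual precipitation indices from one year of daily totals (mm)."""
    pr = np.asarray(pr, dtype=float)
    wet = pr >= WET_DAY_MM
    ptot = pr[wet].sum()
    return {
        "PTOT": ptot,                                        # wet-day total
        "SDII": ptot / wet.sum() if wet.any() else 0.0,      # mean wet-day intensity
        "RX1D": pr.max(),                                    # max 1-day total
        "RX5D": np.convolve(pr, np.ones(5), "valid").max(),  # max 5-day total
        "R20": int((pr >= 20.0).sum()),
        "R50": int((pr >= 50.0).sum()),
        "CDD": longest_run(~wet),                            # longest dry spell
        "CWD": longest_run(wet),                             # longest wet spell
    }

def temperature_indices(tmax, hot_day_threshold=35.0):
    """TXX and T35 from one year of daily maximum temperature (deg C)."""
    tmax = np.asarray(tmax, dtype=float)
    return {"TXX": tmax.max(), "T35": int((tmax >= hot_day_threshold).sum())}

demo = precip_indices([0, 0, 25, 60, 0.5, 0, 0, 3, 3, 3])
```

For simplicity, dry and wet spells are not allowed to continue across calendar years in this sketch.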

Table 2 Extreme temperature and precipitation indices

For a comprehensive investigation of regional variations, we adopted the regional classification established by Zhou et al. (2014) to divide China into 8 subregions (Fig. 1). These subregions included Northeast China (NEC; 39°–54°N, 119°–134°E), North China (NC; 36°–46°N, 111°–119°E), East China (EC; 27°–36°N, 116°–122°E), Central China (CC; 27°–36°N, 106°–116°E), Northwest China (NWC; 36°–46°N, 75°–111°E), the Tibetan Plateau (SWC1; 27°–36°N, 77°–106°E), Southwest China (SWC2; 22°–27°N, 98°–106°E), and South China (SC; 20°–27°N, 106°–120°E).
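The subregion boxes listed above can be encoded directly for regional averaging. The sketch below is one possible way to do so on a regular latitude-longitude grid. The function names, the half-open box edges (used so that cells on a shared boundary are counted once), and the cosine-latitude weighting are assumptions of this example, and a national boundary mask would still be needed to exclude grid cells outside China.

```python
import numpy as np

# (lat_min, lat_max, lon_min, lon_max) in degrees, as given in the text.
SUBREGIONS = {
    "NEC": (39, 54, 119, 134),
    "NC": (36, 46, 111, 119),
    "EC": (27, 36, 116, 122),
    "CC": (27, 36, 106, 116),
    "NWC": (36, 46, 75, 111),
    "SWC1": (27, 36, 77, 106),
    "SWC2": (22, 27, 98, 106),
    "SC": (20, 27, 106, 120),
}

def region_mask(lat, lon, name):
    """Boolean (lat, lon) mask for one rectangular subregion."""
    lat_min, lat_max, lon_min, lon_max = SUBREGIONS[name]
    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    return ((lat2d >= lat_min) & (lat2d < lat_max)
            & (lon2d >= lon_min) & (lon2d < lon_max))

def regional_mean(field, lat, lon, name):
    """Cosine-latitude-weighted mean of a 2-D field over one subregion."""
    mask = region_mask(lat, lon, name) & np.isfinite(field)
    weights = np.cos(np.deg2rad(lat))[:, None] * mask
    return np.nansum(field * weights) / weights.sum()

# Cell centres of a 0.25-degree grid covering the study area.
lat = np.arange(15.125, 55.0, 0.25)
lon = np.arange(70.125, 140.0, 0.25)
ec_cells = int(region_mask(lat, lon, "EC").sum())
```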

The Taylor diagram is a concise tool for assessing model capability through the pattern correlation coefficient and the ratio of standard deviations (RSD) between model output and observations. A full description of these statistics can be found in Taylor (2001). The closer both the correlation coefficient and the RSD are to 1, the better the model performs. The Taylor skill score (TS) is a composite index built from the pattern correlation coefficient and the RSD. The closer the TS is to 1, the stronger the simulation performance of the model. We calculated the TS for the extreme climate indices simulated by individual models and the ensemble mean, as well as the TS for the eight subregions simulated by the multi-model ensemble (MME, represented as the multi-model average).
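The two statistics plotted on a Taylor diagram can be computed as in the following sketch, which assumes cosine-latitude area weights on a regular grid and ignores grid cells where either field is missing. The function name `taylor_statistics` and these weighting choices are assumptions for illustration rather than a description of the exact procedure used in this study.

```python
import numpy as np

def taylor_statistics(model, obs, lat):
    """Area-weighted pattern correlation and ratio of standard deviations.

    model, obs: 2-D arrays with shape (lat, lon); lat: 1-D latitudes in degrees.
    """
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(obs, dtype=float)
    valid = np.isfinite(model) & np.isfinite(obs)
    w = weights[valid] / weights[valid].sum()
    m, o = model[valid], obs[valid]
    m_anom = m - np.sum(w * m)  # remove the weighted spatial mean
    o_anom = o - np.sum(w * o)
    sigma_m = np.sqrt(np.sum(w * m_anom ** 2))
    sigma_o = np.sqrt(np.sum(w * o_anom ** 2))
    corr = np.sum(w * m_anom * o_anom) / (sigma_m * sigma_o)
    return corr, sigma_m / sigma_o

# A model field that is a scaled and shifted copy of the observations
# has a perfect pattern correlation but twice the spatial variability.
lat_demo = np.array([20.0, 30.0, 40.0])
obs_demo = np.arange(12.0).reshape(3, 4)
corr_demo, rsd_demo = taylor_statistics(2.0 * obs_demo + 3.0, obs_demo, lat_demo)
```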

$$TS=\frac{4(1+R)^2}{\left(\frac{\sigma_{sm}}{\sigma_{so}}+\frac{\sigma_{so}}{\sigma_{sm}}\right)^2(1+R_0)^2}$$
(1)

In this equation, R represents the pattern correlation coefficient, and \({R_0}\) is the maximum correlation coefficient among all models, taken here as 0.999. \({\sigma _{sm}}\) and \({\sigma _{so}}\) denote the area-weighted spatial standard deviations of the model simulation and the observations, respectively, so that \({\sigma _{sm}}/{\sigma _{so}}\) is the RSD.
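Equation (1) translates into a one-line function once the pattern correlation and the RSD are known. The sketch below is a direct transcription under that reading. The function name and argument names are chosen here for illustration.

```python
def taylor_skill_score(r, rsd, r0=0.999):
    """Taylor skill score of Eq. (1).

    r: pattern correlation coefficient; rsd: ratio of the simulated to the
    observed standard deviation; r0: maximum attainable correlation.
    """
    return 4.0 * (1.0 + r) ** 2 / ((rsd + 1.0 / rsd) ** 2 * (1.0 + r0) ** 2)

ts_perfect = taylor_skill_score(0.999, 1.0)  # perfect agreement
ts_txx_like = taylor_skill_score(0.98, 1.0)  # R = 0.98 with RSD = 1
```

The score is symmetric in the RSD, so a model with twice the observed variability is penalized as much as one with half of it. With R = 0.98 and an RSD of 1, the score is about 0.98.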

Simulated and projected changes in extreme climate indices in the long-term period (2081–2100) relative to the baseline period of 1995–2014 are calculated as follows:

When the indices’ units are mm and mm/day:

$$\text{Relative Change} = \frac{\text{Index}_{\text{future}} - \text{Index}_{\text{baseline}}}{\text{Index}_{\text{baseline}}} \times 100\%$$
(2)

When the indices’ units are %, ℃ and day:

$$\text{Relative Change} = \text{Index}_{\text{future}} - \text{Index}_{\text{baseline}}$$
(3)
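Equations (2) and (3) can be combined into a single helper that selects the form of the change from the unit of the index. The sketch below is a minimal version of that logic. The function name, the unit strings, and the example values are assumptions made for illustration.

```python
import numpy as np

# Indices in these units are expressed as percentage changes (Eq. 2);
# all other indices are expressed as simple differences (Eq. 3).
PERCENT_CHANGE_UNITS = {"mm", "mm/day"}

def projected_change(index_future, index_baseline, unit):
    """Change of an index between the future and baseline periods."""
    future = np.asarray(index_future, dtype=float)
    baseline = np.asarray(index_baseline, dtype=float)
    if unit in PERCENT_CHANGE_UNITS:
        return (future - baseline) / baseline * 100.0
    return future - baseline

ptot_change = projected_change(1100.0, 1000.0, "mm")  # percent
txx_change = projected_change(38.5, 35.0, "degC")     # degrees
```

Both inputs may also be gridded arrays of period means, in which case the change is computed cell by cell.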
Fig. 1

The distinct domains of China’s eight subregions (NEC: northeast China; NC: north China; EC: east China; CC: central China; SC: south China; SWC1: southwest China, region 1; SWC2: southwest China, region 2; and NWC: northwest China). The shaded area represents the topographic height from the ETOPO data (unit: m; http://lijianping.cn/dct/attach/Y2xiOmNsYjpuYzoxMzY=)

3 Results

3.1 Performance of NEX-GDDP-CMIP6 models

First, an evaluation of the downscaling models’ capability to simulate the climatology of extreme high-temperature events was conducted (Fig. 2). In the observations, the high value center of extreme high temperatures resided in the northwestern region of China, with values surpassing 40℃. Other high-value areas were in North China and the Yangtze River Basin, where temperatures exceeded 37℃. The lowest values, below 15℃, could be found in the northwestern part of the Qinghai-Tibet Plateau. The NEX-GDDP-CMIP6 ensemble average closely aligned with observations, exhibiting a spatial correlation coefficient of 0.98. However, the model overestimated values in the northeastern region. The regions with high T35 index values were likewise situated in the northwestern part of China, as well as in North China and the Yangtze River Basin, where values exceeded 16 days. Low-value regions encompassed the southwest and northeastern parts of the country, with values below 0.5 days. The model ensemble average effectively simulated the number of hot days, with a spatial correlation coefficient reaching 0.88. Nevertheless, it overestimated and underestimated the number of hot days in the northeastern region and the area south of the Yangtze River, respectively. The high-value regions for TX90P were predominantly concentrated in western Yunnan Province, Guangdong Province, and other areas, with values exceeding 20%. Low-value regions were mainly observed in North China, with values below 12%. The model struggled to reproduce this distribution pattern, generally underestimating the percentage of warm days, with a spatial correlation coefficient of 0.51. The high-value center of HWF was primarily found in Qinghai Province, western Tibet Autonomous Region, western Yunnan, Guangdong, and other places, with values exceeding 9 times. The low-value center appeared in North China, with values below 5 times. 
The model ensemble average underestimated HWF over most regions of the nation, particularly the western areas, exhibiting a spatial correlation coefficient of only 0.46, although it performed relatively well in the North China region. Regarding HWD, the high-value center predominantly occurred in the western region of China and Guangdong Province, where values exceeded 40 days. The low-value center was in North China and the northeastern region, with values below 24 days. The model ensemble average underestimated values in the eastern part of the NWC region, as well as the CC, EC, and SC regions, with a spatial correlation coefficient of 0.66, but it effectively simulated the remaining areas. In summary, the NEX-GDDP-CMIP6 ensemble average successfully simulated the spatial patterns of TXX and T35 and reasonably simulated the TX90P, HWF, and HWD indices in North China. However, it generally underestimated extreme high-temperature indices in other regions. Compared with the original CMIP6 models, NEX-GDDP-CMIP6 reduced the warm bias of the TXX index in Northwest China and the cold bias in South China (Wei et al. 2023). The distribution pattern of TX90P simulated by NEX-GDDP-CMIP6 remained largely consistent with that of the raw CMIP6 models, and the downscaling likewise failed to improve the simulation of the heat wave indices (Zhu et al. 2020; Hirsch et al. 2021).

Fig. 2

Spatial distributions of TXX (a, b), T35 (c, d), TX90P (e, f), HWF (g, h) and HWD (i, j) from the observation (a, c, e, g, i) and multimodel ensemble (MME) mean of 26 NEX-GDDP-CMIP6 models (b, d, f, h, j) over China during 1995–2014

Further evaluation of the spatial patterns of extreme high-temperature events using Taylor diagram was conducted. Figure 3 illustrates the spatial correlation coefficients and standard deviation ratios between various model simulations, multimodel ensemble average (MME) simulations, and observations for different extreme high-temperature indices. The results indicated that individual models and the ensemble average performed well in simulating the TXX index, with spatial correlation coefficients of approximately 0.98 and standard deviation ratios close to 1. The simulations for the T35 index were also relatively good, with spatial correlation coefficients exceeding 0.85. However, the standard deviation ratios were above 1.5. The models exhibited poor performance in simulating the TX90P index, with considerable discrepancies among different model results. The MME, BCC-CSM2-MR, INM-CM5-0, INM-CM4-8, and GISS-E2-1-G models demonstrated better performance, with spatial correlation coefficients of approximately 0.5, although their standard deviation ratios were less than 1. The latter three models outperformed the MME and BCC-CSM2-MR models. On the other hand, the ACCESS-ESM1-5, CMCC-CM2-SR5, MIROC6, and MIROC-ES2L models exhibited the poorest performance and exhibited a negative correlation with observations. The models also showed limited ability to simulate the HWF index, with scattered results on the Taylor diagram. The BCC-CSM2-MR model performed relatively well, with a spatial correlation coefficient exceeding 0.5 and a standard deviation ratio close to 1. On the other hand, the ACCESS-ESM1-5, CMCC-CM2-SR5, MIROC6, and MIROC-ES2L models continued to demonstrate a negative correlation with observations. The models’ performance in simulating the HWD index was slightly better than that for TX90P and HWF. The MME, BCC-CSM2-MR, GISS-E2-1-G, and CNRM-CM6-1 models performed the best, with spatial correlation coefficients all exceeding 0.6. 
The ratio of their standard deviation to that of the observations was close to 1 for the GISS-E2-1-G and CNRM-CM6-1 models. Conversely, the ACCESS-ESM1-5, MIROC6, and MIROC-ES2L models exhibited poorer performance. The analysis indicated that the NEX-GDDP-CMIP6 models performed well in simulating TXX and T35, possibly due to the incorporation of observational data corrections. However, the corrected models had difficulties in simulating the TX90P, HWF, and HWD indices. These indices showed significant deviations from the observations, and there were considerable differences among the models. Compared to the CMIP6 models, the NEX-GDDP-CMIP6 models improved the representation of TXX: the downscaled dataset exhibited a higher spatial correlation with observations and significantly enhanced consistency among models (Wei et al. 2023; Zhu et al. 2020). The spatial correlation coefficient of the T35 index simulated by NEX-GDDP-CMIP6 (0.88) was higher than that of the original CMIP6 models (0.7), and the models agreed more closely with one another, but the ratios of standard deviations were higher than those of the CMIP6 models (Guo et al. 2022).

Fig. 3

Taylor diagrams of the NEX-GDDP-CMIP6-simulated climatic means (1995–2014) for five extreme temperature indices over China. The azimuthal position represents the pattern spatial correlation, the radial distance from the origin represents the spatial variability, and radial axis shows the spatial root-mean-square deviation

The Taylor skill score provided a more intuitive evaluation of the climate model’s ability to represent the climatology of the climate state. Figure 4 presents a histogram of the Taylor skill scores, where a score closer to 1 indicated a stronger simulating skill. In Fig. 4, both individual models and MME demonstrated strong simulating capabilities for the TXX index, with an MME Taylor skill score of 0.98, followed by the T35 index with a score of 0.67. However, the models exhibited weaker performance in capturing the climatology of the TX90P index over the China region, with an MME score of only 0.49. Nine models had a Taylor skill score exceeding 0.5, among which GISS-E2-1-G and INM-CM4-8 exhibited the highest capability for representing the TX90P climatology, with scores reaching 0.58. The models also showed limited skill in capturing the HWF index over the China region, with an MME score of 0.41. Five models had scores exceeding 0.5, and BCC-CSM2-MR demonstrated the highest capability for this index, reaching a score of 0.56. On the other hand, the HWD index obtained a higher score than TX90P and HWF, with an MME score of 0.6. Thirteen models had scores exceeding 0.5, with CNRM-CM6-1 performing the best and achieving a score of 0.68.

Fig. 4

Taylor skill scores of 26 CMIP6 models and their ensemble mean in simulating extreme temperature over China

Table 3 presents the Taylor skill scores for different regions of China based on MME, along with the regional averages and anomalies. For the TXX index, MME exhibited the weakest spatial modeling capabilities for the EC region, with a Taylor skill score of only 0.63, while the NWC region demonstrated the strongest performance, with a score of 0.95. For the regional averages, the MME showed significant deviations in the NC and NEC regions, overestimating them by 2.69℃ and 2.73℃, respectively. Regarding the T35 index, the MME performed best in simulating the NWC region, achieving a score of 0.86. However, it exhibited limited modeling capabilities for the SWC2 and SWC1 regions, with scores close to 0. The MME showed the largest deviation in the EC region, exceeding 4.71 days. For the TX90P index, the MME demonstrated its strongest modeling capability for the NC region, with a Taylor skill score of 0.65, while it exhibited minimal performance for the NWC region, with a score of only 0.06. The largest deviation of the MME occurred in the NC region, with a value of -2.14%. For the HWF index, the MME’s modeling capabilities were generally poor for most regions, except for the NEC and NC regions, where it obtained a score of 0.54 and 0.48. The region with the largest deviation from observations was SWC1, with a value of -1.31 times. Regarding the HWD index, the MME demonstrated its weakest modeling capability for the CC region, with a Taylor skill score of 0.2, while performing best in simulating the NEC region, with a score of 0.69. The SC region showed the largest deviation in the simulations, underestimated by 6.26 days compared to observations. The MME performed best in simulating the NEC and NC regions, with most of the indices’ scores exceeding 0.4. On a national scale, the TXx index simulated by NEX-GDDP-CMIP6 is 1.35 °C higher compared to observations, while the simulation results from CMIP6 models are 0.45 °C lower. 
The TX90P index simulated by NEX-GDDP-CMIP6 is 1.18% lower compared to observations, while CMIP6 models only show a 0.43% underestimation in the simulation results (Zhu et al. 2020).

Overall, the MME shows varied performance in simulating extreme heat events across different regions of China. MME simulated these events in NC and NEC regions very well. In EC, MME shows the weakest simulation of TXx. The MME struggles with simulating HWF and HWD in regions with complex terrains, such as the southwestern and northwestern areas. From a national perspective, the MME simulations tended to overestimate the TXX index and slightly overestimate the T35 index while underestimating the TX90P, HWF, and HWD indices. Aside from the inaccuracies in the observational data used for model correction and shortcomings of the bias correction method, these biases may also arise from inherent limitations within the models themselves, including the uncertainties in capturing the frequency and intensity of extreme high temperature events (Hirsch et al. 2021; Domeisen et al. 2023). According to previous research, extreme high temperatures in Northwest China are influenced not only by large-scale anticyclones but also significantly by terrain-induced subsidence flows (Chen and Lu 2015). Mesoscale atmospheric circulation processes and the affected land-atmosphere coupling processes by the circulation have a crucial impact on heatwave events in Southwest China (Fu and Wang 2023). Extreme high-temperature events in Northeast China are influenced by the interdecadal variability of the geopotential height in the mid-to-upper troposphere over Mongolia during summer, showing significant interdecadal changes (Hong et al. 2020). Internal variability of the atmosphere, such as the Eurasian teleconnection (EAT) pattern and the Victoria mode (VM) teleconnection, significantly affects extreme high temperatures in Northern China (Xie et al. 2023; Deng et al. 2019). 
High-temperature heatwaves in Eastern China are influenced by both monsoons and topography (Chen and Lu 2015), with the heatwaves in the Yangtze River basin being notably affected by internal atmospheric variability, such as southeast-propagating wave trains from northern Russia and East Asia and quasi-biweekly oscillations (Gao et al. 2018). The sea surface temperatures of the Indian Ocean and the Northwest Pacific have a significant impact on heatwaves in Southern China (Deng et al. 2019). These physical mechanisms are challenging to reproduce in climate models, and bias-corrected model data similarly struggle to replicate the characteristics of heatwaves in different regions of China. Further increases in the resolution of atmospheric models could improve the representation of weather systems (e.g., blocking highs) that are critical for extreme heat events (Davini and D’Andrea 2020; Schiemann et al. 2020), while improvements in the performance of land-surface models (Ukkola et al. 2018; Sippel et al. 2017; Seneviratne et al. 2010; Lorenz et al. 2016) can also enhance the ability to model extreme heat events. These findings highlight the need for continued model refinement to more accurately simulate extreme high-temperature events in different regions of China, which is critical for effective climate adaptation and mitigation strategies. A detailed discussion of the different bias correction methods is given in the conclusion section.

Table 3 Taylor skill scores for extreme high temperature indices in different regions of China and regional averages of observations and MME simulations. The value inside the parentheses represents the difference between historical experiments and observations

Figure 5 presents the time series of five extreme high-temperature indices averaged over China for 1961–2014, based on observations and simulations and expressed relative to 1995–2014. The TXX time series indicated that prior to the year 2000, the observed index consistently exceeded the MME values. However, after the year 2000, the observed and simulated changes showed a greater level of agreement. The observed and MME trend coefficients differed markedly, with values of -0.2 °C per decade and 0.2 °C per decade, respectively. Both coefficients passed the significance test, underscoring the difficulty of the MME in replicating the temporal characteristics of the TXX index. For the remaining four indices (T35, TX90P, HWF, and HWD), the observed values exhibited an increasing trend over time, with trend coefficients of 0.3 days per decade, 1.8% per decade, 0.9 times per decade, and 4.7 days per decade, respectively. The multimodel ensemble was generally successful in reproducing the trends for these indices, with trend coefficients of 0.5 days per decade, 1.6% per decade, 0.8 times per decade, and 4.3 days per decade, respectively. These observed and simulated trends all passed the significance test at the 5% level. These results aligned closely with the original CMIP6 models, since the bias correction did not alter the trends in extreme events. Although the NEX-GDDP-CMIP6 models effectively captured the long-term trend characteristics of the latter four indices, they still exhibited limited capability in simulating the interannual variations and amplitudes associated with these indices.
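The anomalies and per-decade trend coefficients discussed above can be obtained from an annual, area-averaged series as in the sketch below. It assumes an ordinary least-squares linear fit. The function names and the synthetic series are illustrative only, and the significance test is not reproduced because the text does not state which test was used.

```python
import numpy as np

def anomaly_relative_to(years, values, start=1995, end=2014):
    """Anomalies of an annual series relative to its baseline-period mean."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    in_baseline = (years >= start) & (years <= end)
    return values - values[in_baseline].mean()

def trend_per_decade(years, values):
    """Least-squares linear trend, expressed per 10 years."""
    slope, _intercept = np.polyfit(years, values, deg=1)
    return 10.0 * slope

# Synthetic series that rises by 0.03 units per year.
years = np.arange(1961, 2015)
series = 2.0 + 0.03 * (years - 1961)
anomalies = anomaly_relative_to(years, series)
trend = trend_per_decade(years, anomalies)
```

Subtracting the baseline mean shifts the series without changing its slope, so the trend of the anomalies equals the trend of the raw index.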

Fig. 5

Time series of five extreme temperature indices over China during 1961–2014 relative to the period 1995–2014. The black and red lines indicate the corresponding results from the observations and MME of 26 model simulations. The top and bottom bounds of the shaded area are the maximum and minimum values of 26 model simulations

The NEX-GDDP-CMIP6 models show a high level of accuracy in simulating the spatial distribution of extreme temperature events in China, performing particularly well in reproducing TXx and T35, especially in North China. However, percentile-based indices such as TX90P are underestimated, and regions with complex terrain remain challenging. In terms of temporal evolution, the models effectively capture long-term trends but have difficulty representing interannual variations and amplitudes. Regional performance is diverse, with relatively high spatial correlations in Northeast and North China. The models’ enhanced simulation capability stems from downscaling and bias correction, leading to more reliable simulations than those of the original CMIP6 models (Chen et al. 2017). Ongoing refinements are needed to improve simulations further, particularly for percentile-based indices and regional nuances.

Figure 6 presents the spatial distribution of eight extreme precipitation indices, as observed and simulated by the MME. The high-value center of the CDD index was in the northwestern region of China, ranging from 130 to 160 days. Conversely, low-value areas appeared in the southern parts of the country, ranging from 20 to 40 days. The index thus increased from the southeastern coastal areas toward the northwestern inland. The MME could replicate this distribution pattern fairly well, with a spatial correlation coefficient of 0.82 compared to the observations. However, the MME noticeably overestimated the values in the northwestern region, where the simulated high-value center covered a considerably larger area and the values exceeded 160 days. This discrepancy may have been related to inaccuracies in the “observed” data used for model calibration. Regarding the CWD index, the high-value center was in the eastern part of the Qinghai-Tibet Plateau and the western part of the Sichuan Basin, with values exceeding 80 days. The low-value center appeared in the northwestern region of China, with values less than 3 days. This distribution pattern exhibited a north-to-south gradient. The MME could simulate this distribution pattern, with a spatial correlation coefficient of 0.6 compared to the observations. However, it struggled to reproduce the observed high-value center, and it overestimated the values in the Yunnan region, as well as in the northeastern and northern parts of China. For the observed PTOT index, the high-value center was in the southeastern coastal areas, with values exceeding 1800 mm. The low-value center appeared in the northwestern region, with values less than 200 mm. This distribution pattern thus showed a decrease from the southeast to the northwest. 
The MME was generally capable of reproducing the spatial distribution pattern observed in the data, with a spatial correlation coefficient of 0.95, which was the highest among all the extreme precipitation indices. However, compared to the original CMIP6 models, the MME still overestimated the precipitation over the southern part of the Qinghai-Tibet Plateau. The distribution pattern of the R20 index aligned with that of PTOT, with the high-value center still located in the southeastern coastal areas, exceeding 25 days. The MME could simulate the distribution pattern of R20 as well, with a spatial correlation coefficient of 0.81 compared to the observations. However, it overestimated the values on the southern side of the Qinghai-Tibet Plateau and underestimated the values in the southeastern coastal region, mostly concentrated around the Fujian, Jiangxi, and Anhui provinces. The distribution pattern of R50 was similar to that of R20, with the maximum value exceeding 5 days. The spatial correlation coefficient between the MME and the observations for the R50 index was 0.51, which was the lowest among all the extreme precipitation indices. This indicated a clear underestimation of the R50 index in the southern region of China. The observed distributions of RX1D, RX5D, and SDII were consistent with PTOT, increasing from northwest to southeast. The MME exhibited spatial correlation coefficients of 0.83, 0.89, and 0.86, respectively, compared to the observations. In the southeastern coastal region, the MME noticeably underestimated the values of these three indices. Compared to the original CMIP6 models, the NEX-GDDP-CMIP6 overestimates the CDD index in the Northwest region. However, it improves the simulation of the PTOT index along the southern coast of China and Yunnan province, the R20 index in the Southwest region, the RX5D index in North China and Southwest region, and the SDII index in the Northeast region and Yunnan (Wei et al. 2023). 
Compared with the previous version (NEX-GDDP-CMIP5), the new version yields slightly higher spatial correlation coefficients for the RX1D and RX5D indices (0.83 and 0.89, versus 0.82 and 0.86) (Chen et al. 2017).

Fig. 6

As in Fig. 1 but for eight precipitation indices

Figure 7 depicts the Taylor diagram for eight extreme precipitation indices. The NEX-GDDP-CMIP6 models performed the best in simulating the PTOT index, followed by the RX5D and SDII indices. The spatial correlation coefficients for all models were approximately 0.9, and the ratio of standard deviations was also close to 1. Regarding the CDD index, each model demonstrated good modeling capability with spatial correlation coefficients exceeding 0.8 and the ratio of standard deviations ranging from 1.0 to 1.5. However, the models exhibited weaker performance in simulating the CWD index, with spatial correlation coefficients of approximately 0.6 and a significant variation in the ratio of standard deviations, ranging from 0.5 to 1.6 among the different models. The simulation performance of the models for the R20 index was similar to that of the CDD index. The models exhibited the weakest capability in simulating the R50 index, with spatial correlation coefficients below 0.6 and ratios of standard deviations ranging from 0.5 to 1.5. The models performed relatively well in reproducing the spatial pattern of the RX1D index, with spatial correlation coefficients ranging from 0.8 to 0.9. However, the ratio of standard deviations between the model results and observations was approximately 0.5, indicating an underestimation of the RX1D values. The NEX-GDDP-CMIP6 model showed a weaker capability in simulating the CWD and R50 indices but performed the best in simulating PTOT, RX5D, and SDII. Compared to the original CMIP6 model, the NEX-GDDP-CMIP6 showed better performance in capturing the spatial patterns of indices such as CDD, PTOT, and RX5D. However, NEX-GDDP-CMIP6 failed to improve the simulation of the R50 index, with the MME spatial correlation coefficient at 0.52. The original CMIP6 models achieved a spatial correlation coefficient with observations exceeding 0.7, which likely relates to inaccuracies in the “observed” data used for model correction. 
Apart from the CWD index, the NEX-GDDP-CMIP6 models significantly reduced intermodel discrepancies and demonstrated good modeling capabilities for extreme precipitation events (Wang et al. 2021; Wei et al. 2023; Xu et al. 2021; Zhu et al. 2020).
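The two quantities read off a Taylor diagram in the discussion above are the spatial (pattern) correlation coefficient and the ratio of the simulated to the observed spatial standard deviation. The sketch below computes both for two fields flattened to lists of grid-cell values. It assumes unweighted statistics (no latitude or area weighting), which may differ from the processing behind Fig. 7, and the toy values are hypothetical.

```python
import math


def taylor_stats(model, obs):
    """Return (pattern correlation, sigma_model / sigma_obs) for two equally sized fields."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    sigma_m = math.sqrt(sum((m - mean_m) ** 2 for m in model) / n)
    sigma_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
    return cov / (sigma_m * sigma_o), sigma_m / sigma_o


# Hypothetical toy fields: the model has the correct pattern but half the amplitude.
obs_field = [1.0, 2.0, 3.0, 4.0]
model_field = [0.5, 1.0, 1.5, 2.0]
r, ratio = taylor_stats(model_field, obs_field)
print(f"correlation = {r:.2f}, std ratio = {ratio:.2f}")
```

The toy case mirrors the behavior described for RX1D: a high pattern correlation can coexist with a standard-deviation ratio near 0.5, meaning that the simulated field is too smooth in space even though its pattern is well placed.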

Fig. 7

As in Fig. 2 but for eight precipitation indices

Figure 8 presents the Taylor skill scores for eight extreme precipitation indices. The models exhibited good performance in simulating the PTOT index, with values above 0.9. The RX5D and SDII indices ranked next, with most models achieving TS scores above 0.85. The modeling performance for the R20 and CDD indices was similar, with scores of approximately 0.75. Regarding the RX1D index, the models demonstrated TS scores of approximately 0.6, with the BCC-CSM2-MR, CMCC-CM2-SR5, CMCC-ESM2, GFDL-ESM4, and TaiESM1 models performing the best, achieving a TS score of 0.72. The MIROC-ES2L model performed the worst, with a TS score of 0.57. The modeling capability for the CWD index was comparable to that of the RX1D index, with TS scores of approximately 0.6. The CNRM-ESM2-1 model performed the best with a TS score of 0.68, while the CanESM5 model performed the worst with a TS score of 0.51. The scores for the R50 index were approximately 0.5, indicating the weakest modeling capability among the indices. The CMCC-CM2-SR5 model performed the best with a TS score of 0.62, while the CanESM5 model performed the worst with a score of 0.37. In summary, the MME exhibited strong modeling capability for the PTOT, RX5D, and SDII indices but performed the poorest in simulating the R50 index.
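For readers who wish to relate the TS values above to the Taylor diagram statistics, the sketch below evaluates one commonly used form of the Taylor skill score, TS = 4(1 + R)^p / [(σ + 1/σ)^2 (1 + R0)^p], where R is the pattern correlation, σ is the ratio of simulated to observed standard deviation, and R0 is the maximum attainable correlation. The exponent p = 2 and R0 = 1 are assumptions made here for illustration; the exact definition used in this study is given in its methods section and is not restated in this passage.

```python
def taylor_skill(r, sigma_ratio, r0=1.0, power=2):
    """Taylor skill score under the assumed form described in the lead-in.

    r           : pattern correlation between model and observations
    sigma_ratio : simulated / observed spatial standard deviation (must be > 0)
    r0          : maximum attainable correlation (assumed to be 1)
    power       : exponent on the correlation term (assumed to be 2)
    """
    numerator = 4.0 * (1.0 + r) ** power
    denominator = (sigma_ratio + 1.0 / sigma_ratio) ** 2 * (1.0 + r0) ** power
    return numerator / denominator


# A perfect simulation scores 1; the score drops as r falls or sigma_ratio departs from 1.
perfect = taylor_skill(1.0, 1.0)
damped = taylor_skill(0.85, 0.5)  # hypothetical values: good pattern, half the variability
print(perfect, round(damped, 2))
```

Because the score penalizes both low correlation and a standard-deviation ratio far from 1, an index with a well-placed pattern but damped variability can still receive a moderate score.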

Fig. 8

As in Fig. 3 but for eight precipitation indices

Table 4 provides the Taylor skill scores of extreme precipitation simulated by the MME for different regions in China, as well as the observed and simulated regional averages. For the CDD index, the MME exhibited the weakest modeling capability in the SWC1 region, with a score of only 0.52, and the best performance in the EC region, with a score of 0.92. The MME overestimated CDD in most regions, with the largest deviation in the NWC region, where it exceeded the observed values by 38.79 days. Regarding the CWD index, the MME demonstrated limited modeling capability for the NWC and NC regions, with scores of only 0.1 and 0.19, respectively. The best simulation was in the EC region, with a score of 0.8. Overall, the MME tended to overestimate the CWD index, most markedly in the SWC2 region, where it exceeded the observations by 16.59 days. For the PTOT index, the MME performed the worst in the SWC1 region, with a TS score of 0.54, and achieved the highest score of 0.96 in the NC region. The MME generally underestimated the PTOT index, particularly in the NWC region, where the underestimation reached 29.87%. Regarding the R20 index, the modeling capability of the MME was weakest in the SWC1 region, with a TS score of 0.13, and strongest in the NEC region, with a TS score of 0.86. The MME tended to underestimate the R20 index, with the largest deviation in the SC region, where it fell short of observations by 11.68 days. The MME exhibited limited modeling capability for the R50 index in most regions, with the best performance in the NEC region, where it achieved a TS score of 0.7. The MME tended to underestimate the R50 index in various subregions, particularly in the SC region, where it fell short of observations by 3.23 days. 
For the RX1D index, the MME demonstrated the poorest modeling capability in the CC, EC, and SWC2 regions, with TS scores below 0.2, while achieving its best performance in the NEC region with a TS score of 0.81. The MME generally underestimated the RX1D index, most markedly in the SC region, where the simulated values were 55.73% lower than the observed values. The MME performed relatively well in simulating the RX5D index, although the EC region showed weaker modeling capability, with a TS score of only 0.24, while the NEC region exhibited the best simulation, with a TS score of 0.88. However, the MME still underestimated RX5D in most regions, with the largest deviation in the NWC region, where it fell short of observations by 39.65%. The MME underestimated the SDII index in most regions, with the largest deviation from observations in the SC region, where the underestimation was 23.64%. The poorest simulated region was SWC1, with a TS score of 0.26, while the best simulated region was NEC, with a score of 0.83. Nationwide, the PTOT index simulated by NEX-GDDP-CMIP6 was slightly underestimated, by 6.83% compared to observations, whereas the original CMIP6 models overestimated it substantially, by 79%. The simulated CDD index was overestimated by 21.25 days, with the largest deviation mainly in the northern regions of China, whereas the CMIP6 models as a whole underestimated it. The R20 and SDII indices were underestimated by 2.62 days and 6.21%, respectively, whereas the CMIP6 models overestimated them by 1.93 days and 21%, respectively. In summary, NEX-GDDP-CMIP6 reduced the biases of the PTOT and SDII indices nationwide. However, the simulated CDD index was markedly overestimated, and the R20 index was underestimated (Zhu et al. 2020).
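The regional deviations discussed above are of two kinds: absolute differences in days (CDD, CWD, R20, R50) and relative differences in percent (PTOT, RX1D, RX5D, SDII). The following sketch shows how such a deviation might be computed from regional means. The function name and the numerical inputs are hypothetical and are not values from Table 4.

```python
# Indices whose model-minus-observation difference is reported as a percentage.
PERCENT_INDICES = {"PTOT", "RX1D", "RX5D", "SDII"}


def regional_bias(index, model_mean, obs_mean):
    """Return (bias, unit) for one index and region from regional-mean values."""
    diff = model_mean - obs_mean
    if index in PERCENT_INDICES:
        return 100.0 * diff / obs_mean, "%"
    return diff, "days"


# Hypothetical regional means, for illustration only.
ptot_bias = regional_bias("PTOT", 450.0, 500.0)  # simulated total is 10% too low
cdd_bias = regional_bias("CDD", 60.0, 50.0)      # simulated dry spell is 10 days too long
print(ptot_bias, cdd_bias)
```

A negative value indicates underestimation by the MME and a positive value indicates overestimation.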

In general, the MME shows different capabilities in simulating extreme precipitation indices in different regions of China. In SWC1, the poor performance of the MME in modeling CDD and PTOT could affect the accuracy of drought and flood projections. In NWC, the pronounced deviation in the CDD simulation indicates a potential overestimation of drought severity. In NC, the MME shows very limited skill in simulating CWD. In EC, the MME excels in simulating CDD but is deficient in simulating RX5D. In SWC2, the MME significantly overestimates CWD, and in SC it underestimates R20. The R50 index is poorly simulated in most regions, with the exception of NEC. The MME exhibited more consistent performance in the NEC region, with TS scores mostly exceeding 0.5 and smaller deviations from observations. The MME also performed reasonably well for China as a whole, with TS scores surpassing 0.5. These regional differences in simulation accuracy highlight the need for specific improvements in climate models to better anticipate and manage the impacts of extreme climate events.

In addition to deficiencies in the bias correction methodology and inaccuracies in the observations used to correct the models, the primary causes of the simulation biases in extreme precipitation events across different regions of China are as follows. In Northwest China, extreme precipitation is mainly influenced by a zonal wave pattern characterized by the deepening of the Siberian trough and by the Central Asian and Mongolian high pressures. In this circulation configuration, the westerly and easterly airflows strengthen anomalously and converge over the region (Ning et al. 2020). The soil temperature and moisture in different areas of this region also show a high correlation with extreme precipitation events (Wu et al. 2023a), and the extreme precipitation events in this area also undergo decadal changes (Guo et al. 2014; Lu et al. 2021). Extreme precipitation in Northeast China is influenced not only by the East Asian summer monsoon but also by cold air and the Northeast Cold Vortex (Hong et al. 2020). North China’s extreme precipitation is influenced by four different moisture transport patterns, each corresponding to different moisture sources (Zhang et al. 2024). This region is also affected by multiscale systems spanning the planetary, synoptic, and subsynoptic scales, as well as by air-sea interactions (Zhou et al. 2023; Zhao et al. 2019). The extreme precipitation events in Central China are related to anomalous local cyclonic circulation, the orographic forcing of the Tibetan Plateau, and the diabatic heating gradient between the Central region and its surrounding areas (Ke and Guan 2014). Additionally, the interaction between the Western Pacific Subtropical High and the East Asian monsoon, as well as the blocking effect of the terrain, significantly influences extreme precipitation in this region (Nie and Sun 2022; Yin et al. 2022; Zhao et al. 2022). 
Southwest China, located in the transition zone influenced by the East Asian summer monsoon, Indian monsoon, and thermal activities of the Tibetan Plateau, experiences extreme precipitation events that are not only affected by the terrain (Xu et al. 2024a) but also influenced by two types of low-frequency oscillations (Nie and Sun 2023). The East Asian summer monsoon and landfalling tropical cyclones are the main weather systems affecting most of the extreme precipitation events in East China (Tang et al. 2021), with the Northwest Pacific Subtropical High being one of the most crucial systems. This system significantly affects the precipitation in the Yangtze-Huai region in June and July, as well as the precipitation in South China regions like the Pearl River Basin in June. The westward extension of the subtropical high pushes tropical cyclones southward, thereby increasing the occurrence of extreme precipitation events in South China (Zhang et al. 2017). Climate models generally have difficulties simulating the physical mechanisms that cause extreme precipitation events in different regions of China.

Table 4 As in Table 3 but for eight precipitation indices (the differences between historical experiments and observations for PTOT, RX1D, RX5D and SDII are calculated as percentages)

We also assessed the ability of the models to reproduce the temporal variability of extreme precipitation indices over China. Figure 9 presents the temporal evolution of eight extreme precipitation indices from 1961 to 2014 based on observations and model simulations. Except for the CDD index, the observed extreme precipitation indices exhibited an increasing trend. The trend coefficients of observed CDD, CWD, R20, R50, RX1D, RX5D, SDII, and PTOT were −1.46 days/10 years, 0.1 days/10 years, 0.03 days/10 years, 0.02 days/10 years, 0.87%/10 years, 0.34%/10 years, 0.34%/10 years, and 0.6%/10 years, respectively. The corresponding trend coefficients from the MME simulations were −0.13 days/10 years, 0.01 days/10 years, 0.02 days/10 years, 0.01 days/10 years, 0.66%/10 years, 0.5%/10 years, 0.23%/10 years, and 0.12%/10 years, indicating overall agreement in sign between the model simulations and observations. The trend coefficients of CDD, R50, and RX1D from both observations and simulations passed a significance test at the 5% level, whereas those of CWD, R20, and PTOT did not. The trend coefficients of the RX5D and SDII indices from the MME simulations passed a significance test at the 5% level, whereas the observed trends did not. Except for the CWD index, the MME struggled to reproduce the interannual variability of the extreme precipitation indices. This could be due to the small changes in the observed and simulated CWD, while the other indices exhibited significant variability. Regarding the CDD index, the observed values were clearly higher than the MME values before 1985 and displayed a declining trend. After 1985, the observed trend stabilized, but the MME failed to capture this behavior. The observed CDD index demonstrated significant interannual and decadal variability, which the MME struggled to reproduce. 
The MME simulations approximated the values of the PTOT, R20, and R50 indices compared to observations but struggled to simulate their interannual variability. The observed CDD, RX1D, RX5D, and SDII indices exhibited quasiperiodic variations with significant interannual variability, which were also challenging for the MME to replicate accurately. Discrepancies in the simulation of these indices were evident, with the models generally underestimating the observed variability. This shortcoming is more pronounced in regions with complex topography and diverse climatic conditions, suggesting a need for enhanced physical parameterizations and improved representation of orographic and convective processes in the models.

Fig. 9

As in Fig. 4 but for eight precipitation indices

The NEX-GDDP-CMIP6 models demonstrate different abilities in simulating different extreme precipitation indices in China. The models generally perform well in capturing the spatial distribution of some indices, such as PTOT, with high spatial correlation coefficients and reasonable consistency among models. However, they face challenges in accurately simulating other indices like CWD and R50, with significant overestimations or underestimations across different regions. Regional variations in model performance were evident, with the Northeast region showing the best agreement with observations for PTOT and moderate performance for CWD. The models’ ability to simulate interannual variability of precipitation extremes is also limited, particularly for indices like CDD, RX1D, RX5D and SDII, where there is a significant deviation from observed values. Overall, the NEX-GDDP-CMIP6 models provide a useful tool for assessing extreme precipitation patterns in China, but the results highlight the need for further model refinement to improve simulations of specific precipitation indices and to better capture regional differences and temporal variations.

3.2 Future changes in extreme events

Given that multimodel ensembles (MMEs) generally outperform individual models, the projections of extreme temperature indices during 2081–2100 were obtained using the NEX-GDDP-CMIP6 MME under three scenarios: SSP1-2.6, SSP2-4.5, and SSP5-8.5. Figure 10 illustrates the spatial patterns of the changes in the TXx, T35, TX90P, HWF, and HWD indices projected by the MME. The dots mark regions where more than 90% of the models agree in sign with the multimodel ensemble mean. Under the SSP1-2.6 scenario, the TXx index increased nationwide (Fig. 10a), with a national average increase of 2.14 °C (Table 5). The increase in the northern region was greater than that in the southern region. However, there was considerable inconsistency among models in the SC and NEC regions. Under the SSP1-2.6 scenario, the TXx index projected by NEX-GDDP-CMIP6 shows greater intermodel consistency in the northwest region of China than the original CMIP6 models. In the northern part of China, it is generally about 1 °C higher than in the original CMIP6 models. Under the SSP2-4.5 and SSP5-8.5 scenarios, the pattern of TXx change remained consistent with that of the low-emission scenario, but the magnitude of the increase became larger. The TXx index increased by 3.2 °C and 5.57 °C nationwide under the SSP2-4.5 and SSP5-8.5 scenarios, respectively, and the consistency among models increased with higher emission scenarios. Under the SSP2-4.5 scenario, the NEX-GDDP-CMIP6 models project values in the eastern part of the southwestern region of China that are approximately 1 °C higher than those of the original CMIP6 models. Similarly, under the SSP5-8.5 scenario, the projected values for the northern region are also about 1 °C higher than those of the original CMIP6 models. The T35 index increased nationwide under the SSP1-2.6 scenario, with high-value centers appearing in the NWC and CC regions, exceeding 30 days, and a national average increase of 12.2 days. 
Under the SSP2-4.5 and SSP5-8.5 scenarios, the T35 index further increased, with national average increases of 19.26 days and 41.12 days, respectively. With higher emission scenarios, the extent of the high-value centers expanded. In the SSP5-8.5 scenario, except for the SWC1, SWC2, and NEC regions, most of the country experienced T35 increases exceeding 45 days, and the consistency among models remained high across all three emission scenarios. The TX90P index also increased with higher emission scenarios, exhibiting high consistency among models. Across the three scenarios, the national increases were 20.57%, 31.36%, and 51.18%, respectively. The NEC and NC regions showed smaller increases than other regions. Under the three emission scenarios, the TX90P index projected by NEX-GDDP-CMIP6 is generally about 5% higher than that of the CMIP6 models. Regarding the HWF index, under the SSP1-2.6 scenario, the minimum-value center appeared in the NEC region, with values below 6 times, and the national average was 7.88 times. Under the SSP2-4.5 scenario, the HWF increased nationwide, reaching 10.69 times. In the SSP5-8.5 scenario, the HWF index values were smaller in the SWC1 and SWC2 regions than in the previous two scenarios but increased in other regions with higher emission scenarios, reaching 11.45 times nationwide. The model consistency was lower in the SWC1 and SWC2 regions, while it remained high in other regions. The HWD index increased with higher emission scenarios, with the NEC region still exhibiting the minimum-value center. The national values for the three scenarios were 70.13 days, 111.05 days, and 190.96 days, respectively. Overall, by the end of the 21st century, the consistency among models regarding the change in extreme high-temperature events was high, with increases projected in most regions nationwide.
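The model-consistency criterion used for the stippling (more than 90% of the models agreeing in sign with the MME change) can be illustrated as follows. This is a minimal sketch under stated assumptions: the MME is taken as the unweighted mean of the individual models, the fields are flattened to lists of grid-cell changes, and the function names and toy ensemble are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def agreement_mask(changes_by_model, threshold=0.9):
    """Flag grid cells where more than `threshold` of models share the sign of the MME change.

    changes_by_model: one list per model, each holding the projected change at every grid cell.
    """
    n_models = len(changes_by_model)
    n_cells = len(changes_by_model[0])
    mask = []
    for j in range(n_cells):
        cell = [model[j] for model in changes_by_model]
        mme = sum(cell) / n_models  # unweighted multimodel mean (assumption)
        agree = sum(1 for v in cell if sign(mme) != 0 and sign(v) == sign(mme))
        mask.append(agree / n_models > threshold)
    return mask


# Toy ensemble of 10 models and 2 grid cells (hypothetical values).
# Cell 0: every model projects an increase. Cell 1: 6 increases and 4 decreases.
ensemble = [[1.0, 0.5]] * 6 + [[2.0, -0.5]] * 4
stipple = agreement_mask(ensemble)
print(stipple)
```

In the toy ensemble only the first cell would be stippled, since the second cell has just 60% of the models agreeing with the sign of the ensemble mean.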

Fig. 10

Spatial patterns of the projected changes in TXx, T35, TX90P, HWF and HWD for the NEX-GDDP-CMIP6 MME over China under the SSP1-2.6, SSP2-4.5, and SSP5-8.5 scenarios by the end of the 21st century (2081–2100) relative to the period from 1995 to 2014. The stippling is present in regions where more than 90% of the models are consistent with the signs of MME changes

Figure 11 depicts the temporal evolution of the nationally averaged extreme high-temperature indices projected by the MME under the three emission scenarios. The TXx, T35, TX90P, and HWD indices all exhibited an increasing trend over time, and the magnitude of the increase grew with stronger radiative forcing. By 2100, under the SSP1-2.6 scenario, the TXx, T35, TX90P, and HWD indices increased by 1.83 °C, 9.3 days, 18.62%, and 62.07 days, respectively. In the SSP2-4.5 scenario, these indices increased by 2.87 °C, 16.32 days, 32.98%, and 115.53 days, respectively. In the SSP5-8.5 scenario, the same indices underwent more substantial increases of 6.37 °C, 48.22 days, 59.37%, and 225.52 days. Compared to the original CMIP6 models, the TXx index is 0.2 °C higher under the low emission scenario, relatively close under the medium emission scenario, and nearly 0.5 °C higher under the high emission scenario. The TX90P index is similar to that of the original models under the low and medium emission scenarios, but it is approximately 2% higher under the high emission scenario (Wei et al. 2023). The HWF index increased with time under the low and medium emission scenarios, albeit at a slower pace after 2060. However, a declining trend became noticeable beyond 2060 under the high emission scenario, possibly attributable to an increase in the duration of individual heatwaves. Across the three scenarios, the HWF index increased by 7.92 times, 11.9 times, and 11.43 times, respectively, by 2100. Figure 11 shows that most extreme high-temperature indices intensified continuously with higher emission scenarios and with the passage of time. Furthermore, higher emission scenarios were associated with increased uncertainty among the models.

Fig. 11

Temporal evolution of the projected changes in (a) TXx, (b) T35, (c) TX90P, (d) HWF, and (e) HWD for the NEX-GDDP-CMIP6 MME over China under the SSP1-2.6, SSP2-4.5 and SSP5-8.5 scenarios for 2015–2100 relative to the reference period 1995–2014. The shading indicates the minimum and maximum values
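Projected changes such as those in Fig. 11 are expressed relative to the 1995–2014 reference period. As a minimal sketch, the anomaly calculation could be written as below. It assumes that absolute differences are used for indices in °C or days and percentage differences for indices reported in percent; the function name and the synthetic series are hypothetical.

```python
def anomalies(years, values, ref_start=1995, ref_end=2014, relative=False):
    """Express an annual series as departures from its reference-period mean.

    relative=False gives absolute differences (e.g. degrees C or days);
    relative=True gives percentage differences with respect to the reference mean.
    """
    ref = [v for y, v in zip(years, values) if ref_start <= y <= ref_end]
    base = sum(ref) / len(ref)
    if relative:
        return [100.0 * (v - base) / base for v in values]
    return [v - base for v in values]


# Synthetic series (hypothetical): 30.0 during the reference period, 32.0 afterwards.
years = list(range(1995, 2101))
series = [30.0 if y <= 2014 else 32.0 for y in years]
absolute = anomalies(years, series)
percent = anomalies(years, series, relative=True)
print(absolute[-1], round(percent[-1], 2))
```

Applying the same function to each model before averaging, rather than to the ensemble mean, would also give the per-model spread shown by the shading.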

Figure 12 illustrates the spatial distribution of the projected changes in extreme precipitation events in the late 21st century under the three emission scenarios. The pattern of change in the CDD index was generally consistent among the three scenarios. It increased in southeastern China but decreased in other regions. In the southern part of the northwest region, the low-value center expanded with higher emission scenarios. In contrast to the original CMIP6 models, there is considerable inconsistency among the models, and even under the high emission scenario the decline of the CDD index in North China varies significantly among the models. The distribution pattern of the CWD index was also similar under the three emission scenarios, showing an increasing trend nationwide. The high-value center around the southern side of the SWC1 region expanded with higher emission scenarios. The model projections of these two indices exhibited significant discrepancies, with consistent signs found only in scattered regions of northern China. Under the low emission scenario, the PTOT index increased nationwide, with a high-value center appearing near the Kunlun Mountains, exceeding 20%. In contrast to the original CMIP6 models’ pattern of smaller increases in the west and larger increases in the east, the annual total precipitation increase in the northern region of China is greater than in the southern region, with consistency in the models’ projections observed in the southwestern and northeastern parts of the country. Under the medium emission scenario, the distribution pattern of the annual total precipitation change is similar to that of the low emission scenario, with high-value center values exceeding 30%, which is higher than in the original CMIP6 models (20%), and model consistency continues to be concentrated in the southwestern and northeastern regions. 
Under the high emission scenario, the NEX-GDDP-CMIP6 models estimate a smaller increase in precipitation for the North China region (20-25%) and the southern region (10-15%) compared to the original CMIP6 models (greater than 30% and 15%-20%). In all three emission scenarios, the high-value center for the R20 index appeared in the SC region. With increasing emission scenarios, the values in this region continuously increased, and the range expanded. The highest value exceeded 5 days. Additionally, under the SSP5-8.5 scenario, a high-value center appeared south of the Qinghai-Tibet Plateau. Except for the NWC region, the region of consistent sign deviation increased in the other areas. Under low and medium emission scenarios, compared to the CMIP6 models, inter-model consistency primarily occurs in parts of Southwest and South China. Under the three emission scenarios, the R20 index in Northeast China, North China, and the Yangtze River Basin is approximately one day less than in the original CMIP6 models. The R50 index also increased with increasing emission scenarios, although the increase was relatively small. The high-value center appeared in the southeastern coastal region, while the change in other areas was minimal. The region of consistent sign deviation expanded with higher emission scenarios. The variations in the RX1D, RX5D, and SDII indices were relatively similar. These three indices consistently increased nationwide under the low and moderate emission scenarios. Under the moderate emission scenario, high-value centers appeared near the Kunlun Mountains and on the eastern side of the NWC region. Under the high emission scenario, the values further increased, and the high-value center expanded to the entire SWC1 region, while the region of consistent sign deviation also expanded. Under SSP1-2.6, the SDII index simulated by NEX-GDDP-CMIP6 aligns closely with the original CMIP6 models. 
Under SSP2-4.5, the SDII index in central and southern China is about 5% lower than the projections from the original CMIP6 models. Under SSP5-8.5, the SDII index in eastern China is approximately 5–10% lower than the projections from the original CMIP6 models. Under SSP1-2.6 and SSP2-4.5, the RX5D index projected by NEX-GDDP-CMIP6 is about 5% lower than the projections from the original CMIP6 models. Under the high emission scenario, the projected RX5D index in northern China and the Yangtze River basin is approximately 10% smaller than that of CMIP6. Overall, except for the CDD index, which decreased in the north and increased in the south, the precipitation indices consistently increased. The R20 and R50 indices showed significant increases in the southeastern coastal region, while the PTOT, RX1D, RX5D, and SDII indices primarily experienced significant increases in the western region. The CDD and CWD indices simulated by the MME exhibited significant inconsistencies among the models, whereas the PTOT index demonstrated high consistency in terms of sign only in the northern region of China, with substantial uncertainty persisting in the southern region. For the remaining indices, the area with a consistent sign of change expanded continually as emission scenarios escalated. Compared to the original CMIP6 models, there are large differences between models in the bias-corrected estimates of the CDD index. Under the high emission scenario, the projected increase in the PTOT index is somewhat smaller. The R20 index for northern regions and the Yangtze River basin is estimated to be about one day less than the CMIP6 models’ projections. The projected SDII index in eastern China is about 5–10% lower than that of the CMIP6 models. Additionally, the projected RX5D index in northern China and the Yangtze River basin is approximately 5% smaller than that of CMIP6.

Fig. 12 As in Fig. 9 but for eight precipitation indices

Figure 13 illustrates the temporal variations in extreme precipitation indices in China estimated by the models. The CDD index exhibited a declining trend, with minimal differences in its long-term trend among the three emission scenarios. By 2100, CDD was projected to decrease nationwide by 0.8 days, 1.84 days, and 2.86 days under the three emission scenarios, respectively. Conversely, the CWD index showed a weak upward trend, with similarly small long-term trends across the emission scenarios. The CWD was projected to increase by 1.13 days, 1.21 days, and 2.34 days under the three emission scenarios, respectively. The R50 index also exhibited a minimal upward trend across the emission scenarios, although the long-term changes in the high emission scenario still surpassed those in the low and moderate emission scenarios. By 2100, the R50 index was projected to increase by 0.13 days, 0.22 days, and 0.58 days under the three scenarios, respectively. In the high emission scenario, starting from 2060, the values of the remaining extreme precipitation indices were significantly higher than those in the low and moderate emission scenarios. Under the high emission scenario, the PTOT, R20, RX1D, RX5D, and SDII indices increased by 24.93%, 3.21 days, 32.85%, 30.2%, and 17.6%, respectively. The analysis above indicated that, except for the CDD and CWD indices, the other extreme precipitation indices exhibited clear increases under the high emission scenario, emphasizing the urgency of greenhouse gas mitigation. By the end of the 21st century, the NEX-GDDP-CMIP6 estimates for the SDII index are slightly lower than those of the original CMIP6 models. However, the projections of the RX5D, PTOT, and R20 indices are slightly higher than those of CMIP6 (Wei et al. 2023).
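For readers who want to reproduce index time series of this kind, the following minimal sketch computes the eight precipitation indices from one year of daily precipitation at a single grid cell. It assumes the commonly used ETCCDI-style definitions (a 1 mm wet-day threshold, 20 mm and 50 mm day counts, PTOT as the wet-day total); the thresholds actually applied in this study are defined in its methods section and may differ, and the function names are illustrative.

```python
import numpy as np

WET_DAY_MM = 1.0  # assumed wet-day threshold (ETCCDI convention)

def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def precip_indices(daily_mm):
    """Compute eight precipitation indices from a 1-D daily series (mm/day)."""
    p = np.asarray(daily_mm, dtype=float)
    wet = p >= WET_DAY_MM
    rolling5 = np.convolve(p, np.ones(5), mode="valid")  # 5-day running totals
    return {
        "CDD": longest_run(~wet),               # max consecutive dry days
        "CWD": longest_run(wet),                # max consecutive wet days
        "R20": int((p >= 20.0).sum()),          # days with >= 20 mm
        "R50": int((p >= 50.0).sum()),          # days with >= 50 mm
        "RX1D": float(p.max()),                 # max 1-day precipitation
        "RX5D": float(rolling5.max()),          # max 5-day precipitation
        "SDII": float(p[wet].sum() / wet.sum()) if wet.any() else 0.0,
        "PTOT": float(p[wet].sum()),            # total wet-day precipitation
    }
```

Running `precip_indices` on each year and grid cell, then averaging over China, would yield curves comparable in form to those in Fig. 13, subject to the definitional assumptions noted above.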

Fig. 13 As in Fig. 10 but for eight precipitation indices

The changes in the regional mean values of extreme indices over the period 2081–2100, compared to 1995–2014, are presented in Table 5. In the future, different subregions of China are set to exhibit distinct regional characteristics in terms of extreme climate events. In the NWC, a particularly remarkable increase in TXx is expected, especially under SSP5-8.5. Concurrently, a significant rise in T35 is also anticipated. However, the CDD index is expected to decrease, suggesting that the region may experience fewer consecutive dry days in the future. The changes in extreme climate events in SWC1 and SWC2 are also prominent, especially in terms of extreme precipitation. The RX1D and RX5D indices are expected to rise across the board, potentially related to changes in future precipitation patterns, increasing the risk of floods and landslides. SWC1 will see the most pronounced increase in HWD and TX90P. In SC, an upward trend in extreme temperature events is projected, particularly under SSP5-8.5. The increase in high temperature days could have significant impacts on human health and agricultural production. Furthermore, indices related to extreme precipitation, such as R20 and R50, show a significant upward trend in the southeastern coastal areas of the region. In the CC region, both TXx and T35 are expected to undergo significant increases under the high emission scenario, potentially exacerbating extreme heat events in the area. Moreover, intensity indices for extreme precipitation events, such as RX1D, are also anticipated to rise significantly under the high emission scenario, thereby elevating the risk of floods. EC sees a moderate increase in extreme temperature events but a general rise in extreme precipitation indices, especially R50 and CDD.
This may indicate more frequent extreme precipitation and drought events in the future, posing challenges for urban waterlogging control and water resource management. NC and NEC also face increased risks of extreme temperature and precipitation events. Among all subregions, the SDII in the NC region ranks second, and the CDD index in this region decreases significantly under SSP5-8.5. The increase in precipitation intensity could exacerbate the pressure on urban drainage systems, increasing the risk of flooding.
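The regional mean changes discussed above (2081–2100 relative to 1995–2014) can be computed along the following lines. This is a sketch under stated assumptions: grid cells are weighted by the cosine of latitude, the subregion is supplied as a boolean mask, and the function name `regional_mean_change` and its arguments are hypothetical. The study's own regional averaging procedure is not specified in this section and may differ.

```python
import numpy as np

def regional_mean_change(index, years, lat, region_mask,
                         base=(1995, 2014), future=(2081, 2100),
                         relative=False):
    """Change in the regional mean of an annual index between two periods.

    index       : array (n_years, ny, nx) of annual index values
    years       : array (n_years,) of calendar years
    lat         : array (ny,) of latitudes in degrees
    region_mask : boolean array (ny, nx), True inside the subregion
    relative    : if True, return the change in percent of the baseline
    """
    index = np.asarray(index, dtype=float)
    years = np.asarray(years)
    # Assumed area weighting: cos(latitude), zero outside the region.
    weights = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[:, None] * region_mask

    def period_mean(start, end):
        selected = (years >= start) & (years <= end)
        climatology = index[selected].mean(axis=0)
        return (climatology * weights).sum() / weights.sum()

    baseline = period_mean(*base)
    projected = period_mean(*future)
    if relative:
        return 100.0 * (projected - baseline) / baseline
    return projected - baseline
```

Absolute changes suit indices expressed in days or degrees (for example CDD, R20, TXx), while `relative=True` suits indices reported in percent (for example PTOT, RX1D, RX5D, SDII).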

In general, under the high emission scenario, the NWC region will exhibit the most significant increase in TXx, while the SC region will experience the most significant rise in T35. SWC1 will see the most pronounced increase in HWD and TX90P. Regarding the frequency of extreme precipitation events, the R20 index in SC and the R50 index in EC are expected to show the most significant increases. In terms of the intensity of extreme precipitation, the RX1D, RX5D, and SDII indices in the SWC1 region will see the most substantial increases. The CDD index is projected to decrease most significantly in NC and to increase most significantly in SC, while the CWD index is projected to increase most significantly in SWC1. Most regions of China are expected to face more frequent and severe extreme climate events in the future, especially under the high emission scenario. These changes not only pose threats to ecosystems but also challenge socioeconomic activities and daily life. Therefore, implementing effective adaptation and mitigation measures to reduce greenhouse gas emissions and enhance societal resilience to climate change is particularly crucial.

Table 5 Projected changes in the 13 extreme climate indices for 2081–2100 compared to 1995–2014 in the 8 subregions and China under the three scenarios

4 Conclusion and discussion

In this paper, we examined the performance of 26 NEX-GDDP-CMIP6 models in simulating the climatologies and interannual variability of extreme climate indices across China and its eight subregions. We also analysed the future projections of these indices and the associated uncertainties for the period 2081–2100 under three Shared Socioeconomic Pathways (SSP1-2.6, SSP2-4.5, SSP5-8.5). The results indicate an increase in high temperature extremes and an intensification of extreme precipitation events in an increasingly warmer future. These findings are consistent with previous studies using CMIP5 and CMIP6 models (e.g., Zhou et al. 2014; Zhu et al. 2020; Wei et al. 2023). In addition, the more pronounced changes in extreme climate indices associated with increased radiative forcing confirm previous research. This study provides a broader and more detailed examination of the expected variations in climate extremes across China and its eight subregions. It incorporates multiple dimensions, new scenarios and the latest generation of downscaled climate models. The main conclusions are outlined below:

1) The MME of NEX-GDDP-CMIP6 models captures the spatial distribution of extreme high temperature events effectively, particularly in the northern regions of China. The models perform well in simulating the TXx and T35 indices, but they still tend to underestimate TX90P, HWF, and HWD. The MME demonstrates higher spatial correlation coefficients and TS scores for the TXx index than the original CMIP6 MME, but it exhibits slightly larger biases in simulating the index. Similarly, the MME shows strong performance in modeling extreme precipitation events, with the highest spatial correlation coefficients observed in simulations of the PTOT index. Relative to the CMIP6 models, NEX-GDDP-CMIP6 reduces the biases in the PTOT and SDII indices across the country. The simulated CDD index is significantly overestimated, while the R20 index is underestimated. In comparison to NEX-GDDP-CMIP5 simulations, the newer version demonstrates slightly improved spatial correlation coefficients for the RX1D and RX5D indices. However, challenges remain in accurately modeling the CWD and R50 indices. From a regional perspective, NEX-GDDP-CMIP6 tends to simulate a higher CDD index in the Northwest region. Nevertheless, it enhances the accuracy of the PTOT simulations along China’s southern coast and in Yunnan province, and it better captures the R20 index in the Southwest, the RX5D index in North China and the Southwest, as well as the SDII index in the Northeast and Yunnan.

2) Consistency among different models was found to be high for the projected temperature extremes. Specifically, for the entire country, the TXx, T35, TX90P, HWF, and HWD indices are projected to increase by 5.57 °C, 41.12 days, 51.18%, 11.45 times, and 190.96 days, respectively, under the SSP5-8.5 scenario. Compared to the CMIP6 models, TXx is nearly 0.5 °C higher and TX90P is 2% higher under the SSP5-8.5 scenario. The MME projections in northern China are about 1 °C higher than those of the CMIP6 models under the SSP5-8.5 scenario. In terms of precipitation extremes, there is a projected increase in indices such as CWD, PTOT, R20, R50, RX1D, RX5D, and SDII, while the CDD index is expected to decrease. On a national scale, during the long-term period of the 21st century, the CWD, PTOT, R20, R50, RX1D, RX5D, and SDII indices are projected to increase by 1.57 days, 21.99%, 2.48 days, 0.41 days, 24.56%, 22.45%, and 12.59%, respectively, while the CDD index is projected to decrease by 2 days. Considerable inconsistencies exist among different models for the CDD and CWD indices in China. Additionally, great inconsistencies are also found for the PTOT index in southern China. With higher emission scenarios, the consistency among models improves for the other precipitation indices. There are remarkable differences in the bias-corrected model projections of the CDD index compared to the original CMIP6 models. Under the high emission scenario, the projected increase in the PTOT index is somewhat smaller. The R20 index projections for the northern regions and the Yangtze River basin are about one day lower than those projected by the CMIP6 models. Projections for the SDII index in eastern China are about 5–10% lower than those from the CMIP6 models. In addition, the projections for the RX5D index in northern China and the Yangtze River basin are about 5% lower than those from the CMIP6 models.

3) Under the high emission scenario, the NWC region will display the most significant rise in TXx, whereas the SC region will witness the most significant increase in T35. The SWC1 region will experience the most pronounced escalation in HWD and TX90P. In terms of the frequency of extreme precipitation events, the R20 index in SC and the R50 index in EC are projected to see the most substantial increases. Regarding the intensity of extreme precipitation, the SWC1 region will see the largest increases in the RX1D, RX5D, and SDII indices. Conversely, the CDD index in NC is expected to see the most significant reduction, while the CWD index in SWC1 is anticipated to increase most prominently.

Bias correction facilitates the improvement of simulations for extreme precipitation and temperature events, providing more accurate information for policy formulation and adaptation measures. However, uncertainties exist in different bias correction methods. The quantile mapping method adopted by the NEX-GDDP-CMIP6 models effectively corrects the systematic biases of climate models; however, it may also alter the trend of future projections to a certain extent. In this regard, Cannon et al. (2015) introduced the Quantile Delta Mapping (QDM) algorithm, designed to preserve the relative changes in precipitation quantiles. Comparing QDM with Detrended Quantile Mapping (DQM) and standard Quantile Mapping (QM), they found that QDM and DQM better maintain the original future trends projected by GCMs. Moreover, Tong et al. (2021), upon applying the QM and QDM methods, found that although both approaches efficiently eliminate the models’ systematic biases, bias correction could significantly alter the estimated change magnitude. This effect depends on the applied method, season, and analyzed variable. Similarly, Chen et al. (2022) applied linear scaling, distribution-based QM, and empirical QM to correct temperature and precipitation simulated by the WRF model; all methods significantly reduced model biases, with varied effects. Among these, the empirical QM method offered the most comprehensive correction, while the distribution-based QM method was more effective in handling extreme daily precipitation amounts, and linear scaling performed well in correcting seasonal cycles. Regarding the contribution of different sources of uncertainty to future projections, Iizumi et al. (2017) noted that the impact of different bias correction methods and “observation” datasets on precipitation indices in the early and late 21st century actually exceeds the impact of different climate models and emissions scenarios.
In the early 21st century, climate models themselves are the main source of uncertainty for temperature index estimates, while in later periods, emission scenarios become the primary uncertainty factor. For the estimation of long-duration hot and dry events in China’s summer, model uncertainty is the main source of uncertainty throughout the century, whereas the contribution of downscaling methods is relatively small. However, it is noteworthy that the uncertainty of emission scenarios becomes more important in estimating the intensity of hot and dry events in the mid to late 21st century and becomes the main factor by the end of the century (Yang and Tang 2023). As shown by Lafferty and Sriver (2023), downscaling and bias correction introduce significant uncertainty into local climate projections. Particularly in the early to mid-21st century, uncertainty in projections of precipitation and extreme climate events, and in areas of complex terrain or with inconsistent historical observational data, stems primarily from downscaling and bias correction. Therefore, relying on output from a single downscaling and bias correction method may lead to inaccuracies in future climate change projections.
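The contrast between QM and QDM discussed above can be made concrete with a small sketch. It is an illustration only and not the bias correction code used to produce NEX-GDDP-CMIP6: the function names, the empirical plotting position rank/(n+1), and the multiplicative (precipitation-style) form of QDM are assumptions made for this example.

```python
import numpy as np

def empirical_cdf(sample, x):
    """Empirical non-exceedance probability of x within sample, rank/(n+1)."""
    s = np.sort(np.asarray(sample, dtype=float))
    rank = np.searchsorted(s, x, side="right")
    return rank / (s.size + 1.0)

def quantile_mapping(obs, mod_hist, mod_fut):
    """Standard empirical QM: x -> F_obs^{-1}(F_mod_hist(x)).

    Future values beyond the historical model range are mapped to the
    top (or bottom) of the observed distribution in this simple version.
    """
    tau = empirical_cdf(mod_hist, mod_fut)
    return np.quantile(obs, tau)

def quantile_delta_mapping(obs, mod_hist, mod_fut, eps=1e-9):
    """Multiplicative QDM sketch in the spirit of Cannon et al. (2015).

    tau   = F_mod_fut(x)                      quantile of x in the future run
    delta = x / F_mod_hist^{-1}(tau)          modelled relative change
    out   = F_obs^{-1}(tau) * delta           bias-corrected value
    """
    mod_fut = np.asarray(mod_fut, dtype=float)
    tau = empirical_cdf(mod_fut, mod_fut)
    hist_q = np.quantile(mod_hist, tau)
    delta = mod_fut / np.maximum(hist_q, eps)  # eps guards against division by zero
    return np.quantile(obs, tau) * delta
```

In a toy case where the model has a uniform wet bias and the future run is the historical run scaled by 1.5, this QDM sketch returns the observations scaled by 1.5, so the modelled relative change is retained at every quantile. The simple QM version compresses the change in that case, because future values above the historical model range cannot be mapped beyond the largest observed values.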

In addition to the uncertainties inherent in bias correction methods themselves, the internal variability of the climate system cannot be overlooked. Correction methods based solely on statistical relationships are insufficient to fully address the discrepancies in internal climate variability between climate models and observational data, such as variations in large-scale circulation patterns. This may result in suboptimal correction outcomes in certain regions. Therefore, future research should delve deeper into the role of internal variability at local scales and explore how to obtain estimates of internal variability that are independent of specific climate models by integrating statistical techniques. Moreover, to deepen the understanding of the impact of internal climate variability on future projections of extreme events, detailed uncertainty analyses will be conducted using Single Model Initial-condition Large Ensemble (SMILE) simulations. Given the varying performance of different bias correction methods in different contexts, future studies may focus on how to combine multiple methods to achieve optimal results when simulating different climate variables and extreme events (Lafferty and Sriver 2023).
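One simple way to picture how internal variability, model differences, and emission scenarios could be compared is a variance partition over an ensemble with several members per model. The sketch below is a deliberately simplified assumption: it treats the three sources as additive and ignores interaction terms, and it is not the method of the studies cited above. The function name `variance_shares` and the array layout are hypothetical.

```python
import numpy as np

def variance_shares(proj):
    """Split projection spread into internal, model, and scenario shares.

    proj : array (n_scenarios, n_models, n_members) of projected changes
           in one index for one region and period.
    Assumes additive, non-interacting sources of uncertainty.
    """
    proj = np.asarray(proj, dtype=float)
    member_mean = proj.mean(axis=2)                       # (scenario, model)
    # Internal variability: spread across members of the same model and scenario.
    internal = proj.var(axis=2, ddof=1).mean()
    # Model uncertainty: spread of member means across models, per scenario.
    model = member_mean.var(axis=1, ddof=1).mean()
    # Scenario uncertainty: spread of multi-model means across scenarios.
    scenario = member_mean.mean(axis=1).var(ddof=1)
    total = internal + model + scenario
    return {
        "internal": internal / total,
        "model": model / total,
        "scenario": scenario / total,
    }
```

Evaluating such shares separately for early, mid, and late 21st century periods would show whether the dominant source shifts over time, as described in the studies discussed above. A bias correction or downscaling dimension could be added in the same way.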

In light of these limitations, our study’s conclusions should be interpreted with caution. While the NEX-GDDP-CMIP6 models provide valuable insights into future climate extremes in China, the uncertainties highlighted necessitate ongoing research. Lastly, as compound extreme climate events become increasingly prevalent across various regions in China, further exploration into the future changes of these compound extreme events is necessary. Although current downscaling methods can replicate the characteristics of single-variable extreme climate events to some extent, they often overlook the multivariable nature of compound extreme events (Park et al. 2023; Xu et al. 2024b; Yang and Tang 2023). Thus, future research will also consider adopting multivariable bias correction and downscaling approaches.