1 Introduction

Changes in climate extremes have more severe impacts on many aspects of human and natural systems than changes in the mean climate (Seneviratne et al. 2012). Climate extremes have increased in both frequency and intensity during recent decades as the global mean surface temperature (GMST) has risen markedly (IPCC 2014). A better understanding of changes in climate extremes is urgently needed for policy-making and adaptation planning. The Paris Agreement sets a goal of “holding the increase in the global average temperature well below 2 °C above the pre-industrial levels and to pursue efforts to limit the temperature increase to 1.5 °C above pre-industrial levels” (UNFCC 2015). The different impacts of changes in climate extremes between the 1.5 °C and 2 °C warmer worlds have emerged as a global concern (Hulme 2016; Schleussner et al. 2016; King and Karoly 2017; Mitchell et al. 2017).

The impacts of the additional half-degree warming between the two long-term global warming goals (LTGGs) of 1.5 °C and 2 °C have been studied using various approaches, including transient simulations with a “time-sampling” approximation (Schleussner et al. 2016; King and Karoly 2017; Kharin et al. 2018; Zhang et al. 2018; Zhou et al. 2018a); stabilized 1.5 °C and 2 °C warming-level targeting experiments performed with fully coupled general circulation models (Sanderson et al. 2017; Li et al. 2018; Nangombe et al. 2018); and atmosphere-only general circulation model experiments such as the ‘Half a degree Additional warming, Prognosis and Projected Impacts’ (HAPPI) project (Lewis et al. 2017; Mitchell et al. 2017; Chevuturi et al. 2018; Zhou et al. 2018b). The different models and methods all project an increase in mean temperature in China under the 0.5 °C additional warming, ranging between 0.6 and 0.7 °C (Chen and Zhou 2017; Xu et al. 2017; Lin et al. 2018). A general warming tendency has also been projected for extreme temperature: the hottest day and night will warm by ~ 0.6 °C–0.7 °C, and the frequency of warm days will increase by ~ 130% (Guo et al. 2017; Kharin et al. 2018; Shi et al. 2018). While these approaches agree qualitatively that extreme temperatures are generally lower at the 1.5 °C warming level than at 2 °C, they differ quantitatively because of the different modeling strategies and the limitations of climate models in parameterizing physical processes. Observational evidence is therefore hoped to serve as a metric for gauging model performance. Since the GMST was already about half a degree warmer during 1991–2010 than during 1960–1979 (Hansen et al. 2010), the observed changes in climate extremes under this historical 0.5 °C warming increment can be used as observational metrics to evaluate model performance, or even regarded as analogues for future projections (Schleussner et al. 2017). Chen et al. (2018) pointed out that the temperature extreme changes are detectable in China under the past half-degree warming increment. Zhao and Zhou (2019) compared the heat extreme changes under historical and future half-degree warming and found that the historical changes in daytime heat extremes under the past 0.5 °C warming increment are a conservative estimate of the future projection. How to reliably quantify the observed changes is thus of central importance.

Unfortunately, observational datasets still have many limitations. For example, the sparse spatial coverage of observational stations in some parts of the world does not allow a reliable estimate of the observed temperature changes (Alexander et al. 2006; Caesar et al. 2006; Donat et al. 2013a, b). As an alternative, reanalysis datasets, which provide consistent spatial and temporal resolution over many decades by assimilating conventional observations, radiosonde data and satellite data into dynamically consistent models (Kalnay and Cai 2003), are hoped to be useful substitutes for observations. Reanalyses have been shown to be reliable in measuring monthly-to-annual temperature climatologies and anomalies on a global scale compared to observational datasets, although they still show weaknesses in quantifying long-term trends (Smith et al. 2001, 2004; Simmons et al. 2010; Compo et al. 2013). Climate extremes tend to vary differently from the mean state (Seneviratne et al. 2014). A systematic assessment of the consistency in extreme temperature changes between a set of gridded observational datasets and reanalysis datasets indicates that while normalized trends during the past ~ 60 years generally compare well, the actual values of annual extremes differ across datasets (Donat et al. 2014). Extended to the whole twentieth century, the long-term trends of extreme temperature show a consistent increase globally (with local variations) in three reanalysis datasets, but the spread across reanalysis datasets is still large in the first half of the twentieth century (Donat et al. 2016). Reanalysis datasets have been used to validate model performance in simulating extreme temperature changes from 1979 onward (Kharin et al. 2005, 2007; Sillmann et al. 2013; Angélil et al. 2016).

Compared with other parts of the world, less effort has been devoted to validating reanalysis datasets in quantifying the changes in extreme temperature in China. Some preliminary analyses found that reanalyses tend to underestimate the climatological daily maximum temperature (TX) and overestimate the climatological daily minimum temperature (TN), especially over regions with complex topography (Mao et al. 2010). Reanalysis data show biases in the long-term trends of extreme temperature over regions with sparse observations such as the Tibetan Plateau (You et al. 2013; Zhou et al. 2016). The biases in the extreme temperature frequency indices are larger than those in the intensity indices (Zhu et al. 2017). While existing evaluations of the performance of reanalysis datasets in quantifying the climate states and long-term trends of extreme temperature provide useful information, the strengths and weaknesses of reanalysis datasets in measuring the changes in extreme temperature under the historical half-degree warming increment remain unknown. In this study, we aim to answer the following questions: (1) Are reanalysis datasets reliable substitutes for observations in revealing the extreme temperature changes under the historical 0.5 °C warming increment in China? (2) What are the advantages and disadvantages of each reanalysis dataset in measuring the extreme temperature changes, and what are the possible causes? We show evidence that the reanalysis datasets can generally reproduce the changes in extreme temperature under the past 0.5 °C warming but have large spreads on regional scales. JRA-55 shows better consistency with the observation than ERA and 20CR because more observational data constraints and more realistic aerosol data are used in its reanalysis system.

The remainder of the paper is structured as follows. We describe the datasets and methods in Sect. 2. A comparison of extreme temperature changes derived from a gridded observational dataset and three reanalysis datasets is presented in Sect. 3. The conclusion is given in Sect. 4 along with a discussion.

2 Data and methods

2.1 Gridded observational dataset and reanalysis datasets

We investigate and compare the extreme temperature changes in one gridded observational dataset and three reanalysis datasets. The observational dataset is the homogenized gridded CN05.1, with a horizontal resolution of 0.25° × 0.25° and a time range of 1961–2015 (Wu and Gao 2013). It is constructed from more than 2400 observation stations in China, with an “anomaly approach” (New et al. 2000) applied in the data interpolation. The dataset has been shown to perform well in both climatology and long-term trends of the mean state and extreme temperature in comparison with other commonly used observational datasets. Minor differences can still be found in station-sparse regions such as western China, indicating uncertainties in such regions, where results should be interpreted with care (Xu et al. 2009; Wu and Gao 2013). We also calculated the mean temperature increase in the observational records under the historical half-degree warming increment, which is ~ 0.6 °C in CN05.1, comparable to the model-projected results.

Three commonly used reanalysis datasets are compared in our study:

  1. European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (ERA), which is merged from the 45-year ECMWF reanalysis (ERA-40) (Uppala et al. 2005) and the ECMWF Interim reanalysis (ERA-Interim) (Dee et al. 2011), because neither of them covers the full analysis period of 1961–2010. The time-merged ERA reanalysis dataset consists of ERA-40 from 1961 to 1978 and ERA-Interim from 1979 onward (following Schleussner et al. 2017). The ERA reanalysis dataset used has a horizontal resolution of 1.125° × 1.125°.

  2. Japanese 55-year reanalysis dataset (JRA-55), with a horizontal resolution of 1.25° × 1.25° for the period from 1958 to 2019.

  3. Twentieth century reanalysis version 2c (20CR), which provides 1.875° × 1.9° horizontal resolution for the period from 1851 to 2014 (Compo et al. 2011). Note that 20CR only assimilates hourly and synoptic barometric pressure observations, monthly averaged sea surface temperature, and sea ice concentration fields.

We note that there are also other commonly used reanalysis datasets such as the National Centers for Environmental Prediction Reanalysis I (Kalnay et al. 1996) and II (Kanamitsu et al. 2002) (NCEP-R1 and NCEP-R2) and the Modern-Era Retrospective analysis for Research and Applications version 1 (Rienecker et al. 2011) and version 2 (Gelaro et al. 2017) (MERRA and MERRA2). These datasets do not fit our analysis here because most of them (NCEP-R2, MERRA and MERRA2) do not cover the full time range from 1960 to 2010. In addition, NCEP-R1 is also not included in our analysis since it has been demonstrated to have poor performance in measuring extreme temperature changes (Donat et al. 2014).

2.2 Methods

We consider a subset of ten extreme temperature indices following the recommendation of the Expert Team on Climate Change Detection and Indices (ETCCDI) (Zhang et al. 2011), which can be divided into three categories:

  1. Intensity indices: daytime hot extreme (TXx), nighttime hot extreme (TNx), daytime cold extreme (TXn) and nighttime cold extreme (TNn);

  2. Frequency indices: warm days (TX90p), warm nights (TN90p), cold days (TX10p) and cold nights (TN10p);

  3. Duration indices: warm spell duration (WSDI) and cold spell duration (CSDI).

The frequency and duration indices are calculated based on the 10th or 90th percentile threshold derived from a commonly used fixed period of 1961–1990 (Zhang et al. 2011). For more calculation details please refer to Table 1.

Table 1 Definitions and calculations of the extreme temperature indices
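As a rough illustration of how the intensity and frequency indices are obtained from daily data, the following Python sketch computes the four intensity indices and a percentile-based day count for a single grid point. It is a minimal sketch under stated assumptions, not the ETCCDI reference implementation: the function names are hypothetical, the percentile threshold is taken per calendar day from the base years only (the multi-day window commonly used for ETCCDI thresholds is omitted), and the spell-duration indices are not covered.

```python
import numpy as np

def intensity_indices(tx, tn):
    """Annual intensity indices from one year of daily TX and TN (1-D arrays, deg C)."""
    return {
        "TXx": float(tx.max()),  # daytime hot extreme: annual maximum of TX
        "TNx": float(tn.max()),  # nighttime hot extreme: annual maximum of TN
        "TXn": float(tx.min()),  # daytime cold extreme: annual minimum of TX
        "TNn": float(tn.min()),  # nighttime cold extreme: annual minimum of TN
    }

def exceedance_days(daily, base, q):
    """Number of days beyond the base-period percentile (hypothetical helper).

    daily: shape (n_days,), one year of daily values
    base:  shape (n_base_years, n_days), base-period values (e.g. 1961-1990)
    q:     90 counts days above the threshold, 10 counts days below it
    Assumption: the threshold is computed per calendar day without a
    moving window, which is a simplification.
    """
    threshold = np.percentile(base, q, axis=0)
    if q >= 50:
        return int(np.sum(daily > threshold))
    return int(np.sum(daily < threshold))
```

Applying `exceedance_days` to TX with `q=90` gives a TX90p-like count of warm days, and to TN with `q=10` a TN10p-like count of cold nights.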

For all three reanalysis datasets, TX and TN are not standard output variables. Following Donat et al. (2014), we use the instantaneous 2 m temperature at 06:00 UTC (14:00 local time) as a proxy for TX and that at 18:00 UTC (02:00 local time) as a proxy for TN. Because the three reanalysis datasets and the observational dataset CN05.1 differ in horizontal resolution, all datasets are interpolated onto a common 2.0° × 2.0° grid to avoid differences in spatial representativeness (Gervais et al. 2014; Pendergrass and Knutti 2018). We use the first-order area-conservative remapping technique (Jones 1999) as implemented in the Climate Data Operators software (CDO, https://code.zmaw.de/projects/cdo).

We compare the consistency between the reanalysis datasets and the observational dataset in extreme temperature changes under the half-degree warming increment using the spatially aggregated approach proposed by Fischer et al. (2013). Since changes in extremes at a single grid point are usually not significant, the extreme temperature indices are calculated at each grid point and then regionally aggregated into an area-weighted spatial probability density function (PDF) to obtain a more robust signal. The probability denotes the percentage of landmass in continental China that experienced a certain change. A nonparametric method, kernel density estimation (KDE) (Scott 2015), is used rather than a traditional histogram-based PDF estimate, to avoid subjective biases in selecting bins. The KDE bandwidth follows Silverman’s rule (Bashtannyk and Hyndman 2001).
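The spatial aggregation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, grid-cell area is approximated by the cosine of latitude, the bandwidth uses a simplified form of Silverman's rule of thumb (0.9 σ n^(-1/5), omitting the interquartile-range term), and the effective sample size for weighted data is an assumption rather than something specified in the text.

```python
import numpy as np

def silverman_bandwidth(x, w):
    """Simplified Silverman bandwidth for weighted samples (assumption)."""
    w = w / w.sum()
    mean = np.sum(w * x)
    std = np.sqrt(np.sum(w * (x - mean) ** 2))
    n_eff = 1.0 / np.sum(w ** 2)  # effective sample size for weighted data
    return 0.9 * std * n_eff ** (-0.2)

def area_weighted_pdf(changes, lats, grid):
    """Gaussian KDE of per-grid-cell changes, weighted by cos(latitude).

    changes: shape (n_cells,), change of an index at each land grid cell
    lats:    shape (n_cells,), latitude of each cell in degrees
    grid:    shape (n_points,), values at which the PDF is evaluated
    """
    w = np.cos(np.deg2rad(lats))
    w = w / w.sum()
    h = silverman_bandwidth(changes, w)
    z = (grid[:, None] - changes[None, :]) / h
    kernel = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2.0 * np.pi))
    return kernel @ w  # weighted sum of Gaussian kernels at each grid value
```

Because every kernel integrates to one and the weights sum to one, the resulting curve integrates to one over a sufficiently wide evaluation grid, so areas under it can be read as landmass fractions.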

The GMST during 1991–2010 is 0.5 °C warmer than that during 1960–1979 (Hansen et al. 2010). Because the observational dataset CN05.1 covers 1961–2015, we take 1961–1979 as the former 19-year period (which does not affect the results) and 1991–2010 as the latter 20-year period. The differences between the latter and former periods represent the changes in extreme temperature under the historical 0.5 °C warming. Following Schleussner et al. (2017), we randomly choose 100 pairs of 20-year periods from 1961 to 2010 and calculate their differences for each dataset to estimate the range of internal variability. The 25–75% range is shown in all the figures below. If the observed change is significantly different from what would be expected from internal variability, we consider the change detectable. The same approach has been used in Fischer and Knutti (2014).
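The period difference and the resampling step can be illustrated with the short Python sketch below. It is a sketch under assumptions, not the exact procedure of Schleussner et al. (2017): the function names are hypothetical, and each "20-year period" is drawn here as a random set of 20 years that does not overlap with its partner. Drawing contiguous 20-year windows would be another possible reading of the text.

```python
import numpy as np

def forced_change(annual, first_year=1961):
    """Mean of 1991-2010 minus mean of 1961-1979 for annual index values.

    annual: shape (n_years,) or (n_years, n_cells), starting at first_year
    """
    early = slice(1961 - first_year, 1979 - first_year + 1)
    late = slice(1991 - first_year, 2010 - first_year + 1)
    return annual[late].mean(axis=0) - annual[early].mean(axis=0)

def internal_variability(annual, n_pairs=100, length=20, seed=0):
    """Differences between means of randomly drawn pairs of year sets.

    Assumption: each pair consists of two disjoint random sets of `length`
    years drawn from the full record.
    """
    rng = np.random.default_rng(seed)
    n_years = annual.shape[0]
    diffs = []
    for _ in range(n_pairs):
        years = rng.permutation(n_years)
        first, second = years[:length], years[length:2 * length]
        diffs.append(annual[second].mean(axis=0) - annual[first].mean(axis=0))
    return np.array(diffs)
```

The 25–75% range then follows from `np.percentile(diffs, [25, 75], axis=0)`, and a change can be compared against that range.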

The Pearson product-moment coefficient of linear correlation is used to measure the consistency between each reanalysis dataset and the observational dataset CN05.1 at the corresponding locations. The significance of this pattern correlation coefficient (PCC) is assessed with Student’s t test.
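For reference, a pattern correlation and its t statistic can be computed as in the Python sketch below. This is a minimal sketch with a hypothetical function name. It treats grid cells as independent samples with n − 2 degrees of freedom, which is an assumption, and it applies no area weighting since the text does not specify any.

```python
import numpy as np

def pattern_correlation(reanalysis, observation):
    """Pearson correlation between two change patterns and its t statistic.

    Both inputs are arrays of the same shape on the common grid; cells that
    are NaN in either field (e.g. ocean or outside China) are ignored.
    Returns (r, t, degrees_of_freedom).
    """
    a = np.asarray(reanalysis, dtype=float).ravel()
    b = np.asarray(observation, dtype=float).ravel()
    ok = np.isfinite(a) & np.isfinite(b)
    a, b = a[ok], b[ok]
    r = np.corrcoef(a, b)[0, 1]
    dof = a.size - 2
    # Student's t statistic for the null hypothesis of zero correlation
    t = r * np.sqrt(dof / (1.0 - r ** 2))
    return float(r), float(t), dof
```

The t statistic is then compared with the critical value of Student's t distribution for the given degrees of freedom at the chosen significance level.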

3 Results

3.1 Changes in the intensity indices

Under the past half-degree warming increment, the observational dataset shows warming in all four extreme temperature intensity indices over most parts of continental China (Fig. 1). The daytime and nighttime cold extremes (TXn and TNn) generally exhibit more significant warming than the corresponding hot extremes (TXx and TNx). The changes are spatially inhomogeneous: western and north-eastern China have seen more intense warming (as has the Yangtze River Valley in TNn). Over these regions, the warming exceeds 1.0 °C in daytime and nighttime hot extremes (TXx and TNx) and even 1.5 °C in daytime and nighttime cold extremes (TXn and TNn). The changes in seasonal mean temperature show a good linear relationship with the changes in extreme temperature intensity (Fig. 2), indicating that the background warming in seasonal mean TX and TN is the main contributor to the corresponding extreme temperature change. The wider scatter in DJF than in JJA possibly implies a larger role of internal variability in wintertime extreme changes. The area in central-eastern China between the Yellow River Valley (~ 40° N) and the Yangtze River Valley (~ 30° N) has seen an insignificant cooling of ~ 1 °C in the daytime hot extreme (TXx), which might be attributed to anthropogenic aerosol changes (Wang et al. 2016; Xu et al. 2018).

Fig. 1

Patterns of the extreme temperature intensity indices changes under the past 0.5 °C warming increment in China. Differences are calculated between the latter period of 1991–2010 and the former period of 1961–1979. The black dots denote the 5% significance level

Fig. 2

Relationship between the seasonal mean daily maximum and minimum temperature changes and the extreme temperature intensity changes under half-degree warming increment. Blue dots are CN05.1, orange dots are JRA-55, green dots are ERA and purple dots are 20CR

The three reanalysis datasets generally reproduce the key spatial features of the extreme temperature intensity changes, with the strongest warming in western and north-eastern China (except for ERA and 20CR in cold extremes). The region of daytime hot extreme cooling in central-eastern China is also captured by ERA and JRA-55. All reanalysis datasets show large deviations from the observed extreme temperature changes in western China and the Tibetan Plateau, where observation stations are sparsely distributed; different magnitudes and even opposite signs are seen in these regions. Large discrepancies in climatology are also seen in these regions, which may affect the changes under the 0.5 °C warming, although this does not hold for all indices (see Fig. SI2 and SI3 for more details).

JRA-55 performs satisfactorily in capturing both the spatial patterns and the magnitudes of the extreme temperature changes. In most regions of continental China, the differences between JRA-55 and CN05.1 in extreme temperature changes are less than 0.5 °C. ERA shows evident cold differences from CN05.1 of more than 1.5 °C in north-western China and the Tibetan Plateau, and differences of less than 0.5 °C in parts of eastern China. The differences between 20CR and CN05.1 in intensity changes exceed 1.5 °C in parts of western and north-eastern China but are less than 0.5 °C in parts of central and eastern China.

When the changes in the extreme temperature intensity indices are aggregated into spatial PDFs, it is evident that under the past half-degree warming increment the indices show consistent warming, with the PDFs shifted toward positive values (Fig. 3). In the observational dataset CN05.1, more than 88.4%/95.3% of the landmass of continental China has warmed by at least 0.5 °C in daytime/nighttime cold extremes (TXn/TNn). Likewise, more than 46.4%/73.0% of China’s landmass has seen at least 0.5 °C warming in daytime/nighttime hot extremes (TXx/TNx). The PDFs of the internal variability of the intensity indices in the observational dataset are all centered on zero and have quasi-normal distributions. The PDFs of the observed changes under the past half-degree warming increment all shift toward positive values and lie outside the expected internal variability ranges. Hence, the observed changes in extreme temperature intensity are detectable in continental China, as in previous studies (Chen et al. 2018; Zhao and Zhou 2019). The reanalysis datasets are consistent with the observation in this regard.
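The landmass percentages quoted above are area-weighted fractions of grid cells whose change reaches a threshold. A minimal Python sketch of that calculation follows. The function name is hypothetical, and approximating cell area by the cosine of latitude on the regular 2.0° grid is an assumption.

```python
import numpy as np

def landmass_fraction(changes, lats, threshold):
    """Percentage of area whose change is at least `threshold`.

    changes: shape (n_cells,), change of an index at each land grid cell
    lats:    shape (n_cells,), latitude of each cell in degrees
    Cells with NaN changes are excluded from both numerator and denominator.
    """
    changes = np.asarray(changes, dtype=float)
    w = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))
    ok = np.isfinite(changes)
    hit = changes[ok] >= threshold
    return float(100.0 * np.sum(w[ok] * hit) / np.sum(w[ok]))
```

For example, `landmass_fraction(txn_change, lats, 0.5)` would give the percentage of landmass with at least 0.5 °C warming in TXn, where `txn_change` is a hypothetical array of per-cell TXn changes.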

Fig. 3

Changes in the extreme temperature intensity indices under the past 0.5 °C warming increment in China. The PDFs in a–d represent the percentage of the aggregated landmass in continental China that experienced a given change during 1991–2010 compared to 1961–1979. Solid lines are results derived from the observational and reanalysis datasets. Blue lines are the results derived from the homogenized gridded observational dataset CN05.1, orange lines are JRA-55, green lines are ERA and purple lines are 20CR. Red lines denote a 0.5 °C warming in the respective temperature index. The corresponding light-colored ribbons represent the 25–75% range of internal variability. The percentage of landmass in continental China that sees a 0.5 °C (1.0 °C) warming is given with blue (pink) shading in e

The PDFs of JRA-55 agree closely with the observation in all four extreme temperature intensity indices, especially the daytime hot extreme (TXx). The differences between JRA-55 and the observation in the fraction of China’s landmass that warms by at least 0.5 °C (1.0 °C) are less than 10% (12%) (Fig. 3e). In comparison, ERA and 20CR are less consistent with the observational records. ERA tends to have wider tails toward decreases than the observation in all the PDFs, indicating an underestimation of the warming in the extreme temperatures, especially in western China according to Fig. 1. On the other hand, 20CR shows narrower PDFs than the other two reanalysis datasets in hot extremes (TXx and TNx), indicating less spatial diversity in the intensity changes. The deviations of ERA/20CR from the observation in the fraction of China’s landmass with extreme temperature warming over 0.5 °C (1.0 °C) are about 34%/32% (41%/44%), larger than those of JRA-55.

The observation and the reanalyses agree better in the PDFs of the daytime and nighttime hot extreme (TXx and TNx) changes than in those of the cold extremes (TXn and TNn), especially for ERA and 20CR. Over China, daytime and nighttime hot extremes (TXx and TNx) generally occur in boreal summer, and daytime and nighttime cold extremes (TXn and TNn) generally occur in boreal winter. 20CR assimilates hourly and synoptic barometric pressure observations, monthly averaged sea surface temperature (SST), and sea ice concentration fields (Compo et al. 2011). Summer climate in China is mainly controlled by the East Asian monsoon, which is strongly affected by tropical SST, the land-sea thermal contrast and the western North Pacific Subtropical High. This means that temperature changes in summer can be well constrained by the observed SST and surface pressure, which are assimilated in all the reanalyses. In contrast, in boreal winter the climate in China experiences larger atmospheric internal variability, such as blocking highs, the polar vortex and cold surges, which is less constrained by SST and surface pressure. The winter (cold) extreme temperature changes therefore show more uncertainty than the summer (hot) extreme temperature changes.

3.2 Changes in the frequency indices

For the extreme temperature frequency indices, the observation shows general increases in warm days/nights (TX90p/TN90p) and decreases in cold days/nights (TX10p/TN10p) under the past half-degree warming increment (Fig. 4). Unlike for the intensity indices, the changes in cold days/nights (TX10p/TN10p) are milder than those in warm days/nights (TX90p/TN90p). In general, significant changes (increases in hot extremes and decreases in cold extremes) are seen in the Tibetan Plateau and in western and north-eastern China in the observation. The changes in warm and cold nights (TN90p and TN10p) exceed a month in most parts of these regions. The region in central-eastern China with cooling in the daytime hot extreme (TXx) also sees a decrease in warm days (TX90p).

Fig. 4

Same as Fig. 1 but for extreme temperature frequency indices

The reanalysis datasets reproduce the key features of the observation but with weaker magnitudes. They tend to generally underestimate the increases of the warm days/nights and the decreases of the cold days/nights. There are larger differences over the regions with insufficient observational station coverage such as western China and the Tibetan Plateau.

JRA-55 shows good consistency with the observation (except for TN90p, where JRA-55 underestimates the observed increases in many regions). Both the spatial patterns and the magnitudes of the extreme temperature frequency changes are well captured. The differences between JRA-55 and the observation in frequency changes are less than 7 days over half of China’s landmass (except for TX90p). ERA shows less consistency with the observation, particularly in north-western China and the Tibetan Plateau. 20CR agrees better with the observation in TN90p than in the other frequency indices. The difference between 20CR and the observation reaches 20 days in north-eastern China and parts of western China in the frequency indices (except for TN90p).

In the context of spatially aggregated PDFs, more than 50% of China’s landmass has seen an increase by at least 19/34 days in warm days/nights (TX90p/TN90p) and a decrease by at least 7/21 days in cold days/nights (TX10p/TN10p) in the observational records (Fig. 5). The PDFs of the changes in warm indices (TX90p and TN90p) generally spread wider than the cold indices (TX10p and TN10p), which denotes that the changes in warm days/nights are more spatially inhomogeneous in continental China (as in the spatial patterns in Fig. 4). Like the extreme temperature intensity indices, the anthropogenic influence is also detectable in the changes of the frequency indices. While only few grid points exhibit significant changes (Fig. 4), the observed spatially aggregated PDFs all shift out of the expected internal variability ranges.

Fig. 5

Same as Fig. 3 but for frequency indices. Red lines denote 50% of aggregated China’s landmass. The blue (pink) shading in e represents changes in frequency indices in 50% (warmest 25%) of China’s landmass

The reanalysis datasets generally agree reasonably well with the observations in the frequency index changes, but with lower magnitudes. We calculate the changes in the frequency indices over 50% (blue shading) and the warmest 25% (pink shading) of the landmass in continental China (Fig. 5e). Note that, unlike the intensity extremes, the climatology of each frequency extreme varies over a wide range, so changes in the absolute values of the indices are not directly comparable across indices. JRA-55 shows good agreement with the observation in the frequency changes except for TN90p. Over half of China’s landmass, the differences between JRA-55 and the observation are less than 1 day in TX90p and TX10p, ~ 7 days in TN10p, and ~ 13 days in TN90p. In ERA, the PDFs have more pronounced lower tails in the warm frequency indices (TX90p and TN90p) and more pronounced higher tails in the cold frequency indices (TX10p and TN10p). This indicates that ERA underestimates the increases in warm days/nights (TX90p/TN90p) and the decreases in cold days/nights (TX10p/TN10p). The difference in frequency changes between ERA and CN05.1 is about 4–13 days over half of the landmass of continental China. 20CR shows good agreement in warm nights (TN90p), with a difference of less than 5 days from the observation over half of China’s landmass, while a large deviation of 17 days is found in cold nights (TN10p).

3.3 Changes in the duration indices

The observed spatial patterns of the changes in the extreme temperature duration indices under the past half-degree warming increment (Fig. 6a, e) resemble those of the intensity and frequency changes. The lengthening of the warm spell duration (WSDI) and the shortening of the cold spell duration (CSDI) are most significant over western and north-eastern China and weaker over central-eastern China. The warm spell is about 8–12 days longer and the cold spell is about 3–5 days shorter in western and north-eastern China under the 0.5 °C warming increment.

Fig. 6

Same as Fig. 1 but for duration indices

The increase in warm spell duration (WSDI) and the decrease in cold spell duration (CSDI) are generally captured by the reanalyses (Fig. 6), but in regions with sparse observational station coverage, particularly over the Tibetan Plateau, all reanalysis datasets show large deviations from the observed changes in the duration indices. JRA-55 reproduces most of the key features of the observation. In eastern China, where observational records are rich, the differences between JRA-55 and the observation in duration changes are less than 2 days, while the differences in observation-sparse western China are relatively larger. ERA and 20CR perform less well than JRA-55. The underestimation of the warm spell extension and the cold spell shortening occurs mainly in western China. The deviation from the observation in warm spell duration (WSDI) changes is more than 10 days over vast areas of western and north-eastern China in 20CR and more than 10 days over western China in ERA.

Significant changes in the duration indices that emerge from internal variability can be detected from a spatially aggregated perspective, as the PDFs of the changes shift out of the expected internal variability range (Fig. 7). In the observational records, the warm spell extends by at least 8 days and the cold spell shortens by approximately 3 days over half of the total land area of continental China. The anthropogenic influence on the warm spell duration (WSDI) is stronger than that on the cold spell duration (CSDI), as the observed changes in WSDI differ more significantly from what would be expected from internal variability.

Fig. 7

Same as Fig. 3 but for extreme temperature duration indices. Red lines denote 50% of aggregated China’s landmass. The blue (pink) shading in c represents changes in duration indices in 50% (warmest 25%) of China’s landmass

All the reanalysis datasets capture the general extension of the warm spell and the shortening of the cold spell under the half-degree warming increment from the spatially aggregated perspective, but the magnitudes of the changes are underestimated (except for 20CR in WSDI). As in the observation, the anthropogenic influence signal is less clear in the cold spell duration (CSDI), as the PDFs of the reanalysis datasets differ little from the expected internal variability. JRA-55 shows good agreement with the observation. The differences between JRA-55 and the observation are less than 1 day in warm spell duration (WSDI) changes and ~ 2 days in cold spell duration (CSDI) changes over half of continental China. ERA generally underestimates the magnitudes of the extreme temperature duration changes, with a difference of ~ 3 days in warm spell duration (WSDI) and ~ 2 days in cold spell duration (CSDI) from the observation. 20CR shows a general warm deviation compared to the observation, with a difference of ~ 7 days in the warm spell duration (WSDI) changes and ~ 3 days in the cold spell duration (CSDI) changes over 50% of the landmass in continental China.

3.4 Spatial correlation and RMSE analysis

We assess the strengths and weaknesses of the reanalysis datasets in revealing the extreme temperature changes by jointly considering the PCC and the root mean squared error (RMSE) relative to the observational dataset (Fig. 8). JRA-55 shows generally good consistency with the observation in extreme temperature changes under the past half-degree warming increment. Over China's mainland, the PCC of the extreme temperature changes between JRA-55 and the observation ranges between 0.74 and 0.94. The RMSE is less than 0.9 °C in intensity indices, ~ 6–18 days in frequency indices and ~ 2–5 days in duration indices. ERA shows generally lower spatial correlation coefficients with the observation than JRA-55 in continental China, with PCC less than 0.74 in all extreme indices. Its RMSE is between 0.8 and 1.5 °C in intensity indices, ~ 13–26 days in frequency indices and ~ 3–6 days in duration indices. 20CR shows a large spread across indices, with the PCC ranging between 0.43 and 0.94. Its RMSE is between 0.5 and 1.6 °C in intensity indices, ~ 10–20 days in frequency indices and ~ 4–10 days in duration indices. It generally performs better for hot extremes, which mainly occur in summer. As discussed above, by assimilating SST and surface pressure, 20CR may better represent the summer large-scale circulation, which is mainly controlled by SST and surface pressure.

Fig. 8

a Pattern correlation coefficient and b RMSE of the extreme temperature indices changes under the past 0.5 °C warming increment between the observational dataset and the reanalysis datasets in China's mainland and two subregions. Each group of three columns represents a specific reanalysis dataset. Column “CN” represents the mainland of China, “W” represents western China (west of 100° E) and “E” represents eastern China (east of 100° E)

CN05.1 is a homogenized gridded dataset interpolated from more than 2400 stations in China. The observation stations are unevenly distributed between western and eastern China: only ~ 9% of the stations are located west of 100° E. Given this asymmetry, we divided continental China into two subregions: western China (west of 100° E), where the observational network is sparse, and eastern China (east of 100° E), where the observation stations are more homogeneously and densely distributed. The PCC of extreme temperature changes between the reanalyses and the observation is significantly higher in eastern China than in western China (except for 20CR in percentile-based warm indices), and the RMSE is generally lower. The PCC of extreme temperature changes between JRA-55 and the observation ranges from 0.69 to 0.93 in western China, which is lower than in eastern China for all ten extreme temperature indices. The same features are evident in ERA. 20CR shows higher PCC in eastern China only in the intensity indices, but its RMSE is lower in eastern China for every index. The observation-sparse regions carry more uncertainty because fewer observations constrain the reanalysis data assimilation. Therefore, it is difficult to judge which of the reanalyses behaves more realistically than the others in these regions (Donat et al. 2014).
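The regional split described above can be illustrated with a short sketch. The helper names `subregion_masks` and `pcc_rmse`, the unweighted statistics, and the choice to assign grid cells lying exactly on 100° E to the eastern subregion are assumptions made for illustration; the text does not specify these details.

```python
import numpy as np

def subregion_masks(lon2d, domain_mask, boundary_lon=100.0):
    """Boolean masks for the whole domain ("CN"), west ("W") and east ("E").

    lon2d       : 2-D array of grid-cell longitudes in degrees east.
    domain_mask : 2-D boolean array, True inside the study domain.
    Assumption: cells exactly on the boundary are counted as eastern.
    """
    west = domain_mask & (lon2d < boundary_lon)
    east = domain_mask & (lon2d >= boundary_lon)
    return {"CN": domain_mask, "W": west, "E": east}

def pcc_rmse(obs, rea, mask):
    """Unweighted pattern correlation and RMSE over the masked cells."""
    ok = mask & np.isfinite(obs) & np.isfinite(rea)
    o, r = obs[ok], rea[ok]
    pcc = np.corrcoef(o, r)[0, 1]
    rmse = np.sqrt(np.mean((o - r) ** 2))
    return pcc, rmse

if __name__ == "__main__":
    # Synthetic fields on a hypothetical 1-degree grid; the reanalysis-like
    # field is given larger errors west of 100E to mimic sparse constraints.
    rng = np.random.default_rng(1)
    lon = np.arange(75.0, 135.0, 1.0)
    lat = np.arange(20.0, 50.0, 1.0)
    lon2d, _ = np.meshgrid(lon, lat)
    domain = np.ones_like(lon2d, dtype=bool)
    obs = rng.normal(1.0, 0.5, size=lon2d.shape)
    noise_sd = np.where(lon2d < 100.0, 0.6, 0.2)
    rea = obs + rng.normal(0.0, 1.0, size=lon2d.shape) * noise_sd
    for name, mask in subregion_masks(lon2d, domain).items():
        print(name, pcc_rmse(obs, rea, mask))
```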

4 Conclusion and discussion

Because climate extremes have a high impact on natural and social systems, their changes in the 1.5 °C and 2 °C warmer worlds are of global concern on the post-Paris science agenda. The extreme temperature changes under a 0.5 °C additional warming increment, as measured by observational datasets, are expected to be useful references for adaptation activities, but the observational datasets suffer from insufficient spatial consistency and temporal continuity of the observation stations. Reanalysis datasets are expected to be useful substitutes for the observations, but their quality needs to be assessed. In this study, we evaluate the consistency of the extreme temperature changes under the past half-degree GMST warming increment in continental China across the observation and three reanalysis datasets in a spatially aggregated perspective. The major findings are summarized below:

  1. All three reanalysis datasets generally capture the increases in warm extremes and the decreases in cold extremes seen in the observation over continental China under the historical 0.5 °C warming increment. The spatial distribution patterns are reasonably reproduced. Regions with significant extreme temperature changes, such as western and north-eastern China, are captured by the reanalyses, but large spreads are seen across datasets, which may be attributed to fewer observational constraints over these regions in the reanalysis data assimilation (Donat et al. 2014). In a spatially aggregated perspective, the reanalysis datasets show the same signs but weaker magnitudes of the extreme temperature changes compared to the observation. The anthropogenic influence on the extreme temperature changes under the past half-degree warming increment can be detected in the reanalysis datasets as in the observation.

  2. JRA-55 shows good consistency with the observation. Both the spatial patterns and the actual values of the extreme temperature changes are reasonably captured, as shown by PCC with the observation ranging from 0.74 (CSDI) to 0.94 (TN10p). The difference between JRA-55 and the observation in the intensity index changes over China’s landmass that sees at least 0.5 °C warming is less than 10.0%. ERA tends to underestimate the magnitudes of the extreme temperature changes. It differs substantially from the observation (PCC less than 0.74), especially in western China and the Tibetan Plateau, where the observation stations are sparsely distributed. 20CR shows a large spread across extreme indices, with PCC ranging from 0.43 (CSDI) to 0.94 (TN90p). It shows generally better consistency in hot extremes than in cold extremes, because the temperature changes in summer could be well constrained by the observed SST and surface pressure assimilated in the reanalysis dataset.

Finally, we discuss the strengths and weaknesses of reanalysis datasets in measuring the extreme temperature changes in continental China. Our evaluation indicates that all three reanalysis datasets perform less well in revealing the extreme temperature changes in western China and the Tibetan Plateau. This weakness results from fewer observational constraints in the data assimilation process of the reanalyses. Among the three datasets, JRA-55 has assimilated more observational data archived by the Japan Meteorological Agency (JMA), particularly over East Asia, such as surface temperature and rain-gauge precipitation data, which makes it outperform the other reanalyses in reproducing the climate changes there (Chen et al. 2014; Huang et al. 2018; Chen et al. 2019). In comparison, 20CR assimilated only hourly and synoptic barometric pressure observations, monthly averaged sea surface temperature, and sea ice concentration fields (Compo et al. 2011), which leaves the 2 m temperature less constrained by observations.

Changes in aerosol concentration may significantly affect both the mean state and the extremes of temperature in China. Qian et al. (2011) used regional simulations together with observations of aerosol properties and showed that, owing to direct and indirect aerosol effects, a cooling trend occurred in various regions of China during the last decades of the twentieth century. Huang et al. (2006) developed a regional coupled climate-chemistry-aerosol model to examine the impacts of aerosols on surface temperature over East Asia. Their experiment shows that the daytime surface temperature cools by ~ 0.7 °C over the industrialized parts of China. The projected future changes in temperature extremes are also affected by aerosol changes in the different Representative Concentration Pathways (Li et al. 2016a, b; Wang et al. 2016, 2017; Xu et al. 2018).

In reanalysis datasets, the prescribed aerosol determines the calculation of solar radiation to a certain extent (Zhou et al. 2017; Du et al. 2018). Aerosols show complex temporal and spatial variations in China (Yu et al. 2003; Li et al. 2016a, b). Using “realistic” aerosol optical depth (AOD) data in the reanalysis assimilation system has been shown to improve the characterization of regional warming (Zhou et al. 2017). JRA-55 uses a climatological annual cycle of AOD based on observations from the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Total Ozone Mapping Spectrometer (TOMS) (Kobayashi et al. 2015). In comparison, ERA-40 uses model-designed climatological AOD distributions (Tanré et al. 1984) and ERA-Interim uses a model-designed monthly AOD climatology (Tegen et al. 1997), which has a more homogeneous spatial distribution than the MODIS observations used in JRA-55 (Benedetti et al. 2008). 20CR includes only volcanic aerosols and is also biased relative to the observation in both temporal and spatial variations. The lack of AOD time evolution and of a realistic spatial distribution in these two reanalysis datasets partly explains their biases in reproducing the extreme temperature changes.

In addition, as presented above for the observational dataset, anthropogenic influences on extreme temperature changes are detectable. However, with climatological aerosol concentrations (observed in JRA-55, model-designed in ERA) and only volcanic aerosols in 20CR, the reanalysis datasets may still be biased relative to the observation in the extreme temperature changes under anthropogenic influence, and they should be treated with care in this respect.