1 Introduction

Extreme temperature events have a great influence on health, the economy and the natural and built environments (AghaKouchak et al. 2020). Heat waves (HWs) are one of the most worrying effects of climate change, and maximum daily temperature is a relevant variable. Heat waves and cold waves are periods of abnormally high or low temperatures. The World Health Organization (WHO 2020) and the World Meteorological Organization (WMO 2015) rate HWs as one of the most dangerous meteorological events.

Extreme climate anomalies such as those in Texas and Oklahoma in 2011 and Moscow in 2010 are a consequence of global warming (Hansen et al. 2012). Global warming will increase the frequency and severity of heat waves while cold waves decline. At the same time, the number of people exposed to heat waves in Europe will increase (EU 2021).

Climate change forecasts show an increase in temperatures in the twenty-first century, with HWs becoming more frequent and more intense (Fisher and Schär 2010), so that the effects of high temperatures increase. According to Muller et al. (2016), the probabilities of extremely hot summers in many regions of the world are now about ten times greater.

HWs affect socio-economic activities such as water supply, food and livelihood production, energy and transportation, among others. Extreme high temperatures are increasingly affecting crops, with yields showing a decline (Zhu and Troy 2018). A wide range of terrestrial ecosystems is also exposed to HWs. A case in mind is that forests: Allen et al. (2010) show that drought and heat stress associated with climate change could seriously affect the composition, structure and biogeography of forests in many regions. The impact of the legacies of past forest management is analysed by Heres et al. (2021) to explain the current responses of different tree species to climate change.

One of the main negative effects of HWs is the additional mortality that they cause. HWs increase morbidity and mortality in risk groups such as the elderly, children and people who suffer from cardiovascular and respiratory diseases (Fisher and Schär 2010; Campbell et al. 2018). The World Health Organization (WHO 2020) reports that from 1998 to 2017 more than 166,000 people died due to heat waves, including more than 70,000 during the 2003 heat wave in Europe.

The future heat-related mortality induced by climate change in 12 US cities is analysed by Lo et al. (2019). The expected future mortality caused by HWs in two Spanish cities is analysed by Abadie et al. (2019) using a single climate model with two RCP scenarios and also calculating risk measures such as the 95% percentile and the mean of the average of the 5% of worst cases. The impact of HWs on mortality for each Spanish provincial capital over the periods 2021–2050 and 2051–2100 under the RCP 8.5 scenario is analysed by Díaz et al. (2019) with and without adaptation.

Global climate models (GCMs) can be used to estimate the impact of extreme future events, but there is a great deal of uncertainty in their forecasts (IPCC 2012). The probabilistic future behaviour of HWs in the city of Madrid is studied by Abadie and Polanco-Martínez (2022) using twenty-one global climate models under RCP 8.5 and RCP 4.5 scenarios, modelling HWs with three stochastic processes (number per annum, duration and intensity). These distributions are combined with an epidemiological model to obtain expected future mortality and risk measures for the city.

However, when used at a local or regional scale, GCMs can predict more widely differing values. It is therefore advisable to analyse which models perform best for a specific application and a specific location. There may be differences between neighbouring areas, e.g. rural and urban. For example, López-Bueno et al. (2021) analyse and compare the effects of high temperatures on daily mortality in urban and rural populations in the province of Madrid.

Some papers investigate the performance of climate models, e.g. that of Panjwani et al. (2020), who compare the use of six global climate models in simulating extreme temperature events in the regions of India from 1976 to 2005. Their calculations show a hot bias for Central India and a cold bias in the Himalayan region; furthermore, two models perform better than the others. These authors use root mean square errors, correlation coefficients and an agreement index.

Our paper takes a different approach, focusing on the evolution of maximum temperature dynamics under uncertainty. Temperatures are assumed to have a deterministic part and a stochastic part. The trend, seasonality and stochastic behaviour of historical maximum daily temperatures are analysed in nine cities in Southern Spain for a period of 30 years, using an Ornstein-Uhlenbeck stochastic model with jumps. Twenty-one downscaled GCMs are then used for 2006–2019 under scenarios RCP 4.5 and RCP 8.5 to calculate the characteristics (stochastic and deterministic) and compare them to the current data for the same period by applying measures of goodness of fit and classifying GCMs according to these results for each city and climate scenario. The proposed methodology allows selecting for a city one or several climate models that are behaving more closely to actual data, taking into account the deterministic and stochastic behaviour of temperatures. The approach allows focus on the expected value and/or in the risk component in the last case using the stochastic part with better goodness of fit.

The rest of the paper is organised as follows: Section 2 describes the materials and methods used. Section 3 presents and discusses our results. Section 4 presents the main conclusions.

2 Materials and methods

2.1 Data

This study uses daily maximum near-surface air temperatures (tmax) for nine cities in southern Spain. There are four cities very close to the coast (Huelva, Cadiz, Malaga and Cartagena) and five inland cities (Seville, Cordoba, Jaén, Granada and Murcia). Future daily maximum temperatures and their behaviour for 2006–2100 are drawn from the 21 models of NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP)Footnote 1, where there is only one time series of maximum temperatures for each model. The NEX-GDDP comprises downscaled climate scenarios for the globe, derived from a general circulation model from the Coupled Model Intercomparison Project Phase 5 (CMIP5), including projections for RCP 4.5 and RCP 8.5 from 21 models and scenarios for which daily scenarios were produced and distributed under the CMIP5 (Thrasher et al. 2012; NEX-GDDP 2021). Table 1 shows these 21 GCM models.

Table 1 GCM models

Historic data are drawn from the E-OBS gridded database (https://www.ecad.eu/) and correspond to 1990–2019, i.e. 30 years, with 10957 values for each city. Figure 1 shows past maximum daily temperatures for the nine cities from 1990 to 2019 (30 years). This figure shows that seasonality is an important component. Table 2 shows some statistics for daily maximum temperatures for the nine cities (1990–2019).

Fig. 1
figure 1

Historical maximum daily temperatures for the nine cities from 1990 to 2019

Table 2 Maximum daily temperature statistics (1990–2019)

2.2 Stochastic model specification

The maximum daily temperatures \({tmax}_t^i\) for each city i on day t is shown in Eq. (1) broken down as the sum of two components: a deterministic component, fi(t), and a stochastic component, \({X}_t^i\):

$${tmax}_t^i={f}^i(t)+{X}_t^i$$
(1)

The second element \({X}_t^i\) is a mean reverting jump diffusion process, while the deterministic part includes a linear trend (Bloomfield 1992) and a seasonal component. The seasonality takes a trigonometric form, which gives a smooth seasonal pattern along with a parsimonious formulation (Campbell & Diebold, 2005). We consider only low seasonal frequencies that are statistically significant for the historical data (1990-2019). Putting all this together, the deterministic component of the model is:

$$\kern0.75em {f}^i(t)={\beta}_1^i+{\beta}_2^it+{\beta}_3^i\ \sin \left(2\pi t\right)+{\beta}_4^i\ \cos \left(2\pi t\right)+{\beta}_5^i\ \sin \left(4\pi t\right)+{\beta}_6^i\ \cos \left(4\pi t\right)+{\beta}_7^i\ \cos \left(6\pi t\right)$$
(2)

where time is measured in years, so the t-index for the series observed ranges from 1/365.25 to the number of years (T=20); \({\beta}_1^i\) and \({\beta}_2^i\) are the intercept and the slope of the linear trend, respectively; and \({\beta}_3^i\) to \({\beta}_7^i\) are the seasonal parameters of the i-th city.

The stochastic component is assumed to follow an Ornstein-Uhlenbeck (O-U) process, which is a flexible way of modelling dependence structures (Barndorff-Nielsen and Shephard 2001). For example, Swishchuk and Cui (2013) propose daily average temperature models driven by O-U processes without jumps for data from Canadian cities for weather derivative pricing applications. The O-U process with jumps is described by the following stochastic differential equation:

$$d{X}_t^i=\left({\alpha}^i-{\kappa}^i{X}_t^i\right) dt+{\sigma}^id{W}_t^i+{J}_t^i\left({\mu}_J^i,{\sigma}_J^i\right)d{q}_t^i$$
(3)

Equation (3) consists of three independent parts. The first term is the drift, which implies mean reversion, where the current stochastic part of the daily maximum temperature in the city i tends to level \(\frac{\alpha^i}{\kappa^i}\) in the long term, with a reversion speed κi > 0. \({W}_t^i\) is a standard Wiener process with stationary increments \(d{W}_t^i\). The volatility of the mean reverting process is σi. The third term of Eq. (3) is a Poisson process with a rate of arrival \({\lambda}^i;d{q}_t^i\) is a Poisson process such that \(d{q}_t^i=1\) with probability λidt and \(d{q}_t^i=0\) with probability 1 − λidt. The jump size is \({J}_t^i\left({\mu}_J^i,{\sigma}_J^i\right)\), which is \(iid\ N\left({\mu}_J^i,{\sigma}_J^i\right):\) if there is a jump in maximum temperature, its size is normally distributed with mean \({\mu}_J^i\) and volatility \({\sigma}_J^i\). Finally, the processes \(d{W}_t^i\) and \(d{q}_t^i\) are independent. Note that Eq. (3) can have negative values, and also, the maximum daily temperature can be negative.

The parameter values of the stochastic part are calculated using maximum likelihood estimation. The Appendix shows how this method is applied to Eq. (3).

2.3 Goodness of fit indicators

The most common indicator for assessing the forecasting performance of each model is the root of the mean square error RMSE (Gleckler et al. 2008). Let \({A}_t^i\) be the value of the variable at time t in city i and let \({F}_t^i(j)\) be its forecast given by model j, the RMSE of model j when predicting the maximum temperature in city i for t = 1, …, n periods is defined as:

$${RMSE}^i(j)=\sqrt{\frac{1}{n}\sum_{t=1}^n{\left({A}_t^i-{F}_t^i(j)\right)}^2}=\sqrt{MSE^i(j)}$$
(4)

The assessment of the goodness of fit per city is complemented by an overall indicator of the performance of each model which is computed by aggregating these measures across cities. The root of the total mean square error of model j, RMSE(j), is as in Eq. (4) but with the square errors averaged for the m cities. Then, this indicator is:

$$RMSE(j)=\sqrt{\frac{1}{mn}\sum_{i=1}^m\sum_{t=1}^n{\left({A}_t^i-{F}_t^i(j)\right)}^2}=\sqrt{\frac{1}{m}\sum_{i=1}^m{MSE}^i(j)}$$
(5)

Finally, to facilitate comparability between models, we normalise the mean error of a model as follows (Gleckler et al. 2008, Ruosteenoja 2021):

$${RMSE}^{\prime }(j)=\frac{RMSE(j)- median\left( RMSE(j)\right)}{median\left( RMSE(j)\right)}$$
(6)

In this way, RMSE(j) measures how well a model j is doing relative to the median value of the 21 models: a negative (positive) value indicates that the model fits better (worse) than the median model.

3 Results and discussion

3.1 Model application with historical data (1990–2019)

Calibrating the seven parameters in Eq. (2) with daily maximum temperatures from 1990 to 2019 using the least squares method gives the results shown in Table 3. Standard errors are calculated with the heteroscedasticity and autocorrelation (HAC) robust method (Newey and West 1987).

Table 3 Parameters of the deterministic components (1990–2019)

More than 75% of the \({tmax}_t^i\) variation is due to the deterministic component, since the R2 values of the estimations are between 0.7555 for Cadiz and 0.8321 for Malaga. The slope of the trend is always positive and statistically significant at the 0.1% level for eight cities and at the 5% for Seville. That is, 30-year data show an increase in maximum temperatures over time in all nine cities. This is in line with the findings of other authors such as Gadea and Gonzalo (2020 and 2021) for Stockholm, Milan and Madrid, among others, and Diebold and Rudebusch (2019) for fifteen US cities. Seville and coastal cities such as Cartagena and Huelva show lower increases in the maximum temperature trend. By contrast, Granada has the highest growth rate in the trend.

The seasonal annual \(\left({\beta}_3^i\ \textrm{and}\ {\beta}_4^i\right)\) and biannual \({\beta}_5^i\) parameters are significant at the 0.1% level for all cities. However, the biannual seasonal parameter \({\beta}_6^i\) is not significant for Cordoba and Murcia, while the last seasonal term cos(6πt) is not statistically relevant for Malaga. Figure 2 shows the seasonal components of coastal cities (panel a) and inland cities (panel b). As expected, in general, the seasonal temperature changes are greater in non-coastal cities, although there are differences between them. For example, the seasonal variation in Murcia, which is located near the Mediterranean Sea, is close to that of coastal cities. The four coastal cities show similar seasonal patterns (panel a).

Fig. 2
figure 2

Seasonality of daily maximum temperatures (1990–2019)

As an example, Fig. 3 shows the maximum daily temperature in Seville, with its deterministic part (upper panel) and its stochastic component once the trend and the seasonality have been removed (lower panel). This figure shows that the deterministic part can be used to forecast the maximum temperature on a particular day in the near future, given that the expected value of the stochastic part is zero.

Fig. 3
figure 3

Daily maximum temperature in Seville (1990–2019) with deterministic and stochastic parts

Once the deterministic part of the maximum daily temperature is removed, the analysis continues with the decomposition of the stochastic component. Table 4 shows some statistics of the estimates of \({X}_t^i\). Doornik-Hansen test rejects the normal distribution hypothesis in all cases. Three cities located in the Mediterranean area (Malaga, Cartagena and Murcia) present both positive skewness and excess of kurtosis. On the other hand, these coefficients are negative for four inland cities (Seville, Cordoba, Jaén and Granada). Moreover, the autocorrelation analysis reveals covariance-stationary dynamics. A mean-reverting process with jumps of the Ornstein-Uhlenbeck type is able to capture these stylised facts, so such a process is applied to the stochastic part of the maximum daily temperature.

Table 4 Stochastic components of \({tmax}_t^i\) statistics (1990–2019)

The parameter values of the stochastic part and the confidence intervals are shown in Table 5. There are jumps with a high mean reverting speed, the price volatility is high and in the jump case, negative values with significant volatility are expected.

Table 5 Parameters of the stochastic components (1990–2019)

In the case of Seville, when there are no jumps, the stochastic component shows a level of \(\frac{\alpha^{Sev}}{\kappa^{Sev}}=2.90\) to which the detrended and seasonal adjusted maximum daily temperature tends in the long term, with a reversion speed of κSev = 84.02. In this case, with no jumps, this variable deviates from long-term equilibrium because of the volatility σSev = 30.221. Since time is measured in years, there is a probability λidt of a jump in the seasonally adjusted and detrended maximum temperature on a specific day, with dt = 1/365. For Seville, there is a probability λSevdt = 0.393 of a jump on a given day. When there is a jump, its size is normally distributed with mean \({\mu}_J^{Sev}=-1.699\) and volatility \({\sigma}_J^{Sev}=2.103\).

The expected value of the stochastic part is always zero. Except for the two cities on the Mediterranean coast (Malaga and Cartagena), when there is a jump, its expected value is negative. These jumps are offset by an expected positive value in the mean-reverting part because the level αi/κi to which the detrended and seasonal adjusted maximum daily temperature tends in the long term is positive for these six cities. In the case of Cartagena and Malaga, the opposite happens because the expected value of the jump is positive. This explains the signs of αi and \({\mu}_J^i\) obtained for these two cities RMSE’(j).

3.2 Assessment of climate model performance (2006–2019)

The climate models show data from 2006 to 2100. The results for the 2006–2019 subsample can be compared with actual data. In this work, no historical data from models are used. In the period analysed (2006–2019), the differences in the models between the RCP4.5 and RCP8.5 scenarios are minor compared to the growing impact in future decades. This section assesses the fit of the 21 climate models (see Table 1) to the actual maximum daily temperature in the nine cities in southern Spain, and their deterministic and stochastic components as established in the model (1).

Table 6 summarises the results of the fit of the models to the data under scenarios RCP 4.5 and RCP 8.5. All RMSE are in a narrow interval, from 4.02 to 4.43. The magnitude of the differences between scenarios RCP 4.5 and 4.8 is also very small, although their model relative rankings do not match.

Table 6 Forecast of maximum daily temperature \({tmax}_t^i\), (2006–2019)

INMCM4 is the median or typical model under scenario RCP 4.5, with RMSE = 4.19, and the other twenty models are between the bounds (−4%, +6%) around this median value. There is evidence that ACCESS1-0 and CCSM4 outperform the rest of the models in the context of RCP 4.5. ACCESS1-0 model is the best in the overall indicator, with RMSE equal to 4.02 (3.9% below the median). It also has the smallest RMSE in 3 of the 9 cities (Cadiz, Murcia and Cartagena), whereas CCSM4 is the selected one for Huelva, Córdoba and Jaén.

Under scenario RCP 8.5, the median RMSE is 4.15 (CCSM4 model), and MIROC-ESM-CHEM has the best average performance (2.9% below the median). Panjwani et al. (2020) conclude that this model also performs relatively better in capturing temperature extreme events over the Indian region. According to the indicator per city in column (7), none of the models dominates. By contrast, BCC-CSM1-1 shows the worst results in both circumstances, RCP 4.5 and RCP 4.8.

Figure 4 summarises the RMSEs of the nine cities using a colour scheme from yellow (low values) to red (high values). These heatmaps show that all models are better able to reproduce the maximum temperature of coastal cities, such as Cartagena or Málaga, than inland cities such as Granada, Cordoba or Jaén that have greater seasonal temperature changes.

Fig. 4
figure 4

Heatmap of RMSE of historical daily maximum temperatures by city (2006–2019)

To end this section, we analyse the ability of the 21 models to forecast the deterministic and stochastic components found in actual daily maximum temperatures. To that end, the model (1)–(3) is calibrated both for the actual data and for the climate model data for the forecasting period, from 1/1/2006 to 31/12/2019. The model is recalibrated for the observed series with this subsample to avoid the effects of structural changes. Some authors have documented changes in the trend or seasonality patterns of climate time series. For example, Gay-Garcia et al. (2009) provide strong evidence in favour of a trend-stationary process with a structural break in the global and hemispheric temperature data-generating process, and Diebold and Rudebusch (2019) find that the seasonality of DTR (difference between the daily maximum and minimum temperatures) in the USA may be changing over time.

Tables 7 and 8 summarise the results of calibrating the deterministic and stochastic elements with actual data from 2006 to 2019. After a comparison of this seasonal pattern with that given in Eq. (2) and Table 3, the last term \({\beta}_7^i\cos \left(6\pi t\right)\) was eliminated because the parameter \({\beta}_7^i\) was not significant at the 5% level in all cases. This change is supported by the Chow test of the null hypothesis of structural stability of seasonality versus the hypothesis of a break on 1/1/2006. This test statistic is a Wald value based on the robust estimator of the covariance matrix for the extended regression with evolving seasonality. All but three reject the hypothesis of seasonal stability hypothesis: there is no evidence of a break in seasonality at the 5% level for the Mediterranean cities of Murcia and Cartagena (and for these two, the dropped term \({\beta}_7^i\) is also not relevant over the whole sample at 5%, see Table 3).

Table 7 Parameters of the deterministic component (2006–2019)
Table 8 Parameters of the stochastic components (2006–2019)

Tables 3 and 7 also reveal that the trend slope for the subsample is always steeper than that for the whole period. Regarding the stochastic component, Table 8 shows that the signs and the significance of the parameters are unchanged, but in this subsample, the parameter of the drift αi and the mean of the jump \({\mu}_J^i\) are not significant for Malaga at 5%.

Summaries of the performance of the 21 models in predicting the deterministic and stochastic components are shown in Tables 9 and 10, respectively. The size of RMSE(j) reveals that the stochastic component is the main source of uncertainty in maximum temperature. Under scenario RCP 4.5, the ACCESS1-0 model is not only the most successful in predicting total \({tmax}_t^i,\) but it also has the best mean square error for the stochastic part in five cities (Cadiz, Malaga, Granada, Murcia and Cartagena) together with the derived overall indicator RMSE(j). On the other hand, the model with a better indicator in predicting the deterministic pattern is model IPSL-CMSA-LR. The great variability in the behaviour of MPI-ESM-MR is striking: it has the worst fit for the deterministic pattern in Cadiz, Malaga and Jaén, but it is the most accurate in Huelva, Seville Córdoba and Murcia. The results by cities conclude that GCM model BCC-CSM1-1 provides the worst accuracy when replicating the deterministic component in four cities.

Table 9 Forecasts of deterministic component fi(t) (2006–2019)
Table 10 Forecasts of the stochastic component \({X}_t^i\) (2006–2019)

Comparisons of the temperature forecasts and their breakdown into deterministic and stochastic components in scenario RCP 8.5 show discrepancies between the different criteria. According to the criterion selected, the models that best replicate deterministic patterns are MIROC-ESM-CHEM (overall RMSE(j)), and CanESM2 (city indicators RMSEi). For the stochastic part, the model selected with the first criteria is again MIROC-ESM-CHEM, while the counts of best RMSEi per city do not indicate a model that clearly outperforms the rest. As in scenario RCP 4.5, for the deterministic component, the results are mixed (or contradictory): for example, the CanESM2 model is the best for Malaga, Jaén and Granada, but the worst for Huelva and Seville. This may be the reason for the poor overall performance of this model, together with IPSL-CM5A-MR. Finally, the relatively poor performance of the GCM model BCC-CSM1-1 in predicting temperature \({tmax}_t^i\) under scenarios RCP 4.5 and RCP 8.5 may be due to the poor prediction accuracy of its two components and, especially, of the stochastic part.

Some papers have looked at similar problems as Ruosteenoja (2021) and Gleckler et al. (2008) using a somewhat different methodology and databases. Gleckler et al. (2008) analyse the performance of CMIP3 climate models during the twentieth century and propose a model climate performance index (MCPI), which is a composite index of the RMSEs calculated for a set of relevant climate variables. They obtain that the relative ranking of models varies considerably for each variable. Ruosteenoja (2021) analyses the CMIP6 model performance for northern and southern Europe. This author calculates a performance index using 1981–2010 data, that is historical model runs. Table 11 shows the main differences with this work.

Table 11 Methodology comparison

4 Conclusions

Information from general circulation models can be used for future prediction of variables that affect health, economic activities and ecosystems. HWs are one of the most significant effects of climate change, and they are manifested mainly through daily maximum temperatures. However, numerous GCMs are used in practice, and when downscaled to a local or regional scale they can give very different results. Selecting the model(s) used in an application is thus an important issue.

This paper first analyses the historical daily maximum temperatures (1990–2019) from nine cities in southern Spain and found that the temperature trend is always positive and statistically significant. Also, the stochastic parameters are calculated showing the historic volatility behaviour.

Second, this paper presents a method for selecting a GCM model for local application under uncertainty. The ability of twenty-one GCMs to predict maximum daily temperatures in Southern Spain is investigated for nine cities under scenarios RCP 4.5 and RCP 8.5, compared with actual data from the period (2006–2019). The results show substantial differences in the goodness of fit.

In order to analyse the time series behaviour of actual and model data, a decomposition into a deterministic part (trend and seasonality) and a stochastic part are used. The stochastic part is a mean-reverting Ornstein-Uhlenbeck process with jumps. This approach enables the value of both deterministic and stochastic parameters to be analysed and used for a specific application based on the expected values and/or the risk measures depending on each case study. Note that the risk measures are associated with the volatility. A model or an ensemble of models can be selected for an application.

The proposed methodology can be applied with some parallel runs of the same model and after calculating the model performance assuming a probability for each parallel run. Note that the result obtained is not necessarily valid for other locations or other climate variables, but the method proposed could be used to select a suitable GCM to be used in each case. The use of the proposed methodology depends on the availability of data, with more years of information it would be more efficient.