1 Introduction

The magnitude and frequency of high-temperature extreme events have increased markedly in recent decades as a result of climate change (IPCC 2013, 2021; Mishra et al. 2015; Manning et al. 2019). In the absence of mitigation policies, global warming is expected to continue during the twenty-first century: extreme warm temperature events will become more intense, more frequent and longer-lasting, while extreme cold events are expected to become less likely (IPCC 2013, 2021). Regions that already experience warm and hot summers, such as Mediterranean Europe, are particularly concerned (IPCC 2013, 2021; Stegehuis 2016; Manning et al. 2019) and will be affected by global warming in numerous and diverse sectors. The aeronautical sector is one of these vulnerable sectors, and the impacts of climate change on aviation are many (Thompson 2016; Burbidge 2016; Ryley et al. 2020; Gratton et al. 2022). This study is motivated in particular by the impacts related to the increase in high-temperature extremes at ground level over airports, which directly affects aircraft performance at takeoff and landing, as well as airport operability.

Higher temperatures are linked to lower aircraft lift and engine thrust. An increase in temperature results in a decrease in air density, which forces the aircraft to reach a higher speed through thinner air to generate enough lift at takeoff (Anderson 2005). Assuming constant acceleration, takeoff distances are lengthened in the process (Zhou et al. 2018; Gratton et al. 2020). Depending on the aircraft's technical characteristics and/or runway length limitations, this speed may sometimes be unreachable, leading to weight restrictions, flight delays or even cancellations (Coffel and Horton 2015; Coffel et al. 2017; Gratton et al. 2020). In addition, engine thrust decreases with temperature (Airbus 2002). Thrust is usually limited (flat-rated) to a constant value below a certain temperature threshold for fuel economy and environmental reasons, and temperatures above this threshold have a negative effect on thrust. The threshold is typically set at the International Standard Atmosphere temperature + 15 \(^{\circ }\)C (ISA + 15), that is, 30 \(^{\circ }\)C at sea level (Airbus 2002). Above ISA + 15, the higher the temperature, the lower the thrust, which reduces the weight the aircraft can lift. Warmer ambient temperatures also result in higher flame temperatures in the combustion chamber, increasing pollutant emissions into the atmosphere, in particular of nitrogen oxides (Heywood 2018).
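The chain of reasoning above (lower density, higher liftoff speed, longer roll at constant acceleration) can be made concrete with the ideal gas law and the lift equation. The following Python sketch is illustrative only: it assumes dry ideal-gas air at a fixed pressure, a fixed lift coefficient and wing area, and constant acceleration, and it ignores the thrust loss above ISA + 15 described here. The aircraft values are hypothetical.

```python
import math

R_AIR = 287.05  # specific gas constant of dry air, J/(kg K)


def air_density(pressure_pa: float, temp_c: float) -> float:
    """Ideal-gas density: rho = p / (R * T), with T in kelvin."""
    return pressure_pa / (R_AIR * (temp_c + 273.15))


def liftoff_speed(weight_n: float, rho: float, wing_area_m2: float, cl: float) -> float:
    """Speed at which lift 0.5 * rho * V**2 * S * CL equals the aircraft weight."""
    return math.sqrt(2.0 * weight_n / (rho * wing_area_m2 * cl))


def takeoff_distance(v_liftoff: float, accel: float) -> float:
    """Ground roll from rest under constant acceleration: d = V**2 / (2 * a)."""
    return v_liftoff ** 2 / (2.0 * accel)


def isa_plus_15_threshold_c(elevation_m: float) -> float:
    """ISA temperature at a given elevation (15 C at sea level, -6.5 C per km) plus 15 C."""
    return 15.0 - 6.5 * elevation_m / 1000.0 + 15.0


if __name__ == "__main__":
    pressure = 101325.0  # sea-level pressure, Pa, held fixed for the comparison
    # Hypothetical aircraft values, chosen only for illustration.
    weight, area, cl, accel = 7.0e5, 120.0, 2.0, 2.0
    d_ref = None
    for temp in (15.0, 30.0, 45.0):
        rho = air_density(pressure, temp)
        v = liftoff_speed(weight, rho, area, cl)
        d = takeoff_distance(v, accel)
        d_ref = d if d_ref is None else d_ref
        print(f"T = {temp:4.1f} C  rho = {rho:.3f} kg/m3  V = {v:5.1f} m/s  "
              f"roll = {d:6.0f} m (+{100 * (d / d_ref - 1):.1f}% vs 15 C)")
```

Combining the three formulas gives \(d = W/(\rho S C_L a)\), and since \(1/\rho\) is proportional to the absolute temperature, the ground roll scales with temperature in kelvin under these assumptions: going from 15 to 45 \(^{\circ }\)C lengthens it by roughly 10%, before any thrust reduction is accounted for.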

Fire risk also increases under extremely hot conditions (Thompson 2016). Special care should be taken at airports under high ambient temperatures with regard to the storage, handling and use of jet fuel, whose flammability limits range between 38 and 70 \(^{\circ }\)C depending on the specific fuel type (Edwards 2017). Increased exposure to hot conditions may also raise the demand for air conditioning and damage infrastructure (Thompson 2016). During extreme events, energy supply could be compromised by peak demand, and infrastructure would be more likely to suffer sudden, localized damage.

Finally, airport infrastructure and aircraft operations are designed for the mean local climate at each site, which optimises airport operational efficiency, and aircraft are designed to operate in a wide range of ambient conditions. Nonetheless, their operational capabilities and performance might be negatively affected during extreme or record-shattering episodes, since these lie outside the ranges for which they were conceived. With climate change, such out-of-range high temperatures are expected to become more likely and more intense.

To the best of our knowledge, the impact studies carried out so far on the increase in high temperatures at ground level over airports have focused on aircraft takeoff performance. Previous studies have modelled and quantified the degradation of aircraft takeoff performance, in terms of takeoff distance and maximum takeoff weight, due to more frequent high-temperature extremes. Coffel and Horton (2015) and Coffel et al. (2017) evaluated the increase in the number of weight-restricted flights related to rising high temperatures. Zhou et al. (2018) assessed the lengthening of takeoff distances with increasing high temperatures at runway level. Both medium- and long-range aircraft would see their maximum takeoff weight limited by the increase in high temperatures, and they would also need longer distances for takeoff. The payload penalty is greater for elevated airports with short runways. Nonetheless, these impacts might be overestimated, since missions rarely require 100% of the fuel capacity or the maximum takeoff weight of the aircraft (Hane 2016). Coffel et al. (2017) and Zhou et al. (2018) combined future climate projections with aircraft technical data to estimate the evolution of takeoff distances and weight-restricted flights in the twenty-first century. The climate projections used in these two studies were performed with Global Climate Models (GCMs) participating in the 5th phase of the Coupled Model Intercomparison Project (CMIP5) (Taylor et al. 2012). However, the average horizontal resolution of the global climate simulations used is about 150 km, which may hamper their representation of very local phenomena, such as high-temperature extreme events at the airport scale, and of their changes in a warming climate (Salathe et al. 2008; Dulière et al. 2011). Regional climate simulations with finer spatial resolution might be a better approach for studying regional and local phenomena (Feser et al. 2011; Di Luca et al. 2012), in particular those related to climatological extremes. The added value of regional climate models (RCMs) in the study of regional precipitation extremes has already been shown by Sánchez et al. (2011), Torma et al. (2015), Prein et al. (2016), Fantini et al. (2018), Solman and Blázquez (2019), Di Virgilio et al. (2020) and Vichot-Llano et al. (2021), among others. However, few studies have addressed the added value of high-resolution models, such as RCMs, in representing local temperature extremes.

Vautard et al. (2013) studied the effect of high resolution on the representation of heat waves using RCM simulations performed within the international Euro-Coordinated Regional Climate Downscaling Experiment (Euro-CORDEX; Jacob et al. 2014, 2020). They compared the performance of RCMs at two spatial resolutions (0.11\(^{\circ }\) versus 0.44\(^{\circ }\)). Although a clear added value of the higher resolution could not be established in general, local improvements were found in some regions, in particular on the coasts of Spain. Iles et al. (2020) studied the added value of high resolution in representing temperature extremes over Europe using RCMs and GCMs, and found limited benefits from higher-resolution experiments, except over mountains. A recent study by Squintu et al. (2021) compared the performance of high- versus low-resolution global model simulations from the CMIP6 HighResMIP experiment (Haarsma et al. 2016). They concluded that increasing GCM resolution does not substantially improve the representation of extreme summer maximum temperatures, and that it results in weaker temporal trends over Southern Europe in the observational period.

The aim of this study is twofold. First, it evaluates the performance of RCMs, as well as their added value, in representing local high temperatures over the main Euro-Mediterranean airports, focusing on the magnitude of extreme events and their temporal trends in recent decades. Second, it assesses future changes in the magnitude of high-temperature extremes using RCM projections, which are compared to GCM projected changes over these airports. To our knowledge, this is the first attempt to evaluate the performance of a multi-RCM ensemble and to assess the added value of RCMs for temperature extremes at the airport scale. It is also the first attempt to evaluate high-temperature changes at airports using the state-of-the-art Euro-CORDEX RCM ensemble. Moreover, considering future climate projections from both RCM and GCM ensembles to address climate change impacts at regional-to-local scales is uncommon, although it may be crucial for designing adaptation and mitigation policies at these scales.

This document is organised as follows: data and methods are described in Sect. 2, results are presented and discussed in Sect. 3, and conclusions and perspectives are given in Sect. 4.

2 Data and methods

2.1 Observations, reanalysis and climate simulations

Nine of the most frequented airports located in Southwestern Europe were selected: (1) Adolfo Suárez Madrid-Barajas (MAD), (2) Paris Orly (ORY), (3) Toulouse-Blagnac (TLS), (4) Josep Tarradellas Barcelona-El Prat (BCN), (5) Nice Côte d’Azur (NCE), (6) Leonardo Da Vinci Rome-Fiumicino (FCO), (7) Athens Eleftherios Venizelos (ATH), (8) Milan Malpensa (MXP) and (9) Lyon-Saint Exupéry (LYS) (Table 1). A large variety of local topographies is represented within this group of airports: the first three airports are located on flat terrain, airports 4 to 7 are located near the coast and the last two are close to mountain ranges (Fig. 1).

Table 1 Observational datasets and reanalysis available at the airports selected for this study
Fig. 1 Airports selected as case studies

The variable considered for this study is the daily maximum near-surface temperature (TX) in summer (June, July and August; JJA).

Several observational and reanalysis datasets were considered. Time series of in situ measurements at meteorological stations at the airports were taken from the National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network (GHCN)-Daily dataset (Menne et al. 2012b). The NOAA GHCN-Daily dataset contains a large collection of meteorological series from land stations worldwide and results from international agreements to exchange climate data. For Europe, it includes the data available from the European Climate Assessment and Dataset project (ECA&D; Klein Tank et al. 2002), with most observational records starting in 1961. All the data collected and merged into the NOAA GHCN-Daily dataset undergo a common quality-control assessment. The dataset is updated frequently, at least monthly for European stations. Here, the latest update of NOAA GHCN-Daily version 3 (Menne et al. 2012a), with data available up to December 2019, was used. The data were accessed via the R software using the ‘rnoaa’ package (Chamberlain 2021).
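The study accessed GHCN-Daily through the R ‘rnoaa’ package. As an alternative illustration only, the Python sketch below reads summer TMAX directly from raw GHCN-Daily `.dly` files, assuming the fixed-width layout documented by NOAA: an 11-character station ID, year, month and 4-character element code, followed by 31 daily groups of a 5-character value plus measurement, quality and source flags, with TMAX in tenths of \(^{\circ }\)C and -9999 for missing days. Discarding every value with a non-blank quality flag is our simplifying assumption, not necessarily the screening applied in the study.

```python
from typing import Dict, Iterable, Tuple

MISSING = -9999
SUMMER_MONTHS = (6, 7, 8)  # JJA
RECORD_LEN = 21 + 31 * 8   # header + 31 daily groups of 8 characters


def parse_tmax_jja(lines: Iterable[str]) -> Dict[Tuple[int, int, int], float]:
    """Return {(year, month, day): TX in deg C} for JJA days that passed QC."""
    out: Dict[Tuple[int, int, int], float] = {}
    for raw in lines:
        # Pad in case trailing blanks (empty flags) were stripped from the file.
        line = raw.rstrip("\r\n").ljust(RECORD_LEN)
        element = line[17:21]
        month = int(line[15:17])
        if element != "TMAX" or month not in SUMMER_MONTHS:
            continue
        year = int(line[11:15])
        for day in range(31):
            start = 21 + 8 * day
            value = int(line[start:start + 5])
            qflag = line[start + 6]
            # Skip missing values (including non-existent days such as 31 June)
            # and any value flagged by the GHCN-Daily quality control.
            if value == MISSING or qflag != " ":
                continue
            out[(year, month, day + 1)] = value / 10.0  # tenths of deg C -> deg C
    return out
```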

Table 2 Principal characteristics for the datasets used in this study: type, spatial domain, resolution and covered period

The E-OBS gridded observational dataset was also analysed (Haylock et al. 2008; Cornes et al. 2018). It is a land-only dataset available over Europe, built on series from stations included in the ECA&D project. Version 24.0e of the dataset, available on a regular \(0.1^{\circ } \times 0.1^{\circ }\) grid (EOBS 01deg) from 1950 to 2021, was used.

The SPAIN02 gridded observational dataset was also considered (Herrera et al. 2012). This product is provided by the University of Cantabria (Spain) and is based on meteorological series recorded by the Spanish Meteorological Agency (AEMET) over peninsular Spain and the Balearic Islands. To develop this gridded dataset, a two-step area-averaged interpolation method was applied: monthly means are first interpolated using thin plate splines, and daily anomalies are then interpolated using ordinary kriging. The fifth and latest version of the SPAIN02 dataset (Herrera et al. 2016; Kotlarski et al. 2019) was obtained from the AEMET climate services portal (http://www.aemet.es/es/serviciosclimaticos/cambio_climat/datos_diarios?w=2&w2=1). It covers the period 1971–2015 and is available on a regular \(0.1^{\circ }\times 0.1^{\circ }\) grid.
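The two-step scheme described for SPAIN02 can be sketched as follows. This is a simplified point-interpolation illustration, not the SPAIN02 code: it assumes planar station coordinates in km, an exact (unsmoothed) thin plate spline for the monthly means, and ordinary kriging of the daily anomalies with a hypothetical exponential variogram (`sill`, `range_km`). The area-averaging onto 0.1\(^{\circ }\) cells used by SPAIN02 is omitted.

```python
import numpy as np


def _tps_kernel(r):
    """Thin plate spline radial basis phi(r) = r**2 * log(r), with phi(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r ** 2 * np.log(r)
    return np.where(r > 0, k, 0.0)


def thin_plate_spline(xy_obs, z_obs, xy_new):
    """Exact 2-D thin plate spline: radial part plus an affine trend (1, x, y)."""
    n = len(z_obs)
    d = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    p = np.hstack([np.ones((n, 1)), xy_obs])
    a = np.zeros((n + 3, n + 3))
    a[:n, :n] = _tps_kernel(d)
    a[:n, n:] = p
    a[n:, :n] = p.T
    coef = np.linalg.solve(a, np.concatenate([z_obs, np.zeros(3)]))
    d_new = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=-1)
    p_new = np.hstack([np.ones((len(xy_new), 1)), xy_new])
    return _tps_kernel(d_new) @ coef[:n] + p_new @ coef[n:]


def ordinary_kriging(xy_obs, z_obs, xy_new, sill=1.0, range_km=50.0):
    """Ordinary kriging with an exponential variogram (hypothetical parameters)."""
    def gamma(h):
        return sill * (1.0 - np.exp(-h / range_km))

    n = len(z_obs)
    d = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))      # last row/column enforce sum(weights) = 1
    a[:n, :n] = gamma(d)
    a[n, n] = 0.0
    d_new = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=-1)
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = gamma(d_new).T
    w = np.linalg.solve(a, b)        # last row holds the Lagrange multipliers
    return w[:n, :].T @ z_obs


def two_step_daily_field(xy_obs, monthly_mean, daily_value, xy_new):
    """Daily field = spline of monthly means + kriging of daily anomalies."""
    clim = thin_plate_spline(xy_obs, monthly_mean, xy_new)
    anom = ordinary_kriging(xy_obs, daily_value - monthly_mean, xy_new)
    return clim + anom
```

Splitting the problem this way lets the smooth climatological field (where elevation and large-scale gradients dominate) and the day-to-day anomalies (which are spatially correlated over shorter distances) be interpolated with methods suited to each.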

Another dataset considered for this study is the Système d’Analyse Fournissant des Renseignements Atmosphériques à la Neige-France reanalysis (SAFRAN-France; Quintana-Segui et al. 2008; Vidal et al. 2010). SAFRAN-France is a Météo-France product that provides climate data over metropolitan France on an 8 km regular grid for the period 1950–2014. An optimal interpolation algorithm is used to spatialize the observations in 300 m vertical layers over climatically homogeneous zones. A first guess of the air temperature field, taken from a meteorological model or a reanalysis such as NCEP, is modified to minimize the weighted sum of the differences between the first guess and the observed values at nearby stations for each grid point. A spatial interpolation is then performed to project the variables onto the regular 8 km grid.
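The analysis step described for SAFRAN-France (a first guess corrected towards nearby observations by minimizing a weighted misfit) follows the principle of optimal interpolation. The Python sketch below shows only the textbook OI update \(x_a = x_b + K(y - H x_b)\) with \(K = B H^{T}(H B H^{T} + R)^{-1}\); it does not reproduce SAFRAN's 300 m layers, climatically homogeneous zones or actual error statistics, and the Gaussian background-error covariance and its parameters are hypothetical.

```python
import numpy as np


def gaussian_cov(positions_km, sigma=1.0, length_km=50.0):
    """Hypothetical background-error covariance decaying with distance (1-D transect)."""
    p = np.asarray(positions_km, dtype=float)
    d = np.abs(p[:, None] - p[None, :])
    return sigma ** 2 * np.exp(-0.5 * (d / length_km) ** 2)


def optimal_interpolation(x_background, obs, h, b_cov, r_cov):
    """OI analysis x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b).

    This is the minimiser of (x - x_b)^T B^-1 (x - x_b) + (y - H x)^T R^-1 (y - H x),
    i.e. a weighted sum of squared departures from the first guess and the stations.
    """
    innovation = obs - h @ x_background   # observed minus first-guess values
    s = h @ b_cov @ h.T + r_cov           # innovation covariance
    return x_background + b_cov @ h.T @ np.linalg.solve(s, innovation)
```

With a small observation error \(R\), the analysis is drawn close to the station value and the correction spreads to neighbouring grid points according to \(B\); with a large \(R\), the first guess is essentially retained.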

Although the SPAIN02 and SAFRAN-France national gridded datasets include more temperature stations than the NOAA GHCN-Daily or EOBS datasets, they also have some disadvantages. For instance, SPAIN02 covers a shorter period than the NOAA product and EOBS 01deg, and it has been shown that SAFRAN-France must be used with caution for the analysis of temperature trends (Vidal et al. 2010). Trend analysis using the NOAA GHCN-Daily series should also be done with caution. In the absence of a single best reference, all of the available datasets were considered and intercompared.

Table 1 summarises the observational datasets and reanalysis that are available for each selected airport. Table 2 synthesizes the main characteristics of the datasets described above.

The RCM simulations analysed in this study belong to the Evaluation, Historical and Representative Concentration Pathway (RCP) Euro-CORDEX-11 ensembles (Jacob et al. 2014, 2020). The horizontal resolution of these simulations is 0.11\(^{\circ }\) (\(\sim\) 12 km). Figure 2 illustrates how the scale of RCMs relates to the scale of the airports. The Evaluation ensemble consists of RCM simulations driven at their boundaries by the ERA-Interim reanalysis (ERAI hereinafter; Berrisford et al. 2011). Table 3 lists the 7 RCMs from the Evaluation experiment used in this study. The Historical experiment consists of climate simulations in which the RCMs are driven by a subset of CMIP5 GCMs. The Historical simulations used here correspond to 58 combinations of 8 driving GCMs and 11 RCMs; Table 4 details the RCM \(\times\) GCM matrix. In the three RCP scenario experiments (RCP2.6, RCP4.5 and RCP8.5), RCMs are also driven by CMIP5 GCMs. Table 5 lists the GCM-RCM pairs of simulations considered for each of the three RCP scenarios: 11 simulations for RCP2.6, 13 for RCP4.5 and 35 for RCP8.5, resulting from the combination of 7 driving GCMs with 10 RCMs. The CMIP5 ensemble itself is also used (Taylor et al. 2012), since the driving models of the currently widely available Euro-CORDEX experiments are selected from this database. Table 6 presents the CMIP5 simulations considered in this study. In total, simulations performed with 34 different GCMs were used.

Fig. 2 MAD airport and the grid of the RCM ALADIN63 from Euro-CORDEX-11. The black point corresponds to the meteorological station located at the airport. The diamonds designate the centroids of the model grid cells, with the blue one indicating the grid point selected for the analysis as the one containing this airport

2.2 Characterisation of TX extremes and trends

Table 3 List of the available RCM simulations from the Euro-CORDEX Evaluation experiment in the period 1979–2008

The magnitude of extreme events was analysed from quantile–quantile (q–q) diagrams, as in Herrera et al. (2010) for precipitation or in Christensen and Boberg (2012) for monthly mean temperatures, for instance. In these q–q plots, the quantiles of the different datasets were compared with those of the observational reference dataset.
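A q–q comparison of upper percentiles reduces to computing the same percentiles in each dataset and plotting them against those of the reference. A minimal Python sketch, assuming NumPy's default linear-interpolation percentile definition (the study does not state which quantile estimator it used) and the 90th, 95th and 99th percentiles examined in Sect. 3.1:

```python
import numpy as np

PERCENTILES = (90, 95, 99)


def upper_percentiles(tx, percentiles=PERCENTILES):
    """Upper percentiles of a series of daily summer TX values (deg C)."""
    return np.percentile(np.asarray(tx, dtype=float), percentiles)


def qq_differences(tx_dataset, tx_reference, percentiles=PERCENTILES):
    """Return (reference quantiles, dataset quantiles, dataset minus reference).

    On a q-q plot the reference quantiles go on the horizontal axis; a positive
    difference means the point lies above the 1:1 diagonal (warmer extremes).
    """
    q_ref = upper_percentiles(tx_reference, percentiles)
    q_dat = upper_percentiles(tx_dataset, percentiles)
    return q_ref, q_dat, q_dat - q_ref
```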

Trends were computed with the quantile regression method, first introduced by Koenker and Bassett (1978) and further developed in Koenker (2005) and Koenker (2017). This method estimates the linear temporal trends of the TX probability density function (PDF) quantile by quantile (or percentile by percentile), thus providing information about the evolution of the PDF shape, and not only about changes in the mean, as does the more commonly used ordinary least squares (OLS) regression method. It follows the same principle as OLS regression, which estimates the conditional mean of the response variable by minimizing the sum of squared residuals. For conditional quantiles, however, the function to minimize is the sum of asymmetrically weighted absolute residuals (Koenker and Hallock 2001): for the quantile \(\tau\), positive and negative residuals are weighted by \(\tau\) and \(1-\tau\), respectively, so that the balance between residuals above and below the fitted line depends on the target quantile.
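The minimization of asymmetrically weighted absolute residuals can be written as a linear program. The Python sketch below fits a linear trend at quantile \(\tau\) with SciPy's `linprog` (HiGHS solver); it illustrates the technique and is not the study's implementation, which used R's ‘quantreg’ package. Because it builds a dense constraint matrix, it is only suitable for modest sample sizes.

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(residuals, tau):
    """Sum of the check function rho_tau(u) = u * (tau - 1[u < 0])."""
    u = np.asarray(residuals, dtype=float)
    return float(np.sum(u * (tau - (u < 0))))


def quantile_trend(t, y, tau):
    """Fit y = a + b * t at quantile tau; return (a, b).

    Residuals are split into positive and negative parts u+ and u-, and the LP
    minimises tau * sum(u+) + (1 - tau) * sum(u-)
    subject to a + b * t_i + u+_i - u-_i = y_i, with u+ >= 0 and u- >= 0.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = np.column_stack([np.ones(n), t])
    c = np.concatenate([np.zeros(2), np.full(n, tau), np.full(n, 1.0 - tau)])
    a_eq = np.hstack([x, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * 2 + [(0, None)] * (2 * n)  # a, b free; u+, u- >= 0
    res = linprog(c, A_eq=a_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[0], res.x[1]
```

Expressing `t` in decades makes the slope a trend in \(^{\circ }\)C/decade; repeating the fit for \(\tau = 0.05, 0.10, \ldots, 0.95\) gives a quantile-trend profile of the kind discussed in Sect. 3.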

Uncertainties in both the magnitude of extremes and the quantile trends were estimated by bootstrapping. For the quantile regression method, bootstrap sampling was performed over clusters of 15 consecutive days to preserve temporal homogeneity. All quantile trends were computed in R using the ‘quantreg’ package (Koenker 2021).
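Resampling over 15-day clusters can be sketched as a block bootstrap: the daily series is cut into non-overlapping blocks of 15 consecutive days, and whole blocks are drawn with replacement. In this Python sketch, the block boundaries, the percentile confidence interval and the treatment of blocks that straddle two summers (they are not split here) are our assumptions, since the study does not detail these choices. For quantile trends, passing a 2-D array with time and TX columns keeps each resampled value paired with its date.

```python
import numpy as np


def block_bootstrap_indices(n_days, block_len=15, rng=None):
    """Indices of one bootstrap resample made of whole blocks of consecutive days."""
    rng = np.random.default_rng() if rng is None else rng
    starts = np.arange(0, n_days, block_len)  # non-overlapping block starts
    pieces, total = [], 0
    while total < n_days:
        start = int(rng.choice(starts))
        block = np.arange(start, min(start + block_len, n_days))
        pieces.append(block)
        total += len(block)
    # Trim the last block so the resample has the original length.
    return np.concatenate(pieces)[:n_days]


def bootstrap_ci(data, statistic, n_boot=1000, block_len=15, level=0.95, seed=0):
    """Bootstrap mean and percentile interval of statistic(sample).

    `data` is indexed by day along its first axis, so a 2-D (time, TX) array
    keeps each TX value paired with its date when trends are resampled.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    stats = np.array([
        statistic(data[block_bootstrap_indices(len(data), block_len, rng)])
        for _ in range(n_boot)
    ])
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return stats.mean(), lo, hi
```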

Table 4 The RCM \(\times\) GCM matrix indicating which combinations from the Euro-CORDEX Historical experiment were available for the study in the period 1961–2005

The analysis presented in this study can be organised in five steps.

  1. Extreme values and trends of TX in the recent period 1961–2014 were characterized from the observational and reanalysis datasets over the nine selected airports (Table 1). To estimate to what extent the choice of observational reference may affect the subsequent evaluation of climate models, the results from the different observations and reanalysis were intercompared. In the gridded datasets, the nearest grid point to each airport was selected for the analysis.

  2. The Euro-CORDEX Evaluation experiment was evaluated against an observational reference in terms of the magnitude of extremes and the quantile trends of TX for the period 1979–2008. This analysis allowed us to estimate the errors that are specific to RCMs. The land-sea mask of each RCM was taken into account when selecting the nearest grid points to the airport locations: in each case, the nearest point with a land area fraction of at least 0.6 was selected.

  3. The Evaluation and Historical Euro-CORDEX RCM ensembles were compared. This step allowed us to analyse the propagation of GCM errors into RCMs in the Historical experiment, which is crucial to evaluate before assessing future climate projections from the RCP experiments. For a fairer comparison, each RCM in the Evaluation experiment was counted as many times as there were simulations with that same RCM in the Historical experiment.

  4. The Historical RCM ensemble was compared to the subset of driving GCMs from the Historical CMIP5 experiment over the airports for the common period 1961–2005: 8 GCMs drove 11 RCMs, for a total of 58 GCM-RCM pairs of simulations (see Table 4). The GCM simulations in the driving CMIP5 ensemble were weighted according to the number of RCMs that each GCM drove in the Euro-CORDEX Historical experiment. This step provided an assessment of the added value of increasing climate model resolution in representing the magnitude and trends of TX extreme events. For this step, RCMs and GCMs were compared at their native horizontal resolution, which allowed the added value of RCMs over GCMs to be assessed at small scale. Nearest grid points to the airports were selected as described in step 2, for both GCMs and RCMs. The average distance between the airports and the selected GCM grid points was 60 km for inland airports and about 150 km for coastal airports.

  5. Future climate projections from the Euro-CORDEX and CMIP5 RCP ensembles were analysed at their respective native resolutions to investigate the changes in TX extremes for the periods 2021–2050 (near term) and 2071–2100 (long term) with respect to the historical period 1961–2005. This last step allowed us to estimate the magnitude of projected changes in TX extremes and to compare RCM projections with the driving GCM projections, and with the CMIP5 ensemble as a whole, over the airports. One member per model was considered for the whole CMIP5 ensemble. The RCM future projections were also compared to those of their driving GCMs in terms of the quantile trends for the median and the upper 90th and 95th percentiles over the same 30-year future periods.
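Several of the steps above rely on simple bookkeeping: picking the nearest grid cell with a land fraction of at least 0.6 (step 2), weighting ensemble members by the number of simulations they contribute (steps 3 and 4), and differencing future and baseline percentiles (step 5). The Python helpers below sketch these operations under stated assumptions: great-circle (haversine) distance on a spherical Earth, 2-D latitude/longitude arrays for the model grid, results stored in plain dictionaries keyed by hypothetical model names, and NumPy's default percentile estimator with the 90th, 95th and 99th percentiles as an illustrative choice.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_land_point(lat_grid, lon_grid, land_frac, lat0, lon0, min_land=0.6):
    """(row, col) of the closest grid cell whose land fraction is >= min_land."""
    dist = haversine_km(np.asarray(lat_grid), np.asarray(lon_grid), lat0, lon0)
    dist = np.where(np.asarray(land_frac) >= min_land, dist, np.inf)
    if not np.isfinite(dist).any():
        raise ValueError("no grid cell meets the land-fraction threshold")
    return np.unravel_index(np.argmin(dist), dist.shape)


def weighted_ensemble_mean(values_by_model, weights_by_model):
    """Ensemble mean with per-model weights, e.g. the number of RCMs a GCM drove."""
    models = sorted(values_by_model)
    v = np.array([values_by_model[m] for m in models], dtype=float)
    w = np.array([weights_by_model[m] for m in models], dtype=float)
    return float(np.sum(w * v) / np.sum(w))


def projected_change(tx_future, tx_hist, percentiles=(90, 95, 99)):
    """Change in TX percentiles: future period minus the 1961-2005 baseline."""
    return np.percentile(tx_future, percentiles) - np.percentile(tx_hist, percentiles)
```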

A height correction was applied to all gridded datasets, as in Kotlarski et al. (2019), to offset the temperature differences resulting from the elevation differences between the airports and the selected grid points. Temperature was assumed to decrease with elevation following the standard atmospheric lapse rate, that is, \(-\,6.5\,^{\circ }\)C for every + 1000 m.
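The height correction amounts to a linear shift of the grid-cell temperature to the airport elevation. A minimal Python sketch, assuming the 6.5 \(^{\circ }\)C per 1000 m rate given above; the function and parameter names are ours:

```python
LAPSE_RATE_C_PER_M = 0.0065  # 6.5 deg C per 1000 m, as stated in the text


def height_corrected_tx(tx_grid_c, z_grid_m, z_airport_m, lapse=LAPSE_RATE_C_PER_M):
    """Shift a grid-cell temperature to the airport elevation.

    A grid cell higher than the airport is colder, so its temperature is raised
    by lapse * (z_grid - z_airport); it is lowered when the cell lies below the
    airport. Works element-wise on NumPy arrays as well as on floats.
    """
    return tx_grid_c + lapse * (z_grid_m - z_airport_m)
```

For example, a grid cell 100 m above the airport has its TX raised by 0.65 \(^{\circ }\)C, and a cell 100 m below has it lowered by the same amount.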

3 Results

3.1 Observed extreme values and trends

Table 5 The RCM \(\times\) GCM matrix indicating which combinations from the Euro-CORDEX RCPs scenarios experiment were available for the study between 2021 and 2100

The main purpose of this section is to characterize the differences between all of the available datasets over the selected airports, in terms of the magnitude of extreme values and trends of the summer TX.

Figure 3 shows the 90th, 95th and 99th percentiles of summer TX for the observations and reanalysis over the airports. All quantiles are plotted against those from EOBS 01deg: when both datasets are identical, points lie on the diagonal line, while points lying above (below) indicate greater (lower) values than the EOBS 01deg quantiles. The warmest temperatures were recorded at MAD airport, with 90th to 99th percentiles ranging from 36.4 to 39.0 \(^{\circ }\)C. Second in the ranking of the highest observed 99th percentile values is ATH, with 37.4 \(^{\circ }\)C (not shown), and third is TLS, with 36.8 \(^{\circ }\)C. Conversely, the lowest 99th percentile values were observed at NCE and BCN airports, with 33.0 \(^{\circ }\)C and 33.6 \(^{\circ }\)C, respectively. The most moderate extremes are found at ORY airport, with a 90th percentile of 29.5 \(^{\circ }\)C. At French airports, SAFRAN-France gives extremely high temperatures about 0.8 \(^{\circ }\)C lower than EOBS 01deg. SPAIN02 also gives upper percentile values that are 0.5 \(^{\circ }\)C lower on average than those of EOBS 01deg at MAD airport, whereas they are 0.2 \(^{\circ }\)C higher on average at BCN airport. Meanwhile, the NOAA GHCN-Daily and EOBS 01deg datasets show very similar results, although the NOAA product gives high temperatures that are 0.7 \(^{\circ }\)C lower than those of EOBS 01deg at BCN airport. The maximum differences between datasets for the summer TX upper percentiles range from \(\pm \,0.3\,^{\circ }\)C for ORY to 1.5 \(^{\circ }\)C for TLS.

Fig. 3 q–q plot of the TX upper percentiles observed in summer between 1961 and 2014: NOAA GHCN-Daily (green points), SPAIN02 (red points) and SAFRAN-France (blue points). EOBS 01deg is taken as the reference (horizontal axis). The markers represent MAD airport (stars), TLS airport (downward triangles), ORY airport (circles), BCN airport (diamonds), NCE airport (squares) and LYS airport (upward triangles)

Observed quantile trends at the nine airports are shown in Fig. 4, together with the uncertainty associated with the trend estimation for each quantile, as explained in Sect. 2.2. In this study, 19 quantiles from the 5th to the 95th are considered. All TX quantiles increased at all airports. However, the shape of the quantile trend envelope differs among the airports. ORY and MAD airports exhibit large asymmetries in the distribution of quantile trends, with contrasting behaviours: the strongest increase is found for the highest quantiles at ORY airport (\(0.60 \pm 0.15\,^{\circ }\)C/decade according to EOBS 01deg) and for the lowest quantiles at MAD airport (\(0.8 \pm 0.2\, ^{\circ }\)C/decade, also according to EOBS 01deg). TLS has also experienced slightly stronger trends at higher percentiles. LYS also presents asymmetries, with larger trends for the median than for the upper and lower extremes, but only according to the NOAA GHCN-Daily dataset. These uneven distributions of the percentile trends justify the choice of the quantile regression method for computing the evolution of high temperatures. At the remaining airports, the PDF shifted towards higher temperatures in an almost homogeneous way, as differences between the lowest and highest percentile trends are minimal compared to those obtained at ORY and MAD airports. Trends of the 95th percentile, which is commonly used to characterize extremes, range between 0.25 and 0.75 \(^{\circ }\)C/decade at most of the airports (TLS, BCN, NCE, ATH, LYS and MXP) when all available datasets are considered. The weakest 95th percentile trends are observed at FCO airport, ranging between 0.13 and 0.27 \(^{\circ }\)C/decade. The positive trends found at all airports are consistent with the rise in heat events in the Euro-Mediterranean region in recent decades (IPCC 2013, 2021). This study of major Euro-Mediterranean airports expands the list of airports considered so far (Coffel and Horton 2015; Coffel et al. 2017; Zhou et al. 2018) that have been found to be more exposed to extreme heat conditions because of global warming.

Fig. 4 Quantile trends of TX between 1961 and 2014 in JJA, computed for MAD, TLS, ORY, BCN, NCE, FCO, ATH, LYS and MXP airports from the EOBS 01deg (black), NOAA GHCN-Daily (green) and SPAIN02 (red) observational datasets, and from the SAFRAN-France (blue) reanalysis. Solid lines correspond to the mean of the bootstrap distribution, and shading indicates the 95% confidence interval. For SPAIN02, the absence of data for the first 10 years of the period was accepted, so its trends were actually computed over the period 1971–2014

The EOBS 01deg dataset behaves very similarly to the NOAA GHCN-Daily dataset for both extreme values and trends. This is probably because the in situ stations nearest to the selected airports that EOBS 01deg integrates are also included in the NOAA GHCN-Daily dataset. SPAIN02 and SAFRAN-France show large discrepancies with respect to the EOBS 01deg and NOAA GHCN-Daily datasets. In general, SAFRAN-France exhibits larger trends than EOBS 01deg or the NOAA product for the upper quantiles, except at TLS airport. For SPAIN02, the differences from the other datasets can be explained by the sensitivity of the trend computation to the period of study (see Table 2). Indeed, when the quantile trends are recomputed over a common period (1971–2014) for all datasets at the Spanish airports, SPAIN02 exhibits trends more similar to those of the EOBS 01deg and NOAA GHCN-Daily datasets (Supplement, Fig. 1). SAFRAN-France, however, presents larger 95th percentile trends than the EOBS 01deg and NOAA GHCN-Daily datasets: they are 0.4 \(^{\circ }\)C/decade greater at ORY and NCE and around 0.15 \(^{\circ }\)C/decade greater at LYS, while they match both datasets at TLS.

To conclude, differences between datasets for TX trends in the observational period can reach 0.4 \(^{\circ }\)C/decade (for central estimates), depending on the location and the percentile. However, the datasets remain broadly consistent at most airports, given the wide confidence intervals of the quantile trends. Also, the maximum difference between datasets for the magnitude of the TX upper percentiles is 1.5 \(^{\circ }\)C, for temperatures exceeding 35 \(^{\circ }\)C. According to these results, the choice of observational reference among these datasets is unlikely to be decisive for the evaluation of climate models in terms of local TX. Hereinafter, EOBS 01deg is used as the observational reference for evaluating the climate models at the airport scale, since data are available at all the selected airport locations and its resolution is nearly the same as that of the RCMs.

3.2 Evaluation of RCMs

Table 6 List of GCM simulations from the CMIP5 RCP scenarios experiments analysed in this study in the period 1961–2005

The magnitude of TX extremes is clearly overestimated by RCMs in the Evaluation experiment (Fig. 5). These biases do not change much across percentiles, but their amplitude depends on the airport. The largest biases are found at TLS, NCE and MXP airports, averaging + 3.3 \(^{\circ }\)C and reaching up to + 6.0 \(^{\circ }\)C for some models. The lowest biases are found at MAD and ORY airports, at less than + 1.5 \(^{\circ }\)C on average. With SPAIN02 or SAFRAN-France as the observational reference, the warm bias of RCMs at MAD, TLS, NCE and LYS would have been even larger. Conversely, when RCMs are driven by GCMs in the Historical experiment, the multi-model ensemble (MME) mean biases decrease, changing in magnitude and even in sign at most of the airports. Nonetheless, the MME spread within the Historical ensemble, defined as the difference between the maximum and the minimum across the ensemble simulations, is very large, with biases ranging between − 6 and + 6 \(^{\circ }\)C. This change in the behaviour of RCMs could be explained by the interaction and/or superposition of the intrinsic RCM biases with the driving GCM biases, suggesting error compensation (Colmet-Daage et al. 2018).
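The MME mean bias and the MME spread (maximum minus minimum across simulations) used in this section can be computed per quantile as follows. In this Python sketch, the array layout is our assumption, and the optional weights mirror the counting rule of step 3 in Sect. 2.2 (each Evaluation RCM counted as many times as it appears in the Historical experiment).

```python
import numpy as np


def mme_bias_summary(model_quantiles, obs_quantiles, weights=None):
    """Per-quantile MME mean bias and spread (max minus min across simulations).

    model_quantiles: array (n_simulations, n_quantiles) of simulated TX quantiles.
    obs_quantiles:   array (n_quantiles,) from the observational reference.
    weights:         optional per-simulation weights for the mean bias.
    """
    biases = np.asarray(model_quantiles, float) - np.asarray(obs_quantiles, float)
    return {
        "mean_bias": np.average(biases, axis=0, weights=weights),
        "min_bias": biases.min(axis=0),
        "max_bias": biases.max(axis=0),
        "spread": biases.max(axis=0) - biases.min(axis=0),
    }
```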

Fig. 5 q–q plot of the TX upper percentiles between 1979 and 2008 in JJA for the Euro-CORDEX Evaluation and Historical experiments (pink and blue, respectively). The reference is EOBS 01deg data, as explained in the text. Colored solid lines represent the MME mean of each experiment, and shading corresponds to the interval between the minimum and maximum values obtained for each model ensemble

Figure 6 shows that, in general, there are no substantial differences between the observed TX quantile trends and those simulated by the RCMs in the Evaluation and Historical ensembles, since the observational spread generally lies within the envelope spanned by the RCMs for most percentiles across the airports. However, MME mean trends are generally lower than the observed trends. Only at ATH airport do the observed trends lie completely outside the Evaluation MME spread, for almost all percentiles except the extremes and the median. The inter-model spread of the Evaluation ensemble for the upper percentile trends is generally very wide, exceeding 0.3 \(^{\circ }\)C/decade over the 30-year period in most cases. The Historical experiment shows a larger inter-model spread for all percentile trends; in particular, the inter-model spread for the highest quantile trends can exceed 2 \(^{\circ }\)C/decade.

Fig. 6 Quantile trends of TX between 1979 and 2008 in JJA for the EOBS 01deg observational dataset (black) and the Euro-CORDEX Evaluation and Historical experiments (pink and blue, respectively). Error bars represent the 95% confidence interval for the observed trends. Colored solid lines represent the MME mean of each experiment, and shading corresponds to the interval between the minimum and maximum values found for each of the two ensembles

Results from the evaluation of RCMs are consistent with Vautard et al. (2013), who also found that RCMs generally overestimate summer temperature extremes, in particular in the Mediterranean region. Moreover, the positive intrinsic bias of the Euro-CORDEX RCMs for summer temperatures in Southern Europe was also highlighted by Kotlarski et al. (2014).

To conclude, the GCM-driven experiment exhibits smaller MME mean biases for extreme temperatures than the ERAI-driven experiment (− 0.4 versus + 2.3 \(^{\circ }\)C on average), but a larger MME spread. An amplified spread in extreme temperatures when RCMs are driven by GCMs was also found by Moberg and Jones (2004), Kjellström et al. (2007) and Nikulin et al. (2011). Déqué et al. (2012) previously showed that, for mean summer temperatures, the RCM MME spread depends mainly on the driving GCMs.

3.3 Added value of RCMs over GCMs

Figure 7 shows that the driving GCM MME mean underestimates the TX extremes at both the 5 inland airports and the 4 coastal airports. This prevailing cold bias of CMIP5 for summer temperatures over Southwestern Europe is consistent with the results of Cattiaux et al. (2013). An advantageous interaction between the inherent positive biases of the RCMs and the GCM biases could explain the apparently better results of the Historical RCM ensemble compared with the Evaluation ensemble found in Sect. 3.2. The large GCM ensemble spread envelops the observations at most airports; only at BCN, NCE and MXP airports are TX extremes underestimated by the whole ensemble of driving GCMs. RCMs show an apparent added value regarding TX extremes, as their MME mean is in some cases very close to the observations, and the RCM ensemble seems to perform better than the driving GCMs, in particular at BCN, NCE and MXP airports, the first two located on the coast and the third close to mountain ranges. Nonetheless, according to the results in Sect. 3.2, this apparent added value is likely the result of error interactions within each GCM-RCM pair, as already mentioned. These results are coherent with Vautard et al. (2013), who found local improvements in the representation of heat waves over Europe at some coastal locations using higher-resolution climate simulations. In Iles et al. (2020), increasing the resolution was also found to be beneficial for the representation of high-temperature events over mountainous regions in Europe, as warm biases were smaller at higher resolution.

Fig. 7

q–q plot of the TX upper percentiles between 1961 and 2005 in JJA, for the Euro-CORDEX Historical ensemble (blue) and the forcing CMIP5 GCM (weighted) sub-ensemble (green). The reference is EOBS 01deg data as explained in the text. Colored solid lines represent the MME mean of each experiment, and shading corresponds to the interval between minimum and maximum values obtained for each model ensemble

Figure 8 shows that the observed trends, which are a single realisation of the climate system among all possible ones, generally cannot be excluded from the distribution of realisations simulated by the RCM and GCM Historical ensembles, which represent internal climate variability as modelled by Euro-CORDEX and CMIP5. The MME means of the two ensembles show smaller trends, and less variation between TX percentiles, than the observations.

Fig. 8

Quantile trends of the TX between 1961 and 2005 in JJA for the EOBS 01deg observational dataset (black) and for the Euro-CORDEX Historical ensemble (blue) and the forcing CMIP5 GCM (weighted) sub-ensemble (green). Error bars represent the 95% confidence interval for the observed trends. Colored solid lines represent the MME mean of each experiment, and shading corresponds to the interval between minimum and maximum values found for each of the two ensembles

Moreover, the amplitude of the inter-model spread of the RCM Historical ensemble for TX extremes is generally comparable to that of the driving GCM ensemble, as shown in Fig. 7, despite the smaller number of unique simulations in the latter (9 GCMs in CMIP5 Historical versus 47 GCM-RCM combinations in Euro-CORDEX Historical). As already mentioned in Sect. 3.2, Déqué et al. (2012) identified the driving GCM as the main source of uncertainty amongst RCMs in simulating mean summer temperatures. The comparison of the MME spread amplitudes of the two ensembles for TX quantile trends suggests that the MME spread amplitude of the RCMs is modulated by that of the driving GCMs, although clearly larger uncertainties are found in the RCM experiment at TLS, ORY and LYS airports, as shown in Fig. 8.

These results indicate that it is difficult to conclude that higher-resolution RCMs add value with respect to GCMs in representing extreme values and trends of high temperatures at the small scale of an airport. These findings are consistent with those of Vautard et al. (2013) and Squintu et al. (2021). Increasing model resolution alone may not be sufficient to improve the representation of local extreme temperature phenomena and their evolution.

3.4 Future climate projections

Figure 9 shows the future changes in the 95th percentile as simulated by the Euro-CORDEX MME mean under the most severe scenario, RCP8.5. The coastal airports show the smallest increase: less than 2 \(^{\circ }\)C in the near term and around 5 \(^{\circ }\)C by the end of the century. This is in line with the lower warming projected by the Euro-CORDEX ensemble for the Atlantic coast of Portugal, compared with the eastern and more continental part of the country, found by Cardoso et al. (2019). At ORY airport, the 95th percentile is also projected to increase by approximately 1.7 and 5.0 \(^{\circ }\)C in the near and long term, respectively. The other airports show stronger changes in the 95th percentile, ranging from about 2 \(^{\circ }\)C in the near term to about 6 \(^{\circ }\)C by the late twenty-first century. These results are consistent with the increase in TX extremes during this century projected by the Euro-CORDEX ensemble over the Mediterranean and Southern Europe by Zittis et al. (2019) and Coppola et al. (2021). We have also investigated the projected changes for the median and for the 90th, 99th, 99.5th and 99.9th percentiles. In general, a stronger increase is projected for the highest quantiles than for the median under the severe scenario in both horizons (not shown). In particular, the difference between the changes in the 95th percentile and in the median can reach up to 0.4 \(^{\circ }\)C in the near term and up to 1.1 \(^{\circ }\)C in the long term. This agrees with Cardoso et al. (2019) for future Euro-CORDEX projections over Portugal, and with the future high-temperature changes projected in previous studies for a list of airports distributed worldwide (Coffel and Horton 2015; Coffel et al. 2017; Zhou et al. 2018).
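The projected change of a percentile, and the extra warming of the 95th percentile relative to the median, can be sketched for one model at one airport as below. This assumes the change is the difference between TX percentiles of a future period (e.g. 2021–2050) and of the reference period 1961–2005, and that the MME mean change is the average of the per-model changes; the function names and the synthetic series are hypothetical.

```python
from statistics import fmean


def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def projected_change(ref_tx, fut_tx, q):
    """Change in the q-th TX percentile between a reference and a future period."""
    return percentile(fut_tx, q) - percentile(ref_tx, q)


def mme_mean_change(runs, q):
    """MME mean change: average of per-model changes; runs is a list of (ref, fut)."""
    return fmean(projected_change(ref, fut, q) for ref, fut in runs)


# Synthetic model whose warm tail warms faster than the centre of the distribution.
ref = [float(t) for t in range(20, 41)]
fut = [t + 2.0 + 0.1 * (t - 30.0) for t in ref]
extra = projected_change(ref, fut, 95) - projected_change(ref, fut, 50)
print(round(extra, 2))  # extra warming of the 95th percentile relative to the median
```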

Fig. 9

Projected changes in the 95th percentile of the summer TX over MAD, TLS, ORY, BCN, NCE, FCO, ATH, LYS and MXP airports, computed as the difference between the periods 2021–2050 and 1961–2005 (a), and between the periods 2071–2100 and 1961–2005 (b), simulated by the Euro-CORDEX MME mean for the RCP8.5 scenario

MME mean projected changes in the near term for the RCP2.6 and RCP4.5 scenarios lie within the envelope of the RCP8.5 ensemble (Supplement, Fig. 3a). This suggests that the emission scenario and anthropogenic forcing are not the dominant sources of uncertainty in the near term, consistent with Kay et al. (2015); instead, model differences and internal variability may be the main sources of uncertainty. Conversely, MME mean changes are clearly differentiated between scenarios in the long term (Supplement, Fig. 3b). In particular, for the 95th percentile, differences range between 3.4 and 4.5 \(^{\circ }\)C between the RCP8.5 and RCP2.6 scenarios, and between 2.2 and 2.9 \(^{\circ }\)C between the RCP8.5 and RCP4.5 scenarios.

As shown in Fig. 10, the warming projected for the summer TX 95th percentile by the CMIP5 MME mean is between 0.8 and 1.2 \(^{\circ }\)C greater than that simulated by Euro-CORDEX in the near term, and between 1.8 and 2.7 \(^{\circ }\)C greater in the long term. The magnitude of this difference was found to be the same for the median as for the upper extremes (Supplement, Fig. 2). Similar differences were found on average when comparing the RCMs with the subset of driving GCMs, although they are slightly less pronounced in the long term. This suggests that the difference between the changes projected by the whole CMIP5 ensemble and by the Euro-CORDEX ensemble is not due to under-sampling in the selection of the driving models. The MME spread of CMIP5 as a whole is clearly wider than that of the driving GCMs, even though their MME means do not differ much.

Fig. 10

Projected changes in the 95th percentile of the summer TX between 2021–2050 and 1961–2005 (a), and between 2071–2100 and 1961–2005 (b), over the nine airports simulated by the Euro-CORDEX (blue), the driving GCMs (green) and the CMIP5 (yellow) RCP8.5 experiment ensembles. The boxes are delimited by the first and third quartiles, with the median shown as the segment in between and points indicating the MME mean. The lower (upper) whiskers correspond to the minimum (maximum) values of the distribution in each case
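The box statistics described in the Fig. 10 caption can be sketched as follows. Note that the whiskers span the minimum and maximum rather than the common 1.5 × IQR rule. The sketch assumes each box summarises the projected changes of all members of one ensemble at one airport, and uses the "inclusive" quartile method of Python's `statistics.quantiles` (quartile conventions vary between plotting tools); the function name and values are hypothetical.

```python
from statistics import fmean, quantiles


def box_stats(changes):
    """Statistics drawn for one box of a Fig. 10-style plot.

    changes -- projected changes (deg C), one value per ensemble member.
    Whiskers span the full range (min to max), not the 1.5 x IQR convention.
    """
    q1, med, q3 = quantiles(changes, n=4, method="inclusive")
    return {
        "whisker_low": min(changes),
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(changes),
        "mme_mean": fmean(changes),  # plotted as a point
    }


print(box_stats([1.5, 1.8, 2.0, 2.4, 3.1]))
```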

As for the projected changes, the projected trends over the future periods 2021–2050 and 2071–2100 are greater amongst the driving GCMs than amongst the RCMs, according to their MME means (Supplement, Figs. 4 and 5). In the near term, the driving CMIP5 GCMs project warming trends that are on average 0.9 \(^{\circ }\)C/decade larger than those of the Euro-CORDEX RCMs. In the later period, the driving GCMs also generally project larger mean warming trends than the RCMs, but the differences between the two MME means are much smaller than in the near term (0.3 \(^{\circ }\)C/decade on average).

Larger warming projected by CMIP5 GCMs than by Euro-CORDEX RCMs was also found by Boé et al. (2020) for the mean summer temperature, and by Coppola et al. (2021) using extreme temperature indices. Boé et al. (2020) highlight the lack of evolving aerosols during the twenty-first century in most RCMs as one explanation. In addition, Coppola et al. (2021) point out the differences in projected cloud cover changes between the two ensembles (Bartók et al. 2017) as another plausible explanation, along with the lack of representation of the plant physiological response to CO\(_2\) in RCMs (Schwingshackl et al. 2019), also mentioned by Boé et al. (2020).

Furthermore, the link between the magnitude of the future changes in TX extremes projected by the models and their biases in the present climate was investigated. For each airport, an inter-model Pearson correlation test was performed within each of the Euro-CORDEX and CMIP5 ensembles, between the changes projected under the RCP8.5 scenario for 2021–2050 and 2071–2100, relative to 1961–2005, and the model biases in 1961–2005. This analysis revealed that future projected changes in high-temperature extremes are generally not correlated with the model biases in the present climate. Only in a few cases was this correlation found to be significant (p-value < 0.05), although weak, with correlation values ranging from 0.39 to 0.55 (see Supplementary Figs. 6–9). This contrasts with what was suggested for mean summer temperature by Boberg and Christensen (2012) and for monthly temperatures by Christensen and Boberg (2012).
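The inter-model test can be sketched as follows: for one airport, each model's present-climate bias is correlated with its projected change, and the null hypothesis r = 0 is tested. The sketch assumes the standard t-test for Pearson's r, with \(t = r\sqrt{(n-2)/(1-r^2)}\) and n − 2 degrees of freedom; the exact test used in the study is not specified here. In practice `scipy.stats.pearsonr` returns the same r and two-sided p-value; the Simpson-rule integration below only keeps the example dependency-free. Function names and numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = math.fsum(x) / n, math.fsum(y) / n
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, integrating the density with Simpson's rule."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


def bias_change_test(biases, changes):
    """Correlation between present-climate biases and projected changes across models.

    Returns (r, p) from the t-test for Pearson's r with n - 2 degrees of freedom.
    """
    n = len(biases)
    r = pearson_r(biases, changes)
    if abs(r) >= 1.0:
        return r, 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t_two_sided_p(t, n - 2)


# Hypothetical per-model biases (1961-2005) and projected changes at one airport.
biases = [0.5, -1.2, 2.3, 1.1, -0.4, 3.0, 0.8]
changes = [5.1, 4.6, 6.0, 5.3, 4.9, 5.8, 5.5]
r, p = bias_change_test(biases, changes)
print(f"r = {r:.2f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```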

In summary, mean projected changes by the Euro-CORDEX ensemble (by the CMIP5 ensemble, respectively) for TX extremes during the twenty-first century under the RCP8.5 scenario, relative to the historical period, range between + 1.7 and + 2.2 \(^{\circ }\)C (+ 2.7 and + 3.2 \(^{\circ }\)C) by the near term, and between + 4.9 and + 6.2 \(^{\circ }\)C (+ 7.2 and + 8.5 \(^{\circ }\)C) by the long term, over the main Southwestern European airports. MAD would be the airport most exposed to extreme heat, since this location combines one of the largest projected changes amongst all the case studies with the warmest extreme values observed in the present climate. Changes projected by RCMs are much smaller than the ones projected by CMIP5 GCMs.

4 Conclusions

The aeronautics and aviation industries are vulnerable to global warming, as aircraft performance and operations depend on air temperature. High-resolution RCMs may be an appropriate tool for studying potential future impacts at the airport scale, but RCMs must first be evaluated before such an impact assessment is carried out.

In this study, the performance of Euro-CORDEX RCMs in simulating extreme values and trends of high temperatures at the airport scale was evaluated. Series of daily maximum 2-m temperature at nine of the major Euro-Mediterranean airports were analysed over the past decades. Most of the airports considered as case studies had not been studied before, even though they are major airports located in one of the most important climate change “hot spots”. A set of observations and reanalysis products was first analysed and intercompared in order to estimate how the choice of observational reference can influence the evaluation of the climate models. Trends were computed using quantile regression for a list of percentiles sampling the whole summer TX PDF at each airport. This method captures the evolution of the shape of the PDF, in particular the median and extreme trends, and not only the mean trend given by the more commonly used ordinary least squares (OLS) regression. Positive trends exceeding 0.2 \(^{\circ }\)C/decade over the observational period 1961–2014 were found at all airports for all TX quantiles. This corroborates the existence of a potential risk for airports in the Euro-Mediterranean region due to global warming. In addition, the median and extreme quantiles showed marked differences in warming magnitude, in particular at ORY and MAD airports. While at ORY the higher quantiles warmed more than the lower quantiles and even the median, a contrasting behavior was found at MAD. This result highlights the need to compute temperature trends by quantile, since TX extremes are the most problematic events for aviation, yet this method is not broadly used in the literature.
Although the observational datasets exhibit some differences, their maximum divergence for the TX upper percentiles is 1.5 \(^{\circ }\)C for temperatures exceeding 35 \(^{\circ }\)C, and TX trends are largely coherent amongst the datasets for most of the airports.
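The quantile-trend computation described above can be illustrated with a small, self-contained sketch. It fits TX against time at quantile level tau by minimising the check (pinball) loss, solving the linear program exactly by enumerating lines through pairs of observations; this is valid because some optimal solution interpolates as many observations as there are coefficients, but it costs O(n³) and suits only small samples. A real analysis would use a linear-programming solver such as statsmodels' `QuantReg`. The single time predictor, the synthetic series and the conversion to \(^{\circ }\)C/decade are assumptions, not the authors' code; repeating the fit for several tau values gives trends across the TX PDF.

```python
def check_loss(residuals, tau):
    """Pinball (check) loss minimised by quantile regression at level tau."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def quantile_trend(years, tx, tau):
    """Exact linear quantile regression tx ~ a + b * year by vertex enumeration.

    For intercept + slope, some optimal line passes through two observations,
    so every line through a pair of points with distinct years is tried.
    Cost is O(n^3): suitable for small illustrative samples only.
    """
    best = None
    n = len(years)
    for i in range(n):
        for j in range(i + 1, n):
            if years[i] == years[j]:
                continue
            b = (tx[j] - tx[i]) / (years[j] - years[i])
            a = tx[i] - b * years[i]
            loss = check_loss([y - (a + b * x) for x, y in zip(years, tx)], tau)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    if best is None:
        raise ValueError("need observations in at least two distinct years")
    return best[1], best[2]


# Synthetic series warming by 0.03 deg C/yr, plus one extreme hot day in 1984.
years = list(range(1979, 1989)) + [1984]
tx = [30.0 + 0.03 * (y - 1979) for y in years[:-1]] + [45.0]
a, b = quantile_trend(years, tx, tau=0.5)
print(f"median trend: {10 * b:.2f} deg C/decade")
```

Unlike an OLS fit, the median trend here is not pulled upwards by the single hot outlier.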

Euro-CORDEX RCMs were evaluated by comparing regional climate simulations with an observational reference. First, RCM performance when driven by the ERAI reanalysis was studied using the Evaluation experiment, and a systematic overestimation of TX extremes by RCMs was found. As suggested by Vautard et al. (2013), the overestimation of temperatures by Euro-CORDEX RCMs may be linked to an underestimation of precipitation over these locations: with less soil moisture available for evaporation, the partitioning of surface heat fluxes shifts from latent towards sensible heat. This hypothesis was not further investigated in our study. An additional analysis of the RCM Historical experiment reveals that model performance improves when RCMs are driven by GCMs. However, this improvement is likely the result of error interaction within each GCM-RCM combination, reducing biases but not for the right reasons. In addition, the observed trend is included in the distribution of past trends simulated by the RCMs. Finally, the added value of RCMs with respect to GCMs was explored. The quantile regression method enabled a more detailed evaluation and comparison of RCMs versus GCMs in terms of temperature trends than previously available in the literature. Similar results were found in the two ensembles for the simulated TX trends, whose distributions do not differ substantially from the observed trends. Nonetheless, even though RCMs and GCMs simulate similar trends in the past period, the changes they project for the future climate differ widely, as already highlighted by Boé et al. (2020) and Coppola et al. (2021). Indeed, we also found that CMIP5 GCMs project larger warming than the Euro-CORDEX ensemble, although both ensembles project a robust increase at all airports.
This disparity between GCMs and RCMs is ascribed to differences in the representation of aerosols in RCMs versus GCMs (Boé et al. 2020) and of plant physiological effects (Schwingshackl et al. 2019). Coppola et al. (2021) also mention the different future evolution of cloud cover in GCMs and RCMs (Bartók et al. 2017) as another possible reason. Contrary to what the RCM and GCM ensembles simulate in the present climate, the upper extremes are projected to warm more than the median in the future. We consider it important to investigate the reasons for this in further studies.

The smaller warming in RCMs compared with GCMs is an important issue that should be taken into account in future impact assessments based on RCM projections. On the one hand, our evaluation of RCMs and GCMs in the present climate does not allow us to conclude that one ensemble is better than the other. On the other hand, for future projections, CMIP5 GCMs account for more realistic changes in forcing factors than Euro-CORDEX RCMs (Schwingshackl et al. 2019; Boé et al. 2020). Thus, considering only regional climate projections would lead to an underestimation of the real uncertainty in future climate projections, and the design of adaptation and mitigation policies at regional to local scales should not be based solely on RCM projections. As long as these large discrepancies are not fully explained, we consider that both RCM and GCM future projections should be taken into account for impact assessment and the development of climate change policies at the airport scale. Under the severe RCP8.5 scenario, the mean increase in TX extremes across the airports in the coming decades is projected by RCMs (by GCMs, respectively) to exceed 1.7 \(^{\circ }\)C (2.7 \(^{\circ }\)C), and it could reach 2.2 \(^{\circ }\)C (3.2 \(^{\circ }\)C) at some airports. By the end of the twenty-first century, high temperatures are projected by RCMs (by GCMs, respectively) to be more than 4.9 \(^{\circ }\)C (7.2 \(^{\circ }\)C) warmer than in recent decades at all airports, and up to 6.2 \(^{\circ }\)C (8.5 \(^{\circ }\)C) warmer in some cases. MAD airport would be the location most exposed to extreme heat, as it combines the highest temperature extremes in the present period with a large projected increase in the future climate.

We conclude from this study that there is no generally prevailing added value of the state-of-the-art Euro-CORDEX RCMs in representing TX extremes and their temporal trends at the airport scale, despite their higher spatial resolution. As highlighted by Vautard et al. (2013) and Sørland et al. (2018), physical parametrizations may also play a major role, which encourages the implementation of more realistic parametrizations at the regional scale in RCMs. Urban areas are generally represented in Euro-CORDEX RCMs as rock covers with high roughness length, high albedo and low water storage capacity (Langendijk et al. 2019). Recent studies have achieved better results by implementing more realistic parametrizations and by modelling some mesoscale processes. Daniel et al. (2019) obtained more accurate near-surface temperatures by considering a more detailed description of the materials and topography of urban areas in the Parisian region. This leads to a better representation not only of the local temperature in the city, but also of the interactions between the city and its surroundings, in particular the Urban Heat Island effect. For the study of local high temperatures at the airport scale, characterizing the airport as a city, as in Daniel et al. (2019), and representing the nearby cities that can influence the temperature at the airport could play an important role. Also, Nabat et al. (2020) attained a better representation of near-surface temperatures in the Euro-Mediterranean region by considering aerosol transport and enlarging the set of aerosol types taken into account. This highlights that improvements can be made in high-resolution RCMs, and further effort is needed in this direction. Moreover, concerning precipitation extremes, Caillaud et al. (2021) show the added value of finer-resolution RCMs in representing heavy rainfall events when combined with explicitly resolved deep convection and proper parametrizations of mesoscale processes. This new generation of convection-permitting RCMs could also bring a better representation of the 2-m near-surface air temperature (Lucas-Picher et al. 2021).

The impact of climate change on aviation is an emerging field of research. We consider that our analysis of climate information offers some improvements with respect to the impact studies carried out so far at the airport scale:

  • the intercomparison of different observational datasets before evaluating climate model performances, to further assess future projections,

  • the evaluation of trends using quantile regression,

  • the consideration of both multi-GCM and multi-RCM ensembles of future climate projections over the airports.