1 Introduction

Clear-Air Turbulence (CAT) is an upper-level atmospheric phenomenon that has a hazardous and expensive impact on the aviation sector. Atmospheric turbulence is the leading cause (71%) of all in-flight weather-related injuries (Hu et al. 2021) and annually costs the United States of America US$200 million (Williams 2014). CAT usually develops in cloud-free, stably stratified atmospheric environments (Jaeger and Sprenger 2007) and is undetectable using current on-board radar equipment. CAT develops in regions of shear-driven instability and is often found around upper-level jet streams. Jet streams are narrow bands of intense winds, which have a strong seasonal dependence and owe their intensity to latitudinal horizontal temperature gradients. Due to the steepening of the pole-to-equator temperature gradient in the upper troposphere and lower stratosphere, the wind shear within jet streams is expected to intensify with anthropogenic climate change (Lee et al. 2019). The extra annual cost to the aviation industry of avoiding CAT is £16 million (Search Technology 2000). In a future scenario, with double the pre-industrial CO2 atmospheric concentration, longer transatlantic flights would add an extra 2000 hours of annual travel and an additional 70 million kg of CO2 in annual fuel emissions (Williams 2016). In the same CO2 scenario, Williams and Joshi (2013) projected winter-time moderate-or-greater CAT encounters to increase by 40–170% over the North Atlantic. Building on this previous work, and using the same CO2 concentration scenario, Williams (2017) projected moderate CAT to increase by 94% in wintertime over the North Atlantic basin. Moderate turbulence inflicts vertical accelerations of up to 0.5 g (4.9 \(\mathrm{m\,s^{-2}}\)) on aircraft (Lane et al. 2004). Transatlantic air travel often encounters CAT due to the presence of the mid-latitude eddy-driven jet stream over the North Atlantic.

Williams and Joshi (2013) and Williams (2017) used a Coupled Model Intercomparison Project phase 3 (CMIP3) coupled atmosphere-ocean climate model, with a grid resolution of 2.5° × 2.0°. The World Climate Research Programme (WCRP), through its Working Group on Coupled Modelling (WGCM), first developed the Coupled Model Intercomparison Projects in the 1990s, to evaluate and improve global climate models (GCMs) and to understand future and past climate variability (Bock et al. 2020). The three latest generations are CMIP3, CMIP5, and CMIP6 (Bock et al. 2020). Hu et al. (2021), using CMIP5 GCMs for their control state and a regional climate model, found an increase in CAT severities across the South China Sea, with moderate turbulence increasing by 12% over 50 years. Storer et al. (2017) also found a significant rise in moderate winter-time CAT across the globe. They projected an increase of 143%, 100%, 90%, and 127% at 200 hPa over the North Atlantic, North America, North Pacific, and Europe, respectively. Storer et al. (2017) used a CMIP5 GCM, namely the Met Office Hadley Centre HadGEM2-ES model. Williams and Storer (2022) compared this GCM against ERA-Interim reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF) and concluded that GCMs can successfully diagnose CAT and its response to climate change.

The coarse horizontal resolution of CMIP3 and CMIP5 GCMs led to the development of the High Resolution Model Intercomparison Project (HighResMIP), which is a subsection of CMIP6 (Haarsma et al. 2016). PRIMAVERA (PRocess-based climate sIMulation: AdVances in high-resolution modelling and European Risk Assessments), launched in 2015 to manage and collate GCMs, further aided the development of HighResMIP. HighResMIP was created for the scientific community to evaluate the dependence of a particular phenomenon on model resolution and to reduce the resolution gap between numerical weather prediction (NWP) models and GCMs (Haarsma et al. 2016). Haarsma et al. (2016) suggested that an improvement in the horizontal resolution within a model could lead to a better representation of vertical dynamics in a system. For example, they found that vertically moving small-scale gravity waves were better represented in models with a finer horizontal resolution. The new CMIP generation also uses the Shared Socioeconomic Pathways (SSP) scenarios. These global warming simulations, within CMIP6, were found to better represent northern hemisphere (NH) storm tracks and jet streams than CMIP5’s Representative Concentration Pathway (RCP) 4.5 or CMIP3’s Special Report on Emissions Scenarios (SRES) A1B scenario (Harvey et al. 2020). These different emission scenarios have led to an increase in climate sensitivity for CMIP6 models.

Upper-level atmospheric turbulence spans horizontal scales from planetary down to millimetres, but the scales that typically affect aviation lie between 100 m and 1 km (Storer et al. 2017). Sharman et al. (2014), using extensive pilot reports (PIREPs), suggested a climatology for the turbulent state of the upper atmosphere. They concluded that a median patch of turbulence was approximately 60–70 km wide horizontally and 1 km deep vertically. Several CMIP6 GCMs come close to resolving this median length scale (Sect. 2). CAT has not previously been investigated with GCMs with such a capability. Therefore, this paper explores projected moderate CAT changes in time and with anthropogenic climate change using CMIP6 HighResMIP GCMs. A multi-model approach is applied to understand the dependence of CAT projections on model resolution. This study includes all seasons, over the North Atlantic. The layout of this paper is as follows. Section 2 discusses the approach and the GCM data used. The results are discussed in Sect. 3. Section 4 draws the main conclusions from these findings.

2 Data and methods

The Met Office Hadley Centre HadGEM3-GC3.1 model, the Max-Planck Institute MPI-ESM1-2 model, and the European community Earth system EC-Earth-3 model are the three CMIP6 HighResMIP GCMs chosen for this upper-level turbulence analysis. HadGEM3-GC3.1 is the latest Met Office global climate model configuration and was the UK community’s submission to CMIP6. This configuration has many new improvements and systematic error corrections compared to the corresponding CMIP5 submission (HadGEM3-GC2; Williams et al. 2018). HadGEM3-GC3.1 scored a high 727 out of 1000 on the Watterson et al. (2014) basic overall climate model metric, with HadGEM3-GC3.0 and HadGEM3-GC2 averaging 711 and 686, respectively (Williams et al. 2018). HadGEM3-GC3.1 has three model resolutions available: HH/HM, which has a horizontal grid spacing of 25 km; MM, which has 60 km; and LL, which has 135 km. The number of ensemble members for each resolution is given in Table 1.

Table 1 The horizontal grid spacing and the number of ensemble members for three CMIP6 HighResMIP GCMs: EC-Earth3, HadGEM3-GC3.1, and MPI-ESM1-2

Twenty-seven European research organisations and universities worked together to submit the updated version of EC-Earth to CMIP6. EC-Earth’s new atmosphere and ocean model projections have finer horizontal and vertical resolutions than CMIP5’s EC-Earth-2 GCMs (Haarsma et al. 2020). Their HighResMIP sub-models are EC-Earth-3P (71 km) and EC-Earth-3P-HR (36 km). Due to a problem with greenhouse gas concentrations within ensemble member number 1 of EC-Earth-3P (71 km), two years (2013, 2014) are omitted from the EC-Earth-3P analysis in Sect. 3. The Max-Planck Institute for Meteorology Earth System GCMs are the final models used within this investigation: MPI-ESM1-2-HR (67 km) and MPI-ESM1-2-XR (34 km). MPI-ESM1-2-HR had a computational cost 20 times greater than its older, coarser version (-LR). This has led to improvements in representations of teleconnections and mid-latitude dynamics (Mauritsen et al. 2019). The Max-Planck Institute sub-models only have one ensemble member each (Table 1). The CMIP6 GCMs have three main tiered experiments. Of these, the historically forced coupled climate and ocean experiments (hist-1950) and the future projected forcings of 2015–2050 (highres-future) were used. Future projections (2015–2050) were simulated using the SSP5-8.5 high-end impact scenario (Haarsma et al. 2016). These CMIP6 models were chosen due to their accessibility and the availability of data on the height levels necessary to calculate our CAT indices. These models cannot resolve the thin vertical depth (1 km) of a patch of turbulence in the atmosphere. Therefore, this study focuses on the horizontal grid spacings across the GCMs (Table 1).

Due to the difficulty in resolving sub-grid scale turbulent kinetic energy (TKE), twenty-one diagnostics are used to represent CAT. These indices, first collated by Williams and Joshi (2013), have been used in previous literature to represent turbulent flow and instability, and the use of 21 indices ensures the results are as robust as possible. Using an ensemble of these diagnostics permits diagnostic uncertainty quantification. Each index is listed in the appendix. The assumption that energy cascades from larger scales into smaller eddies is made in many NWP models and is, on average, an appropriate representation of the spatial structure of atmospheric turbulence (Koch et al. 2005); it is therefore applied within this study. Each index represents different mechanisms for turbulent air flow. For example, an anticyclonically curved jet stream, and the CAT associated with it, could be well represented by the vorticity advection index in combination with the magnitude of vertical wind shear. The Brown index, which is a combination of absolute vorticity and flow deformation, does not perform well in a strongly anticyclonic system because it does not efficiently distinguish between anticyclonic and cyclonic flow (Knox 1997). The frontogenesis function, an index related to the amplification of inertia-gravity waves during frontogenesis and their breakdown (Lane et al. 2004), also does not perform well in anticyclonic flow (Knox 1997), but performs well in cyclonically curved jet streams. The commonly used CAT forecasting tool, Graphical Turbulence Guidance (GTG), is made up of several of our indices. Sharman and Pearson (2017) verified, through receiver operating characteristic curve analysis, that these GTG diagnostics effectively diagnose light-or-greater CAT at high altitudes. This paper takes an ensemble across the twenty-one indices to encapsulate a range of CAT-generating situations.

The cube-rooted eddy dissipation rate (EDR) is a common quantitative measure for atmospheric turbulence, as it is directly proportional to the root-mean-square vertical acceleration of an aircraft (MacCready 1964). This paper follows Williams (2017) and assumes that certain EDR values relate to the severity of the turbulence encounter and that the upper percentile ranges in each index correspond to these dissipation rates. Light, light-to-moderate, moderate, moderate-to-severe, and severe turbulent airflow arises in the 97.0–99.1%, 99.1–99.6%, 99.6–99.8%, 99.8–99.9%, and 99.9–100% percentile ranges of each index, respectively. The cube-rooted EDR values (\(\mathrm{m^{2/3}\,s^{-1}}\)) for the previously defined severities, in succession, are 0.1–0.2, 0.2–0.3, 0.3–0.4, 0.4–0.5, and > 0.5 (Williams 2017). This paper mainly focuses on moderate turbulence encounters, as moderate turbulence is more common than severe turbulence and more hazardous than light turbulence. To quantify a change in time and with climate change, a reference period must be defined. Here we use moderate CAT values between the years 1950–1959 inclusive as a reference to quantify the percentage change in CAT. All seasons are included within this control state. The yearly moderate CAT percentage changes over a 100-year period (1950–2050) are discussed in Sect. 3. The North Atlantic basin (50–75° N, 300–350° E) is the region of interest. The analysis is performed at 200 hPa, a typical cruising level of aircraft.
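As an illustration of the percentile-based approach described above, the following minimal sketch derives the moderate band (99.6–99.8th percentiles) from a reference sample of a single index and then expresses a later sample as a percentage change in how often values fall inside that band. The function names, the linear interpolation of percentiles, the band-counting rule, and the synthetic Gaussian data are all assumptions made for illustration; the paper does not specify its implementation.

```python
import random

# Percentile bands for each severity class, as quoted in the text (Williams 2017).
SEVERITY_BANDS = {
    "light": (97.0, 99.1),
    "light-to-moderate": (99.1, 99.6),
    "moderate": (99.6, 99.8),
    "moderate-to-severe": (99.8, 99.9),
    "severe": (99.9, 100.0),
}


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def moderate_thresholds(reference_values):
    """Lower and upper index values bounding the moderate band."""
    ordered = sorted(reference_values)
    lo_q, hi_q = SEVERITY_BANDS["moderate"]
    return percentile(ordered, lo_q), percentile(ordered, hi_q)


def moderate_frequency(values, thresholds):
    """Fraction of values that fall inside the moderate band (assumed rule)."""
    lo, hi = thresholds
    return sum(1 for v in values if lo <= v < hi) / len(values)


def percent_change(freq, ref_freq):
    """Percentage change of a frequency relative to the reference frequency."""
    return 100.0 * (freq - ref_freq) / ref_freq


# Synthetic stand-ins for one index in the reference decade and a later year.
random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(50_000)]
later = [random.gauss(0.1, 1.0) for _ in range(50_000)]

thresholds = moderate_thresholds(reference)
ref_freq = moderate_frequency(reference, thresholds)
later_freq = moderate_frequency(later, thresholds)
print(f"moderate CAT change: {percent_change(later_freq, ref_freq):+.1f}%")
```

By construction the reference frequency is close to 0.2%, the width of the moderate band, so any shift in the upper tail of the later sample shows up directly as a percentage change.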

3 Results

Moderate turbulence occurs within an EDR range of 0.3–0.4 \(\mathrm{m^{2/3}\,s^{-1}}\), or within the 99.6–99.8th percentile range of each index. The middle (green) boxplots, across Fig. 1, show the distribution of the 99.6th percentile value for each index between 1950 and 1959 for all GCMs and their sub-models. Indices are abbreviated, with full names listed in the Fig. 1 caption. When considering one model, its simulated threshold values are different for each severity. However, if considering multiple models, these values could be related to more than one severity. Median lines that reside to the far left of the ranges suggest that one or two outlying sub-models have altered the spread and led to an overlap of box plots within Fig. 1. Interestingly, often if the boxes do not overlap, the whiskers, which represent the remaining points above the 75th and below the 25th percentiles of the distribution, do overlap. This is shown in Fig. 1a (vertical wind shear of the horizontal wind), where boxes do not touch but the whiskers do. Vertical wind shear is a diagnostic linked directly with the generation of CAT and is a component in several of the indices. Generally, within Fig. 1, as the CAT severity increases, the medians of the diagnosed threshold values, and their standard deviations, increase. Interestingly, flow deformation multiplied by vertical temperature gradient has a broad range of light values, compared to the other severities (Fig. 1g). This is also apparent for light threshold ranges for negative Richardson number (Fig. 1e), an interesting finding as severe turbulence arises in the upper 99.9th percentile of an index and its thresholds would be more likely to differ than those of light turbulence. In the rest of this paper, for simplicity, only moderate CAT events are evaluated. Moderate turbulence has a suitable spread of threshold values within Fig. 1 for a robust comparison between GCMs.

Fig. 1

The distribution of 1950–59 EDR-related thresholds displayed for each diagnostic: vertical wind shear of the horizontal wind (W_Shear; a), Brown energy dissipation rate (BED; b), variant one of Ellrod's index (E1; c), variant two of Ellrod's index (E2; d), negative Richardson number (Neg_Ri; e), horizontal temperature gradient (HTG; f), flow deformation times vertical temperature gradient (FD_VTG; g), version 1 of North Carolina State University index (NCSU_V1; h), horizontal wind speed (W_Speed; i), wind speed times directional shear (W_Speed_DS; j), magnitude of horizontal divergence (Hor_Div; k), magnitude of residual of non-linear balance equation (Res_NLBE; l), vertical vorticity squared (Vort_sq; m), relative vorticity advection (Rel_Vort_adv; n), magnitude of absolute negative vorticity advection (Neg_Vort_adv; o), flow deformation (FD; p), flow deformation times wind speed (FD_W_Speed; q), frontogenesis function (FF; r), Brown index (BI; s), potential vorticity (PV; t), Colson-Panofsky index (CP; u). Thresholds were created using all ensemble runs, for all sub-models within the three GCMs listed in Sect. 2. There are five CAT severities displayed: light (blue), light to moderate (orange), moderate (green), moderate to severe (grey) and severe (purple). The 25th and 75th percentiles bound the boxes shown, with the vertical black line through each box representing the median data point. The whiskers extend to show the rest of the distribution.

3.1 CAT variations in time for individual CAT indices

Out of the twenty-one indices, twelve show a definitive overall increase in moderate CAT in time (Fig. 2), despite some interannual variability. This increase is particularly evident in the last few decades of the hundred-year period. These diagnostics are as follows: version 1 of the North Carolina State University index (NCSU; Fig. 2h), wind speed (Fig. 2i), residual of non-linear balance equation (Fig. 2l), vertical vorticity squared (Fig. 2m), relative vorticity advection (Fig. 2n), negative absolute vorticity advection (Fig. 2o), flow deformation (Fig. 2p), flow deformation multiplied by wind speed (Fig. 2q), frontogenesis function (Fig. 2r), Brown's index (Fig. 2s), potential vorticity (PV; Fig. 2t) and Colson-Panofsky index (Fig. 2u). PV has a few anomalously large peaks above + 1000%. This is only for EC-Earth-3P (71 km) models in the years 2012 and 2050. Wind speed multiplied by directional shear (Fig. 2j) and magnitude of horizontal divergence (Fig. 2k) both had 6 out of 7 sub-models projecting an increase in CAT. It can be inferred that, on average across the model resolutions, fourteen indices project an increase in CAT between 1950 and 2050. Interestingly, relative vorticity advection (Fig. 2n), negative absolute vorticity advection (Fig. 2o) and Brown index (Fig. 2s) project their greatest increase in CAT within the coarser sub-models of each GCM, particularly EC-Earth-3P (71 km) and HadGEM3-GC3.1-LL (135 km). Here we use the term “coarser” to refer to models with horizontal grid spacing \(\ge\) 60 km.

Fig. 2

The projected change in moderate CAT encounters from the threshold values for twenty-one indices that represent CAT across time. The percentage change in time for the three chosen CMIP6 GCMs is also shown, with each sub-model included. Findings are averaged across ensemble members (if applicable). The range of percentage change, shown through colour bars within the subplots, differs between subplots and increases from a to u.

Variant one of Ellrod's index (T1; Fig. 2c), which is a combination of flow deformation and wind shear (Ellrod and Knapp 1992), has one model (HadGEM3-GC3.1-MM) projecting no change to CAT. All other sub-models project T1 decreasing in time. Vertical wind shear of the horizontal wind (Fig. 2a), Brown energy dissipation rate (Fig. 2b), variant two of Ellrod's index (Fig. 2d), negative Richardson number (Fig. 2e) and horizontal temperature gradient (Fig. 2f) also project a decline in moderate CAT, but for all GCMs. CAT develops in regions of increased vertical wind shear instability, yet horizontal wind shear ends the period with a decrease in the number of moderate CAT events from the 1950s threshold period (Fig. 2d). Projected wind speed related CAT events at 200 hPa are expected to rise in this period, with maximums of +400% (Fig. 2i). Wind speeds across neighbouring atmospheric heights may increase at a similar or larger rate, decreasing wind shear values. Atrill et al. (2021) also projected a decline in wind shear over polar regions and offered the same explanation, but the annual mean wind shear is increasing over the North Atlantic Ocean in time, as found by Lee et al. (2019). Interestingly, flow deformation multiplied by vertical temperature gradient had three models projecting a rise in moderate CAT and four simulating a decline. The three sub-models projecting an increase are MPI-ESM1-2-HR, EC-Earth-3P and HadGEM3-GC3.1-LL. These are all the coarsest sub-models within their respective GCMs.
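To make the structure of such an index concrete, the sketch below evaluates variant one of Ellrod's index at a single grid point in its commonly quoted form: total flow deformation multiplied by the magnitude of the vertical shear of the horizontal wind. The function names and the sample gradient values are hypothetical, and the exact definitions used in this study are those listed in the appendix, which may differ in detail (for example, in how the vertical derivative is discretised).

```python
import math


def flow_deformation(du_dx, du_dy, dv_dx, dv_dy):
    """Total deformation (s^-1) from stretching and shearing deformation."""
    stretching = du_dx - dv_dy
    shearing = dv_dx + du_dy
    return math.hypot(stretching, shearing)


def vertical_wind_shear(u_upper, v_upper, u_lower, v_lower, dz):
    """Magnitude of the vertical shear of the horizontal wind (s^-1)."""
    return math.hypot(u_upper - u_lower, v_upper - v_lower) / dz


def ellrod_variant_one(deformation, shear):
    """Deformation multiplied by vertical wind shear (s^-2)."""
    return deformation * shear


# Hypothetical values: horizontal wind gradients in s^-1, winds in m s^-1,
# and a 1000 m separation between the two levels.
deformation = flow_deformation(du_dx=3e-5, du_dy=0.0, dv_dx=4e-5, dv_dy=0.0)
shear = vertical_wind_shear(40.0, 0.0, 30.0, 0.0, dz=1000.0)
print(ellrod_variant_one(deformation, shear))
```

Because the index is a product, it can decline even when deformation grows, provided the vertical shear weakens enough; this is consistent with the explanation above that similar wind speed increases at neighbouring levels reduce shear-based diagnostics.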

Due to all indices being closely linked and not physically independent, it is difficult to group them. One could look at the building components and group diagnostics in terms of having a vertical derivative or not. There is an almost even split across the indices, with ten and eleven diagnostics linked with either vertical and horizontal or just horizontal derivatives. Overall, there is a mixture of indices projected to increase and decrease in time in both categories. However, when including wind speed times directional shear (Fig. 2j) and magnitude of horizontal divergence (Fig. 2k) increasing in time, 82% of horizontal-only derivative related diagnostics project increases in CAT. There is a half-half mix within the vertical derivatives category (including T1).

3.2 North Atlantic seasonal CAT projections

Within this section, Fig. 3 displays the yearly percentage changes for each season, using the 1950s decade and all seasons within this period as a reference to investigate the seasonality of CAT. Figure 4 shows the same information, but with the reference period now being for each season separately rather than all seasons, so that all lines oscillate around zero in the 1950s by definition. EC-Earth and HadGEM3-GC3.1 models project an increase in winter moderate CAT in time over the North Atlantic. After the year 2030 in Fig. 3, HadGEM3-GC3.1 sub-models project 100% more moderate DJF CAT events than the reference period. Despite the coarsest HadGEM3-GC3.1 model (-LL: 135 km) suggesting a dip after 2040 to + 50%, there are only marginal differences between the HadGEM3-GC3.1 sub-models for DJF in time. However, within Fig. 4, when compared to its individual 1950s reference, DJF is not increasing at the fastest rate over the period. Due to the independence of the variables and normality within the distribution, Fig. 4 meets the assumptions for linear regression analysis, and slopes (trends in time) are discussed. The slopes representing DJF are second steepest in three models, third steepest in two models and smallest in two models (EC-Earth-3P, HadGEM3-GC3.1-MM; Fig. 4b, f). Wintertime over the North Atlantic is historically a period of increased upper-level instability with the greatest risk of moderate turbulence, due to the strengthening of the meridional temperature gradient. Despite strong model agreement projecting an increase in moderate DJF CAT events, other seasons could increase at a greater rate.
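The difference between the two reference choices can be sketched as follows. With an all-season baseline (as in Fig. 3), a quiet season sits below zero even in the 1950s, whereas with a per-season baseline (as in Fig. 4) each season averages exactly zero over the reference decade. The data structure, function names, and frequencies below are synthetic and purely illustrative.

```python
REFERENCE_YEARS = range(1950, 1960)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def reference_all_seasons(freq_by_season, ref_years=REFERENCE_YEARS):
    """Baseline pooled over every season in the reference decade."""
    return mean(freq_by_season[s][y] for s in freq_by_season for y in ref_years)


def reference_single_season(freq_by_season, season, ref_years=REFERENCE_YEARS):
    """Baseline using only the chosen season in the reference decade."""
    return mean(freq_by_season[season][y] for y in ref_years)


def percent_change_series(series, reference):
    """Yearly percentage change relative to a fixed reference value."""
    return {y: 100.0 * (v - reference) / reference for y, v in series.items()}


# Synthetic moderate-CAT frequencies: winter is busier than spring.
freq = {
    "DJF": {y: 0.0030 + 0.0001 * (y % 3) for y in REFERENCE_YEARS},
    "MAM": {y: 0.0010 + 0.0001 * (y % 2) for y in REFERENCE_YEARS},
}

shared = reference_all_seasons(freq)
own = reference_single_season(freq, "MAM")

mam_vs_all = percent_change_series(freq["MAM"], shared)   # below zero throughout
mam_vs_own = percent_change_series(freq["MAM"], own)      # centred on zero
print(mean(mam_vs_all.values()), mean(mam_vs_own.values()))
```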

Fig. 3

Seasonal percentage changes in CAT, from the 1950 to 1959 threshold period that includes all seasons, against time (averaged across ensemble members). Season defined by a differing colour and line style, with northern hemisphere (NH) Winter (December–January–February; DJF), autumn (September–October–November; SON), summer (June–July–August; JJA) and spring (March–April–May; MAM) coloured navy blue and dashed, pink and solid, light blue and dash dotted, and orange and dotted, respectively. Population standard deviation error shaded

Fig. 4

Seasonal percentage changes in CAT, from the 1950 to 1959 threshold period that includes just one season, against time (averaged across ensemble members). Season defined by a differing colour and line style as done in Fig. 3.

The moderate CAT frequency in summer is also projected to rise at the end of the 100-year period, as shown in HadGEM3-GC3.1 and EC-Earth-3 subplots within Fig. 3. JJA lines reside initially on or below 0% throughout Fig. 3 from 1950 to 2000 but shift positively and reach around + 25% (HadGEM3-GC3.1) and + 35% (EC-Earth-3) by 2020. JJA lines fluctuate at the end of the period at a similar range to DJF/SON at the beginning of the 100 years. This could imply that current and future JJA turbulence encounters have increased to the same original rate that DJF and SON had in the 1950s. Future CAT encounters in summer could become as common as those in the 1950s' most turbulent seasons. This rapid change in JJA is evident in Fig. 4. In fact, the two EC-Earth-3P sub-models within Fig. 4a, b project a significant increase in summer moderate CAT in time over the North Atlantic, at a greater rate than the other seasons within the figure. These EC-Earth JJA rates peak at maximums of + 69.18% (3P-HR, Fig. 3a) in 2035 and + 76.13% (3P, Fig. 4b) in 2048. The coarser HadGEM3-GC3.1 models, MM (60 km) and LL (135 km), also project a significant JJA increase with similar trends of 0.37%/year and 0.38%/year (Fig. 3f, g).

MPI-ESM1-2-XR, -HR and HadGEM3-GC3.1-HM/HH (Fig. 4c–e) project a smaller percentage change in time for JJA moderate turbulent events, with slopes of 0.12 and 3 %/year. The MPI-ESM1-2 results (Figs. 3d, e and 4d, e) generally have no significant trends in time for DJF, MAM, or JJA and have a significant amount of inter-annual variability. This model does, however, project an obvious increase in CAT for northern hemisphere autumn (SON). There is strong multi-model agreement that CAT events in SON will increase at a rapid rate in time. After 2020 and 2040 in Fig. 3, MPI-ESM1-2-HR (67 km) and EC-Earth-3P (71 km) project a sharp increase in the number of SON CAT events, reaching maximums of + 150%. Finer-resolution counterparts of MPI-ESM1-2 and HadGEM3-GC3.1 project an increase, but not at the same rate. Within Fig. 4, the greatest increase is simulated within the Max-Planck coarser sub-model (67 km, Fig. 4d) at + 105.19% in 2027. Averaging across sub-models, MPI-ESM1-2, EC-Earth and HadGEM3-GC3.1 simulate maximums of + 95.72%, + 62.69% and + 71.96% near the end of the 100-year period. The Max-Planck and Met Office Hadley Centre sub-models project the greatest rate of increase in CAT events to occur within NH autumn.

Spring over the North Atlantic is a season with characteristically the fewest moderate CAT events. When including all seasons in the 1950–59 reference, the MAM lines reside below zero (Fig. 3). This suggests that the number of DJF and SON projected moderate CAT events in the 1950s is greater than the number of MAM CAT events throughout the 100-year period. However, there is an increase displayed in the HadGEM3-GC3.1 MAM lines within Fig. 4, with one sub model projecting a greater increase than DJF at 0.35%/year (Fig. 4f). Despite this increase, MAM is quite variable across GCMs and sub-models, for example MPI-ESM1-2-HR projects a negative slope in time (Fig. 4d). Therefore, further seasonal analysis is needed.

3.3 Moderate CAT variations with TAS

Thus far Sect. 3 has concentrated on the change in CAT with time. To isolate the relationship between global surface warming and upper-atmospheric CAT over the North Atlantic, the warming trend in each model is considered. The mean near-surface (2 m height) global temperatures in time are displayed in Fig. 5. As anticipated, these temperatures increase with time for all GCMs across all seasons. The SSP high-end projections modelled within the CMIP6 GCMs relate this warming to anthropogenic sources. All ensemble members, if applicable, have been included in Fig. 5. MAM and SON have a very similar warming trend in time, across all climate models. This relates to the equal and opposite seasonal differences between the northern and southern hemispheres. One may have assumed DJF and JJA to have the same global average surface air temperature, like MAM and SON, but they differ due to differences in land mass between the hemispheres and the larger heat capacity of the oceans relative to land.
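A global mean on a regular latitude-longitude grid is usually weighted by the cosine of latitude, because grid cells shrink towards the poles. The paper does not state how its global mean was computed, so the following is only a sketch under that assumption; the function name and the two-row example field are hypothetical.

```python
import math


def global_mean(field, lats_deg):
    """Area-weighted mean of field[i][j] (latitude i, longitude j).

    Assumes a regular latitude-longitude grid, so each row is weighted
    by the cosine of its latitude.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * sum(row)
        weight_total += weight * len(row)
    return weighted_sum / weight_total


# Hypothetical two-latitude field in kelvin: warm at the equator, cold at 60° N.
tas = [[300.0, 300.0], [200.0, 200.0]]
print(global_mean(tas, lats_deg=[0.0, 60.0]))
```

In this toy case the unweighted mean would be 250 K, whereas the weighted mean is pulled towards the equatorial row because it represents twice the area of the row at 60° N.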

Fig. 5

The global mean seasonal near-surface temperatures (TAS) projected over time for each chosen CMIP6 global climate model. All sub-models and ensemble members are included. Global mean TAS for HadGEM3-GC3.1 DJF, MAM, JJA and SON are displayed in subplots a, d, g and j. For EC-Earth the same respective seasons are shown in subplots b, e, h, and k, and for MPI-ESM1-2 in subplots c, f, i and l.

HadGEM3-GC3.1 has the largest sample size available, with 11 ensemble members over its sub-models against only 6 members from the other GCMs. HadGEM3-GC3.1 models have the coldest start to the period, relative to the other GCMs. This is most apparent for ensemble member number 4 within the HadGEM3-GC3.1-LL model runs, with a large change in DJF near-surface air temperature (TAS) of 4.05 °C between 1950 and 2050. MPI-ESM1-2 and EC-Earth-3 models project a similar increase in global temperature. Within MPI-ESM1-2, the finer -XR sub-model is overall colder than the -HR model, particularly at the beginning of the period. This new generation of CMIP models has a higher climate sensitivity than the previous CMIP5 generation (Harvey et al. 2020). The spread of future temperatures could be linked to this heightened sensitivity. Sensitivity is defined here as an outcome that arises from the physical and dynamical chaos within climate models, and something that is not directly developed by a climate modeller. This may be one explanation as to why different ensemble runs, within the same sub-model, reach temperatures at different stages in time within Fig. 5. The change in TAS for each year from 1950 is analysed in later figures, to better compare the warming trend with CAT in each sub-model and to consider the inter-annual TAS variability.

Figure 6 outlines the relationships between moderate CAT percentage change (Fig. 4) and the change in global-mean seasonal TAS. The regression slopes are used to quantify the trend in CAT with TAS. There is good internal agreement across the HadGEM3-GC3.1 (Fig. 6e–g) models for DJF, with projected CAT increasing by 8.67–9.74%/°C. The Met Office Hadley Centre mid-resolution (60 km) sub-model (HadGEM3-GC3.1-MM) projects NH spring values to increase by 11.69%/°C. This is the second fastest rate after NH autumn within Fig. 6f, placing DJF with the lowest increase. Despite EC-Earth-3P (71 km) projecting a similar increase of 11.89%/°C for MAM, the remaining sub-models within MPI-ESM1-2 and EC-Earth project an insignificant increase with TAS.
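A slope in %/°C with a 95% confidence interval, of the kind quoted here, can be obtained from an ordinary least-squares fit of CAT percentage change against the change in TAS. The sketch below is a minimal stdlib version under stated assumptions: the function name is hypothetical, the data are synthetic, and the interval uses a normal approximation (about 1.96 standard errors) rather than the Student's t quantile, which is slightly wider for small samples. The paper does not state which software or interval formula was used.

```python
import math
from statistics import NormalDist


def slope_with_ci(x, y, level=0.95):
    """OLS slope of y on x and the half-width of its confidence interval.

    Uses a normal approximation for the critical value (an assumption).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    critical = NormalDist().inv_cdf(0.5 + level / 2.0)
    return slope, critical * standard_error


# Synthetic example: warming in °C against CAT change in %, with alternating noise.
delta_tas = [0.03 * i for i in range(100)]
cat_change = [9.0 * t + (5.0 if i % 2 else -5.0) for i, t in enumerate(delta_tas)]

slope, half_width = slope_with_ci(delta_tas, cat_change)
print(f"{slope:.2f} ± {half_width:.2f} %/°C")
```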

Fig. 6

The moderate CAT percentage change scattered against the change in global mean seasonal near-surface temperature (TAS). The shade of each point relates to the year of the moderate CAT events. Seasons are distinguished by colour and line style, with DJF, SON, JJA, and MAM coloured blue and dashed, pink and dotted, light blue and solid, and orange and dash-dotted, respectively. The regression line slopes, with 95% confidence intervals, are averaged over ensemble members.

There is an apparent model grid dependence for SON projections. Across all GCMs, the coarser sub-model versions project the maximum SON percentage changes, by a difference of ~ 2%/°C for HadGEM3-GC3.1 and EC-Earth-3 models and by ~ 6%/°C between MPI-ESM1-2 rates. EC-Earth-3P (71 km) and HadGEM3-GC3.1-LL (135 km) simulate increases of 14.53%/°C and 14.74%/°C, respectively. MPI-ESM1-2-HR simulated the largest projected SON rate of 19.14%/°C. However, confidence in this trend is tempered by a large interval of ± 7.97%/°C. MPI-ESM1-2’s results are highly uncertain, with relatively large confidence interval ranges. This could, however, be related to the small ensemble size of MPI-ESM1-2. NH summer projections vary across Fig. 6. HadGEM3-GC3.1 suggests an increase at a rate similar to or just below that of SON. MPI-ESM1-2 projects an increase in JJA of 6.75%/°C on average, and the EC-Earth-3P sub-models simulate large JJA trends of + 20.64%/°C and + 18.75%/°C for EC-Earth-3P-HR (36 km) and -3P (71 km), respectively. EC-Earth-3P-HR projects a larger increase in summer-time moderate CAT per degree than for any other season or model. This suggests a rapid increase in the number of CAT events for summer, a season that has historically been comparatively less turbulent.

3.4 Averaged trends with anthropogenic climate changes

To reduce uncertainty that arose from averaging over ensemble members, Fig. 7 displays Fig. 6’s regression line slopes, with confidence intervals for all ensemble members. Within Fig. 7, the boxes are colour coded in terms of resolution range. Autumn across the North Atlantic is projected to have a large increase per degree of surface warming. SON’s slopes (Fig. 7c) range from 10.78 to 15.24, 13.21 to 19.14, and 10.45 to 17.37%/°C for the EC-Earth-3P, MPI-ESM1-2, and HadGEM3-GC3.1 ensembles, respectively. Each square displayed within Fig. 7c, representing an individual ensemble member's slope, lies close to the median line. This is also visible for DJF trends (Fig. 7a). DJF's slopes show little multi-model disagreement, with HadGEM3-GC3.1, MPI-ESM1-2, and EC-Earth-3 on average increasing by 9.10 ± 1.83, 9.73 ± 6.26, and 7.56 ± 2.86%/°C, respectively.

Fig. 7

Regression line slopes showing the trend between moderate CAT percentage changes over the North Atlantic, against the global mean seasonal near-surface temperature for all ensemble members within HadGEM3-GC3.1, EC-Earth-3P and MPI-ESM1-2. The colour of the square represents the range of which the sub model’s horizontal resolution resides within, with 25–36 km shaded orange, 60–71 km shaded blue, and 135 km shaded green. The navy dashed line is the average (median) across all squares for each season, with subplot a, b, c and d displaying slopes from DJF, JJA, SON and MAM

This strong model agreement is not apparent within NH summer projections, with JJA on average increasing by 12.41 ± 1.39%/°C, 6.71 ± 4.76%/°C and 21.18 ± 3.02%/°C for the HadGEM3-GC3.1, MPI-ESM1-2 and EC-Earth-3 models, respectively. This wide variability is evident across GCMs but not within them, with the ensemble members of each model usually lying in similar ranges. For example, all EC-Earth-3 ensemble member JJA results lie above the median line, ranging from 15.70 to 22.53%/°C. Interestingly, the number of moderate CAT events in NH summer is projected to increase at a similar (or greater) rate than in NH autumn. NH spring over the North Atlantic has a wide spread of moderate CAT projections. On average, EC-Earth-3 and HadGEM3-GC3.1 MAM slopes are increasing at a similar rate, with median values of 9.64 ± 4.75 and 8.89 ± 3.31%/°C, but there is a considerable spread of 0.07 to 13.21 and 2.37 to 23.90%/°C, respectively. HadGEM3-GC3.1-MM ensemble 1 projects the greatest increase in moderate CAT per degree for NH spring (23.90%/°C). However, this comes with a high slope uncertainty of ± 8.87%/°C. Ensemble 2 of this Met Office Hadley Centre model projects an increase of only 5.77 ± 7.40%/°C. Adding to the large uncertainties associated with NH spring, MPI-ESM1-2 has the smallest average increase of + 2.50 ± 7.1%/°C.

There is at least one ensemble member for each of the GCMs that projects a minimal change per degree in spring. This adds weight to a future MAM scenario with negligible moderate CAT increases. However, there are many projections with increases similar to DJF. If one takes the median across all ensemble members and GCMs, MAM is projected to increase by 8.81 ± 1.90%/°C. Applying the same median to the other seasons gives an overall increase in moderate CAT events of 8.94 ± 1.54%/°C, 13.82 ± 1.27%/°C, and 13.69 ± 1.28%/°C for DJF, SON, and JJA, respectively. This average is not weighted by model: HadGEM3-GC3.1 makes up 11 of the 17 sampled slopes, and so has a larger influence on averages across GCMs.
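The effect of one model contributing most of the sampled slopes can be seen in a small sketch. It compares a median pooled over all ensemble members, as described above, with a median of per-model medians, which gives each GCM equal weight. The second estimator is an alternative introduced here for illustration and is not used in the paper. The model labels and slope values are hypothetical placeholders; only the 11-of-17 split comes from the text, and the 3 + 3 split of the remaining members is assumed.

```python
from statistics import median


def pooled_median(slopes_by_model):
    """Median over every ensemble member, ignoring which model it came from."""
    return median(s for slopes in slopes_by_model.values() for s in slopes)


def model_balanced_median(slopes_by_model):
    """Median of per-model medians, so each model counts once."""
    return median([median(slopes) for slopes in slopes_by_model.values()])


# Hypothetical slopes in %/°C (NOT values from the paper).
slopes_by_model = {
    "model_a": [float(v) for v in range(10, 21)],  # 11 members
    "model_b": [2.0, 3.0, 4.0],                    # 3 members (assumed split)
    "model_c": [5.0, 6.0, 7.0],                    # 3 members (assumed split)
}

print("pooled median:        ", pooled_median(slopes_by_model))
print("model-balanced median:", model_balanced_median(slopes_by_model))
```

With these placeholder values the pooled median sits inside the range of the 11-member model, while the balanced median does not, which is the kind of influence the unweighted average is subject to.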

4 Summary and discussion

This paper has explored the projected changes in moderate CAT with anthropogenic climate change using a sample of CMIP6 HighResMIP GCMs. Three models are used: the Met Office Hadley Centre model HadGEM3-GC3.1, the Max Planck Institute model MPI-ESM1-2, and the EC-Earth-3 model, developed by a collaboration of European universities and organisations. All model simulations cover the period 1950–2050 and follow the CMIP6 HighResMIP protocol. There are several different resolutions across these GCMs, and so a multi-model approach was applied to understand the dependency of CAT projections on model resolution. CAT changes in the North Atlantic in northern hemisphere winter (DJF), summer (JJA), autumn (SON), and spring (MAM) are individually analysed in Figs. 3, 4, 6 and 7. Twenty-one indices are used to represent the different dynamical scenarios for CAT production.

There were variations in the results for the different indices and climate models. Most diagnostics displayed an increase in moderate CAT from 1950 to 2050. The greatest projected increase arose in the last 30–40 years of the period of interest. For three indices, the sub-models with coarser resolutions (grid lengths ≥ 60 km) simulated the maximum increases in moderate CAT.

The HadGEM3-GC3.1 and EC-Earth-3 models projected an increase in wintertime CAT across the 100-year period, with large interannual fluctuations after 2030. These models also simulated an increase in CAT with time for all seasons. In contrast, the MPI-ESM1-2 changes showed no significant trends in time for DJF, MAM, or JJA. However, there was strong multi-model agreement that CAT events in SON will increase at a faster rate in time than in other seasons. The greatest seasonal increase for SON was projected in MPI-ESM1-2-HR, by + 105% in late 2020 (Fig. 4d). The number of CAT events in summer is also projected to rise at the end of the 100-year period, as shown in the HadGEM3-GC3.1 and EC-Earth-3 subplots within Figs. 3 and 4. This implies that current and future JJA turbulence encounters may have increased to the rate that DJF and SON originally had in the 1950s.

Overall, no dependence on model horizontal resolution was found after averaging across the indices for DJF, JJA and MAM. Global seasonal-mean near-surface temperature was used as the metric of global warming in each model, allowing projected CAT changes to be expressed per degree of global warming. Within the SON results, and for individual diagnostics, coarser sub-models generally produce greater increases in time and with near-surface warming. We speculate that the move to high-resolution climate models may result in projections of increased SON CAT being revised downward slightly. Autumn over the North Atlantic had good multi-model agreement on future CAT projections, with all GCMs agreeing on an increase of between around 10 and 20%/°C, with a median percentage increase of 13.82 ± 1.27%/°C. High multi-model agreement also arose for DJF CAT changes. Wintertime CAT on average will increase by 8.94 ± 1.54%/°C. This differs from the JJA slopes, which had a considerable spread across the GCMs. NH summer CAT projections over the North Atlantic on average suggest an increase of 13.69 ± 1.28%/°C, similar to the NH autumn average. However, for the individual GCMs, the number of JJA CAT encounters is projected to increase on average by 21.18 ± 3.02%/°C, 12.41 ± 1.39%/°C and 6.71 ± 4.76%/°C for EC-Earth-3, HadGEM3-GC3.1 and MPI-ESM1-2, respectively. Despite CAT previously not being commonly encountered in NH spring, both the HadGEM3-GC3.1 and EC-Earth-3 models projected a significant increase in time and with global warming. On average, across all models, the number of moderate CAT events in MAM increased by 8.81 ± 1.90% per degree of global seasonal near-surface warming. This rate is similar to that found for the most turbulent season, DJF.

In summary, this multi-model analysis found that moderate CAT will increase in future over the North Atlantic, for all seasons. This will have consequences for the aviation industry, with more flights disrupted and increased damage and costs. Future work should quantify how this overall increase will impact aircraft directly, and whether the increase found over the North Atlantic in this study is linked to a higher density of CAT outbreaks in particular regions.