1 Introduction

1.1 Winter windstorms and seasonal forecasting

Severe winter windstorms are amongst the most damaging natural hazards in Europe and have a large impact on society. A potential increase in these extreme events over densely populated regions of Europe can result in tragic loss. Consequently, this leads to a crucial interest from society (e.g., government, contingency planners), industry (e.g., the insurance sector) and scientists in order to create and to benefit from better prediction capabilities. Previous studies about mid-latitude cyclones and storminess are numerous and reviews of past and potential future developments can be found e.g., in Ulbrich et al. (2009) and Feser et al. (2015) or more recently in Catto et al. (2019). Multiple studies were conducted to quantify objectively extra-tropical cyclones (e.g., Murray and Simmonds 1991; Roberts et al. 2014; Trigo 2006). A more detailed overview of cyclone tracking methods and their performances can be found in Neu et al. (2013). Nevertheless, to allow for an analysis of the extreme, most damaging events at the tail of the intensity distribution, those objective trackers may not be focussed enough on potential impact to society. Consequently, Leckebusch et al. (2008b) introduced an objective percentile-based, near-surface wind tracking to identify those windstorms which have high impact potential. This method is widely used and has successfully been applied to different data sets: synoptic forecasts (e.g., Ng and Leckebusch 2021), seasonal-to-decadal data (e.g. Befort et al. 2019; Kruschke et al. 2016; Schuster et al. 2019), as well as for climate time scales (e.g., Nissen et al. 2013) and in relation to potential damages (Walz and Leckebusch 2019).

Studies of seasonal predictability have already shown skill for mean seasonal conditions in Europe, e.g., for temperature or precipitation (cf. e.g., Fereday et al. 2012; Folland et al. 2012; Kim et al. 2012; Palmer et al. 2004), and the topic has gained increasing interest (Domeisen 2020; Simon et al. 2007). Recent studies focussing on the seasonal predictability of European average temperature and precipitation have shown these forecasts to be usable, but with limited skill. Yet the forecast skill for these parameters is higher when circulation patterns are used as predictors (Athanasiadis et al. 2017; Baker et al. 2018; Scaife et al. 2014). Seasonal forecast skill can be further improved using larger ensemble sizes (e.g., Mishra et al. 2019). In winter, the extratropical skill for temperature and precipitation appears to originate in the tropics (Scaife et al. 2017), where precipitation shows strong seasonal forecast skill, especially in the East Pacific (Scaife et al. 2019b). For other seasons, skill is generally more limited, but some skill has been found for summer rainfall and circulation (Beverley et al. 2019; Dunstone et al. 2018).

One of the first studies to investigate systematically the seasonal forecast skill of severe winter windstorms was published by Renggli et al. (2011). Their study revealed overall only marginally usable skill, but better skill for high and low frequency storm seasons than for neutral seasons. Recent studies investigated newer versions of operational seasonal forecast suites and now show significant positive forecast skill for storm frequencies over North-West Europe and the North-East Atlantic region (e.g., Befort et al. 2019; Hansen et al. 2019). Befort et al. (2019) investigated the seasonal forecasts of the UK Met Office (Global Seasonal Forecast System Version 5, GloSea5) and two forecast systems from the European Centre for Medium-Range Weather Forecasts (ECMWF; System 3 and 4). They found positive skill for windstorm frequency, especially over Western Europe, with a high influence of the North Atlantic Oscillation (NAO). Hansen et al. (2019) revealed a strong connection between storm count predictions and stratospheric sudden warmings. Other storminess measures (not event based/tracked) show strong and significant skill on a seasonal scale in GloSea5 (Scaife et al. 2014). Scaife et al. (2014) also investigated the NAO as the main driver of European storminess and showed that signal-to-noise ratios were anomalously low and that the model predicted the real world better than its own members. Subsequent studies verified this for other timescales, and this ‘signal-to-noise paradox’ appears to be widespread in long-range predictions when skill is high enough (Eade et al. 2014; Dunstone et al. 2016; Scaife and Smith 2018; Weisheimer et al. 2019). Here, the signal-to-noise paradox is investigated for storm forecasts for the first time.

1.2 Forecast skill of large-scale patterns important for extra-tropical cyclones and European storminess

The NAO is the predominant variability pattern for European weather and climate, including extra-tropical cyclones and severe winter storms (Ambaum et al. 2001; Hurrell et al. 2001, 2003; Leckebusch et al. 2008a). Studies of the seasonal forecast skill of the NAO now show agreement that there is predictability of the NAO index at lead times of 2–4 months, calculated with different methods and in various data sets (Athanasiadis et al. 2017; Baker et al. 2018; Domeisen et al. 2018; Dunstone et al. 2016; Hansen et al. 2019; Scaife et al. 2014; Weisheimer et al. 2017, 2019).

Besides the NAO, other dominant large-scale patterns are of interest for European weather (Zubiate et al. 2017), for example, the Scandinavian Pattern (SCA: Barnston and Livezey 1987; Bueh and Nakamura 2007) and the East-Atlantic Pattern (EA: Nesterov 2009; Wallace and Gutzler 1981).

The influence of NAO phases on the development of cyclones and extreme windstorms was investigated by Pinto et al. (2008) and Donat et al. (2010). They conclude that a strong positive NAO phase leads to a higher number of extreme cyclones. The connection between windstorms and large-scale patterns was further investigated, e.g., by Walz et al. (2018a). They developed a so-called “map of drivers”, which uses a multiple linear regression model to reveal the spatial distribution of the most dominant large-scale pattern influencing the inter-annual variability of windstorm frequency. Related to this, a later publication from these authors also showed a link between the NAO and potential windstorm losses (Walz and Leckebusch 2019). Befort et al. (2019) and Scaife et al. (2014) used the NAO as the main driver over Europe for an indirect approach to statistically forecast windstorm numbers or general storminess, respectively, based solely on the model-forecast NAO. This indirect regression-based approach results in significantly positive forecast skill of windstorm frequency over the British Isles but with a lack of skill at around 45° N. This is in line with the “map of drivers” (Walz et al. 2018a), where for this region the EA index (rather than the NAO) is identified as the leading influence factor. The dominant large-scale patterns influencing storm frequency over the North Atlantic/European region were identified as the NAO, SCA and EA. For regions with high damage potential, Walz et al. (2018a) successfully modelled the interannual variability of severe windstorms based on these three steering factors.

The present study will expand on the investigations from Befort et al. (2019) and Walz et al. (2018a) in two major aspects. For the first time, the forecast skill of the seasonal windstorm intensity over Europe will be analysed. Secondly, the impact of important large-scale patterns (NAO, SCA, and EA) for skilful seasonal windstorm predictions is analysed more systematically and compared to the skill of forecasts based on explicitly modelled storms. The paper is structured as follows: Data description is given in Sect. 2. The methodology is described in detail in Sect. 3. In Sect. 4, results for the direct and indirect approach and the signal-to-noise paradox are presented. The paper finishes with a discussion and conclusion in Sect. 5.

2 Data

The UK Met Office “Global Seasonal Forecast System Version 5” (GloSea5, MacLachlan et al. 2015) is used to quantify the seasonal prediction skill of objectively identified and tracked windstorms. Previous studies already showed positive skill in GloSea5 for predicting various parameters (Befort et al. 2019; Scaife et al. 2014, 2019a). The GloSea5 hindcast data are available from the Copernicus Climate Change Service (C3S) for 1993–2016 and in this study we used 6 hourly resolution for 10 m wind speed and mean sea level pressure (MSLP). The spatial resolution of GloSea5 is 0.83° in longitude and 0.55° in latitude. The hindcast has four different initialisation dates per month (1st, 9th, 17th and 25th of each month), runs over 7 months with 7 members for each initialisation date. The members for the same date differ only by use of a stochastic physics scheme (MacLachlan et al. 2015).

This study investigates the main winter storm season from December to February (DJF). Consequently, the initialisations around the 1st of November (i.e., the 25th October, 1st November and 9th November) were used. 3 system versions are available for GloSea5 which refer to small model updates. This results in an ensemble of 63 members (3 initialisations × 3 system versions × 7 members) for GloSea5 in this study. As observational reference, the ECMWF reanalysis ERA5 data set (Hersbach et al. 2019) is used for the same years (1993–2016) as available for the hindcast data set. The 10 m wind speed is used for windstorm identification and tracking and the MSLP for calculating large-scale patterns (for details cf. Sect. 3) both in 6 hourly steps with the 0.25° spatial resolution of ERA5.

The storm frequency data for both data sets are analysed on a 2.5° × 2.5° grid, as identified tracks are counted from the storm tracking outputs. This extrapolation is part of the frequency algorithm described in Sect. 3 and is retained for a better comparison with previous windstorm frequency studies. It leads to a better reflection of the true available track information, as a more finely defined grid would contain data gaps. For storm intensities, the model resolution is unaffected. Hence, for a grid-based comparison, ERA5 data were re-gridded onto the GloSea5 resolution by bilinear interpolation.

3 Methods

An overview of the investigation approach is shown in Fig. 1. For analysing predictive skill of severe windstorm events, all individual windstorm events in all ensemble members are identified, tracked, and then analysed with respect to seasonal frequency and the two intensity measures. In order to quantify the individual role of large-scale variability patterns, an indirect (regression-based) approach uses the large-scale patterns to build a statistical link (regression) to storm counts and intensity, which is then used to statistically model storm parameters. These regressions are built in two different ways: (a) ERA5-based with the observed large-scale patterns and storm parameters and (b) GloSea5-based with forecast large-scale patterns and storm parameters. After creating these links, they are used with GloSea5 large-scale patterns to predict windstorms within the indirect approach. Consequently, in both indirect approach settings, the predictive skill comes solely from the predictive skill of the large-scale patterns of the forecast ensemble. In (a) a real-world link to local storm parameters is utilised and in (b) a model-world link between large-scale patterns and storm parameters is used. The windstorm prediction skill from these statistical regressions is validated with Kendall correlation as for the direct approach. Kendall correlations and receiver operating characteristic (ROC) are used to measure skill (Wilks 2011, please see the Supplementary Appendix for details).

Fig. 1 Workflow of methodology

3.1 Severe windstorm tracking

Severe windstorm events are objectively identified and tracked following the approach by Leckebusch et al. (2008b). Windstorms are defined as clustered (minimum 130,000 km²) exceedances of the local 98th percentile of the 10 m wind speed per grid cell. This local 98th percentile was identified as a suitable threshold for potential damages from extra-tropical cyclones (Klawa and Ulbrich 2003). A storm track output was created if this exceedance lasted for at least 8 time steps (~ 42 h, Angus and Leckebusch 2020; Priestley et al. 2020). Three core parameters are analysed for windstorm activity with these tracking outputs: the frequency, the season-integrated intensity and the season-averaged event intensity.
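The threshold and duration criteria can be illustrated with a minimal Python sketch for a single grid cell. It is a simplification and not the tracking algorithm of Leckebusch et al. (2008b): the spatial clustering and the minimum-area test are omitted, and the function names `percentile` and `exceedance_runs` as well as the linear-interpolation percentile are assumptions made for illustration.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a list of numbers."""
    s = sorted(values)
    if not s:
        raise ValueError("empty sample")
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def exceedance_runs(wind, threshold, min_steps=8):
    """Return (start, end) index pairs (end exclusive) of runs in which
    wind > threshold for at least min_steps consecutive time steps."""
    runs, start = [], None
    for i, v in enumerate(wind):
        if v > threshold and start is None:
            start = i                      # a new exceedance run begins
        elif v <= threshold and start is not None:
            if i - start >= min_steps:     # keep only long-lived runs
                runs.append((start, i))
            start = None
    if start is not None and len(wind) - start >= min_steps:
        runs.append((start, len(wind)))    # run lasting to the end of the series
    return runs
```

In such a sketch the threshold would be the local 98th percentile of the 6-hourly 10 m wind speed of the grid cell, and `min_steps=8` mirrors the minimum lifetime of 8 time steps.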

Windstorm frequency was established by Leckebusch et al. (2008b) as track densities (following Leckebusch and Ulbrich 2004) where the latter is further defined and described by Kruschke (2015). The storm frequency calculation uses a specific radius to be representative for the affected area for counting events. Similar to Befort et al. (2019), in this study a radius of 700 km is selected in line with the principal area of influence of severe extra-tropical cyclones. A validation study with different radii can be found in Degenhardt et al. (2020). The storm counts per grid cell are accumulated per season.
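The radius-based counting can be sketched as follows. This is only an illustration of the idea of a track density: the haversine distance, the Earth radius of 6371 km, the rule that each track is counted at most once per grid cell, and the function names are assumptions, not details taken from the cited algorithm.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def storm_count(cell_lat, cell_lon, tracks, radius_km=700.0):
    """Count tracks with at least one position within radius_km of the cell.
    Each track is a list of (lat, lon) positions and is counted at most once."""
    return sum(
        any(great_circle_km(cell_lat, cell_lon, la, lo) <= radius_km for la, lo in track)
        for track in tracks
    )
```

Applying `storm_count` to every cell of the 2.5° × 2.5° grid for all tracks of one winter would give a seasonal frequency field of the kind analysed here.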

Windstorm intensity per season is quantified in two ways, based on the respective storm footprints (all grid cells the windstorm is affecting) and the related objective intensity measure, the Storm Severity Index (SSI, introduced by Leckebusch et al. 2008b). Firstly, the so-called Event SSI (ESSI) is used as the total sum of all grid cell SSI values of an individual storm track over all time steps. This ESSI is consequently attributed to all grid cells of the storm footprint. To avoid scattered extreme values and to gain a homogeneous severity area, the individual grid cells are smoothed with their eight surrounding grid cells. Grid-cell-based intensities per individual storm are accumulated over the whole winter season to result in a seasonal intensity measure. The seasonal accumulated ESSI (ESSIa) is one intensity measure in this study and represents the total severity of the season. Secondly, as a further intensity measure, the ESSIa has been standardised by the number of storms per grid cell. Hence, this measure is not a season integral but a storm-count-normalised ESSI (ESSIs) per season, and thus represents the average severity of a storm in that season.
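The relation between the two seasonal intensity measures can be written down in a few lines of Python. The sketch assumes that the smoothing is a plain mean over a cell and its available neighbours and that ESSIs is set to zero for a cell without storms; both choices, like the function names, are illustrative assumptions.

```python
def smooth_3x3(field):
    """Mean of each cell and its (up to) eight neighbours.
    Assumption: cells at the edge use only the neighbours that exist."""
    nrow, ncol = len(field), len(field[0])
    out = [[0.0] * ncol for _ in range(nrow)]
    for i in range(nrow):
        for j in range(ncol):
            vals = [field[a][b]
                    for a in range(max(0, i - 1), min(nrow, i + 2))
                    for b in range(max(0, j - 1), min(ncol, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def seasonal_intensity(essi_per_storm):
    """Return (ESSIa, ESSIs) for one grid cell and one season.
    essi_per_storm holds the ESSI of every storm whose footprint covers the cell."""
    n = len(essi_per_storm)
    essi_a = sum(essi_per_storm)           # season-accumulated severity
    essi_s = essi_a / n if n else 0.0      # average severity per storm (0 if no storm)
    return essi_a, essi_s
```

For example, three storms with ESSI values 2, 4 and 6 at a grid cell give ESSIa = 12 and ESSIs = 4.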

3.2 Calculation of large-scale patterns

Besides the validation of the principal forecast skill of winter windstorms derived from the model, our study utilizes the three dominant large-scale patterns over Europe as predictors to statistically calculate storm parameters. Our definition and calculation of large-scale patterns follow the NOAA (National Oceanic and Atmospheric Administration) definitions (Barnston and Livezey 1987), also used e.g., in Walz et al. (2018b). Large-scale patterns are calculated as monthly means with an EOF analysis of standardized monthly mean MSLP anomalies for the North Atlantic/European region (100° W–40° E, 30°–75° N). The first EOF is identified as the North Atlantic Oscillation (NAO), the second as the Scandinavian Pattern (SCA) and the third as the East-Atlantic Pattern (EA). These patterns for ERA5 and GloSea5 can be found in the Supplementary Appendix (Fig. A1). To enable a better comparison between the reanalysis data set and the forecast model, the principal component analysis (PCA) calculation for both data sets is based on the monthly EOF patterns from ERA5. This projection is justified because the EOF patterns for the three selected large-scale patterns are spatially well represented in the GloSea5 hindcast data set (cf. Fig. A1): the climatological EOF patterns are similar in ERA5 and GloSea5. The resulting indices are used as seasonal means (DJF) to compare with the seasonal windstorm parameters, i.e. seasonal frequency and intensity.
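Projecting a data set onto fixed EOF patterns amounts to a scalar product between each anomaly field and the pattern. The following Python sketch shows this for flattened fields; it assumes that the EOF pattern is already available (the EOF analysis itself is not shown), ignores any area weighting, and uses hypothetical function names.

```python
def project_onto_pattern(anomaly_fields, pattern):
    """Project each flattened anomaly field onto a fixed EOF pattern.
    Returns one index value per time step (least-squares amplitude of the pattern)."""
    norm = sum(p * p for p in pattern)
    return [sum(a * p for a, p in zip(field, pattern)) / norm
            for field in anomaly_fields]

def standardise(series):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    return [(x - mean) / sd for x in series]
```

In this picture, the same ERA5-derived `pattern` would be used for both the ERA5 and the GloSea5 anomaly fields, so that the resulting index time series refer to an identical spatial structure.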

3.3 Skill analysis for direct and indirect approaches

In the first, direct approach, the predictive skill of windstorm parameters (frequency, ESSIa and ESSIs) is diagnosed directly from storms identified and tracked in individual members of the seasonal hindcast ensemble (GloSea5). This approach was also utilised in Befort et al. (2019) for windstorm frequency forecasts; consequently, our results should be directly comparable with this previous study. In addition, windstorm intensity is investigated in a similar way.

In the second, indirect approach, storm parameters are calculated via a statistical regression, based on a combination of large-scale patterns as predictors. This method was previously used (e.g. in Befort et al. 2019; Scaife et al. 2014). To do this, a multiple linear regression model is built for each of the three windstorm parameters as predictands (frequency, ESSIa or ESSIs) and with the three selected large-scale patterns (NAO, SCA and EA) as predictors (see Eqs. 1, 2). For GloSea5 the ensemble mean is used to build the regression model.

Step 1 in building the regressions for the indirect approach is to take the large-scale patterns, either from ERA5 or from GloSea5, and link them to the storm parameters in the respective data set. As seen in Eqs. 1 and 2, this yields, by the definition of a multiple linear regression, the regression coefficients β.

$$\text{Storm Parameter}_{ERA5} \sim \beta_{NAO:ERA5}\, NAO_{ERA5} + \beta_{SCA:ERA5}\, SCA_{ERA5} + \beta_{EA:ERA5}\, EA_{ERA5} \tag{1}$$
$$\text{Storm Parameter}_{GloSea5} \sim \beta_{NAO:GloSea5}\, NAO_{GloSea5} + \beta_{SCA:GloSea5}\, SCA_{GloSea5} + \beta_{EA:GloSea5}\, EA_{GloSea5} \tag{2}$$

These regressions are validated by ANOVA (analysis of variance) testing (Von Storch and Zwiers 2001) to investigate the influence of the three large-scale patterns on the storm parameter prediction, both individually and in combination.

In step 2 of this approach, these coefficients are used and connected with GloSea5 large-scale pattern indices to statistically predict the storm parameters by using the large-scale patterns (see Eqs. 3, 4).

$$\beta_{NAO:ERA5}\, NAO_{GloSea5} + \beta_{SCA:ERA5}\, SCA_{GloSea5} + \beta_{EA:ERA5}\, EA_{GloSea5} = \text{Storm Parameter}_{\text{stat. forecast}} \tag{3}$$
$$\beta_{NAO:GloSea5}\, NAO_{GloSea5} + \beta_{SCA:GloSea5}\, SCA_{GloSea5} + \beta_{EA:GloSea5}\, EA_{GloSea5} = \text{Storm Parameter}_{\text{stat. forecast}} \tag{4}$$

ERA5-based regression as well as the GloSea5-based regression to windstorms are applied to finally deduce windstorm parameters; more details can be found in the Supplementary Appendix.
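The two steps of the indirect approach can be sketched in plain Python for a single grid cell. The example fits an ordinary least-squares model through the normal equations and then applies the coefficients to a different set of indices. It is a minimal illustration rather than the implementation used in this study: the intercept term, the solver and all function names are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_betas(nao, sca, ea, storm):
    """Step 1: least-squares coefficients [intercept, b_NAO, b_SCA, b_EA].
    Assumption: an intercept is included alongside the three pattern indices."""
    rows = [[1.0, n, s, e] for n, s, e in zip(nao, sca, ea)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, storm)) for i in range(k)]
    return solve(xtx, xty)

def predict(betas, nao, sca, ea):
    """Step 2: apply fitted coefficients to (forecast) pattern indices."""
    b0, bn, bs, be = betas
    return [b0 + bn * n + bs * s + be * e for n, s, e in zip(nao, sca, ea)]
```

For the ERA5-based setting, `fit_betas` would be called with ERA5 indices and ERA5 storm parameters and `predict` with GloSea5 indices; for the GloSea5-based setting, both calls would use GloSea5 ensemble-mean data.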

Studies have also used a generalised linear Poisson model for the investigation of severe windstorm event counts (e.g., Walz et al. 2018b). Here we are not just investigating storm counts, but also continuous storm intensity values, thus a simple multi-linear model was chosen. Nevertheless, the performance of a simple multi-linear model in comparison to a Poisson model was tested but revealed no significant difference in performance. More detailed descriptions of the 2 different regression settings and skill measures can be found in the Supplementary Appendix.

3.4 Signal-to-noise paradox

The signal-to-noise paradox is investigated as described by Eade et al. (2014) and Scaife and Smith (2018) by using the ratio of predictable component (RPC). The RPC is the ratio of the direct forecast skill of the model ensemble mean, i.e. its correlation with the observations (\({r}_{mo}\)), to the average correlation of the ensemble members with the ensemble mean (\({r}_{mm}\)). If the RPC (\(= {r}_{mo}/{r}_{mm}\)) is higher than 1, the skill of the model in predicting the observations is higher than its skill in predicting its own ensemble members.
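The ratio defined above can be computed for one grid cell with a short Python sketch. It assumes a Pearson correlation for both terms, which is an illustrative choice and not necessarily the correlation measure used in this study; the function names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def rpc(members, obs):
    """Ratio of predictable component, r_mo / r_mm.
    members: list of per-member time series; obs: observed time series."""
    n_years = len(obs)
    ens_mean = [sum(m[t] for m in members) / len(members) for t in range(n_years)]
    r_mo = pearson(ens_mean, obs)                                     # model vs observations
    r_mm = sum(pearson(m, ens_mean) for m in members) / len(members)  # members vs ensemble mean
    return r_mo / r_mm
```

A value above 1 indicates that the ensemble mean correlates better with the observations than with its own members on average.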

4 Results

4.1 Direct forecasts of frequency and intensity

The windstorm frequency forecast skill following the direct approach is assessed by the ranked Kendall-\({\tau }_{b}\)-correlation between ERA5 and GloSea5 ensemble mean (Fig. 2a). Highly significant positive correlations over extended areas over Europe at the end of the Atlantic storm track are revealed: the British Isles, northern France, the Scandinavian region and over the Azores. Significant skill at the 95% level is found with a significance test for Kendall correlation. These results corroborate similar patterns in Befort et al. (2019) and Scaife et al. (2014) but within the latest GloSea5 model version and on event-based storm counts.
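For reference, the Kendall \({\tau }_{b}\) statistic for a single grid cell can be computed as in the following Python sketch, which counts concordant and discordant pairs and applies the standard tie correction. The significance test is not included, and the function name is an assumption for illustration.

```python
def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation with the standard tie correction."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both series: enters neither term
            if dx == 0:
                ties_x += 1         # tied only in x
            elif dy == 0:
                ties_y += 1         # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = ((pairs + ties_x) * (pairs + ties_y)) ** 0.5
    return (concordant - discordant) / denom
```

Here `x` would hold the 23 seasonal values of the GloSea5 ensemble mean and `y` the corresponding ERA5 values at the same grid cell. Ties are common for storm counts, which is one reason to use the tie-corrected \({\tau }_{b}\) variant.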

Fig. 2 Forecast skill for storm parameters (Frequency, ESSIa, ESSIs). Ranked Kendall-\({\tau }_{b}\)-correlation between ERA5 and the GloSea5 GC2 ensemble mean (63 members) for winter seasons 1993/94 to 2015/16. Regions with 1.9 or fewer storms in the observations (ERA5) are shaded; dotted grid points are significant at the 95% level

For the integrated seasonal intensity (ESSIa, Fig. 2b), similarly positive significant skill over western central Europe especially over the UK, the North Sea and large parts of Scandinavia is found. A secondary positive skill area is also found over the Azores. In comparison to the skill pattern for storm frequency, the significant signal is shifted slightly towards the northwest. The skill pattern for the standardised storm intensity (ESSIs, Fig. 2c) shows a very coherent pattern, although with slightly reduced areas of significance. Interestingly, a core region at the end of the North Atlantic storm track, mainly the region upstream of Ireland and over the south of Norway shows significant positive forecast skill. Overall, for the first time, coherent and significant skill for a dedicated, objective storm intensity measure is identified in seasonal forecasts for highly relevant regions over Europe and the Northeast Atlantic.

In addition to correlations, the ROC curve statistic is used for forecast verification. The resulting ROC skill scores (ROCSS) are presented as spatial distributions (Fig. 3). Skill for storm parameters is assessed using terciles. ROCSS values are increased over similar regions to those already revealed by the correlation-based analysis: for frequency and both intensity measures, regions over western central Europe and Scandinavia (e.g., the British Isles, the North Sea, northern France, Norway, Sweden and the Baltic region) show high and significant ROCSS with values up to 0.9, depending on the specific region, tercile and storm parameter. Overall, the lower and higher frequency/intensity terciles seem to be better forecast than the middle tercile, as is often seen in seasonal forecasts (Mason et al. 2021).
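A tercile-based ROC score for one grid cell can be sketched as follows. The sketch assumes that the forecast probability is the fraction of ensemble members above a given threshold, that the area under the curve is obtained from the Mann-Whitney pair statistic, and that the skill score is scaled as 2 × area − 1 so that the no-skill diagonal gives zero; these choices and the function names are illustrative assumptions, and the details used in this study are given in the Supplementary Appendix.

```python
def event_probability(member_values, threshold):
    """Fraction of ensemble members above the threshold (e.g. upper-tercile event)."""
    return sum(v > threshold for v in member_values) / len(member_values)

def roc_area(probs, occurred):
    """Area under the ROC curve via the Mann-Whitney pair statistic.
    probs: forecast probability per season; occurred: observed event flag per season."""
    hits = [p for p, o in zip(probs, occurred) if o]
    misses = [p for p, o in zip(probs, occurred) if not o]
    if not hits or not misses:
        raise ValueError("need at least one event and one non-event")
    score = sum(1.0 if h > m else 0.5 if h == m else 0.0
                for h in hits for m in misses)
    return score / (len(hits) * len(misses))

def roc_skill_score(probs, occurred):
    """Assumed scaling: 0 on the no-skill diagonal, 1 for a perfect forecast."""
    return 2.0 * roc_area(probs, occurred) - 1.0
```

For the lower tercile, the same functions can be used after reversing the inequality in `event_probability`.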

Fig. 3 Forecast ROC skill scores. Area under the ROC curve per grid cell between ERA5 and the GloSea5 GC2 ensemble (63 members) for storm parameters (Frequency, ESSIa, ESSIs) for winter seasons 1993/94 to 2015/16. Regions with 1.9 or fewer storms in the observations (ERA5) are shaded; dotted grid points are significant at the 95% level

The spatial patterns of positive and significant Kendall correlations (Fig. 2) are similar to the patterns of ROCSS. The high correlation skill in windstorm frequency over the UK, the Azores and east Scandinavia appears as high ROCSS in the lower frequency seasons, and the correlation skill over Norway as high ROCSS in the higher frequency seasons. The same holds for ESSIa, where the correlation skill over the UK (Norway) is spatially coherent with the significant areas of the lower (higher) intensity seasons. For ESSIs, the significant ROCSS areas are in line with the correlation skill within the lower (higher) intensity seasons over Ireland (Norway). This suggests that the ROC and correlation scores contain largely similar information about the forecast skill (cf. Yang et al. 2018) and that the skilful forecasts originate mainly from the extreme seasons.

Seasonal forecasts are highly likely to be skilful on larger spatial scales. With the ROC curves (Fig. 4), an investigation on a city scale was made to test whether predictions are also skilful for densely populated regions. Three cities were selected as small-scale representatives, and each city grid cell was averaged with the adjacent westerly and north-westerly grid cells. The British Isles are represented by a London (0° E, 52.5° N) region. An area for Hamburg (10° E, 52.5° N), Northern Germany, was picked as representative for Central Europe. For the Scandinavian region, Oslo (10° E, 60° N) is selected.

Fig. 4 Forecast skill at the city scale. Mean ROC curve for 3 grid cells (London, Hamburg and Oslo) and their respective westerly and north-westerly grid cells, between ERA5 and the GloSea5 GC2 ensemble (63 members) for storm parameters (Frequency, ESSIa, ESSIs) for winter seasons 1993/94 to 2015/16. The associated ROC skill score, as area under the curve, is given in the bottom corner for the single “city” grid cell; significance of the ROCSS at the 95% level is marked with *. Highest tercile (purple), middle tercile (green), lower tercile (orange)

At all 3 locations and for all 3 storm parameters, the middle tercile (green) never shows significant ROC scores, and the curves are mainly close to the zero-skill line (Fig. 4). This means that neutral seasons are not skilfully predicted. For the highest and lowest terciles, the ROCSS reveal an overall capability for skilful prediction (Fig. 4). In detail, the London area forecast has significant positive prediction skill for the higher tercile of storm frequency, but not for lower or middle tercile seasons. For ESSIa and ESSIs, however, the ROC curves are close to the no-skill line, especially for medium and high intensity seasons, although the low intensity seasons are close to significant. The Hamburg grid cells show significant forecast skill for high and low frequency seasons. The total accumulated intensity measure (ESSIa) and the storm-count standardised intensity (ESSIs) show no significant ROC scores for any category. For the Oslo region, the highest tercile (purple in Fig. 4) of all three storm parameters is significantly well predicted, as is the lower activity tercile for the standardised intensity (ESSIs). The storm frequency has the highest and significant forecast skill in all three city regions in the higher tercile seasons.

4.2 Indirect forecasts of frequency and intensity

Here the role of the relevant large-scale circulation patterns in producing skilful predictions of windstorm parameters is analysed over the Northeast-Atlantic and Europe by the two different algorithms described above (Sect. 3).

The indirect approach with ERA5-based regression is generated with ERA5 storm parameters and ERA5 large-scale pattern indices. The derived regression slopes per grid cell are presented in Fig. 5 for each of the 3 windstorm parameters. These slopes represent the connection between windstorms and large-scale patterns. For windstorm frequency, the regression slopes show, as expected, a strong relation between the windstorm frequency and the large-scale patterns (NAO, SCA and EA). Both windstorm intensity measures, ESSIa and ESSIs, show similar patterns in the relation between storms and large-scale patterns, but with weaker slopes and therefore weaker linear relations. For all three windstorm parameters, the large-scale patterns cover different regions, which could potentially increase the area of skilful windstorm forecasts using this combination of predictors, although skill in patterns other than the NAO on seasonal timescales is only suggested in a few studies (e.g., Lledó et al. 2020; Baker et al. 2018).

Fig. 5 Linear models of storm parameters (Frequency, ESSIa, ESSIs). Regression slopes from multiple linear regressions on the first three EOF patterns (NAO, SCA and EA), in ERA5 for winter seasons 1993/94 to 2015/16

The ability of this combined linear ERA5-based regression to explain observed storms is shown in Fig. 6. The coefficient of determination, \({R}^{2}\), shows the percentage of the explained variance. A high \({R}^{2}\) value of the model for storm frequency is seen especially over the storm track region from the North Atlantic to the British Isles, mid-Europe, the North Sea and the Baltic Sea. The explained variance for the entire regression is up to almost 80%, meaning the calculated regression with NAO, SCA and EA covers up to 80% of the variance found in seasonal windstorm frequencies. For both intensity measures, less variance is explained. However, over the UK and the North Sea, the combined regression model can explain up to 50–60% of the variance of the observed storm intensity. The intensity measures have a higher proportion of variance that is not explained by the large-scale pattern indices, perhaps suggesting that intensity is less predictable.
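The explained variance quoted here follows the usual definition of the coefficient of determination, which can be sketched in Python as below. The function name is an assumption, and the ANOVA decomposition into the contributions of the individual patterns is not shown.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)                 # total variance
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))    # unexplained part
    return 1.0 - ss_res / ss_tot
```

A value of 0.8 at a grid cell would correspond to the "almost 80%" of explained variance mentioned for windstorm frequency.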

Fig. 6 Variance explained by linear models of storm parameters (Frequency, ESSIa, ESSIs). Coefficient of determination \({R}^{2}\) (a–c) and individual ratio of explained variance from ANOVA for the first three EOF patterns (NAO (d–f), SCA (g–i), EA (j–l)), in ERA5 for winter seasons 1993/94 to 2015/16

ANOVA is used to quantify the influence of the three large-scale patterns (Fig. 6d–l). The individual panels for each large-scale pattern show the fraction of explained variance that the pattern contributes to the total explained variance \({R}^{2}\). For all three storm parameters, the NAO is the dominant variability mode explaining the variance in the storm parameters over most of the European region north of 45° N. Its contribution is even closer to 1 over Scandinavia, which means the NAO is mainly responsible for the statistical performance of the regression there. However, the ANOVA shows an absence of connection between the NAO and windstorm parameters between 40° and 50° N. Exactly in this region, SCA and EA show high values of explained variance for all three windstorm parameters and are thus potentially good candidates to improve skill if they are predictable.

Besides the ERA5-based regression, the link between storm parameters and large-scale patterns is investigated within the model (ensemble mean), thus based on the link between large-scale circulation and windstorm parameters purely from GloSea5 data. The resulting figures, such as the GloSea5-based regression slopes and their verification (\({R}^{2}\) and ANOVA), can be found in the Supplementary Appendix. The regression slopes for each large-scale pattern in the regression (Supplementary Appendix Fig. A2) show, similar to Fig. 5, a very strong connection between the windstorm parameters and the individual large-scale patterns. Only the spatial pattern of the SCA slopes differs slightly from its ERA5 equivalent, as the maximum slope is shifted towards the northwest (model bias). The validation of the GloSea5 regression is calculated in the same way as the ERA5 regression validation and can be found in Fig. A3 in the Supplementary Appendix. The \({R}^{2}\) (Fig. A3, a–c) shows a high percentage of explained variance of the total regression over similar regions to ERA5 (Fig. 6). The \({R}^{2}\) for storm frequency shows a slight extension of high values over larger parts of the North Atlantic. This means the GloSea5 large-scale patterns explain more of the GloSea5 windstorm counts in this region than the ERA5 patterns explain of the ERA5 storm counts in the reanalysis data set. The individual contribution to the explained variance (ANOVA; Fig. A3, d–l) for the NAO shows similar patterns to the ERA5-based regression. The GloSea5-based regression seems to show a stronger influence of the EA over the region that is independent of the NAO. The results of the ERA5-based regression lead to a similar ratio of explained variance from SCA and EA. As mentioned for the GloSea5-based regression slopes, the dominant SCA pattern is shifted towards the northwest, and the same is seen in the ANOVA. The regions where SCA explains a significant part of the variance of storm parameters are shifted west-northwest and hence play a smaller part in complementing the NAO as a source of variability.

4.3 Skill comparison between direct and indirect forecast approach

For the ease of comparison, Fig. 7 compares the Kendall correlations between windstorm parameters of ERA5 and those directly derived from GloSea5 with the skill resulting from the indirect method. These windstorm parameters are forecast using the ERA5-based regression, thus reflecting the real-world link between large-scale patterns and storms, but the prediction is coming from using the forecast GloSea5 large-scale patterns. Following this ERA5-based regression link, all three storm parameters reveal a very similar spatial pattern to the direct approach, which corroborates the principal usefulness of such a regression-based forecast approach. Nevertheless, overall, the area of significant positive skill is less pronounced. For the storm frequency over northern France, the English Channel, and southern Scandinavia, a slightly higher forecast skill is provided by the direct approach and for most areas over the British Isles there are no significant differences. In addition, the accumulated storm intensity measure, ESSIa, shows higher skill for the direct approach over the whole region stretching from the Northeast-Atlantic across the British Isles to the Norwegian Sea and Scandinavia. For the ESSIs the differences are very scattered, but upstream of Ireland the direct approach shows some stronger skill than the indirect. The corresponding skill using entirely GloSea5-based regression by applying GloSea5 large-scale pattern indices to the model link also shows overall very similar skill patterns but fewer significant grid cells than the direct approach. Only a few grid cells downstream of Ireland show more skill in the GloSea5-based regression approach than in the direct approach.

Fig. 7

Forecast skill for storm parameters (frequency, ESSIa, ESSIs). Ranked Kendall-\({\tau }_{b}\)-correlation between ERA5 and the GloSea5 GC2 ensemble mean (63 members) (1st row), between ERA5 and regression-model storm parameters (2nd and 3rd row), and difference of significant correlation values as (1st row) minus (2nd and 3rd row) (4th and 5th row). Regions with 1.9 or fewer storms in the observations (ERA5) are shaded. Dotted grid points are significant at the 95% level (first 3 rows). For ease of comparison, only those regions where one of the approaches shows positive significant skill are shown as differences and marked by a dot (last 2 rows)

The GloSea5-based regression skill is lower in most regions than that of the direct approach and the ERA5-based regression. Neither the ERA5- nor the GloSea5-based regression can predict windstorm frequency or intensity as well as the explicit storms in the forecast model itself in storm-relevant regions (Fig. 7, 4th and 5th row). These regressions are calculated using the total time series with all data points, to eliminate any potential internal effects. A leave-one-out method with one season as test data was also tested for the ERA5-based regressions, and the differences show the same patterns for all three storm parameters (see Supplementary Appendix, Fig. A4). The windstorm-relevant areas with significant correlations in the direct approach show strong positive differences from the other methods. For storm frequencies this difference corresponds to a skill increase of up to 75%, and for the intensity measure to even more than 100%, compared to the indirect approach. As in Befort et al. (2019), the focus is on significant changes in skill, and the differences show that the direct approach has higher skill over north-western Europe than the indirect approach.
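A leave-one-out variant of the regression forecast, with one season held out as test data, might look like the sketch below. The function name and the synthetic data are hypothetical, and the choice to refit the observation-based regression for every held-out season and then apply it to that season's forecast indices is an assumption about how such a test could be set up, not a description of the calculation behind Fig. A4.

```python
import numpy as np

def leave_one_out_forecast(obs_indices, obs_target, fc_indices):
    """Regression forecast in which each season is excluded from its own fit.

    For season i the regression is fitted on all other seasons of the
    observation-based data and then applied to the forecast indices of season i.
    """
    n = len(obs_target)
    design = np.column_stack([np.ones(n), obs_indices])
    fc_design = np.column_stack([np.ones(n), fc_indices])
    predictions = np.empty(n)
    for season in range(n):
        train = np.arange(n) != season  # boolean mask dropping the test season
        coef, *_ = np.linalg.lstsq(design[train], obs_target[train], rcond=None)
        predictions[season] = fc_design[season] @ coef
    return predictions

# Synthetic stand-ins (assumptions): 23 winters, three pattern indices.
rng = np.random.default_rng(2)
obs_indices = rng.standard_normal((23, 3))
obs_storms = 5.0 + obs_indices @ np.array([2.0, 0.8, 0.5]) + 0.5 * rng.standard_normal(23)
fc_indices = 0.6 * obs_indices + 0.8 * rng.standard_normal((23, 3))

loo_storms = leave_one_out_forecast(obs_indices, obs_storms, fc_indices)
print(np.round(loo_storms[:5], 2))
```

The resulting series can be correlated with the observed storm parameter in the same way as the full-sample regression forecast, so that the two skill estimates are directly comparable.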

4.4 Skill verification and signal-to-noise paradox (direct approach)

The possibility of a signal-to-noise paradox in the seasonal winter windstorm forecast, for tracked windstorms and their parameters (frequency and intensity measures), is investigated using the direct approach. The ratio of predictable components (RPC) for the storm parameters shows patterns (Fig. 8) very similar to those of the direct forecast skill (Fig. 2). The RPC values are high and exceed 1 in the same regions where the direct approach shows significant forecast skill. The RPC is up to 3 for the storm frequency and up to 5 for the intensity measures over the British Isles. This implies that the forecast windstorm signals, like the large-scale flow, are too weak in amplitude given the high correlation skill. Hence, from the presented findings, it can be concluded that seasonal forecasts underpredict the amplitude of year-to-year fluctuations in windstorms. This finding will be investigated in more depth in an independent study.

Fig. 8

Ratio of predictable components (RPC) for storm parameters (frequency, ESSIa, ESSIs). RPC = \({r}_{mo}/{r}_{mm}\), with r as ranked Kendall-\({\tau }_{b}\)-correlation between ERA5 and the GloSea5 GC2 ensemble (63 members) from winter season 1993/94 to 2015/16. Regions with 1.9 or fewer storms in the observations (ERA5) are suppressed (grey). A ratio above 1 indicates a signal-to-noise paradox, as the model ensemble mean represents reality better than it represents its own individual members
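The ratio given in the caption can be sketched for one grid point as follows. This is an illustration under assumptions rather than the calculation behind Fig. 8: the numerator is taken as the correlation between the ensemble mean and the observations, the denominator as the average correlation between each member and the mean of the remaining members, and the data are synthetic with a deliberately weak model signal. All function names are hypothetical.

```python
import numpy as np

def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation with tie correction (O(n^2) pairwise form)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    upper = np.triu_indices(len(x), k=1)
    dx = np.sign(x[:, None] - x[None, :])[upper]
    dy = np.sign(y[:, None] - y[None, :])[upper]
    denom = np.sqrt(np.sum(dx != 0) * np.sum(dy != 0))
    return np.sum(dx * dy) / denom if denom > 0 else np.nan

def ratio_of_predictable_components(obs, ensemble):
    """RPC = r_mo / r_mm for an ensemble of shape (n_members, n_seasons).

    Assumption: r_mm is the mean correlation between each member and the
    ensemble mean of all other members.
    """
    ensemble = np.asarray(ensemble, float)
    r_mo = kendall_tau_b(ensemble.mean(axis=0), obs)
    r_mm = np.mean([
        kendall_tau_b(np.delete(ensemble, m, axis=0).mean(axis=0), ensemble[m])
        for m in range(ensemble.shape[0])
    ])
    return r_mo / r_mm

# Synthetic example: the model signal is weaker than the observed one.
rng = np.random.default_rng(3)
signal = rng.standard_normal(23)                                  # 23 winters
obs = signal + 0.5 * rng.standard_normal(23)
members = 0.3 * signal + rng.standard_normal((63, 23))            # 63 members
rpc = ratio_of_predictable_components(obs, members)
print(f"RPC = {rpc:.2f}")
```

With a weak signal in every member, averaging over many members still recovers the timing of the observed variations, while each single member remains dominated by noise; this is the situation in which the ratio tends to exceed 1.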

5 Discussion and conclusion

This study investigated three storm parameters that define extra-tropical winter windstorms: the storm frequency (Befort et al. 2019; Kruschke 2015), the accumulated storm severity per season (ESSIa) and the average storm severity per season (ESSIs).

Two approaches to assess forecast skill are applied for relevant storm characteristics. The “direct approach” diagnoses the forecast skill by identifying and tracking windstorms directly within the hindcasts. The indirect approach uses multi-linear regression to large-scale circulation patterns, where the regression relation can be taken from observations or the model.

The storm frequency and other storminess definitions show similar significance patterns over Europe to those found by Befort et al. (2019) and Scaife et al. (2014). The highest positive skill for frequency is found over southern Great Britain, northern France and southern Scandinavia. In addition to existing studies which exclusively investigate windstorm frequency, the intensity forecast is evaluated: the combined storm intensity measure (ESSIa) shows similar high skill but is shifted towards the northwest, hence positive skill is identified more over the North Atlantic and the exit region of the North Atlantic storm track. The event-averaged storm severity measure (ESSIs) shows positive skill only in a region downstream of Ireland.

These regions are all located at the end of the North Atlantic storm track. The region over the central North Atlantic, which contains the climatological maximum of windstorm activity, does not show strong forecast skill. This may be because the large-scale modes, which explain much of the skill, have a smaller influence there. This study shows that NAO, SCA and EA are responsible for forecast storm variations at the end of the storm track. SCA and EA show higher explained variance over the central North Atlantic than NAO, but it seems this is not enough to strengthen the forecast skill. The addition of SCA and EA helps in regions where these large-scale patterns have their centres of action (Fig. A1), but not in the central North Atlantic; this region would need further investigation with additional factors.

A separate skill analysis for seasons with positive/neutral/negative large-scale patterns has been done by Renggli et al. (2011) and shows stronger skill for extreme NAO seasons.

The study by Clark et al. (2017) additionally investigated the reliability of the GloSea5 forecast for 1.5 m temperature and 10 m wind speed. For air temperature, they found a reliable forecast over the same region for which this study finds forecast skill for windstorms. Interestingly, for wind speed the forecast over most of this region (storm track area over the North Atlantic and the UK) was found to be “underconfident”. As seen in the RPC results (Sect. 4.4), the signal-to-noise paradox is present in the forecasts for all three windstorm parameters, suggesting that forecast ensemble mean signals are too weak, which would explain the underconfidence. Further investigations of the signal-to-noise paradox in storm predictions are required and are planned in a separate study.

The ROCSS maps and selected ROC curves allow skill estimates for different storm parameter categories. The lowest and highest terciles of the storm parameter are more predictable than the middle tercile. This is in agreement with Renggli et al. (2011), who also found better prediction skill for the lower and higher terciles of storm frequency. Their study investigated just storm frequency, but here we show similar results for better prediction of lower and higher terciles of storm intensity.
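For readers unfamiliar with the score, a tercile-based ROC skill score can be sketched as below. The sketch assumes that ROCSS = 2·AUC − 1, that the forecast probability of an event is the fraction of ensemble members exceeding the model's own upper-tercile threshold, and that the area under the ROC curve is obtained from pairwise comparisons of event and non-event seasons; these choices, the function names and the synthetic data are illustrative assumptions, not necessarily the configuration used in this study.

```python
import numpy as np

def upper_tercile_probability(ensemble):
    """Fraction of members above the pooled model upper-tercile threshold.

    ensemble has shape (n_members, n_seasons); returns one probability per season.
    """
    threshold = np.quantile(ensemble, 2.0 / 3.0)
    return np.mean(ensemble > threshold, axis=0)

def roc_skill_score(event, probability):
    """ROCSS = 2 * AUC - 1, with AUC from pairwise event/non-event comparisons."""
    event = np.asarray(event, bool)
    probability = np.asarray(probability, float)
    hits, misses = probability[event], probability[~event]
    if len(hits) == 0 or len(misses) == 0:
        return np.nan  # score undefined without both outcomes
    diff = hits[:, None] - misses[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return 2.0 * auc - 1.0

# Synthetic example: 23 winters, 63 members sharing a weak common signal.
rng = np.random.default_rng(4)
signal = rng.standard_normal(23)
obs = signal + 0.5 * rng.standard_normal(23)
members = 0.5 * signal + rng.standard_normal((63, 23))

obs_event = obs > np.quantile(obs, 2.0 / 3.0)      # observed upper-tercile seasons
prob = upper_tercile_probability(members)
print(f"ROCSS (upper tercile) = {roc_skill_score(obs_event, prob):.2f}")
```

The lower and middle terciles can be treated in the same way by changing the event definition; a score of 0 corresponds to no discrimination and 1 to perfect discrimination.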

The second approach used in this study is the “indirect approach”: a combined linear regression with the three dominant large-scale patterns (EOF-based definition) for the North Atlantic and Europe (Walz et al. 2018a), namely NAO, SCA and EA. The resulting EOF patterns of ERA5 and GloSea5 MSLP do not reveal any significant spatial difference. The PCA is performed using the ERA5 EOF patterns for both data sets, without loss of generality, to ensure a robust comparison.

Using the ERA5-based regression, 80% of the interannual storm frequency variance can be explained by the three large-scale patterns. For storm intensity measures the linear regression model is less well defined but is still significant. The results for the ERA5-based regression show very clearly the main influence of the NAO around 60° N over Europe. A gap in the variance explained by the NAO is seen at around 45° N for all storm parameters, as expected given that this is close to the neutral line of the NAO influence. SCA is the leading pattern for the western part of the gap and EA for the more eastern part in the ERA5-based regressions. For the GloSea5-based regression, the same area shows EA as the dominant contributor to the variance. SCA is less important for the storm track region over the North Atlantic in the model climate. The magnitude of explained variance is very high for the NAO, which is in line with this mode being the main large-scale mode influencing the North Atlantic region (Hurrell et al. 2001) and with the result of Scaife et al. (2014), showing that the NAO is the strongest predictable factor for seasonal forecasts for Europe. The link between storm frequency and the NAO has also been shown by Befort et al. (2019), with a similar lack of storm prediction skill around 45° N. This study showed that including SCA and EA as two additional large-scale patterns can increase storm forecast skill compared to Befort et al. (2019).

The different regression approaches show that there is a link between the large-scale patterns (NAO, SCA and EA) and storm count and storm intensity. This link also exists in the model climate, with the SCA pattern having a smaller influence in the model than in the real climate. The comparison between the direct and the indirect method shows that the skill of the direct forecast model is higher than that of the regression approach. A comparison of prediction skill for intensity shows a similarly higher skill in the direct method. Compared to storm frequency, the increase in skill for intensity is more pronounced. This study improved the forecast skill based on large-scale pattern regressions for storm frequency in comparison to Befort et al. (2019) and shows a similarly large difference between regression-based and direct-model skill for storm intensity measures.

In summary, this study showed that in addition to positive seasonal windstorm count prediction skill, there is also positive event-based windstorm intensity skill in the GloSea5 seasonal forecasts. A statistical regression approach combining the three dominant large-scale patterns over Europe improves the indirect windstorm forecast compared with NAO-based regression alone, but it still does not show higher forecast skill than the direct approach.