1 Introduction

An accurate simulation of the global circulation response to El Niño-Southern Oscillation (ENSO) episodes is a primary target of any seasonal forecasting system. While the response to ENSO in many tropical regions and over the North Pacific/North American (PNA) midlatitudes is fairly well reproduced by most coupled models used for seasonal predictions, simulating the response over the North Atlantic/European (NAE) region has been much more difficult. This should actually be expected for a number of reasons: in the PNA sector, the ENSO response is the result of direct Rossby wave propagation from the tropical Pacific, while the NAE response comes from a superposition of tropospheric and stratospheric pathways (Ineson and Scaife 2009; Jiménez-Esteve and Domeisen 2018; Domeisen et al. 2019). Also, the weaker amplitude of the average wintertime NAE response makes it more difficult to emerge from the background of strong, unforced low-frequency variability of the North Atlantic circulation (e.g. Mezzina et al. 2020), leading to a low signal-to-noise ratio which is significantly underestimated by many models (Scaife et al. 2014; Stockdale et al. 2015; Scaife and Smith 2018).

Recent studies have pointed to another property of the NAE response to ENSO as a source of uncertainty in the detection and the prediction of such a response: namely, the fact that the associated anomaly pattern changes significantly from the late autumn/early winter period to the latter part of the boreal winter (King et al. 2018, 2021). These two periods are often represented by two-month means in November–December (ND) and January–February (JF). Studies that have investigated the ability of global climate models (GCMs) to simulate the ENSO response in these periods (Ayarzagüena et al 2018; Molteni et al. 2020) have shown that multi-decadal GCM simulations show a much higher level of fidelity in the simulation of the late winter response, while the amplitude of ND response is significantly underestimated. Ayarzagüena et al. claimed that initialised seasonal predictions made with the UK Met Office (UKMO) coupled model showed improved simulations of the early-winter response compared to historical multi-decadal runs of the same model; also, they argued that in ND connections between the tropical Pacific and Atlantic basins produced Rossby wave sources in the Caribbean region, from which planetary-scale waves could propagate into the North Atlantic.

A different dynamical connection was advocated by Abid et al. (2021) to explain the difference between the early- and late-winter response to ENSO. They noted that connections between rainfall in the western/central Indian Ocean (WCIO) and in the central Pacific were stronger in early winter, and that teleconnections from Indian Ocean heating anomalies could generate a Rossby wave train leading to a NAE response resembling the positive phase of the North Atlantic Oscillation (NAO). Therefore, they argued that while the negative-NAO signature of the late-winter El Niño response was a direct response to Rossby waves emanating from the Pacific, the positive-NAO projection (although meridionally shifted) of the early-winter response was mainly a consequence of the Indian Ocean-NAO teleconnection.

Even though the seasonality in the ENSO response over the North Pacific and North America is not specifically investigated in this study, seasonal and sub-seasonal variations in the phase and amplitude of the Pacific/American response and the associated model biases have also been documented (Kumar and Hoerling 1998; Spencer and Slingo 2003; Bladé et al. 2008; Chen et al. 2020, 2022 and references within), as well as the combined effect of tropical Indian and Pacific anomalies on the North Pacific and northern hemispheric response (Deser and Phillips 2006; Annamalai et al. 2007; Fletcher and Cassou 2015).

Because of the difficulty in the simulation of the early-winter response to ENSO, many recent papers investigating the performance of one or more modelling systems have focussed on the late-winter response (e.g. López-Parages et al. 2016; Benassi et al. 2022; Mezzina et al. 2022). In this study, we exploit the opportunity provided by the archive of the multi-model ensemble (MME) system of the Copernicus Climate Change Service (C3S; see http://copernicus.climate.eu) to diagnose the early- and late-winter response to ENSO in re-forecast ensembles covering a 24-year period and produced by 9 state-of-the-art GCMs from 8 institutions, with a specific focus on the NAE region. Since the individual models contributing to the C3S MME are updated relatively frequently, we use re-forecasts from those versions that were operational in winter 2021–22.

A comprehensive description of the data and models used in this study is provided in Sect. 2 of the paper, together with a description of the statistical methodology. Results on composite anomalies for El Niño and La Niña events, and signal-to-noise ratios for the projections of individual anomalies on such patterns, are presented in Sect. 3. Section 4 investigates how separate contributions from anomalies in the Indian and Pacific Oceans contribute to the total extra-tropical signal in early-winter ENSO events. Conclusions are drawn in Sect. 5.

2 Data and methodology

2.1 Available models, reference data, and selection of ENSO years

Data from re-forecast ensembles from 9 different coupled models were available in the C3S Climate Data Store at the time of writing. These are listed in Table 1, where they are ordered according to the date in which they joined the C3S MME. Note that Environment and Climate Change Canada (ECCC) is the only institution providing data from two different models; their re-forecast ensembles, as well as those of the Japan Meteorological Agency (JMA), consist of a much smaller number of members than those from the other six institutions.

Table 1 Seasonal forecasting systems providing re-forecasts diagnosed in this study

As mentioned above, the contributing models have been updated one or more times since the start of the C3S MME operational activities; re-forecasts from the versions operational in winter 2021–22 were selected. While in operational practice the performance of seasonal forecasts for the boreal winter is usually assessed from ensembles initialised on 1 November, here we wanted to compare the ability of simulating the ND and the JF response to ENSO with the same lead time. Therefore, for all available years in the re-forecast archive (1993–2016), ND data were extracted from ensembles initialised on 1 September, while JF data were extracted from ensembles initialised on 1 November, corresponding to forecast months 3 and 4 in both cases. It should be noted, however, that for those ensemble systems which are initialised in lagged mode using start dates preceding the 1st day of each month, the average forecast time corresponding to ND and JF is actually longer than for ensembles initialised in “burst” mode at the beginning of the month. Our choice of selecting a forecast range of 3–4 months, rather than 2–3, reduces the impact of the differences in initialisation strategy.

In the following sections, statistics derived from model data are compared with the corresponding ones computed from the ERA5 re-analysis (Hersbach et al. 2020). Using sea-surface temperature (SST) from ERA5 from 1980 to 2020, the average Niño3.4 index was computed for each of the 40 ‘extended’ winters (November-to-February); from this series, the 13 winters within the upper-third of the distribution were classified as El Niño (EN) episodes, the 13 winters within the lower-third were classified as La Niña (LN) episodes. Out of these, 8 EN winters and 9 LN winters fall within the MME re-forecast period. The selected EN and LN winters are listed in Table 2.

Table 2 List of El Niño and La Niña winters (indicated by the year of November) in the ERA5 re-analysis record

2.2 Statistical definitions

Let x be a spatial variable spanning a 2-dimensional domain, τ an index of events spanning time and (for model data) ensemble member, A (x, τ) the anomaly of an atmospheric variable A with respect to its climatological mean, and s(τ) the SST anomaly in the Niño3.4 region. The regression pattern of A on the Niño3.4 index is defined by:

$$R_{{}} \left( x \right) \, = \sum_{{{{\varvec{\uptau}}} }} A_{{}} \left( {x, \, \tau } \right){\text{ s}}\left( \tau \right)/\sum_{{{{\varvec{\uptau}}} }} {\text{s}}^{{2}} \left( \tau \right)$$
(1)

Once R(x) is computed, each anomaly A (x, τ) can be decomposed into a component proportional to s(τ) and a residual field O (x, τ) such that, at any grid point x, its time series is uncorrelated (i.e. temporally orthogonal) to s(τ):

$$A_{{}} \left( {x, \, \tau } \right) \, = R_{{}} (x){\text{ s}}\left( \tau \right) \, + O_{{}} \left( {x, \, \tau } \right)$$
(2a)

On the other hand, at any time τ, O (x, τ) may have a non-null spatial projection ε(τ) onto R(x) within a given spatial domain D; since ε(τ) is a linear combination of grid-point data which are all uncorrelated with s(τ), it will also be uncorrelated with s(τ). If, for any τ, we further divide O (x, τ) into its projection onto R(x) and a spatially orthogonal residual O’ (x, τ), we obtain

$$A(x,\tau ) = R(x)[{\text{ s}}(\tau ) + \varepsilon (\tau )] + O^{\prime}(x,\tau ) \, = R(x){\text{ b}}(\tau ) + O^{\prime}(x,\tau )$$
(2b)

where b(τ) is the total spatial projection of A (x, τ) onto R (x) at any given time, and O’ (x, τ) is spatially orthogonal to R (x). It should be noted that while R (x) is defined at each spatial grid-point, and therefore Eq. (2a) is valid for any spatial domain considered, the decomposition given by Eq. (2b) is dependent on the domain D selected for the computation of spatial projections. Equation (2b) can be interpreted as a separation between the linear response to ENSO [R (x) s(τ)], the component of unforced variability which projects on the same spatial patterns as the ENSO response [R (x) ε(τ)], and the remaining (spatially orthogonal) part of unforced variability [O’ (x, τ)].

While usually regression patterns are computed using the full range of anomalies, for the results presented in Sect. 3 we have applied Eqs. (1) and (2a, 2b) separately to the data from El Niño winters and from La Niña winters, even though anomalies are defined with respect to the climatological mean of the full sample. Since ERA5 data only include one realization per year, instead of the multiple realizations provided by the forecast ensembles, the regression patterns for ERA5 have been computed from all the El Niño and La Niña years available in the 40-year re-analysis record, in order to increase their statistical significance.

Given the regression patterns for El Niño and La Niña events, we define the composite anomalies during such events as:

$$C_{EN} \left( x \right) = R_{EN} \left( {\text{x}} \right)\mu \left[ {{\text{s}}\left( \tau \right)_{EN}} \right]$$
(3a)
$$C_{LN} \left( x \right) = R_{LN} \left( {\text{x}} \right) \, \mu \left[ {{\text{s}}\left( \tau \right)_{LN}} \right]$$
(3b)

where μ denotes the time average and the EN or LN subscript indicates the sample of El Niño or La Niña events. With respect to the traditional definition of composites as simple averages, here they are defined by a weighted average, the weight being proportional to the amplitude of the Niño3.4 index.

Using a regression-based definition of composites reduces a potential inconsistency in comparing observed and re-forecast data for years selected on the basis of the observed Niño3.4 index: because of the ensemble dispersion and loss of predictability with increasing forecast time, some members started from ENSO conditions of moderate amplitude may not maintain an SST anomaly of sufficient amplitude (or even the same sign) to be still classified as an El Niño or La Niña event (as shown in the scatter diagrams for model data in Fig. 6, Sect. 3). However, in these cases the anomalies A (x, τ) are multiplied by a near-zero value of s(τ) in the regression (Eq. 1), and therefore have a negligible impact on the result.

Since the focus of our analysis is on the Euro-Atlantic response to ENSO, the decomposition of anomalies into a component proportional to the Niño3.4 index and one orthogonal to this index, as in Eq. (2a, 2b), is performed on the NAE domain (70W-30E, 25N-80N). This domain is also used to quantify the similarity between re-analysis and model composites, by computing (for each model) the pattern correlation between the two composites and the model-to-ERA5 amplitude ratio.

A number of studies have addressed the issue of inconsistency between the actual skill of seasonal predictions in the North Atlantic region during winter (Eade et al. 2014; Stockdale et al. 2015; Scaife and Smith 2018; Strommen and Palmer 2019; Zhang and Kirtman 2019) and the expected skill derived from the spread of individual members around the ensemble mean. This issue is often referred to as the “signal-to-noise paradox”, since in this space and time domain the ensemble-mean forecasts appear to be in better agreement with observations than with individual ensemble members. In this context, ‘signal’ refers to the component of extra-tropical atmospheric variability forced by slowly-varying elements of the climate system (e.g. tropical SST), while ‘noise’ refers to the variability generated by internal atmospheric dynamics, which is supposed to be unpredictable beyond a few weeks into the forecast.

Focussing on the time series of a specific variability index (for example, an NAO or Arctic Oscillation index obtained by projecting model anomalies on a pre-defined pattern), the signal-to-noise ratio is a function of the standard deviation of the ensemble-mean forecast and the root-mean-square deviation of individual members’ values from the ensemble mean (e.g., Scaife and Smith 2018). Since observations provide a single realisation for each season, these terms cannot be computed from observed data. However, the decomposition of anomalies given by Eq. (2b) provides a way to define a measure of signal-to-noise that can be applied to both re-analysis and model data. Specifically, if we assume that the ‘signal’ is defined by the linear response to the Niño3.4 SST anomalies (as computed by the regression), the noise can be estimated as the difference ε(τ) between the total projection onto the regression pattern R(x) and the value proportional to the Niño3.4 index. Using the terminology introduced by Eq. (2b), and applying this definition separately to El Niño and La Niña years (as denoted by the EN and LN subscripts), we get:

$${\text{S}}/{\text{N}}_{{{\text{EN}}}} = \sigma \left[ {{\text{s}}\left( \tau \right)_{{{\text{EN}}}} } \right]_{{}} /\sigma \left[ {\varepsilon \left( \tau \right)_{{{\text{EN}}}} } \right]$$
(4a)
$${\text{S}}/{\text{N}}_{{{\text{LN}}}} = \sigma \left[ {{\text{s}}\left( \tau \right)_{{{\text{LN}}}} } \right]_{{}} /\sigma \left[ {\varepsilon \left( \tau \right)_{{{\text{LN}}}} } \right]$$
(4b)

where σ indicates the standard deviation of a given time series, from either re-analysis or model data.

3 Results

3.1 ENSO composites

Figure 1 shows the composites of 500-hPa geopotential height for El Niño and La Niña events, computed from ERA5 data in ND and JF according to Eqs. (3a3b). Differences between the composites in positive and negative ENSO events are also shown in the right-hand column. The thick black line defines the border of the NAE domain used to compute projections, scores and signal-to-noise ratios. As expected, these composites are very similar to those computed in previous studies (see e.g. Deser et al. 2017; Ayarzagüena et al. 2018; King et al. 2021, which include extensive analyses of the associated uncertainty; see also Fig. S1 in Supplementary Information for a comparison with ERA5 composites defined as simple averages); here we focus our comments on the features of the North Atlantic/European response:

  • During El Niño winters, the response in the Euro-Atlantic region has a larger amplitude in ND than in JF, especially over Europe.

  • In both sub-seasons and ENSO phases, the response is shifted meridionally with respect to the canonical NAO pattern, especially when the latter is defined by the traditional Iceland-Azores (or Lisbon) index. As shown in Molteni et al. (2020), a NAO definition based on the meridional height gradient over a wider (and more western) sector of the North Atlantic provides a better match to the response in JF. Also, in ND the response shows a wave pattern with stronger zonal gradients than in JF, leading to anomalous meridional flow over Europe.

  • The spatial anti-correlation of the El Niño and La Niña composites is stronger in ND than in JF in the NAE sector (see the lin coefficients above panels 1c and f), while the opposite is true over the Pacific.

Fig. 1
figure 1

Composites of 500-hPa height (m) in the ND (top row) and JF (bottom row) sub-seasons, from ERA5 data. Left (a and d): El Niño composites; centre (b and e): La Niña composites; right (c and f): difference between El Niño and La Niña composites. The NAE sector used for further calculations is delimited by the black border. The two red boxes in panel c show the areas used to compute the Eastern Atlantic Gradient (EAG) indices listed in Table 3. The lin. coefficient listed above panels c and f is the absolute value of the spatial correlation between the El Niño and La Niña composites

We now look at how composites computed from re-forecast ensembles match those from ERA5 data; the similarity is quantified by the spatial correlation between model and ERA5 patterns, and the ratio of their root-mean-square (r.m.s.) amplitudes in the NAE sector. For brevity, in this section we show the full set of maps only for the ECMWF and UKMO models (Figs. 2 and 3, respectively), with maps for other models being included in Supplementary Information. Statistics for all models are summarised in Fig. 4 below.

Fig. 2
figure 2

As in Fig. 1, for the height composites (in m) computed from ECMWF re-forecasts. Spatial correlations and amplitude ratios with respect to the ERA5 patterns are listed above each panel

Fig. 3
figure 3

As in Fig. 1, for the height composites (in m) computed from UKMO re-forecasts. Spatial correlations and amplitude ratios with respect to the ERA5 patterns are listed above each panel

Fig. 4
figure 4

Taylor diagrams representing the ENSO composites from model ensembles as a sum of two components, one parallel (x-axis) and one orthogonal (y-axis) to the ERA5 composites. ERA5 composites are represented by the black squares at coordinates (1,0). Coordinates are non-dimensional, as they represent ratios of amplitudes between model and re-analysis composites

At first glance, it is evident that both models reproduce the JF composites with better accuracy than in the ND composites, the amplitudes of the latter ones being substantially under-estimated. In JF, these two models provide similar (and very good) results over the Pacific region, while in the NAE sector the agreement with ERA5 is better for El Niño than for La Niña events (this is further discussed in Sect. 3.2). In this region, the UKMO composites in JF have a larger amplitude, closer to the observed values, while the ECMWF model shows a very weak signal during La Niña winters. On the other hand, the ECMWF model does a better job in ND, particularly during El Niño years, when the UKMO composite shows a much weaker amplitude.

Correlation and amplitude ratios from all available models are used to compute the so-called Taylor diagrams (Taylor 2001) shown in Fig. 4, where the ERA5 composites in the NAE region are used as reference. These diagrams show that the deficiencies discussed above are present in re-forecasts from all models, in particular the under-estimation of the amplitude of ND composites. While for JF the points representing the 9 models tend to cluster around the circle of (relative) amplitude 1, with a wider dispersion for La Niña events, the points for the ND composites are close to the 0.5 amplitude circle.

Since we have chosen to compute the ERA5 composites from a 40-year sample, instead of using only the 24 years included in the re-forecast, one may question if the discrepancies between model and re-analysis data may be attributed to the different selection of years. We argue that the ‘noise’ generated by the single realization per year in the re-analysis sample is likely to increase the differences with respect to the smoother ensemble composites, so reducing such noise by extending the re-analysis sample is expected to produce better statistics for the model composites. To test this hypothesis, we recomputed all the correlations and amplitude ratios used for the generation of Fig. 4 by comparing the model composites to re-analysis composites from the same 24 years covered by the reforecasts. The results of this comparison mostly confirm our hypothesis; specifically, when using the 24-year ERA5 sample:

  • the spatial correlation of La Niña composites is decreased for all models, in both ND and JF;

  • the amplitude ratios are never improved in ND composites, while for JF data these statistics are improved only when one model overestimates the observed amplitude (this only occurs in 3 cases out of 18);

  • The only score for which some marginal improvement is found is the spatial correlation of ND El Niño composites: 5 out of 9 model have a higher correlation using the reduced ERA5 sample, but overall the average correlation (across all 9 models) only shows a negligible increase from 0.591 to 0.594.

Indeed, the consistency in the reduced amplitude of the ND composites across the 9 models raises the question of whether the stronger anomalies in the ERA5 composites may have resulted from the limited sample of re-analysis record, even when using 40 years. In other words, may the ND composites from ERA5 have a larger amplitude simply because they are computed from 13 El Niño and La Niña anomalies, instead of the much larger sample produced by the ensembles? Would model composites only computed from 13 realizations match the ERA5 amplitude?

Since the main feature of the ND composites is the increased/decreased geopotential gradient in the eastern part of the North Atlantic during the El Niño/La Niña events, we have tested this hypothesis using a re-sampling approach, as in Deser et al. (2017 and King et al. (2021):

  • An index of the geopotential height gradient in the eastern Atlantic (EAG) was defined as the difference between the average height anomaly in the two regions (25W-5E, 25N-40N) and (50W-10W, 50N-65N), where positive and negative anomalies are found in the ERA5 composites for ND (see top-right panel in Fig. 1). Note that the two areas are centred approximately 5 degrees south of the points traditionally used to define an NAO index.

  • From the re-forecast data of each model, for both El Niño and La Niña events, 5000 sub-samples were created including SST and height anomalies from 13 ensemble members, randomly selected from different years, in order to have the same number of realisations as in the 40-year ERA5 record; the selection was constrained to include at least one member for each ENSO year in the re-forecast dataset, and to select only members with a Niño3.4 index exceeding (in absolute value) a suitable threshold.

  • The EAG index was computed from the ERA5 composites shown in Fig. 1 (obtained from 13 El Niño and 13 La Niña winters), from the model composites including all available members in El Niño and La Niña years, and from all the pseudo-random samples obtained by selecting 13 ensemble members as described above.

  • For each model, we have computed the probability of getting a 13-member composite with an EAG index of greater (or equal) amplitude than the ERA5 index.

The results of this test are shown in Table 3, where the EAG from the full ERA5 record and the model ensembles are listed, together with the probabilities defined above. In the case of La Niña composites, for all models (except one) there is a probability exceeding 10% that the difference from the ERA5 composite may be accounted for by random differences among 13-case samples. However, for the El Niño composites, only for 3 models there is a 5-to-6% chance to obtain an EAG index larger than the ERA5 value from 13-member samples; for all other models, this probability is less than 1%. Therefore, at least for the El Niño events, we can state that there is only a very small chance that the weaker anomalies in the model composites, compared to the ERA5 results, are compatible with the sampling error of the re-analysis composites.

Table 3 Values of the Eastern Atlantic Gradient (EAG) of 500-hPa geopotential height anomaly [(25W/5E, 25N/40N) – (50W/10W, 50N/65N)] in the ND ENSO composites from ERA5 and model re-forecasts

Of course, no statistical test is totally reliable, and here it may be argued that by using data from models with weaker-than-observed teleconnections we actually overestimate the “noise” associated with such teleconnections in the real world. The implication of this argument is that the chance of ND model composites matching the observed amplitude is likely to be even lower than what suggested by the values in Table 3. In Table S1 in Supplementary Information, we show a similar table obtained by re-sampling the ERA5 values of EAG, instead of model data (this approach will also be used in Sect. 3.2 below). Although, as expected, the estimated probabilities of model results being compatible with ERA5 data are lower than in Table 3, the main message about the much higher significance of the differences for El Niño events remains valid.

3.2 Signal-to-noise ratios

We now examine how well the projections of individual NAE anomalies of 500-hPa height onto the ENSO regression patterns are approximated by a linear relationship with the Niño3.4 index; the signal-to-noise ratio defined in Eqs. (4a4b) provide a quantitative measure of such a relationship. Figure 5 shows scatter diagrams for the b(τ) projections of individual height anomalies onto the R(x) pattern (see Eq. 2b), plotted against the Niño3.4 index s(τ); these values are shown for both the El Niño and La Niña cases, using red and blue coloured marks respectively. A straight diagonal line, representing an exact linear relationship, is also plotted as a reference; the ε(τ) term in Eqs. (2b) and (4a4b) corresponds to the difference between the individual points and the diagonal line at the same s(τ) value.

Fig. 5
figure 5

Scatter diagrams of projections of 500-hPa height anomalies in the NAE region onto the regression patterns for El Niño (red points) and La Niña (blue points) events, plotted against the Niño3.4 index (on x-axis). Top: from ERA5 data; centre: from ECMWF ensembles; bottom: from UKMO ensembles. Left column for Nov-Dec. data, right column for Jan-Feb. data. The S/N ratios on the left and right side of each panel refer to La Niña and El Niño events respectively

Figure 5 shows results for ERA5 data in the full 1980–2020 record, and for the ECMWF and UKMO re-forecast ensembles started in years 1993 to 2016 (each ensemble including 25 and 28 members respectively). Above each scatter plot, the signal-to-noise (S/N) ratio is shown for both El Niño and La Niña events. It should be noted that, because of the spread of solutions within the ensembles, the range of observed Niño3.4 values is fully covered by the model realizations. As for the composite patterns, scatter diagrams for other models are shown in Supplementary Information.

In ERA5 data, the S/N ratio is larger for the El Niño than for La Niña winters, in both ND and JF sub-seasons. This is also the case for the ECMWF ensembles, while the UKMO re-forecasts reproduce this feature in JF but not in ND. Also, in ERA5 data, projections for El Niño events have a larger S/N ratio in ND than in JF (consistent with the larger amplitude of the composite response in the former period), while the ND and JF ratios are very similar for La Niña events. The S/N ratios from the ECMWF and UKMO models are remarkably close to each other, with the exception of El Niño events in ND, when the ECMWF model has a higher S/N ratio (although well below the observed value). Again, this difference has a correspondence in the better simulation of the anomaly composite for El Niño events in ND by the ECMWF model (see Fig. 2).

A summary of S/N ratios for all model contributing to the C3S MME is given by the histograms shown in Fig. 6. Looking at the two top panels, showing ratios for El Niño and La Niña in ND and JF respectively, two positive conclusions can be reached regarding the agreement between model and ERA5 data:

  • The larger S/N ratio in El Niño than in La Niña events, shown by ERA5 data, is reproduced in both sub-seasons by all models (with just one exception).

  • The S/N ratios from ERA5 data are well within the range of values from the 9 models, for both ENSO phases in JF and for the La Niña events in ND.

Fig. 6
figure 6

Comparison of S/N ratios between nine model ensembles and ERA5 data. a For El Niño and La Niña events in ND; b for El Niño and La Niña events in JF; c for El Niño events in ND and JF

However, when S/N ratio for El Niño events in ND and JF are compared (bottom panel in Fig. 6), the re-forecasts shows either negligible differences between the two sub-seasons (for three models), or a larger S/N ratio during JF, contrarily to ERA5 data. On average, the S/N ratios for El Niño events in ND is almost twice as large in ERA5 than in model re-forecasts. Since the ERA5 ratios are obtained from just 13 values in either period, it is natural to ask what is the probability that the higher ERA5 value in ND is just a result of insufficient sampling.

To address this question, we have recomputed the ERA5 S/N ratios from all possible sub-samples obtained by removing either 3 or 4 events out of the 13 El Niño winters in the ERA5 record. With this procedure, 1001 subsamples were obtained, including a number of El Niño events (9 or 10) comparable to those occurring in a 30-year period. It turned out that a larger S/N ratio in ND than in JF is reproduced in 98.9% of the cases; therefore, there is only a negligible chance that the discrepancy between model and ERA5 data is due to a sampling issue. We have also checked if this result was reproduced in ERA5 near-surface data, by recomputing regression patterns and the projections of individual anomaly from mean-sea-level pressure (MSLP) data instead of 500-hPa height. We found that the difference in S/N ratio between El Niño events in ND and JF is actually larger using MSLP instead of 500-hPa height data, with a 1.21 value for ND events and a 0.73 value for JF events, and a 100% confidence value obtained from 9- or -10-winter sub-sampling. (Scatter plots for MSLP anomalies from ERA5, ECMWF and UKMO data are shown in Supplementary Information).

As noticed above, in JF most models reproduce quite well the lower S/N ratio in La Niña episodes compared to El Niño events, as found in ERA5 data. This difference (which is also reproduced with very high significance in re-sampled data) implies that La Niña composites are more affected by unforced atmospheric variability than the El Niño composites, which may (at least partially) explain the weaker agreement of ERA5 and model composites for La Niña versus El Niño events over the NAE region.

4 Connections with Indian Ocean rainfall

The results in Sect. 3 clearly reveal a serious deficiency in the way seasonal forecast models represent the early-winter teleconnection of ENSO with the North Atlantic circulation. In this section, we explore the hypothesis that the early-winter deficiency is originated by the superposition of teleconnections originated from different parts of the tropical ocean, and specifically by the models’ failure to represent the constructive interference between signals originated from the central Pacific and the Indian Ocean.

In a recent paper, Abid et al. (2021) argued that the North Atlantic response to ENSO in early winter is due to a superposition of Rossby waves originated from the central Pacific and from a dipolar heating anomaly in the Indian Ocean, with opposite signs over the western/central Indian Ocean (WCIO) and the Maritime Continents. This was shown using both statistical diagnostics from re-analysis data, and the results of a modelling experiment based on two large ensemble of simulations, one of which included an additional, idealised heating dipole in the Indian Ocean. During El Niño events, the anomalies in the Walker circulation induce upward motion in the WCIO region, leading to increased rainfall and diabatic heating, while rainfall over the Maritime Continents is reduced. Abid et al. (2021) showed that the planetary wave originated from this heating dipole is the main source of the upper-tropospheric (NAO-like) response over the North Atlantic.

To illustrate these connections using the same methodology used in Sect. 2, the top panels on Fig. 7 shows the difference between El Niño and La Niña composites of ERA5 rainfall in the Indo-Pacific region, during ND (Fig. 7a) and JF (Fig. 7b). It is evident that the rainfall anomaly associated with ENSO events in the WCIO region (west of 90oE) is much stronger in ND than in JF, and therefore the contribution of teleconnections originated from WCIO is more important in the former period. In the bottom row of Fig. 7, the ND rainfall composites produced by the ECMWF (panel 7c) and UKMO re-forecasts (panel 7d) are shown for comparison. If we consider a domain covering the whole tropical Indian Ocean and the Maritime continents (shown by the box in Fig. 7), the spatial correlation with the ND ERA5 composite and the ratio of the anomaly amplitude are very close to 1 for both models. However, differences from ERA5 can be seen in the model composites between 70 and 90oE, where the two models show anomalies of opposite sign north and south of the Equator, instead of a transition in sign across 90oE in both northern and southern latitudes.

Fig. 7
figure 7

Difference between rainfall composites in El Niño and La Niña events over the Indo-Pacific region. Top row: ERA5 composites in ND (a) and JF (b). Bottom row: composites in ND from the ECMWF (c) and UKMO (d) re-forecasts. Statistics are computed over the region limited by the black border: lin.: absolute value of the spatial correlation between separate El Niño and La Niña composites (not shown); cor.: spatial correlation between model and ERA5 composites; amp.r.: ratio of rms amplitudes in model/ERA5 composites

The teleconnection from the western Indian Ocean to the North Atlantic is particularly difficult to simulate even from state-of-the-art GCMs, according to a study by Molteni et al. (2020) which diagnosed the output of multi-decadal runs of several European climate models. The same difficulty was detected in seasonal forecasts from the ECMWF SEAS5 (Johnson et al. 2019). It is therefore plausible that a poor simulation of the response from the Indian Ocean heating associated with ENSO may contribute to the deficient representation of the total North Atlantic response in early winter. In late winter, the connection between ENSO and rainfall anomalies in the western Indian Ocean is considerably weakened, reducing the impact of a mis-representation of the WCIO teleconnection.

To separate the contributions of anomalies over the two tropical oceans, we make use of the technique of partial regression, as applied by Abid et al. (2021) Here, following Johnson et al. (2019) and Molteni et al. (2020), we just use the rainfall anomaly over the WCIO region (40E-90E, 10N-10S) as an indicator of the Indian Ocean heating anomalies, and we relate this anomaly to 500-hPa height anomalies in the northern extratropics. Specifically, for each realisation in ERA5 and model ensembles, we compute the following quantities:

  • s(τ): SST anomaly in Niño3.4;

  • p(τ): precipitation anomaly in WCIO;

  • s0(τ): component of s(τ) which is uncorrelated with p(τ);

  • p0(τ): component of p(τ) which is uncorrelated with s(τ).

Using the notation above, the WCIO precipitation anomaly can be written as:

$${\text{p}}\left( \tau \right) \, = {\text{ p*}} {\text{ s}}\left( \tau \right) \, + {\text{ p}}_{0} \left( \tau \right)$$
(5a)

where the regression coefficient p* is given by

$${\text{p}}* \, = \sum_{{{{\varvec{\uptau}}} }} \left[ {{\text{p}}\left( \tau \right){\text{ s}}\left( \tau \right)} \right]/\sum_{{{{\varvec{\uptau}}} }} {\text{s}}^{{2}} \left( \tau \right)$$
(5b)

As demonstrated in the Appendix, if we indicate with R0(x) the regression pattern of 500-hPa height on s0, and with P(x) and P0(x) the regression patterns of 500-hPa height on the precipitation indices p(τ) and p0(τ) respectively, we can decompose the full regression on the Niño3.4 index into a linear combination of the regression patterns computed from s0(τ) and p0(τ):

$$R\left( x \right) = R_{0} (x) + {\text{ p* }} P_{0} (x)$$
(6)

Because the decomposition given by Eq. 6 requires the computation of a number of second-order statistics, which in practice reduce the number of degrees of freedom, in this section we compute regressions from data in all available winters, including both phases of ENSO as well as winters with neutral ENSO conditions.

The regression coefficient p*, which quantifies the dependence of WCIO rainfall on the Niño3.4 SST anomaly, determines the contribution that the Indian Ocean teleconnection gives to the total regression pattern from Niño3.4 anomalies. From ERA5 data, we obtain p* = 0.34 in ND and p* = 0.09 in JF, confirming that the Indian Ocean connection is only relevant in early winter. In this sub-season, and in the hypothesis that models are able to represent the Pacific teleconnection realistically, errors in the simulation of R(x) may come either from an under-estimation of p*, or from an inaccurate simulation of the P0(x) pattern.

In Fig. 8, we show the four regression patterns R(x), R0(x), P(x), P0(x), computed from ERA5 data.

Fig. 8
figure 8

Regression patterns of 500-hPa height on the four indices s(τ), s0(τ), p(τ), p0(τ) defined in Sect. 4, from ERA5 data. Top row: regression patterns R(x) and R0(x) derived from the Niño3.4 SST anomaly; units: m/oK. Bottom row: regression patterns P(x) and P0(x) derived from the precipitation anomaly in WCIO; units: m/(mm day−1)

A striking feature of the regression patterns in Fig. 8 is the positive spatial correlation between the teleconnections with the ‘independent’ components of the Niño3.4 and WCIO indices (s0 and p0 respectively) over the Atlantic and Europe (panels 8b and d). Therefore when the two independent components are combined, accounting for the correlation between the two indices, the total regression patterns R(x) and P(x) (panels 8a and c respectively) are strengthened over the NAE region. In this sector, the amplitude of the regression pattern R(x), computed from the original Niño3.4 anomaly, has approximately twice the amplitude of the R0(x) pattern obtained from the s0 component of the Niño3.4 index uncorrelated with WCIO rainfall (compare panels 8a and b). As noted by Abid et al. (2021), when the regression patterns with the ‘independent’ tropical signals are considered (patterns R0(x) and P0(x) in panels 8b and d), the teleconnection with WCIO rainfall has a much stronger amplitude than the Niño3.4 teleconnection over the NAE region.

Figures 9 and 10 show the same regression patterns as in Fig. 8, but computed from re-forecast ensembles from ECMWF and UKMO respectively. Spatial correlations and anomaly ratios with respect to the ERA5 patterns (again for the NAE region) are listed above each panel, and also reported in Table 4.

Fig. 9
figure 9

As in Fig. 8, from the ECMWF re-forecast ensembles. Units: m/oK in top panels, m/(mm day−1) in bottom panels. Spatial correlations and amplitude ratios with respect to the ERA5 patterns are listed above each panel

Fig. 10
figure 10

As in Fig. 8, from the UKMO re-forecast ensembles. Units: m/oK in top panels, m/(mm day−1) in bottom panels. Spatial correlations and amplitude ratios with respect to the ERA5 patterns are listed above each panel

Table 4 Spatial correlations and amplitude ratios (over the NAE sector) of the ECMWF and UKMO regression patterns shown in Figs. 9 and 10, with respect to the ERA5 patterns in Fig. 8

Before looking at those patterns, it is important to check if the relationship between the Niño3.4 and WCIO indices, as quantified by the p* coefficient, is realistically reproduced by the two models. We find p* = 0.45 for the ECMWF ensembles and p* = 0.29 for the UKMO ensembles; respectively, these values are about 30% higher and 10% lower than the ERA5 coefficient, so we cannot attribute the deficiencies in the Niño3.4 regression pattern to a significant underestimation of the Niño3.4 -WCIO connection (as also suggested by Fig. 7).

Turning to the re-forecast regression patterns, a common feature between the two models is that the regression pattern P0(x) on the ‘independent’ WCIO signal p0(τ) is much more zonally symmetric than the corresponding ERA5 pattern, and of much weaker amplitude. However, a different picture emerges from the two models’ data when considering how the teleconnections from the two tropical signals s0(τ) and p0(τ) interact with each other.

In the ECMWF ensembles, the regression patterns R0(x) and P0(x) (panels 9b and d) are both poorly correlated with the ERA5 counterparts, and also negatively correlated with each other over the NAE region; combining the two signals leads to an improvement in the spatial correlation of the ‘full’ Niño3.4 teleconnection (R(x) in panel 9a) with the ERA5 pattern, but also a reduction in the total amplitude. In the UKMO ensembles, the regression patterns R0(x) and P0(x) (panels 10b and d) are positively correlated over the Atlantic, as in ERA5, but both have a much weaker amplitude than the re-analysis patterns. When they are combined, the response over the North Atlantic increases (see panel 10a), but the impact of adding the WCIO contribution is much more modest that in the ERA5 teleconnections.

In summary, these results confirm earlier findings on the serious deficiencies found in the simulation of Indian Ocean teleconnections by state-of-the-art GCMs (Molteni et al. 2020). These deficiencies undoubtedly contribute to the poor simulations of the early-winter teleconnection between ENSO and the North Atlantic circulation, but we cannot conclude that they are the only cause. The teleconnection with the component s0(τ) of the Niño3.4 anomaly, uncorrelated with WCIO rainfall, is also misrepresented by the two models: the ECMWF regression on s0(τ) has a realistic amplitude but a poor spatial correlation with the ERA5 pattern, while the UKMO ensembles produce a better correlated pattern but with half the amplitude of the ERA5 regression.

Finally we note that, if we had a larger sample of years for both re-analysis and re-forecasts, we could also obtain a reliable estimate the ENSO-independent teleconnection of WCIO rainfall by computing the regression pattern P0(x) only using years with neutral ENSO conditions, without any further decomposition. With just 7 neutral ENSO years in the re-forecast sample, the latter procedure may not provide a robust estimate, but we just used it to check the consistency with the results of the statistical decomposition over the full 24-year re-forecast sample (i.e. with the pattern P0(x) computed from Eq. 6). The results for ERA5 and both the ECMWF and UKMO models (not shown) are broadly consistent with the results of the statistical decomposition made from the full data record, confirming the zonally-symmetric structure of the model teleconnections and their much reduced amplitude with respect to the ERA5 pattern.

5 Summary and conclusions

In this study, we have investigated how nine state-of-the art seasonal forecasting systems, namely those contributing to the operational C3S MME in winter 2021/22, perform in simulating the early- and late-winter response to ENSO, with a focus on the North Atlantic-European (NAE) region. Using re-forecast ensembles covering winters 1993/94 to 2016/17, we have computed regression and composite patterns for El Niño and La Niña events, as well as signal-to-noise (S/N) ratios for the projections of individual anomalies onto the regression patterns in the NAE sector. The main results are as follows:

  • While all models reproduce late-winter (JF) statistics with a high degree of fidelity, the amplitude of the early-winter (ND) response in NAE is severely under-estimated with respect to the patterns derived from ERA5 data, on average by a factor of 2.

  • Models do agree with ERA-5 data in showing a larger S/N ratio in El Niño winters than in La Niña winters, but do not reproduce the large S/N ratio found in ERA-5 data for the ND El Niño response. Contrarily to ERA5, a majority of models produce a larger S/N ratio in JF than in ND for El Niño events.

We have further investigated to what extent the deficiencies in simulating the early-winter NAE response can be attributed to a poor representation of the constructive interference of teleconnections from the Indian and Pacific Oceans. Following Abid et al. (2021), we have separated the independent contribution from central Pacific SST anomalies and the Indian Ocean rainfall anomalies, which are in turn affected by ENSO. As the total regression pattern against the Niño3.4 index can be written as a superposition of the two terms, we looked at how the individual contributions from the two ocean basins were simulated by the ECMWF and UKMO models, in comparison to ERA5 results.

These diagnostics show that, while in re-analysis data the Indian Ocean teleconnection plays a major role in reinforcing the Pacific signal propagating into the NAE region (actually being the dominant signal, as argued by Abid et al. 2021), both the ECMWF and UKMO models produce a weaker and more zonally-symmetric teleconnection, which fails to increase significantly the fidelity of the total ENSO response. However, deficiencies in either phase or amplitude were also found in the linearly-independent teleconnection from Niño3.4 SST, implying that the Indian Ocean teleconnection is not the only source of error in the early-winter ENSO response.

Given the large sample size allowed by the multiplicity of realisations in the re-forecast ensembles, the main source of statistical uncertainty in our results comes from the limited sample of seasonal means in the ERA5 record, especially when El Niño and La Niña events are considered separately. By using a re-sampling method, we have tested that the difference in S/N ratio between the ND and JF response to El Niño is statistically significant, with less than 1% chance of this result being obtained by chance from ERA5 data. Using the same technique, we have shown that the differences between ERA5 and model composites for El Niño events in ND are also highly significant. With regard to the contribution of the Indian Ocean teleconnection in ND, it should be pointed out that teleconnections following the MJO phases when convection is active over the Indian Ocean are also underestimated by many state-of-the-art models (see e.g. Vitart 2017 for a summary of results from the S2S database), pointing to a common problem across different time scales.

In conclusion, it is fair to say that understanding why so many state-of-the-art coupled models fail (in a rather similar way) to represent the observed features of the early-winter ENSO response, while all doing a rather good job in JF, is a particularly difficult task. Given this limited comprehension, it is important not to be excessively complacent about the positive results published recently about late-winter seasonal forecasts for the northern extratropics (e.g. Scaife et al. 2014; Dunstone et al. 2016; Mezzina et al. 2022). While the JF (or JFM) period is more relevant in terms of temperature anomalies, late autumn and early winter represent the main rainy season in many parts of southern Europe. As shown by Ayarzagüena et al. (2018) and Abid et al. (2021), the early-winter ENSO teleconnection involves inter-dependent signals from different parts of the tropical oceans, and represents a stringent test on the ability of coupled models to reproduce the complexity of tropical-extratropical interactions.