1 Introduction

The El Niño-Southern Oscillation (ENSO) (Dijkstra 2005; Clarke 2008; Sarachik and Cane 2010; Wang et al. 2017; Timmermann et al. 2018; McPhaden et al. 2020) can be considered as a quasi-oscillation of the Pacific ocean-atmosphere system, consisting of irregular warm (“El Niño”) and cold (“La Niña”) deviations from the long-term mean. Strong El Niño episodes can lead to extreme weather events (like extreme rainfall and droughts) in various parts of the globe (Davis 2001; Wen 2002; Kovats et al. 2003; Donnelly et al. 2007; Corral et al. 2010; McPhaden et al. 2020). To mitigate at least some of the adverse societal and economic impacts, early forecasts of El Niño events are thus highly desirable.

To forecast El Niño events, many state-of-the-art coupled climate models, as well as a variety of statistical approaches (Cane et al. 1986; Penland and Sardeshmukh 1995; Tziperman et al. 1997; Fedorov et al. 2003; Galanti et al. 2003; Kirtman 2003; Chen et al. 2004; Palmer et al. 2004; Luo et al. 2008; Chen and Cane 2008; Chekroun et al. 2011; Saha et al. 2014; Chapman et al. 2015; Feng et al. 2016; Lu et al. 2016; Rodriguez-Mendez et al. 2016; Meng et al. 2018; Noteboom et al. 2018; Ham et al. 2019; DeCastro et al. 2020; Petersik and Dijkstra 2020; Hassanibesheli et al. 2022), have been suggested, and monthly updated overviews of the latest operational forecasts (consisting of 17 dynamical and 9 statistical methods) are available from the International Research Institute for Climate and Society (IRI 2023a). While these forecasts are quite successful at shorter lead times, they have limited predictive power at longer lead times. In particular, they generally fail to overcome the so-called “spring barrier” (see, e.g., Webster 1995; Goddard et al. 2001), which shortens their typical warning time to around 6 months (Barnston et al. 2012; McPhaden et al. 2020; see also the discussion in Tippett et al. 2020).

Table 1 Oceanic Niño Index (ONI) 2012 - present (data from CPC 2023)

In 2012, an alternative forecasting approach was suggested (Ludescher et al. 2012, 2013; see also Ludescher et al. 2014), which is based on complex-network analysis (Tsonis et al. 2006; Yamasaki et al. 2008; Donges et al. 2009; Gozolchiani et al. 2011; Dijkstra et al. 2019; Fan et al. 2021; Ludescher et al. 2021). The approach analyses the strength of the cooperativity, represented by the mean link strength S(t) in a Pacific climate network, and gives an alarm when S crosses a fixed threshold from below, predicting a new El Niño episode to come in the following year. The optimal threshold \(\Theta \) was determined in a learning period between 1950 and 1980. In the period between 1981 and 2011, this threshold \(\Theta \) was used to hindcast the presence (alarm) or absence (no alarm) of a new El Niño event in the following year. After the threshold is fixed, there is no free parameter in the approach.

Splitting the known data (at that time between 1950 and 2011) into a learning phase and a hindcasting phase is necessary for statistical forecasting methods and aims to reduce the risk of overfitting to spurious precursors. But the mere fact that each algorithm, while being developed, can only make “predictions” of events that have already occurred automatically introduces a certain “publication” bias, because only those algorithms that are successful in both the learning and the hindcasting phase will be considered and published.

Fig. 1 The structure of the climate network. Each of the 14 grid points in the “El Niño basin” (red circles) is linked to each of the 193 grid points outside this domain (blue circles). The green rectangle denotes the Niño3.4 region

The true test for statistical forecasts is real-time forecasting. For the climate network approach, the period of real-time forecasts started in 2011. Here we evaluate the real-time forecasts of the network approach. First, in Section 2, we describe how El Niño events are classified by the Oceanic Niño Index (ONI) and list the ONI values between 2012 and present. Next, in Section 3, we briefly describe the climate network approach. In Sections 4 and 5, we analyse its real-time forecasts between 2011 and present and determine the statistical significance of the forecasts. In Section 6, we describe an improvement of the algorithm, which is based on the false alarm statistics.

2 Data

The ENSO phenomenon is quantified by the Oceanic Niño Index (ONI), which is defined as the three-month running-mean sea surface temperature (SST) anomalies in the Niño3.4 region (see Fig. 1) and is a principal measure for monitoring, assessing, and predicting ENSO.

An El Niño-episode is said to occur when the index is at least 0.5°C above the average for a period of at least 5 months. Table 1 shows the ONI between 2012 and present, as communicated by the National Oceanic and Atmospheric Administration (NOAA) (CPC 2023). The El Niño periods are in boldface. The table shows that there were no El Niño periods in 2012, 2013, 2017, 2020, 2021, and 2022. In May 2023, an El Niño started and is still ongoing at the time of writing.
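The episode definition above can be illustrated with a minimal Python sketch. The function name `el_nino_months` and the input format (a plain list of consecutive monthly ONI values) are assumptions made for illustration only; the threshold of 0.5°C and the minimum length of 5 months are taken from the text.

```python
def el_nino_months(oni, threshold=0.5, min_length=5):
    """Mark the months that belong to an El Niño episode.

    oni: consecutive monthly ONI values (hypothetical input format).
    A run of values >= threshold counts as an episode only if it lasts
    at least min_length months.
    """
    flags = [False] * len(oni)
    start = None
    # A sentinel below the threshold closes a run that reaches the end of the series.
    for k, value in enumerate(list(oni) + [float("-inf")]):
        if value >= threshold:
            if start is None:
                start = k
        elif start is not None:
            if k - start >= min_length:
                for m in range(start, k):
                    flags[m] = True
            start = None
    return flags


# Illustrative values only (not taken from Table 1):
oni = [0.2, 0.5, 0.6, 0.7, 0.4, 0.5, 0.6, 0.8, 0.9, 0.5, 0.1]
print(el_nino_months(oni))
```

In this made-up series, the first warm spell lasts only 3 months and is therefore not an episode, whereas the second one lasts 5 months and is. A warm spell that is still shorter than 5 months at the end of the series is not flagged, since it does not yet satisfy the definition.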

Fig. 2 The forecasting scheme. We compare the average link strength S(t) in the climate network (red curve) with the decision threshold \(\Theta =2.82\) (horizontal line) and the ONI (right scale), between January 1981 and December 2011. When the link strength crosses the threshold from below and the last available ONI is below 0.5°C, we give an alarm and predict that a new El Niño episode will start in the following calendar year. Periods with an ONI greater than or equal to 0.5°C for less than 5 months, i.e., periods that do not satisfy the definition of an El Niño, are shown in light blue. The El Niño episodes (when the ONI is greater than or equal to 0.5°C for at least 5 months) are displayed in dark blue. Correct predictions are marked by green arrows and false alarms by dashed arrows. The early false alarms in February 1994 and July 2004 are followed by at least one ONI value equal to or above 0.5°C in the same year. We would like to note that in the shown hindcasting period of the algorithm, the temporal distance between two El Niño onsets is at least 2 calendar years. However, this does not hold in general since in the learning phase (1950-1980, not shown here; see, e.g., the table in CPC 2023), there were 3 instances where an El Niño onset was directly followed by a second onset in the following calendar year

3 The climate network approach

The structure of the climate network considered here is shown in Fig. 1. The network is based on a combination of the networks introduced by Yamasaki et al. (2008) and Gozolchiani et al. (2011), who studied cooperative phenomena during El Niño events. The nodes of the network consist of 14 grid points in the “El Niño basin” (red circles) (Gozolchiani et al. 2011) (which roughly covers the Niño1, Niño2, Niño3, and Niño3.4 regions), and 193 grid points outside this domain (blue circles) (Yamasaki et al. 2008).

The green rectangle denotes the Niño3.4 region where the ONI is calculated. The grid points are the nodes of the climate network and are characterized by their surface air temperature (SAT) anomaly. The SAT data are obtained from the NCEP Reanalysis 1 dataset (Kalnay et al. 1996; NCEP-NCAR 2023).

Each node inside the El Niño basin is linked to each node outside the basin. The link strength between two nodes (i.e., the strength of the teleconnections between them) at a given time t is determined from the values of their time-lagged cross-correlation (see Appendix A), for which we consider time lags between 0 and 200 days. For each pair of nodes i and j, we determine, for the given time t, the maximum, the mean, and the standard deviation around the mean of the absolute value of the cross-correlation function, and define the link strength \(S_{ij }(t)\) as the difference between the maximum and the mean value, divided by the standard deviation. Accordingly, \(S_{ij }(t)\) describes the link strength relative to the underlying background noise (signal-to-noise ratio). By averaging over all individual links in the network at a given instant t, one obtains the mean link strength S(t), which is the crucial entity in the climate network approach (for details, see Gozolchiani et al. 2011; Ludescher et al. 2013, and Appendix A). The variation of S(t) with time t can be considered as a measure of the way the cooperativity between the equatorial “El Niño basin” and the rest of the tropical and subtropical Pacific region changes with time. S(t) has a remarkable property: it typically decays during an El Niño event (Ludescher et al. 2013) and rises in the year before an event starts. This rise of S(t) can be used as a precursor for the event (Ludescher et al. 2013, 2014).
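The definition of the link strength as a signal-to-noise ratio can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used for the forecasts: the function names are hypothetical, the cross-correlation values are taken as given (their computation is defined in Appendix A and not reproduced here), and the population standard deviation is assumed because the text does not specify the estimator.

```python
from statistics import mean, pstdev


def link_strength(cross_corr):
    """Link strength S_ij(t) for one pair of nodes.

    cross_corr: cross-correlation values of the pair for all considered
    time lags (0 to 200 days in the text), assumed to be precomputed.
    """
    abs_corr = [abs(c) for c in cross_corr]
    # Assumption: population standard deviation of the absolute values.
    return (max(abs_corr) - mean(abs_corr)) / pstdev(abs_corr)


def mean_link_strength(cross_corr_per_link):
    """Mean link strength S(t): average of S_ij(t) over all links."""
    return mean(link_strength(c) for c in cross_corr_per_link)


# Two made-up links with four lags each (real links have many more lags):
links = [[0.1, -0.5, 0.2, 0.2], [0.0, 0.0, 0.0, 1.0]]
print(mean_link_strength(links))
```

In the network of Fig. 1, the average would run over all 14 × 193 links between the basin nodes and the outside nodes. A link with a constant cross-correlation function has zero standard deviation and would need separate handling in this sketch.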

The optimized algorithm involves an empirical decision threshold \(\Theta \). Whenever S crosses \(\Theta \) from below while the most recent ONI is below 0.5°C, the algorithm sounds an alarm and predicts the start of a new El Niño episode in the following year. Otherwise, it predicts the absence of a new El Niño event.
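The alarm rule can be written down as a short Python sketch. The function name `alarm_steps` is hypothetical, and two simplifications are assumed: S and the ONI are sampled on the same time grid, so that `oni[k]` is the most recent ONI value available at step k, and a value of S exactly equal to \(\Theta \) counts as having crossed the threshold.

```python
def alarm_steps(S, oni, theta=2.82, oni_limit=0.5):
    """Return the indices at which the algorithm gives an alarm.

    S:   mean link strength values on a regular time grid
    oni: most recent ONI value available at each of these time steps
         (assumed alignment, for illustration only)
    """
    alarms = []
    for k in range(1, len(S)):
        crossed_from_below = S[k - 1] < theta <= S[k]
        if crossed_from_below and oni[k] < oni_limit:
            alarms.append(k)
    return alarms


# Made-up values: S crosses the threshold three times, but the second
# crossing happens while the ONI is already at or above 0.5 degrees C.
S = [2.70, 2.90, 2.80, 2.85, 2.60, 2.90]
oni = [0.0, 0.1, 0.2, 0.6, 0.7, 0.3]
print(alarm_steps(S, oni))
```

Each returned index corresponds to a forecast that a new El Niño episode will start in the following calendar year; for all other years, the absence of a new event is predicted.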

In the learning phase between 1950 and 1980, all thresholds above the temporal mean of S(t) were considered and the optimal ones, i.e., those that led to the best predictions in the learning phase, were determined. \(\Theta \)-values between 2.815 and 2.834 led to the best performance (Ludescher et al. 2013), with a false alarm rate of 1/20.

In the hindcasting phase (1981-2011) (see Fig. 2, where \(\Theta =2.82\)), the performance of these thresholds was tested; thresholds between 2.815 and 2.826 gave the best results. Figure 2 shows that the alarms were correct in 75% and the non-alarms in 86% of all cases. For \(\Theta \)-values between 2.827 and 2.834, the performance was only slightly weaker.

Fig. 3 The real-time forecasts. Same as Fig. 2, but for the period between January 2011 and December 2023. As in Fig. 2, the false alarm (in 2019) is followed by at least one ONI value equal to or above 0.5°C (highlighted in light blue) in the same year. Only alarms until 2022, where the outcome is known, are marked by arrows

4 Real-time forecasts between 2011 and present

Figure 3 shows the forecasts of the network approach between 2011 and 2022. In 4 years (2013, 2017, 2019, and 2022) the algorithm predicted the onset of a new El Niño event in the following calendar year. Only the alarm of 2019 was a false alarm. The present El Niño started in May 2023, so the alarm given in June 2022 was also correct.

In 8 years (2011, 2012, 2014, 2015, 2016, 2018, 2020, 2021) the approach did not give an alarm and thus correctly predicted the absence of a new El Niño in the following year. This is true also for 2014, since in 2015 no new El Niño episode, separated from the foregoing one by at least one ONI value below 0.5°C, started. These forecasts of the absence of a new El Niño event are also far from trivial, as a comparison with the official forecasts by the International Research Institute for Climate and Society (IRI 2023b) shows:

(i) While the climate network approach indicated the absence of a new El Niño in 2012 already in December 2011, the CPC/IRI consensus probabilistic ENSO forecasts issued in August and September 2012 gave probabilities of 75 and 65 percent, respectively, for the presence of El Niño conditions in December 2012 (NDJ).

(ii) In spring 2017, most dynamical and statistical models falsely predicted an event in 2017. For instance, in April 2017 the vast majority of the ensemble members of the North American Multimodel Ensemble forecasted positive anomalies, while the actual SSTA turned out to be negative (Tippett et al. 2020).

Indeed, according to Tippett et al. (2020), climate models tend to predict warming when initialized after observed warming conditions and cooling when initialized after observed cooling conditions, and thus failed to capture the correct direction of ENSO evolution in half of the 8 springs between 2011 and 2018.

Next, we turn to the question of whether the real-time forecasts of the climate network approach are statistically significant, i.e., whether or not the same performance can be obtained by random guessing.

5 Statistical significance of the forecasts

For obtaining the statistical significance of a given configuration \(K_0\) containing N forecasts with \(n_c\) correct alarms and \(n_f\) false alarms, one has to determine the probability \(w_0\) that a configuration with the same number \(n_c\) of correct alarms and the same number \(n_f\) of false alarms can be obtained by guessing. In addition, one has to consider all configurations \(K_1, K_2, \dots , K_m\) with a better or equal quality of forecast and determine the corresponding probabilities \(w_1, w_2, \dots , w_m\). Then the probability p that by guessing the same or better forecasts can be made is given by

$$\begin{aligned} p=\sum _{i=0}^m w_i, \end{aligned}$$
(1)

where p is called the p-value. When p is below 0.05, the forecasts are called statistically significant at the 0.05 level; when p is below 0.01, the forecasts are called highly significant.

To obtain the p-value, we consider 2 different null hypotheses, \(H_R\) and \(H_C\). In the first null hypothesis \(H_R\), we assume that the given forecast configuration can be obtained by randomly guessing with the climatological El Niño onset probability. This is a priori justified since our algorithm allows the occurrence of 2 or more subsequent El Niño onset alarms (see the alarms in 1993 (correct alarm) and 1994 (false alarm) in Fig. 2), and the observed (1950-2023) temporal distance between 2 El Niño onsets is between 1 and 5 years (see, e.g., the table in CPC 2023). In the second, more stringent null hypothesis \(H_C\), we go one step further and take into account that the El Niño onsets are correlated: in particular, in the hindcasting and forecasting phase (1981-2023) we are interested in, 2 El Niño onsets are separated by at least one calendar year.

5.1 Uncorrelated random guessing

For obtaining the probabilities \(w_i\), we first need to determine the occurrence probability q of the onset of El Niño episodes. In the 43 years between January 1981 and 2023, 12 El Niño episodes started, so the occurrence probability is \(q=12/43\cong 0.279\).

First we focus on the occurrence of new El Niño episodes in the period between January 2012 and December 2023. Denoting years where a new event started by \(+\) and years where no new event started by −, the observed configuration of years with and without new El Niño events is

$$\begin{aligned} (-,-,+,-,-,-,+,-,-,-,-,+), \end{aligned}$$
(2)

where the left-most symbol refers to 2012 and the right-most symbol to 2023, when a new El Niño episode started in May. Note that in the El Niño episode between 2014 and 2016, the years 2015 and 2016 are “-” years, since no new El Niño event started during them. For the period between 2012 and 2023, the network approach predicted the configuration

$$\begin{aligned} (-,-,+,-,-,-,+,-,+,-,-,+), \end{aligned}$$
(3)

which differs from the observed configuration only in the year 2020 (\(+\) instead of −), where a new event was falsely predicted to come.

There are 9 possible configurations where one of the \( - \) signs in the observed configuration (2) is changed into a \(+\) sign, and all have the same quality of forecast. Accordingly, the probability of randomly guessing one of these 9 configurations is \(w_0=9 q^4(1-q)^8\).

There is only one better forecast possible: the probability \(w_1\) of randomly guessing the observed configuration (2) is \(w_1=q^3(1-q)^9\). Accordingly, the p-value of the real-time forecasts is

$$\begin{aligned} p=9q^4(1-q)^8 + q^3(1-q)^9. \end{aligned}$$
(4)

This yields, with \(q=12/43\),

$$\begin{aligned} p\cong 5.1 \times 10^{-3}, \,\, \mathrm{period\,\, 2011-present}, \end{aligned}$$
(5)

which is well below the high-significance threshold \(p=10^{-2}\).
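The counting argument behind Eq. (4) can be checked by brute force. The following Python sketch enumerates all \(2^{12}\) possible guess configurations for 2012-2023 and sums the probabilities of those that hit all three observed onsets with at most one false alarm, which is the set of configurations considered above. The variable and function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

q = Fraction(12, 43)  # climatological onset probability, 1981-2023
# Observed configuration (2) for 2012-2023: 1 = new onset, 0 = no new onset
observed = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1)


def guess_probability(config):
    """Probability of guessing one specific configuration at random."""
    n_plus = sum(config)
    return q**n_plus * (1 - q) ** (len(config) - n_plus)


p = Fraction(0)
n_configs = 0
for guess in product((0, 1), repeat=len(observed)):
    missed = sum(1 for o, g in zip(observed, guess) if o == 1 and g == 0)
    false_alarms = sum(1 for o, g in zip(observed, guess) if o == 0 and g == 1)
    if missed == 0 and false_alarms <= 1:
        p += guess_probability(guess)
        n_configs += 1

# 9 configurations with one false alarm plus the perfect forecast
print(n_configs, float(p))
```

The enumeration finds 10 configurations, and the exact sum agrees with \(9q^4(1-q)^8 + q^3(1-q)^9\), i.e., about \(5.1\times 10^{-3}\).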

When we consider both the hindcasting and forecasting period (January 1981 - December 2023), the p-value is obtained in exactly the same way, but there are more configurations to be considered. In the 43 years between 1981 and 2023, 12 new El Niño episodes started. In the 42 target years between 1982 and 2023, the network algorithm correctly forecasted 9 of these events and gave 3 false alarms. Accordingly, the hit rate \(\alpha _+\), defined as the number of correct alarms \(n_c\) divided by the number of events, is 9/12, while the false alarm rate, defined as the number of false alarms \(n_f\) divided by the number of non-events, is 3/30 = 1/10. Thus the rate \(\alpha _-\) of correctly predicted non-events is \((30-3)/30=9/10\). Both numbers, \(\alpha _+\) and \(\alpha _-\), quantify the performance of the algorithm. The probability of randomly guessing a configuration with \(n_c\) correct alarms and \(n_f\) false alarms is given by

$$\begin{aligned} w=\left( {\begin{array}{c}12\\ n_c\end{array}}\right) \left( {\begin{array}{c}30\\ n_f\end{array}}\right) q^{n_c+n_f}(1-q)^{42-n_c-n_f}. \end{aligned}$$
(6)

The binomial coefficients describe the number of ways \(n_c\) events can be chosen out of 12 events and \(n_f\) false events out of 30 non-events; \(q=12/43\) as above.

We need to determine w for all configurations with a similar or better predictive power. A natural measure for the predictive power is \(P={(\alpha _+ + \alpha _-) -1}\), which is 1 when the forecast is perfect and 0 when the forecast is purely random. Here, \(P=(3/4+ 9/10)-1 = 0.65\).

Accordingly, for estimating the p-value of our forecast, we take into account all configurations with a higher or equal predictive power, i.e., (\(n_c=8, n_f=0\)), (\(n_c=9, n_f=0,1,2, 3\)), (\(n_c=10, n_f=0,1,\dots ,5\)), (\(n_c=11, n_f=0,1,\dots ,8\)), and (\(n_c=12, n_f=0,1,\dots ,10\)). For each of these combinations of \((n_c, n_f)\), we determine w from (6) and sum up (1) the obtained probabilities. The result is

$$\begin{aligned} p\cong 3.0 \times 10^{-5}, \,\, \mathrm{period\,\, 1981-present}. \end{aligned}$$
(7)
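The sum leading to Eq. (7) is easy to reproduce. The Python sketch below selects all pairs \((n_c, n_f)\) whose predictive power P is at least that of the observed forecast and adds up the probabilities from Eq. (6). Exact fractions are used so that borderline cases such as \((n_c=11, n_f=8)\), where P equals 0.65 exactly, are not lost to rounding. The function names are illustrative choices.

```python
from fractions import Fraction
from math import comb

q = Fraction(12, 43)   # onset probability, as in the text
N_EVENTS = 12          # El Niño onsets in the 42 target years
N_NON_EVENTS = 30      # target years without a new onset


def predictive_power(n_c, n_f):
    """P = alpha_+ + alpha_- - 1 for n_c correct alarms and n_f false alarms."""
    alpha_plus = Fraction(n_c, N_EVENTS)
    alpha_minus = Fraction(N_NON_EVENTS - n_f, N_NON_EVENTS)
    return alpha_plus + alpha_minus - 1


def w(n_c, n_f):
    """Probability (6) of randomly guessing a configuration (n_c, n_f)."""
    n_alarms = n_c + n_f
    n_years = N_EVENTS + N_NON_EVENTS
    return (comb(N_EVENTS, n_c) * comb(N_NON_EVENTS, n_f)
            * q**n_alarms * (1 - q) ** (n_years - n_alarms))


P_observed = predictive_power(9, 3)  # 9 correct alarms, 3 false alarms
configs = [(n_c, n_f)
           for n_c in range(N_EVENTS + 1)
           for n_f in range(N_NON_EVENTS + 1)
           if predictive_power(n_c, n_f) >= P_observed]
p = sum(w(n_c, n_f) for n_c, n_f in configs)

print(float(P_observed), len(configs), float(p))
```

The selection yields the 31 combinations listed above, from \((n_c=8, n_f=0)\) to \((n_c=12, n_f=10)\), and a p-value of about \(3\times 10^{-5}\).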

5.2 Correlated random guessing

Figures 2 and 3 show that between 1981 and 2023, a new El Niño onset never directly follows a preceding episode: in the year in which an El Niño period ends, no new El Niño starts afterwards. Accordingly, the El Niño onsets are anticorrelated. If El Niño conditions prevailed in December of any year between 1981 and present, then these conditions either continued throughout the next year (as in 2015) or ended in the next year. Accordingly, the probability \(q_2\) that after a December with El Niño conditions, a new El Niño event will start in the following year is 0. This considerably modifies the guessing procedure and leads to a higher p-value than in the case of purely random guessing. In the more realistic case of a small nonzero probability \(q_2\), the p-value will be between the p-value for random guessing and the p-value for \(q_2=0\). Here we focus solely on the strongly anticorrelated case \(q_2=0\), since its p-value serves as an upper bound for both correlated and random guessing. In the absence of El Niño conditions in December, a new El Niño event will start in the following year with probability \(q_1\). For simplicity, we confine ourselves to the forecasting phase.

In the following, we denote a year with an El Niño onset by N and add the index 1 or 2 for the duration of the event. A year where no new El Niño event starts is denoted by 0. Then the observed configuration is

$$\begin{aligned} (0,0,[N_2,0], 0,[N_1,0],0, 0,0,N_1), \end{aligned}$$
(8)

where the brackets mark the correlated complexes of N and 0. The probability for guessing correctly this configuration is

$$\begin{aligned} w_1=q_1^3(1-q_1)^6. \end{aligned}$$
(9)

The network algorithm yielded

$$\begin{aligned} (0,0,[N_2,0], 0,[N_1,0],f, 0,0,N_1), \end{aligned}$$
(10)

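where f denotes the false alarm. A false alarm can happen only in those years that do not belong to an N0 complex. There are 6 configurations with one false alarm, each of which is guessed with probability \(q_1^4(1-q_1)^5\). Together with the perfect forecast (9), this gives the p-value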

$$\begin{aligned} p=6q_1^4(1-q_1)^5 + q_1^3(1-q_1)^6. \end{aligned}$$
(11)

Between 1981 and 2023, there were 12 “0”-years followed by an El Niño period “N”, and 17 “0”-years followed by a “0” year, i.e., non El Niño onset years. Thus \(q_1=12/29\), yielding \(w_1\cong 0.003\) and \(p\cong 0.015\). Accordingly, even when we compare the climate network forecasts with those of strongly correlated random guessing, the network forecasts are significant at the 0.015 level.
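The numbers quoted for the correlated case follow directly from Eqs. (9) and (11). A minimal Python check, with illustrative variable names and exact fractions to avoid rounding issues:

```python
from fractions import Fraction

# 12 "0"-years followed by an onset, 17 "0"-years followed by a "0"-year
q1 = Fraction(12, 12 + 17)

w1 = q1**3 * (1 - q1) ** 6            # Eq. (9): perfect forecast
p = 6 * q1**4 * (1 - q1) ** 5 + w1    # Eq. (11): at most one false alarm

print(float(w1), float(p))
```

Rounded to three decimals, this gives \(w_1\cong 0.003\) and \(p\cong 0.015\), in agreement with the values stated above.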

Fig. 4 Forecasts based on ocean temperatures. Similar to Figs. 2 and 3, but here the mean link strength S(t) is calculated from the potential temperature at 5-meter ocean depth. The construction of the climate network has been adapted to match the GODAS dataset: the climate network grid resolution is now 5° and the temporal resolution of the input data is 5 days. The figure shows version 2 of the forecasting algorithm for the decision threshold \(\Theta =2.12\). All major El Niños are also predicted with the GODAS dataset. In contrast, 2 small El Niños are now not predicted (1994 and 2018), but the 2006 El Niño is predicted additionally

6 Further improvement of the climate network algorithm based on the false alarm characteristics

Figures 2 and 3 show that all false alarms in the hindcasting and forecasting period (1994, 2004, and 2019) are followed by at least one ONI value equal to or above 0.5°C in the same calendar year. This suggests that there may be only a low chance that an alarm is correct when the ONI does not stay below 0.5°C for the rest of the year. Accordingly, an improved algorithm (version 2) based on this feature may consist of 2 steps. (i) In the first step, a (preliminary) alarm is given when S crosses the threshold from below, indicating the possible appearance of an El Niño event in the following year. This alarm can occur at any time in a calendar year. (ii) When the ONI stays below 0.5°C until the end of December, this alarm is confirmed. Otherwise, the alarm is withdrawn and the absence of an El Niño onset is predicted for the following year.
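The two steps of version 2 can be sketched in Python as follows. This is an illustration under several assumptions that the text does not settle: S and the ONI are given as 12 monthly values for one calendar year, only the first threshold crossing of the year is considered, and the ONI is checked from the month of the preliminary alarm through December. The function name and argument layout are hypothetical.

```python
def forecast_v2(S_previous, S_year, oni_year, theta=2.82, oni_limit=0.5):
    """Return True if an El Niño onset is forecast for the following year.

    S_previous: last value of S before the calendar year starts
    S_year:     12 monthly values of S for the calendar year (assumed sampling)
    oni_year:   12 monthly ONI values for the same year
    """
    previous = S_previous
    for month, s in enumerate(S_year):
        if previous < theta <= s:
            # Step (i): preliminary alarm in this month.
            # Step (ii): confirm only if the ONI stays below the limit
            # from the alarm month until the end of December.
            return all(value < oni_limit for value in oni_year[month:])
        previous = s
    return False  # no crossing from below, hence no alarm


# Made-up year: S crosses the threshold in the second month.
S_year = [2.75, 2.90, 2.85, 2.80, 2.80, 2.80, 2.80, 2.80, 2.80, 2.80, 2.80, 2.80]
calm_oni = [0.1] * 12
warm_oni = [0.1] * 9 + [0.6, 0.7, 0.8]

print(forecast_v2(2.70, S_year, calm_oni))  # alarm confirmed
print(forecast_v2(2.70, S_year, warm_oni))  # alarm withdrawn
```

In the second call, the ONI reaches 0.5°C or more before the end of December, so the preliminary alarm is withdrawn and the absence of an onset is predicted, which mirrors the behavior described for the false alarms of 1994, 2004, and 2019.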

When applying version 2 of the algorithm to the period between 2011 and 2022, all forecasts turn out to be correct, resulting in p-values \(p\cong 1.1\times 10^{-3} \) for random guessing and \(p\cong 2.9 \times 10^{-3}\) for correlated guessing.

Figure 3 shows that in January 2023, the mean link strength S(t) crossed the threshold \(\Theta \), giving a preliminary alarm. However, since an El Niño started afterwards, the ONI did not stay below 0.5°C until December 2023. Therefore, this preliminary alarm is withdrawn, indicating the absence of an El Niño onset in 2024. Since there were 3 missed El Niño events between 1981 and 2023 and 30 correct predictions for the absence of a new El Niño, the probability for the absence of an El Niño onset in 2024 is 30/33 \(\approx 90.9\%\).

7 Applying the climate network approach to ocean temperatures

So far, the climate network used to forecast the onsets of El Niño events or their absence has been constructed only from surface air temperatures (SATs). An advantage of this physical quantity is the long data record. For instance, the NCEP Reanalysis 1 starts in the year 1948. Ocean datasets with high temporal resolution became available only with the advent of regular satellite and buoy observations around 1980.

We test the performance of our climate network approach based on the ocean temperatures provided by the NCEP Global Ocean Data Assimilation System (GODAS) reanalysis (Behringer et al. 1998). Since the resolution of the GODAS reanalysis does not coincide with the 7.5° grid resolution of the climate network based on the NCEP Reanalysis 1 (see Fig. 1), we use a 5° resolution to match the SAT network as closely as possible. We also adapted the algorithm to pentad (5-day) time resolution since this is the highest available resolution of this dataset. Figure 4 shows the results based on the potential temperature at 5-meter depth, which is the dataset’s uppermost level. Since our climate network approach has already been validated on SAT data and the ocean data record is too short for a reasonable splitting into a learning and a hindcasting phase, we show a direct fit to the data. We find that version 2 of the climate network algorithm adapted to the GODAS data leads to 10 alarms, 8 of which turned out to be correct. All major El Niños are also predicted with this dataset. In contrast, 2 small El Niños are now not predicted (1994 and 2018), but the 2006 El Niño is predicted additionally. Calculating the p-value as described in Section 5.1 leads to \(p\cong 1.1\times 10^{-4}\).

8 Conclusions

In summary, we have evaluated the quality of the real-time El Niño forecasts made by the climate network approach. We have used two null hypotheses to determine the statistical significance of the forecasts and found that the forecasts are significant at least at the 0.015 level, thereby clearly rejecting the null hypothesis that the same performance might be obtained by random or correlated guessing. We are not aware of any other method that allows, within a period of 12 years, a similar quality of real-time forecasts with a lead time of about 1 year.

The climate network approach suggests that the emergence of cooperativity between the El Niño basin and the rest of the Pacific is an important prerequisite for the development of an El Niño event in the following year. We can speculate that the westerly wind bursts are more effective in initiating a large-scale El Niño event when the Pacific is in a cooperative state, and that this would explain the success of the complex network approach. A detailed analysis, however, remains for future work.

The high prediction skill of the forecast and its long lead time should allow early mitigation measures. One of the advantages of the network approach is that it does not contain a freely choosable fit parameter. The underlying climate network was introduced in a different context and independently of any El Niño forecasting well before it was used to forecast El Niño events. The parameters used in the calculation of the link strengths had also been fixed before (Yamasaki et al. 2008). The only new parameter in the algorithm, the threshold \(\Theta \), was fixed in the learning phase (Ludescher et al. 2013). The reanalysis (NCEP) temperature data can be easily obtained from NCEP-NCAR (2023). Since the calculation of the link strengths is also straightforward and not computationally demanding, the network approach can be easily used to obtain real-time El Niño forecasts, which is an additional advantage besides the long lead time.

The climate network-based approach discussed here forecasts the onset or absence of an El Niño event in the following calendar year with high accuracy. The approach can be combined (Ludescher et al. 2023a) with additional statistical forecasting methods for the magnitude (Meng et al. 2020) and type (Ludescher et al. 2023b) of an event. This way, the event’s risk potential can be estimated well in advance, and thus more time becomes available to plan and implement adapted mitigation measures.

So far, the climate network approach has been applied only to forecasting the onset of an El Niño episode. It is an open question how to extend it to also forecast La Niña episodes early. The majority of El Niño episodes, in particular the strong ones, are followed by a La Niña in the following year, so here the forecast is more straightforward. But 2-year or even 3-year La Niña episodes, like the one between 2020 and 2023, occur often, and the challenge is to predict both the onset and the length of a La Niña episode. We think that a combination of the climate network approach with deterministic approaches that can take advantage of ENSO’s quasi-oscillatory nature may be instrumental in developing an early forecasting approach for La Niña episodes.