Introduction

ENSO (El Nino-Southern Oscillation) is an irregular, quasi-periodic fluctuation in winds and sea surface temperature (SST) over the central and eastern tropical Pacific Ocean, recurring roughly every 2–7 years. It has a significant impact on global climate patterns because it alters atmospheric circulation worldwide, which in turn affects temperature and precipitation. Agricultural yields, commodity supply chains, energy demand, water resources, and animal migration are all affected. The ENSO cycle has three phases: El Nino, La Nina, and Neutral (L'Heureux 2014).

  • El Nino / warm phase: unusually warm ocean temperatures in the eastern Pacific

  • La Nina / cool phase: unusually cool ocean temperatures in the eastern Pacific

  • Neutral: neither El Nino nor La Nina

ENSO indicators include the Oceanic Nino Index (ONI), the Multivariate ENSO Index (MEI.v2), the Southern Oscillation Index (SOI), the Trans-Nino Index (TNI), the Pacific-North American (PNA) index, outgoing longwave radiation (OLR), the TAO/Triton Data Display, and the sea surface temperature (SST) indices for the Nino 1+2, Nino 3, Nino 3.4, and Nino 4 regions. Of these, NOAA lists the ONI, SOI, OLR, MEI.v2, TAO/Triton Data Display, and SST anomalies as its ENSO indicators (Barnston 2015; NOAA n.d.).

Linear approaches have historically dominated time series forecasting because they handle many straightforward forecasting problems well. Because linear methods cannot capture nonlinear relationships, nonlinear approaches have also been applied to climate studies. Deep learning has been used to improve weather forecasts and predict climate variability because it is effective at learning complex features from weather data (Gupta 2019; Shin et al. 2022).

Because ENSO indicators are monthly time series that exhibit seasonal patterns, forecasting ENSO can be framed as a time series forecasting problem (NOAA n.d.). Deep learning has significant potential for time series forecasting, particularly for learning temporal dependencies and handling temporal patterns such as trends and seasonality. Deep neural networks support multiple inputs and outputs and can learn arbitrarily complex mappings. Deep learning techniques for time series forecasting include convolutional neural networks (CNN), recurrent neural networks (RNN) such as the simple RNN and long short-term memory (LSTM), and hybrids such as CNN-LSTM and ConvLSTM (Brownlee n.d.).

The contributions of this study include:

  • A comprehensive ENSO dataset containing standardized monthly climate data spanning 74 years, from 1950 to 2023 (Mir 2022a)

  • Data analysis to validate the dataset (Mir 2022b)

  • Comparison of deep models to verify the dataset and assess how well they fit the data (Mir 2023)

Related work

The deep models used in ENSO analysis and prediction mainly include CNN, RNN, and hybrids, as shown in Table 1.

Table 1 Deep learning models and data used for forecasting ENSO

Shin et al. (2022) used a CNN to analyze ENSO patterns from climate model simulations; the model forecast ENSO at a 9-month lead with a high correlation (~0.82). Ham et al. (2019) trained a convolutional neural network with transfer learning on historical simulations and reanalysis data for multi-year ENSO prediction; the model successfully forecast ENSO 12 months ahead with an accuracy of 66.7%. Cao et al. (2020) studied LSTM-based ENSO prediction using meteorological time series data and showed that dataset quality is important for good prediction results. Ha et al. (2021) used LSTM, ConvLSTM-LSTM, and ConvLSTM-GRU networks to predict the monthly streamflow of the Yangtze River and showed that the networks can learn an implicit relationship between ENSO and streamflow data. Mahesh et al. (2019) trained a combined convolutional and recurrent neural network model on physical simulations to predict monthly ENSO temperatures. Wang et al. (2023) reviewed how deep learning has been used to predict ENSO over the past 10 years, summarized the most influential papers on the topic, and discussed both the potential of deep learning for ENSO prediction and the challenges that remain.

Methodology

ENSO data from 1950 onward was collected from reliable sources such as NOAA (NOAA n.d.), NCEI (NCEI n.d.), NCAR (NCAR n.d.), and other official weather sites (Null 2023; National Centers for Environmental Information NOAA n.d.). Data before 1950 is available for some indicators, such as Nino 3.4 SST, but not for primary, more recent indicators such as MEI.v2 and OLR. The dataset was prepared and formatted using pandas and Excel.
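Compiling indicators from several sources amounts to aligning each series on a common monthly calendar. The sketch below shows that idea with the Python standard library only (the authors used pandas and Excel). It assumes each indicator arrives as a mapping from (year, month) to a value; the function names and the `INDEX_A`/`INDEX_B` series are hypothetical placeholders, not data from the dataset. Months for which an indicator has no record are left as `None` rather than filled.

```python
from typing import Dict, List, Optional, Tuple

YearMonth = Tuple[int, int]


def monthly_calendar(start_year: int, end_year: int) -> List[YearMonth]:
    """Every (year, month) key from January of start_year to December of end_year."""
    return [(y, m) for y in range(start_year, end_year + 1) for m in range(1, 13)]


def align_indicators(
    indicators: Dict[str, Dict[YearMonth, float]],
    start_year: int = 1950,
    end_year: int = 2023,
) -> List[Dict[str, Optional[float]]]:
    """Build one row per month; an indicator with no record for a month gets None."""
    rows: List[Dict[str, Optional[float]]] = []
    for year, month in monthly_calendar(start_year, end_year):
        row: Dict[str, Optional[float]] = {"Year": year, "Month": month}
        for name, series in indicators.items():
            row[name] = series.get((year, month))  # None when the month is missing
        rows.append(row)
    return rows


# Hypothetical toy inputs: INDEX_A covers early 1950, INDEX_B has no 1950 values.
toy = {
    "INDEX_A": {(1950, 1): -1.0, (1950, 2): -0.8},
    "INDEX_B": {(1979, 1): 0.4},
}
table = align_indicators(toy, 1950, 1950)
print(len(table))  # 12
print(table[0])    # {'Year': 1950, 'Month': 1, 'INDEX_A': -1.0, 'INDEX_B': None}
```

Keeping gaps explicit makes it visible which indicators lack early records, so a later step can decide whether to drop, truncate, or impute those months.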

A range of data analysis and visualization techniques was used to validate the dataset by examining seasonal patterns in ENSO, the strength of ENSO phases over time, the relationship between ENSO and ONI, and the correlations between indicators. ONI is widely accepted and is used by NOAA as the standard for classifying ENSO events. The correlation analysis was therefore conducted specifically to explore the relationship between ONI and the other ENSO indicators and to identify candidate predictor variables for ONI prediction.
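The correlation step can be sketched with the Pearson coefficient, which is an assumption here because the text does not name the correlation measure. The ±0.5 cut-off mirrors the strong-relation threshold in the Fig. 5 caption; treating it as the selection rule, and the function names, are illustrative assumptions.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products / squares of deviations from the mean.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return s_xy / math.sqrt(s_xx * s_yy)


def select_predictors(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Keep candidates whose absolute correlation with the target exceeds threshold."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return {name: r for name, r in scores.items() if abs(r) > threshold}


oni = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values, not from the dataset
print(select_predictors(oni, {
    "direct": [2.0, 4.0, 6.0, 8.0, 10.0],
    "inverse": [5.0, 4.0, 3.0, 2.0, 1.0],
    "unrelated": [1.0, -1.0, 0.0, -1.0, 1.0],
}))  # {'direct': 1.0, 'inverse': -1.0}
```

Using the absolute value keeps strongly inverse indicators (such as SOI and OLR relative to ONI) as predictors, since a consistent negative relation is as informative as a positive one.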

During preprocessing, several operations were performed to prepare the data for training. Because time series forecasting problems must be reframed as supervised learning problems, the data was transformed into input and output sequences: the preceding 12 monthly time steps were used as input to forecast the subsequent 3 monthly time steps. The data was divided into training, validation, and test sets in an 80:10:10 ratio. Separate scalers were used to normalize the input and output variables, so that the output scaling could be inverted to recover actual ONI values. The data was also reshaped to match the input shape expected by each model. To assess the suitability of the dataset for predicting ONI, four deep learning models were trained: CNN, Simple RNN, LSTM, and CNN-LSTM.
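A minimal sketch of these preprocessing steps in plain Python follows. It assumes a chronological (unshuffled) split, min-max scaling fitted on the training portion only, one scaler per input feature, and a separate scaler for the ONI target; the text does not state the scaler type, whether the split was shuffled, or the exact feature set, and all function names are hypothetical. Each input window already has the (samples, timesteps, features) layout that recurrent and 1D convolutional layers in frameworks such as Keras expect.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]  # n_in rows x n_features columns


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    n_in: int = 12,
    n_out: int = 3,
) -> Tuple[List[Window], List[List[float]]]:
    """Reframe monthly rows as supervised pairs: n_in months in, next n_out targets out."""
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    X, y = [], []
    for t in range(n_in, len(target) - n_out + 1):
        X.append([list(row) for row in features[t - n_in:t]])  # months t-n_in .. t-1
        y.append(list(target[t:t + n_out]))                      # months t .. t+n_out-1
    return X, y


def chronological_split(X, y, train: float = 0.8, val: float = 0.1):
    """80:10:10 split without shuffling, so the test set is the most recent period."""
    a = int(len(X) * train)
    b = a + int(len(X) * val)
    return (X[:a], y[:a]), (X[a:b], y[a:b]), (X[b:], y[b:])


class MinMaxScaler:
    """Maps values to [0, 1] using the min and max seen in fit()."""

    def fit(self, values: Sequence[float]) -> "MinMaxScaler":
        self.lo = min(values)
        self.span = (max(values) - self.lo) or 1.0  # guard against a constant column
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        return [(v - self.lo) / self.span for v in values]

    def inverse_transform(self, values: Sequence[float]) -> List[float]:
        return [v * self.span + self.lo for v in values]


def fit_input_scalers(X_train: List[Window]) -> List[MinMaxScaler]:
    """One scaler per input feature, fitted on training windows only."""
    n_features = len(X_train[0][0])
    return [MinMaxScaler().fit([row[j] for win in X_train for row in win])
            for j in range(n_features)]


def scale_windows(X: List[Window], scalers: List[MinMaxScaler]) -> List[Window]:
    """Apply the per-feature scalers to every row of every window."""
    return [[[(v - s.lo) / s.span for v, s in zip(row, scalers)] for row in win] for win in X]


# Hypothetical toy data: 40 months, 2 features; the target is the first feature.
features = [[float(m), float(-m)] for m in range(40)]
target = [row[0] for row in features]
X, y = make_windows(features, target)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = chronological_split(X, y)
x_scalers = fit_input_scalers(X_tr)
y_scaler = MinMaxScaler().fit([v for seq in y_tr for v in seq])  # separate output scaler
X_tr_scaled = scale_windows(X_tr, x_scalers)
# Scaled model predictions would be mapped back with y_scaler.inverse_transform(...).
print(len(X), len(X[0]), len(X[0][0]))  # 26 12 2
print(len(X_tr), len(X_va), len(X_te))  # 20 2 4
```

With N monthly rows, a 12-in/3-out window yields N - 12 - 3 + 1 samples. Keeping the output scaler separate means predictions can be returned to ONI units without needing the input columns.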

To evaluate how accurately each model fits the data and predicts ONI, four evaluation metrics were used: root-mean-square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and the coefficient of determination (R2). Together, these metrics provide a comprehensive assessment of each model's accuracy and prediction quality.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{1}$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{3}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{4}$$
where $n$ is the sample size, $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values.
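The four metrics can be computed directly from their definitions, as in the plain-Python sketch below. R² uses its standard form, 1 - SS_res / SS_tot. For the 3-month outputs described earlier, this sketch assumes the actual and predicted sequences are flattened into single lists before scoring; how the authors aggregated multi-step errors is not stated. MAPE is returned as a fraction and, because ONI values can be zero or close to it, it can become very large or undefined.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (multiply by 100 for %)."""
    if any(a == 0 for a in y):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical values
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))  # 0.25 0.0625 0.5
print(r2(y_true, y_pred))
```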

Figure 1 summarizes the process described above, from data collection to prediction, and shows the basic steps involved in training and evaluating a deep learning model.

Fig. 1

The basic process of training and evaluating a deep learning model

Results

A comprehensive ENSO dataset spanning 1950–2023 was created, containing standardized monthly data. Tables 2 and 3 provide an overview of the dataset's content.

Table 2 ENSO dataset: indicator columns details
Table 3 ENSO dataset: other columns details

El Nino events contribute to a slight increase in global surface temperature, while La Nina events tend to have a cooling effect. A global temperature anomalies column was added to the dataset to support studies of the complex relationship between ENSO and global temperature. The data analysis (Fig. 2) shows a consistent rise in global temperature, in line with the observed long-term warming trend, which supports the validity of the dataset.

Fig. 2

Changes in global average surface temperature from 1950 to 2023. Blue bars indicate cooler-than-average years; red bars indicate warmer-than-average years

The primary indicators for ENSO are ONI and MEI.v2. ONI, NOAA's preferred indicator, is the 3-month running average of SST anomalies in the Nino 3.4 region. El Nino events are identified by anomalies at or above +0.5 °C, La Nina events by anomalies at or below -0.5 °C, and Neutral conditions by anomalies between -0.5 °C and +0.5 °C. Events are further categorized by the magnitude of the anomaly as Weak (0.5 to 0.9 °C), Moderate (1.0 to 1.4 °C), Strong (1.5 to 1.9 °C), and Very Strong (≥ 2.0 °C) (Null 2023). Visualizations of the ONI and ENSO phase-intensity data (Figs. 3 and 4) confirm the correspondence between ENSO events and ONI values described above, which supports the accuracy of the dataset. The recurring pattern in Figs. 3 and 4 also reflects the seasonality of ENSO.
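The ONI definition and thresholds above can be expressed as two small functions. This is a minimal sketch, assuming a centered 3-month window over monthly Nino 3.4 anomalies and classifying each ONI value on its own; NOAA's historical event classification additionally requires the threshold to hold over several consecutive overlapping seasons, which is not modeled here. Function names and input values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def oni_from_anomalies(nino34_anoms: Sequence[float]) -> List[float]:
    """Centered 3-month running mean of monthly Nino 3.4 SST anomalies (°C).

    Output element k averages input months k, k+1, k+2, so it is centered on
    input month k+1; the first and last input months get no value.
    """
    return [sum(nino34_anoms[k:k + 3]) / 3 for k in range(len(nino34_anoms) - 2)]


def classify_oni(oni: float) -> Tuple[str, Optional[str]]:
    """Phase and strength for a single ONI value using the ±0.5 °C thresholds."""
    if oni >= 0.5:
        phase = "El Nino"
    elif oni <= -0.5:
        phase = "La Nina"
    else:
        return "Neutral", None
    magnitude = abs(oni)  # strength bands are symmetric for warm and cool phases
    if magnitude >= 2.0:
        strength = "Very Strong"
    elif magnitude >= 1.5:
        strength = "Strong"
    elif magnitude >= 1.0:
        strength = "Moderate"
    else:
        strength = "Weak"
    return phase, strength


for value in [2.6, 1.2, 0.3, -0.7, -1.6]:  # hypothetical ONI values
    print(value, classify_oni(value))
```

Using lower bounds (≥ 1.0, ≥ 1.5, ≥ 2.0) rather than the closed ranges in the text avoids gaps such as 0.95 falling between Weak and Moderate when values are not rounded to one decimal place.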

Fig. 3

ENSO events and their strength since 1950. The positive bars in red correspond to El Nino events while the negative bars in blue correspond to La Nina events

Fig. 4

Effect of ONI on defining ENSO event type and intensity. The positive peaks correspond to El Nino events while the negative peaks correspond to La Nina events

The correlation analysis (Fig. 5) shows that ONI is strongly positively correlated with Nino 3.4 SST anomalies and MEI.v2 and strongly negatively correlated with OLR and SOI. These four indicators were therefore selected as predictors for ONI. The correlation results also support the validity of the dataset.

Fig. 5

Relation of ONI with other indicators. A value above +0.5 indicates a strong positive (direct) relation, and a value below -0.5 indicates a strong negative (inverse) relation

Table 4 compares the performance of the deep models in forecasting ENSO on the test data. The LSTM model outperformed the other models, achieving the lowest root-mean-square error (0.05) and the best fit to the dataset, with an R2 of 0.87.

Table 4 Performance comparison of the deep models

Conclusion

In this study, we generated a comprehensive ENSO dataset spanning 74 years (1950–2023) by compiling standardized monthly climate data from official sources. Through data analysis, we explored the seasonality in ENSO, the relationship between ENSO and ONI, and other trends and correlations within the ENSO data. We also trained and compared four deep learning networks (CNN, Simple RNN, LSTM, and CNN-LSTM) to forecast ENSO. The LSTM network achieved the best performance and fit the data well. This is consistent with the LSTM's ability to capture long-term dependencies and patterns in time series data, which is important for predicting quasi-periodic phenomena such as ENSO, whose El Nino/La Nina cycle recurs roughly every 2–7 years. Both the data analysis and the prediction results validate the dataset. As a resource for future research, the dataset (Mir 2022a) and code (Mir 2022b, 2023) are publicly available on Kaggle, with regular updates planned. We hope this supports further advances in ENSO analysis and forecasting and, ultimately, better global preparedness for its wide-ranging impacts.