Introduction

El Niño–Southern Oscillation (ENSO) events are typically divided into two classes: El Niño (warm) and La Niña (cold) events. Regardless of the event class, each event has at least two types of sea surface temperature (SST) anomalies: Central Pacific (CP) anomalies and Eastern Pacific (EP) anomalies (Ashok et al. 2007; Kao and Yu 2009; Kug et al. 2009). This subdivision depends on the geographical locations and evolution of strong anomalies. Typical differences or even slight differences in SST patterns result in various atmospheric circulations and cause chain reactions (James 1994; Trenberth and Hurrell 1994; Trenberth et al. 1998; Alexander et al. 2002), such as subtropical highs with different locations and strengths and tropical cyclones with different locations and trajectories (Tu′uholoaki et al. 2023). Clustering of ENSO SST events enables differentiating certain time events and determining any differences between such events. Therefore, SST clustering should be performed in accordance with not only geographical positions but also quantifiable and objective methods.

EP and CP types of ENSO dynamics or their frequencies could be attributed to global warming (Capotondi et al. 2015). Vecchi and Wittenberg (2010) present numerical climate model findings indicating that global warming leads to weakened Walker’s circulations and a decrease in the zonal slope of the thermocline around the equator, potentially resulting in a higher occurrence of CP events (McPhaden 2012). However, Yeh et al. (2012) argue that the thermocline inclination or the number of EP or CP events may be driven by natural variability. The challenge lies in differentiating the transition processes between CP and EP events, as mixed characteristics are often observed (Hernández et al. 2020), where certain EP events exhibit similarities to CP events for a period before diverging, and vice versa. In essence, no two ENSO events, whether CP or EP, are alike (McPhaden et al. 2006). One potential solution to this issue is the application of clustering techniques, which can help assess similarity and identify transitional processes.

The success of clustering depends on the distribution of data points. For example, data points that are farther apart are less likely to look similar. Isometric feature mapping (ISOMAP) is a separated data point and nonlinear dimensionality reduction method that accurately reflects the actual distance through tracing along the local linearity in the original nonlinear structure (Tenebaum et al. 2000; Balasubramanian et al. 2002). During the last step of ISOMAP, temporal principal components (PCs) and spatial empirical orthogonal functions (EOFs) are generated as the principal component analysis (PCA). Close ISOMAP PC data points have similar patterns, and leading ISOMAP PCs provide proper distributed low-dimensional data points for clustering (Bayá et al. 2008; Tasoulis et al. 2020).

Multiple regression analysis relies on leading PCs to reduce the number of predictor variables in order to achieve improved predictions and more efficient computations (Stock and Watson 2002; Sousa et al. 2007). Sousa et al. (2007) used a feedforward artificial neural network (NN) with PCs to replace a multiple nonlinear regression in order to achieve improved predictions. Brunton et al. (2020) supported this idea and proposed the calculation of computational fluid dynamics (CFD) problems in low-dimensional data points.

Traditional CFD numerical models are based on differential equations with sufficiently small spatial and temporal grids. Fine grid meshes are typically associated with additional data points, a sufficiently small time-integration step to ensure computational stability, and a high computational cost. Data-driven methods, that is, methods in which data are driven by statistics and not by physical laws (e.g., PCA, regression, and NNs), are used to address the limitations of this time-integration step. However, these methods cannot be solely used to solve CFD problems. Instead, the future trend of solving CFD problems is to combine both numerical and statistical methods. The term “data-driven” should have a generalized meaning pertaining to the transplantation of data variations. Brunton and Kutz (2022) used several machine learning algorithms, such as feedforward NNs, autoencoders, and deep learning, to solve CFD problems. However, when data-driven methods are used in the original high-dimensional data space, computational cost represents a substantial burden. These high-dimensional space data points should be counted in terms of low-dimensional data (Brunton et al. 2020; Brunton and Kutz 2022). In other words, if statistical data-driven models do not extract meaningful reduced dimensional data points or reduced features, they will not be able to offer improved forecasts (Wilks 2019).

Machine learning or artificial intelligence (AI) approaches, such as data-driven models or NN have become increasingly popular in the field of numerical weather prediction (NWP) models. These approaches have shown in enhancing various aspects of NWP models including the utilization of observation data, data assimilation, CFD numerical schemes, and post-processing (Bonavita et al. 2021; Dueben et al. 2021). The European Centre for Medium-Range Weather Forecasts (EMMWF) has put forth a 10-year development roadmap that emphasizes the integration of machine learning and NWP models. The developments include using data-driven models or AI models to help physical parameterization calculations (e.g. gravity wave drag, cloud, radiation, etc.), the tangent/adjoint models for data assimilation or ensemble prediction, and downscaling post-processing. Data-driven models offer the ability to learn and simulate the performance of past numerical models without being constrained by complex physical theorems or CFD equations. These approaches reduce the development and maintenance costs associated with creating tangent linear/adjoint models. Researchers no longer need to derive equations or model schemes that precisely correspond to the current nonlinear models. Instead, the calculations are performed through the composition of NN functions, eliminating the need for linearized and transposed operators of the original complex nonlinear models (Hatfield et al. 2021).

In this study, we clustered and predicted Pacific SST by using leading ISOMAP PCs. In terms of clustering, ISOMAP provided well-distributed SST leading PCs. In terms of prediction, the leading SST ISOMAP PC trajectories showed similarities with the Lorenz 63 model variable trajectories. Brunton and Kutz (2022) used a feedforward NN to study Lorenz 63 model variable trajectories. They successfully simulated similar trajectories to those of numerical models. In this study, we hypothesized that the use of appropriate learning algorithms in SST PC prediction and the combination of these PCs with time-fixed spatial SST EOFs would result in improved future predictions. We also hypothesized that spatial SST EOFs would not change in all our prediction experiments and that the learning algorithms would focus on temporal leading SST PC variations. The reason for fixed EOFs is that the leading EOFs are nearly identical especially when calculating to remove the seasonal cycle or monthly climatological mean (Elken et al. 2019).

The forecasting of SST using nonlinear regression and neural networks has been explored in previous studies by Tangang et al. (1997, 1998a, 1998b), Tang et al. (2000), and Wu et al. (2006). These studies utilized leading SST PCs as predictors in their models, along with sea level pressure and prior SST anomalies. They achieved favorable results in terms of correlation skills, particularly in predicting SST anomalies in the Pacific equator area, by considering the leading 5 PCs. In this work, we introduced some advancements to the existing approach. Instead of using traditional PCs, we adopted ISOMAP PCs, which offered improved distribution for resolving different ENSO events. Furthermore, we extended the hidden layers of the neural network, experiments with different neuron numbers, and incorporated 20 ISOMAP PCs as multi-dimensional space trajectories in our model. These modifications aimed to improve the forecasting accuracy of our NN model.

The remainder of this paper is organized as follows. Section “Data and methods” explains the clustering and learning algorithms used for SST prediction. Section “Clustering” outlines the cluster distributions and actual SST patterns of cluster centroids. It also provides clues for determining the similarity between an SST figure and a specific type of cluster instead of using CP or EP as a qualitative description. Section “Prediction” explains the prediction of SST through support vector regression (SVR) and NNs. Finally, Section “Conclusions and discussion” summarizes the main research findings and potential future improvements in data-driven SST prediction.

Data and methods

SST data were obtained from the fifth version of the Extended Reconstructed Sea Surface Temperature data set of the National Oceanic and Atmospheric Administration (NOAA), National Climatic Data Center, based on data from the Comprehensive Ocean–Atmosphere Data Set, collected from January 1980 to December 2022. El Niño, normal, and La Niña events were differentiated using the Niño 3.4 (170°W–120°W, 5°S–5°N) index of the NOAA Climate Prediction Center. El Niño events were defined in the Niño 3.4 region when the moving 3-month average SST anomaly exceeded 0.5 °C for at least 5 months. In contrast, La Niña or so called anti-El Niño events were defined when the average SST anomaly fell below 0.5 °C for 5 months. Pacific Ocean domain (120°E–60°W, 30°S–30°N) SST was used for clustering and learning algorithm prediction.

In accordance with Tseng (2022), SST ISOMAP was calculated using the distance matrix formed by the covariance of SST anomalies with a nearest neighbor number of 44. The results indicated that the 20 leading SST ISOMAP PCs explained approximately 90% of the variance in the reconstructed geodesic distance data matrix (cf. Figure 7 in Tseng 2022). These 20 reduced low-dimensional components were sufficient for clustering and data-driven model prediction. In addition, the computational cost of clustering and prediction was low, and computational complexity was lower than that in the original high-dimensional space. About ISOMAP and the traditional PCA, we highlighted in the Appendix.


ISOMAP measures the geodesic distance to differentiate the structure of data PC points, so the clustering based on the distance k-means was used in this article. The algorithm k-means for clustering that minimized the total reconstruction function by the proper cluster number. The reconstruction cost function (Theodoridis et al. 2010) was defined as

$$J_{{}} (\theta ,u) = \sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{M} {u_{ij}\, \left\| {x_{i} - \theta_{j} } \right\|^{2} } } \,\, ,$$
(1)

where \(\theta = (\theta_{1}^{T} ,...,\theta_{m}^{T} )^{T}\) represents different cluster center vectors, \(\left\| { \, \cdot \, } \right\|\) is the Euclidean distance, \(x_{i}\) represents the leading PC vectors, and

$${u_{ij}} = \left\{ {\begin{array}{*{20}{c}} {1,}&{{x_i}{\text{ is cosest to }}{\theta _j}} \\ {0,}&{{\text{otherwise}}} \end{array}} \right.,$$
(2)

where N is the number of leading PC vectors (data points) and M is the number of clusters that should be initially provided. After obtaining a satisfactory distribution of points for the SST ISOMAP PCs, the leading 20 PCs were selected, and k-means clustering was applied using the Euclidean distance calculation.

Three of the leading SST ISOMAP PC temporal trajectories demonstrated spiral circles with similar but not completely identical regular swinging behaviors and with different durations in each spiral cycle (Tseng 2022). These findings prompted a comparison with the 3D trajectories of the Lorenz 63 model, which Brunton and Kutz (2022) employed neural network algorithms to learn and replicate the model's numerical predictions. This comparison sparked the idea of using the leading three PCs, or even more components, as variables for prediction. The approach involved training learning algorithms to predict the PC points at the next time step, multiplying these points by the assumed time-invariant spatial EOFs, and generating predictions for SST. SVR and NNs (Algorithms 1 and 2) were used as learning algorithms for trajectory predictions. The 20 leading ISOMAP PCs were then selected for SVR and NN training, as in the clustering process. After the leading number exceeded 20, the latest 10 year corresponding to the prediction period in El Niño or La Niña event average PCs were used to construct residual PCs (number more than 20). For example, if one wanted to predict SST from May 2019 to Apr. 2020, El Niño to normal year, then the residual PCs would use the time period from Mar. 2016 to Feb. 2017, El Niño to normal year and to La Niña. These residual PCs were selected with spatial EOFs to obtain fine spatial prediction structures. They occupied few variances and did not influence the prediction results. Both SVR and NN predicted and the residual PCs were multiplied by the climatological spatial EOFs (from January 1980 to December 2018) and used to achieve SST predictions related to actual physical space. The idea of using EOF, leading PCs and residual PCs are shown in Fig. 1. When we calculated ISOMAP EOFs and PCs, the SST climatological mean was removed. We used learning algorithms to predict the leading 20 PCs and filled them with selected residual PCs, then times the climatological spatial EOFs to reconstruct SST anomalies (SSTA). Finally, we could obtain SST predictions by adding the climatological mean SST.

Fig. 1
figure 1

The idea of using PCs to predict. The leading PCs are used for training algorithms and predicting to next time steps. With residual PCs times climatological EOFs, the predictive data on physical space SST anomalies (SSTA) can be reconstructed

During SVR training, the SST ISOMAP PCs were regarded as independent components and trained individually. The SVR provided the prediction PC values. During the NN training process, the network resembled the feedforward network algorithm of Brunton and Kutz (2022) and exhibited three hidden layers with ten neurons. One-step time lags in the input PC data were used as the output PC data in the NN. However, given the trade-off between computational efficiency and final prediction accuracy, no particular reason was given for selecting such numbers of hidden layers and neurons. The hidden layers 3–20 and the number of neurons 10–200 had been tested and there were no significant forecast skills improvement.

In NN prediction experiment, we arrange SST data into the training data set and the testing data set as the traditional machine learning did. At the same time, we made the training data set have its own training, validation, and testing data parts. This arrangement was for avoiding overfitting and execution efficiency. First, we kept Jan. 2019–Dec. 2021 period to test NN model. This was the testing data set and was never used in training process. Then, during the NN training process, the period of time from Jan. 1980 to Dec. 2018 was the training data set. There were approximately 60%, 20%, and 20% of the data were randomly chosen for training, validation, and testing, respectively, in training NN model. When the NN model was trained, 60% training data were used to generate the model, 20% training data were used for validation and for tuning the model, the rest 20% training data were used for testing without joining to tune the model. Adding the validation and testing data in training process could effectively generate the model and improve the overfitting. Considering only the efficiency and accuracy of generating NN models, there was still no particular reason to use the 6:2:2 ratio. Consequently, the NN prediction results were not identical in different NN training processed, because the different data (including training, validation, and testing) were used to produce the NN model. To obtain more robust NN results, the NN model training and final prediction test were counted 20 times, and the prediction average was used as the final prediction value. In contrast to NN, the SVR separated the data into training and testing two parts. All training data were used for validation to generate the SVR model. The forecast skills for the final predictions of SVR and NNs were the root-mean-square error (RMSE) and anomaly correlation coefficient (ACC), whose equations and calculation methods are outlined in detail in the appendix.

figure a
figure b

Clustering

Figure 2 depicts the relationship between the cost function defined in Eq. (1) and the number of clusters. No elbow point minimizing the cost function and representing the optimal number of clusters was identified. Therefore, seven clusters were subjectively selected. As shown in Fig. 2, cluster numbers exceeding seven resulted in small values of the cost function. These clusters (greater than seven or more) did not highlight clear differences in the SST distributions.

Fig. 2
figure 2

Clustering cost function and number of clusters in the k-means algorithm. A relatively small value of the cost function indicates an appropriate number of clusters

Figure 3 shows the centroids of seven clusters and the distribution of these clusters. The seven events close to these seven centroids were as follows: two El Niño events (October 1987 and December 2009), two La Niña events (February 2006 and October 2011), and three normal events (September 1984, June 1994, and June 2014). Analysis of these cluster centroids revealed clear El Niño/La Niña CP and EP events. Even normal events were easily divided into three clusters depending on the SST anomalies (warm or cold) in the western coast of South America. Notably, the event observed in September 1984 was a normal SST event, whereas the event observed in October 1984 was a La Niña event. The centroid close to September 1984 lay on the dividing line between normal and La Niña events. In addition, the SST anomaly pattern of cold water in the EP in September 1984 was similar to a La Niña phenomenon. The warm or cold SST anomalies belonged to the CP or EP type could be revealed clearly in this cluster map. Strong ENSO and EP events tended to appear on both sides of the iso 1 axis (x axis), with the first PC approaching their minima or maxima. The first PC had the strong east–west variation pattern around the equator (cf. Figure 6 in Tseng 2022). It corresponded the strong EP events with larger the first PC values. In contrast, CP and weak events were observed in locations with smaller the first leading PC around the value 0 in the iso 1 axis direction. These results are consistent with those of previous studies investigating warm and cold events (Kao and Yu 2009; Yu et al. 2011).

Fig. 3
figure 3

Two-dimensional projection of seven clusters (different color points) and their centroids (+ sign) with k-means clustering based on the 20 leading ISOMAP PCs. The figure depicts SST anomalies close to the centroids

Tracing a single El Niño and La Niña case revealed the unique trajectory or preferable location of this specific case. The EP case did not abruptly skip to a CP pattern. Instead, it evolved over time, as in the case of 1997/98 EP El Niño evolving between 2002/03 CP El Niño and 1998/2000 EP La Niña. The clustering positions and trajectories provided additional clues to analyze the El Niño or La Niña processes instead of analyzing the entire SST anomalies or variations of the Niño 3.4 index. When compositing the similar event cases, this clustering provides the guidance. For the future implementation, the numerical SST model results or their ensemble results can be projected on this clustering map. The model results can be used to examine the differences with these real historical events.

Although our clusters could exhibit CP/EP ENSO events through locations different from the cluster centroids, or provide guidance for composite similar CP/EP cases. These ISOMAP PC clusters marked or anchored centroids or historical points to clearly visualize and examine current or future SST event trajectories. We thought the process of any one event was also important and the dynamics should be restored. When examining the recent La Niña event from August 2020 to February 2023 (Fig. 6a), we could find an evolutionary ISOMAP PC trajectory close to the CP La Niña centroid in February 2002 but ultimately quite different from the 2002 case. We also noticed that this latest La Niña event evolved slowly and the beneath dynamics worth studying in the future. We thought studying this latest case by composite analysis must be very careful.

Prediction

In this section, we compared the predictions of SVR and NN for SST. The testing period was concentrated between 2019 and 2022, preceded by the training period. This was to ensure that the training data set did not contain the testing data used for prediction, and that the learning algorithms had not acquired known answers before making the prediction. To validate the SVR and NN forecasts, 36-month consecutive time periods were selected from Jan. 2019 to Dec. 2021, and their RMSE and ACC values were averaged. Because the SVR could not get good prediction results by using the training data far from the initial forecast month. For example, to predict the Jan. 2020 SST in SVR, the last of training data set time was Dec. 2019. But for NN, the training data set time limit was from Jan. 1980 to Dec. 2018. This arrangement was because the NN prediction results much better than the SVR. Moreover, we found that the last training data time in NN prediction did not need to be 1 month before the start prediction month as in SVR.

For the SVR, orthogonal PCs capable of being separated during training were used. Figure 4a depicts the three leading ISOMAP PCs versus time variations for both training and testing scenarios. The blue curves represent the training sets, the green curves represent the testing sets, and the red curves represent the SVR-predicted values. The same training and forecasting procedures were used in the 20 leading PCs. The other 17 ISOMAP PCs training and testing process were not shown in here. As shown in Fig. 3a, the training period spanned from January 1980 to December 2021. In this case, the prediction period spanned from January 2022 to October 2022. Notice that the second PC SVR testing accuracy was worse than the first or the third PCs. Due to the second ISOMAP PC variation differed from the other two PCs, the second ISOMAP PC values exhibited a monotonically decreasing trend over the period of 41 years. SVR always tried to get the predictive values in the final training time and the testing stage back to higher than what actually happened. The second PC monotonically variation was similar to the second PC in Li et al. (2019) monotonically rising findings. Other random time periods exhibited comparable acceptable training results but poor testing results (not shown).

Fig. 4
figure 4

a Three leading PCs from the training data (blue curves), testing data (green curves), and SVR prediction (red curves). The training period lasted from January 1980 to December 2021, and the prediction period lasted from January 2022 to October 2022. b Three leading PC trajectories obtained from NNs (blue curve) and true observations (red curve) between March 2021 and December 2021

To achieve future predictions, a feedforward NN was used to train the 20 leading PCs within the training period from January 1980 to December 2018. Figure 4b depicts the NN prediction trajectory (blue) and actual observation trajectory (red) for the first three ISOMAP PCs from March 2021 to December 2021. For convenience, only the trajectories composed of the first three PCs are shown in Fig. 4b. Once the prediction trajectory perfectly matched the observation trajectory, the forecast was regarded as the optimal solution. Although we were unable to match the NN trajectory with the observed trajectory, the forecast skills were acceptable and still outperformed the SVR forecasts (Fig. 5). We also noticed that prediction performance should be examined by forecast skills rather than the leading PC trajectories. However, we did not illustrate the NN ISOMAP PCs individually as we did with the SVR, because the NN PCs exhibited similar variations during the training period but unsatisfactory testing results (forecasts). Furthermore, the SVR PC 3D prediction trajectories were not illustrated, because they were inferior to those of the NN.

Fig. 5
figure 5

Forecast initial time from Jan. 2019 to Dec. 2021, 36-case prediction average values of RMSE and ACC versus lead months. The blue curves belong to the SVR, the red curves belong to the NN, and the red dashed curves focus on Niño 3.4 area: a RMSE, b ACC. The gray curves highlight all 36-case prediction values

Figure 5a, b depicts 36-case average forecast RMSE and ACC values. The forecast initial time is from Jan. 2019 to Dec. 2021, with total 36 cases and 48-month predictions. The blue curves belong to the SVR results, and the red curves belong to the NN results. Since SVR forecasts underperform beyond 10 months, 10 is chosen here as the number of lead months. The forecasting ability of the NN was superior to that of the SVR, with an RMSE value between 0.3 and 0.4 and an ACC value between 0.68 and 0.8. Some test cases maintained an ACC value of 0.7 for approximately 18–24 months. On the other hand, for Niño 3.4 area predictions (Fig. 5 red dashed curves), these scores were similar to or even slightly higher than those of SST forecasts in ECMWF seasonal model (Molteni et al. 2011), hybrid coordinate ocean model (Thoppil et al. 2021), and National Centers for Environment Prediction/Environmental Model Center Global Ensemble Forecast System (personal communication with Yuejian Zhu).

An intriguing case occurred between May 2019 and December 2022. SST anomalies revealed El Niño events in the first 2 months, followed by normal events, La Niña events, brief normal events for approximately 2 months, and then La Niña events once again. Figure 6a depicts this evolution of 3D ISOMAP PCs. In Fig. 6c, this actual observation trajectory is represented by a red curve, and the NN forecast trajectory is represented by a blue curve. Initially, the NN was unable to accurately predict this period but the trend was acceptable. As shown in Fig. 6b, the initial 12-month ACC values remained between 0.6 and 0.7, and the RMSE values remained between 0.4 and 0.7. For the final outputs, using predictive PCs, residual PCs, spatial EOFs, and climatological monthly mean SST, physical space SST predictions could be reconstructed. Figure 7a shows at lead times of the 7–9 months SST predictions, and the validation period is from October to December 2021, the same period as the PC trajectories in Fig. 4b. The ACC values in this period time were 0.6–0.7 and the RMSE values were 0.35–0.4. The right column of the Fig. 7b shows the real observation from ERSSTv5 data.

Fig. 6
figure 6

a Three leading SST ISOMAP PC trajectories between May 2019 and December 2022. SST anomalies reveal El Niño events, normal events, La Niña events, normal events again, and then La Niña events. El Niño, normal, and La Niña event are marked by red, yellow, and blue points, respectively. b RMSE and ACC values from the NN during this period. c NN prediction trajectory (blue curve) and true observation (red curve) during this period

Fig. 7
figure 7

SST prediction by NN (left column a) and real SST observation from ERSSTv5 (right column b). The prediction initial time is Mar. 2021 in Fig. 3b. The period from Oct. 2021 to Dec. 2021 is the forecast lead month number 7–9. The ACC value is around 0.6–0.7

To summarize, based on Fig. 6, although the ensemble mean forecast trajectory (Fig. 6c blue curve) primarily followed the second PC variation and not the first PC variation as in the observation period (Fig. 6c red curve), the NN forecasts outperformed the SVR forecasts. At least, the initial 12-month predictions were acceptable. During this period, half of the prediction trajectory samples turned right along the first PC axis positive direction, whereas the other half turned left along the first PC axis negative direction. If the selected ensemble forecasts had large differences in this positive–negative mode, the ensemble average would remove the positive–negative first PC variation and remain the trajectory variation contributed by other PCs that did not differ significantly among the ensemble members. Averaging the results served as a reminder that the data composite analysis would have yielded similar results that would have eliminated the main variation.

Conclusions and discussion

In this study, low-dimensional SST PC points obtained using ISOMAP were used to differentiate clusters and predict variations in time trajectories. By highlighting SST anomaly patterns, these clusters were able to determine the relationships between various ENSO and normal events. With three clusters, the corresponding centroids easily identified El Niño, normal, and La Niña events. The new ENSO index was also used to measure the entire Pacific basin SST anomaly, as opposed to merely the small-area Niño 3.4 index. Overall, the evolution of centroids over time served as an intriguing aspect of ENSO evolution.

By inverting the forecasts with climatological spatial EOFs, we studied the time variations of low-dimensional PC points and used them to predict future SST through SVR and NNs. This method was computationally more efficient than the original high-dimensionality numerical model, and it produced acceptably accurate long-term forecasts. Although the slow change of the SST model was a possible reason for the forecast accuracy, the forecast based on low-dimensional data points was still one of the potential forecast models to obtain the correct SST in a short time.

Currently, our data-driven models are being developed to predict other atmospheric variables, such as wind vector field data. Because the wind vector fields usually have discontinuity in time variation that are difficult for learning algorithms. Other dimensionality reduction techniques, such as self-organizing maps, which successfully cluster zonal wind in boreal summer (Rousi et al. 2022), probably provide the low-dimensional wind information to do NN forecasts. Moreover, we are employing and evaluating autoencoders and diffusion maps that were used to replace ISOMAP for the prediction of atmosphere–ocean fluids. We also extend this data-driven ISOMAP and NN methods for learning historical and ensemble member predictions in low-dimensional space, which can be used for improving ensemble forecasts.

Despite the need for additional tuning of SVR and NN parameters, such as slack variables and the number of neurons, current models have been worked well. However, the optimal number of hidden layers in NNs remains unclear, and other function compositions may yield superior results. Moreover, because of the limited SST sample size, simple models were used to avoid overfitting. Therefore, future research should investigate low-dimensional changes predicted by numerical weather models as a potential solution.