1 Sea Surface Temperature and Tropical Instability Waves

With the development of Earth observation satellites and various active and passive sensors, massive volumes of ocean data have been acquired. For instance, the cumulative satellite data archive volume at the National Oceanic and Atmospheric Administration’s National Centers for Environmental Information reached ~7.5 petabytes in 2016, and the projected volume by 2030 is ~50 petabytes [32]. Many gridded oceanic products (e.g., sea surface temperature (SST), sea surface winds, and sea surface height) have been generated from this deluge of satellite data. These products provide an unprecedented opportunity for in-depth research and demonstrate the urgent need to develop effective methods for exploring time-series data. SST can be measured from space and has the longest history among satellite-derived oceanic products. It is widely used to reveal the evolution of various important oceanic phenomena such as El Niño, western boundary currents, and tropical instability waves (TIWs) [18]. Thus, SST is a critical parameter in understanding physical oceanography, biological oceanography, and atmosphere-ocean interaction; it is also a key input parameter for climate and weather modeling. The models used in traditional statistical analysis have relatively limited complexity, so they may not perform well when applied to oceanic phenomena that are complicated by nature.

Recently, a new research and application front has emerged that exploits the tremendous amount of available data using deep learning (DL) technology. With DL, substantially more complex models can be built to mine rules deeply hidden in SST data. DL is a subset of machine learning that teaches computers to learn and make decisions or predictions based on input data. The deep neural network (DNN) technique is one of the most popular and powerful DL techniques and has achieved successes in computer vision and speech recognition [15, 17]. A DNN is a multilayer neural network (NN). In most network layers of a DNN, input values are weighted, combined, and then transformed by an activation function to incorporate nonlinearity into the network. The output values of a network layer are passed to the next layer as input. All weights of a DNN are iteratively optimized by combining error backpropagation and gradient-based optimization so that the DNN can find the underlying relationship between its inputs and outputs. Such a multilayer structure allows the DNN to learn data features at multiple abstraction levels, which are difficult for humans to conceive [15]. Convolutional layers, named for their mathematical form, are a core type of network layer widely used in DNN models. In a convolutional layer, the output value at a specific site is calculated by weighting and combining the input values at nearby sites. All output sites share the same weights. Thus, a convolutional layer has fewer weights to be optimized than a traditional fully connected layer, which uses independent weights to connect all input and output sites. As a result, convolutional layers are particularly efficient in processing multi-dimensional data. Therefore, compared with traditional statistical models, DNN-based DL models can be much more complex and, after being trained on a large quantity of sample data, can more efficiently learn the inherent characteristics behind the data.
Recently, DL applications in the prediction of future images in videos have drawn extensive attention in the field of computer vision [24, 35]. Ocean SST forecasting is similar to image prediction in videos: future SST maps are forecasted from the previous maps using a DL model. Because of this similarity, we believe DL technology will help us to model oceanic phenomena in a different and promising way that is driven by ever-increasing big ocean data, although DL applications in oceanography and other geosciences have only begun in recent years [31]. Therefore, using the large accumulated amount and long time series of satellite SST data, we can build a purely data-driven SST forecasting model that captures the spatial-temporal variations of a complicated yet important oceanic phenomenon, TIW, which affects the transport of heat, mass, and momentum in the ocean, air-sea and biophysical interactions, climate change, etc. As an internally generated ocean variability with time scales of approximately 15 to 40 days, TIWs produce large perturbations to physical and biological fields in the ocean, including SST. Furthermore, TIW-produced SST perturbations induce almost instantaneous atmospheric surface wind responses, forming TIW-scale interactions between the atmosphere and ocean. Although TIWs are dominantly controlled by the background ocean state, TIW evolution and predictability are affected by air-sea coupling at TIW scales. TIW forecasting is a challenging task because the spatial-temporal variation of TIWs is significant, with large shape distortions and deformations as well as seasonal and interannual variability caused by the El Niño-Southern Oscillation. Numerical modeling of TIWs requires both a high-resolution spatial discretization grid and realistic parameterizations of the relevant physical processes.
All these lead to substantial difficulties in realistically simulating TIW-related oceanic and atmospheric responses and the coupled air-sea interactions. Therefore, a data-driven model was applied to the SST field in the eastern equatorial Pacific Ocean to show that TIW propagation can be forecasted by such a model.

Satellite-derived SSTs have long been assimilated into numerical models to improve their forecasts. Recently, an NN-based strategy was proposed to play a role similar to data assimilation. For example, in [27], an NN model is used to find the bias correction term in a numerical SST forecasting model. Compared with a numerical model, a data-driven forecasting model is much simpler and computationally more efficient. The forecast made by a data-driven model relies only on prior data of a minimal number of physical parameters, or even a single parameter. As another example, an SST pattern time series can be expanded as the sum of products of time-dependent principal components and the corresponding space-dependent eigenvectors following empirical orthogonal function (EOF) analysis. Thus, the forecast of the SSTs at all grids can be approximately reduced to the forecasts of several leading SST principal components [40]. Recently, NN models were developed to forecast SSTs directly without EOF approximations, including both site-specific and site-independent models. A site-specific model accounts for the differences among sites and therefore makes SST forecasts with different NN models at different sites [26]. However, because an NN model must be built for each site, the computational cost of the NN-training phase is high, and sufficient NN-training samples are also required at each site. In a site-independent model, different sites share the same SST forecast model [2, 42, 44], which makes site-independent models more efficient. However, when forecasting a future SST at one site, these recent models only utilize the prior SST series at the nearest neighboring sites. The models may have limitations over a large area because the SST patterns controlled by large-scale phenomena could be related to each other within a vast ocean area. Thus, SST series over a wider area centered at the forecast site should arguably be used to forecast the future SST.

In the following section, we introduce a multi-scale scheme DNN with four stacked composite layers for SST forecasting in the eastern equatorial Pacific Ocean, which overcomes the shortcomings of previous data-driven SST forecasting models. The idea of a multi-scale scheme has achieved notable successes in the field of computer vision, e.g., DNN applications in semantic segmentation [21, 33], but has not been explored in the oceanography field. Considering the natural differences among different sites, we also build a space-dependent but time-independent bias correction map and then combine it with the multi-scale DNN to develop the final data-driven SST forecasting model, named the DL model for brevity.

The developed DL model was applied to forecast the SST pattern variations associated with the TIWs in the eastern equatorial Pacific Ocean. TIWs are an important ocean dynamic phenomenon in both the equatorial Pacific and Atlantic Oceans. They were first captured in current meter records and infrared satellite images in the 1970s [6, 18]. One prominent characteristic of Pacific TIWs is their cusp-shaped, westward-propagating waves on both flanks of the equatorial Pacific cold tongue, with a stronger signal on the north flank. Previous studies have estimated the wavelength, period, and phase speed of TIWs from various data sources, and their values are typically within the ranges of 600 to 2000 km, 15 to 40 days, and 17 to 86 km/day, respectively [3, 4, 12, 13, 19, 28, 29, 38, 39]. Previous studies also suggested that the generation of TIWs could be the result of barotropic and baroclinic instability processes of the meridional and vertical shear among the westward South Equatorial Current, the eastward Equatorial Undercurrent, and the North Equatorial Counter Current [4, 23, 30, 34]. As a result, TIWs are inactive during boreal spring, when the current shear is weaker, and active during boreal fall, when it is stronger. Moreover, TIWs are suppressed and even indiscernible during strong El Niño years, when the Pacific cold tongue and the related equatorial current shear are too weak, and the opposite holds during La Niña years [39]. Conversely, TIWs also feed back on the El Niño-Southern Oscillation, affecting its asymmetry and irregularity [1, 10, 11]. The physical and biological processes of TIWs are complicated. As has been widely illustrated, TIWs have a profound effect on the distribution of SST, sea surface height anomaly, chlorophyll-\(\alpha \), rain, salinity, and winds in the eastern equatorial Pacific Ocean [3, 14, 28, 29, 38]. TIWs induce horizontal convection and vertical mixing in the upper ocean [12, 13, 20, 25].
The mixing reaches even the lower half of the thermocline, a fact that is still not well considered in most physical models [20]. TIWs affect the equatorial chlorophyll-\(\alpha \) concentration by transporting nutrients to the upper ocean [7, 9, 43]. Conversely, modeling analyses indicate that chlorophyll-\(\alpha \) may modulate solar radiation in the upper ocean and weaken TIWs [36, 37]. TIWs also interact with the atmosphere because of the sea surface wind modulation caused by the TIW-induced SST anomalies [21, 41, 45, 46, 47]. Moreover, a spatial correlation between SST and cloud patterns is observed during the TIW seasons. The clouds appearing in the warm troughs of the TIWs are usually generated by cool low-level winds crossing the SST fronts and, in turn, dampen the TIW-induced SST anomalies by reducing the incident solar radiation over the warm troughs [5]. The development of more comprehensive physical models for TIW studies is still ongoing, and many of the above-mentioned aspects should be considered to make the models more realistic [12, 14, 20, 36, 37, 45, 46, 47], which is a difficult challenge. In contrast, the time series of data already contain the effects of all these factors. Owing to its strong data-mining ability, a data-driven DL model can automatically learn comprehensive rules of SST spatial-temporal variations from the data without having to describe the various complex processes with physical equations.

2 Data and Model of SST Forecasting

There are two parts in the model: a DNN and a constant map. The DNN is multi-scale, having a network structure of four stacked composite layers for different spatial resolutions. The DNN uses the SSTs from the preceding fourteen steps to estimate the SSTs at the following step. The interval between the two steps is five days. The DNN-made estimation is followed by the correction with the constant map for reducing bias. The details are given below.

2.1 Satellite Remote Sensing SST Data

The DL model was built and tested with the SST products of Remote Sensing Systems. The products were made from both microwave and infrared sensor measurements. Our study area is a rectangular region spanning from 120 \(^\circ \)W to 180 \(^\circ \)W in longitude and from 10 \(^\circ \)S to 10 \(^\circ \)N in latitude. The products from 2006 to 2019 were collected for our study. These 9-km-grid products were averaged to 18-km-grid SST data. The SST data were divided into two parts according to time. The first part (1st Jan 2006–31st Dec 2009) and the second part (1st Jan 2010–31st Mar 2019) were used to build and test the DL model, respectively. Considering that TIWs have a temporal scale of about fifteen to forty days, the time step of the DL model is set to five days. Based on the SST maps at the preceding thirteen steps and the current step, the DL model forecasts the SST map at the following time step, five days ahead. Therefore, a sample in our study is an SST series consisting of fifteen consecutive SST maps. The series was then shifted day by day to obtain the second, third, fourth, etc., samples. The DL model forecasts the fifteenth-step SST map in each series based on the SST maps of the first fourteen steps. The forecasted SST map was then validated against the fifteenth-step SST map of the series. Approximately one thousand four hundred series samples were generated from the first part of the SST data, and approximately three thousand three hundred series samples were generated from the second part. It should be noted that a significant El Niño event occurred during 2014–2016, which is covered by the second part of the SST data.
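
The sample construction described above can be sketched as a sliding window over a daily SST stack. This is a minimal illustration, not the authors' code: the function name `build_series`, the array layout (time, latitude, longitude), and the random toy data are assumptions made for the example.

```python
import numpy as np

def build_series(daily_sst, n_maps=15, step_days=5):
    """Cut a daily SST stack (time, lat, lon) into overlapping series samples.

    Each sample holds n_maps maps spaced step_days apart, and consecutive
    samples start one day apart. The first n_maps - 1 maps of a sample are
    the model input and the last map is the forecast target.
    """
    span = (n_maps - 1) * step_days           # days between first and last map
    n_samples = daily_sst.shape[0] - span
    if n_samples <= 0:
        raise ValueError("record is shorter than one series")
    offsets = np.arange(n_maps) * step_days   # 0, 5, ..., 70 for the defaults
    starts = np.arange(n_samples)[:, None]    # one start day per sample
    series = daily_sst[starts + offsets]      # (n_samples, n_maps, lat, lon)
    return series[:, :-1], series[:, -1]

# Toy record (assumed): 1461 daily maps, the length of 2006-2009, on a 4 x 6 grid.
daily = np.random.default_rng(0).normal(size=(1461, 4, 6)).astype(np.float32)
inputs, targets = build_series(daily)
print(inputs.shape, targets.shape)
```

With a four-year daily record of 1461 maps, each series spans 70 days, so 1391 series result, which is in line with the roughly one thousand four hundred samples quoted for the first part of the data.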

Fig. 1 The DL model receives SST maps at the previous and current time steps and then outputs the SST map at the future time step. The major part of the DL model is a DNN having four stacked composite layers. The bias correction map is added to the DNN output to obtain the forecasted SST map

2.2 Architecture and Training of the DL Model

As shown in Fig. 1, the DL model is composed of a trained multi-scale DNN and a time-independent bias-correction map. The DNN is a stack of four composite layers, and each composite layer has four cascaded convolutional layers. In this region, SST values range from 16 \(^\circ \)C to 34 \(^\circ \)C, and this range was rescaled to [−1, 1]. To feed the SST maps to the composite layers at different stack levels, a 2 \(\times \) 2 average pooling operation was used to down-sample them. The composite layers thus process the SST maps at different spatial resolutions: the lower the stack level, the higher the resolution. Except at the top level, each higher-resolution composite layer at a lower stack level requires the up-sampled output of the composite layer at the upper stack level. The input of the DNN consists of 14 SST maps at the current step and the previous 13 steps. Because the input SST map at the current time step is more strongly correlated with the future SST map, the DL model also links the input SST map at the current step directly to the last convolutional layer, along with the up-sampled output of the lower-resolution composite layer at the upper stack level. The rectified linear unit function has better error gradient propagation [8], so it was used as the activation for the first three convolutional layers of each composite layer. The tanh activation was used for the last convolutional layer of each composite layer except the bottom one. The tanh activation rescales the output of each composite layer to [−1, 1], which matches the input range of the higher-resolution composite layer to which the output is fed after up-sampling. The activation of the last convolutional layer of the bottom composite layer is a linear function, which leaves the DNN output unbounded. The four convolutional layers of each composite layer have 8, 16, 32, and 1 channels, respectively. The kernel sizes of the four convolutional layers of the top composite layer are all 3 \(\times \) 3. Those of the other composite layers are 5 \(\times \) 5, 3 \(\times \) 3, 3 \(\times \) 3, and 5 \(\times \) 5, respectively.
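
The input preparation for the four stack levels can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, grid sizes are assumed to be divisible by 8, and nearest-neighbour up-sampling is an assumption because the text does not specify the up-sampling method used inside the DNN.

```python
import numpy as np

SST_MIN, SST_MAX = 16.0, 34.0  # range quoted in the text, in degrees Celsius

def rescale(sst):
    """Map [SST_MIN, SST_MAX] linearly onto [-1, 1]."""
    return 2.0 * (sst - SST_MIN) / (SST_MAX - SST_MIN) - 1.0

def avg_pool_2x2(x):
    """2 x 2 average pooling over the last two axes (even sizes assumed)."""
    h, w = x.shape[-2:]
    return x.reshape(*x.shape[:-2], h // 2, 2, w // 2, 2).mean(axis=(-3, -1))

def upsample_2x(x):
    """Nearest-neighbour 2x up-sampling over the last two axes (assumed method)."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def build_pyramid(maps, levels=4):
    """Return the inputs for each stack level, finest (bottom) level first."""
    pyramid = [maps]
    for _ in range(levels - 1):
        pyramid.append(avg_pool_2x2(pyramid[-1]))
    return pyramid

# Toy input (assumed): 14 maps of constant 25 degC on a 16 x 32 grid.
maps = rescale(np.full((14, 16, 32), 25.0))
for level, m in enumerate(build_pyramid(maps)):
    print(level, m.shape)
```

Each pooling step halves both grid dimensions, so three poolings give the four resolutions processed by the four composite layers.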

For a general network layer, one site in the output map is connected to multiple sites in the input map. Thus, the value at the output site depends only on the values at these input sites rather than on the whole input map. These input sites form the receptive field of the output site. For instance, the input sites inside a receptive field of a convolutional layer are weighted and connected to the corresponding output site by the convolution kernel. The receptive field can be enlarged by using average pooling layers to down-sample the inputs before feeding them to the subsequent layer. The output can then be passed through the same number of up-sampling layers to restore the resolution. SST variations at different locations may be correlated through large-scale oceanic phenomena. Considering this, we use the SST series of a wider area to forecast the SST at the area center. Therefore, the DNN is designed to be multi-scale to obtain a wider receptive field. After three down- and up-samplings among the four composite layers, the receptive field of the whole DNN is enlarged by about twelve times. This size is large enough for forecasting TIWs.
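
As a rough worked example of how kernel sizes and down-sampling set the receptive field, the sketch below computes the receptive field width of one composite layer from the kernel sizes listed earlier, assuming stride-1 convolutions. It does not attempt to reproduce the twelve-fold figure for the whole DNN, which also depends on how the levels are chained; the function name is hypothetical.

```python
def receptive_field(kernel_sizes):
    """Receptive field width (in cells) of stacked stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1          # each k x k layer adds k - 1 cells per axis
    return rf

top = receptive_field([3, 3, 3, 3])     # kernels of the top composite layer
other = receptive_field([5, 3, 3, 5])   # kernels of the other composite layers

# One cell at stack level j (0 = bottom) spans 2**j cells of the finest grid
# along each axis, so a coarse-level receptive field covers a wide area.
for level, rf in enumerate([other, other, other, top]):
    print(level, rf, rf * 2 ** level)
```

Under these assumptions a single composite layer sees 9 cells (top level) or 13 cells (other levels) per axis at its own resolution, and the same window covers a proportionally wider area of the finest grid at coarser levels.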

The SST-map-series samples for building the DNN were divided into the training and validation datasets, according to the ratio of 3:1. The input area is set to be larger than the output (forecast) area in order to ensure that the input area covers the whole DNN receptive field. The following loss function is used to optimize the DNN:

$$\begin{aligned} Loss = \sum _{k = 1}^{K}\ \sum _{(m,n) \in \text{Grids}_{\text{output}}}\left( \text{SST}_{\text{output}}^{(k)}(m,n) - \text{SST}_{\text{true}}^{(k)}(m,n) \right) ^{2} \end{aligned}$$

where \(\text{SST}_{\text{true}}^{(k)}(m,n)\) is the fifteenth-step satellite SST and \(\text{SST}_{\text{output}}^{(k)}(m,n)\) is the DNN-forecasted SST. The index k denotes the kth sample, and K is the number of samples in the training or validation dataset. (m, n) denotes a grid of the output area, and \(\text{Grids}_{\text{output}}\) is the set of output grids. The Adam algorithm [16] was used to optimize the DNN parameters on the training dataset, and the maximum number of epochs was set to 2500. The optimization was implemented with CUDA on an NVIDIA Quadro M4000 graphics card with eight GB of memory. To avoid overfitting to the training dataset, the loss value on the validation dataset was also calculated during the optimization procedure. The smallest validation loss was achieved at the one hundred and twenty-ninth epoch, after about one hundred and fourteen minutes. The parameter values corresponding to this smallest loss value were adopted.
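
The loss and the validation-based choice of parameters can be sketched as below. This is a minimal illustration: the function names, the toy arrays, and the toy loss history are assumptions, and the actual optimization loop (Adam on a GPU) is not shown.

```python
import numpy as np

def sse_loss(sst_output, sst_true):
    """Sum of squared errors over all samples k and output grids (m, n)."""
    return float(np.sum((sst_output - sst_true) ** 2))

def best_epoch(val_losses):
    """Return the 1-based epoch with the smallest validation loss."""
    return int(np.argmin(val_losses)) + 1

# Toy data (assumed): 8 samples on a 10 x 20 output grid.
rng = np.random.default_rng(0)
truth = rng.normal(size=(8, 10, 20))
output = truth + 0.1
print(sse_loss(output, truth))

# Toy validation-loss history (assumed): keep the parameters of the best epoch.
print(best_epoch([0.9, 0.5, 0.4, 0.45, 0.6]))
```

Keeping the parameters from the epoch with the lowest validation loss, rather than from the final epoch, is the overfitting safeguard described in the text.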

The parameters of the convolutional layers are the same for all sites. In addition, neither the average pooling layers nor the up-sampling layers have any optimizable parameters. Thus, the DNN is independent of the site. However, the environmental background of the study area is inhomogeneous: there is a spatial trend in which the SST is generally higher in the west than in the east. This may cause differences in the evolution of the SST patterns among areas. Therefore, an SST correction map is included in the DL model and is added to the DNN-forecasted SST map to make the final forecast (Fig. 1). This SST correction map is generated from the samples of the training period by calculating the bias of the optimized DNN at each grid.
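
A minimal sketch of such a correction map is given below. The sign convention is an assumption: the map is taken as the per-grid mean of (satellite SST minus DNN forecast) over the training samples, so that adding it to a DNN forecast removes the mean bias. The function names and toy data are hypothetical.

```python
import numpy as np

def bias_correction_map(dnn_forecasts, sst_true):
    """Per-grid mean of (truth - DNN forecast) over the training samples."""
    return np.mean(sst_true - dnn_forecasts, axis=0)

def final_forecast(dnn_forecast, correction_map):
    """Add the time-independent correction map to a DNN-forecasted map."""
    return dnn_forecast + correction_map

# Toy data (assumed): the DNN output carries a fixed site-dependent offset.
rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 4, 6))
offset = np.linspace(-0.2, 0.2, 24).reshape(4, 6)
dnn = truth - offset
corr = bias_correction_map(dnn, truth)
print(np.abs(final_forecast(dnn[0], corr) - truth[0]).max())
```

Because the map depends on the grid position but not on time, it restores some site dependence to an otherwise site-independent DNN at negligible cost.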

The operating efficiency of the developed DL model is very high. It only takes about 1 minute to forecast SSTs for all testing samples on an ordinary desktop computer.

3 SST Forecast of TIW Motion Using the DL Model During the Testing Period (2010/01–2019/03)

Figure 2a–c shows the satellite SST maps at three time steps of the testing period, and Fig. 2d–f shows the corresponding SST maps forecasted by the DL model. The maps match closely in shape. The most notable feature is the westward-propagating TIWs, which appear as cusp-shaped waves with irregular deformations.

Fig. 2 Satellite SST maps a to c and DL-forecasted SST maps d to f

Figure 3 shows the outputs of the four composite layers in Fig. 1 at three consecutive time steps, visualized from the first (bottom) to the fourth (top) stack level of the DNN. For the sake of clarity, the coarse-resolution results at higher levels are converted to the initial resolution using nearest neighbor interpolation, and the results are then rescaled to [−1, 1]. All outputs show a westward-propagating signal similar to that in the satellite SST maps in Fig. 2a–c. These maps are extracted from the DNN during the training period (2006–2009) and show the temporal and spatial characteristics of TIWs. The related parameters of the network are learned by the DNN from the sample data. The motion of TIWs can be forecasted from these features.

Fig. 3 The outputs at three consecutive time steps (the same as the steps in Fig. 2) of the fourth-(top)-stack-level composite layer of the DNN a to c, the third-stack-level composite layer d to f, the second-stack-level composite layer g to i, and the first-(bottom)-stack-level composite layer j to l

Fig. 4 Procedure of estimating SST pattern zonal speed: a Two zonal sequences of SST MAs at the longitudes of the grids are calculated from the SST maps at two times, where blue and red denote the first time and the second time, respectively. The first sequence is satellite MAs, and the second sequence is forecasted by the DL model. b Two sequences of SST MAs after their linear trends are removed. c Cross-correlations of the two sequences after the linear trends are removed. d An enlarged image of the green box in Fig. 4c, where the three green points are the maximum discrete cross-correlation and the two cross-correlations at the neighboring discrete zonal lags. e The three green points can be interpolated with a quadratic curve (black line), and the zonal lag corresponding to the peak of the curve is considered as the exact zonal lag with the maximum cross-correlation. The speed can then be estimated by dividing the exact zonal lag by the time interval

The forecasted and satellite SST maps’ meridional averages (MAs) are calculated. The maximum detrended cross-correlation between the MAs at the current time step and the next step along the equator can estimate the westward propagation speed of the SST pattern.

During TIW seasons, the MAs calculated from the SST maps reflect the westward-propagating signal of the SST pattern. The forecast area has an approximately linear zonal SST trend, warm in the west and cold in the east, on which this signal is superimposed. An example of two zonal sequences of SST MAs at the longitudes of the grids of the forecast area and at two consecutive time steps is given in Fig. 4a. The red line represents the MAs of the DL-forecasted SST map five days (one time step) later, and the blue line represents the MAs of the satellite SST map at the current time step. The westward propagation of the signal becomes more obvious after the linear zonal trend of the SST MAs is removed (Fig. 4b). The cross-correlations of the two detrended SST MA sequences are calculated at discrete zonal lags (Fig. 4c), and the discrete lag with the maximum cross-correlation and its two neighboring discrete lags are identified (Fig. 4d). The cross-correlations at these three discrete lags are interpolated with a quadratic curve. The lag at the peak of the interpolated curve is taken as the exact lag of the maximum cross-correlation between the two detrended SST MA sequences (Fig. 4e). In mathematical form, this is

$$\begin{aligned} lag_{\text{exact}} = \frac{1}{2} \cdot \frac{y_{1}\left( lag_{2}^{2} - lag_{3}^{2}\right) + y_{2}\left( lag_{3}^{2} - lag_{1}^{2}\right) + y_{3}\left( lag_{1}^{2} - lag_{2}^{2}\right) }{y_{1}\left( lag_{2} - lag_{3}\right) + y_{2}\left( lag_{3} - lag_{1}\right) + y_{3}\left( lag_{1} - lag_{2}\right) } \end{aligned}$$

where \(lag_{1}\), \(lag_{2}\), and \(lag_{3}\) are the three discrete lags, and \(y_{1}\), \(y_{2}\), and \(y_{3}\) are the corresponding cross-correlations. Finally, the propagation speed is obtained by dividing the exact lag, expressed as a zonal distance, by the time interval.
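
The speed-estimation procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cross-correlation is computed as a Pearson correlation over the overlapping part of the two sequences, a positive shift means motion toward increasing array index (whether that is westward depends on how longitudes are ordered), and the 18-km grid spacing and five-day interval are taken from the data description.

```python
import numpy as np

def detrend(seq):
    """Remove the least-squares linear trend from a 1-D sequence."""
    x = np.arange(seq.size)
    slope, intercept = np.polyfit(x, seq, 1)
    return seq - (slope * x + intercept)

def exact_lag(lags, y):
    """Peak position of the parabola through three (lag, correlation) points."""
    l1, l2, l3 = lags
    y1, y2, y3 = y
    num = y1 * (l2**2 - l3**2) + y2 * (l3**2 - l1**2) + y3 * (l1**2 - l2**2)
    den = y1 * (l2 - l3) + y2 * (l3 - l1) + y3 * (l1 - l2)
    return 0.5 * num / den

def zonal_shift(ma_now, ma_next, max_lag):
    """Sub-grid shift (in grid cells) of ma_next relative to ma_now."""
    a, b = detrend(ma_now), detrend(ma_next)
    lags = np.arange(-max_lag, max_lag + 1)
    n = a.size
    corr = np.empty(lags.size)
    for i, lag in enumerate(lags):
        # Pair a[j] with b[j + lag] over the overlapping part only.
        if lag >= 0:
            u, v = a[: n - lag], b[lag:]
        else:
            u, v = a[-lag:], b[: n + lag]
        corr[i] = np.corrcoef(u, v)[0, 1]
    p = int(np.argmax(corr))
    p = min(max(p, 1), lags.size - 2)   # keep both neighbours inside the range
    return exact_lag(lags[p - 1 : p + 2], corr[p - 1 : p + 2])

def speed_km_per_day(shift_cells, grid_km=18.0, dt_days=5.0):
    """Convert a shift in grid cells per time step to km/day."""
    return shift_cells * grid_km / dt_days

# Toy sequences (assumed): a wave shifted by 3.4 cells on top of a linear trend.
x = np.arange(400)
ma_now = np.sin(2 * np.pi * x / 20) + 0.01 * x
ma_next = np.sin(2 * np.pi * (x - 3.4) / 20) + 0.01 * x
shift = zonal_shift(ma_now, ma_next, max_lag=8)
print(shift, speed_km_per_day(shift))
```

The quadratic interpolation is what allows speeds to be resolved more finely than one grid cell per time step, which would otherwise quantize the estimates to multiples of 3.6 km/day for the grid spacing and interval assumed here.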

Figure 5 shows that the estimated speeds mainly range from 0 to 100 km/day [3, 4, 12, 13, 19, 28, 29, 38, 39]. The green solid curve represents the SST pattern propagation speed forecasted by the DL model. The red dashed curve represents the speed estimated from the satellite/satellite SST MA pairs. The two curves are in good agreement, and both show very consistent seasonal fluctuations of TIWs. In the TIW season, TIWs control the motion of the SST pattern. Thus, the DL-forecasted SST pattern propagation speed can be regarded as the TIW speed. In contrast, the SST pattern is inert, with no apparent westward motion, in the no- or weak-TIW seasons.

Fig. 5 Temporal variation of the SST pattern associated with TIW westward propagation during the testing period. We calculated the MAs of the satellite and forecasted SST maps and then estimated the speed of the SST pattern westward propagation based on the maximum detrended cross-correlation along the equator between the MAs of the satellite SST map at the current time step and those of the satellite or predicted SST map (brown dashed curve: the speed calculated by satellite/satellite pairs, green solid curve: the speed calculated by satellite/DL-predicted pairs, orange dotted curve: the daily Niño3.4 index)

The DL model can also forecast recursively. In this recursive frame, the forecasted SST, the present satellite SST, and the previous 12 satellite SSTs were used to forecast the SST at the second recursive step, and then, the two forecasted SSTs, the current satellite SST and the previous 11 satellite SSTs were utilized to forecast the SST at the third recursive step. Therefore, the DL model recursively forecasts the SST in the subsequent steps (the fourth, fifth, sixth, etc. recursive steps). Figure 6 shows an example of the recursively forecasted SST maps at the subsequent three time steps after the final time step in Fig. 2. As can be seen from the figure, the DL model can still work well and forecast the TIWs’ westward motion in general.
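
The recursive scheme amounts to sliding the 14-map input window forward and filling it with forecasts. A minimal sketch follows; `step_fn` is a hypothetical stand-in for the trained DL model (DNN plus bias correction map), and the toy forecaster used here is purely illustrative.

```python
import numpy as np

def recursive_forecast(window, step_fn, n_steps):
    """Roll a one-step forecaster forward n_steps times.

    window  : array (14, lat, lon), oldest map first, current map last
    step_fn : callable mapping a (14, lat, lon) window to one (lat, lon) map
    """
    window = np.array(window, dtype=float)   # work on a copy
    forecasts = []
    for _ in range(n_steps):
        nxt = step_fn(window)
        forecasts.append(nxt)
        # Drop the oldest map and append the newest forecast.
        window = np.concatenate([window[1:], nxt[None]], axis=0)
    return np.stack(forecasts)

# Toy forecaster (assumed): last map plus one, standing in for the DL model.
window0 = np.zeros((14, 4, 6))
toy_step = lambda w: w[-1] + 1.0
out = recursive_forecast(window0, toy_step, n_steps=30)
print(out.shape)
```

After 14 recursive steps the window contains forecasts only, so errors made early in the recursion propagate into every later input.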

Fig. 6 Satellite-observed SST a to c and DL model-forecasted SST d to f

4 Interannual Variation in TIW Westward Propagation

The daily Niño3.4 index data are also overlaid on Fig. 5 and denoted by the orange dotted curve. The data were provided by the KNMI (Royal Netherlands Meteorological Institute) Climate Explorer. Figure 5 shows that the DL-forecasted TIW speed values and the Niño3.4 index values are 180 degrees out of phase. A major El Niño event occurred from 2014 to 2016, and the TIW speeds were almost zero during this time because of the weakening of the meridional SST gradients. Mooring and Argo float measurements from 2000 to 2010 also support this result: TIW kinetic energy and occurrence probability are negatively correlated with the Niño3.4 index [11]. The correlation coefficient between the Niño3.4 index values and the speed values estimated from satellite/satellite SST MA pairs is −0.38, with a P-value close to zero and a 95% confidence interval of (−0.41, −0.35). The corresponding values for the DL-forecasted speeds are −0.53, a P-value close to zero, and (−0.55, −0.50).
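
For reference, one common way to obtain a correlation coefficient with a 95% confidence interval is the Fisher z-transform, sketched below. The text does not state how its intervals were computed, so the method is an assumption, as are the function name and the toy data. The interval also assumes independent samples, which overlapping daily series do not strictly satisfy.

```python
import numpy as np

def pearson_with_ci(x, y, z_crit=1.96):
    """Pearson r and an approximate 95% CI from the Fisher z-transform."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)                         # Fisher transform of r
    half = z_crit / np.sqrt(x.size - 3)       # standard error times z_crit
    return r, (np.tanh(z - half), np.tanh(z + half))

# Toy data (assumed): an index and a speed series that is anti-correlated with it.
rng = np.random.default_rng(0)
nino = rng.normal(size=3000)
speed = -0.5 * nino + rng.normal(size=3000)
r, (lo, hi) = pearson_with_ci(nino, speed)
print(round(r, 2), round(lo, 2), round(hi, 2))
```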

Fig. 7 Zonal TIW westward propagation speeds at 2-degree latitude bands. a Distribution of speeds estimated from satellite/satellite SST MA pairs. b Distribution of forecasted speeds estimated from satellite/DL-forecasted SST MA pairs. The white blanks denote outliers beyond the range from 0 to 100 km/day

5 Zonally Westward Propagation of TIWs

Figure 7 gives the zonal TIW westward propagation speeds at 2-degree latitude bands, which were estimated from the satellite/satellite maps and the satellite/DL-forecasted SST maps, respectively. As can be seen from the figure, the estimated speed distributions are consistent with each other and their temporal fluctuations are similar during TIW seasons. The fluctuations are also similar to the curves in Fig. 5. Furthermore, the equatorial bands have higher speeds than the higher-latitude bands. All these results are in agreement with the previous findings for the reason that TIWs at different latitudes are controlled by different dynamic mechanisms with their speeds determined by equatorial wave processes [22, 38].

6 Accuracy During the Testing Period (2010/01–2019/03)

The temporal variations of the root mean square error (RMSE) and bias of the DL model during the testing period are given in Fig. 8. The RMSE and bias are generally stable. The RMSE fluctuates between 0.15 \(^\circ \)C and 0.45 \(^\circ \)C, while the bias fluctuates between \(-0.15\,^\circ \)C and 0.15 \(^\circ \)C. Because the SST pattern changes rapidly, the RMSE of the DL model is larger during the TIW seasons (Fig. 8a). There are approximately 3300 samples at each grid point. The RMSE and bias at each grid were calculated, and their spatial distributions are given in Fig. 9. The RMSE in the cold tongue area is higher than in other areas. This is caused by the large spatial gradient and fast temporal variation of the SST in the cold tongue area. Over all grids and all samples in the study area, the global RMSE is 0.29 \(^\circ \)C and the global bias is \(-0.01\,^\circ \)C.
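
The three kinds of statistics differ only in the axes over which the forecasting errors are averaged, as the sketch below illustrates. The function name, the array layout (sample, latitude, longitude), and the sign convention for bias (forecast minus satellite SST) are assumptions.

```python
import numpy as np

def error_stats(forecast, truth):
    """RMSE and bias per sample, per grid, and over everything."""
    err = forecast - truth                    # shape (sample, lat, lon)
    return {
        "rmse_per_sample": np.sqrt(np.mean(err**2, axis=(1, 2))),  # temporal
        "bias_per_sample": np.mean(err, axis=(1, 2)),
        "rmse_per_grid": np.sqrt(np.mean(err**2, axis=0)),         # spatial
        "bias_per_grid": np.mean(err, axis=0),
        "rmse_global": float(np.sqrt(np.mean(err**2))),
        "bias_global": float(np.mean(err)),
    }

# Toy data (assumed): forecasts that are uniformly 0.5 degC too warm.
truth = np.zeros((20, 4, 6))
stats = error_stats(truth + 0.5, truth)
print(stats["rmse_global"], stats["bias_global"])
```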

Fig. 8 RMSE a and bias b temporal trends. The RMSE and bias temporal trends were calculated sample by sample from the forecasting errors at all grids

Fig. 9 RMSE a and bias b spatial distributions. The RMSE and bias spatial distributions were calculated grid by grid from the forecasting errors of all samples

For recursive forecasting, the global RMSE and bias of the DL model from 5 days to 150 days after the current time step (i.e., recursive steps 1 to 30) are given in Fig. 10. The accuracy of the DL model declines as the forecast horizon increases. It should be noted that no satellite SST remains in the model input after 14 recursive steps. Even so, the RMSE does not grow rapidly and is still smaller than 0.80 \(^\circ \)C at the 15th recursive step. Meanwhile, the magnitude of the DL model’s bias is still smaller than 0.10 \(^\circ \)C at the 30th recursive step.

Fig. 10 The global RMSE and bias of the recursively implemented DL model as a function of the number of recursive steps. In the recursive model, the DL-forecasted SST at a future time step is fed back to the model input to forecast the SST at the next future time step. The recursive steps from 1 to 30 correspond to 5 days to 150 days after the current time step. After 14 recursive steps, there is no satellite SST map at the model input, and all input SST maps are from the model’s forecast. The global RMSE and bias were calculated from the forecasting errors of all samples at all grids at each recursive step

7 Conclusions

In this chapter, a data-driven DL SST forecasting model using the DNN technique was built. The DL model accurately forecasted the spatial-temporal variation of the SST pattern with an RMSE of 0.29 \(^\circ \)C, and the forecasted TIW propagation agrees well with actual satellite observations.

The DL model is different from previous models. It consists of a multi-scale DNN with four stacked composite layers and a time-independent but site-dependent bias correction map. With this design, the DL model takes into consideration both the dependence of a site-specific forecast on a large surrounding area and the bias of the DNN at different sites. The DL model was tested over nine years that do not overlap with the training period. The results show that the DL model effectively forecasts the SST variation associated with TIWs. The DL-forecasted TIW speed is in good agreement with that estimated from the satellite SST maps. Both speeds present a consistent seasonal cycle and interannual modulation, and the interannual modulation is negatively correlated with the Niño3.4 index. TIW speeds are higher near the equator than at other latitudes. The DL model can also forecast SSTs at future steps in a recursive manner, although the accuracy degrades with time because of the loss of actual satellite SST input.

The results of the developed model show the great potential of DNNs for marine forecasting with gridded data. Compared with numerical forecasting models, DL forecast models are driven directly by real measurements and avoid the complexity of model parameterizations and approximations, various physical equations, and a substantial computational burden. DL models are able to forecast accurately with prior information on only a few physical parameters; in our case, only one parameter, SST, was used. Almost all of the DL model’s computational cost is spent on the iterative optimization of the weights. Hardware technologies such as CUDA can readily speed up this learning procedure. Once the DNN has been trained and the bias correction map has been obtained, the DL model makes forecasts without iteration and therefore works very rapidly. In our case, it takes only about one minute to forecast the SST patterns of the testing period on an ordinary desktop computer. Because the DNN is a data-driven technology, sufficient data are always the basic requirement, whether for training or for use. Fortunately, the growing amount of marine satellite observations in the era of remote sensing big data meets this requirement and suits the DNN’s outstanding learning capability.