
Improving statistical prediction and revealing nonlinearity of ENSO using observations of ocean heat content in the tropical Pacific


Abstract

It is well known that upper ocean heat content (OHC) variability in the tropical Pacific contains valuable information about the dynamics of the El Niño–Southern Oscillation (ENSO). Here we combine sea surface temperature (SST) and OHC indices derived from gridded datasets to construct a phase space for data-driven ENSO models. Using a Bayesian optimization method, we construct both linear and nonlinear models for these indices. We find that the joint SST-OHC optimal models yield significant benefits in predicting both SST and OHC compared with the separate SST or OHC models. We show that these models substantially reduce the seasonal predictability barriers in each variable: the spring barrier in the SST index and the winter barrier in the OHC index. We also reveal significant nonlinear relationships between the ENSO variables manifesting on interannual scales, which opens prospects for improving yearly ENSO forecasting.


Data availability statement

The ERSST data set was downloaded from the NOAA National Centers for Environmental Information (Huang et al. 2017). The OHC data provided by the Institute of Atmospheric Physics Chinese Academy of Sciences are available at https://pan.cstcloud.cn/s/sloceeVQjo. The monthly CO\(_2\) trends provided by the NOAA Global Monitoring Laboratory are available at https://gml.noaa.gov/aftp/products/trends/co2/.


Acknowledgements

This research was supported by Grant #19-42-04121 from the Russian Science Foundation (selecting the state variables and constructing the data-driven models of ENSO). It was also supported by Grant #19-02-00502 from the Russian Foundation for Basic Research (studying the ENSO phase locking to the annual cycle) and by project #075-02-2022-875 of the Program for the Development of the Regional Scientific and Educational Mathematical Center “Mathematics of Future Technologies” (Bayesian algorithm for model optimization). The authors are grateful to Prof. Alexander M. Feigin and Dr. Andrey Gavrilov from the Institute of Applied Physics of RAS for fruitful discussions, as well as to the anonymous reviewer for useful comments.


Author information


Contributions

The authors contributed equally to the methodology and analysis of the results. AS performed calculations and wrote the initial manuscript.

Corresponding author

Correspondence to Aleksei Seleznev.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Bayesian approach to ENSO model learning and optimization

Here we outline the Bayesian approach we use for learning and optimizing the stochastic model (2). The optimal model inferred from observed data should strike the right balance between a “too simple” model that describes the data poorly and a “too complex” model that contains too many parameters and tends to be overfitted to the available sample rather than capturing the laws underlying the dynamics. Let \({\mathbf {H}}=\left\{ H_1,H_2,\ldots ,H_i,\ldots \right\}\) be the set of possible hypotheses about the model complexity. In the case of the stochastic model (2), each hypothesis \(H_i\) is determined by a particular combination of hyperparameters: \(l\) and \(q\) for the linear parameterization (3)–(4) of the deterministic part \({\mathbf {f}}\), or \(l\) and \(m\) for the nonlinear parameterization (5). According to Bayes' rule, the probability \(P(H_{i}|{\mathbf {Y}})\) that the model \(H_i\) produced the observed time series \({\mathbf {Y}}=({\mathbf {p}}_1,\ldots ,{\mathbf {p}}_N)\) is given by:

$$\begin{aligned} P(H_{i}|{\mathbf {Y}})=\frac{P({\mathbf {Y}}|H_{i} )P(H_{i} )}{\sum _{i}P({\mathbf {Y}}|H_{i})P(H_{i})}. \end{aligned}$$
(9)

Here the probability density function (PDF) \(P({\mathbf {Y}}|H_{i} )\) is the evidence (marginal likelihood) of the model \(H_i\), characterizing the probability that the observed data \({\mathbf {Y}}\) belong to the ensemble of time series that the model \(H_i\) can produce; \(P(H_{i} )\) is the prior probability of the model \(H_i\). The denominator in (9) is a normalization term that does not depend on \(H_i\). Assuming all models in \({\mathbf {H}}\) are equiprobable a priori, expression (9) can be rewritten as \(P(H_{i}|{\mathbf {Y}})=\alpha P({\mathbf {Y}}|H_{i} )\), where \(\alpha\) is independent of \(H_i\). Let us define the Bayesian criterion of model optimality

$$\begin{aligned} L=-\log {P({\mathbf {Y}}|H_{i} )}, \end{aligned}$$
(10)

whose minimization leads to maximization of the PDF \(P({\mathbf {Y}}|H_{i} )\). The optimality criterion (10) has a clear interpretation. If the model \(H_i\) is too simple, then the observed data likely lie on a tail of the PDF \(P({\mathbf {Y}}|H_{i} )\), so the probability that the observed data \({\mathbf {Y}}\) could be produced by such a model is small. In contrast, an overfitted model, due to its large number of parameters, produces a widely spread population of different datasets, which again lowers the PDF of the observed \({\mathbf {Y}}\). The criterion (10) therefore helps to select a model that is neither too simple nor overfitted.
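As a simple numerical illustration of Eqs. (9)–(10), the following minimal Python sketch (not part of the original analysis) converts already computed negative log-evidences \(L_i\) into posterior model probabilities under a uniform prior over the hypotheses:

```python
import numpy as np

def model_posteriors(neg_log_evidences):
    """Posterior probabilities P(H_i|Y) under a uniform prior, cf. Eq. (9).

    neg_log_evidences : sequence of L_i = -log P(Y|H_i), one per candidate model.
    """
    L = np.asarray(neg_log_evidences, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability.
    w = np.exp(-(L - L.min()))
    return w / w.sum()

# The model with the smallest L receives the largest posterior probability.
print(model_posteriors([120.3, 118.7, 125.1]))
```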

The evidence \(P({\mathbf {Y}}|H_{i} )\) is obtained by integrating the product of the corresponding likelihood function \(P({\mathbf {Y}}|\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}},H_{i} )\) and the prior distribution \(P(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}}|H_{i} )\) over the model parameter space:

$$\begin{aligned} \begin{array}{c} P({\mathbf {Y}} |H_i) = \int P({\mathbf {Y}}|\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}},H_{i} ) \cdot P(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}}|H_{i} )\, d\varvec{\mu }_{{\mathbf {f}}}\, d\varvec{\mu }_{\widehat{{\mathbf {g}}}}. \end{array} \end{aligned}$$
(11)

Here the vectors \(\varvec{\mu }_{{\mathbf {f}}}\) and \(\varvec{\mu }_{\widehat{{\mathbf {g}}}}\) contain the parameters of the deterministic part \({\mathbf {f}}\) and of the stochastic part of the model (2), respectively. The likelihood function \(P({\mathbf {Y}}|\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}},H_{i} )\) corresponds to the assumption that the stochastic part of the model is a Gaussian process delta-correlated in time with amplitude \(\widehat{{\mathbf {g}}}\) (see Sect. 2.2.1):

$$\begin{aligned}P({\mathbf {Y}}|\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}},H_{i} ) = \nonumber \\ \prod \limits _{n=1}^{l}P_\mathcal {N}({\mathbf {p}}_n,\widehat{I}) \times \prod \limits _{n=l+1}^{N} P_\mathcal {N}({\mathbf {p}}_n-{\mathbf {f}}\big ({\mathbf {p}}_{n-1},\ldots ,{\mathbf {p}}_{n-l}\big ),\widehat{{\mathbf {g}}}\widehat{{\mathbf {g}}}^T). \end{aligned}$$
(12)

Here \(\hat{I}\) is the \(d \times d\) identity matrix, \(P_\mathcal {N}({\mathbf {u}},\widehat{\Sigma }):=\frac{1}{\sqrt{(2\pi )^{d} |\widehat{\Sigma }|}}\exp {\left( -\frac{1}{2} {\mathbf {u}}^{T}\widehat{\Sigma }^{-1}{\mathbf {u}} \right) }\) for \({\mathbf {u}} \in \mathbb {R}^d\), and \(\prod \limits _{n=1}^{l}P_\mathcal {N}({\mathbf {p}}_n,\widehat{I})\) is a term describing the PDF of the initial state of the model (see Gavrilov et al. 2017 for more details). The prior PDF \(P(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}}|H_{i} )\) is the product of Gaussian PDFs for each model parameter. The proper choice of the variances of these PDFs for the linear (3)–(4) and nonlinear (5) parameterizations is discussed in detail in Mukhin et al. (2021) and Seleznev et al. (2019).
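For concreteness, the following Python sketch evaluates the negative logarithm of the likelihood (12) for given parameter values; the deterministic part `f` and the noise amplitude matrix `g` are generic placeholders standing in for the parameterizations (3)–(5), so this is an illustrative implementation rather than the authors' code:

```python
import numpy as np

def neg_log_likelihood(Y, f, g, l):
    """-log P(Y | mu_f, mu_g, H_i) for the stochastic model (2), following Eq. (12).

    Y : (N, d) array of observed state vectors p_1, ..., p_N
    f : callable taking the l previous states (an (l, d) array in chronological
        order) and returning the predicted next state (d,); a placeholder for
        the parameterizations (3)-(5)
    g : (d, d) noise amplitude matrix, so the noise covariance is g @ g.T
    l : number of lags used by the deterministic part
    """
    N, d = Y.shape
    cov = g @ g.T
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    nll = 0.0
    # Initial-state term: standard Gaussian PDF (identity covariance) for the first l states.
    for n in range(l):
        u = Y[n]
        nll += 0.5 * (d * np.log(2 * np.pi) + u @ u)
    # Residual term: Gaussian PDF of the one-step-ahead prediction errors.
    for n in range(l, N):
        u = Y[n] - f(Y[n - l:n])
        nll += 0.5 * (d * np.log(2 * np.pi) + logdet + u @ cov_inv @ u)
    return nll
```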

The evidence (11) is estimated using Laplace's method, which approximates the integral in the neighborhood of the maximum of the integrand. Let us denote the negative logarithm of the integrand in (11) by \(\varPsi _{H_i}(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}})\). Then the integrand can be rewritten as:

$$\begin{aligned} \begin{array}{l} P({\mathbf {Y}}|\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}},H_{i} ) \cdot P(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}}|H_{i} ) = \exp (-\varPsi _{H_i}(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}})). \end{array} \end{aligned}$$
(13)

Integrating (11) by Laplace's method, i.e., expanding the function \(\varPsi _{H_i}(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}})\) in a second-order Taylor series in the neighborhood of its minimum, leads to the following expression for the optimality criterion (10):

$$\begin{aligned} L= & {} -\log {P({\mathbf {Y}}|H_{i} )}\nonumber \\= & {} \varPsi _{H_i}(\overline{\varvec{\mu }_{{\mathbf {f}}}},\overline{\varvec{\mu }_{\widehat{{\mathbf {g}}}}})+ \frac{1}{2} \ln \left[ \frac{1}{(2\pi )^M}\left| \nabla \nabla ^T\varPsi _{H_i}(\overline{\varvec{\mu }_{{\mathbf {f}}}},\overline{\varvec{\mu }_{\widehat{{\mathbf {g}}}}}) \right| \right] . \end{aligned}$$
(14)

Here \(\varPsi _{H_i}(\overline{\varvec{\mu }_{{\mathbf {f}}}},\overline{\varvec{\mu }_{\widehat{{\mathbf {g}}}}})\) is the value of the function at its minimum, M is the total number of model parameters collected in the vectors \(\varvec{\mu }_{{\mathbf {f}}},\varvec{\mu }_{\widehat{{\mathbf {g}}}}\), and \(\nabla \nabla ^T \varPsi _{H_i}(\overline{\varvec{\mu }_{{\mathbf {f}}}},\overline{\varvec{\mu }_{\widehat{{\mathbf {g}}}}})\) is the \(M \times M\) matrix of second derivatives (Hessian matrix) at the minimum. The first term in (14) reflects the accuracy with which the model approximates the data. It decreases as the model complexity grows, i.e., as the number of parameters increases, and therefore penalizes models that are too simple. In contrast, the second term in (14) increases with the number of model parameters and penalizes overfitted models. The particular algorithm we use for the numerical calculation of (14) can be found in Seleznev et al. (2019).
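The following Python sketch illustrates one way to evaluate the criterion (14) numerically, assuming a function `psi` implementing \(\varPsi _{H_i}\) (the negative logarithm of likelihood times prior) is available; the Hessian is approximated here by central finite differences, which may differ from the actual algorithm of Seleznev et al. (2019):

```python
import numpy as np
from scipy.optimize import minimize

def laplace_criterion(psi, mu0, eps=1e-4):
    """Approximate L = -log P(Y|H_i) via Laplace's method, cf. Eq. (14).

    psi : callable, Psi_{H_i}(mu) = -log[ P(Y|mu, H_i) * P(mu|H_i) ] for the
          stacked parameter vector mu = (mu_f, mu_g)
    mu0 : initial guess for mu
    """
    res = minimize(psi, mu0, method="Nelder-Mead")  # locate the minimum of Psi
    mu_min, psi_min = res.x, res.fun
    M = mu_min.size

    # Central-difference approximation of the Hessian of Psi at the minimum.
    H = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            mu_pp = mu_min.copy(); mu_pp[i] += eps; mu_pp[j] += eps
            mu_pm = mu_min.copy(); mu_pm[i] += eps; mu_pm[j] -= eps
            mu_mp = mu_min.copy(); mu_mp[i] -= eps; mu_mp[j] += eps
            mu_mm = mu_min.copy(); mu_mm[i] -= eps; mu_mm[j] -= eps
            H[i, j] = (psi(mu_pp) - psi(mu_pm) - psi(mu_mp) + psi(mu_mm)) / (4 * eps**2)

    # L = Psi_min + 0.5 * ln( |H| / (2*pi)^M ), as in Eq. (14).
    _, logdet = np.linalg.slogdet(H)
    return psi_min + 0.5 * (logdet - M * np.log(2 * np.pi))
```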

In practice, to select the optimal hyperparameters, we iterate over the integers q (or m) and l in a wide predefined range and select the values that provide the smallest L. In this work we use the range \([ 0,1, \ldots , 6 ]\) for q, \([1,2,\ldots ,10]\) for m, and \([1,2,\ldots ,10]\) for l.
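In code, this selection reduces to a plain grid search; a minimal sketch, assuming a hypothetical helper `fit_model(Y, l, q)` that fits a candidate model and returns its criterion L from Eq. (14):

```python
def select_hyperparameters(Y, fit_model, q_range=range(0, 7), l_range=range(1, 11)):
    """Return (smallest L, optimal q, optimal l) over the predefined grid."""
    best = None
    for q in q_range:
        for l in l_range:
            L = fit_model(Y, l=l, q=q)  # fit a candidate model and evaluate Eq. (14)
            if best is None or L < best[0]:
                best = (L, q, l)
    return best
```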

Cite this article

Seleznev, A., Mukhin, D. Improving statistical prediction and revealing nonlinearity of ENSO using observations of ocean heat content in the tropical Pacific. Clim Dyn 60, 1–15 (2023). https://doi.org/10.1007/s00382-022-06298-x
