Abstract
A new data-driven model for the analysis and prediction of spatially distributed time series is proposed. The model is based on a linear dynamical mode (LDM) decomposition of the observed data, derived from a recently developed nonlinear dimensionality reduction approach. The key point of this approach is its ability to take into account simple dynamical properties of the observed system by revealing the system’s dominant time scales. The LDMs are used as new variables for the empirical construction of a nonlinear stochastic evolution operator. The method is applied to the sea surface temperature anomaly field in the tropical belt, where the El Niño Southern Oscillation (ENSO) is the main mode of variability. The advantage of LDMs over the traditionally used empirical orthogonal function decomposition is demonstrated for these data. Specifically, it is shown that the new model has competitive ENSO forecast skill in comparison with other existing ENSO models.
Notes
NOAA_ERSST_V4 data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/.
References
Alexander MA, Bladé I, Newman M, Lanzante JR, Lau NC, Scott JD (2002) The atmospheric bridge: the influence of ENSO teleconnections on air-sea interaction over the global oceans. J Clim 15(16):2205–2231. https://doi.org/10.1175/1520-0442(2002)015<2205:TABTIO>2.0.CO;2
Barnston AG, Chelliah M, Goldenberg SB (1997) Documentation of a highly ENSO related sst region in the equatorial Pacific: research note. Atmos Ocean 35(3):367–383. https://doi.org/10.1080/07055900.1997.9649597
Barnston AG, Tippett MK, L’Heureux ML, Li S, Dewitt DG (2012) Skill of real-time seasonal ENSO model predictions during 2002–11: is our capability increasing? Bull Am Meteorol Soc 93(5):631–651. https://doi.org/10.1175/BAMS-D-11-00111.1
Berliner LM, Wikle CK, Cressie N (2000) Long-lead prediction of Pacific SSTs via Bayesian dynamic modeling. J Clim 13(22):3953–3968. https://doi.org/10.1175/1520-0442(2001)013<3953:LLPOPS>2.0.CO;2
Bjerknes J (1969) Atmospheric teleconnections from the equatorial Pacific. Mon Weather Rev 97(3):163–172. https://doi.org/10.1175/1520-0493(1969)097<0163:ATFTEP>2.3.CO;2
Chekroun MD, Kondrashov D (2017) Data-adaptive harmonic spectra and multilayer Stuart–Landau models. Chaos Interdiscip J Nonlinear Sci 27(9):093110. https://doi.org/10.1063/1.4989400
Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30. https://doi.org/10.1016/j.acha.2006.04.006
de la Iglesia MD, Tabak EG (2013) Principal dynamical components. Commun Pure Appl Math 66(1):48–82. https://doi.org/10.1002/cpa.21411. arXiv:1012.3963v1
DelSole T, Tippett MK (2009a) Average predictability time. Part I: theory. J Atmos Sci 66(5):1172–1187. https://doi.org/10.1175/2008JAS2868.1
DelSole T, Tippett MK (2009b) Average predictability time. Part II: seamless diagnoses of predictability on multiple time scales. J Atmos Sci 66(5):1188–1204. https://doi.org/10.1175/2008JAS2869.1
Dong D, McAvoy T (1996) Nonlinear principal component analysis–based on principal curves and neural networks. Comput Chem Eng 20(1):65–78. https://doi.org/10.1016/0098-1354(95)00003-K
Gámez AJ, Zhou CS, Timmermann A, Kurths J (2004) Nonlinear dimensionality reduction in climate data. Nonlinear Process Geophys 11(3):393–398. https://doi.org/10.5194/npg-11-393-2004
Gavrilov A, Mukhin D, Loskutov E, Volodin E, Feigin A, Kurths J (2016) Method for reconstructing nonlinear modes with adaptive structure from multidimensional data. Chaos Interdiscip J Nonlinear Sci 26(12):123101. https://doi.org/10.1063/1.4968852
Gavrilov A, Loskutov E, Mukhin D (2017) Bayesian optimization of empirical model with state-dependent stochastic forcing. Chaos Solitons Fract 104:372. https://doi.org/10.1016/j.chaos.2017.08.032
Ghil M, Allen MR, Dettinger MD, Ide K, Kondrashov D, Mann ME, Robertson AW, Saunders A, Tian Y, Varadi F, Yiou P (2002) Advanced spectral methods for climatic time series. Rev Geophys 40(1):1003. https://doi.org/10.1029/2000RG000092
Grieger B, Latif M (1994) Reconstruction of the El Niño attractor with neural networks. Clim Dyn 10:267–276
Guckenheimer J, Timmermann A, Dijkstra H, Roberts A (2017) (Un)predictability of strong El Niño events. Dyn Stat Clim Syst. https://doi.org/10.1093/climsys/dzx004
Hannachi A, Jolliffe IT, Stephenson DB (2007) Empirical orthogonal functions and related techniques in atmospheric science: a review. Int J Climatol 27(9):1119–1152. https://doi.org/10.1002/joc.1499
Hasselmann K (1988) PIPs and POPs: the reduction of complex dynamical systems using principal interaction and oscillation patterns. J Geophys Res 93(D9):11015. https://doi.org/10.1029/JD093iD09p11015
Hastie T (1984) Principal curves and surfaces. Ph.D Dissertation. PhD thesis, Stanford Linear Accelerator Center, Stanford University. http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-r-276.pdf. Accessed 29 May 2015
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Huang B, Banzon VF, Freeman E, Lawrimore J, Liu W, Peterson TC, Smith TM, Thorne PW, Woodruff SD, Zhang HM (2015) Extended reconstructed sea surface temperature version 4 (ERSST.v4). Part I: upgrades and intercomparisons. J Clim 28(3):911–930. https://doi.org/10.1175/JCLI-D-14-00006.1
Jeffreys H (1998) Theory of probability. Clarendon Press, Oxford. https://global.oup.com/academic/product/the-theory-of-probability-9780198503682?cc=ru&lang=en&. Accessed 9 May 2017
Johnson SD, Battisti DS, Sarachik ES (2000) Seasonality in an empirically derived Markov model of tropical Pacific sea surface temperature anomalies. J Clim 13(18):3327–3335. https://doi.org/10.1175/1520-0442(2000)013<3327:SIAEDM>2.0.CO;2
Jolliffe IT (1986) Principal component analysis. Springer series in statistics, 2nd edn. Springer, New York. https://doi.org/10.1007/978-1-4757-1904-8
Kao HY, Yu JY (2009) Contrasting eastern-Pacific and central-Pacific types of ENSO. J Clim 22(3):615–632. https://doi.org/10.1175/2008JCLI2309.1
Kessler WS (2002) Is ENSO a cycle or a series of events? Geophys Res Lett 29(23):40. https://doi.org/10.1029/2002GL015924
Kondrashov D, Kravtsov S, Robertson AW, Ghil M (2005) A hierarchy of data-based ENSO models. J Clim 18(21):4425–4444. https://doi.org/10.1175/JCLI3567.1
Kramer MA (1991) Nonlinear principal component analysis using autoassociative neural networks. AIChE 37(2):233–243. https://doi.org/10.1002/aic.690370209
Kravtsov S (2012) An empirical model of decadal ENSO variability. Clim Dyn 39(9–10):2377–2391. https://doi.org/10.1007/s00382-012-1424-y
Kravtsov S, Kondrashov D, Ghil M (2005) Multilevel regression modeling of nonlinear processes: derivation and applications to climatic variability. J Clim 18:4404–4424
Kravtsov S, Kondrashov D, Ghil M (2009) Empirical model reduction and the modeling hierarchy in climate dynamics. In: Palmer T, Williams P (eds) Stochastic physics and climate modelling. Cambridge University Press, Cambridge, pp 35–72
Kwasniok F (1996) The reduction of complex dynamical systems using principal interaction patterns. Phys D Nonlinear Phenom 92(1–2):28–60. https://doi.org/10.1016/0167-2789(95)00280-4
Kwasniok F (1997) Optimal Galerkin approximations of partial differential equations using principal interaction patterns. Phys Rev E 55(5):5365–5375. https://doi.org/10.1103/PhysRevE.55.5365
Kwasniok F (2007) Reduced atmospheric models using dynamically motivated basis functions. J Atmos Sci 64(10):3452–3474. https://doi.org/10.1175/JAS4022.1
Liu W, Huang B, Thorne PW, Banzon VF, Zhang HM, Freeman E, Lawrimore J, Peterson TC, Smith TM, Woodruff SD (2015) Extended reconstructed sea surface temperature version 4 (ERSST.v4): part II. Parametric and structural uncertainty estimations. J Clim 28(3):931–951
Loskutov EM, Molkov YI, Mukhin DN, Feigin AM (2008) Markov chain Monte Carlo method in Bayesian reconstruction of dynamical systems from noisy chaotic time series. Phys Rev E Stat Nonlinear Soft Matter Phys 77(6):066214. https://doi.org/10.1103/PhysRevE.77.066214
Maimon O, Rokach L (2010) Data mining and knowledge discovery handbook. Springer, New York
Molkov YI, Mukhin DN, Loskutov EM, Feigin AM, Fidelin GA (2009) Using the minimum description length principle for global reconstruction of dynamic systems from noisy time series. Phys Rev E Stat Nonlinear Soft Matter Phys 80(4):046207. https://doi.org/10.1103/PhysRevE.80.046207
Molkov YI, Mukhin DN, Loskutov EM, Timushev RI, Feigin AM (2011) Prognosis of qualitative system behavior by noisy, nonstationary, chaotic time series. Phys Rev E 84(3):036215. https://doi.org/10.1103/PhysRevE.84.036215
Molkov YI, Loskutov EM, Mukhin DN, Feigin AM (2012) Random dynamical models from time series. Phys Rev E 85(3):036216. https://doi.org/10.1103/PhysRevE.85.036216
Mukhin DN, Feigin AM, Loskutov EM, Molkov YI (2006) Modified Bayesian approach for the reconstruction of dynamical systems from time series. Phys Rev E Stat Nonlinear Soft Matter Phys 73(3):036211. https://doi.org/10.1103/PhysRevE.73.036211
Mukhin D, Gavrilov A, Feigin A, Loskutov E, Kurths J (2015a) Principal nonlinear dynamical modes of climate variability. Sci Rep 5:15510. https://doi.org/10.1038/srep15510
Mukhin D, Kondrashov D, Loskutov E, Gavrilov A, Feigin A, Ghil M (2015b) Predicting critical transitions in ENSO models. Part II: spatially dependent models. J Clim 28(5):1962–1976. https://doi.org/10.1175/JCLI-D-14-00240.1
Mukhin D, Loskutov E, Mukhina A, Feigin A, Zaliapin I, Ghil M (2015c) Predicting critical transitions in ENSO models. Part I: methodology and simple models with memory. J Clim 28(5):1940–1961. https://doi.org/10.1175/JCLI-D-14-00239.1
Mukhin D, Gavrilov A, Loskutov E, Feigin A, Kurths J (2017) Nonlinear reconstruction of global climate leading modes on decadal scales. Clim Dyn. https://doi.org/10.1007/s00382-017-4013-2
Penland C, Magorian T (1993) Prediction of Niño 3 sea surface temperatures using linear inverse modeling. J Clim 6(6):1067–1076. https://doi.org/10.1175/1520-0442(1993)006<1067:PONSST>2.0.CO;2
Penland C, Sardeshmukh PD (1995) The optimal growth of tropical sea surface temperature anomalies. J Clim 8(8):1999–2024. https://doi.org/10.1175/1520-0442(1995)008<1999:TOGOTS>2.0.CO;2
Pires CAL, Hannachi A (2017) Independent subspace analysis of the sea surface temperature variability: non-Gaussian sources and sensitivity to sampling and dimensionality. Complexity 2017:1–23. https://doi.org/10.1155/2017/3076810
Pires CAL, Ribeiro AFS (2016) Separation of the atmospheric variability into non-Gaussian multidimensional sources by projection. Clim Dyn 48:1–30. https://doi.org/10.1007/s00382-016-3112-9
Preisendorfer R (1988) Principal component analysis in meteorology and oceanography. Elsevier, London
Rasmusson EM, Carpenter TH (1982) Variations in tropical sea surface temperature and surface wind fields associated with the Southern oscillation/El Niño. Mon Weather Rev 110(5):354–384. https://doi.org/10.1175/1520-0493(1982)110<0354:VITSST>2.0.CO;2
Rossi V, Vila JP (2006) Bayesian multioutput feedforward neural networks comparison: a conjugate prior approach. IEEE Trans Neural Netw 17(1):35–47. https://doi.org/10.1109/TNN.2005.860883
Saha S, Moorthi S, Wu X, Wang J, Nadiga S, Tripp P, Behringer D, Hou YT, Chuang HY, Iredell M, Ek M, Meng J, Yang R, Mendez MP, van den Dool H, Zhang Q, Wang W, Chen M, Becker E (2014) The NCEP climate forecast system version 2. J Clim 27(6):2185–2202. https://doi.org/10.1175/JCLI-D-12-00823.1
Scholkopf B, Smola A, Muller KR (1998) Nonlinear component analysis as a Kernel Eigenvalue problem. Neural Comp 10(5):1299–1319. https://doi.org/10.1162/089976698300017467
Strounine K, Kravtsov S, Kondrashov D, Ghil M (2010) Reduced models of atmospheric low-frequency variability: parameter estimation and comparative performance. Phys D Nonlinear Phenom 239:145–166
Suarez MJ, Schopf PS (1988) A delayed action oscillator for ENSO. J Atmos Sci 45(21):3283–3287. https://doi.org/10.1175/1520-0469(1988)045<3283:ADAOFE>2.0.CO;2
Tan S, Mayrovouniotis ML (1995) Reducing data dimensionality through optimizing neural network inputs. AIChE J 41(6):1471–1480. https://doi.org/10.1002/aic.690410612
Tippett MK, Barnston AG, Li S (2012) Performance of recent multimodel ENSO forecasts. J Appl Meteorol Climatol 51(3):637–654. https://doi.org/10.1175/JAMC-D-11-093.1
Trenberth KE (1997) The definition of El Niño. Bull Am Meteorol Soc 78(12):2771–2777. https://doi.org/10.1175/1520-0477(1997)078<2771:TDOENO>2.0.CO;2
Vejmelka M, Pokorná L, Hlinka J, Hartman D, Jajcay N, Paluš M (2015) Non-random correlation structures and dimensionality reduction in multivariate climate data. Clim Dyn 44(9–10):2663–2682. https://doi.org/10.1007/s00382-014-2244-z
Wang C, Deser C, Yu JY, DiNezio P, Clement A (2017) El Niño and southern oscillation (ENSO): a review. Springer, Dordrecht, pp 85–106. https://doi.org/10.1007/978-94-017-7499-4-4
Wu A, Hsieh WW, Tang B (2006) Neural network forecasts of the tropical Pacific sea surface temperatures. Neural Netw 19(2):145–154. https://doi.org/10.1016/j.neunet.2006.01.004
Wyrtki K (1975) El Niño: the dynamic response of the equatorial Pacific ocean to atmospheric forcing. J Phys Oceanogr 5(4):572–584. https://doi.org/10.1175/1520-0485(1975)005<0572:ENTDRO>2.0.CO;2
Xin X, Gao F, Wei M, Wu T, Fang Y, Zhang J (2017) Decadal prediction skill of BCC-CSM1.1 climate model in East Asia. Int J Climatol 38:584. https://doi.org/10.1002/joc.5195
Xue Y, Leetmaa A, Ji M (2000) ENSO prediction with Markov models: the impact of sea level. J Clim 13(4):849–871. https://doi.org/10.1175/1520-0442(2000)013<0849:EPWMMT>2.0.CO;2
Zebiak SE, Cane MA (1987) A model El Niño–Southern Oscillation. Mon Weather Rev 115(10):2262–2278. https://doi.org/10.1175/1520-0493(1987)115<2262:AMENO>2.0.CO;2
Zhang G, Kline D (2007) Quarterly time-series forecasting with neural networks. IEEE Trans Neural Netw 18(6):1800–1814. https://doi.org/10.1109/TNN.2007.896859. http://ieeexplore.ieee.org/document/4359174/
Appendices
Appendix 1: LDM decomposition
Here we describe the LDM decomposition, which is a special case of the more general nonlinear dynamical mode (NDM) decomposition developed in Gavrilov et al. (2016), Mukhin et al. (2015a) and Mukhin et al. (2017).
LDM expression First, following Sect. IIB1 of Gavrilov et al. (2016), the initial data \({\mathbf {x}}=({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_N)\) is preliminarily rotated to the full traditional EOF basis, \({\mathbf {y}}_n={\mathbf {V}}^{T}{\mathbf {x}}_n\):
Here \({\mathbf {x}}_n \in \mathbb {R}^D\) is the spatially distributed data measured at time n (D equals the number of nodes in the spatial grid), \({\mathbf {y}}_n \in \mathbb {R}^D\) is the vector of EOF-based PCs, and \({\mathbf {V}}\) is the full \(D \times D\) rotation matrix whose columns are the D EOFs obtained via diagonalization of the covariance matrix \(\frac{1}{N}\sum \nolimits _{n=1}^{N}{\mathbf {x}}_n\cdot {\mathbf {x}}_n^T\), ordered so that the corresponding eigenvalues \(\lambda _1,\ldots ,\lambda _D\) decrease.
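The EOF rotation above can be sketched in a few lines of NumPy; the data sizes and the random "anomalies" here are purely hypothetical:

```python
import numpy as np

# Illustrative sketch of the EOF rotation: diagonalize the covariance of the
# (mean-removed) data and project onto the eigenvector basis.
rng = np.random.default_rng(0)
D, N = 10, 500
x = rng.standard_normal((D, N))            # spatially distributed data, D nodes
x -= x.mean(axis=1, keepdims=True)         # remove the mean at each node

C = (x @ x.T) / N                          # covariance matrix (D x D)
lam, V = np.linalg.eigh(C)                 # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]              # reorder so the eigenvalues decrease
lam, V = lam[order], V[:, order]

y = V.T @ x                                # EOF-based PCs: y_n = V^T x_n
```

Since `V` is orthogonal, the rotation is lossless (`V @ y` recovers `x`) and the variance of each PC equals the corresponding eigenvalue.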
Then the LDM model is constructed in the space of the K leading PCs:
Here \({\mathbf {p}}_n \in \mathbb {R}^d\) is the vector of LDM-based PCs, the \(K \times d\) matrix \({\mathbf {A}}\) and the \(K\)-dimensional vector \({\mathbf {c}}\) are the parameters of the LDM, \(\varvec{\xi }_{1,n},\ldots ,\varvec{\xi }_{K,n}\) are delta-correlated normal random processes, and \(\sigma\) is their amplitude.
By direct analogy with the geometrical interpretation of NDMs (Sect. IIA of Gavrilov et al. 2016), the LDM defines a \(d\)-dimensional linear manifold in the space spanned by the K leading EOFs (we assume \(d \le K\)), and the time series \({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_{N}\) are the values of the coordinates on this manifold. The number K is a hyperparameter which is optimized in the algorithm. To do this correctly, all the remaining PCs are modeled as noise:
Here \(\varvec{\xi }_{K+1,n},\ldots ,\varvec{\xi }_{D,n}\) are delta-correlated random normal processes and \(\sigma _{K+1},\ldots ,\sigma _D\) are their amplitudes (not to be confused with \(\sigma _k\) in Sect. 2.2.2).
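A shape-level sketch of this generative structure may help: the d-dimensional hidden time series is mapped linearly into the K leading PCs, and the remaining D − K PCs are pure noise. All sizes and values below are illustrative, not taken from the paper.

```python
import numpy as np

# Generative sketch of the LDM model: linear manifold plus noise in the K
# leading PCs, pure noise in the trailing D - K PCs.
rng = np.random.default_rng(1)
D, K, d, N = 10, 5, 2, 200

A = rng.standard_normal((K, d))              # LDM matrix A
c = rng.standard_normal(K)                   # offset vector c
sigma = 0.1                                  # noise amplitude in the K leading PCs
sigma_rest = rng.uniform(0.5, 1.0, D - K)    # amplitudes of the trailing noise PCs

p = rng.standard_normal((d, N))              # LDM-based PCs p_1..p_N
y_lead = A @ p + c[:, None] + sigma * rng.standard_normal((K, N))
y_rest = sigma_rest[:, None] * rng.standard_normal((D - K, N))
y = np.vstack([y_lead, y_rest])              # all D modeled PCs
```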
Taking into account Eqs. (12)–(14) we can rewrite the LDM as \({\mathbf {L}}({\mathbf {p}}_n)\) introduced in the Eq. (3):
LDM learning As mentioned in Sect. 2.1.2, to estimate both the values \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }=(\sigma ;\sigma _{K+1},\ldots ,\sigma _{D})\) and the time series \({\mathbf {p}}=({\mathbf {p}}_1,\ldots ,{\mathbf {p}}_N)\) in the described model we use a probabilistic Bayesian approach (Jeffreys 1998). In this approach, all the unknown parameters \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }\), \({\mathbf {p}}\) are assigned prior PDFs, which are updated by the observations \({\mathbf {x}}\) through the model (3); the parameters are then learned by maximization of the resulting posterior PDF. The choice of the prior PDFs is explained in detail in Appendix A of Gavrilov et al. (2016) for the case of one-dimensional NDMs, which generalizes straightforwardly to multidimensional manifolds (Mukhin et al. 2017). Here we present these PDFs rewritten for the special case of LDMs, omitting the detailed explanation.
The prior PDF for the PCs introduced in the main text is as follows:
This PDF corresponds to the prior assumption that each time series \(p_{i1},\ldots ,p_{iN}\) can be produced by the simplest linear stochastic evolution operator (red noise) with autocorrelation time \(\tau _i\), thus taking into account dynamical information about the connection between successive states (see also Appendix A1 of Gavrilov et al. 2016). The vector \(\varvec{\tau }\) is treated as a hyperparameter (see below), and no specific a priori knowledge about it is assumed.
One should keep in mind that the \(\tau _i\)’s correspond only to the autocorrelation times of the a priori assumed PCs (prior PDF (16)). This prior PDF is, however, updated by the observations, consistent with the Bayesian framework; as a result, the \(\tau _i\)’s are not exactly equal to the autocorrelation times of the obtained PCs. In practice, they impose a restriction on the smallest scale of each PC, i.e. they filter out all scales faster than some value related to \(\tau _i\).
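The red-noise intuition behind this prior can be checked numerically: an AR(1) process with lag-1 coefficient \(a=e^{-1/\tau }\) has an autocorrelation time of roughly \(\tau\), so it suppresses variability on scales much faster than \(\tau\). The values below are assumed for illustration only.

```python
import numpy as np

# AR(1) / red-noise illustration: the lag-1 autocorrelation of the generated
# series should approach a = exp(-1/tau) for long samples.
rng = np.random.default_rng(2)
N, tau = 20000, 10.0
a = np.exp(-1.0 / tau)
p = np.zeros(N)
for n in range(1, N):
    # unit-variance AR(1) update
    p[n] = a * p[n - 1] + np.sqrt(1.0 - a**2) * rng.standard_normal()

r1 = np.corrcoef(p[:-1], p[1:])[0, 1]    # empirical lag-1 autocorrelation
```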
The prior PDF for the LDM coefficients \({\mathbf {A}}\) and \({\mathbf {c}}\) is chosen to complement the PDF (16):
It is derived directly from Appendix A2 of Gavrilov et al. (2016) and ensures uniqueness of the PC variances (i.e. normalization).
The prior PDF for \(\varvec{\varsigma }\) is simply taken to be constant, because the Bayesian problem is well-posed with respect to \(\varvec{\varsigma }\):
Thus, the Bayesian posterior PDF is expressed via the prior PDFs in a standard way:
Here the likelihood of the LDM, \(P({\mathbf {x}}|{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}},\varvec{\tau },K)\), is the only function depending on the observations. It follows from the Gaussianity of the noise in our LDM model (12)–(15):
The parameters \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }\), \({\mathbf {p}}\) of the LDM are found by maximization of (19). Note that the exponential factors in (20) are quadratic in \({\mathbf {p}}\) as well as in the coefficients \({\mathbf {A}}\), \({\mathbf {c}}\), and the number of coefficients is proportional to \(d+1\). In the case of a general NDM parameterized by a nonlinear polynomial, the exponential factor would instead be a polynomial of higher order in \({\mathbf {p}}\), and the number of coefficients (proportional to \(C^{m}_{m+d}\)) would grow dramatically with the manifold dimension d and the polynomial degree m. These facts make the maximization problem well-posed (unique solution) and numerically fast even for LDM dimensions \(d>2\), in contrast with the case of a general NDM.
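The coefficient-count argument is easy to verify: a polynomial of degree m in d variables has \(C^{m}_{m+d}\) coefficients per output component, which grows rapidly with d and m, whereas the linear LDM needs only a number of coefficients proportional to \(d+1\). The (d, m) values below are illustrative.

```python
from math import comb

# Count polynomial coefficients C(m + d, m) for a few illustrative (d, m) pairs.
counts = {(d, m): comb(m + d, m) for d in (2, 5, 10) for m in (3, 5)}
# e.g. counts[(2, 3)] == 10, while counts[(10, 5)] == 3003
```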
LDM optimization To find the hyperparameters \((\varvec{\tau },K)\), the Bayesian evidence, or marginal likelihood, is used as a cost function. It represents the probability that the model with the given hyperparameters reproduces the observed data, and is defined as follows:
Such a cost function allows us to find, in a Bayesian way, optimal hyperparameters for which the model is neither underfitted nor overfitted. It is explored in detail in Gavrilov et al. (2016).
To estimate (21) for different hyperparameters and find the optimal LDM, we use the same algorithm based on the Laplace approximation as given in Gavrilov et al. (2016).
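A toy sketch may clarify the Laplace approximation of the evidence: for a Gaussian likelihood with unknown mean (unit variance) and a Gaussian prior, the log posterior is exactly quadratic, so the Laplace estimate coincides with the analytic evidence. This one-dimensional example is purely illustrative; the computation in the paper is over the full multidimensional parameter set.

```python
import numpy as np

# Laplace approximation: log P(x) ≈ log lik(mu*) + log prior(mu*)
#                                   + (1/2) log(2*pi / Hessian at the mode).
rng = np.random.default_rng(3)
N, s0 = 50, 2.0                              # sample size, prior std (assumed)
x = rng.standard_normal(N) + 1.5             # data with unknown mean

def neg_log_post(mu):
    # minus log of (likelihood * prior), dropping only normalization constants
    return 0.5 * np.sum((x - mu) ** 2) + 0.5 * mu**2 / s0**2

mu_star = x.sum() / (N + 1.0 / s0**2)        # posterior mode (closed form here)
hess = N + 1.0 / s0**2                       # curvature at the mode

log_evidence = (-neg_log_post(mu_star)
                - 0.5 * N * np.log(2 * np.pi)        # likelihood normalization
                - 0.5 * np.log(2 * np.pi * s0**2)    # prior normalization
                + 0.5 * np.log(2 * np.pi / hess))    # Laplace volume factor
```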
Appendix 2: Bayesian approach to evolution operator construction
Here we describe the procedure for constructing the evolution operator for the PCs. It is generally analogous to the procedure described in Molkov et al. (2011), Molkov et al. (2012), Mukhin et al. (2015b) and Gavrilov et al. (2017), but with slight modifications. Following this procedure, we first formulate the model and the prior PDFs for its parameters explicitly.
The evolution operator model is defined by Eq. (5), where we parameterize the function \(\varvec{\varphi }\) by an artificial neural network (ANN) in the form of a perceptron with one hidden layer and hyperbolic tangent activation function (Hornik et al. 1989):
Here \({\mathbf {z}}\) stands for the collection of \({\mathbf {p}}_{n-1},\ldots ,{\mathbf {p}}_{n-l}\) from (5); \(\varvec{\alpha }_i\in \mathbb {R}^{d}\), \(\varvec{\beta }_i\in \mathbb {R}^{d}\), \(\varvec{\omega }_i\in \mathbb {R}^{ld}\), \(\varvec{\nu }_i\in \mathbb {R}^{\dim {\mathbf {f}}}\), \(\gamma _i\in \mathbb {R}\) are the ANN coefficients, which we collect into the vector \(\varvec{\mu }_{\varphi }=(\varvec{\alpha }_1,\varvec{\beta }_1,\varvec{\omega }_1,\varvec{\nu }_1,\gamma _1,\ldots ,\varvec{\alpha }_m,\varvec{\beta }_m,\varvec{\omega }_m,\varvec{\nu }_m,\gamma _m)\).
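A minimal sketch of a one-hidden-layer tanh perceptron consistent with this parameter list is given below. The placement of the time dependence (taking \(\varvec{\alpha }_i + t\varvec{\beta }_i\) as output weights) is an assumption made for illustration, not a statement of the exact form (22).

```python
import numpy as np

def phi(z, f, t, alpha, beta, omega, nu, gamma):
    # z: (l*d,) delayed states; f: (dim_f,) forcing; t: scalar time.
    # alpha, beta: (m, d); omega: (m, l*d); nu: (m, dim_f); gamma: (m,).
    h = np.tanh(omega @ z + nu @ f + gamma)   # hidden-layer activations, (m,)
    return (alpha + t * beta).T @ h           # output in R^d, linear in t
```

With this placement of t, the output depends on time exactly linearly, matching a first-order expansion of slowly drifting parameters.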
Generally, to model slow non-stationarity of the parameters of the real system’s evolution operator, it is appropriate to introduce a weak time dependence into all parameters of \(\varvec{\varphi }\). To first order, this leads to a linear dependence of \(\varvec{\varphi }\) on the time t. For the particular case of an ANN, it can readily be shown that the form (22) provides a universal approximation of such an expansion for any nonlinear function of \(({\mathbf {z}},{\mathbf {f}},t)\) (Molkov et al. 2011).
The matrix \(\hat{{\mathbf {g}}}\) in the random part of the model is parameterized as a lower-triangular matrix via its entries \(\varvec{\mu }_{g}=(g_{11}; \ g_{21}, \ g_{22};\,\, \ldots ;g_{m1}, \ldots , \ g_{mm})\), which is sufficient to represent an arbitrary cross-correlation matrix of the random part (Mukhin et al. 2015b).
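This is a Cholesky-style parameterization: packing the flattened values \((g_{11}; g_{21}, g_{22}; \ldots )\) into a lower-triangular matrix g and forming \(g g^T\) yields an arbitrary symmetric positive-definite covariance. The numeric values below are made up for illustration.

```python
import numpy as np

def unpack_lower_triangular(mu_g, m):
    # Fill the lower triangle row by row: (0,0), (1,0), (1,1), (2,0), ...
    g = np.zeros((m, m))
    g[np.tril_indices(m)] = mu_g
    return g

mu_g = np.array([1.0, 0.3, 0.8, -0.2, 0.5, 0.6])   # m = 3 example
g = unpack_lower_triangular(mu_g, 3)
cov = g @ g.T                                      # implied covariance matrix
```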
The prior PDF for the ANN coefficients was derived in Molkov et al. (2011) and Molkov et al. (2012) for the case of normalized ANN inputs and outputs. Here we have multidimensional time series whose components have different norms, and this difference matters. Taking the norms into account, the expressions from Molkov et al. (2011, 2012) yield the following modified prior PDF for the model parameters:
Here \(s_k=\frac{1}{N-1}\sum \nolimits _{n=2}^{N}(p_{k,n}-p_{k,n-1})^2\), \(\sigma _\omega ^2=2Nd/\sum \nolimits _{n=1}^{N}|{\mathbf {p}}_{n}|^2\), \(\sigma _\nu ^2=2N\dim {\mathbf {f}}/\sum \nolimits _{n=1}^{N}|{\mathbf {f}}_{n}|^2\), \(\sigma _\gamma ^2=2\cdot (ld+\dim {\mathbf {f}})\). In the case of the EOF-ANN model the time series \({\mathbf {p}}_n\) were used with their original variances, while in the case of the LDM-ANN model \({\mathbf {p}}_n\) were normalized to unit variance, because their variance does not carry any information about the system. The forcing time series were always normalized to equal variances.
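These data-dependent prior scales can be computed directly from a PC time series p (d × N) and a forcing series f (dim_f × N); the sizes and random data below are assumed for illustration.

```python
import numpy as np

# Compute the prior scales s_k, sigma_omega^2, sigma_nu^2, sigma_gamma^2
# from illustrative time series.
rng = np.random.default_rng(4)
d, dim_f, N, l = 3, 2, 400, 2
p = rng.standard_normal((d, N))
f = rng.standard_normal((dim_f, N))

s = np.mean(np.diff(p, axis=1) ** 2, axis=1)   # s_k: mean squared increment per PC
sigma_omega2 = 2 * N * d / np.sum(p**2)        # sigma_omega^2
sigma_nu2 = 2 * N * dim_f / np.sum(f**2)       # sigma_nu^2
sigma_gamma2 = 2 * (l * d + dim_f)             # sigma_gamma^2
```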
In the notation defined above, the expression for the probability of generating the time series \({\mathbf {p}}\) by the model is the following:
Here \(\lambda _k=\frac{1}{N}\sum \nolimits _{n=1}^{N}p_{k,n}^2\). Note that in expression (24) the initial values \({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_{l}\) are treated as model parameters via the first term (see Gavrilov et al. 2017 for a detailed explanation).
The further procedure of model construction is similar to the LDM construction procedure described above. To learn the model parameters, we maximize the Bayesian posterior PDF, which can be expressed as follows:
To find the optimal hyperparameters (l, m), we maximize the marginal likelihood:
The particular algorithm for learning and optimization using (25) and (26) is the same as described in Gavrilov et al. (2017).
Gavrilov, A., Seleznev, A., Mukhin, D. et al. Linear dynamical modes as new variables for data-driven ENSO forecast. Clim Dyn 52, 2199–2216 (2019). https://doi.org/10.1007/s00382-018-4255-7