Abstract
A new data-driven model for the analysis and prediction of spatially distributed time series is proposed. The model is based on a linear dynamical mode (LDM) decomposition of the observed data, derived from a recently developed nonlinear dimensionality reduction approach. The key point of this approach is its ability to take into account simple dynamical properties of the observed system by revealing the system’s dominant time scales. The LDMs are used as new variables for the empirical construction of a nonlinear stochastic evolution operator. The method is applied to the sea surface temperature anomaly field in the tropical belt, where the El Niño Southern Oscillation (ENSO) is the main mode of variability. The advantage of LDMs over the traditionally used empirical orthogonal function decomposition is demonstrated for these data. Specifically, it is shown that the new model has competitive ENSO forecast skill in comparison with other existing ENSO models.
Notes
NOAA_ERSST_V4 data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/.
References
Alexander MA, Bladé I, Newman M, Lanzante JR, Lau NC, Scott JD (2002) The atmospheric bridge: the influence of ENSO teleconnections on air-sea interaction over the global oceans. J Clim 15(16):2205–2231. https://doi.org/10.1175/1520-0442(2002)015<2205:TABTIO>2.0.CO;2
Barnston AG, Chelliah M, Goldenberg SB (1997) Documentation of a highly ENSO related sst region in the equatorial Pacific: research note. Atmos Ocean 35(3):367–383. https://doi.org/10.1080/07055900.1997.9649597
Barnston AG, Tippett MK, L’Heureux ML, Li S, Dewitt DG (2012) Skill of real-time seasonal ENSO model predictions during 2002–11: is our capability increasing? Bull Am Meteorol Soc 93(5):631–651. https://doi.org/10.1175/BAMS-D-11-00111.1
Berliner LM, Wikle CK, Cressie N (2000) Long-lead prediction of Pacific SSTs via Bayesian dynamic modeling. J Clim 13(22):3953–3968. https://doi.org/10.1175/1520-0442(2001)013<3953:LLPOPS>2.0.CO;2
Bjerknes J (1969) Atmospheric teleconnections from the equatorial Pacific. Mon Weather Rev 97(3):163–172. https://doi.org/10.1175/1520-0493(1969)097<0163:ATFTEP>2.3.CO;2
Chekroun MD, Kondrashov D (2017) Data-adaptive harmonic spectra and multilayer Stuart–Landau models. Chaos Interdiscip J Nonlinear Sci 27(9):093110. https://doi.org/10.1063/1.4989400
Coifman RR, Lafon S (2006) Diffusion maps. Appl Comput Harmon Anal 21(1):5–30. https://doi.org/10.1016/j.acha.2006.04.006
de la Iglesia MD, Tabak EG (2013) Principal dynamical components. Commun Pure Appl Math 66(1):48–82. https://doi.org/10.1002/cpa.21411. arXiv:1012.3963v1
DelSole T, Tippett MK (2009a) Average predictability time. Part I: theory. J Atmos Sci 66(5):1172–1187. https://doi.org/10.1175/2008JAS2868.1
DelSole T, Tippett MK (2009b) Average predictability time. Part II: seamless diagnoses of predictability on multiple time scales. J Atmos Sci 66(5):1188–1204. https://doi.org/10.1175/2008JAS2869.1
Dong D, McAvoy T (1996) Nonlinear principal component analysis–based on principal curves and neural networks. Comput Chem Eng 20(1):65–78. https://doi.org/10.1016/0098-1354(95)00003-K
Gámez AJ, Zhou CS, Timmermann A, Kurths J (2004) Nonlinear dimensionality reduction in climate data. Nonlinear Process Geophys 11(3):393–398. https://doi.org/10.5194/npg-11-393-2004
Gavrilov A, Mukhin D, Loskutov E, Volodin E, Feigin A, Kurths J (2016) Method for reconstructing nonlinear modes with adaptive structure from multidimensional data. Chaos Interdiscip J Nonlinear Sci 26(12):123101. https://doi.org/10.1063/1.4968852
Gavrilov A, Loskutov E, Mukhin D (2017) Bayesian optimization of empirical model with state-dependent stochastic forcing. Chaos Solitons Fract 104:372. https://doi.org/10.1016/j.chaos.2017.08.032
Ghil M, Allen MR, Dettinger MD, Ide K, Kondrashov D, Mann ME, Robertson AW, Saunders A, Tian Y, Varadi F, Yiou P (2002) Advanced spectral methods for climatic time series. Rev Geophys 40(1):1003. https://doi.org/10.1029/2000RG000092
Grieger B, Latif M (1994) Reconstruction of the El Niño attractor with neural networks. Clim Dyn 10:267–276
Guckenheimer J, Timmermann A, Dijkstra H, Roberts A (2017) (Un)predictability of strong El Niño events. Dyn Stat Clim Syst. https://doi.org/10.1093/climsys/dzx004
Hannachi A, Jolliffe IT, Stephenson DB (2007) Empirical orthogonal functions and related techniques in atmospheric science: a review. Int J Climatol 27(9):1119–1152. https://doi.org/10.1002/joc.1499
Hasselmann K (1988) PIPs and POPs: the reduction of complex dynamical systems using principal interaction and oscillation patterns. J Geophys Res 93(D9):11015. https://doi.org/10.1029/JD093iD09p11015
Hastie T (1984) Principal curves and surfaces. Ph.D Dissertation. PhD thesis, Stanford Linear Accelerator Center, Stanford University. http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-r-276.pdf. Accessed 29 May 2015
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Huang B, Banzon VF, Freeman E, Lawrimore J, Liu W, Peterson TC, Smith TM, Thorne PW, Woodruff SD, Zhang HM (2015) Extended reconstructed sea surface temperature version 4 (ERSST.v4). Part I: upgrades and intercomparisons. J Clim 28(3):911–930. https://doi.org/10.1175/JCLI-D-14-00006.1
Jeffreys H (1998) Theory of probability. Clarendon Press, Oxford. https://global.oup.com/academic/product/the-theory-of-probability-9780198503682?cc=ru&lang=en&. Accessed 9 May 2017
Johnson SD, Battisti DS, Sarachik ES (2000) Seasonality in an empirically derived Markov model of tropical Pacific sea surface temperature anomalies. J Clim 13(18):3327–3335. https://doi.org/10.1175/1520-0442(2000)013<3327:SIAEDM>2.0.CO;2
Jolliffe IT (1986) Principal component analysis. Springer series in statistics, 2nd edn. Springer, New York. https://doi.org/10.1007/978-1-4757-1904-8
Kao HY, Yu JY (2009) Contrasting eastern-Pacific and central-Pacific types of ENSO. J Clim 22(3):615–632. https://doi.org/10.1175/2008JCLI2309.1
Kessler WS (2002) Is ENSO a cycle or a series of events? Geophys Res Lett 29(23):40. https://doi.org/10.1029/2002GL015924
Kondrashov D, Kravtsov S, Robertson AW, Ghil M (2005) A hierarchy of data-based ENSO models. J Clim 18(21):4425–4444. https://doi.org/10.1175/JCLI3567.1
Kramer MA (1991) Nonlinear principal component analysis using autoassociative neural networks. AIChE 37(2):233–243. https://doi.org/10.1002/aic.690370209
Kravtsov S (2012) An empirical model of decadal ENSO variability. Clim Dyn 39(9–10):2377–2391. https://doi.org/10.1007/s00382-012-1424-y
Kravtsov S, Kondrashov D, Ghil M (2005) Multilevel regression modeling of nonlinear processes: derivation and applications to climatic variability. J Clim 18:4404–4424
Kravtsov S, Kondrashov D, Ghil M (2009) Empirical model reduction and the modeling hierarchy in climate dynamics. In: Palmer T, Williams P (eds) Stochastic physics and climate modelling. Cambridge University Press, Cambridge, pp 35–72
Kwasniok F (1996) The reduction of complex dynamical systems using principal interaction patterns. Phys D Nonlinear Phenom 92(1–2):28–60. https://doi.org/10.1016/0167-2789(95)00280-4
Kwasniok F (1997) Optimal Galerkin approximations of partial differential equations using principal interaction patterns. Phys Rev E 55(5):5365–5375. https://doi.org/10.1103/PhysRevE.55.5365
Kwasniok F (2007) Reduced atmospheric models using dynamically motivated basis functions. J Atmos Sci 64(10):3452–3474. https://doi.org/10.1175/JAS4022.1
Liu W, Huang B, Thorne PW, Banzon VF, Zhang HM, Freeman E, Lawrimore J, Peterson TC, Smith TM, Woodruff SD (2015) Extended reconstructed sea surface temperature version 4 (ERSST.v4): part II. Parametric and structural uncertainty estimations. J Clim 28(3):931–951
Loskutov EM, Molkov YI, Mukhin DN, Feigin AM (2008) Markov chain Monte Carlo method in Bayesian reconstruction of dynamical systems from noisy chaotic time series. Phys Rev E Stat Nonlinear Soft Matter Phys 77(6):066214. https://doi.org/10.1103/PhysRevE.77.066214
Maimon O, Rokach L (2010) Data mining and knowledge discovery handbook. Springer, New York
Molkov YI, Mukhin DN, Loskutov EM, Feigin AM, Fidelin GA (2009) Using the minimum description length principle for global reconstruction of dynamic systems from noisy time series. Phys Rev E Stat Nonlinear Soft Matter Phys 80(4):046207. https://doi.org/10.1103/PhysRevE.80.046207
Molkov YI, Mukhin DN, Loskutov EM, Timushev RI, Feigin AM (2011) Prognosis of qualitative system behavior by noisy, nonstationary, chaotic time series. Phys Rev E 84(3):036215. https://doi.org/10.1103/PhysRevE.84.036215
Molkov YI, Loskutov EM, Mukhin DN, Feigin AM (2012) Random dynamical models from time series. Phys Rev E 85(3):036216. https://doi.org/10.1103/PhysRevE.85.036216
Mukhin DN, Feigin AM, Loskutov EM, Molkov YI (2006) Modified Bayesian approach for the reconstruction of dynamical systems from time series. Phys Rev E Stat Nonlinear Soft Matter Phys 73(3):036211. https://doi.org/10.1103/PhysRevE.73.036211
Mukhin D, Gavrilov A, Feigin A, Loskutov E, Kurths J (2015a) Principal nonlinear dynamical modes of climate variability. Sci Rep 5:15510. https://doi.org/10.1038/srep15510
Mukhin D, Kondrashov D, Loskutov E, Gavrilov A, Feigin A, Ghil M (2015b) Predicting critical transitions in ENSO models. Part II: spatially dependent models. J Clim 28(5):1962–1976. https://doi.org/10.1175/JCLI-D-14-00240.1
Mukhin D, Loskutov E, Mukhina A, Feigin A, Zaliapin I, Ghil M (2015c) Predicting critical transitions in ENSO models. Part I: methodology and simple models with memory. J Clim 28(5):1940–1961. https://doi.org/10.1175/JCLI-D-14-00239.1
Mukhin D, Gavrilov A, Loskutov E, Feigin A, Kurths J (2017) Nonlinear reconstruction of global climate leading modes on decadal scales. Clim Dyn. https://doi.org/10.1007/s00382-017-4013-2
Penland C, Magorian T (1993) Prediction of Niño 3 sea surface temperatures using linear inverse modeling. J Clim 6(6):1067–1076. https://doi.org/10.1175/1520-0442(1993)006<1067:PONSST>2.0.CO;2
Penland C, Sardeshmukh PD (1995) The optimal growth of tropical sea surface temperature anomalies. J Clim 8(8):1999–2024. https://doi.org/10.1175/1520-0442(1995)008<1999:TOGOTS>2.0.CO;2
Pires CAL, Hannachi A (2017) Independent subspace analysis of the sea surface temperature variability: non-Gaussian sources and sensitivity to sampling and dimensionality. Complexity 2017:1–23. https://doi.org/10.1155/2017/3076810
Pires CAL, Ribeiro AFS (2016) Separation of the atmospheric variability into non-Gaussian multidimensional sources by projection. Clim Dyn 48:1–30. https://doi.org/10.1007/s00382-016-3112-9
Preisendorfer R (1988) Principal component analysis in meteorology and oceanography. Elsevier, London
Rasmusson EM, Carpenter TH (1982) Variations in tropical sea surface temperature and surface wind fields associated with the Southern oscillation/El Niño. Mon Weather Rev 110(5):354–384. https://doi.org/10.1175/1520-0493(1982)110<0354:VITSST>2.0.CO;2
Rossi V, Vila JP (2006) Bayesian multioutput feedforward neural networks comparison: a conjugate prior approach. IEEE Trans Neural Netw 17(1):35–47. https://doi.org/10.1109/TNN.2005.860883
Saha S, Moorthi S, Wu X, Wang J, Nadiga S, Tripp P, Behringer D, Hou YT, Chuang HY, Iredell M, Ek M, Meng J, Yang R, Mendez MP, van den Dool H, Zhang Q, Wang W, Chen M, Becker E (2014) The NCEP climate forecast system version 2. J Clim 27(6):2185–2202. https://doi.org/10.1175/JCLI-D-12-00823.1
Scholkopf B, Smola A, Muller KR (1998) Nonlinear component analysis as a Kernel Eigenvalue problem. Neural Comp 10(5):1299–1319. https://doi.org/10.1162/089976698300017467
Strounine K, Kravtsov S, Kondrashov D, Ghil M (2010) Reduced models of atmospheric low-frequency variability: parameter estimation and comparative performance. Phys D Nonlinear Phenom 239:145–166
Suarez MJ, Schopf PS (1988) A delayed action oscillator for ENSO. J Atmos Sci 45(21):3283–3287. https://doi.org/10.1175/1520-0469(1988)045<3283:ADAOFE>2.0.CO;2
Tan S, Mayrovouniotis ML (1995) Reducing data dimensionality through optimizing neural network inputs. AIChE J 41(6):1471–1480. https://doi.org/10.1002/aic.690410612
Tippett MK, Barnston AG, Li S (2012) Performance of recent multimodel ENSO forecasts. J Appl Meteorol Climatol 51(3):637–654. https://doi.org/10.1175/JAMC-D-11-093.1
Trenberth KE (1997) The definition of El Niño. Bull Am Meteorol Soc 78(12):2771–2777. https://doi.org/10.1175/1520-0477(1997)078<2771:TDOENO>2.0.CO;2
Vejmelka M, Pokorná L, Hlinka J, Hartman D, Jajcay N, Paluš M (2015) Non-random correlation structures and dimensionality reduction in multivariate climate data. Clim Dyn 44(9–10):2663–2682. https://doi.org/10.1007/s00382-014-2244-z
Wang C, Deser C, Yu JY, DiNezio P, Clement A (2017) El Niño and southern oscillation (ENSO): a review. Springer, Dordrecht, pp 85–106. https://doi.org/10.1007/978-94-017-7499-4-4
Wu A, Hsieh WW, Tang B (2006) Neural network forecasts of the tropical Pacific sea surface temperatures. Neural Netw 19(2):145–154. https://doi.org/10.1016/j.neunet.2006.01.004
Wyrtki K (1975) El Niño: the dynamic response of the equatorial Pacific ocean to atmospheric forcing. J Phys Oceanogr 5(4):572–584. https://doi.org/10.1175/1520-0485(1975)005<0572:ENTDRO>2.0.CO;2
Xin X, Gao F, Wei M, Wu T, Fang Y, Zhang J (2017) Decadal prediction skill of BCC-CSM1.1 climate model in East Asia. Int J Climatol 38:584. https://doi.org/10.1002/joc.5195
Xue Y, Leetmaa A, Ji M (2000) ENSO prediction with Markov models: the impact of sea level. J Clim 13(4):849–871. https://doi.org/10.1175/1520-0442(2000)013<0849:EPWMMT>2.0.CO;2
Zebiak SE, Cane MA (1987) A model El Niño–Southern Oscillation. Mon Weather Rev 115(10):2262–2278. https://doi.org/10.1175/1520-0493(1987)115<2262:AMENO>2.0.CO;2
Zhang G, Kline D (2007) Quarterly time-series forecasting with neural networks. IEEE Trans Neural Netw 18(6):1800–1814. https://doi.org/10.1109/TNN.2007.896859. http://ieeexplore.ieee.org/document/4359174/
Appendices
Appendix 1: LDM decomposition
Here we describe the LDM decomposition, which is a special case of the more general nonlinear dynamical mode (NDM) decomposition developed in Gavrilov et al. (2016), Mukhin et al. (2015a) and Mukhin et al. (2017).
LDM expression First, following Sect. IIB1 of Gavrilov et al. (2016), the initial data \({\mathbf {x}}=({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_N)\) is preliminarily rotated to the full traditional EOF basis, \({\mathbf {y}}_n={\mathbf {V}}^{T}{\mathbf {x}}_n\):
Here \({\mathbf {x}}_n \in \mathbb {R}^D\) is the spatially distributed data measured at time n (D equals the number of nodes in the spatial grid), \({\mathbf {y}}_n \in \mathbb {R}^D\) is the vector of EOF-based PCs, and \({\mathbf {V}}\) is the full \(D \times D\) rotation matrix whose columns are the D EOFs obtained via diagonalization of the covariance matrix \(\frac{1}{N}\sum \nolimits _{n=1}^{N}{\mathbf {x}}_n\cdot {\mathbf {x}}_n^T\), ordered so that the corresponding eigenvalues \(\lambda _1,\ldots ,\lambda _D\) decrease.
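The EOF rotation above can be sketched in a few lines of NumPy; the data sizes and the random "anomalies" here are purely hypothetical:

```python
import numpy as np

# Illustrative sketch of the EOF rotation: diagonalize the covariance of the
# (mean-removed) data and project onto the eigenvector basis.
rng = np.random.default_rng(0)
D, N = 10, 500
x = rng.standard_normal((D, N))            # spatially distributed data, D nodes
x -= x.mean(axis=1, keepdims=True)         # remove the mean at each node

C = (x @ x.T) / N                          # covariance matrix (D x D)
lam, V = np.linalg.eigh(C)                 # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]              # reorder so the eigenvalues decrease
lam, V = lam[order], V[:, order]

y = V.T @ x                                # EOF-based PCs: y_n = V^T x_n
```

Since `V` is orthogonal, the rotation is lossless (`V @ y` recovers `x`) and the variance of each PC equals the corresponding eigenvalue.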
Then the LDM model is constructed in the space of the K leading PCs:
Here \({\mathbf {p}}_n \in \mathbb {R}^d\) is the vector of LDM-based PCs, the \(K \times d\) matrix \({\mathbf {A}}\) and the \(K\)-dimensional vector \({\mathbf {c}}\) are the parameters of the LDM, \(\varvec{\xi }_{1,n},\ldots ,\varvec{\xi }_{K,n}\) are delta-correlated normal random processes, and \(\sigma\) is their amplitude.
By direct analogy with the geometrical interpretation of NDMs (Sect. IIA of Gavrilov et al. 2016), the LDM defines a \(d\)-dimensional linear manifold in the space spanned by the K leading EOFs (we assume \(d \le K\)), and the time series \({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_{N}\) are the values of the coordinates on this manifold. The number K is a hyperparameter which is optimized in the algorithm. To do this correctly, all the remaining PCs are modeled as noise:
Here \(\varvec{\xi }_{K+1,n},\ldots ,\varvec{\xi }_{D,n}\) are delta-correlated random normal processes and \(\sigma _{K+1},\ldots ,\sigma _D\) are their amplitudes (not to be confused with \(\sigma _k\) in Sect. 2.2.2).
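A shape-level sketch of this generative structure may help: the d-dimensional hidden time series is mapped linearly into the K leading PCs, and the remaining D − K PCs are pure noise. All sizes and values below are illustrative, not taken from the paper.

```python
import numpy as np

# Generative sketch of the LDM model: linear manifold plus noise in the K
# leading PCs, pure noise in the trailing D - K PCs.
rng = np.random.default_rng(1)
D, K, d, N = 10, 5, 2, 200

A = rng.standard_normal((K, d))              # LDM matrix A
c = rng.standard_normal(K)                   # offset vector c
sigma = 0.1                                  # noise amplitude in the K leading PCs
sigma_rest = rng.uniform(0.5, 1.0, D - K)    # amplitudes of the trailing noise PCs

p = rng.standard_normal((d, N))              # LDM-based PCs p_1..p_N
y_lead = A @ p + c[:, None] + sigma * rng.standard_normal((K, N))
y_rest = sigma_rest[:, None] * rng.standard_normal((D - K, N))
y = np.vstack([y_lead, y_rest])              # all D modeled PCs
```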
Taking into account Eqs. (12)–(14) we can rewrite the LDM as \({\mathbf {L}}({\mathbf {p}}_n)\) introduced in the Eq. (3):
LDM learning As mentioned in Sect. 2.1.2, to estimate both the values \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }=(\sigma ;\sigma _{K+1},\ldots ,\sigma _{D})\) and the time series \({\mathbf {p}}=({\mathbf {p}}_1,\ldots ,{\mathbf {p}}_N)\) in the described model we use a probabilistic Bayesian approach (Jeffreys 1998). In this approach, all the unknown parameters \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }\), \({\mathbf {p}}\) are assigned prior PDFs, which are updated by the observations \({\mathbf {x}}\) through the model (3); the parameters are then learned by maximization of the resulting posterior PDF. The choice of the prior PDFs is explained in detail in Appendix A of Gavrilov et al. (2016) for the case of one-dimensional NDMs, which generalizes straightforwardly to multidimensional manifolds (Mukhin et al. 2017). Here we present these PDFs rewritten for the special case of LDMs, omitting the detailed explanation.
The prior PDF for the PCs introduced in the main text is as follows:
This PDF corresponds to the prior assumption that each time series \(p_{i1},\ldots ,p_{iN}\) can be produced by the simplest linear stochastic evolution operator (red noise) with autocorrelation time \(\tau _i\), thus taking into account dynamical information about the connection between successive states (see also Appendix A1 of Gavrilov et al. 2016). The vector \(\varvec{\tau }\) is treated as a hyperparameter (see below), and no specific a priori knowledge about it is assumed.
One should keep in mind that the \(\tau _i\)’s correspond only to the autocorrelation times of the a priori assumed PCs (prior PDF (16)). This prior PDF is, however, updated by the observations, consistent with the Bayesian framework; as a result, the \(\tau _i\)’s are not exactly equal to the autocorrelation times of the obtained PCs. In practice, they impose a restriction on the smallest scale of each PC, i.e. they filter out all scales faster than some value related to \(\tau _i\).
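The red-noise intuition behind this prior can be checked numerically: an AR(1) process with lag-1 coefficient \(a=e^{-1/\tau }\) has an autocorrelation time of roughly \(\tau\), so it suppresses variability on scales much faster than \(\tau\). The values below are assumed for illustration only.

```python
import numpy as np

# AR(1) / red-noise illustration: the lag-1 autocorrelation of the generated
# series should approach a = exp(-1/tau) for long samples.
rng = np.random.default_rng(2)
N, tau = 20000, 10.0
a = np.exp(-1.0 / tau)
p = np.zeros(N)
for n in range(1, N):
    # unit-variance AR(1) update
    p[n] = a * p[n - 1] + np.sqrt(1.0 - a**2) * rng.standard_normal()

r1 = np.corrcoef(p[:-1], p[1:])[0, 1]    # empirical lag-1 autocorrelation
```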
The prior PDF for the LDM coefficients \({\mathbf {A}}\) and \({\mathbf {c}}\) is chosen to complement the PDF (16):
It is derived directly from Appendix A2 of Gavrilov et al. (2016) and ensures uniqueness of the PC variances (i.e. normalization).
The prior PDF for \(\varvec{\varsigma }\) is simply taken to be constant, because the Bayesian problem is well-posed with respect to \(\varvec{\varsigma }\):
Thus, the Bayesian posterior PDF is expressed via the prior PDFs in a standard way:
Here the likelihood of the LDM, \(P({\mathbf {x}}|{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}},\varvec{\tau },K)\), is the only function depending on the observations. It follows from the Gaussianity of the noise in our LDM model (12)–(15):
The parameters \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }\), \({\mathbf {p}}\) of the LDM are found by maximization of (19). Note that the exponential factors in (20) are quadratic in \({\mathbf {p}}\) as well as in the coefficients \({\mathbf {A}}\), \({\mathbf {c}}\), and the number of coefficients is proportional to \(d+1\). In the case of a general NDM parameterized by a nonlinear polynomial, the exponential factor would instead be a polynomial of higher order in \({\mathbf {p}}\), and the number of coefficients (proportional to \(C^{m}_{m+d}\)) would grow dramatically with the manifold dimension d and the polynomial degree m. These facts make the maximization problem well-posed (unique solution) and numerically fast even for LDM dimensions \(d>2\), in contrast with the case of a general NDM.
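The coefficient-count argument is easy to verify: a polynomial of degree m in d variables has \(C^{m}_{m+d}\) coefficients per output component, which grows rapidly with d and m, whereas the linear LDM needs only a number of coefficients proportional to \(d+1\). The (d, m) values below are illustrative.

```python
from math import comb

# Count polynomial coefficients C(m + d, m) for a few illustrative (d, m) pairs.
counts = {(d, m): comb(m + d, m) for d in (2, 5, 10) for m in (3, 5)}
# e.g. counts[(2, 3)] == 10, while counts[(10, 5)] == 3003
```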
LDM optimization To find the hyperparameters \((\varvec{\tau },K)\), the Bayesian evidence, or marginal likelihood, is used as a cost function. It represents the probability that the model with the given hyperparameters reproduces the observed data, and is defined as follows:
Such a cost function allows us to find, in a Bayesian way, optimal hyperparameters for which the model is neither underfitted nor overfitted. It is explored in detail in Gavrilov et al. (2016).
To estimate (21) for different hyperparameters and find the optimal LDM, we use the same algorithm based on the Laplace approximation as given in Gavrilov et al. (2016).
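A toy sketch may clarify the Laplace approximation of the evidence: for a Gaussian likelihood with unknown mean (unit variance) and a Gaussian prior, the log posterior is exactly quadratic, so the Laplace estimate coincides with the analytic evidence. This one-dimensional example is purely illustrative; the computation in the paper is over the full multidimensional parameter set.

```python
import numpy as np

# Laplace approximation: log P(x) ≈ log lik(mu*) + log prior(mu*)
#                                   + (1/2) log(2*pi / Hessian at the mode).
rng = np.random.default_rng(3)
N, s0 = 50, 2.0                              # sample size, prior std (assumed)
x = rng.standard_normal(N) + 1.5             # data with unknown mean

def neg_log_post(mu):
    # minus log of (likelihood * prior), dropping only normalization constants
    return 0.5 * np.sum((x - mu) ** 2) + 0.5 * mu**2 / s0**2

mu_star = x.sum() / (N + 1.0 / s0**2)        # posterior mode (closed form here)
hess = N + 1.0 / s0**2                       # curvature at the mode

log_evidence = (-neg_log_post(mu_star)
                - 0.5 * N * np.log(2 * np.pi)        # likelihood normalization
                - 0.5 * np.log(2 * np.pi * s0**2)    # prior normalization
                + 0.5 * np.log(2 * np.pi / hess))    # Laplace volume factor
```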
Appendix 2: Bayesian approach to evolution operator construction
Here we describe the procedure for constructing the evolution operator for the PCs. It is generally analogous to the procedure described in Molkov et al. (2011), Molkov et al. (2012), Mukhin et al. (2015b) and Gavrilov et al. (2017), but with slight modifications. Following this procedure, we first formulate the model and the prior PDFs for its parameters explicitly.
The evolution operator model is defined by Eq. (5), where we parameterize the function \(\varvec{\varphi }\) by an artificial neural network (ANN) in the form of a perceptron with one hidden layer and hyperbolic tangent activation function (Hornik et al. 1989):
Here \({\mathbf {z}}\) stands for the collection of \({\mathbf {p}}_{n-1},\ldots ,{\mathbf {p}}_{n-l}\) from (5); \(\varvec{\alpha }_i\in \mathbb {R}^{d}\), \(\varvec{\beta }_i\in \mathbb {R}^{d}\), \(\varvec{\omega }_i\in \mathbb {R}^{ld}\), \(\varvec{\nu }_i\in \mathbb {R}^{\dim {\mathbf {f}}}\), \(\gamma _i\in \mathbb {R}\) are the ANN coefficients, which we collect into the vector \(\varvec{\mu }_{\varphi }=(\varvec{\alpha }_1,\varvec{\beta }_1,\varvec{\omega }_1,\varvec{\nu }_1,\gamma _1,\ldots ,\varvec{\alpha }_m,\varvec{\beta }_m,\varvec{\omega }_m,\varvec{\nu }_m,\gamma _m)\).
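A minimal sketch of a one-hidden-layer tanh perceptron consistent with this parameter list is given below. The placement of the time dependence (taking \(\varvec{\alpha }_i + t\varvec{\beta }_i\) as output weights) is an assumption made for illustration, not a statement of the exact form (22).

```python
import numpy as np

def phi(z, f, t, alpha, beta, omega, nu, gamma):
    # z: (l*d,) delayed states; f: (dim_f,) forcing; t: scalar time.
    # alpha, beta: (m, d); omega: (m, l*d); nu: (m, dim_f); gamma: (m,).
    h = np.tanh(omega @ z + nu @ f + gamma)   # hidden-layer activations, (m,)
    return (alpha + t * beta).T @ h           # output in R^d, linear in t
```

With this placement of t, the output depends on time exactly linearly, matching a first-order expansion of slowly drifting parameters.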
Generally, to model slow non-stationarity of the parameters of the real system’s evolution operator, it is appropriate to introduce a weak time dependence into all parameters of \(\varvec{\varphi }\). To first order, this leads to a linear dependence of \(\varvec{\varphi }\) on the time t. For the particular case of an ANN, it can readily be shown that the form (22) provides a universal approximation of such an expansion for any nonlinear function of \(({\mathbf {z}},{\mathbf {f}},t)\) (Molkov et al. 2011).
The matrix \(\hat{{\mathbf {g}}}\) in the random part of the model is parameterized as a lower-triangular matrix via its entries \(\varvec{\mu }_{g}=(g_{11}; \ g_{21}, \ g_{22};\,\, \ldots ;g_{m1}, \ldots , \ g_{mm})\), which is sufficient to represent an arbitrary cross-correlation matrix of the random part (Mukhin et al. 2015b).
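This is a Cholesky-style parameterization: packing the flattened values \((g_{11}; g_{21}, g_{22}; \ldots )\) into a lower-triangular matrix g and forming \(g g^T\) yields an arbitrary symmetric positive-definite covariance. The numeric values below are made up for illustration.

```python
import numpy as np

def unpack_lower_triangular(mu_g, m):
    # Fill the lower triangle row by row: (0,0), (1,0), (1,1), (2,0), ...
    g = np.zeros((m, m))
    g[np.tril_indices(m)] = mu_g
    return g

mu_g = np.array([1.0, 0.3, 0.8, -0.2, 0.5, 0.6])   # m = 3 example
g = unpack_lower_triangular(mu_g, 3)
cov = g @ g.T                                      # implied covariance matrix
```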
The prior PDF for the ANN coefficients was derived in Molkov et al. (2011) and Molkov et al. (2012) for the case of normalized ANN inputs and outputs. Here we have multidimensional time series whose components have different norms, and this difference matters. Taking the norms into account, the expressions from Molkov et al. (2011, 2012) yield the following modified prior PDF for the model parameters:
Here \(s_k=\frac{1}{N-1}\sum \nolimits _{n=2}^{N}(p_{k,n}-p_{k,n-1})^2\), \(\sigma _\omega ^2=2Nd/\sum \nolimits _{n=1}^{N}|{\mathbf {p}}_{n}|^2\), \(\sigma _\nu ^2=2N\dim {\mathbf {f}}/\sum \nolimits _{n=1}^{N}|{\mathbf {f}}_{n}|^2\), \(\sigma _\gamma ^2=2\cdot (ld+\dim {\mathbf {f}})\). In the case of the EOF-ANN model the time series \({\mathbf {p}}_n\) were used with their original variances, while in the case of the LDM-ANN model \({\mathbf {p}}_n\) were normalized to unit variance, because their variance does not carry any information about the system. The forcing time series were always normalized to equal variances.
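These data-dependent prior scales can be computed directly from a PC time series p (d × N) and a forcing series f (dim_f × N); the sizes and random data below are assumed for illustration.

```python
import numpy as np

# Compute the prior scales s_k, sigma_omega^2, sigma_nu^2, sigma_gamma^2
# from illustrative time series.
rng = np.random.default_rng(4)
d, dim_f, N, l = 3, 2, 400, 2
p = rng.standard_normal((d, N))
f = rng.standard_normal((dim_f, N))

s = np.mean(np.diff(p, axis=1) ** 2, axis=1)   # s_k: mean squared increment per PC
sigma_omega2 = 2 * N * d / np.sum(p**2)        # sigma_omega^2
sigma_nu2 = 2 * N * dim_f / np.sum(f**2)       # sigma_nu^2
sigma_gamma2 = 2 * (l * d + dim_f)             # sigma_gamma^2
```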
In the notation defined above, the expression for the probability of generating the time series \({\mathbf {p}}\) by the model is the following:
Here \(\lambda _k=\frac{1}{N}\sum \nolimits _{n=1}^{N}p_{k,n}^2\). Note that in expression (24) the initial values \({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_{l}\) are treated as model parameters via the first term (see Gavrilov et al. 2017 for a detailed explanation).
The further procedure of model construction is similar to the LDM construction procedure described above. To learn the model parameters, we maximize the Bayesian posterior PDF, which can be expressed as follows:
To find the optimal hyperparameters (l, m), we maximize the marginal likelihood:
The particular algorithm for learning and optimization using (25) and (26) is the same as described in Gavrilov et al. (2017).
Gavrilov, A., Seleznev, A., Mukhin, D. et al. Linear dynamical modes as new variables for data-driven ENSO forecast. Clim Dyn 52, 2199–2216 (2019). https://doi.org/10.1007/s00382-018-4255-7