We use a prototypical thermoacoustic system composed of a longitudinal acoustic cavity and a heat source, which is modelled with a nonlinear time-delayed model [8, 12, 16]; the same model was used in [8] to optimize ergodic averages with a dynamical systems approach. The non-dimensional governing equations are
$$\begin{aligned} \partial _t u + \partial _x p = 0, \quad \partial _t p + \partial _x u + \zeta p - \dot{q}\delta (x - x_f)= 0, \end{aligned}$$
(9)
where u, p, \(\zeta \) and \(\dot{q}\) are the non-dimensional acoustic velocity, pressure, damping and heat-release rate, respectively, and \(\delta \) is the Dirac delta. These equations are discretized using \(N_g\) Galerkin modes
$$\begin{aligned} u(x,t) = \sum \nolimits _{j=1}^{N_g} \eta _j(t)\cos (j \pi x), \quad p(x,t) = -\sum \nolimits _{j=1}^{N_g} \mu _j(t) \sin (j \pi x), \end{aligned}$$
(10)
which results in a system of \(2N_g\) oscillators that are nonlinearly coupled through the heat released by the heat source
$$\begin{aligned} \dot{\eta }_j - j \pi \mu _j = 0, \quad \dot{\mu }_j + j \pi \eta _j + \zeta _j \mu _j + 2 \dot{q} \sin (j \pi x_f) = 0 , \end{aligned}$$
(11)
where \(x_f=0.2\) is the heat source location and \(\zeta _j = 0.1 j + 0.06 j^{1/2}\) is the modal damping [8]. The heat release rate, \(\dot{q}\), is given by the modified King’s law [8], \(\dot{q}(t) = \beta [ \left( 1+u(x_f, t-\tau )\right) ^{1/2} - 1 ]\), where \(\beta \) and \(\tau \) are the heat release intensity parameter and the time delay, respectively. With the nomenclature of Sect. 2, \(\varvec{y}(n) = (\eta _1; \dots ; \eta _{N_g}; \mu _1 ; \dots ; \mu _{N_g})\). Using 10 Galerkin modes (\(N_g=10\)), \(\beta =7.0\) and \(\tau =0.2\) results in chaotic motion (Fig. 2), with the leading Lyapunov exponent being \(\lambda _1 \approx 0.12\) [8]. (The leading Lyapunov exponent measures the average exponential rate of separation of two nearby initial conditions, i.e. an initial separation \(||\varvec{\delta u}_0||\) grows asymptotically as \(||\varvec{\delta u}_0|| e^{\lambda _1 t}\).) However, for the same choice of parameter values, the solution with \(N_g=1\) is a limit cycle (i.e. a periodic solution).
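For concreteness, the time-delayed system (11) can be integrated as in the following minimal Python sketch. This is an illustration, not the authors’ implementation: the delay term \(u(x_f, t-\tau )\) is handled with a circular history buffer (assumed quiescent for \(t<0\)), a first-order explicit Euler scheme is used for simplicity (a higher-order scheme with history interpolation would be preferable in practice), and the guard on the argument of the square root is an added robustness detail, not part of the model.

```python
import numpy as np

Ng, beta, tau, x_f, dt = 10, 7.0, 0.2, 0.2, 0.01   # parameters from the text
j = np.arange(1, Ng + 1)
zeta = 0.1 * j + 0.06 * np.sqrt(j)                 # modal damping
lag = int(round(tau / dt))                         # delay in time steps (= 20)

def rhs(eta, mu, u_f_delayed):
    """Right-hand side of Eq. (11), given the delayed velocity at x_f."""
    # Modified King's law; the max() guard avoids a negative sqrt argument.
    q_dot = beta * (np.sqrt(np.maximum(1.0 + u_f_delayed, 0.0)) - 1.0)
    deta = j * np.pi * mu
    dmu = -j * np.pi * eta - zeta * mu - 2.0 * q_dot * np.sin(j * np.pi * x_f)
    return deta, dmu

n_steps = 20000                              # 200 time units
eta, mu = 0.1 * np.ones(Ng), np.zeros(Ng)    # arbitrary small initial condition
u_f_hist = np.zeros(lag)                     # u(x_f) over the last `lag` steps
Y = np.empty((n_steps, 2 * Ng))              # state history, y = (eta; mu)
for n in range(n_steps):
    u_f = eta @ np.cos(j * np.pi * x_f)          # u(x_f, t) from Eq. (10)
    deta, dmu = rhs(eta, mu, u_f_hist[n % lag])  # read value stored lag steps ago
    u_f_hist[n % lag] = u_f                      # then overwrite the oldest entry
    eta, mu = eta + dt * deta, mu + dt * dmu     # explicit Euler step
    Y[n] = np.concatenate((eta, mu))
```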
The echo state network is trained on data generated with \(N_g=10\), while the physical knowledge (ROM in Fig. 1) is generated with \(N_g=1\) only. We wish to predict the time average of the instantaneous acoustic energy,
$$\begin{aligned} E_{ac}(t)=\int _0^1 \frac{1}{2}(u^2 + p^2) \, dx, \end{aligned}$$
(12)
which is a relevant metric in the optimization of thermoacoustic systems [8]. The reservoir is composed of 100 units, a modest size, half of which receive their input from \(\varvec{u}\), while the other half receives it from the output of the ROM, \( \hat{\varvec{y}}_\text {ROM}\). The entries of \(\varvec{W}_\mathrm {in}\) are randomly generated from the uniform distribution \(\text {unif}(-\sigma _\mathrm {in}, \sigma _\mathrm {in})\), where \(\sigma _\mathrm {in} = 0.2\). The matrix \(\varvec{W}\) is highly sparse, with only 3% of its entries being non-zero, drawn from the uniform distribution \(\text {unif}(-1, 1)\). Finally, \(\varvec{W}\) is scaled such that its spectral radius, \(\rho \), is 0.1 for the ESN and 0.3 for the hESN. The time step is \(\varDelta t = 0.01\). The network is trained for \(N_t = 5000\) time steps, which corresponds to 6 Lyapunov times, i.e. \(6\lambda _1^{-1}\). The data is generated by integrating Eq. (11) in time with \(N_g=10\), resulting in \(N_u = N_y = 20\). In the hESN, unless otherwise stated, the ROM is obtained by integrating the same equations with \(N_g=1\) (one Galerkin mode only). Ridge regression is performed with \(\gamma =10^{-7}\). The values of the hyperparameters are taken from the literature [5, 14] and refined with a grid search, which, while not the most efficient method, is well suited to cases with few hyperparameters, such as the ESN architecture used in this work.
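As a side note, substituting the Galerkin expansion (10) into Eq. (12) and using the orthogonality relations \(\int _0^1 \cos ^2(j\pi x)\,dx = \int _0^1 \sin ^2(j\pi x)\,dx = 1/2\) gives the closed form \(E_{ac}(t) = \frac{1}{4}\sum _{j=1}^{N_g}\left( \eta _j^2 + \mu _j^2\right) \), so the acoustic energy can be evaluated directly from the state vector.

The reservoir construction and training described above can be sketched as follows. This is a hedged illustration: the function names, the plain \(\tanh \) state update, and the simple concatenation of the data input with the ROM output are assumptions made for the sake of a self-contained example; the precise architecture is the one defined in Sect. 2.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
N_x, N_u = 100, 20                       # reservoir units; input size (2*Ng, Ng = 10)
sigma_in, rho, gamma = 0.2, 0.3, 1e-7    # input scaling, spectral radius (hESN), ridge factor

# Input matrix: half the units are fed by the data u, the other half by the
# ROM prediction y_ROM (both stacked into a single input vector of size 2*N_u).
W_in = rng.uniform(-sigma_in, sigma_in, size=(N_x, 2 * N_u))
W_in[: N_x // 2, N_u:] = 0.0    # first half sees only the data
W_in[N_x // 2 :, :N_u] = 0.0    # second half sees only the ROM output

# Sparse reservoir matrix with 3% non-zero entries, rescaled to spectral radius rho.
W = sparse.random(N_x, N_x, density=0.03, random_state=0,
                  data_rvs=lambda n: rng.uniform(-1.0, 1.0, n)).toarray()
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(U, U_rom):
    """Open-loop pass: collect reservoir states for the inputs (u(n), y_ROM(n))."""
    x = np.zeros(N_x)
    X = np.empty((len(U), N_x))
    for n, (u, u_rom) in enumerate(zip(U, U_rom)):
        x = np.tanh(W_in @ np.concatenate((u, u_rom)) + W @ x)
        X[n] = x
    return X

def train_readout(X, Y_target):
    """Ridge regression: solve (X^T X + gamma I) W_out^T = X^T Y_target."""
    A = X.T @ X + gamma * np.eye(N_x)
    return np.linalg.solve(A, X.T @ Y_target).T   # shape (N_y, N_x)
```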
On the one hand, Fig. 3a shows the instantaneous error of the first modes of the acoustic velocity and pressure, \((\eta _1; \mu _1)\), for the ESN, hESN and ROM. None of these can accurately predict the instantaneous state of the system. On the other hand, Fig. 3b shows the error of the prediction of the average acoustic energy. Once again, the ROM alone does a poor job of predicting the statistics of the system, with an error of 50%. This should not come as a surprise since, as discussed previously, the ROM does not even produce a chaotic solution. The ESN, trained on data only, performs marginally better, with an error of 48%. In contrast, the hESN predicts the time-averaged acoustic energy satisfactorily, with an error of about 7%. This is remarkable: both the ESN and the ROM do a poor job of predicting the average acoustic energy on their own, but when the ESN is combined with the prior knowledge from the ROM, the prediction becomes significantly better. Moreover, while the hESN’s error is still decreasing at the end of the prediction period, \(t=250\), which is 5 times the length of the training data, the errors of the ESN and the ROM plateau much earlier, at a time similar to that of the training data. This result shows that complementing the ESN with a cheap physical model (with only 10% of the number of degrees of freedom of the full system) can greatly improve the accuracy of the predictions, with no need for more data or neurons. Figure 3c shows the relative error as a function of the number of Galerkin modes in the ROM, which is a proxy for the quality of the model. For each \(N_g\), we take the median of 16 reservoir realizations. As expected, as the quality of the model increases, so does the quality of the prediction. This effect is most noticeable from \(N_g=1\) to 4, after which the curve exhibits diminishing returns. The downside of increasing \(N_g\) is, obviously, the increase in computational cost. At \(N_g=10\), the original system is recovered. However, the error does not tend exactly to 0, because \(\varvec{W}_\mathrm {out}\) cannot select the ROM’s output alone (i.e. zero entries for the reservoir nodes) due to: (i) the regularization term in ridge regression, which penalizes large entries; (ii) numerical error. This graph further strengthens the point made previously, that cheap physical models can greatly improve the predictions of data-driven techniques for physical systems.
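The error metric behind Fig. 3b–c and the median over realizations can be expressed compactly as below. Here `predict_E_ac(Ng, seed)` is a hypothetical wrapper, not part of the paper’s code, that trains an hESN whose ROM uses \(N_g\) Galerkin modes with a given random seed for \(\varvec{W}\) and \(\varvec{W}_\mathrm {in}\), and returns the predicted acoustic-energy time series.

```python
import numpy as np

def relative_error(E_pred, E_true):
    """|<E_pred> - <E_true>| / <E_true>, with <.> the time average."""
    return abs(np.mean(E_pred) - np.mean(E_true)) / np.mean(E_true)

def ng_sweep(predict_E_ac, E_true, Ng_values=range(1, 11), n_seeds=16):
    """Median relative error over reservoir realizations, for each ROM size."""
    return {Ng: float(np.median([relative_error(predict_E_ac(Ng, s), E_true)
                                 for s in range(n_seeds)]))
            for Ng in Ng_values}
```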
We stress that the optimal values of the hyperparameters for a certain set of physical parameters, e.g. \((\beta _1, \tau _1)\), might not be optimal for a different set of physical parameters, \((\beta _2, \tau _2)\). This should not be surprising, since different physical parameters result in different attractors. For example, Fig. 4 shows that changing the physical parameters from \((\beta =7.0, \tau =0.2)\) to \((\beta =6.0, \tau =0.3)\) changes the type of attractor from chaotic to a limit cycle. For the hESN to predict the limit cycle, the value of \(\sigma _\mathrm {in}\) must change from 0.2 to 0.03. Thus, if the hESN (or any deep learning technique in general) is to be used to predict the dynamics of various physical configurations (e.g. for the generation of a bifurcation diagram), then it should be coupled with a robust method for the automatic selection of optimal hyperparameters [1], with Bayesian optimization being a promising candidate [15, 17].
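To make the last point concrete, the grid search mentioned earlier can be sketched as follows; it would have to be rerun for each set of physical parameters \((\beta , \tau )\). `validation_error` is a hypothetical wrapper that trains an (h)ESN with the given \(\rho \) and \(\sigma _\mathrm {in}\) and returns a validation metric (e.g. the error of the predicted time-averaged acoustic energy); the grid values are illustrative. A Bayesian optimizer [15, 17] would replace the exhaustive loop with a model-based search over the same objective.

```python
import itertools
import numpy as np

def grid_search(validation_error,
                rhos=(0.05, 0.1, 0.3, 0.5, 0.9),
                sigmas=(0.03, 0.1, 0.2, 0.5)):
    """Exhaustive search over (rho, sigma_in); returns the best pair found."""
    best_params, best_err = None, np.inf
    for rho, sigma_in in itertools.product(rhos, sigmas):
        err = validation_error(rho, sigma_in)
        if err < best_err:
            best_params, best_err = (rho, sigma_in), err
    return best_params, best_err
```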