Abstract
Predicting the evolution of dynamics from a given trajectory history of an unknown system is an important and challenging problem. This paper presents a model-free method of forecasting unknown chaotic systems through reconstructing vector fields from noisy measured data via an adaptation of optimal control methods. This technique is also applicable to partially observed systems using a Takens delay embedding approach. The algorithms are validated on the Lorenz system and the four-dimensional hyperchaotic Rössler system, and demonstrate successful predictions well beyond the Lyapunov timescale. It is found that for small datasets or datasets with large levels of noise, the prediction accuracy of partially observed systems approaches that of fully observed systems. The presented approach also allows the model-free assessment of local predictability on the attractor by evolving initial condition density through the reconstructed vector fields via estimation of the transfer operator. The method is compared to predictions made by an imperfect model, which highlights the utility of model-free approaches when the only available models have significant model error. The capability of this method for reconstruction of continuous and global vector fields may be applied to model validation, forecasting of initial conditions not in the training set, and model-free filtering.
1 Introduction
The ability to predict unknown systems exhibiting chaotic behaviour is important for many natural sciences such as meteorology (Lorenz 1963, 1969; Pérez-Muñuzuri and Gelpi 2000), fluid mechanics (Takens 1981; Brunton et al. 2016), and ecology (Sugihara and May 1990; Sugihara et al. 2012). These systems are sometimes described by simple dynamical systems even if their time series trajectories seem complex. The corruption from measurement noise, sensitivity to initial conditions, and partial measurement of system variables present additional difficulties for the prediction and control of such systems. Uncovering the underlying structure of a chaotic system would allow greater understanding of the dynamics as well as presenting methods for forecasting and control.
Many techniques for prediction rely on developing models based on collected data and using these to forecast into the future. These models may be derived through first principles analysis or data-driven model derivation techniques such as with genetic programming (Babovic and Keijzer 2000), sparse regression (Brunton et al. 2016), or neural networks (Raissi et al. 2019). Ad hoc assumptions are frequently used to close equation sets to produce closed-form expressions, and noisy measurements may complicate structure and parameter estimation. These model errors inevitably magnify prediction errors, and therefore, forecasts may only be accurate for small timescales.
When data is available and accurate models cannot be derived, model-free prediction allows the forecasting of chaotic variables without first deriving closed-form dynamical models. There exist many approaches for model-free prediction from time series data such as the method of analogues (Lorenz 1969; Casdagli 1989) and neural network methods (Gauthier et al. 2021; Gilpin 2020; Pathak et al. 2018). Many of these techniques perform very well in a noise-free environment, with examples such as the accurate prediction of Kuramoto–Sivashinsky dynamics up to eight Lyapunov time constants into the future (Pathak et al. 2018). These approaches typically require data smoothing when noise is present in the measured data; however, ad hoc methods of smoothing may incorrectly remove dynamics on short timescales by attenuating high-frequency components of the signal.
Vector field reconstruction techniques (Gouesbet and Letellier 1994) estimate the phase space vector field of dynamical systems by computing the temporal derivatives of a trajectory and interpolating to fill a region of space. This is usually required for prediction, in order to obtain a vector field (locally near the trajectory) to evolve a trajectory into the future. However, computing derivatives from noisy data can lead to large errors. As an alternative, this paper presents an approach to predicting chaotic systems by reconstructing the vector field of noisy trajectory data using optimal control methods (e.g. Lawden 1975). The optimal control procedure estimates derivatives through numerical integration and is therefore more robust to noise. In the presence of large levels of noise, reconstructing the vector field in this control-theoretic way allows the reconstructed system dynamics to be energy-optimal. Rather than penalising high frequencies to remove noise, the amplitude of the controlled vector field is penalised. This allows short timescale dynamics to be included in the resulting discovered vector field.
For situations where all variables of an unknown system cannot be measured, we must rely on partial observations; for example, the evolution of only one system variable. Takens’ theorem (Takens 1981) presents conditions under which the full attractor of a dynamical system can be embedded into a higher dimension from the measurement of this one variable. This embedded attractor preserves properties of the full attractor that do not change under smooth coordinate transforms. The measurement of one variable therefore carries information about the original dynamics, and can be used to infer properties of the full system. This paper will show that methods of reconstructing vector fields for full systems can be used for partially observed systems.
An important insight from statistical mechanics is that for chaotic systems, predicting the probability density from a large number of initial conditions avoids many issues of chaotic sensitivity encountered when predicting individual trajectories (Lasota et al. 1994). Estimating the vector field of a chaotic attractor from time series data allows a connection from the trajectory view to this density view. In this work, it is shown that the density perspective can provide each trajectory prediction with an uncertainty distribution, found through evolving densities of initial conditions past final time. Rather than giving a global quantification of predictability for chaotic systems with the maximal Lyapunov exponent (Wolf et al. 1985), this approach presents a model-free method for assessing short-term predictability for local states of fully observed or partially observed systems. Several methods exist for evolving density distributions through a dynamical system, such as the Fokker–Planck equation (Risken and Risken 1996) or the transfer operator (Froyland et al. 2007; González-Tokman 2018; Balasuriya 2021; Blachut and González-Tokman 2020). The transfer operator, or Perron–Frobenius operator, is a solution operator of a Fokker–Planck equation (Balasuriya 2021) and provides a natural data-driven framework for the density perspective. In this paper, the transfer operator of reconstructed vector fields will be numerically approximated by Ulam matrices (Dellnitz et al. 2001) to simulate the spread of densities.
The main contributions of this work are in the derivation of algorithms for model-free prediction which (i) predict well beyond the expected timescale for noisy, intermediate-length data from highly sensitive (or chaotic) systems, and (ii) provide the uncertainty of the prediction via a probability density function. The first of these is achieved via a novel application of optimal control methodology, an unusual approach for prediction which we demonstrate is effective in settings of low data and high noise. The second is computed via an estimation of the transfer operator, which pushes forward uncertainty distributions. These methods may be readily adapted to partial observations, allowing the prediction of trajectories and densities of systems where the data contains measurements of only some system variables. The paper is organised as follows. The method of reconstructing the vector field of an unknown dynamical system from noisy data is presented in Sect. 2.1. The optimal control problem for vector field reconstruction is defined, and the Pontryagin maximum principle is used to derive coupled equations that solve this problem. The algorithms for model-free prediction from the reconstructed vector field are developed in Sects. 2.2 and 2.3. The method of recreating the full system attractor from partially observed data by delay embedding is also explained there to enable partially observed prediction. The process of estimating the transfer operator from the reconstructed vector field is also shown. The effectiveness of these algorithms is demonstrated in Sect. 3 through predictions of simulated data from the chaotic Lorenz system and the hyperchaotic Rössler system, and through forecasting the uncertainty distributions of the Lorenz system. The robustness of the algorithms to the control parameter, amount of data, noise level, and initial condition uncertainty is then quantitatively assessed alongside the robustness of models with model error.
Section 4 finally discusses limitations and extensions to the presented work.
2 Prediction and Its Uncertainty
Given time series data up to a time T, the prediction or forecasting problem is to determine how the time series evolves into the future up to a time \(T+T_f\). This is a difficult problem for several reasons:

The time series may only contain measurements of some system variables, but not all N of them; this is called a ‘partially observed system’.

The time series may not traverse regions in phase space into which it may venture in the future, and consequently there will not be sufficient information to base predictions on.

Most nonlinear systems in high dimensions exhibit chaos, where mild changes in initial conditions lead to an exponential increase in separation, and hence prediction beyond the Lyapunov timescale associated with this exponential error increase is fraught (Wolf 1986).
A schematic diagram of the prediction problem is shown in Fig. 1.
In this section, we address the prediction problem while considering some of the difficulties outlined above. First, it is necessary to use the trajectory data until time T to construct the vector field which drives the evolution of the trajectory. Our methodology for doing this via an optimal control approach is presented in Sect. 2.1. The procedure for prediction beyond time T using the reconstructed vector field is then outlined in Sect. 2.2. If the available data is partially observed, i.e. if only the trajectory of one variable from a higher-dimensional system is known, we propose an algorithm for prediction in Sect. 2.3. The process of quantifying the uncertainty of model-free predictions will be described in Sect. 2.4, and error metrics which can be used for verifying the effectiveness of the process will be discussed in Sect. 2.5.
2.1 Optimal Control-Based Vector Field Reconstruction
Vector field reconstruction methods are techniques for recreating vector fields from trajectory data to learn differential equation models of the observed system. Given trajectory data from an unknown system, these methods estimate the temporal derivatives of the trajectories to then define the vector field in a region of space and time. For autonomous (time-invariant) systems, the vector field can be described by only its variation in phase space. Vector field reconstruction has been successfully performed using polynomial fitting (Gouesbet and Letellier 1994), deep learning (Han et al. 2019), and linear combinations of basis fields (Haufe et al. 2009). In contrast, this section presents a model-free method for vector field reconstruction in the presence of noise using an optimal control approach. We remark that we only require vector field reconstruction local to the trajectory for the purpose of evolving it in time, rather than a global vector field reconstruction which may be an end in and of itself.
In vector field reconstruction in general, the time derivative of the trajectory data must be estimated to determine the vector field along the trajectory path. If there is noise in the trajectory data, numerical differentiation will produce large errors in this derivative. In applied settings, ad hoc approaches are therefore used in choosing alternative differentiation techniques. Most commonly, preprocessing steps are used to smooth the data and decrease the level of noise in the system, allowing for simple differentiation methods to be used subsequently. Despite the existence of many sophisticated smoothing techniques, choosing appropriate smoothing parameters to minimise loss of information is challenging. Therefore, alternate methods of approximating derivatives of noisy data are required. Two common approaches for estimating derivatives are (i) to approximate the noisy data by curve fitting with differentiable functions and then computing the derivative (Knowles and Renka 2014), and (ii) to use a regularisation approach (Chartrand 2011).
Here, optimal control (e.g. Lawden 1975) is proposed as a direct method for derivative estimation that simultaneously smooths the trajectory data and performs an energy-optimal vector field reconstruction. Optimal control is a well-established methodology that has been used for optimal fluid mixing (Zhang and Balasuriya 2020), optimal power management (Yu et al. 1970), and optimal trajectory smoothing (Dey and Krishnaprasad 2012). In this instance, the cost function is defined as a weighted linear sum of the trajectory tracking error and the trajectory energy, which will be described in detail later in this section. The Pontryagin maximum principle is then invoked to find pairs of controlled trajectories and control field values from the reference trajectories. The controlled trajectories will be energy-optimally smoothed, and the control field values along this trajectory will be an estimate of the derivative of the reference trajectory. This method also requires selection of an energy parameter to control the scale of attenuated noise. As the output of this approach is both a smoothed trajectory and a derivative estimate, the smoothed trajectory may be compared to the reference trajectory for validation. The generated vector field can then be used to predict the trajectory into the future.
The above procedure is now explained in detail. Suppose time series measurements are available for N system variables, which can be represented at each time t by the measured trajectory vector \({\hat{\textbf{x}}}(t)=[\hat{x}_1(t), \hat{x}_2(t), \dots ,\hat{x}_N(t)]^\textsf{T}\). Without loss of generality, assume that the time is shifted so that \( t \in [0,T] \). In realistic situations, t is discrete and potentially infrequent, and intermediate t values would need to be filled in via interpolation. It is assumed that the time evolution of the controlled trajectory \( \textbf{x} \) is governed by

$$\begin{aligned} \dot{\textbf{x}}(t) = \textbf{u}\left( \textbf{x}(t)\right) \end{aligned}$$ (1)

for an unknown vector field \( \textbf{u} \), where the overdot denotes the t-derivative. Moreover, it is assumed that the data is possibly corrupted by noise. In this section, the goal is to determine an ‘optimal’ \( \textbf{u} \) from the measurement data \( \hat{\textbf{x}} \). (Prediction from partially observed measurements—only some of the components of \( \hat{\textbf{x}} \)—is addressed in the subsequent Sect. 2.3.)
Define the true states as \(\textbf{x}(t)=[x_1(t), x_2(t), \dots ,x_N(t)]^\textsf{T}\), and the N energy parameters as \(\mu _i > 0\), for \( i = 1, \dots , N \). The approach here is to consider the true trajectory \( \textbf{x}(t) \) as being ‘controlled’ by the unknown vector field \( \textbf{u} \), and hence the problem is recast as seeking to minimise the cost function

$$\begin{aligned} J[\textbf{u}] = \int _0^T \left[ \left( \textbf{x}-{\hat{\textbf{x}}}\right) ^\textsf{T} Q \left( \textbf{x}-{\hat{\textbf{x}}}\right) + \Vert \textbf{u}\Vert ^2 \right] \textrm{d}t, \end{aligned}$$ (2)

subject to the dynamics (1), where the diagonal matrix Q is defined by

$$\begin{aligned} Q = \textrm{diag}\left( \mu _1, \mu _2, \dots , \mu _N\right) . \end{aligned}$$ (3)

In some settings, the \( \mu _i \) are selected to be equal; this value is then the energy parameter \(\mu \). The cost function J puts a penalty on the distance between the controlled trajectories \(\textbf{x}\) and the measured trajectories \({\hat{\textbf{x}}}\), and the squared magnitude of the control field \(\textbf{u}\) for all time. Therefore, the algorithm attempts to create a vector field that produces trajectories that match the reference dynamics when given the same initial condition, while minimising the energy of the controlled system.
The energy parameter can be tuned to accommodate varying expectations of noise in the data, with a lower \(\mu \) resulting in a smoother controlled curve. Penalising the energy of the vector field is a physically intuitive method for trajectory smoothing, avoiding methods such as penalising high frequencies (as with low-pass filters) which may be associated with relevant timescales of the inherent dynamics. The robustness of the method to the choice of energy parameter is discussed in Sect. 3.3.
The Pontryagin maximum principle (Lawden 1975) can be applied to the optimal control problem (2) to show that the controlled state trajectories \(\textbf{x}\) and costate trajectories \(\textbf{p}\) satisfy

$$\begin{aligned} \dot{\textbf{x}} = -\frac{\textbf{p}}{2}, \qquad \dot{\textbf{p}} = -2 Q \left( \textbf{x} - {\hat{\textbf{x}}}\right) , \qquad \textbf{x}(0) = {\hat{\textbf{x}}}(0), \qquad \textbf{p}(T) = -2\, \dot{\hat{\textbf{x}}}(T). \end{aligned}$$ (4)
The derivation of (4) is shown in Appendix A. When this system of equations is solved, \( \textbf{x} \) and \( \textbf{u} = -\textbf{p}/2 \) provide the required smoothed trajectory and vector field associated with the problem.
As is standard, the condition for the trajectory \( \textbf{x} \) is an initial condition, whereas that for the costate \( \textbf{p} \) is a terminal condition (Lawden 1975; Zhang and Balasuriya 2020). This latter condition, sometimes called the transversality condition, forces the vector field to match the measured velocity at the final time. However, given the time series \( \hat{\textbf{x}} \) at discrete times within the interval [0, T] , it is only possible to estimate \( \dot{\hat{\textbf{x}}}(T) \). Errors in this approximation will result in the turnpike phenomenon in the controlled trajectory (Zaslavski 2015). Taking into account the noise in the data, the approximation of the derivative at the final time is found by smoothing the trajectory and then numerically differentiating it. The approximation is calculated using the backwards difference method on the measurement data after being Savitzky–Golay filtered. Numerical differentiation is avoided at all other times.
There are many methods in the literature for numerically solving coupled initial/terminal systems such as (4) in the optimal control context (see (Zhang and Balasuriya 2020) for a discussion). Many of these (indirect or multiple shooting; collocation; simultaneous, sequential or direct transcription) are quite sensitive, and may fail unless parameters are chosen carefully. The fourth-order finite difference boundary value problem solver bvp4c is used to provide a \(C^1\)-continuous solution on the interval of integration (Kierzenka and Shampine 2001). The trajectory \(\textbf{x}\) and the vector field \(\textbf{u}\) are therefore found by numerical integration rather than differentiation, and hence, this method is robust to noise.
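To make this boundary value step concrete, the following Python sketch solves a scalar instance of the coupled state/costate system with SciPy's solve_bvp (standing in for bvp4c). It assumes the quadratic tracking-plus-energy cost described above, so that u = -p/2; the toy signal, noise level, energy parameter value, and all names are illustrative assumptions rather than the paper's actual configuration.

```python
import numpy as np
from scipy.integrate import solve_bvp
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)

# Toy noisy measurement of one scalar component (a stand-in for x-hat)
T = 2.0
t = np.linspace(0.0, T, 201)
x_true = np.sin(np.pi * t)
x_hat = x_true + 0.05 * rng.standard_normal(t.size)
x_hat_of_t = interp1d(t, x_hat, fill_value="extrapolate")

mu = 500.0  # energy parameter (illustrative value)

# Terminal slope: Savitzky-Golay smooth, then a backward difference at t = T
x_sg = savgol_filter(x_hat, window_length=21, polyorder=3)
xdot_T = (x_sg[-1] - x_sg[-2]) / (t[-1] - t[-2])

def odes(s, y):
    # y[0] = controlled state x, y[1] = costate p; the control is u = -p/2
    return np.vstack([-y[1] / 2.0, -2.0 * mu * (y[0] - x_hat_of_t(s))])

def bc(ya, yb):
    # Initial condition on the state; terminal condition on the costate,
    # chosen so that u(T) = -p(T)/2 matches the measured terminal velocity
    return np.array([ya[0] - x_hat[0], yb[1] + 2.0 * xdot_T])

sol = solve_bvp(odes, bc, t, np.zeros((2, t.size)), max_nodes=20000)
x_smooth = sol.sol(t)[0]       # energy-optimally smoothed trajectory
u_est = -sol.sol(t)[1] / 2.0   # vector field (derivative) estimate along it
```

The smoothed trajectory x_smooth can be compared against the measurements for validation, and u_est supplies the derivative estimate along the trajectory without ever differentiating the noisy data directly.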
The numerical solution to (4) provides values of \( \textbf{x} \) and \( \textbf{u} = -\textbf{p}/2 \) at discrete time instances \( t_j \in [0,T] \). This represents a set of points \( \textbf{x} \left( t_j \right) \) in the state space at which the vector field \( \textbf{u} \) is known. Delaunay triangulation interpolation can be used to interpolate \( \textbf{u} \) to the convex hull of data points \( \textbf{x}(t_j) \). This algorithm is effective for up to six dimensions, and therefore, six-dimensional autonomous systems can effectively be analysed with this method. Extrapolation outside the convex hull is of course error-prone, since there is insufficient data. We can nonetheless provide reasonable estimates by defining auxiliary points \(\textbf{x}_{\text {aux}}\) around the convex hull, sufficiently far from the sampled points, that have artificial \( \textbf{u}_{\text {aux}} \) values of \( \textbf{0} \). Construction of the vector field using this approach enables prediction of the time series \( \hat{\textbf{x}} \) into the future, as is described in the next section. This methodology for reconstructing a vector field from noisy data is described in Algorithm 1.
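A minimal sketch of this interpolation step, assuming SciPy's Delaunay-based LinearNDInterpolator as a stand-in and a toy rotation field in place of the reconstructed field; the auxiliary ring radius and point counts are arbitrary illustrative choices.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, size=(400, 2))             # sampled states x(t_j)
u_vals = np.stack([-pts[:, 1], pts[:, 0]], axis=1)  # field values along them (toy rotation)

# Auxiliary ring of points well outside the data, pinned to u = 0, so that
# queries just beyond the convex hull decay toward zero instead of failing.
theta = np.linspace(0, 2 * np.pi, 32, endpoint=False)
aux = 3.0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)

interp = LinearNDInterpolator(np.vstack([pts, aux]),
                              np.vstack([u_vals, np.zeros((32, 2))]))

# First query lies inside the data hull; second is outside it but inside the ring
u_query = interp(np.array([[0.2, -0.1], [1.5, 0.0]]))
```

Inside the hull the interpolant reproduces the field exactly for linear fields (barycentric interpolation is exact on linear functions), while outside it blends toward the zero auxiliary values rather than returning NaN.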
2.2 Fully Observed Prediction
The system is ‘fully observed’ if the time evolution of all N variables in the system are known. This is the situation discussed in the previous section, in which the time series of each of the N components of \( \hat{\textbf{x}}(t) \) were known over a time duration [0, T] . The intention is to predict the evolution of each of these components beyond time T. From the results of the previous section, the governing vector field \( \textbf{u} \) can be reconstructed. Since the fully observed data represents a trajectory of (1) until time T, a standard numerical solution scheme can then be used to integrate the trajectory forward in time. In this paper, the variable-step, variable-order solver ode15s (Shampine and Reichelt 1997) is used with relative and absolute tolerances of \(10^{-6}\).
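For illustration, this forward integration step might look as follows, with SciPy's stiff-capable LSODA solver standing in for ode15s and a known rotation field standing in for the triangulation-based interpolant of Sect. 2.1 (all names and values are assumptions).

```python
import numpy as np
from scipy.integrate import solve_ivp

def u_recon(t, x):
    # Toy stand-in for the reconstructed field; in practice this would be
    # the Delaunay interpolant evaluated at the current state
    return np.array([-x[1], x[0]])

x_Tp = np.array([1.0, 0.0])  # state at the prediction time T_p
sol = solve_ivp(u_recon, (0.0, 2 * np.pi), x_Tp, method="LSODA",
                rtol=1e-6, atol=1e-6)
x_pred = sol.y[:, -1]        # forecast after one full period of the toy field
```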
It may seem natural to use the time T (i.e. the final time at which measured trajectory data is available) as the initialisation time \( T_p \) (henceforth called the prediction time) for implementing prediction. However, the data is noisy and consequently the errors in the time derivative (necessary for vector field reconstruction as outlined in the previous section) can be quite large. If the final measured point (at time T) has a large derivative, small errors can quickly compound into large errors in the evolving trajectory. Choosing an earlier time \( T_p < T \) where the derivative, and hence vector field, is small in magnitude will decrease compounding errors. However, choosing \( T_p \) much earlier than T will decrease prediction accuracy because the available information between times \( T_p \) and T would be ignored. We propose that the process of choosing the prediction time \( T_p \) therefore involves finding the time where the derivative is minimised over some interval \([T-\delta ,T]\):

$$\begin{aligned} T_p = \mathop {\textrm{argmin}}\limits _{t \in [T-\delta , T]} \Vert \textbf{u}(t)\Vert , \end{aligned}$$ (5)

where the ‘minimum argument’ \({\text {argmin}}\) is found numerically as the time \(t\in [T-\delta ,T]\) which minimises \(\Vert \textbf{u}(t)\Vert \). The improvement in accuracy gained by varying the prediction time \( T_p \) to minimise the derivative is presented through simulations in Appendix B. Figure 2 presents an example of prediction to \(T_f\) time units into the future beyond the data-available time T.
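A sketch of this selection rule, under the assumption that the reconstructed field has already been evaluated at the sample times (the function name is illustrative):

```python
import numpy as np

def choose_prediction_time(t, u, T, delta):
    """Pick T_p in [T - delta, T] where the reconstructed field norm is smallest.

    t : (n,) sample times; u : (n, N) field values u(x(t_j)) along the trajectory.
    """
    mask = (t >= T - delta) & (t <= T)
    speeds = np.linalg.norm(u[mask], axis=1)
    return t[mask][np.argmin(speeds)]
```

For example, with a field whose norm follows |cos t|, the rule picks the sample in the window where the trajectory is moving slowest.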
With these observations in mind, the fully observed prediction framework which we have described is presented in Algorithm 2. Its implementation is presented in Sect. 3.
2.3 Partially Observed Prediction
In practice, the simultaneous measurement of all relevant variables of a system is not always possible, and therefore, we must rely on partial state observation. This means that we do not have access to all N of the system variables’ evolution, but rather to only a subset of them. Here, we will assume we only have access to one variable, but the following approach easily extends to partially observed systems with more than one observed variable. We can leverage Takens’ theorem (Takens 1981) to reconstruct the full system attractor in a higher-dimensional embedded space using delay-coordinate embeddings. The embedded vector field (subsequently known as the Takens field) may then be estimated using the process in Sect. 2.2 and used to evolve the embedded dynamics for prediction of the partially observed variable.
Consider a continuous dynamical system \(\dot{\textbf{x}}=\textbf{f}(\textbf{x})\) with state \(\textbf{x}\in X\) of general dimension, and suppose that this possesses an attractor A towards which trajectories beginning in a certain region tend to evolve. Let h be the ‘partial observation function’ \(x(t)=h\left( \textbf{x}(t)\right) \), which gives us the evolution of one time series from this system; i.e. we do not have access to each of the variables’ evolution, but instead have either just one system variable, or some other variable derived from these system variables. Takens’ theorem enables an embedding \(\textbf{x}_{\text {emb}}=\mathbf {\Phi } (x)\) which takes the time series associated with the one observable variable x(t), and constructs an evolution in an m-dimensional phase space.
Many embeddings exist, such as the delay and differential embeddings originally presented by Takens (1981), and the bundle embedding as an extension to forced and stochastic systems (Stark 1999; Stark et al. 2003). Throughout this work, we used the classical delay embedding with time delay \(\tau \) and embedding dimension m,

$$\begin{aligned} \textbf{x}_{\text {emb}}(t) = \mathbf {\Phi }(x)(t) = \left[ x(t), x(t-\tau ), \dots , x(t-(m-1)\tau )\right] ^\textsf{T}, \end{aligned}$$ (6)

which at any time t, samples the time series x(t) at m values corresponding to time delays \(\{ 0,\tau ,\dots ,(m-1)\tau \}\). As t progresses, the m-dimensional variable \(\textbf{x}_{\text {emb}}(t) \) evolves according to a dynamical system \(\dot{\textbf{x}}_{\text {emb}}=\textbf{g}(\textbf{x}_{\text {emb}})\) where \(\textbf{g}:\mathbb {R}^m\rightarrow \mathbb {R}^m \) is inferred from the data. An attractor A in the original system is expected to be visible in the embedded system by a set \( \tilde{A} \). Appropriate values of time delay \(\tau \) and embedding dimension m must be chosen for the delay embedding to give a proper reconstruction of the full system attractor.
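Under this convention, with the delay expressed in sample steps, the delay map can be sketched as follows (the function name is an assumption):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Map a scalar series x[k] to m-dimensional delay vectors
    [x(t), x(t - tau), ..., x(t - (m - 1) tau)], with tau in samples."""
    n = len(x) - (m - 1) * tau  # number of embedded points
    return np.stack([x[(m - 1 - j) * tau : len(x) - j * tau] for j in range(m)],
                    axis=1)
```

For a series of length n with m = 3 and tau = 2, the first embedded point is [x[4], x[2], x[0]], i.e. the earliest time at which all m delayed samples exist.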
There have been many methods devised for selecting optimal values of these parameters, such as using mutual information (Fraser and Swinney 1986), symbolic dynamics (Matilla-García et al. 2021), or the correlation integral (Kim et al. 1999). Throughout this research, the optimal time delay \(\tau \) given by the mutual information method was used. The embedding dimension m was chosen to be equal to the dimension of the full attractor; in practice, methods for inferring the attractor dimension are required, such as choosing the lowest dimension at which there are no false nearest neighbours (Kennel et al. 1992). The presented algorithms also work for non-uniform embeddings (Judd and Mees 1998), where delay coordinates are lagged by differing delay sizes.
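As a sketch of the mutual information criterion, a simple histogram (plug-in) estimator of the average mutual information between x(t) and x(t + lag) can be computed as below; the bin count is an arbitrary illustrative choice, and in practice the delay is taken at the first minimum of this quantity over the lag.

```python
import numpy as np

def average_mutual_information(x, lag, bins=16):
    """Plug-in estimate of I(x(t); x(t + lag)) in nats from a 2-D histogram."""
    a, b = x[:-lag], x[lag:]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()                     # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x(t)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of x(t + lag)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

For a strongly autocorrelated series, the estimate is large at short lags and decays as the lag exceeds the correlation time, which is what the first-minimum rule exploits.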
The embedding process described above is visualised by the diagrams in Fig. 3a (adapted from a discrete version by Vlachos and Kugiumtzis (2008)) and Fig. 3b. Given that the vector fields \( \textbf{f} \) and \( \textbf{g} \) govern the evolutions in the original and embedded phase spaces, respectively, we have the trajectory evolution equations

$$\begin{aligned} \textbf{x}(t+\Delta t) = \textbf{F}\left( \textbf{x}(t)\right) = \textbf{x}(t) + \int _t^{t+\Delta t} \textbf{f}\left( \textbf{x}(s)\right) \textrm{d}s, \end{aligned}$$ (7)

$$\begin{aligned} \textbf{x}_{\text {emb}}(t+\Delta t) = \textbf{G}\left( \textbf{x}_{\text {emb}}(t)\right) = \textbf{x}_{\text {emb}}(t) + \int _t^{t+\Delta t} \textbf{g}\left( \textbf{x}_{\text {emb}}(s)\right) \textrm{d}s, \end{aligned}$$ (8)

which define the mappings \( \textbf{F} \) and \( \textbf{G} \) in Fig. 3a. These mappings provide a way to estimate the variables at time \( \Delta t \) into the future. Figure 3a shows that the partially observed variable x(t) can therefore be predicted through embedding it into \(\mathbb {R}^m\), evolving the dynamics in the embedded system, and then separating the forecast of the partially observed variable from this embedded trajectory forecast. This is equivalent to applying the functions \( \mathbf {\Phi } \), \( \textbf{G} \) and \( \mathbf {\Phi }^{-1} \) in order to reconstruct the partially observed trajectory x(t) .
It is evident that embedding is effective for time series from stationary processes; however, nonstationarity can be handled through overembedding (Verdes et al. 2006). This result relies on the ability to transform any ndimensional nonautonomous system into an \((n+1)\)dimensional autonomous system through defining time as a state variable.
Techniques for predicting partial observations from delay embeddings reduce to approximating the function g that evolves the embedded dynamics. A common technique for estimating g is to use the method of analogues (Lorenz 1969; Casdagli 1989; Jayawardena and Lai 1994; Sugihara and May 1990; Abarbanel et al. 1994; Pérez-Muñuzuri and Gelpi 2000; Hamilton et al. 2016). This algorithm constructs a linear autoregressive model of g at the current state using the nearest neighbours: points within a predefined neighbourhood around the current state. The regression coefficients can be found from the future values of the nearest neighbours, and this regression model can then be used to forecast the current state forward in time. This process can be iterated to forecast neighbourhood-to-neighbourhood indefinitely. Rather than constructing local linear maps at each state, global representations of the embedded attractor provide a single, nonlinear function for the whole dataset. These global models may be created through polynomial fitting (Giona et al. 1991) and are smaller and therefore more convenient for computation; however, they are less accurate than the local linear method (Abarbanel et al. 1994).
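For comparison, the simplest zeroth-order variant of the method of analogues (nearest-neighbour averaging, rather than the full local linear regression described above) can be sketched as follows; the function name and test data are illustrative.

```python
import numpy as np

def analogue_forecast(states, next_states, query, k=10):
    """Zeroth-order method of analogues: forecast the query state as the
    average of where its k nearest recorded neighbours evolved to."""
    dists = np.linalg.norm(states - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return next_states[nearest].mean(axis=0)
```

Iterating this map neighbourhood-to-neighbourhood (feeding each forecast back in as the next query) yields a multi-step prediction.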
Recently, there has been much focus on using machine learning techniques to estimate g. The pattern detection power of machine learning has enabled the modelfree prediction of chaotic time series using delay embeddings. The use of recurrent neural networks in reservoir computing to estimate g has shown excellent results for input data with low noise or no noise (Pathak et al. 2018; Gauthier et al. 2021). Rather than using typical pattern identification, reservoir computers are dynamical systems themselves and are trained on trajectory data to learn to emulate the underlying dynamical system.
In our situation, the data is permitted to be noisy, and the embedding procedure amplifies noise (Casdagli et al. 1991). Therefore, the optimal control approach of approximating the Takens field allows a global, continuous, and nonlinear representation of g to be estimated. Once this field has been approximated, numerical integration is used to forecast the embedded trajectory past the final time.
As the coordinates of the delay embedding are delayed signals of the partial observation, the Takens field can be found simply by estimating the derivative of the observation and then delaying this vector. Once the field has been approximated and the triangulation has been defined, the method of choosing the point of prediction, \(T_p\), presented for the full observation prediction is applied. Numerical integration can then be used to forecast the embedded dynamics, and the first component of this trajectory is the prediction of the partially observed variable. The variable-step, variable-order solver ode15s (Shampine and Reichelt 1997) was used with relative and absolute tolerances of \(10^{-6}\).
Algorithm 3 summarises the partial system prediction framework that has been described in this section. This algorithm is distinct from Algorithm 2 in that it calculates the Takens field \(\textbf{u}_{\text {emb}}\) by lifting u, the derivative of x, into the embedded space. This algorithm is therefore computationally cheaper than performing Algorithm 2 on the embedded trajectory. After integrating the Takens field to obtain a prediction in the embedded space \(\textbf{x}_{\text {emb}}(t)\), the embedding must then be inverted to produce the predicted partial trajectory x(t).
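The lifting step can be sketched as follows: given derivative estimates u of the observed variable at uniformly sampled times, the Takens field along the embedded trajectory is obtained purely by delaying, since each embedding coordinate x(t - j tau) has derivative u(t - j tau). The function name and the uniform-sampling assumption are illustrative.

```python
import numpy as np

def takens_field(u, m, tau):
    """Lift a scalar derivative estimate u(t) = dx/dt into the embedded space:
    component j of the Takens field is the delayed copy u(t - j * tau),
    with tau in samples (the same delay map applied to u instead of x)."""
    n = len(u) - (m - 1) * tau
    return np.stack([u[(m - 1 - j) * tau : len(u) - j * tau] for j in range(m)],
                    axis=1)
```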
2.4 Forecasting Uncertainty
The ability to reconstruct the vector field of fully and partially observed variables presents the opportunity to estimate how probability densities evolve along the attractors. This allows a quantification of the uncertainty of prediction and estimations of prediction intervals. This uncertainty is a combination of the chaotic sensitivity of the attractor, and the structural imperfection of the estimated system. We will show that the analysis of this uncertainty may be performed via the transfer operator. The transfer operator advects densities through flows (Lasota et al. 1994), permitting a probabilistic description of transport and mixing (Froyland and Padberg 2009), and has been used to study molecular dynamics (Deuflhard et al. 2012), and coherent structures in the ocean (Froyland et al. 2007) and the atmosphere (González-Tokman 2018; Blachut and González-Tokman 2020). Rather than using modelled or measured vector fields, this work allows the transfer operator to be estimated from reconstructed vector fields to evolve initial condition densities on attractors without prior knowledge of the system. Typically, either short simulations of many initial conditions or long simulations of a small number of trajectories are needed to accurately approximate the operator (Klus et al. 2016). However, the presented approach for vector field reconstruction requires only a single intermediate-length trajectory. Further, because the transfer operator can be estimated from fully or partially observed variables, invariant densities and other important dynamical objects may be analysed without the need for a model or access to vector field measurements.
Given an autonomous dynamical system with discrete-time evolution \(M:X\rightarrow X\), defined for time step \(\Delta t\) as in Equations (7) and (8), the evolution of trajectories over n discrete time steps may be described iteratively as the composition \(\varvec{x}(t+n\Delta t)=(M\circ \cdots \circ M)(\varvec{x}(t))=M^{n}(\varvec{x}(t)),\)
where \(\varvec{x}\in X\) and X is a phase space. In the fully observed case, the evolution rule M is \(F:A\rightarrow A\), and the partially observed case has evolution rule \(G:\tilde{A}\rightarrow \tilde{A}\) as in Fig. 3.
The transfer operator or Perron–Frobenius operator \(\mathcal {L}:L^1(X,\text {vol})\rightarrow L^1(X,\text {vol})\) is defined by \(\int _{B} \mathcal {L}f \,\text {d}\text {vol} = \int _{M^{-1}(B)} f \,\text {d}\text {vol},\)
where vol is the Lebesgue measure on X and B is any measurable subset of X. This operator describes the evolution of an initial condition density \(f\in L^1(X,\text {vol})\) through the dynamics M. The evolution of densities over n time steps may therefore be described iteratively as the composition \((\mathcal {L}\circ \cdots \circ \mathcal {L})f=\mathcal {L}^{n}f.\)
The transformation M is assumed to be non-singular, so that the operator is well defined and does not create density (Lasota et al. 1994).
For numerical settings, the transfer operator may be approximated through Ulam’s method (Ulam 1960). This method partitions the solution space into d bins \(\{ B_i \}\) and approximates \(\mathcal {L}\) as a \(d \times d\) matrix known as the Ulam matrix P. The (i, j)th entry of P is calculated by evolving q uniformly distributed initial conditions \(\{ \varvec{x}_{1},\ldots , \varvec{x}_{q} \} \subset B_i\) through the vector field and determining the proportion that arrive in bin \(B_j\). The Ulam matrix is therefore defined as \(P_{ij} = \#\{k : M(\varvec{x}_{k})\in B_j\}/q.\)
The GAIO software package (Dellnitz et al. 2001) was used to calculate the Ulam matrix, with the flow map M approximated by integrating the reconstructed vector field. Initial condition densities \(\varvec{\rho }(T_p)\in \mathbb {R}^d\) defined by the state at prediction time may then be evolved through iterative multiplication by this matrix, \(\varvec{\rho }(T_p+n\Delta t)^{\top } = \varvec{\rho }(T_p)^{\top }P^{n}.\)
This evolving density distribution provides a quantification of the uncertainty for each prediction. The density distribution for a particular variable is found by projecting the distribution onto the relevant coordinate. For initialisation, the bin containing the state at prediction time \(\varvec{x}(T_p)\in B_i\) was set to have density 1, and all other bins were given zero density. Algorithm 4 describes the process of evolving probability density distributions.
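As a concrete illustration of Ulam's method and the density evolution of Algorithm 4, the following minimal NumPy sketch builds the Ulam matrix for a one-dimensional chaotic test map (the logistic map, chosen here only for brevity; the paper itself uses GAIO on the reconstructed three- and four-dimensional fields). All names and parameter choices below are illustrative.

```python
import numpy as np

def ulam_matrix(flow_map, lo, hi, d, q, rng):
    """Approximate the Perron-Frobenius operator on [lo, hi)
    with d uniform bins and q sample points per bin."""
    edges = np.linspace(lo, hi, d + 1)
    P = np.zeros((d, d))
    for i in range(d):
        # q uniformly distributed initial conditions in bin B_i
        x = rng.uniform(edges[i], edges[i + 1], q)
        y = flow_map(x)
        # proportion of points that land in each bin B_j
        j = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, d - 1)
        P[i] = np.bincount(j, minlength=d) / q
    return P

# Example: the chaotic logistic map x -> 4x(1-x) on [0, 1]
rng = np.random.default_rng(0)
d = 64
P = ulam_matrix(lambda x: 4 * x * (1 - x), 0.0, 1.0, d, q=100, rng=rng)

# Algorithm 4 initialisation: all mass in the bin containing x = 0.2
rho = np.zeros(d)
rho[int(0.2 * d)] = 1.0
for _ in range(10):
    rho = rho @ P   # one density evolution step
```

Each row of P is a probability distribution, so total density is conserved as the initially concentrated density spreads across the attractor.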
2.5 Error Metric
Analysing the accuracy of predictions of chaotic variables requires the definition of an error metric. The standard root-mean-square error metric gives misleadingly low errors for predictions that initially diverge and later rejoin the true trajectory; we instead define a metric that incorporates the Lyapunov time of the system to weight short-term prediction accuracy. The Lyapunov time, calculated as the reciprocal of the maximal Lyapunov exponent, is the characteristic timescale of chaotic systems and represents the time over which a small error grows by a factor of e. For the systems investigated in this work, this value has been precisely estimated (Viswanath 1998; Letellier and Rossler 2007). When only data is available, the maximal Lyapunov exponent may still be estimated by data-driven methods (Wolf et al. 1985). This exponential growth of errors makes predicting a chaotic system beyond the Lyapunov time inherently difficult.
The predictions \({\tilde{x}}_i\) of true trajectory \(x_i\) at time index \(i\in \{1\dots I\}\), where
are compared using the weighted error metric
where \(\Delta t=t_2-t_1\) is the simulation time step and \(T_\lambda \) is the Lyapunov time of the system. The error metric is a weighted \( \text {L}^2 \)-norm and represents the averaged distance from the true (noiseless) trajectory, weighted towards the start of the prediction. The weighting term \(\gamma \) is defined such that at one Lyapunov time into the prediction, the weighting factor has decreased by a factor of 1/2.
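A minimal sketch of such a weighted error metric follows. Since the display equation is omitted from this excerpt, the exact normalisation below is an assumption, but the weight \(\gamma ^i\) is chosen so that it halves after one Lyapunov time, as described.

```python
import numpy as np

def weighted_error(x_true, x_pred, dt, T_lyap):
    """Weighted L2-norm error emphasising short-term accuracy.
    The weight gamma**i halves after one Lyapunov time; the
    normalisation here is an assumed, illustrative choice."""
    gamma = 0.5 ** (dt / T_lyap)          # so gamma**(T_lyap/dt) = 1/2
    w = gamma ** np.arange(len(x_true))
    diff = np.asarray(x_true) - np.asarray(x_pred)
    return float(np.sqrt(np.sum(w * diff ** 2) / np.sum(w)))

# An early divergence is penalised more than a late one of equal size
t = np.zeros(500)
early = t.copy(); early[:50] = 1.0        # diverges immediately
late = t.copy(); late[-50:] = 1.0         # diverges near the end
e1 = weighted_error(t, early, dt=0.01, T_lyap=1.104)
e2 = weighted_error(t, late, dt=0.01, T_lyap=1.104)
```

Unlike a plain root-mean-square error, this metric cannot be made small by a prediction that wanders off and rejoins the true trajectory late in the forecast window.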
3 Results
This section presents simulation results that demonstrate the presented prediction method and density prediction method for both fully and partially observed systems. The results are compared to trajectories from the true model, and from a model with model imperfection. Case studies are presented first, followed by an assessment of the robustness and accuracy of prediction through analysing the error statistics of 100 simulations while varying parameters including the energy parameter \(\mu \), the amount of trajectory data, and the amount of measurement noise in the trajectories. The sensitivity to initial conditions of the discovered velocity field was also investigated.
In this section, measured trajectory datasets were synthesised by solving the chaotic Lorenz (L63) system (Lorenz 1963) and the hyperchaotic Rössler system (Rossler 1979). Given an initial condition, the systems were numerically solved and an initial section of the dataset was discarded to ensure the synthesised data lies sufficiently close to the attractor. The time steps of the simulated data were chosen small enough not to influence the results.
3.1 Case Studies
Two well-known chaotic systems, one in three dimensions and one in four dimensions, were used to assess the accuracy of prediction using the proposed techniques. For each system, model-free predictions were compared to trajectories from models with model imperfection. An introduced parameter \(\epsilon \) acts to perturb coefficients in the differential equations of each system to simulate model error. To effectively compare the effects of measurement noise on model-free predictions to the effects of model error on model-based predictions, the divergence introduced by initial condition error was not considered, and therefore, the initial conditions for the imperfect model were chosen to be the noiseless, true initial conditions. The time-varying absolute error between the true trajectory and both the algorithm predictions and the imperfect model trajectories are presented and compared with the linearised approximation of chaotic divergence in Appendix C.
Table 1 outlines the colour code used throughout this section. A comparison of the attractors from the noisy data, imperfect model, fully observed reconstruction, and partially observed reconstruction is also shown. These plots highlight the qualitative similarity between the attractor reconstructed from partial observations and the full system attractors.
The Lorenz (L63) system, first derived in 1963, is a three-dimensional continuous dynamical system that was originally studied as a simplified model of atmospheric convection (Lorenz 1963). The governing equations for this classical chaotic system are
The parameter values used in Lorenz (1963) were chosen for the true system: \(\sigma =10\), \(\beta =\frac{8}{3}\), \(\rho =28\), and \(\epsilon =0\). The Lyapunov time of the L63 system is \(T_\lambda \approx 1.104\) time units (Viswanath 1998). Trajectory data was synthesised by choosing an initial condition, simulating for 40 Lyapunov times and retaining the last 10 Lyapunov times.
First consider the full observation of all three variables x, y, and z. Having obtained a time series of length 10 Lyapunov times with time step \(\Delta t=0.01\) from (16), we add Gaussian white noise of variance \(\sigma ^2=0.1\). Using Algorithm 2 to forecast one component of this trajectory, we chose an energy parameter of \(\mu =1000\). Prediction for \(T_f= 5 \) time units beyond the final time T is shown in Fig. 4a. The forecast between prediction time \(T_p\) and final time T is omitted to show only the forecast of unavailable trajectory data.
Next, suppose that only one component of the trajectory, say x, is available, while the other two variables are hidden. We simulated 10 Lyapunov times of data with time step \(\Delta t=0.01\) and added Gaussian white noise of variance \(\sigma ^2=0.1\). Using a time delay of \( \tau = 0.11 \) time units, the optimal value given by the mutual information method, the noisy time series was embedded in \(\mathbb {R}^3\). We chose an energy parameter of \(\mu =1000\) and predicted for \( T_f=5 \) time units, as shown in Fig. 4c. Bearing in mind the difficulty of predicting beyond a time of \( T_\lambda \approx 1.104 \), this is an excellent outcome for a partially observed chaotic system with measurement noise.
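The data synthesis and Takens embedding steps described above can be sketched as follows. The Lorenz equations and parameter values are standard; the RK4 integrator, the initial condition, and the transient length are illustrative choices.

```python
import numpy as np

def lorenz(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    # Standard Lorenz (1963) vector field
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4(f, x, dt):
    k1 = f(x); k2 = f(x + dt / 2 * k1)
    k3 = f(x + dt / 2 * k2); k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Synthesise data as in the text: simulate, discard a transient so
# the data lies near the attractor, then add measurement noise.
dt, T_lyap = 0.01, 1.104
n_trans, n_keep = round(30 * T_lyap / dt), round(10 * T_lyap / dt)
x = np.array([1.0, 1.0, 1.0])        # illustrative initial condition
traj = []
for k in range(n_trans + n_keep):
    x = rk4(lorenz, x, dt)
    if k >= n_trans:
        traj.append(x)
traj = np.array(traj)

rng = np.random.default_rng(1)
noisy = traj + rng.normal(0.0, np.sqrt(0.1), traj.shape)  # variance 0.1

# Takens delay embedding of the x-component alone, tau = 0.11
tau_steps, m = round(0.11 / dt), 3
s = noisy[:, 0]
emb = np.column_stack([s[i * tau_steps: len(s) - (m - 1 - i) * tau_steps]
                       for i in range(m)])
```

Each row of `emb` is a delay vector \((x(t), x(t+\tau ), x(t+2\tau ))\), the partially observed analogue of the full three-dimensional state.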
Model error was introduced by perturbing two terms in Equation (16) with perturbation parameter \(\epsilon \). The two nonlinear terms were chosen, as methods for generating differential equations from data estimate these coefficients with less accuracy than the linear terms (Champion et al. 2019). A perturbation of \(\epsilon =0.05\) was selected, as the median prediction error for this value is closest to the median prediction error of Algorithm 3 for training data noise of \(\sigma ^2=0.1\), as seen in Sect. 3. The true initial conditions without noise were used to integrate the imperfect model for \(T_f= 5 \) time units, and the resulting trajectories are shown in Fig. 4a and c.
The hyperchaotic four-dimensional Rössler system, first studied in 1979, exhibits two positive Lyapunov exponents (Rossler 1979). (Hyperchaotic systems are defined to have at least two positive Lyapunov exponents and must therefore lie in four dimensions or higher.) The system proposed by Rössler was the first such system discovered, and possesses a fractal attractor in \( \mathbb {R}^4 \). The equations that describe the hyperchaotic Rössler system are
The parameter values selected in Rossler (1979) were used for the true system: \(a=0.25\), \(b=3\), \(c=0.5\), \(d=0.05\), and \(\epsilon =0\). The Lyapunov time of this hyperchaotic Rössler system is \(T_\lambda \approx 8.929\) time units (Letellier and Rossler 2007). Trajectory data was synthesised by choosing an initial condition, simulating for 100 Lyapunov times and retaining the last 40 Lyapunov times.
Time-series data with a time length of 40 Lyapunov times and time step \(\Delta t=0.05\) was obtained from the hyperchaotic Rössler system (17), and Gaussian white noise of variance \(\sigma ^2=0.05\) was added. Due to the varying magnitudes of components in the trajectory data, different energy parameters were chosen for the different components: \(\mu _x=\mu _y=250\), \(\mu _z=500\), and \(\mu _w=50\). Prediction for 40 time units beyond the final time T using Algorithm 2 is shown in Fig. 4b.
Next, we supposed that only the time series for the x component was available. Trajectory data of 40 Lyapunov times in length with time step \(\Delta t=0.05\) was used, and Gaussian white noise of variance \(\sigma ^2=0.05\) was added. A time delay of \( \tau = 1.75 \) time units was chosen to embed the time series into \(\mathbb {R}^4\). The prediction using Algorithm 3 with \( \mu = 250 \) for 40 time units is shown in Fig. 4d. Again, good prediction beyond the Lyapunov time is observed, even in this hyperchaotic situation of partially observed data corrupted by noise.
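An analogous sketch for the hyperchaotic case: the vector field below is the standard 1979 hyperchaotic Rössler form with the parameters quoted above (restated here since the display equation is omitted from this excerpt), while the initial condition and the internal integration substeps are illustrative assumptions.

```python
import numpy as np

def rossler4(state, a=0.25, b=3.0, c=0.5, d=0.05):
    # Standard hyperchaotic Roessler (1979) vector field
    x, y, z, w = state
    return np.array([-y - z, x + a * y + w, b + x * z, -c * z + d * w])

def rk4(f, x, dt):
    k1 = f(x); k2 = f(x + dt / 2 * k1)
    k3 = f(x + dt / 2 * k2); k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Sample every 0.05 time units as in the text, integrating with
# smaller internal steps for stability (substep count is assumed).
dt_out, sub, T_lyap = 0.05, 5, 8.929
n_trans, n_keep = round(60 * T_lyap / dt_out), round(40 * T_lyap / dt_out)
state = np.array([-10.0, -6.0, 0.0, 10.1])  # illustrative initial condition
traj = []
for k in range(n_trans + n_keep):
    for _ in range(sub):
        state = rk4(rossler4, state, dt_out / sub)
    if k >= n_trans:
        traj.append(state)
traj = np.array(traj)

# Delay-embed the x-component in R^4 with tau = 1.75 (35 samples)
tau_steps, m = round(1.75 / dt_out), 4
x = traj[:, 0]
emb = np.column_stack([x[i * tau_steps: len(x) - (m - 1 - i) * tau_steps]
                       for i in range(m)])
```

With two positive Lyapunov exponents, a four-dimensional embedding of the scalar x series is the minimal setting in which the delay reconstruction can capture the hyperchaotic dynamics.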
The perturbation parameter \(\epsilon \) was increased in Equation (17) to simulate model error. The coefficients of the nonlinear term and the linear x term were selected to be perturbed, and a value of \(\epsilon =0.05\) was chosen. The true noiseless initial conditions were used to evolve the model-error system for \(T_f= 40 \) time units after the final time T, and the resulting trajectories are shown in Fig. 4b and d.
3.2 Density Forecast
The uncertainty in each prediction may be quantified through the estimation of the transfer operator of the vector field. The density distributions were calculated over time for the fully and partially observed cases using the same initial conditions and compared to the distribution found using the analytic vector field. The flow map used to estimate the transfer operator was defined with the simulation time step. An intermediate-length trajectory with 10 Lyapunov times of data was used for training. For each field, the operator was approximated by partitioning the state space into \(2^{20}\) bins and seeding each bin with 25 initial conditions. The initial density distribution was evolved for 5 time units past the final time using Algorithm 4. Distributions are represented with dark colours for low density and light colours for high density. The true trajectory and corresponding trajectory predictions are plotted on top of the evolved distributions in white. Trajectory predictions with forecasted density distributions are shown in Fig. 5.
This particular simulation shows a state of the system where the true trajectory initially experiences few switching events. This results in a period of predictability, similar to blocking patterns in meteorology (Tantet et al. 2015). Switching events can be seen in the plots as density moving from the positive-x lobe of the attractor to the negative-x lobe after reaching the saddle point at \(x=0\). While the true system and imperfect model have experienced negligible switching after 1 Lyapunov time, the reconstructed transfer operators show switching before this time, transporting some density to the alternate lobe. This gives an indication of the diffusion that noise and reconstruction error have introduced into the estimated fields. However, the evolved density distributions may still provide a further understanding of the predictability of trajectory predictions. All plots show that there is high certainty that after 1 Lyapunov time the trajectory is in the positive-x lobe of the attractor. For other initial conditions, this analysis may reveal that divergence of initial condition density results in low certainty for the state after 1 Lyapunov time. This provides a local predictability quantification that is a more precise tool than estimating the maximal Lyapunov exponent, which gives only a global quantification of predictability.
3.3 Robustness
The robustness of our algorithm was assessed by varying the energy parameter, the length of time for which data is used for prediction, the level of noise added to the trajectory data, and the initial condition perturbation. The sensitivity of a model with introduced model error was also investigated by varying the level of coefficient perturbation. The L63 system was used to assess robustness. To compare the fully and partially observed algorithms, only the prediction of the x-component was studied for the full system case, and the x-component was used for the embedding in the partial system case. The error metric (15) was used to compare predictions.
The trajectories were predicted \(T_f=5\) time units after final time T. The time step used in all simulations was \(\Delta t=0.01\). The error statistics are studied through the median and the median absolute deviation (mad). These measures are robust to a wide range of distributions and quantify the location and spread of the error distribution.
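A minimal sketch of these robust statistics, illustrating why they are preferred over the mean and standard deviation when a few predictions diverge badly:

```python
import numpy as np

def median_mad(errors):
    """Median and median absolute deviation: robust analogues of the
    mean and standard deviation for summarising error distributions."""
    errors = np.asarray(errors, dtype=float)
    med = float(np.median(errors))
    mad = float(np.median(np.abs(errors - med)))
    return med, mad

# A single badly diverged prediction barely moves the robust statistics
stats = median_mad([0.9, 1.0, 1.1, 1.2, 50.0])
```

The mean of the sample above is pulled to roughly 10.8 by the outlier, whereas the median remains representative of the typical error.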
3.3.1 Energy Parameter
The selected energy parameter \( \mu \) weights the tracking and energy terms in the optimal control problem and therefore affects the discovered vector field. To investigate the variation of prediction accuracy, the L63 system was predicted using trajectory datasets with different energy parameters. For each value of \( \mu \), 100 simulations were performed with 10 Lyapunov times of given trajectory data corrupted by Gaussian noise of variance \(\sigma ^2=0.1\). The distributions of the simulations are shown in Fig. 6a. The blue histograms are based on full observation (all three variables), while the red histograms are for the partially observed situation.
3.3.2 Amount of Data
The time length of data that is measured is expected to impact the effectiveness of prediction. This was analysed through predicting the L63 system given several trajectory datasets of varying time length. The time lengths T of these datasets were in integer multiples of the Lyapunov time \( T_\lambda \approx 1.104 \) of the L63 system. For each dataset, 100 simulations were performed with random initial conditions and a fixed level of noise of variance \(\sigma ^2=0.1\). An energy parameter of \(\mu =1000\) was chosen. The distributions of simulations are shown in Fig. 6b.
3.3.3 Noise Level
Observed time series data will always be corrupted by some level of measurement noise; studying its effect on prediction is therefore necessary to assess robustness. Noise was added to the simulated L63 trajectory data to analyse the predictive power with noisy data. The added noise was Gaussian white noise \(\mathcal {N}(0,\sigma ^2)\). For each level of noise variance \( \sigma ^2 \), 100 simulations were performed with a fixed dataset length of \( T=10T_\lambda \). The energy parameters for the increasing noise variances shown in Fig. 6c were \(\mu =2500, 1750, 1000, 400, 300\), respectively. The distributions of the simulations are shown in Fig. 6c.
3.3.4 Initial Condition Perturbation
As we are considering chaotic systems, trajectories will experience sensitivity to initial conditions. If there is error in the chosen initial conditions, separate from the measurement noise that has been smoothed, it is expected that prediction error will increase. This was investigated by perturbing the initial condition of the predicted trajectory and comparing the error statistics for different magnitudes of perturbation variance. A sample from a Gaussian distribution \(\mathcal {N}(0,\sigma _{\text {IC}}^2)\) was added to each component of the prediction point \(\textbf{x}(T_p)\) to investigate the sensitivity to initial conditions of the presented algorithms. For the fully observed setting, this method assumes each component has equal uncertainty in the initial condition. For the partially observed setting, we assume that there is uncertainty in the initial condition, as well as uncertainty at the points delayed from that point by \((m-1)\) multiples of \(\tau \). This allows the investigation of errors in the trajectory data that result from uncertainty not mitigated by the optimal control smoothing. One set of noiseless trajectory data of length \( T=10T_\lambda \) was used to remove the effects of measurement noise on the perturbation. The simulation initial condition was randomly sampled to have values (0.7730, 0.9876, 12). For each variance \( \sigma _{\text {IC}}^2 \), predictions were conducted for 100 perturbed initial conditions. An energy parameter of \(\mu =5000\) was chosen. The distributions of the simulations are shown in Fig. 6d. Simulations with different initial conditions gave results with similar trends in the median, but with varying distributions of error.
3.3.5 Model Error
As described in Sect. 3.1, model error may be simulated by perturbing coefficients in the differential equations via the perturbation parameter \(\epsilon \). The error between the simulated trajectories of the true system and the imperfect model with the same initial conditions was studied through variation of \(\epsilon \). Perturbations range from mild to significant model error with \(\epsilon =0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1\), which represent coefficient errors from \(1\%\) to \(100\%\). For each level of perturbation, 100 simulations were performed, and the distributions of the error from these simulations are shown in Fig. 6e.
4 Discussion and Conclusion
By developing a method of velocity field reconstruction using optimal control and combining it with Takens’ embedding theorem, we have produced algorithms that enable fully and partially observed prediction of chaotic variables in settings with measurement noise. The predictability of these states is also quantified through the estimation of the transfer operator of each vector field. The application of these algorithms to two well-known chaotic systems has been demonstrated, and robustness studies have been presented for variation in parameter value, data length, and level of noise.
The robustness study of the energy parameter \(\mu \) shows that a large range of \(\mu \) gives similar median prediction accuracy, as highlighted in Fig. 6a. Parameter tuning techniques such as ordinary cross-validation may be used to find optimal values of the energy parameter (Dey and Krishnaprasad 2012). As the optimal control approach to approximating the vector field penalises the amplitude of the velocity, for large noise levels the predicted trajectory amplitudes are attenuated. An adaptive energy parameter may therefore be beneficial to reduce this attenuation. Future work is required to enhance the energy parameter selection process.
The amount of trajectory data given for the reconstruction of vector fields is important for accurate prediction, as can be seen in Fig. 6b. Both full and partial prediction increase in accuracy as the amount of given trajectory data increases. For online approaches to prediction using vector field reconstruction, the prediction accuracy is therefore expected to increase as the algorithm operates, if every new observation is used to increase the fidelity of the vector field.
For low levels of noise, prediction using full observation is more accurate; however, as the level of noise increases, the fully observed and partially observed prediction error medians approach each other (Fig. 6c). This suggests that for large levels of noise, prediction using partially observed variables is as accurate as prediction using knowledge of every state variable.
The error distribution for varying levels of initial condition uncertainty changes significantly as the uncertainty increases, as is presented in Fig. 6d. The full and partial prediction error median and mad both increase as the variance of the initial condition perturbation increases, which is expected due to the chaotic nature of the Lorenz system. The difference between the median of the partial prediction and the full prediction also decreases for increasing perturbation variance.
The biggest challenge in the prediction of noisy, chaotic systems is sensitivity to the initial condition. If noisy measurements are taken, the imperfect observations of the signal result in uncertainty about appropriate initial conditions for prediction. This is a major problem in fields such as numerical weather prediction, where perfect measurement of initial conditions is not possible and initialisation processes are required. The optimal control approach of trajectory smoothing seeks to avoid this issue by reconstructing an energy-optimal analogue of the measured dynamical system and using an initial condition from the energy-optimally smoothed trajectory. This initial condition will therefore be perturbed by the measurement noise, and so more investigation into the relation between uncertainty in the initial condition and uncertainty in the discovered analogue system is required.
The evolution of initial condition density using the transfer operator highlights this accumulation of uncertainty through the apparent diffusion present in the reconstructed vector fields. Measurement noise and the amplification of that noise through delay embedding introduce diffusion on the attractor, decreasing the already low predictability of the true system. However, the resulting uncertainty distributions can still provide model-free methods with estimates of the predictability of particular states, rather than global predictability measures.
As the imperfection of the Lorenz model increases from mild to significant model error, the predictive accuracy of the model decreases, as is evident from Fig. 6e. Despite having access to the true initial conditions, perturbation of the coefficients of the model results in notable prediction error. Comparing Fig. 6c with Fig. 6e highlights the trade-off between measurement noise and model error in choosing an appropriate prediction method. For situations involving large measurement error, it is preferable to use a model even with mild model error. However, in scenarios of significant model error, the presented model-free algorithm outperforms the model. This trade-off is especially important when choosing a data-driven technique for prediction, as the drop in predictive accuracy from introduced model error for parametric methods may exceed that introduced by observational noise for non-parametric methods.
For systems that have dynamics on short timescales, such as the z-component of the hyperchaotic Rössler system, noise reduction methods such as low-pass filtering will attenuate information that is crucial for reconstructing the dynamics. In contrast, the energy-optimal algorithm presented here penalises the amplitude of the vector field and therefore retains short-timescale dynamics, decreasing their amplitude to a less severe extent.
The presented simulations demonstrate the effectiveness of model-free methods for prediction using vector field reconstruction. Both full and partial observations were used to predict three- and four-dimensional systems; however, in higher dimensions the algorithms become computationally expensive. Through extension of the Delaunay triangulation method or the incorporation of dimensionality reduction algorithms, this technique may be applicable to systems of any dimension.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Abarbanel, H.D.I., Carroll, T.A., Pecora, L.M., Sidorowich, J.J., Tsimring, L.S.: Predicting physical variables in timedelay embedding. Phys. Rev. E 49, 1840–1853 (1994). https://doi.org/10.1103/PhysRevE.49.1840
Babovic, V., Keijzer, M.: Genetic programming as a model induction engine. J. Hydroinf. 2, 35–60 (2000)
Balasuriya, S.: Stochastic approaches to Lagrangian coherent structures. Adv. Stud. Pure Math. Math. Soc. Jpn. 85, 95–104 (2021)
Blachut, C., González-Tokman, C.: A tale of two vortices: how numerical ergodic theory and transfer operators reveal fundamental changes to coherent structures in nonautonomous dynamical systems. J. Comput. Dyn. 7(2), 369 (2020)
Brunton, S.L., Proctor, J.L., Kutz, J.N., Bialek, W.: Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. U.S.A. 113, 3932–3937 (2016)
Casdagli, M., Eubank, S., Farmer, J.D., Gibson, J.: A theory of state space reconstruction in the presence of noise. Information Dynamics, pp. 61–96 (1991)
Casdagli, M.: Nonlinear prediction of chaotic time series. Physica D 35, 335–356 (1989)
Champion, K., Lusch, B., Kutz, J.N., Brunton, S.L.: Datadriven discovery of coordinates and governing equations. Proc. Natl. Acad. Sci. 116(45), 22445–22451 (2019)
Chartrand, R.: Numerical differentiation of noisy, nonsmooth data. ISRN Appl. Math. 2011, 1–11 (2011)
Dellnitz, M., Froyland, G., Junge, O.: The algorithms behind GAIO—set oriented numerical methods for dynamical systems. In: Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, pp. 145–174. Springer (2001)
Deuflhard, P., Hermans, J., Leimkuhler, B., Mark, A.E., Reich, S., Skeel, R.D.: Computational molecular dynamics: challenges, methods, ideas. In: Proceeding of the 2nd International Symposium on Algorithms for Macromolecular Modelling, Berlin, May 21–24, 1997. Springer Science & Business Media, vol. 4 (2012)
Dey, B., Krishnaprasad, P.S.: Trajectory smoothing as a linear optimal control problem. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2012, pp. 1490–1497 (2012)
Fraser, A.M., Swinney, H.L.: Independent coordinates for strange attractors from mutual information. Phys. Rev. A 33, 1134–1140 (1986)
Froyland, G., Padberg, K.: Almostinvariant sets and invariant manifolds—connecting probabilistic and geometric descriptions of coherent structures in flows. Physica D 238(16), 1507–1523 (2009)
Froyland, G., Padberg, K., England, M.H., Treguier, A.M.: Detection of coherent oceanic structures via transfer operators. Phys. Rev. Lett. 98(22), 224503 (2007)
Gauthier, D.J., Bollt, E., Griffith, A., Barbosa, W.A.: Next generation reservoir computing. Nat. Commun. 12, 1–8 (2021)
Gilpin, W.: Deep reconstruction of strange attractors from time series. Adv. Neural. Inf. Process. Syst. 33, 204 (2020)
Giona, M., Lentini, F., Cimagalli, V.: Functional reconstruction and local prediction of chaotic time series. Phys. Rev. A 44, 3496–3502 (1991). https://doi.org/10.1103/PhysRevA.44.3496
González-Tokman, C.: Multiplicative ergodic theorems for transfer operators: towards the identification and analysis of coherent structures in nonautonomous dynamical systems. Contemp. Math. 709, 31–52 (2018)
Gouesbet, G., Letellier, C.: Global vectorfield reconstruction by using a multivariate polynomial L2 approximation on nets. Phys. Rev. E 49, 4955–4972 (1994)
Hamilton, F., Berry, T., Sauer, T.: Ensemble Kalman filtering without a model. Phys. Rev. X 6, 1–12 (2016)
Han, J., Tao, J., Zheng, H., Guo, H., Chen, D.Z., Wang, C.: Flow field reduction via reconstructing vector data from 3D streamlines using deep learning. IEEE Comput. Gra. Appl. 39, 54–67 (2019)
Haufe, S., Nikulin, V.V., Ziehe, A., Müller, K.R., Nolte, G.: Estimating vector fields using sparse basis field expansions. Advances in Neural Information Processing Systems 21—Proceedings of the 2008 Conference, pp. 617–624 (2009)
Jayawardena, A.W., Lai, F.: Analysis and prediction of chaos in rainfall and stream flow time series. J. Hydrol. 153, 23–52 (1994)
Judd, K., Mees, A.: Embedding as a modeling problem. Physica D 120, 273–286 (1998)
Kennel, M.B., Brown, R., Abarbanel, H.D.: Determining embedding dimension for phasespace reconstruction using a geometrical construction. Phys. Rev. A 45(6), 3403 (1992)
Kierzenka, J., Shampine, L.F.: A BVP solver based on residual control and the MATLAB PSE. ACM Trans. Math. Softw. 27, 299–316 (2001). https://doi.org/10.1145/502800.502801
Kim, H.S., Eykholt, R., Salas, J.D.: Nonlinear dynamics, delay times, and embedding windows. Physica D 127, 48–60 (1999)
Klus, S., Koltai, P., Schütte, C.: On the numerical approximation of the Perron–Frobenius and Koopman operator. J. Comput. Dyn. 3, 51–79 (2016)
Knowles, I., Renka, R.: Methods for numerical differentiation of noisy data. Electron. J. Differ. Equ. 21, 235–246 (2014)
Lasota, A., Mackey, M.C.: Chaos, fractals, and noise: stochastic aspects of dynamics. In: Lasota, A., Mackey, M.C. (eds.) Applied Mathematical Sciences, vol. 2. SpringerVerlag, New York (1994)
Lawden, D.F.: Analytical Methods of Optimization. Scottish Academic Press, Edinburgh (1975)
Letellier, C., Rossler, O.E.: Hyperchaos. Scholarpedia 2, 1936 (2007)
Lorenz, E.: Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130 (1963)
Lorenz, E.: Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci. 26, 636 (1969)
MatillaGarcía, M., Morales, I., Rodríguez, J.M., Marín, M.R.: Selection of embedding dimension and delay time in phase space reconstruction via symbolic dynamics. Entropy 23, 1–13 (2021)
Pathak, J., Hunt, B., Girvan, M., Lu, Z., Ott, E.: Modelfree prediction of large spatiotemporally chaotic systems from data: a reservoir computing approach. Phys. Rev. Lett. 120, 24102 (2018). https://doi.org/10.1103/PhysRevLett.120.024102
PérezMuñuzuri, V., Gelpi, I.R.: Application of nonlinear forecasting techniques for meteorological modeling. Ann. Geophys. 18, 1349–1359 (2000)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physicsinformed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019). https://doi.org/10.1016/j.jcp.2018.10.045
Risken, H.: The Fokker–Planck Equation. Springer, Berlin (1996)
Rossler, O.E.: An equation for hyperchaos. Phys. Lett. A 71, 155–157 (1979)
Shampine, L.F., Reichelt, M.W.: The MATLAB ODE suite. J. Sci. Comput. 18, 1–22 (1997)
Stark, J.: Delay embeddings for forced systems. I. Deterministic forcing. J. Nonlinear Sci. 9, 255–332 (1999)
Stark, J., Broomhead, D.S., Davies, M.E., Huke, J.: Delay embeddings for forced systems. II. Stochastic forcing. J. Nonlinear Sci. 13, 519–577 (2003)
Sugihara, G., May, R.M.: Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 344, 734–741 (1990)
Sugihara, G., May, R., Ye, H., Hsieh, C.H., Deyle, E., Fogarty, M., Munch, S.: Detecting causality in complex ecosystems. Science 338, 496–500 (2012)
Takens, F.: Detecting strange attractors in turbulence. Lect. Notes Math. 898, 336–381 (1981)
Tantet, A., van der Burgt, F.R., Dijkstra, H.A.: An early warning indicator for atmospheric blocking events using transfer operators. Chaos 25, 2 (2015)
Ulam, S.: A Collection of Mathematical Problems, ser. Interscience Tracts in Pure and Applied Mathematics. Interscience Publishers, New York (1960)
Verdes, P.F., Granitto, P.M., Ceccatto, H.A.: Overembedding method for modeling nonstationary systems. Phys. Rev. Lett. 96, 118701 (2006)
Viswanath, D.: Lyapunov Exponents from Random Fibonacci Sequences to the Lorenz Equations. Cornell University, Ithaca (1998)
Vlachos, I., Kugiumtzis, D.: State space reconstruction for multivariate time series prediction. In: Nonlinear Phenomena in Complex Systems [Online]. Available: http://arxiv.org/abs/0809.2220 (2008)
Wolf, A.: Quantifying chaos with Lyapunov exponents. In: Holden, A.V. (ed.) Chaos, pp. 273–290. Princeton University Press, Princeton (1986)
Wolf, A., Swift, J.B., Swinney, H.L., Vastano, J.A.: Determining Lyapunov exponents from a time series. Physica D 16(3), 285–317 (1985)
Yu, Y.N., Vongsuriya, K., Wedman, L.N.: Application of an optimal control theory to a power system. IEEE Trans. Power Appar. Syst. 1, 55–62 (1970)
Zaslavski, A.J.: Turnpike Theory of Continuous-Time Linear Optimal Control Problems, vol. 104. Springer, Berlin (2015)
Zhang, L., Balasuriya, S.: Controlling trajectories globally via spatiotemporal finite-time optimal control. SIAM J. Appl. Dyn. Syst. 19, 1609–1632 (2020)
Acknowledgements
SPM acknowledges with thanks support from Australian Government Research Training Program Scholarships. SB and CB were partially supported by the Australian Research Council via grant DP200101764.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Author information
Contributions
SPM wrote the main manuscript text and performed all numerical simulations. All authors designed the research, reviewed the results, and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Oliver Junge.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mov 769 KB)
Appendices
Appendix
Derivation of Optimal Control Method
Consider the problem of controlling a trajectory \(\textbf{x} \) to a measured, noisy trajectory \(\hat{\textbf{x}}\) by the control field \( \textbf{u} \) as given in (1). The cost might be defined more generally than in (2) as
where Q and R are \(n\times n\) diagonal matrices that define the weightings of the trajectory-tracking and velocity-amplitude terms, respectively. As the integral contains no constant terms, only the ratio of corresponding components of the Q and R matrices affects the control; we may therefore equivalently consider the case where R is the identity matrix, and hence we use the cost (2). Following Pontryagin's maximum principle (Lawden 1975), the Hamiltonian of this problem is
where \(\textbf{p}\in \mathbb {R}^{n}\) is a set of introduced Lagrange multipliers, referred to as the costates. The control field \(\textbf{u}\) can be found through the condition for optimality that \(\nabla _u H=0\) along optimal trajectories. This is known as the optimal control law, and results in
The controlled trajectories \(\textbf{x}\) and conjugate momenta \(\textbf{p}\) can be found through the coupled equations
along with appropriate initial and terminal conditions. Using the optimal control law, and the fact that Q is a diagonal matrix, results in (4).
Prediction Point Analysis
Due to the turnpike phenomenon, the initialising time of prediction \( T_p \) should not be chosen to be T, the final time at which data are available. Due to the sensitivity to initial conditions, it is preferable to choose a point near the end of the dataset to maximise prediction accuracy; however, as argued in Sect. 2.2, it is preferable to choose \( T_p \) near a value where \( \textbf{u} \) is close to zero to avoid compounding errors. To investigate the legitimacy of this idea, one set of simulated trajectory data of the x-component of the L63 system was corrupted with 100 different random noise realisations of variance \(\sigma ^2=0.01\). Prediction was then performed for each of the resulting datasets beginning at 16 different points \( T_p \) near the end of the trajectory data. The prediction error defined in (15) was calculated and averaged over the simulations to produce Fig. 7. The time axis has been shifted to \( T_p - T_m \), where \( T_m \) is the time value at which \( \left\Vert \textbf{u} \right\Vert \) is minimised in the interval \( [T-\delta , T] \), with \( \delta = 0.6\). There is a relatively well-defined concave minimum at \( T_p - T_m = 0 \), i.e. when the prediction time \( T_p \) is chosen to be the time \( T_m \) near T at which the vector field magnitude is minimal.
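The selection rule just described can be sketched in a few lines. This is an illustrative implementation with hypothetical variable names, not the authors' code: given sampled times and control-field values, it returns the time in the final window \([T-\delta , T]\) at which the control norm is smallest.

```python
# Sketch of the prediction-point selection rule (hypothetical names):
# choose T_m, the time in [T - delta, T] minimising ||u(t)||.
import numpy as np

def select_prediction_time(t, u, delta=0.6):
    """Return the time in [t[-1] - delta, t[-1]] where ||u|| is minimal.

    t : (N,) array of sample times, u : (N, n) array of control samples.
    """
    u_norm = np.linalg.norm(u, axis=1)       # ||u|| at each sample
    in_window = t >= t[-1] - delta           # restrict to the final window
    idx = np.argmin(np.where(in_window, u_norm, np.inf))
    return t[idx]

# Toy usage: a control field whose norm dips to zero near t = 9.7
t = np.linspace(0.0, 10.0, 1001)
u = np.column_stack([t - 9.7, np.zeros_like(t), np.zeros_like(t)])
print(select_prediction_time(t, u))  # ≈ 9.7
```

Masking points outside the window with infinity keeps the `argmin` index aligned with the full arrays, avoiding a separate re-indexing step.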
Error Analysis
The maximal Lyapunov exponent of a chaotic system characterises the rate of divergence of two trajectories initially separated by the separation vector \(\varvec{\delta }(0)\). The linearised approximation of the divergence is given by
$$\begin{aligned} \left\Vert \varvec{\delta }(t)\right\Vert \approx e^{\lambda t}\left\Vert \varvec{\delta }(0)\right\Vert , \end{aligned}$$
where \(\lambda \) is the maximal Lyapunov exponent of the system.
The absolute errors of the predictions and of the model-error trajectories were compared to the expected divergence given by the maximal Lyapunov exponent of each system. The magnitude of the initial separation vector is taken to be the standard deviation of the Gaussian noise added to the training data, \(\left\Vert \varvec{\delta }(0)\right\Vert =\sigma \). The absolute error plots are shown in Fig. 8.
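As a minimal numerical sketch of this comparison (assumed values, not the paper's data): the expected divergence curve is \(\sigma e^{\lambda t}\), using the widely quoted maximal Lyapunov exponent of the L63 system, \(\lambda \approx 0.9056\) (Viswanath 1998). The reciprocal \(1/\lambda \) gives the Lyapunov time, the horizon over which errors grow by a factor of e.

```python
# Expected error growth under the linearised divergence approximation,
# ||delta(t)|| ≈ sigma * exp(lambda * t). Values are illustrative.
import numpy as np

lam = 0.9056       # maximal Lyapunov exponent of L63 (Viswanath 1998)
sigma = 0.1        # noise std. dev. = initial separation magnitude
t = np.linspace(0.0, 5.0, 501)

expected = sigma * np.exp(lam * t)   # divergence envelope to plot vs. |error|
lyapunov_time = 1.0 / lam            # e-folding time of small errors

print(lyapunov_time)
```

Plotting the absolute prediction error against `expected` on a log scale makes it easy to judge whether errors grow faster or slower than chaos alone would dictate.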
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
McGowan, S.P., Robertson, W.S.P., Blachut, C. et al. Optimal Reconstruction of Vector Fields from Data for Prediction and Uncertainty Quantification. J Nonlinear Sci 34, 73 (2024). https://doi.org/10.1007/s00332-024-10047-1