1 Introduction

The ability to predict unknown systems exhibiting chaotic behaviour is important for many natural sciences such as meteorology (Lorenz 1963, 1969; Pérez-Muñuzuri and Gelpi 2000), fluid mechanics (Takens 1981; Brunton et al. 2016), and ecology (Sugihara and May 1990; Sugihara et al. 2012). These systems are sometimes described by simple dynamical systems even if their time series trajectories seem complex. Corruption by measurement noise, sensitivity to initial conditions, and partial measurement of system variables present additional difficulties for the prediction and control of such systems. Uncovering the underlying structure of a chaotic system would allow greater understanding of the dynamics and provide methods for forecasting and control.

Many techniques for prediction rely on developing models based on collected data and using these to forecast into the future. These models may be derived through first principles analysis or through data-driven techniques such as genetic programming (Babovic and Keijzer 2000), sparse regression (Brunton et al. 2016), or neural networks (Raissi et al. 2019). Ad hoc assumptions are frequently used to close equation sets to produce closed-form expressions, and noisy measurements may complicate structure and parameter estimation. The resulting model errors inevitably magnify prediction errors, and therefore, forecasts may only be accurate for small timescales.

When data is available and accurate models cannot be derived, model-free prediction allows the forecasting of chaotic variables without first deriving closed-form dynamical models. There exist many approaches for model-free prediction from time series data such as the method of analogues (Lorenz 1969; Casdagli 1989) and neural network methods (Gauthier et al. 2021; Gilpin 2020; Pathak et al. 2018). Many of these techniques perform very well in a noise-free environment, with examples such as the accurate prediction of Kuramoto–Sivashinsky dynamics up to eight Lyapunov time constants into the future (Pathak et al. 2018). These approaches typically require data smoothing when noise is present in the measured data; however, ad hoc methods of smoothing may incorrectly remove dynamics on short timescales by attenuating high-frequency components of the signal.

Vector field reconstruction techniques (Gouesbet and Letellier 1994) estimate the phase space vector field of dynamical systems by computing the temporal derivatives of a trajectory and interpolating to fill a region of space. This is usually required for prediction, in order to obtain a vector field (locally near the trajectory) to evolve a trajectory into the future. However, computing derivatives from noisy data can lead to large errors. As an alternative, this paper presents an approach to predicting chaotic systems by reconstructing the vector field of noisy trajectory data using optimal control methods (e.g. Lawden 1975). The optimal control procedure estimates derivatives through numerical integration and is therefore more robust to noise. In the presence of high levels of noise, reconstructing the vector field in this control theoretic way allows the reconstructed system dynamics to be energy optimal. Rather than penalising high frequencies to remove noise, the amplitude of the controlled vector field is penalised. This allows short timescale dynamics to be included in the resulting discovered vector field.

For situations where not all variables of an unknown system can be measured, we must rely on partial observations; for example, the evolution of only one system variable. Takens’ theorem (Takens 1981) gives conditions under which the full attractor of a dynamical system can be embedded into a higher dimension from the measurement of this one variable. This embedded attractor preserves properties of the full attractor that do not change under smooth coordinate transforms. The measurement of one variable therefore inherits certain information about the original dynamics, and can be used to infer properties of the full system. This paper will show that methods of reconstructing vector fields for full systems can be used for partially observed systems.

An important insight from statistical mechanics is that for chaotic systems, predicting the probability density from a large number of initial conditions avoids many issues of chaotic sensitivity encountered when predicting individual trajectories (Lasota et al. 1994). Estimating the vector field of a chaotic attractor from time series data allows a connection from the trajectory view to this density view. In this work, it is shown that the density perspective can provide each trajectory prediction with an uncertainty distribution, found through evolving densities of initial conditions past final time. Rather than giving a global quantification of predictability for chaotic systems with the maximal Lyapunov exponent (Wolf et al. 1985), this approach presents a model-free method for assessing short-term predictability for local states of fully observed or partially observed systems. Several methods exist for evolving density distributions through a dynamical system, such as the Fokker–Planck equation (Risken and Risken 1996) or the transfer operator (Froyland et al. 2007; González-Tokman 2018; Balasuriya 2021; Blachut and González-Tokman 2020). The transfer operator, or Perron–Frobenius operator, is a solution operator of a Fokker–Planck equation (Balasuriya 2021) and provides a natural data-driven framework for the density perspective. In this paper, the transfer operator of reconstructed vector fields will be numerically approximated by Ulam matrices (Dellnitz et al. 2001) to simulate the spread of densities.

The main contributions of this work are algorithms for model-free prediction which (i) predict well beyond the expected timescale for noisy, intermediate-length data from highly sensitive (or chaotic) systems, and (ii) provide the uncertainty of the prediction via a probability density function. The first of these is achieved via a novel application of optimal control methodology, an unusual approach for prediction which we demonstrate is effective in settings with little data and high noise. The second is computed via an estimation of the transfer operator, which pushes forward uncertainty distributions. These methods may be simply adapted for partial observations to allow the prediction of trajectories and densities of systems where the data only contains measurements of some system variables.

The paper is organised as follows. The method of reconstructing the vector field of an unknown dynamical system from noisy data is presented in Sect. 2.1. The optimal control problem for vector field reconstruction is defined and the Pontryagin maximum principle is used to derive coupled equations that solve this problem. The algorithms for model-free prediction from the reconstructed vector field are developed in Sects. 2.2 and 2.3. The method of recreating the full system attractor from partially observed data by delay embedding is explained in Sect. 2.3 to enable partially observed prediction. The process of estimating the transfer operator from the reconstructed vector field is shown in Sect. 2.4. The effectiveness of these algorithms is demonstrated in Sect. 3 through predictions of simulated data from the chaotic Lorenz system and the hyperchaotic Rössler system, and through forecasting the uncertainty distributions of the Lorenz system. The robustness of the algorithms to the control parameter, amount of data, noise level, and initial condition uncertainty is then quantitatively assessed alongside the robustness of models with model error. Finally, Sect. 4 discusses limitations of and extensions to the presented work.

2 Prediction and Its Uncertainty

Given time series data up to a time T, the prediction or forecasting problem is to determine how the time series evolves into the future up to a time \(T+T_f\). This is a difficult problem for several reasons:

  • The time series may only contain measurements of some system variables, but not all N of them; this is called a ‘partially observed system’.

  • The time series may not traverse regions in phase space into which it may venture in the future, and consequently there will not be sufficient information to base predictions on.

  • Most nonlinear systems in high dimensions exhibit chaos, where mild changes in initial conditions lead to an exponential increase in separation, and hence prediction beyond the Lyapunov timescale associated with this exponential error increase is fraught (Wolf 1986).

A schematic diagram of the prediction problem is shown in Fig. 1.

Fig. 1 Schematic diagram of prediction problem. Noisy, partial trajectory data from 0 to T (dotted black), true future trajectory from T to \(T+T_f\) (dashed black), and prediction from T to \(T+T_f\) (red) (Color figure online)

In this section, we address the prediction problem, while considering some of the difficulties outlined above. First, it is necessary to use the trajectory data until time T to construct the vector field which drives the evolution of the trajectory. Our methodology for doing this via an optimal control approach is presented in Sect. 2.1. The procedure for prediction beyond time T using the reconstructed vector field is then outlined in Sect. 2.2. For partially observed data, i.e. when only the trajectory of one variable from a higher-dimensional system is known, we propose an algorithm for prediction in Sect. 2.3. The process of quantifying the uncertainty of model-free predictions is described in Sect. 2.4, and error metrics which can be used for verifying the effectiveness of the process are discussed in Sect. 2.5.

2.1 Optimal Control-Based Vector Field Reconstruction

Vector field reconstruction methods are techniques for recreating vector fields from trajectory data to learn differential equation models of the observed system. Given trajectory data from an unknown system, these methods estimate the temporal derivatives of the trajectories to then define the vector field in a region of space and time. For autonomous (time-invariant) systems, the vector field can be described by only its variation in phase space. Vector field reconstruction has been successfully performed using polynomial fitting (Gouesbet and Letellier 1994), deep learning (Han et al. 2019), and linear combinations of basis fields (Haufe et al. 2009). In contrast, this section presents a model-free method for vector field reconstruction in the presence of noise using an optimal control approach. We remark that we only require vector field reconstruction local to the trajectory for the purpose of evolving it in time, rather than a global vector field reconstruction which may be an end in and of itself.

In vector field reconstruction in general, the time derivative of the trajectory data must be estimated to determine the vector field along the trajectory path. If there is noise in the trajectory data, numerical differentiation will produce large errors in this derivative. In applied settings, ad hoc approaches are therefore used in choosing alternative differentiation techniques. Most commonly, preprocessing steps are used to smooth the data and decrease the level of noise, allowing simple differentiation methods to be used subsequently. Despite the existence of many sophisticated smoothing techniques, choosing appropriate smoothing parameters to minimise loss of information is challenging. Therefore, alternative methods of approximating derivatives of noisy data are required. Two common approaches for estimating derivatives are (i) to approximate the noisy data by curve fitting with differentiable functions and then compute the derivative (Knowles and Renka 2014), and (ii) to use a regularisation approach (Chartrand 2011).

Here, optimal control (e.g. Lawden 1975) is proposed as a direct method for derivative estimation that simultaneously smooths the trajectory data and performs an energy-optimal vector field reconstruction. Optimal control is a well-established methodology that has been used for optimal fluid mixing (Zhang and Balasuriya 2020), optimal power management (Yu et al. 1970), and optimal trajectory smoothing (Dey and Krishnaprasad 2012). In this instance, the cost function is defined as a weighted linear sum of the trajectory tracking error and the trajectory energy, which will be described in detail later in this section. The Pontryagin maximum principle is then invoked to find pairs of controlled trajectories and control field values from the reference trajectories. The controlled trajectories will be energy-optimally smoothed, and the control field values along this trajectory will be an estimate of the derivative of the reference trajectory. This method also requires selection of an energy parameter to control the scale of attenuated noise. As the output of this approach is both a smoothed trajectory and a derivative estimate, the smoothed trajectory may be compared to the reference trajectory for validation. The generated vector field can then be used to predict the trajectory into the future.

The above procedure is now explained in detail. Suppose time series measurements are available for N system variables, which can be represented at each time t by the measured trajectory vector \({\hat{\textbf{x}}}(t)=[\hat{x}_1(t), \hat{x}_2(t), \dots ,\hat{x}_N(t)]^\textsf{T}\). Without loss of generality, assume that the time is shifted so that \( t \in [0,T] \). In realistic situations, t is discrete and potentially infrequent, and intermediate t values would need to be filled in via interpolation. It is assumed that the time evolution of the controlled trajectory \( \textbf{x} \) is governed by

$$\begin{aligned} \dot{\textbf{x}} = \textbf{u}\left( \textbf{x} \right) \,, \end{aligned} \tag{1}$$

for an unknown vector field \( \textbf{u} \), where the overdot denotes the t-derivative. Moreover, it is assumed that the data is possibly corrupted by noise. In this section, the goal is to determine an ‘optimal’ \( \textbf{u} \) from the measurement data \( \hat{\textbf{x}} \). (Prediction from partially observed measurements—only some of the components of \( \hat{\textbf{x}} \)—is addressed in the subsequent Sect. 2.3.)

Define the true states as \(\textbf{x}(t)=[x_1(t), x_2(t), \dots ,x_N(t)]^\textsf{T}\), and the N energy parameters as \(\mu _i > 0\), for \( i = 1, \dots , N \). The approach here is to consider the true trajectory \( \textbf{x}(t) \) as being ‘controlled’ by the unknown vector field \( \textbf{u} \), and hence the problem is recast as seeking to minimise the cost function

$$\begin{aligned} J = \int ^T_0 \left[ (\textbf{x}-\hat{\textbf{x}})^\textsf{T}Q \,(\textbf{x}-\hat{\textbf{x}}) + \textbf{u}^\textsf{T}\textbf{u} \right] \, \text {d}t, \end{aligned} \tag{2}$$

where the diagonal matrix Q is defined by

$$\begin{aligned} Q= \begin{bmatrix} \mu _1 &{} 0 &{} \cdots &{}0 \\ 0&{} \mu _2 &{} \cdots &{}0 \\ \vdots &{} \vdots &{}\ddots &{}\vdots \\ 0&{} 0 &{} \cdots &{}\mu _N \\ \end{bmatrix}. \end{aligned} \tag{3}$$

In some settings, the \( \mu _i \) are selected to be equal; this common value is then the energy parameter \(\mu \). The cost function J penalises both the distance between the controlled trajectories \(\textbf{x}\) and the measured trajectories \({\hat{\textbf{x}}}\), and the squared magnitude of the control field \(\textbf{u}\), each integrated over [0, T]. Therefore, the algorithm attempts to create a vector field that produces trajectories that match the reference dynamics when given the same initial condition, while minimising the energy of the controlled system.

The energy parameter can be tuned to accommodate varying expectations of noise in the data, with a lower \(\mu \) resulting in a smoother controlled curve. Penalising the energy of the vector field is a physically intuitive method for trajectory smoothing, avoiding methods such as penalising high frequencies (as with low-pass filters) which may be associated with relevant timescales of the inherent dynamics. The robustness of the method to the choice of energy parameter is discussed in Sect. 3.3.

The Pontryagin maximum principle (Lawden 1975) can be applied to the optimal control problem (2) to show that the controlled state trajectories \(\textbf{x}\) and costate trajectories \(\textbf{p}\) satisfy

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{\textbf{x}} =-\frac{1}{2}\textbf{p}\\ \dot{\textbf{p}} = -2 \, Q \, \left( \textbf{x}-\hat{\textbf{x}} \right) \end{array}\right. }, \quad \text {with conditions} \quad {\left\{ \begin{array}{ll} \textbf{x}(0) = {\hat{\textbf{x}}}(0) \\ \textbf{p}(T) = -2 \, \dot{\hat{\textbf{x}}}(T) \end{array}\right. }. \end{aligned} \tag{4}$$

The derivation of (4) is shown in Appendix A. When this system of equations is solved, \( \textbf{x} \) and \( \textbf{u} = - \textbf{p}/2 \) provide the required smoothed trajectory and vector field associated with the problem.

As is standard, the condition for the trajectory \( \textbf{x} \) is an initial condition, whereas that for the costate \( \textbf{p} \) is a terminal condition (Lawden 1975; Zhang and Balasuriya 2020). This latter condition, sometimes called the transversality condition, forces the vector field to match measured velocities at final time. However, given the time series \( \hat{\textbf{x}} \) at discrete times within the interval [0, T], it is only possible to estimate \( \dot{\hat{\textbf{x}}}(T) \). Errors in this approximation will result in the turnpike phenomenon in the controlled trajectory (Zaslavski 2015). Taking into account the noise in the data, the approximation of the derivative at final time is found by smoothing the trajectory and then numerically differentiating it. The approximation is calculated using the backward difference method on the measurement data after Savitzky–Golay filtering. Numerical differentiation is avoided at all other times.

There are many methods in the literature for numerically solving coupled initial/terminal systems such as (4) in the optimal control context (see Zhang and Balasuriya (2020) for a discussion). Many of these (indirect or multiple shooting; collocation; simultaneous, sequential or direct transcription) are quite sensitive, and may fail unless parameters are chosen carefully. Here, the fourth-order finite difference boundary value problem solver bvp4c in MATLAB is used to provide a \(C^1\)-continuous solution on the interval of integration (Kierzenka and Shampine 2001). The trajectory \(\textbf{x}\) and the vector field \(\textbf{u}\) are therefore found by numerical integration rather than differentiation, and hence, this method is robust to noise.
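To make the structure of (4) concrete, the following is a minimal sketch for a single variable with one energy parameter \(\mu \). It is not the bvp4c computation used in this paper: it assumes a uniform time grid, eliminates the costate to obtain \(\ddot{x} = \mu (x - \hat{x})\) with \(x(0)=\hat{x}(0)\) and \(\dot{x}(T)=\dot{\hat{x}}(T)\), and solves the resulting tridiagonal finite-difference system directly. The function name `smooth_and_differentiate`, the argument `v_T` (the estimated terminal derivative), and the synthetic data are hypothetical choices made for illustration.

```python
import math


def smooth_and_differentiate(x_hat, h, mu, v_T):
    """Scalar sketch of system (4) on a uniform grid with spacing h.

    Eliminating p gives x'' = mu * (x - x_hat) with x(0) = x_hat[0] and
    x'(T) = v_T. Returns the smoothed states x and the field values u = x'.
    Assumes len(x_hat) >= 3.
    """
    n = len(x_hat) - 1                     # unknowns are x[1], ..., x[n]
    lower = [1.0] * n                      # sub-diagonal (lower[0] is unused)
    main = [-(2.0 + mu * h * h)] * n       # main diagonal
    upper = [1.0] * n                      # super-diagonal (upper[n-1] is unused)
    rhs = [-mu * h * h * x_hat[j] for j in range(1, n + 1)]
    rhs[0] -= x_hat[0]                     # known initial state x[0]
    lower[n - 1] = 2.0                     # ghost point enforces x'(T) = v_T
    rhs[n - 1] -= 2.0 * h * v_T

    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n):
        w = lower[k] / main[k - 1]
        main[k] -= w * upper[k - 1]
        rhs[k] -= w * rhs[k - 1]
    x = [x_hat[0]] + [0.0] * n
    x[n] = rhs[n - 1] / main[n - 1]
    for k in range(n - 2, -1, -1):
        x[k + 1] = (rhs[k] - upper[k] * x[k + 2]) / main[k]

    # u' = mu * (x - x_hat), so integrate backwards from u(T) = v_T
    # with the trapezoidal rule instead of differentiating x.
    u = [0.0] * (n + 1)
    u[n] = v_T
    for j in range(n - 1, -1, -1):
        e0 = x[j] - x_hat[j]
        e1 = x[j + 1] - x_hat[j + 1]
        u[j] = u[j + 1] - 0.5 * h * mu * (e0 + e1)
    return x, u


# Synthetic example (assumed): a sine wave with a small fast perturbation.
T, n = 2.0, 200
h = T / n
x_hat = [math.sin(j * h) + 0.05 * math.sin(40.0 * j * h) for j in range(n + 1)]
x, u = smooth_and_differentiate(x_hat, h, mu=100.0, v_T=math.cos(T))
```

In this sketch the field values are recovered by integrating \(\dot{u} = \mu (x-\hat{x})\) backwards from the terminal condition, which mirrors the point made above that the derivative estimate comes from integration rather than differentiation of the data.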

The numerical solution to (4) provides values of \( \textbf{x} \) and \( \textbf{u} = - \textbf{p}/2 \) at discrete time instances \( t_j \in [0,T] \). This represents a set of points \( \textbf{x} \left( t_j \right) \) in the state space at which the vector field \( \textbf{u} \) is known. Interpolation over a Delaunay triangulation can be used to extend \( \textbf{u} \) to the convex hull of the data points \( \textbf{x}(t_j) \). This algorithm is effective for up to six dimensions, and therefore, autonomous systems of up to six dimensions can effectively be analysed with this method. Extrapolation outside the convex hull is of course error-prone, since there is insufficient data. We can nonetheless provide reasonable estimates by defining auxiliary points \(\textbf{x}_{\text {aux}}\) around the convex hull, sufficiently far from the sampled points, that have artificial \( \textbf{u}_{\text {aux}} \) values of \( \textbf{0} \). Construction of the vector field using this approach enables prediction of the time series \( \hat{\textbf{x}} \) into the future, as is described in the next section. This methodology for reconstructing a vector field from noisy data is described in Algorithm 1.

Algorithm 1 Vector field reconstruction

2.2 Fully Observed Prediction

The system is ‘fully observed’ if the time evolution of all N variables in the system is known. This is the situation discussed in the previous section, in which the time series of each of the N components of \( \hat{\textbf{x}}(t) \) was known over a time duration [0, T]. The intention is to predict the evolution of each of these components beyond time T. From the results of the previous section, the governing vector field \( \textbf{u} \) can be reconstructed. Since the fully observed data represents a trajectory of (1) until time T, a standard numerical solution scheme can then be used to integrate the trajectory forward in time. In this paper, the variable-step, variable-order solver ode15s (Shampine and Reichelt 1997) is used with relative and absolute tolerances of \(10^{-6}\).
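The forward-integration step can be illustrated with a short sketch. It swaps the variable-step solver ode15s for a fixed-step classical Runge–Kutta scheme purely to stay self-contained, and it assumes the reconstructed field is available as a callable `field(x)`; the name `rk4_predict` and the example field are hypothetical.

```python
import math


def rk4_predict(field, x0, t0, t_end, n_steps):
    """Integrate dx/dt = field(x) from t0 to t_end with classical RK4.

    field maps a state (list of floats) to its velocity (list of floats).
    Returns the list of states visited, including the initial state.
    """
    h = (t_end - t0) / n_steps
    x = list(x0)
    path = [list(x0)]
    for _ in range(n_steps):
        k1 = field(x)
        k2 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = field([xi + h * ki for xi, ki in zip(x, k3)])
        x = [
            xi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
        path.append(x)
    return path


# Stand-in for an interpolated field (assumed): a harmonic oscillator.
path = rk4_predict(lambda x: [x[1], -x[0]], [1.0, 0.0], 0.0, math.pi / 2, 100)
```

In practice `field` would evaluate the interpolated \( \textbf{u} \) from Sect. 2.1, and a stiff or adaptive solver may be preferable when the reconstructed field varies rapidly.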

It may seem natural to use the time T (i.e. the final time at which measured trajectory data is available) as the initialisation time \( T_p \) (henceforth called the prediction time) for implementing prediction. However, the data is noisy and consequently the errors in the time derivative (necessary for vector field reconstruction as outlined in the previous section) can be quite large. If the final measured point (at time T) has a large derivative, small errors can quickly compound to large errors in the evolving trajectory. Choosing an earlier time \( T_p < T \) where the derivative, and hence vector field, is small in magnitude will decrease compounding errors. However, choosing \( T_p \) much earlier than T will decrease prediction accuracy because the available information between times \( T_p \) and T would be ignored. We therefore propose choosing the prediction time \( T_p \) as the time at which the derivative is minimised over some interval \([T-\delta ,T]\):

$$\begin{aligned} T_p ={\text {argmin}}_{t\in [T-\delta ,T]}\Vert \textbf{u}(t)\Vert , \end{aligned} \tag{5}$$

where the ‘minimum argument’ \({\text {argmin}}\) is found numerically as the time \(t\in [T-\delta ,T]\) which minimises \(\Vert \textbf{u}(t)\Vert \). The improvement in accuracy by varying the prediction time \( T_p \) to minimise the derivative is presented through simulations in Appendix B. Figure 2 presents an example of prediction to \(T_f\) time units into the future beyond the data-available time T.
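On discrete data, the minimisation over \([T-\delta ,T]\) reduces to a search over the sampled times in that window. A minimal sketch, assuming the field values are stored as one vector per time sample (the name `prediction_time` and the sample values are hypothetical):

```python
import math


def prediction_time(times, u, delta):
    """Return (T_p, index) minimising the Euclidean norm of u over [T - delta, T].

    times is an increasing list of sample times ending at T, and u[j] is the
    reconstructed field vector at times[j].
    """
    T = times[-1]
    window = [j for j, t in enumerate(times) if t >= T - delta]
    best = min(window, key=lambda j: math.sqrt(sum(c * c for c in u[j])))
    return times[best], best


times = [0.0, 0.5, 1.0, 1.5, 2.0]
u = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.5, 0.5], [2.0, 2.0]]
T_p, j_p = prediction_time(times, u, delta=1.0)
```

Here the sample at time 0 has the smallest field magnitude overall, but it lies outside the window and is ignored, so the search returns the sample at time 1.5.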

Fig. 2 Visual example of prediction. Curves shown are the smoothed trajectory from 0 to T (solid black), true future trajectory from T to \(T+T_f\) (dashed black), and prediction from \(T_p\) to \(T+T_f\) (red)

Algorithm 2 Prediction from full observations

With these observations in mind, the fully observed prediction framework which we have described is presented in Algorithm 2. Its implementation is presented in Sect. 3.

2.3 Partially Observed Prediction

In practice, the simultaneous measurement of all relevant variables of a system is not always possible, and therefore, we must rely on partial state observation. This means that we do not have access to all N of the system variables’ evolution, but rather to only a subset of them. Here, we will assume we only have access to one variable, but the following approach easily extends to partially observed systems with more than one observed variable. We can leverage Takens’ theorem (Takens 1981) to reconstruct the full system attractor in a higher embedded space using delay coordinate embeddings. The embedded vector field (subsequently known as the Takens field) may then be estimated using the process in Sect. 2.1 and used to evolve the embedded dynamics for prediction of the partially observed variable.

Consider a continuous dynamical system \(\dot{\textbf{x}}=\textbf{f}(\textbf{x})\) with state \(\textbf{x}\in X\) of general dimension, and suppose that this possesses an attractor A towards which trajectories beginning in a certain region tend to evolve. Let h be the ‘partial observation function’ \(x(t)=h\left( \textbf{x}(t)\right) \), which gives us the evolution of one time series from this system; i.e. we do not have access to each of the variables’ evolution, but instead have either just one system variable, or some other variable derived from these system variables. Takens’ theorem enables an embedding \(\textbf{x}_{\text {emb}}=\mathbf {\Phi } (x)\) which takes the time series associated with the one observable variable x(t), and constructs an evolution in an m-dimensional phase space.

Many embeddings exist, such as the delay and differential embeddings originally presented by Takens (1981), and the bundle embedding as an extension to forced and stochastic systems (Stark 1999; Stark et al. 2003). Throughout this work, we use the classical delay embedding with time delay \(\tau \) and embedding dimension m,

$$\begin{aligned} \textbf{x}_{\text {emb}}(t)=\mathbf {\Phi } _{\tau ,m}(x):=\left[ x(t), x(t\!-\!\tau ),\dots ,x(t\!-\!(m\!-\!1)\tau )\right] ^\textsf{T}\end{aligned} \tag{6}$$

which, at any time t, samples the time series x(t) at m values corresponding to time delays \(\{ 0,\tau ,\dots ,(m-1)\tau \}\). As t progresses, the m-dimensional variable \(\textbf{x}_{\text {emb}}(t) \) evolves according to a dynamical system \(\dot{\textbf{x}}_{\text {emb}}=\textbf{g}(\textbf{x}_{\text {emb}})\) where \(\textbf{g}:\mathbb {R}^m\rightarrow \mathbb {R}^m \) is inferred from the data. An attractor A in the original system is expected to be visible in the embedded system as a set \( \tilde{A} \). Appropriate values of time delay \(\tau \) and embedding dimension m must be chosen for the delay embedding to give a proper reconstruction of the full system attractor.
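For uniformly sampled data, the delay embedding amounts to stacking lagged copies of the series. A minimal sketch, assuming the delay is an integer number of samples (the name `delay_embed` and the argument `lag` are hypothetical):

```python
def delay_embed(x, lag, m):
    """Delay-embed a scalar series sampled on a uniform grid.

    lag is the time delay measured in samples. The vector returned for
    sample index k is [x[k], x[k - lag], ..., x[k - (m - 1) * lag]], so the
    first (m - 1) * lag samples have no embedded vector.
    """
    start = (m - 1) * lag
    return [[x[k - i * lag] for i in range(m)] for k in range(start, len(x))]


series = [float(k) for k in range(10)]
embedded = delay_embed(series, lag=2, m=3)
# embedded[0] is [4.0, 2.0, 0.0] and embedded[-1] is [9.0, 7.0, 5.0]
```

The first component of each embedded vector is the observed variable itself, which is why a forecast of the embedded trajectory yields a forecast of the observation.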

Many methods have been devised for selecting optimal values of these parameters, for example using mutual information (Fraser and Swinney 1986), symbolic dynamics (Matilla-García et al. 2021), or the correlation integral (Kim et al. 1999). Throughout this research, the optimal time delay \(\tau \) given by the mutual information method was used. The embedding dimension m was chosen to be equal to the dimension of the full attractor. In practice, this dimension is unknown and must be inferred, for example by choosing the lowest dimension at which there are no false nearest neighbours (Kennel et al. 1992). The presented algorithms also work for non-uniform embeddings (Judd and Mees 1998), where delay coordinates are lagged by differing delay sizes.
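As an illustration of the mutual information criterion, the sketch below estimates the mutual information between the series and its lagged copy from an equal-width histogram and selects the first local minimum over lag, which is the rule proposed by Fraser and Swinney. The binning scheme, the default of 16 bins, and the function names are assumptions made for this example and are not necessarily the settings used in this work.

```python
import math
from collections import Counter


def mutual_information(x, lag, bins=16):
    """Histogram estimate (in nats) of I(x(t); x(t + lag)), lag in samples."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0        # guard against a constant series
    label = [min(int((v - lo) / width), bins - 1) for v in x]
    pairs = list(zip(label[: len(x) - lag], label[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum of p_ab * log(p_ab / (p_a * p_b)) over occupied cells.
    return sum(
        (c / n) * math.log(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def first_local_minimum(values):
    """Index of the first interior local minimum, or None if there is none."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k
    return None


signal = [math.sin(0.1 * k) for k in range(1000)]
mi_curve = [mutual_information(signal, lag) for lag in range(40)]
tau_samples = first_local_minimum(mi_curve)   # may be None for short or flat curves
```

Histogram estimates are sensitive to the number of bins and the length of the series, so the selected delay should be checked against a plot of the mutual information curve.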

Fig. 3 Embedding process for continuous dynamical systems. a Symbolic representation of embedding process. b Visual example of the embedding process

The embedding process described above is visualised by the diagrams in Fig. 3a (adapted from a discrete version by Vlachos and Kugiumtzis (2008)) and Fig. 3b. Given that the vector fields \( \textbf{f} \) and \( \textbf{g} \) govern the evolutions in the original and embedded phase spaces, respectively, we have the trajectory evolution equations

$$\begin{aligned} \textbf{x}(t+\Delta t)&= \textbf{x}(t)+\int ^{t+\Delta t}_{t}\!\!\!\textbf{f}(\textbf{x}(\tau ))\,\text {d}\tau , \text { and} \end{aligned} \tag{7}$$
$$\begin{aligned} \textbf{x}_{\text {emb}}(t+\Delta t)&= \textbf{x}_{\text {emb}}(t)+\int ^{t+\Delta t}_{t}\!\!\!\textbf{g}(\textbf{x}_{\text {emb}}(\tau )) \,\text {d}\tau \end{aligned} \tag{8}$$

which define the mappings \( \textbf{F} \) and \( \textbf{G} \) in Fig. 3a. These mappings provide a way to estimate the variables at a time \( \Delta t \) into the future. Figure 3a shows that the partially observed variable x(t) can therefore be predicted by embedding it into \(\mathbb {R}^m\), evolving the dynamics in the embedded system, and then separating the forecast of the partially observed variable from this embedded trajectory forecast. This is equivalent to applying the functions \( \mathbf {\Phi } \), \( \textbf{G} \) and \( \mathbf {\Phi } ^{-1} \) in order to reconstruct the partially observed trajectory x(t).

Embedding is effective for time series from stationary processes, and non-stationarity can be handled through overembedding (Verdes et al. 2006). This result relies on the ability to transform any n-dimensional non-autonomous system into an \((n+1)\)-dimensional autonomous system by defining time as a state variable.

Techniques for predicting partial observations from delay embeddings reduce to approximating the function \( \textbf{g} \) that evolves the embedded dynamics. A common technique for estimating \( \textbf{g} \) is to use the method of analogues (Lorenz 1969; Casdagli 1989; Jayawardena and Lai 1994; Sugihara and May 1990; Abarbanel et al. 1994; Pérez-Muñuzuri and Gelpi 2000; Hamilton et al. 2016). This algorithm constructs a linear autoregressive model of \( \textbf{g} \) at the current state using the nearest neighbours, i.e. points within a pre-defined neighbourhood around the current state. The regression coefficients can be found through the future values of the nearest neighbours, and then, this regression model can be used to forecast the current state forward in time. This process can then be iterated to forecast neighbourhood-to-neighbourhood indefinitely. Rather than constructing local linear maps at each state, global representations of the embedded attractor provide a single, nonlinear function for the whole dataset. These global models may be created through polynomial fitting (Giona et al. 1991) and are smaller and therefore more convenient for computation; however, they are less accurate than the local linear method (Abarbanel et al. 1994).
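The idea behind the method of analogues can be shown with a deliberately simplified sketch. Instead of fitting a local linear model, it averages the successors of the nearest neighbours in the delay-embedded space (a zeroth-order variant); the function name `analogue_forecast` and its arguments are hypothetical.

```python
def analogue_forecast(x, lag, m, k_neighbours):
    """One-step forecast of a scalar series from its nearest analogues.

    States are delay vectors [x[k], x[k - lag], ..., x[k - (m - 1) * lag]].
    The forecast is the mean successor x[k + 1] of the k_neighbours past
    states closest to the current (final) state.
    """
    def state(k):
        return [x[k - i * lag] for i in range(m)]

    query = state(len(x) - 1)
    scored = []
    for k in range((m - 1) * lag, len(x) - 1):   # states with a known successor
        dist2 = sum((a - b) ** 2 for a, b in zip(state(k), query))
        scored.append((dist2, k))
    scored.sort()
    nearest = scored[:k_neighbours]
    return sum(x[k + 1] for _, k in nearest) / len(nearest)


series = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0]
next_value = analogue_forecast(series, lag=1, m=2, k_neighbours=2)
```

For this periodic series the two closest analogues of the final state are exact repeats, so the forecast is their common successor. Replacing the average with a least-squares fit over the neighbours gives the local linear model described above.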

Recently, there has been much focus on using machine learning techniques to estimate \( \textbf{g} \). The pattern detection power of machine learning has enabled the model-free prediction of chaotic time series using delay embeddings. The use of recurrent neural networks in reservoir computing to estimate \( \textbf{g} \) has shown excellent results for input data with low noise or no noise (Pathak et al. 2018; Gauthier et al. 2021). Rather than using typical pattern identification, reservoir computers are dynamical systems themselves and are trained on trajectory data to learn to emulate the underlying dynamical system.

In our situation, the data is permitted to be noisy, and the embedding procedure amplifies noise (Casdagli et al. 1991). Therefore, the optimal control approach of approximating the Takens field allows a global, continuous, and nonlinear representation of \( \textbf{g} \) to be estimated. Once this field has been approximated, numerical integration is used to forecast the embedded trajectory past the final time.

As the coordinates of the delay embedding are delayed signals of the partial observation, the Takens field can be found simply by estimating the derivative of the observation and then delaying this vector. Once the field has been approximated and the triangulation has been defined, the method of choosing the point of prediction, \(T_p\), presented for the full observation prediction is applied. Numerical integration can then be used to forecast the embedded dynamics, and the first component of this trajectory is the prediction of the partially observed variable. The variable-step, variable-order solver ode15s (Shampine and Reichelt 1997) was used with relative and absolute tolerances of \(10^{-6}\).

Algorithm 3 summarises the partial system prediction framework that has been described in this section. This algorithm is distinct from Algorithm 2 as it calculates the Takens field \(\textbf{u}_{\text {emb}}\) by lifting u, the estimated derivative of x, into the embedded space. This algorithm is therefore computationally cheaper than performing Algorithm 2 on the embedded trajectory. After integrating the Takens field to obtain a prediction in embedded space \(\textbf{x}_{\text {emb}}(t)\), the embedding must then be inverted to produce the predicted partial trajectory x(t).

Algorithm 3 Prediction from partial observations

2.4 Forecasting Uncertainty

The ability to reconstruct the vector field of fully and partially observed variables presents the opportunity to estimate how probability densities evolve along the attractors. This allows a quantification of the uncertainty of prediction and estimations of prediction intervals. This uncertainty is a combination of the chaotic sensitivity of the attractor, and the structural imperfection of the estimated system. We will show that the analysis of this uncertainty may be performed via the transfer operator. The transfer operator advects densities through flows (Lasota et al. 1994), permitting a probabilistic description of transport and mixing (Froyland and Padberg 2009), and has been used to study molecular dynamics (Deuflhard et al. 2012), and coherent structures in the ocean (Froyland et al. 2007) and the atmosphere (González-Tokman 2018; Blachut and González-Tokman 2020). Rather than using modelled or measured vector fields, this work allows the transfer operator to be estimated from reconstructed vector fields to evolve initial condition densities on attractors without prior knowledge of the system. Typically either short simulations of many initial conditions or long simulations of a small number of trajectories are needed to accurately approximate the operator (Klus et al. 2016). However, the presented approach for vector field reconstruction requires only a single intermediate-length trajectory. Further, because the transfer operator can be estimated from fully or partially observed variables, invariant densities and other important dynamical objects may be analysed without the need for a model or access to vector field measurements.

Given an autonomous dynamical system with discrete time evolution \(M:X\rightarrow X\), defined for time step \(\Delta t\) as in Equations (7) and (8), the evolution of trajectories over n discrete time steps may be described iteratively as the composition

$$\begin{aligned} M^n(\varvec{x})= {\mathop {\overbrace{M\circ \cdots \circ M}}\limits ^{n \text { times}}} (\varvec{x}) \end{aligned}$$

where \(\varvec{x}\in X\) and X is a phase space. In the fully observed case, the evolution rule M is \(F:A\rightarrow A\), and the partially observed case has evolution rule \(G:\tilde{A}\rightarrow \tilde{A}\) as in Fig. 3.

The transfer operator or Perron–Frobenius operator \(\mathcal {L}:L^1(X,\text {vol})\rightarrow L^1(X,\text {vol})\), is defined as

$$\begin{aligned} \int _B \mathcal {L}f(\varvec{x})\,d\text {vol}(\varvec{x}) = \int _{M^{-1}(B)} f(\varvec{x})\,d\text {vol}(\varvec{x}), \end{aligned}$$

where vol is the Lebesgue measure on X and B is any measurable subset of X. This operator therefore describes the evolution of initial condition density \(f\in L^1(X,\text {vol})\) through the dynamics M. The evolution of densities may therefore be described iteratively as the composition

$$\begin{aligned} \mathcal {L}^n (f(\varvec{x}))=\mathcal {L}\circ \cdots \circ \mathcal {L}(f(\varvec{x})). \end{aligned}$$

The operator is assumed to be non-singular and hence does not create density (Lasota et al. 1994).

For numerical settings, the transfer operator may be approximated through Ulam’s method (Ulam 1960). This method partitions the solution space into d bins \(\{ B_i \}\) and approximates \(\mathcal {L}\) as a \(d \times d\) matrix known as the Ulam matrix P. The ij-th entry of P is calculated by evolving q uniformly distributed initial conditions \(\{ \varvec{x}_{1},\ldots , \varvec{x}_{q} \} \subset B_i\) through the vector field, and determining the proportion that arrive in bin \(B_j\). The Ulam matrix is therefore defined as

$$\begin{aligned} P_{i,j}=\# \{ \varvec{x} \in \{ \varvec{x}_1, \ldots , \varvec{x}_q \}: M(\varvec{x}) \in B_j \} / q. \end{aligned}$$
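The counting definition above can be illustrated with a minimal sketch. It is not the GAIO implementation: it assumes a one-dimensional phase space with equal-width bins, and it uses the logistic map as a stand-in for the flow map M. The names `ulam_matrix`, `flow_map`, and `logistic` are hypothetical and chosen only for this example.

```python
import random

def ulam_matrix(flow_map, d, q, lo=0.0, hi=1.0, seed=0):
    """Estimate a d x d Ulam matrix for a map on the interval [lo, hi].

    Row i holds the proportion of q uniformly sampled points from bin B_i
    whose images under flow_map land in each bin B_j.
    """
    rng = random.Random(seed)
    width = (hi - lo) / d
    P = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for _ in range(q):
            x = lo + (i + rng.random()) * width      # uniform sample in B_i
            y = flow_map(x)
            # Bin index of the image; clamping to the domain is an assumption
            # made here so that every sample is counted.
            j = min(max(int((y - lo) / width), 0), d - 1)
            P[i][j] += 1.0 / q
    return P

def logistic(x):
    """Stand-in flow map (assumption): the logistic map on [0, 1]."""
    return 4.0 * x * (1.0 - x)

P = ulam_matrix(logistic, d=20, q=200)
```

Each row of the resulting matrix sums to one, so P acts on densities written as row vectors.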

The GAIO software package (Dellnitz et al. 2001) was used to calculate the Ulam matrix where the flow map M is approximated by integrating the reconstructed vector field. Initial condition densities \(\varvec{\rho }(T_p)\in \mathbb {R}^d\) defined by the state at prediction time may then be evolved through iterative multiplication of this matrix,

$$\begin{aligned} \varvec{\rho }(T_p+n\Delta t) = \varvec{\rho }(T_p)P^n. \end{aligned}$$

This evolving density distribution provides a quantification of the uncertainty for each prediction. The density distribution for a particular variable is found by projecting the distribution onto the relevant coordinate. For initialisation, the bin containing the state at prediction time \(\varvec{x}(T_p)\in B_i\) was set to have density 1, and all other bins were given zero density. Algorithm 4 describes the process of evolving probability density distributions.

Algorithm 4 Probability density evolution
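The density update \(\varvec{\rho }(T_p+n\Delta t) = \varvec{\rho }(T_p)P^n\) can be sketched as repeated multiplication of a row vector by the matrix. The 3-bin matrix below is a hypothetical example rather than one estimated from data, and `evolve_density` is an illustrative name.

```python
def evolve_density(rho, P, n):
    """Return rho P^n for a row vector rho and a row-stochastic matrix P."""
    d = len(rho)
    for _ in range(n):
        rho = [sum(rho[i] * P[i][j] for i in range(d)) for j in range(d)]
    return rho

# Hypothetical 3-bin Ulam matrix, used only to illustrate the update.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]

# Initialisation as described in the text: the bin containing x(T_p)
# has density 1 and all other bins have density 0.
rho0 = [1.0, 0.0, 0.0]
rho1 = evolve_density(rho0, P, 1)
rho2 = evolve_density(rho0, P, 2)
```

Because every row of P sums to one, the total density is conserved at each step.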

2.5 Error Metric

Analysing the accuracy of predictions of chaotic variables requires the definition of an error metric. The standard root-mean-square error metric gives misleadingly low errors for predictions that initially diverge and then meet back with the true trajectory; we instead defined a metric that includes the Lyapunov time of the system to weight the short-term prediction accuracy. The Lyapunov time, calculated as the reciprocal of the maximal Lyapunov exponent, is the characteristic timescale of chaotic systems, and represents the time over which a small error will grow by a factor of e. For the systems investigated in this work, this value has been precisely estimated (Viswanath 1998; Letellier and Rossler 2007). When only data is available, the maximal Lyapunov exponent may still be estimated by data-driven methods (Wolf et al. 1985). This exponential growth of errors means that predicting a chaotic system beyond the Lyapunov time is fraught.
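To make the role of the Lyapunov time concrete, the following is a sketch of the standard linearised growth estimate. The symbols \(\delta _0\) (initial error) and \(\lambda _{\max }\) (maximal Lyapunov exponent) are introduced here for illustration only, and the estimate holds only while the error remains small:

$$\begin{aligned} \Vert \delta (t)\Vert \approx \Vert \delta _0\Vert \, e^{\lambda _{\max } t} = \Vert \delta _0\Vert \, e^{t/T_\lambda }, \qquad T_\lambda = \frac{1}{\lambda _{\max }} . \end{aligned}$$

Under this approximation an error grows by a factor of e after one Lyapunov time and by a factor of \(e^{k}\) after k Lyapunov times.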

The predictions \({\tilde{x}}_i\) of true trajectory \(x_i\) at time index \(i\in \{1\dots I\}\), where

$$\begin{aligned} t_i=T+T_f\frac{i-1}{I-1} , \end{aligned}$$

are compared using the weighted error metric

$$\begin{aligned} E = \sqrt{\frac{\sum ^{I}_{i=1}\gamma ^i(x_i-\tilde{x}_i)^2}{\sum ^{I}_{i=1}\gamma ^i}} \quad , \quad \text {where} \quad \gamma =\left( \frac{1}{2}\right) ^{{\Delta t}/{T_{\lambda }}} \end{aligned}$$

and \(\Delta t=t_2-t_1\) is the simulation time step and \(T_\lambda \) is the Lyapunov time of the system. The error metric is a weighted \( \text {L}^2 \)-norm and represents the averaged distance from the true (noiseless) trajectory, weighted towards the start of the prediction. The weighting term \(\gamma \) is defined such that at one Lyapunov time into prediction, the weighting factor \(\gamma ^i\) has halved.
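The weighted error metric can be written as a short function. This is a minimal sketch for a single scalar variable; the function and argument names are illustrative and do not come from the original implementation.

```python
import math

def weighted_error(x_true, x_pred, dt, lyapunov_time):
    """Weighted L2 error E with weights gamma**i, i = 1..I."""
    gamma = 0.5 ** (dt / lyapunov_time)
    num = 0.0
    den = 0.0
    for i, (x, xp) in enumerate(zip(x_true, x_pred), start=1):
        w = gamma ** i          # weight halves every Lyapunov time
        num += w * (x - xp) ** 2
        den += w
    return math.sqrt(num / den)

# Two hypothetical predictions with a single unit error each:
# one at the start of the forecast window and one at the end.
truth = [0.0] * 500
early = [1.0] + [0.0] * 499
late = [0.0] * 499 + [1.0]
e_early = weighted_error(truth, early, dt=0.01, lyapunov_time=1.104)
e_late = weighted_error(truth, late, dt=0.01, lyapunov_time=1.104)
```

An error of the same size is penalised more heavily when it occurs early in the forecast, which is the intended behaviour of the metric.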

3 Results

This section presents simulation results that demonstrate the presented prediction method and density prediction method for both fully and partially observed systems. The results are compared to trajectories from the true model, and from a model with model imperfection. Case studies are presented first, followed by an assessment of the robustness and accuracy of prediction through analysing the error statistics of 100 simulations while varying parameters including the energy parameter \(\mu \), the amount of trajectory data, and the amount of measurement noise in the trajectories. The sensitivity to initial conditions of the discovered velocity field was also investigated.

In this section, measured trajectory datasets were synthesised through solving the chaotic Lorenz (L63) system (Lorenz 1963) and the hyperchaotic Rössler system (Rossler 1979). Given an initial condition, the systems were numerically solved and a section of the dataset at the beginning was excluded to ensure the synthesised data lies sufficiently close to the attractor. The time steps of the simulated data were chosen to be sufficiently small as to not influence the results.

3.1 Case Studies

Two well-known chaotic systems, one in three dimensions and one in four dimensions, were used to assess the accuracy of prediction using the proposed techniques. For each system, model-free predictions were compared to trajectories from models with model imperfection. An introduced parameter \(\epsilon \) acts to perturb coefficients in the differential equations of each system to simulate model error. To effectively compare the effects of measurement noise on model-free predictions to the effects of model error on model-based predictions, the divergence introduced by initial condition error was not considered, and therefore, the initial conditions for the imperfect model were chosen to be the noiseless, true initial conditions. The time-varying absolute errors between the true trajectory and both the algorithm predictions and the imperfect model trajectories are presented and compared with the linearised approximation of chaotic divergence in Appendix C.

Table 1 outlines the colour code used throughout this section. A comparison of the attractors from the noisy data, imperfect model, fully observed reconstruction, and partially observed reconstruction is also shown. These plots highlight the qualitative similarity between the attractor reconstructed from partial observations and the full system attractors.

Table 1 Colour code for results and comparison of Lorenz attractors

The Lorenz (L63) system, first derived in 1963, is a three-dimensional continuous dynamical system that was originally studied as a simplified model of atmospheric convection (Lorenz 1963). The governing equations for this classical chaotic system are

$$\begin{aligned} \left\{ \begin{array}{ll}\dot{x}&{}=\sigma (y-x) \\ \dot{y}&{}=\rho x-(1+\epsilon )xz-y \\ \dot{z}&{}=(1+\epsilon )xy-\beta z \end{array}\right. . \end{aligned}$$

The parameter values used in Lorenz (1963) were chosen for the true system: \(\sigma =10\), \(\beta =\frac{8}{3}\), \(\rho =28\), and \(\epsilon =0\). The Lyapunov time of the L63 system is \(T_\lambda \approx 1.104\) time units (Viswanath 1998). Trajectory data was synthesised by choosing an initial condition, simulating for 40 Lyapunov times and retaining the last 10 Lyapunov times.
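The data synthesis described above can be sketched as follows. The fixed-step fourth-order Runge–Kutta integrator is an assumption, since the solver is not specified here, and the helper names (`lorenz_rhs`, `rk4_step`, `simulate`) are illustrative. The initial condition is the one reported for Fig. 4a, and the time step \(\Delta t=0.01\) is the one used for the L63 time series.

```python
def lorenz_rhs(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0, eps=0.0):
    """Right-hand side of the L63 system; eps perturbs the nonlinear terms."""
    x, y, z = state
    return (sigma * (y - x),
            rho * x - (1.0 + eps) * x * z - y,
            (1.0 + eps) * x * y - beta * z)

def rk4_step(rhs, state, dt, **params):
    """One classical Runge-Kutta step (integrator choice is an assumption)."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = rhs(state, **params)
    k2 = rhs(shifted(state, k1, dt / 2), **params)
    k3 = rhs(shifted(state, k2, dt / 2), **params)
    k4 = rhs(shifted(state, k3, dt), **params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(rhs, state, dt, n_steps, **params):
    """Return the list of states visited, including the initial state."""
    traj = [state]
    for _ in range(n_steps):
        state = rk4_step(rhs, state, dt, **params)
        traj.append(state)
    return traj

T_LYAP = 1.104                       # Lyapunov time of L63
dt = 0.01
n_total = round(40 * T_LYAP / dt)    # simulate for 40 Lyapunov times
n_keep = round(10 * T_LYAP / dt)     # retain the last 10 Lyapunov times
traj = simulate(lorenz_rhs, (0.6456, -0.3384, 12.0), dt, n_total)
data = traj[-n_keep:]

# True and imperfect models integrated from the same noiseless state.
true_future = simulate(lorenz_rhs, data[-1], dt, 500)
imperfect_future = simulate(lorenz_rhs, data[-1], dt, 500, eps=0.05)
```

Setting `eps=0.05` perturbs only the two nonlinear coefficients, matching the form of the governing equations.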

First consider the full observation of all three variables x, y, and z. Having obtained a time series of 10 Lyapunov times length with time step \(\Delta t=0.01\) from (16), we add Gaussian white noise of variance \(\sigma ^2=0.1\). In using Algorithm 2 to forecast the variation of one component of this trajectory, we chose an energy parameter of \(\mu =1000\). Prediction for \(T_f= 5 \) time units beyond the final time T is shown in Fig. 4a. The forecast between prediction time \(T_p\) and final time T is omitted to show only the forecast of unavailable trajectory data.

Next, suppose that only one component of the trajectory, say x, is available, while the other two variables are hidden. Using a time delay of \( \tau = 0.11 \) time units, the optimal value given by the mutual information method, the trajectory is embedded in \(\mathbb {R}^3\). Gaussian white noise of variance \(\sigma ^2=0.1\) was then added. We then simulated 10 Lyapunov times worth of data with time step \(\Delta t=0.01\), chose an energy parameter of \(\mu =1000\), and predicted for \( T_f=5 \) time units as shown in Fig. 4c. Bearing in mind the difficulty of predicting beyond a time of \( T_\lambda \approx 1.104 \), this is an excellent outcome for a partially observed chaotic system with measurement noise.
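A delay embedding of a scalar series can be sketched in a few lines. The backward-delay convention used here, \((x(t), x(t-\tau ), \ldots , x(t-(m-1)\tau ))\), is an assumption about the ordering of coordinates, and `delay_embed` is a hypothetical helper name.

```python
def delay_embed(series, lag, dim):
    """Return delay vectors (x[k], x[k - lag], ..., x[k - (dim - 1) * lag])."""
    start = (dim - 1) * lag
    return [tuple(series[k - j * lag] for j in range(dim))
            for k in range(start, len(series))]

dt = 0.01
tau = 0.11
lag = round(tau / dt)                 # delay expressed in samples

# Hypothetical scalar series standing in for the measured x-component.
x_series = [float(k) for k in range(30)]
embedded = delay_embed(x_series, lag, dim=3)
```

With \(\tau =0.11\) and \(\Delta t=0.01\) the delay is 11 samples, so the first \(2\times 11\) samples of the series have no complete delay vector and are not represented in the embedded trajectory.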

Model error was introduced by perturbing two terms in Equation (16) with perturbation parameter \(\epsilon \). The two nonlinear terms were chosen, as methods for generating differential equations from data estimate these coefficients with less accuracy than the linear terms (Champion et al. 2019). A perturbation of \(\epsilon =0.05\) was selected as the median prediction error for this value is closest to the median predictive error for the predictions of Algorithm 3 for training data noise of \(\sigma ^2=0.1\), as seen in Sect. 3. The true initial conditions without noise were used to integrate the imperfect model for \(T_f= 5 \) time units, and the resulting trajectories are shown in Fig. 4a and c.

The hyperchaotic four-dimensional Rössler system, first studied in 1979, exhibits two positive Lyapunov exponents (Rossler 1979). (Hyperchaotic systems are defined to have at least two positive Lyapunov exponents and therefore must lie in four dimensions or higher.) The system proposed by Rössler was the first such system discovered, and possesses a fractal attractor in \( \mathbb {R}^4 \). The equations that describe the hyperchaotic Rössler system are

$$\begin{aligned} \left\{ \begin{array}{ll} \dot{x}&{}=-y-z\\ \dot{y}&{}=(1+\epsilon )x+ay+w\\ \dot{z}&{}=b+(1+\epsilon )xz\\ \dot{w}&{}=-cz+dw \end{array}\right. . \end{aligned}$$

The parameter values selected in Rossler (1979) were used for the true system: \(a=0.25\), \(b=3\), \(c=0.5\), \(d=0.05\), and \(\epsilon =0\). The Lyapunov time of this hyperchaotic Rössler system is \(T_\lambda \approx 8.929\) time units (Letellier and Rossler 2007). Trajectory data was synthesised by choosing an initial condition, simulating for 100 Lyapunov times and retaining the last 40 Lyapunov times.
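For reference, the right-hand side of the hyperchaotic Rössler system with the perturbation parameter can be written as a small function. This is a sketch with an illustrative function name; the state used in the example is the initial condition reported for Fig. 4b.

```python
def rossler_hyper_rhs(state, a=0.25, b=3.0, c=0.5, d=0.05, eps=0.0):
    """Right-hand side of the hyperchaotic Rossler system.

    eps perturbs the linear x coefficient and the nonlinear xz coefficient.
    """
    x, y, z, w = state
    return (-y - z,
            (1.0 + eps) * x + a * y + w,
            b + (1.0 + eps) * x * z,
            -c * z + d * w)

state = (-8.0, -8.0, 15.0, 20.0)
true_rate = rossler_hyper_rhs(state)
imperfect_rate = rossler_hyper_rhs(state, eps=0.05)
```

Only the second and third components of the vector field differ between the true and imperfect models, because the perturbation enters only those two equations.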

Time-series data with time length of 40 Lyapunov times and time step \(\Delta t=0.05\) was obtained from the Rössler hyperchaotic system (17), and Gaussian white noise of variance \(\sigma ^2=0.05\) was added. Due to the varying magnitudes of components in the trajectory data, different energy parameters were chosen for the different components: \(\mu _x=\mu _y=250\), \(\mu _z=500\), and \(\mu _w\!=\!50\). Prediction for 40 time units beyond the final time T using Algorithm 2 is shown in Fig. 4b.

Next, we supposed that only the time series for the x component was available. A time delay of \( \tau = 1.75 \) time units was chosen to embed the time series into \(\mathbb {R}^4\). Gaussian white noise of variance \(\sigma ^2=0.05\) was added and the trajectory data of 40 Lyapunov times in length with time step \(\Delta t=0.05\) was used. The prediction using Algorithm 3 with \( \mu = 250 \) for 40 time units is shown in Fig. 4d. Again, good prediction beyond the Lyapunov time is observed, even in this hyperchaotic situation of partially observed data corrupted by noise.

Fig. 4 Predictions of chaotic trajectories with time measured from the final time T. a Full observation prediction with weighted error \(E=5.1949\) (blue), imperfect model with \(E=4.6425\) (green), and true trajectory (black) of L63 system. Initial condition of simulation was \((0.6456,-0.3384,12)\). b Full observation prediction (blue), imperfect model (green), and true trajectory (black) of hyperchaotic Rössler system. Initial condition of simulation was \((-8,-8,15,20)\). c Partial observation prediction with \(E=6.0287\) (red), imperfect model with \(E=9.2497\) (green), and true trajectory (black) of L63 system. Initial condition of simulation was \((-0.5826,-0.9403,12)\). d Partial observation prediction (red), imperfect model (green), and true trajectory (black) of hyperchaotic Rössler system. Initial condition of simulation was \((-6,-6,15,15)\) (Color figure online)

The perturbation parameter \(\epsilon \) was increased in Equation (17) to simulate model error. The coefficients of the nonlinear term and linear x term were selected to be perturbed and a value of \(\epsilon =0.05\) was chosen. The true noiseless initial conditions were used to evolve the model error system for \(T_f= 40 \) time units after final time T and the resulting trajectories are shown in Fig. 4b and d.

3.2 Density Forecast

The uncertainty in each prediction may be quantified through the estimation of the transfer operator of the vector field. The density distributions were calculated over time for the fully and partially observed cases using the same initial conditions and compared to the distribution found using the analytic vector field. The flow map used to estimate the transfer operator was defined using the time step used in simulations. An intermediate-length trajectory with 10 Lyapunov times of data was used for training. For each field, the operator was approximated by partitioning the state space into \(2^{20}\) bins and seeding each bin with 25 initial conditions. The initial density distribution was evolved for 5 time units past final time using Algorithm 4. Distributions are represented with dark colours for low density and light colours for high density. The true trajectory and corresponding trajectory predictions are plotted on top of the evolved distributions in white. Trajectory predictions with forecasted density distributions are shown in Fig. 5.

Fig. 5 Comparison of forecasting density distribution for true field (black), imperfect model field (green), fully observed field (blue), and partially observed field (red). Density plots are shown for times \(T_\lambda \), \(2T_\lambda \), and \(3T_\lambda \) after final time with horizontal line indicating predicted state. Initial condition of full system was (0.6883, 0.1461, 12). An animated version of this figure is provided in the supplementary material (Color figure online)

This particular simulation shows a state of the system where the true trajectory initially experiences few switching events. This results in a period of predictability, similar to blocking patterns in meteorology (Tantet et al. 2015). Switching events can be seen in the plots as density moving from the positive-x lobe of the attractor to the negative-x lobe, after reaching the saddle point at \(x=0\). While the true system and imperfect model have experienced negligible switching after 1 Lyapunov time, the reconstructed transfer operators show switching before this time, transporting some density to the alternate lobe. This gives an indication of the diffusion that noise and reconstruction error have introduced into the estimated fields. However, the evolved density distributions may still provide a further understanding of the predictability of trajectory predictions. All plots show that there is high certainty that after 1 Lyapunov time the trajectory is in the positive-x lobe of the attractor. For other initial conditions, this analysis may reveal that divergence of initial condition density results in low certainty for the state after 1 Lyapunov time. This provides a local predictability quantification that is a more precise tool than estimating the maximal Lyapunov exponent, which only gives a global quantification of predictability.

3.3 Robustness

The robustness of our algorithm was assessed by varying the energy parameter, the time length of data used for prediction, the level of noise added to the trajectory data, and the initial condition perturbation. The sensitivity of a model with introduced model error was also investigated by varying the level of coefficient perturbation. The L63 system was used to assess robustness. To compare the fully and partially observed algorithms, only the prediction of the x-component was studied for the full system case, and the x-component was used for the embedding in the partial system case. The error metric (15) was used to compare predictions.

The trajectories were predicted \(T_f=5\) time units after final time T. The time step used in all simulations was \(\Delta t=0.01\). The error statistics are studied through the median and the median absolute deviation (mad). These measures are robust to a wide range of distributions and quantify the location and spread of the error distribution.
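The two summary statistics can be computed with the Python standard library, as in the sketch below. The unscaled form of the median absolute deviation is an assumption (some conventions multiply by a consistency constant), and the error values are hypothetical.

```python
from statistics import median

def median_abs_deviation(values):
    """Unscaled median absolute deviation about the median."""
    values = list(values)
    m = median(values)
    return median(abs(v - m) for v in values)

# Hypothetical prediction errors, including one outlier.
errors = [1.0, 2.0, 3.0, 4.0, 100.0]
med = median(errors)
mad = median_abs_deviation(errors)
```

The single outlier shifts neither statistic appreciably, which is why these measures suit the heavy-tailed error distributions that chaotic divergence can produce.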

3.3.1 Energy Parameter

The selected energy parameter \( \mu \) weights the tracking and energy terms in the optimal control problem; therefore, it affects the discovered vector field. To investigate the variation of prediction accuracy, the L63 system was predicted for trajectory datasets with different energy parameters. For each value of \( \mu \), 100 simulations were performed with 10 Lyapunov times of given trajectory data that were corrupted by Gaussian noise of variance \(\sigma ^2=0.1\). The distributions of the simulations are shown in Fig. 6a. The blue histograms are based on a full observation (all three variables), while the red histograms are for the partially observed situation.

3.3.2 Amount of Data

The time length of data that is measured is expected to impact the effectiveness of prediction. This was analysed through predicting the L63 system given several trajectory datasets of varying time length. The time lengths T of these datasets were in integer multiples of the Lyapunov time \( T_\lambda \approx 1.104 \) of the L63 system. For each dataset, 100 simulations were performed with random initial conditions and a fixed level of noise of variance \(\sigma ^2=0.1\). An energy parameter of \(\mu =1000\) was chosen. The distributions of simulations are shown in Fig. 6b.

3.3.3 Noise Level

Observed time series data will always be corrupted by some level of measurement noise, and therefore, studying the effect this has on prediction is necessary to assess robustness. Noise was added to the simulated L63 trajectory data to analyse the predictive power with noisy data. The added noise was Gaussian white noise \(\mathcal {N}(0,\sigma ^2)\). For each level of noise variance \( \sigma ^2 \), 100 simulations were performed with given dataset sizes of \( T=10T_\lambda \). The values of energy parameter for increasing noise level were \(\mu =2500,1750,1000,400,300\), respectively, for the variances shown in Fig. 6c. The distributions of the simulations are shown in Fig. 6c.
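Adding Gaussian white noise \(\mathcal {N}(0,\sigma ^2)\) to a synthesised series can be sketched as below. Note that the standard deviation, not the variance, is passed to the sampler. The helper name and the seeded generator are assumptions made for reproducibility of the example.

```python
import math
import random

def add_gaussian_noise(series, variance, seed=None):
    """Return a copy of series with independent N(0, variance) noise added."""
    rng = random.Random(seed)
    std = math.sqrt(variance)           # sampler takes the standard deviation
    return [x + rng.gauss(0.0, std) for x in series]

# Hypothetical clean signal; in the text this would be an L63 component.
clean = [0.0] * 20000
noisy = add_gaussian_noise(clean, variance=0.1, seed=1)
sample_var = sum(v * v for v in noisy) / len(noisy)
```

For a long series the sample variance of the added noise should be close to the requested value of \(\sigma ^2\).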

3.3.4 Initial Condition Perturbation

As we are considering chaotic systems, trajectories will experience sensitivity to initial conditions. If there is error in the chosen initial conditions, separate from the measurement noise that has been smoothed, it is expected that prediction error will increase. This was investigated by perturbing the initial condition of the predicted trajectory and comparing the error statistics for different magnitudes of perturbation variance. A sample from a Gaussian distribution \(\mathcal {N}(0,\sigma _{\text {IC}}^2)\) was added to each component of the prediction point given by \(\textbf{x}(T_p)\) to investigate the sensitivity to initial conditions of the presented algorithms. For the fully observed setting, this method assumes each component has equal uncertainty in the initial condition. For the partially observed setting, we assume that there is uncertainty in the initial condition, as well as uncertainty at the points \((m-1)\) multiples of \(\tau \) delayed from that point. This allows the investigation of errors in the trajectory data that result from uncertainty not mitigated by the optimal control smoothing. One set of noiseless trajectory data of length \( T=10T_\lambda \) was used to remove the effects of measurement noise on the perturbation. The simulation initial condition was randomly sampled to have values (0.7730, 0.9876, 12). For each variance \( \sigma _{\text {IC}}^2 \), predictions were conducted for 100 perturbed initial conditions. An energy parameter of \(\mu =5000\) was chosen. The distributions of the simulations are shown in Fig. 6d. Simulations with different initial conditions gave results with similar trends in the median, but with varying distribution of error.

3.3.5 Model Error

As described in Sect. 3.1, model error may be simulated through the perturbation of coefficients in the differential equations through perturbation parameter \(\epsilon \). The error between the simulated trajectories of the true system and the imperfect model with the same initial conditions was studied through variation of \(\epsilon \). Perturbations range from mild model error to significant model error with \(\epsilon =0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1\), which represent coefficient errors from \(1\%\) to \(100\%\). For each level of perturbation, 100 simulations were performed and the distributions of the error from these simulations are shown in Fig. 6e.

Fig. 6 Robustness analysis for full observation (blue), partial observation (red), and imperfect model (green) predictions of the L63 system. Median and mad are used to depict the summary statistics of each distribution, with the central thick horizontal line indicating the median of the distribution and the thinner exterior horizontal lines indicating one mad from the median. a Prediction error statistics with varying energy parameter \(\mu \). b Prediction error statistics with varying time length of data T. c Prediction error statistics with varying noise variance \(\sigma ^2\). d Prediction error statistics with varying initial condition perturbation with noise variance \(\sigma _{\text {IC}}^2\). e Prediction error statistics for model with varying parameter perturbation \(\epsilon \) (Color figure online)

4 Discussion and Conclusion

Through developing a method of velocity field reconstruction using optimal control and combining this with Takens’ embedding theorem, we have developed algorithms to enable fully and partially observed prediction of chaotic variables in settings with measurement noise. The predictability of these states is also quantified through the estimation of the transfer operator of each vector field. The application of these algorithms to two well-known chaotic systems has been demonstrated and robustness studies have been presented for variation in parameter value, data length, and level of noise.

The robustness of the energy parameter \(\mu \) shows that a large range of \(\mu \) will give similar median prediction accuracy, as highlighted in Fig. 6a. Parameter tuning techniques such as ordinary cross-validation may be used to find optimal values of the energy parameter (Dey and Krishnaprasad 2012). As the optimal control approach of approximating the vector field penalises the amplitude of the velocity, for large noise levels the prediction trajectory amplitudes are attenuated. Therefore, an adaptive energy parameter may be beneficial to reduce this attenuation. Future work is required to enhance the energy parameter selection process.

The amount of trajectory data given for the reconstruction of vector fields is important for accurate prediction, as can be seen in Fig. 6b. Both full and partial prediction increase in accuracy as the amount of given trajectory data increases. For online approaches to prediction using vector field reconstruction, the prediction accuracy is therefore expected to increase while the algorithm is operating, if every new observation is used to increase the fidelity of the vector field.

For low levels of noise, prediction using full observation is more accurate; however, as the level of noise increases, the fully observed and partially observed prediction error medians approach each other (Fig. 6c). This suggests that for large levels of noise, prediction using partially observed variables is as accurate as prediction using knowledge of every state variable.

The error distribution for varying levels of initial condition uncertainty changes significantly as the uncertainty increases, as is presented in Fig. 6d. The full and partial prediction error median and mad both increase as the variance of the initial condition perturbation increases, which is expected due to the chaotic nature of the Lorenz system. The difference between the median of the partial prediction and the full prediction also decreases for increasing perturbation variance.

The biggest challenge in prediction of noisy, chaotic systems is sensitivity to the initial condition. If noisy measurements are taken, the imperfect observations of the signal result in an uncertainty of appropriate initial conditions for prediction. This is a major problem in fields such as numerical weather prediction where perfect measurement of initial conditions is not possible and initialisation processes are required. The optimal control approach of trajectory smoothing seeks to avoid this issue by reconstructing an energy-optimal analogue of the measured dynamical system and using an initial condition from the energy-optimally smoothed trajectory. This initial condition will therefore be perturbed by the measurement noise and so more investigation into the relation between uncertainty in initial condition and uncertainty in the discovered analogue system is required.

The evolution of initial condition density using the transfer operator highlights this accumulation of uncertainty through the apparent diffusion present in the reconstructed vector fields. Measurement noise and the amplification of that noise through delay embedding introduce diffusion on the attractor, decreasing the already low predictability of the true system. However, the resulting uncertainty distributions can still provide model-free methods with estimates of the predictability of particular states, rather than global predictability measures.

As the imperfection of the Lorenz model increases from mild model error to significant model error, the predictive accuracy of the model decreases, as is evident from Fig. 6e. Despite having access to the true initial conditions, perturbation of the coefficients of the model results in notable prediction error. Comparing Fig. 6c with Fig. 6e highlights the trade-off between measurement noise and model error in choosing an appropriate prediction method. For situations involving large measurement error, it is preferable to use a model even with mild model error. However, in scenarios of significant model error, the presented model-free algorithm outperforms the model. This trade-off is especially important when considering which data-driven technique to use for prediction, as the drop in predictive accuracy from the introduced model error for parametric methods may exceed that introduced from observational noise for nonparametric methods.

For systems that have dynamics on short timescales, such as the z-component of the hyperchaotic Rössler system, methods of noise reduction such as using a low-pass filter will attenuate information which is crucially important to reconstructing the dynamics. In contrast, the energy-optimal algorithm presented here penalises the amplitude of the vector field and therefore will include short timescale dynamics but decrease their amplitude to a less severe extent.

The presented simulations demonstrate the effectiveness of model-free methods for prediction using vector field reconstruction. Both full and partial observations were used to predict three- and four-dimensional systems; however, in higher dimensions, the algorithms become computationally expensive. Through extension of the Delaunay triangulation method or incorporating dimensionality reduction algorithms, this technique may be applicable to systems of any dimension.