Trajectory prediction based on conditional Hamiltonian generative network for incomplete observation image sequences

The combination of the Hamiltonian formalism and neural networks plays an important role in dealing with chaotic systems. To address motion control with unknown physical quantities and an incomplete observation set, a trajectory prediction model based on a conditional Hamiltonian generative network (CHGN) for incomplete observation image sequences is proposed. CHGN is composed of a conditional variational autoencoder (CVAE), a Hamiltonian neural network (HNN) and a Velocity-Verlet integrator. The CVAE encoder converts the short-term continuous observation image sequence into target motion state features represented by generalized coordinates and generalized momentum, from which the trajectory prediction image at the specified time is generated. HNN is used to learn latent Hamiltonian physical quantities, so as to capture more information about the dynamics of the chaotic system and realize state cognition. The Velocity-Verlet integrator predicts the motion state at any moment according to the Hamiltonian learned by HNN at the current moment. The motion state and the specified time are used as the input of the CVAE decoder to generate the target prediction image from the latent motion space. Experimental results show that CHGN can accurately predict target trajectories over a long period of time based on incomplete short-term image sequences, and achieves the lowest mean square error (MSE) on three physical system datasets compared with existing deep learning methods.


Introduction
In target motion with uncertainty and uncontrollability, achieving accurate trajectory prediction is an important direction of unmanned control system research [1][2][3][4][5]. With the rise of deep learning [6][7][8], more and more researchers use large amounts of observation data to learn dynamic models of moving targets [9,10]. These models have powerful information mining and representation capabilities and perform well in complex system prediction. However, they rely on large amounts of supervised data and are trained by numerical fitting, so they are often inadequate when faced with scientific problems involving non-idealized data, which is also known as the learning blindness problem with chaotic data.
To improve adaptability to nonlinear complex systems, many scholars [11][12][13][14][15][16] have worked on model parameter optimization, control algorithm adaptation, etc., and achieved good results. However, these methods require strong model assumptions and human expert knowledge. In fact, the description of motion involves physical quantities and concepts [17,18], including time, velocity, potential energy and kinetic energy, and these physical quantities are the decisive factors of the motion model. With the development of the combination of neural networks and molecular dynamics, dynamic neural networks [19][20][21][22] have attracted more and more attention and opened up new directions for the prediction of complex high-dimensional nonlinear data.
Learning the physical laws behind target motion through neural networks helps these networks adapt to chaos in complex environments [23], which can improve the prediction performance for dynamical systems, even nonlinear systems of many dimensions [24]. The symplectic phase space structure combined with Hamiltonian dynamics [25,26] has proved to be valuable. Mavrogiannis [27] presented a planning framework for multi-agent trajectory prediction and generation with topological invariants enforced by Hamiltonian dynamics. Bertalan [28] used a collection of observations to explore the underlying structure of Hamiltonian systems in the data. Scott [29] trained Hamiltonian neural networks on increasingly difficult dynamical systems to improve their learning and forecasting capabilities.
The above methods are suitable for physical systems with known physical quantities and measurable observations. For time series with unknown conserved quantities, neural networks can be used to learn them, so as to find the dynamics model of the system. Choudhary [30] prepended a conventional neural network to a Hamiltonian neural network (HNN), and the combination could accurately predict Hamiltonian dynamics from generalised non-canonical coordinates. Toth [31] introduced the Hamiltonian generative network (HGN), which can learn Hamiltonian dynamics from high-dimensional observations (such as images) without restrictive domain assumptions.
However, when the physical system is unknown or a complete image sequence of the motion trajectory is unavailable, accurately predicting the target motion trajectory remains difficult. To address motion control with unknown physical quantities and an incomplete observation set, a trajectory prediction model based on a conditional Hamiltonian generative network (CHGN) for incomplete observation image sequences is proposed, which can generate predictions of motion trajectory images at any specified time from the incomplete observation set without canonical coordinates. First, the conditional variational autoencoder (CVAE) encoder converts the short-term continuous observation image sequence into target motion state features represented by generalized coordinates and generalized momentum and forms a latent state probability distribution, from which the trajectory prediction image at the specified time is generated. Then, HNN is used to learn latent Hamiltonian physical quantities, so as to capture more information about the dynamics of the chaotic system and realize state cognition. Finally, a Velocity-Verlet integrator is built to predict the motion state at any moment according to the Hamiltonian learned by HNN at the current moment. The motion state and the specified time are used as the input of the CVAE decoder to generate the target prediction image from the latent motion space. The experimental results show that CHGN can accurately predict the target trajectory at any time over a long horizon based on incomplete short-term image sequences, and performs better than existing deep learning methods.
The main contributions of this article are as follows: (1) With short-term sequence learning, the time variable is used as a constraint to control the long-term prediction of the trajectory. (2) The requirement for numerical coordinates as state input is removed, and a unified model of hidden dynamic state and explicit trajectory generation is constructed. (3) The Velocity-Verlet integrator is used for feature variational learning, which is more suitable for long-term dynamic evolution at the specified time.
The remainder of this paper is organized as follows: In the second section, we introduce related work. In the third and fourth sections, the framework of CHGN and the prediction algorithm are proposed, respectively. The results of the experiments are reported in the fifth section. The last section concludes the paper.

Related work
Many behaviors and changes in complex systems occur continuously. The neural ordinary differential equation network (Neural ODE) [32,33] is a novel machine learning architecture for continuous dynamic systems. Compared with the traditional deep neural network, the most significant contribution of Neural ODE is to further deepen the network and realize, in principle, a generative model of infinite depth. Such dynamic models build complicated transformations by composing a sequence of transformations to a hidden state:
$$h_{t+1} = h_t + f(h_t, \theta_t)$$
where $t \in \{0, \ldots, T\}$ and $h_t \in \mathbb{R}^D$. These iterative updates of the hidden state can be seen as an Euler discretization of a continuous transformation [34]:
$$\frac{dh(t)}{dt} = f(h(t), t, \theta)$$
Starting from the input layer $h(0)$, the output layer $h(T)$ can be defined as the solution to this ODE initial value problem at some time $T$. In [48,49], Hamiltonian mechanics is used to train the model to learn and respect exact conservation laws in an unsupervised manner. Since time is actually continuous, a better approach is to express the dynamics as a set of differential equations and then integrate them from an initial state at $t_0$ to a final state at $t_1$:
$$(q_1, p_1) = (q_0, p_0) + \int_{t_0}^{t_1} S(q, p)\, dt$$
where $S$ denotes the time derivatives of the coordinates of the system. The entire model uses a neural network to parameterize the Hamiltonian $H$ and learns latent features directly from the data:
$$S_H = \left(\frac{\partial H}{\partial p}, -\frac{\partial H}{\partial q}\right)$$
where $S_H$ indicates the time evolution of the system.
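As an illustrative sketch only (not the authors' implementation), the following Python snippet integrates the vector field $(\partial H/\partial p, -\partial H/\partial q)$ of a hand-written Hamiltonian with the explicit Euler update. The unit-mass, unit-stiffness mass-spring Hamiltonian, the finite-difference gradient and all function names are assumptions made for illustration; in an HNN the function `hamiltonian` would be a trained network.

```python
def hamiltonian(q, p):
    """Assumed toy Hamiltonian: unit-mass, unit-stiffness mass-spring."""
    return 0.5 * p * p + 0.5 * q * q


def symplectic_gradient(H, q, p, eps=1e-5):
    """Return (dq/dt, dp/dt) = (dH/dp, -dH/dq) via central differences."""
    dH_dq = (H(q + eps, p) - H(q - eps, p)) / (2 * eps)
    dH_dp = (H(q, p + eps) - H(q, p - eps)) / (2 * eps)
    return dH_dp, -dH_dq


def euler_rollout(H, q0, p0, dt, steps):
    """Explicit Euler integration of the Hamiltonian vector field."""
    q, p = q0, p0
    trajectory = [(q, p)]
    for _ in range(steps):
        dq, dp = symplectic_gradient(H, q, p)
        q, p = q + dt * dq, p + dt * dp
        trajectory.append((q, p))
    return trajectory


traj = euler_rollout(hamiltonian, q0=1.0, p0=0.0, dt=0.01, steps=100)
q_end, p_end = traj[-1]
# Explicit Euler does not conserve energy exactly; the drift is measured here.
energy_drift = hamiltonian(q_end, p_end) - hamiltonian(1.0, 0.0)
```

For this toy system the explicit Euler update slowly gains energy, which is one reason a symplectic integrator is usually preferred for long rollouts.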
The above model is suitable for data containing observable motion states, but not for higher-dimensional temporal targets, such as a sequence of changing motion images. HGN [31] then provided a deep learning approach capable of reliably learning Hamiltonian dynamics from pixel observations. HGN can be seen as the first approach capable of consistently learning Hamiltonian dynamics from high-dimensional observations without restrictive domain assumptions.
Although ODE-based models effectively improve dynamic cognition of the system, two important issues remain to be resolved: (1) for incomplete high-dimensional observation data, how to exploit the excellent performance of the Hamiltonian in dynamic prediction to achieve motion prediction; (2) how to let the time variable control state prediction when the latent Hamiltonian physical quantity is self-learned.
In an unknown time-varying environment, predicting long-term, high-dimensional system evolution under time constraints therefore remains an open problem, and it is the problem that the CHGN model proposed in this paper sets out to solve.

The frame of model
To predict time-series moving targets under specific time constraints, a CHGN model based on a Hamiltonian dynamics neural network approach is proposed. The whole model involves cross-domain transformation between image features and latent Hamiltonian features under time conditions, dynamic evolution of Hamiltonian features, and sequence generation of target motion images.
With the development of deep learning technology in various fields [35,36], the neural network in the CHGN model combines dynamics theory and image processing technology across domains to achieve fully autonomous learning of Hamiltonian physical quantities, dynamic motion prediction and controllable directional image generation. CHGN is therefore composed of a CVAE, an HNN and a Velocity-Verlet integrator, as shown in Fig. 1.
In the CVAE encoding stage, the short-term continuous observation image sequence is converted into target motion physical quantities, which are represented by generalized coordinates and generalized momentum; in the CVAE decoding stage, the decoder converts the generalized coordinate information inferred from the motion prediction model into the target image and realizes trajectory prediction image generation control under time constraints.
HNN is used to learn latent Hamiltonian physical quantities, so as to capture more information about the dynamics of the chaotic system and realize state cognition. The Hamiltonian specifies a vector field over the phase space that describes all possible dynamics of the system, which provides the underlying physical laws for subsequent motion prediction.
The Velocity-Verlet integrator estimates the future state of the system from inferred values of the system coordinates and momentum by numerically integrating the Hamiltonian. According to the Hamiltonian learned by the neural network, combined with the length of time, the value of the motion state at any time can be obtained.

Conditional autoencoder
CVAE is a directional and controllable image generation technique [37][38][39]. By reparameterizing the hidden vector to follow a Gaussian distribution, a new image can be generated from a hidden vector drawn from that distribution under the control condition. Different from the general conditional variational autoencoder, the input of the CVAE decoder constructed in CHGN is not directly sampled from the latent probability distribution of the hidden layer. Instead, the motion prediction is inferred according to the Hamiltonian equation learned at the initial time, and the generalized coordinates at the specified time are decoded to generate the target trajectory image. The overall structure is represented in Fig. 2.
The encoder tries to learn $q_\theta(z \mid x, y)$, which is equivalent to learning a hidden representation of the data, or encoding $x$ under the condition $y$. The incomplete observation image sequence $X_i$ contains at least two short-term target images. Combined with the time $t_c$, the hidden code of the motion state, represented by generalized coordinates and generalized momentum, can be obtained through the CNN network [40,41]. Here $E_{cnn}$ is the CNN extraction network, which encodes the observation sequence as a hidden-layer representation of the motion state, and $y$ represents the time $t_c$. The output can be seen as a posterior over the initial state $z \sim q_\theta(z \mid x, y)$, corresponding to the system's coordinates in phase space at the first frame of the sequence. Let the variational approximate posterior be a multivariate Gaussian with a diagonal covariance structure, as in reference [37]:
$$q_\theta(z \mid x, y) = \mathcal{N}(z; u_i, \sigma_i^2 I)$$
$u_i$ and $\sigma_i^2$ are learned by the mean and variance network, which contains two independent fully connected networks that learn the mean and variance of the initial motion state of the observation sequence following a certain physical law. The input of the fully connected networks is the high-dimensional features extracted by the CNN network, the hidden layer has 256 dimensions, and the output is the 512-dimensional self-learned distribution parameters. After the mean and variance are obtained, the reparameterization operation is performed to solve the calculation problem of the KL divergence.
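The reparameterization step and the closed-form KL term for a diagonal Gaussian can be sketched as follows. This is a minimal pure-Python illustration, not the paper's code: the log-variance parameterization, the two-dimensional latent vector and the example values of `mu` and `log_var` are assumptions chosen for readability.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu = [0.5, -1.0]        # hypothetical outputs of the mean network
log_var = [0.0, -2.0]   # hypothetical outputs of the variance network (as log sigma^2)
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

Writing the sample as a deterministic function of the mean, the variance and an independent noise term keeps the sampling step differentiable with respect to the network outputs.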
To give the model generative ability, CVAE requires each $p(z \mid x, y)$ to be aligned with the standard normal distribution $\mathcal{N}(0, I)$, so that all $p(z \mid x, y)$ are very close to it. According to the ELBO [42], the objective function obtained in Eq. (8) is to be maximized during training, so the loss function is defined as its negative.
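For reference, the textbook conditional ELBO with a standard normal prior, the closed-form Gaussian KL term and the resulting loss can be written as below. This is the standard CVAE form and is assumed here for illustration; the decoder parameter symbol $\phi$, the component index $j$ and the latent dimension $d$ are notational assumptions, and the expressions are not necessarily identical to the paper's Eq. (8).

```latex
\log p_\phi(x \mid y) \;\ge\;
  \mathbb{E}_{q_\theta(z \mid x, y)}\big[\log p_\phi(x \mid z, y)\big]
  - D_{KL}\big(q_\theta(z \mid x, y) \,\|\, \mathcal{N}(0, I)\big)

D_{KL}\big(\mathcal{N}(u, \sigma^2 I) \,\|\, \mathcal{N}(0, I)\big)
  = \frac{1}{2} \sum_{j=1}^{d} \big(u_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\big)

\mathcal{L}
  = -\,\mathbb{E}_{q_\theta(z \mid x, y)}\big[\log p_\phi(x \mid z, y)\big]
  + D_{KL}\big(q_\theta(z \mid x, y) \,\|\, \mathcal{N}(0, I)\big)
```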

Hamiltonian neural network
CVAE provides an image generation framework, and how to learn the Hamiltonian and perform motion trajectory inference is the key to predicting unknown trajectories.
To overcome the limitations of conventional neural networks, recent neural network algorithms have incorporated ideas from physics, especially when forecasting dynamical systems. An HNN learning a dynamical system takes as input a point in an abstract phase space $S = (q, p) \in \mathbb{R}^{2n}$, where $q \in \mathbb{R}^n$ is a vector of position coordinates and $p \in \mathbb{R}^n$ is the corresponding vector of momenta. The output is a single energy-like variable $H$, from which the dynamics follow according to Hamilton's recipe:
$$\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q} \tag{9}$$
HNN learns the Hamiltonian function, which is a generator of trajectories. Since the same Hamiltonian function generates ordered and chaotic orbits, learning the Hamiltonian function enables the network to predict orbits outside the training set. The workflow diagram of HNN is shown in Fig. 3.
HNN has an advantage in dealing with chaotic systems. The main difference from a conventional network is that, instead of directly outputting the predicted values $\partial q/\partial t$ and $\partial p/\partial t$, it learns the latent Hamiltonian physical quantity. After the parameters of the physical law of motion at that moment are obtained, the target motion state can be evolved in discrete steps from the initial state to the final state of the system through a set of differential equations.
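The idea of outputting a scalar and differentiating it, rather than predicting the time derivatives directly, can be illustrated with a very small network. The sketch below is an assumption-laden toy: the class name `TinyHNN`, the single tanh hidden layer, the scalar (one-dimensional) $q$ and $p$, and the random untrained weights are all hypothetical and are not taken from the paper.

```python
import math
import random


class TinyHNN:
    """One-hidden-layer network mapping (q, p) to a scalar H (illustrative)."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]

    def energy(self, q, p):
        """Scalar energy-like output H(q, p)."""
        return sum(w2 * math.tanh(w[0] * q + w[1] * p + b)
                   for w, b, w2 in zip(self.w1, self.b1, self.w2))

    def time_derivatives(self, q, p):
        """Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq."""
        dH_dq = dH_dp = 0.0
        for w, b, w2 in zip(self.w1, self.b1, self.w2):
            # Derivative of w2 * tanh(a) with respect to the pre-activation a.
            g = w2 * (1.0 - math.tanh(w[0] * q + w[1] * p + b) ** 2)
            dH_dq += g * w[0]
            dH_dp += g * w[1]
        return dH_dp, -dH_dq


hnn = TinyHNN()
dq_dt, dp_dt = hnn.time_derivatives(0.3, -0.2)
```

Because the derivatives come from a single scalar function, the predicted vector field is Hamiltonian by construction, whatever the weights are.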

Velocity-Verlet
CHGN draws on related theories of molecular dynamics [43][44][45][46] and estimates the future state of the system with the Velocity-Verlet algorithm [47], starting from inferred values of the system position and momentum and numerically integrating the Hamiltonian. The Velocity-Verlet algorithm corrects the main defect of the commonly used Verlet and Leapfrog algorithms, namely that the velocity is calculated with a deviation of order $O(\delta t^2)$. The Velocity-Verlet algorithm obtains the current position and velocity at the same time, which is more practical in the actual integration process. Figure 4 demonstrates that the kinematic state at any moment can be obtained by cyclic reasoning.
Equations (11)-(13) give the specific procedure of the Velocity-Verlet algorithm following Ref. [47]: the position and momentum at the next moment are derived with the acceleration held unchanged over each half step. By calculating the velocity at the half step $t + \delta t/2$, the kinetic energy at $t + \delta t$ is derived, so as to obtain the position, velocity and acceleration at $t + \delta t$.
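A minimal sketch of one Velocity-Verlet update in Hamiltonian form is given below: a half-step momentum update, a full-step position update, and a second half-step momentum update. The gradient callables `grad_q` and `grad_p`, the toy mass-spring Hamiltonian and the step size are assumptions for illustration; in CHGN the gradients would come from the learned Hamiltonian rather than from closed-form expressions.

```python
def velocity_verlet_step(grad_q, grad_p, q, p, dt):
    """One step: half kick, full drift, half kick."""
    p_half = p - 0.5 * dt * grad_q(q, p)                  # momentum at t + dt/2
    q_next = q + dt * grad_p(q, p_half)                   # position at t + dt
    p_next = p_half - 0.5 * dt * grad_q(q_next, p_half)   # momentum at t + dt
    return q_next, p_next


def rollout(grad_q, grad_p, q, p, dt, steps):
    """Apply the step repeatedly to reach the state after `steps` steps."""
    for _ in range(steps):
        q, p = velocity_verlet_step(grad_q, grad_p, q, p, dt)
    return q, p


# Assumed toy system: H = p^2 / 2 + q^2 / 2 (unit mass and stiffness).
def grad_q(q, p):
    return q   # dH/dq


def grad_p(q, p):
    return p   # dH/dp


q_T, p_T = rollout(grad_q, grad_p, q=1.0, p=0.0, dt=0.1, steps=1000)
energy = 0.5 * p_T ** 2 + 0.5 * q_T ** 2
```

For this toy oscillator the energy stays close to its initial value of 0.5 over the whole rollout, which is the behaviour that makes the scheme attractive for long-horizon prediction.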
The latent embedding s = (q, p) is then treated as an estimate of the position and the momentum of the system depicted in the images, where the momentum is assumed to be equal to the velocity of the system. Hamilton's equations following Eq. (9) imply the linear equation of motion:

Algorithm description
The CHGN model can use the incomplete observation image sequence and the specified time condition to generate the motion trajectory prediction image at a future time under the time condition. Based on the model structure, the model inference and generation algorithm is given in Algorithm 1.
In Algorithm 1, the motion state feature extracted by the CVAE encoder $F_\theta$ is taken as the initial state, that is, the moment is considered to be $t = −1$. The HNN model learns the Hamiltonian physical quantity $H$ according to $(q, p)$, then predicts the physical parameters of the motion at any time in the long-term reasoning mode through Hamilton's equations and the Velocity-Verlet algorithm, and finally restores the predicted motion trajectory image through the CVAE decoder $F_\phi$.
Algorithm 1 (closing steps)
    $H = H(q_{t+1}, p_{t+\frac{1}{2}\delta t})$
    $p_{t+1} = p_{t+\frac{1}{2}\delta t} - 0.5 \frac{\partial H}{\partial q} \delta t$
    $(q, p) = (q_{t+1}, p_{t+1})$
    // Using Velocity-Verlet algorithm to predict target motion
end for
$\hat{x}_{t_c} = F_\phi(q_{t_c}, t_c)$
// Generation of prediction images
return $\hat{x}_{t_c}$

Experimental results analysis
To directly compare the performance of CHGN with related SOTA methods, datasets analogous to the data used in [49] are generated. The datasets contain observations of the time evolution of different physical systems: mass-spring, pendulum, and two-body. The physical quantities of the dynamics change with the initial coordinates and momentum. The data are not numerical values but high-dimensional image sequences.
There are three steps in the data generation process. First, the initial state is randomly sampled, and an n-step rollout (n is the length of time, here set to 30) following the ground-truth Hamiltonian dynamics is generated. Then Gaussian noise with standard deviation $\sigma^2 = 0.1$ is added to each phase-space coordinate. Finally, the noisy states are rendered to the corresponding 64 × 64 pixel observation image sequence.
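The three data generation steps (sample an initial state, roll out the dynamics, add noise and render) can be sketched for the mass-spring case as follows. Everything beyond those three steps is an assumption made for illustration: the unit-mass, unit-stiffness Hamiltonian, the sampling range, the time step `dt`, the interpretation of the noise level as a standard deviation of 0.1, and the Gaussian-blob renderer are hypothetical and do not reproduce the authors' rendering pipeline.

```python
import math
import random


def rollout_mass_spring(q0, p0, dt, steps):
    """Ground-truth rollout of H = p^2/2 + q^2/2 using the exact solution."""
    states = []
    for k in range(steps):
        t = k * dt
        q = q0 * math.cos(t) + p0 * math.sin(t)
        p = p0 * math.cos(t) - q0 * math.sin(t)
        states.append((q, p))
    return states


def add_noise(states, sigma, rng):
    """Add independent Gaussian noise to every phase-space coordinate."""
    return [(q + rng.gauss(0.0, sigma), p + rng.gauss(0.0, sigma))
            for q, p in states]


def render(q, size=64, radius=3.0, q_max=2.0):
    """Draw a soft blob whose horizontal position encodes q (assumed renderer)."""
    cx = (q / q_max * 0.5 + 0.5) * (size - 1)
    cy = (size - 1) / 2.0
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * radius ** 2))
             for x in range(size)] for y in range(size)]


rng = random.Random(0)
q0, p0 = rng.uniform(-1, 1), rng.uniform(-1, 1)   # random initial state
states = rollout_mass_spring(q0, p0, dt=0.1, steps=30)
noisy = add_noise(states, sigma=0.1, rng=rng)
frames = [render(q) for q, _ in noisy]            # 30 frames of 64 x 64 pixels
```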

Qualitative functional level comparison and analysis
To facilitate the understanding of the experimental settings, the mass-spring system is taken as an example to comprehensively analyze the experimental results from many aspects, including training data, training methods, training conditions, experimental results and comparative analysis. Figure 5 shows the comparison of the performance on different motion prediction trajectories at the functional level.
At present, there are two main methods: one is the classical deep neural network model [6], such as CNN, and the other is the dynamic neural network model, typically represented by HGN [31]. The observation images of the target motion trajectory are shown in Fig. 5a, where the target position changes over time. Figure 5b compares the predictions of the three models: Baseline, HGN and CHGN. Baseline is the standard CNN model, and its motion parameter prediction is obtained entirely by numerical fitting of the neural network, so the generated images tend toward a pixel-wise average. As can be seen from the difference between its ground truth and predicted trajectory, the prediction error is large. HGN uses the learned Hamiltonian to predict the motion trajectory of physical parameters, and the predicted value is closer to the ground truth. However, the HGN input is a complete set of observations and cannot achieve time-constrained predictive control. CHGN can realize motion trajectory reasoning based only on short-term continuous incomplete observation image sequences, using Hamiltonian dynamics and CVAE to generate prediction images, and it has time-constrained control capabilities. The above figure shows the overall final motion prediction performance. To quantitatively compare the prediction effects of each model, the mean square error (MSE) of the trajectory image at the same time step is displayed in Fig. 6. For a fair comparison, the training sets and test sets are kept the same for all models. Since the initial state of the target is randomly sampled, the position in the image is different at each moment in each motion trajectory sequence.
It can be seen that, compared with Baseline, CHGN and HGN have smaller MSE, and CHGN has the best motion reasoning performance as the time step increases. There are two reasons. First, CHGN has the control ability of the time condition constraint. Second, it achieves better accuracy by learning the Hamiltonian physical quantity and using the Velocity-Verlet algorithm to carry out motion reasoning.
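The per-time-step image MSE used for this kind of comparison can be computed as in the short sketch below. The nested-list image representation and the tiny 2 × 2 example frames are assumptions for illustration and are not data from the experiments.

```python
def frame_mse(pred, target):
    """Mean square error between two images given as nested lists."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for a, b in zip(row_p, row_t):
            total += (a - b) ** 2
            count += 1
    return total / count


def mse_per_step(pred_seq, target_seq):
    """One MSE value per time step of the predicted sequence."""
    return [frame_mse(p, t) for p, t in zip(pred_seq, target_seq)]


# Hypothetical two-step sequences of 2 x 2 images.
target_seq = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [0.0, 0.0]]]
pred_seq = [[[0.0, 1.0], [1.0, 0.0]], [[0.5, 1.0], [0.0, 0.5]]]
errors = mse_per_step(pred_seq, target_seq)
```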

Quantitative performance level comparison and analysis
More experimental results have been verified in different dynamic systems. The comparison of the convergence loss is shown in Fig. 7. The horizontal axis is the number of training iterations, and the vertical axis represents the convergence loss. During the training process, it is found that CHGN converges faster and to a lower loss than other deep learning methods, including HGN.
In addition, a dynamic evolution algorithm is involved in the adaptive evolution of high-dimensional features. We compared two evolution methods, Leapfrog and Velocity-Verlet. Table 1 shows that the average MSE of the pixel reconstructions on both the training and test data is an order of magnitude better for CHGN than for Baseline. In particular, CHGN with the Velocity-Verlet algorithm, which realizes motion inference under time condition control, has the best prediction and reconstruction effect.
More prediction image results on different dynamic systems are shown in Fig. 8. To demonstrate the time condition control ability, the first 5 consecutive times and the last 5 consecutive times of the test data are selected to show the predictive performance.
From the motion prediction experiments of the pendulum system and the two-body system, it can be seen that although HGN also predicts the future time trajectory by learning the Hamiltonian function, its performance is still inferior to that of CHGN. On the one hand, all the observed images are required as input; on the other hand, the high-dimensional time-stacked images are directly output, so the

Conclusion
Combining the study of physical laws with the prediction of neural networks is an important direction in dynamics research. To address motion control with unknown physical quantities and an incomplete observation set without restrictive domain assumptions, a trajectory prediction model based on CHGN is proposed, which can generate predictions of motion trajectory images at any specified time from the incomplete observation set without canonical coordinates. The CVAE encoder converts the short-term continuous observation image sequence into target motion state features represented by generalized coordinates and generalized momentum and forms a latent state probability distribution, from which the trajectory prediction image at the specified time is generated. Then, HNN is used to learn latent Hamiltonian physical quantities, so as to capture more information about the dynamics of the chaotic system and realize state cognition. Finally, a Velocity-Verlet integrator is built to predict the motion state at any moment according to the Hamiltonian learned by HNN at the current moment. The motion state and the specified time are used as the input of the CVAE decoder to generate the target prediction image from the latent motion space. Experimental results show that CHGN can accurately predict target trajectories over a long period of time based on incomplete short-term image sequences, and performs better than existing deep learning methods.
It is worth pointing out that although CVAE and HNN learn autonomously through neural networks, evolution time and step size factors involved in the dynamic evolution of high-dimensional features affect the actual system effect. How to determine the optimal evolution path is the current limitation, which is also the future research direction.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.