Application of neural ordinary differential equations to the prediction of multi-agent systems

Dynamic systems are usually described by differential equations, but formulating these equations requires a high level of expertise and a detailed understanding of the observed system to be modelled. In this work, we present a data-driven approach, which tries to find a parameterization of neural differential equations system to describe the underlying dynamic of the observed data. The presented method is applied to a multi-agent system with thousand agents.


Introduction
We present an approach to learn the dynamics of hundreds to thousands of agents in a system, where the underlying assumption is that the agents follow a collective dynamic, which is not obviously recognisable, but present. In this way, we hope to show scientists a method that allows them to verify model assumptions based on ordinary differential equations on their data. For the purposes of this report, we shall limit ourselves to the Hamiltonian of canonical classical mechanic equations with which movements of objects can be described by their position and velocity. Examples for such dynamics systems are flocks of birds or the movement of cell components. The underlying dynamics allow many conclusions, e.g. about the energy distribution in the system or whether there are symmetries or not.
To visualise the motion of many particles or of a swarm with many agents several approaches exists, like the group of optical flow methods [8,9,12], cross-correlation [1] and Kalman filters [19], these methods can be translated into more complex approaches like particle tracking velocimetry (PTV) [5,14,15,20]. If a sufficient amount of data is available and agent/particle density is high enough, a vector field can be obtained by mapping the trajectories on a Cartesian grid. Thus, it is possible to visualise the dynamics of the underlying system. However, these approaches often result in an algorithmic frameworks of high complexity. The approach presented here is based on neural ordinary differential equations (nODEs) [3]. An artificial neural network (ANN) is used to approximate the Hamiltonian of canonical classical mechanic equations. This allows describing the underlying particle-or agent-swarm directly as a dynamic system without detours. To train the here-presented approach only a list of positions with their momentum is necessary. In this study, we derive the method for this and apply it to simulated data based on the boids [18] algorithm.

Methods
Considering a set of N coordinates (q, p) , where q ∈ ℝ N represents the positions and p ∈ ℝ N their momentum. A scalar function H(q, p) is called the Hamiltonian if: This system can be rephrased without restricting generality to where ∇ is the gradient operator. For nODEs it is common to approximate (Eq. 2) by ANNs, denoted by H (q, p) , where is a vector with all parameters of the ANN. A (feedforward) artificial neural network, also known as a multi-layer perceptron (MLP), is a series of logistic regression models stacked on top of each other. MLPs can be used for classification problems as well as for regression problems, depending on the final layer. In this work, we are solving regression problems defined by: where y is the desired regression output, x the input vector, w the weight matrix, v j = V j,∶ is the jth row of V , hidden layers z(x) = (x, V) of size H, where g is the so-called activation function. N(⋅, 2 ) is the symbol for the normal distribution with variance 2 . The hyperbolic tangent (tanh) is used as the activation function. To use (Eq. 3) for H (q, p) the framework of nODEs is used, which are a family of ANNs where the hidden layers parameterize the derivative of the hidden state using a neural network and the output of the network is computed using a black-box differential equation solver, where backpropagation [2,4,10] through the ODE solver is applied [3]. A special feature of nODEs is that the gradient is calculated using the adjoined sensitivity method [17], allowing a linear relation between problem size and computational complexity and a low memory consumption. After obtaining the gradient, the parameters of H (q, p) are trained by a gradient method with adaptive moment estimation, called ADAM [11]. The input data are all pairs of (q, p) in the training-set. And the loss function which is supposed to be optimised is where the first two terms simply describe the quadratic deviation, ‖.‖ 2 2 denotes the l 2 -norm and the third term ‖ ‖ 1 , with ‖.‖ 1 denotes the l 1 -norm and forces sparisty on for with ∈ ℝ a weighting parameter (in this work = 0.3 ). While the details mentioned above may only differ in technical details from the nODE approach as described in [3], the addition of the sparsity term is a novelty to [3] which has proven to be necessary for the convergence to achieve the results discussed later. The implementation of H (q, p) was done with PyTorch [16] and is based on [3]. The network architecture consists of five dense layers, each layer has an activation layer with tanh. The first layer expects 2 inputs and has 128 nodes, followed by four more layers with 128 nodes and finally output layer with one output node. In this work, we have deliberately avoided dividing the data into a training, validation and test set, because we are interested in the vector field in the end, which is not directly evident from the data anyhow. A generalization beyond the spatio-temporal domain of the training data is not the goal of this study. In the following results two points are examined. First, the learned vector field is validated if it can reproduce the ground truth trajectories from the boid algorithm, when using the initial conditions from the boid algorithm. Second, the trained system is used on new data, but with the same parameters for the random sources and compared to the nearest neighbours of the training data.

Results
To investigate the presented approach empirically, a simulation of the flocking behaviour of birds is used. For this simulation 1000 agents with separation, alignment and cohesion rules were modelled by the boids algorithm [18]. The boids algorithm with reflecting boundaries was implemented using python [21] and the numpy package [7]. The 1000 agents were placed randomly centred around (0, 0) ( Fig. 1) but without overlap and with random initial velocities from U[0.2, 5] and random accelerations of U[0, 0.03] , where U[a, b] is the uniform distribution between a and b. The system was simulated t max = 5000 time steps. These data were then used to train the presented approach with early stopping [6] and a patience of 15. In (Fig. 2) the values of the loss function over the epochs are plotted. Early stopping ends the training if the loss function does not improve over the last 15 epochs. The system was trained on a Nvidia GTX 1080, an epoch calculates on average 61 ± 3 seconds. In total 54 min 54 s, but probably half of the time would also be sufficient. After training the vector field of the nODE can be visualised, by generating an equidistant grid with n = 50 points and applying the learned H (⋅, 0) to each point. The resulting vector field is illustrated in (Fig. 3). Based on the learned vector field it's possible to predict trajectories, using some values for q and p as initial conditions and applying this to the field. To check if the presented approach learned the underlying dynamic from the boid algorithm, all q and p from the training data were selected. A few exemplary trajectories are shown in (Fig. 4). The corresponding statistics are available in blue in (Fig. 5). It can be seen (Fig. 5) that the trajectories deviate about 7 units of length in the median, but there are also cases where the deviate exceeds 80 units, with a domain size of 10 3 × 10 3 this is a comparably small error. To validate our approach further, we took the predicted nODE and randomly initialised 1000 agents with the same scheme as above and calculated the trajectories based on nODE over 5000 time steps. For each trajectory, the nearest neighbour of the original trajectories was searched and the l 2 -deviation was calculated. This deviations can be seen in Fig. 5 in light orange. The expectation of the distribution is about 80 length units, which in relation to the total domain size again is not much.

Discussion
The approach presented here is an extension of the classic nODE approach. By adding a sparse regularisation term to (Eq. 4), it was possible to train the nODE network to learn the underlying dynamic of the agent motions from the boids algorithm. Surprisingly, even the resulting trajectories from the learned system only have a maximum deviation of 7 length units (divided by the domain width this equals 0.7% ) compared to the ground truth data, which means that the underlying dynamic of the training data can be reconstructed. Here, it should be taken into account that the ANN used to learned the vector field has a very high degree of freedom which allows it to overfit the training data and describe even complex trajectories. Considering randomly initialised agents, from the same source distribution, the deviation increases in the median to up to 80 length units (or 8% ) which is an increase of a factor of 10, but given the large domain this is still quite a good result. It also should be noted that no hyperparameter optimisation was considered for the approach presented at this stage. It is, therefore, to be expected that, if the loss function or the architecture of the network is further adapted, even better results could be achieved. It is also worth mentioning that we have chosen ADAM [11] because of its fast convergence, other algorithms such as SGD [13] may achieve better results, too. Methods like these need further investigation to understand Fig. 1 Snapshots of 1000 agent system simulated with the boids algorithm [18] at different times (from left to right: t = 0 , t = 500 , t = 1000 ). Each dot represents an agent and the borders are reflective in detail in which cases they generalise and in which not. However, we hope that approaches like the presented one can be helpful to gain new insights into systems where models based on first principles are hard to obtain. Especially, applications where only a part of the dynamics are described by first-principles can benefit from such "hybrid" modelling approaches where the unknown part of the dynamic is approximated by an nODE.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.