Hybrid modeling of a multidimensional coupled nonlinear system with integration of Hamiltonian mechanics

This study concerns hybrid modeling of a multidimensional coupled nonlinear system. The underlying basis for the model is derived from Hamiltonian mechanics, capitalizing on the broad utility and efficiency of energy-based reasoning in modeling high-dimensional systems. The hybrid model is essentially an artificial neural network with a computational graph that is modified from conventional neural networks in a few significant ways. The first modification incorporates an intermediate scalar function representing the Hamiltonian learned from data. The second modification expands the input/output channels to capture the multidimensional dynamics of the system. The main goal of such hybrid reasoning is to improve the extrapolation capability of the model by enforcing conformance with some key aspects of the underlying physics in the form of a bias. The results demonstrate that incorporating this physics-based bias into the hybrid model empowers it to produce long-term and physically plausible predictions. The proposed modeling approach also shows high scalability for energy-based modeling of multidimensional dynamic systems in general.


Introduction
Understanding phenomena observed in engineering systems has always been an essential objective of scientific studies. Specifically, in engineering fields, a deeper understanding improves many aspects, such as design, performance prediction, verification, validation, optimization, control, diagnostics, and prognostics. Hence, tremendous efforts have been invested in model development to obtain a compact and accurate abstraction of engineering systems. An accurate model can explain the root cause of observations, uncover basic underlying causalities, validate assumptions, and discriminate between different hypotheses.
Recent advances in computational power and sensing technology along with large-scale collection of sensory data have enabled researchers to develop and validate complex and high-dimensional machine/deep learning models. These models are critically important in studying complex, partially known, and multidimensional systems where the contribution of classical physics-based modeling is limited. However, in addition to the rather obvious need for large amounts of data, the possibility of violating the underlying causalities hampers the generalization capability, applicability, and scalability of the machine learning models.
Widespread recognition of the above issues has led to physics-informed machine learning, which is a relatively new research paradigm that addresses these issues by integrating physics and machine learning. The objective is to endow the data-driven models with physics-based reasoning to enhance their strengths and avoid limitations. Integrating physics can result in more robust and interpretable models with physically plausible and consistent predictions. There is a wide range of approaches for integrating physics (inductive biases, priors, and constraints) into learning from data. The selection of an approach depends on many variables, such as data availability, system complexity, and model performance criteria [1,2]. A notable approach is improving and generalizing the information content of the input of a machine learning model by physics-informed feature extraction. This idea has been explored for a variety of problems, including fault diagnostics of prototypical nonlinear mechanical systems [3,4], bearing fault diagnostics [5][6][7][8], gear diagnostics [9], crack detection [10], flood modeling [11], and bifurcation analysis of nonlinear dynamic systems [12]. Other ideas include synthetic data generation by simulation of a physics-based model to amplify a machine learning model input domain [13]; formulating a hybrid loss function for the learning process and penalizing the model for deviation from physical constraints [2]; pre-training a machine learning model using physics-based simulation data and fine-tuning based on a limited number of observations [14]; customizing the topology of a machine learning model by adding intermediate physics-informed components [15]; designing the structure of a machine learning model using physical laws [16]; uncertainty reduction in machine learning using physics [17]; and integrating machine learning models with first-principle physical laws for solving partial differential equations [18].
A comprehensive discussion of these approaches can be found in [1,2,19].
A significant advancement was achieved with respect to dynamic system modeling by proposing ordinary differential equation (ODE) networks [20], where the neural networks were considered as parameterized ODE functions. The improved efficiency of these networks in modeling a wide range of dynamic systems over plain vanilla recurrent neural networks has been demonstrated in the literature [21]. Integrating ODE solvers in the computational steps of these networks results in advantages such as parameter efficiency, adaptive computation, and scalability. However, they are not able to encode the underlying physics through the learning process.
Other studies have focused on more generic approaches for incorporating physics in order to reduce the hypothesis space of machine learning models. The laws of physics show invariant characteristics under certain transformations in space and time coordinates that empower them with universality [22][23][24]. For example, a classical dynamic system shows time-reversal symmetry, which keeps the corresponding equations of motion unchanged under time-reversal transformation. Accordingly, symmetries in dynamic systems induce conservation laws based on Noether's theorem [25]. Enforcing machine learning models to comply with such symmetries and the corresponding conservation laws has been shown to significantly improve their performance in dynamic system modeling [26].
Embedding symmetries in machine learning models has achieved remarkable improvements in the area of computer vision as well. The symmetries were embedded by nonlinear transformations, which either completely preserve the symmetry (equivariant) or are invariant to certain transformations of inputs. As a result, rotational, translational, scale, and gauge equivariant computational layers were obtained and incorporated into the learning process [27,28]. Also, these nonlinear transformations were further modified to impose symmetry in dynamic systems modeling as well [29,30]. The same objective was used in [26] by imposing a time-reversibility constraint in the learning loss function.
In a recent study [31], the idea of parameterizing a scalar function that represents the Hamiltonian of a conservative dynamic system by a neural network was explored. The network was compelled to approximate a conserved function from observations by adding further computational steps. Accordingly, inductive biases, inspired by the conservation of energy, were incorporated into the learning process from data. From a mathematical point of view, these biases can be categorized into: (i) ODE bias, where the derivatives of the states are learned instead of the states; (ii) second-order bias, where the model changes in position through changes in velocity; (iii) energy conservation bias, where an energy-like function is learned from data; and (iv) symplectic bias, where the phase space of the corresponding system is conserved with a Hamiltonian vector field structure [32].
Hence, making the learning process aware of generic and important physical knowledge, such as conservation laws, results in hybrid models with much improved capability in dynamic system modeling. These models are capable of improving the performance measures in many aspects, such as accuracy, data efficiency, generalizability, extrapolability, interpretability, transferability, explainability, and robustness of the predictions. Although the literature shows some recent studies on these hybrid models, their application in many aspects of dynamic system modeling is still far from mature [31,33,34].
The current study hence focuses on two principal aspects of energy-based hybrid modeling. First, it exploits the facility of energy-based reasoning, as opposed to the Newtonian approach, for modeling a multidimensional dynamic system. Second, the incorporation of physics in the hybrid model is targeted toward improved extrapolation performance, meaning the ability of the model to remain accurate in scenarios outside the training range.
The rest of the paper is organized as follows. First, fundamental concepts of Hamiltonian mechanics used in dynamic system modeling are outlined in Sect. 2.1. Next, the characteristics of the hybrid model are discussed in Sect. 2.2. Then, hybrid modeling is applied to a multidimensional nonlinear coupled oscillator in Sect. 2.3. Section 3 presents the modeling results, and a comprehensive discussion about different aspects of the hybrid model is presented in Sect. 4. Finally, the paper ends with a conclusion in Sect. 5.

Dynamic system modeling
Hamiltonian mechanics is used to model the time evolution of classical conservative dynamic systems and employs a scalar function called the Hamiltonian [35,36]. This mathematical framework can provide topological and mathematical/numerical representations of the dynamic system [37]. This framework was initially developed for classical solid mechanics.
However, it has been applied to a broad range of fields of science, e.g., fluid mechanics, statistical mechanics, and quantum mechanics [38].
The dynamics of a conservative system is abstracted in a set of generalized coordinates of position and momentum with M dynamic variables. The Hamiltonian H is then defined as a scalar function H(q(t), p(t)), where q(t) and p(t) are the position and momentum vectors, respectively. Hamilton's equations in these coordinates are written as follows:

q̇ = ∂H/∂p,  ṗ = −∂H/∂q    (1)

where q̇ and ṗ stand for the time derivatives of q and p. The time evolution of the dynamic system is obtained by integration of q̇ and ṗ over time.
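As a concrete illustration of Eq. (1) (a sketch of ours, not code from the paper), consider a one-degree-of-freedom harmonic oscillator with H(q, p) = p²/2 + q²/2. Hamilton's equations then give q̇ = p and ṗ = −q, and integrating them traces a trajectory along which H stays approximately constant:

```python
import numpy as np

# Illustrative 1-DOF example (our assumption, not the paper's system):
# H(q, p) = p^2/2 + q^2/2, so dq/dt = dH/dp = p and dp/dt = -dH/dq = -q.

def hamiltonian(q, p):
    return 0.5 * p**2 + 0.5 * q**2

def time_derivatives(q, p):
    """Hamilton's equations: (dq/dt, dp/dt) = (dH/dp, -dH/dq)."""
    return p, -q

# Evolve with small explicit Euler steps and check that H stays
# approximately conserved over a short horizon.
q, p = 1.0, 0.0
dt = 1e-3
for _ in range(1000):
    dq, dp = time_derivatives(q, p)
    q, p = q + dt * dq, p + dt * dp

assert abs(hamiltonian(q, p) - hamiltonian(1.0, 0.0)) < 1e-2
```

The small residual drift here comes from the explicit Euler integrator, not from the equations themselves; a symplectic or higher-order integrator would conserve H far more tightly.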

Multidimensional Hamiltonian neural network
In general, the dynamics of a system is fully characterized by a mapping that relates the position and momentum to their corresponding time derivatives. In Hamiltonian mechanics, this mapping is developed indirectly based on an intermediate scalar function analogous to the energy of the system, and it is also formulated mathematically through Hamiltonian equations. Alternatively, this mapping can be approximated by a parameterized machine learning model, such as an artificial neural network, which allows one to infer the dynamics of the system from data through a supervised learning process. To improve the adherence of the machine learning model to physics, the concept of indirect mapping and the consideration of an intermediate function can be integrated into the learning system. This is achieved by modifying the computational graph of the machine learning model and adding further computational steps derived from Hamiltonian equations. Therefore, the principal idea behind the hybrid model is the indirect mapping from position and momentum to their associated time derivatives. The machine learning model parameterizes a scalar function similar to the Hamiltonian, and it takes the in-graph derivatives of that function based on Hamiltonian equations to return time derivatives of position and momentum.
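The indirect mapping described above can be sketched in a few lines. In this conceptual sketch (our own construction; the paper's implementation is not reproduced here), a tiny randomly initialized network `H_theta` plays the role of the learned scalar Hamiltonian, and the state derivatives are read off its partial derivatives rather than predicted directly. Gradients are taken by central finite differences for clarity; an actual Hamiltonian neural network would use in-graph automatic differentiation.

```python
import numpy as np

# H_theta: a tiny MLP mapping the state (q, p) to a scalar, standing in
# for the learned Hamiltonian. Weights are random (untrained) since only
# the computational structure is being illustrated.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), np.zeros(16)
W2 = rng.normal(size=16)

def H_theta(state):
    return W2 @ np.tanh(W1 @ state + b1)

def grad_H(state, eps=1e-6):
    """Central finite-difference gradient of H_theta (stand-in for autodiff)."""
    g = np.zeros(2)
    for i in range(2):
        d = np.zeros(2)
        d[i] = eps
        g[i] = (H_theta(state + d) - H_theta(state - d)) / (2 * eps)
    return g

def hnn_output(state):
    """Symplectic gradient: (dq/dt, dp/dt) = (dH/dp, -dH/dq)."""
    dHdq, dHdp = grad_H(state)
    return np.array([dHdp, -dHdq])

state = np.array([0.5, -0.3])
field = hnn_output(state)
# The field is orthogonal to the gradient of H_theta, so moving along it
# leaves the learned "energy" unchanged to first order.
assert abs(field @ grad_H(state)) < 1e-9
```

The final assertion makes the energy-conservation bias explicit: whatever scalar function the network learns, the rearranged-and-negated derivative outputs define a flow along which that scalar is conserved.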
It should be noted that additional computational steps can cause implementation issues such as increased computational cost, increased sensitivity to initialization, algorithm divergence, and error instability. Notwithstanding that, addressing these issues and modifying the flow of information inside the network derived from Hamiltonian mechanics improves the consistency of the hybrid model with physics.
One of the critical steps in addressing these issues is the optimization of hyperparameters of the hybrid model, such as the number of layers, neurons, learning rate, and activation function, using hyperparameter optimization techniques such as Bayesian optimization, grid search, and random search. Upon removing the implementation issues, the hybrid model can be endowed with a rigorous concept of Hamiltonian mechanics for inferring dynamics from data. Additional details regarding the hybrid model are as follows.
In the hybrid model the Hamiltonian is parameterized by a neural network and is learned from data. In contrast to classic neural networks that ignore the underlying physics and learn purely from data, learning of the Hamiltonian neural network is subject to a bias which imposes conservation of energy.
In a supervised learning process, a Hamiltonian neural network with parameters θ parameterizes a Hamiltonian H(q, p) from data using q and p as inputs. Accordingly, it returns the time derivatives of q and p using Eq. (1). Building upon this, we go forward with a multidimensional Hamiltonian neural network designed to model the dynamics of a coupled oscillator. With x and y as the two degrees of freedom, the outputs of the Hamiltonian neural network can be written as follows:

S_H = (∂H/∂p_x, −∂H/∂q_x, ∂H/∂p_y, −∂H/∂q_y) = (q̇_x, ṗ_x, q̇_y, ṗ_y)    (2)

where S_H can be interpreted as a vector field over the inputs of H that directs the evolution of the system along the symplectic gradient of H. Given the conservation of energy, S_H being the symplectic gradient of the Hamiltonian means this conservation is embedded in the model development through Eq. (2). The time evolution of the system is obtained by integrating this vector field over time as follows:
(q_x(t), p_x(t), q_y(t), p_y(t)) = (q_x^t0, p_x^t0, q_y^t0, p_y^t0) + ∫ from t0 to t of S_H dt    (3)

where q_x^t0, p_x^t0, q_y^t0, and p_y^t0 are the initial position and momentum associated with x and y, respectively. This equation provides for phase-space reconstruction and for determining the trajectory of the dynamic system starting from an initial condition.
Compared to a conventional neural network, the Hamiltonian neural network learns a conserved quantity representing energy from data. Also, over the forward pass and before computing the loss, it takes derivatives with respect to its inputs. Accordingly, the network's customized loss function is written as follows:

L = ‖q̇_x^gt − ∂H/∂p_x‖² + ‖ṗ_x^gt + ∂H/∂q_x‖² + λ_1 ‖q̇_y^gt − ∂H/∂p_y‖² + λ_2 ‖ṗ_y^gt + ∂H/∂q_y‖²    (4)

where q̇_x^gt, ṗ_x^gt, q̇_y^gt, and ṗ_y^gt stand for the time derivatives of the ground truth position and momentum associated with x and y, respectively. Also, λ_1 and λ_2 are Lagrange multipliers.
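A hedged sketch of this customized loss (our reading of the text, not the paper's code): the model's predicted time derivatives, obtained from the symplectic gradient of the learned Hamiltonian, are compared channel-wise against ground-truth derivatives, with the y-channel terms weighted by the multipliers λ_1 and λ_2.

```python
import numpy as np

# Illustrative loss for the four-channel HNN output. `pred` and `gt` are
# dicts of arrays holding predicted and ground-truth time derivatives for
# each channel; the key names are our own convention.
def hnn_loss(pred, gt, lam1=1.0, lam2=1.0):
    loss = np.mean((pred["qx"] - gt["qx"]) ** 2)
    loss += np.mean((pred["px"] - gt["px"]) ** 2)
    loss += lam1 * np.mean((pred["qy"] - gt["qy"]) ** 2)  # weighted y-channel
    loss += lam2 * np.mean((pred["py"] - gt["py"]) ** 2)  # weighted y-channel
    return loss

# A perfect prediction yields zero loss.
gt = {k: np.zeros(4) for k in ("qx", "px", "qy", "py")}
assert hnn_loss(gt, gt) == 0.0
```

During training, `pred` would be populated with the in-graph derivative terms ∂H/∂p_x, −∂H/∂q_x, ∂H/∂p_y, and −∂H/∂q_y rather than direct network outputs.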
Minimizing the loss function through the training process provides the network parameters θ, including weights and biases. Figure 1 demonstrates the computational graphs of a conventional neural network and a Hamiltonian neural network. In the second network, direct predictions of the time derivatives of position and momentum are replaced by the terms ∂H/∂p_x, −∂H/∂q_x, ∂H/∂p_y, and −∂H/∂q_y. This customized network architecture allows a Hamiltonian neural network to learn the conserved quantity (the Hamiltonian, analogous to the total energy) directly from data.
In this network, the mapping from the position and momentum to the associated time derivatives is developed indirectly in the learning process. It passes through an intermediate physics-inspired computational step that is embedded in the network to approximate the Hamiltonian, whose derivatives with respect to the inputs are then taken based on the Hamiltonian equations.
Furthermore, given the multidimensionality of the dynamic system, a Hamiltonian neural network with an extended number of input/output channels is developed in this study. The improved computational graph provides for incorporating the effects of each subsystem and the coupling between them in learning the Hamiltonian.

Coupled nonlinear oscillator
The performance of the proposed hybrid modeling approach is evaluated using a coupled nonlinear oscillator representing a sufficiently complex dynamic system. Coupled nonlinear oscillators represent the dynamics of a wide range of physical, chemical, and biological systems. A coupled oscillator with a quartic nonlinearity is considered in this study, which is used to model free vibrations of stretched strings [39], the dynamics of coupled nonlinear pendulums [40], etc. The schematic of this system is shown in Fig. 2, where x and y show the two degrees of freedom and m_1 and m_2 stand for the masses of the subsystems. Also, k_lx and k_ly represent the linear stiffness coefficients associated with the x and y degrees of freedom, respectively, and k_nx and k_ny stand for the corresponding nonlinear coefficients. Also, k_c denotes the coefficient of the nonlinear stiffness of the coupling. Assuming m_1 = m_2, k_lx = k_ly = a, k_nx = k_ny = b, and k_c = c, the Hamiltonian can be written as follows [41]:

H = (ẋ² + ẏ²)/2 + a(x² + y²)/2 + b(x⁴ + y⁴)/4 + c x² y²/2    (5)

where ẋ and ẏ show the time derivatives of the two degrees of freedom and a, b, and c are constant parameters. The details of model development and deployment are explained in this section. The corresponding graphical representation is illustrated in Fig. 3. The samples, which include initial conditions for each subsystem, are distributed randomly between E_min and E_max, where i and j index the development and deployment samples, respectively.
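The Hamiltonian above and the equations of motion it implies can be evaluated directly. The sketch below assumes unit masses and the quartic x²y² coupling form (a common choice for this system, following our reading of [41]); the parameter values are those quoted later in Sect. 2.4.

```python
# Coupled quartic oscillator: assumed unit masses, with px = x-dot and
# py = y-dot, so H = (px^2 + py^2)/2 + a(qx^2 + qy^2)/2
#                  + b(qx^4 + qy^4)/4 + c*qx^2*qy^2/2.
a, b, c = 0.1, 0.1, 3.0  # parameter values from Sect. 2.4

def hamiltonian(qx, px, qy, py):
    kinetic = 0.5 * (px**2 + py**2)
    potential = 0.5 * a * (qx**2 + qy**2) + 0.25 * b * (qx**4 + qy**4)
    coupling = 0.5 * c * qx**2 * qy**2
    return kinetic + potential + coupling

def equations_of_motion(qx, px, qy, py):
    """Analytic Hamilton's equations for the assumed H."""
    qx_dot = px
    px_dot = -(a * qx + b * qx**3 + c * qx * qy**2)
    qy_dot = py
    py_dot = -(a * qy + b * qy**3 + c * qy * qx**2)
    return qx_dot, px_dot, qy_dot, py_dot

# At the origin the system is at rest: all derivatives vanish.
assert equations_of_motion(0.0, 0.0, 0.0, 0.0) == (0.0, 0.0, 0.0, 0.0)
```

In the hybrid model, `equations_of_motion` is exactly what the Hamiltonian neural network has to recover indirectly, by learning a scalar function whose symplectic gradient matches the observed derivatives.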
In the development process of the model, n initial conditions are selected randomly. Then, the position, momentum, and associated time derivatives of each subsystem are observed in a time range [0, τ1] with a time step of dτ. Each sample includes q_x, p_x, q_y, and p_y as the inputs and q̇_x, ṗ_x, q̇_y, and ṗ_y as the outputs, which are associated with the n trajectories starting from the initial conditions over the observation time range ([0, τ1]). The mapping between these input and output sets fully characterizes the dynamics of the system, and the model is developed to capture it.
In the deployment stage, which evaluates the performance of the model, the model is employed to predict the system response. Hence, it is used in an initial value problem (IVP) for unseen values of the initial conditions. Accordingly, m unseen initial conditions are selected randomly. Each initial condition, comprising q_x0, p_x0, q_y0, and p_y0, is used in the IVP along with the developed model to roll out the dynamics of the system. Numerical time integration (INTG) is carried out in the time range [0, τ2] to predict the time evolution of each subsystem. The IVP is solved using the fourth-order Runge-Kutta method as the numerical time integrator. The outputs are the predicted q_x(t), p_x(t), q_y(t), and p_y(t) within the time range [0, τ2]. As discussed earlier, this study focuses on the extrapolation capability of the hybrid model, so it is assumed that τ2 = kτ1, where k > 1. The time range [τ1, τ2], marked by a yellow region in Fig. 3, covers data to which the model has not been exposed, so it is used for evaluating the extrapolation capability of the model.
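The deployment rollout can be sketched as a classic fourth-order Runge-Kutta integration of whatever derivative function the trained model provides. In this hedged sketch (our own construction), `dynamics` stands in for the model's predicted time derivatives; a harmonic oscillator is used in its place so the result can be sanity-checked against the known conserved energy.

```python
import numpy as np

def rk4_step(f, state, dt):
    """One classic fourth-order Runge-Kutta step for state' = f(state)."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def rollout(f, state0, dt, n_steps):
    """Solve the IVP from state0, returning the full trajectory."""
    traj = [np.asarray(state0, dtype=float)]
    for _ in range(n_steps):
        traj.append(rk4_step(f, traj[-1], dt))
    return np.stack(traj)

# Stand-in for the trained model: harmonic oscillator q' = p, p' = -q.
dynamics = lambda s: np.array([s[1], -s[0]])
traj = rollout(dynamics, [1.0, 0.0], dt=0.01, n_steps=3000)

# RK4 keeps the oscillator's energy essentially constant over the run.
energy = 0.5 * (traj[:, 0] ** 2 + traj[:, 1] ** 2)
assert np.max(np.abs(energy - energy[0])) < 1e-6
```

In the actual deployment step, `dynamics` would be the BNN's direct output or the HNN's symplectic-gradient output, and the rollout horizon would extend past the development range to probe extrapolation.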

Network optimization
At first, we employ a standard neural network called the baseline neural network (BNN) to model the dynamics of the coupled oscillator. The BNN is a feedforward network with a multi-layer perceptron architecture comprising multiple fully connected layers of neurons. Unlike the Hamiltonian neural network, the BNN only performs a forward computational pass in each iteration before calculating the error. As a result, the position and momentum of each subsystem are directly mapped to their corresponding time derivatives within the network.
Then, a Hamiltonian neural network (HNN) is employed for the same modeling and analysis, as outlined in Sect. 2.2. In this network, both forward and backward computational paths are taken in each iteration before calculating the error in accordance with the modified computational graph. The forward path approximates a scalar function similar to the Hamiltonian, while the backward path takes in-graph derivatives based on the Hamiltonian equations to compute the time derivatives of position and momentum for two degrees of freedom. This customized graph enables the incorporation of Hamiltonian concepts in characterizing the dynamics of the system and enforcing the network to comply with general principles governing the system.
For performance evaluation of the models and analysis, the values of the parameters of the dynamic system and networks are set following Sheeja and Sabir [41]: a = 0.1, b = 0.1, c = 3, n = 200, E_min = 0, E_max = 1, τ1 = 30, dτ = 0.01, k = 10, and an 80/20% split for development and deployment. The hyperparameters of the networks' architecture, including the number of hidden layers and neurons, are determined using a grid search. The number of hidden layers is selected from the range [1, 4], and the number of neurons is selected from the range [4, 256]. Both networks have three fully connected layers, each with 200 hidden nodes and a hyperbolic tangent (tanh) activation function. The performance measure is root-mean-square error (RMSE), and the Adam optimizer with an initial learning rate of 0.001 is used for training both the BNN and HNN. The optimization is stopped after 800 and 1000 epochs for the BNN and HNN, respectively.
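The grid-search mechanics over the architecture ranges quoted above can be sketched as follows. This is purely illustrative: `validation_rmse` is a placeholder standing in for a full train-then-validate cycle, crafted here so that the 3-layer, 200-neuron configuration wins, matching the architecture reported in the text.

```python
import itertools

# Candidate grid over the quoted ranges [1, 4] layers and [4, 256] neurons
# (the exact grid points used in the paper are not stated; these are ours).
layer_choices = [1, 2, 3, 4]
neuron_choices = [4, 16, 64, 200, 256]

def validation_rmse(n_layers, n_neurons):
    """Placeholder score; in practice this would train the network on the
    development split and return its validation RMSE (smaller is better)."""
    return abs(n_layers - 3) + abs(n_neurons - 200) / 256

# Exhaustively score every configuration and keep the best one.
best = min(itertools.product(layer_choices, neuron_choices),
           key=lambda cfg: validation_rmse(*cfg))
assert best == (3, 200)
```

The same loop applies unchanged whether the model is the BNN or the HNN; only the training routine behind `validation_rmse` differs.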

Results
This section aims to evaluate the model's performance using statistical and physical metrics. The statistical metric used is the root-mean-square error (RMSE), where the error is the mismatch between prediction and observation for individual samples of the deployment set. It quantifies the fitness of the models in predicting each data point over trajectories starting from m unseen initial conditions. The results show that the HNN and BNN converge to a comparable statistical performance measure (RMSE), meaning there is no significant difference in their performance.
Given that this metric quantifies the fitness of the model statistically, it may not be representative enough to measure the consistency of the model with the governing physics. Therefore, a physical metric based on energy is considered to evaluate the compliance of the model with the underlying physics. The measure used is the energy deviation error (Er_RMSE), which quantifies the root-mean-square of the error, where the error is the difference between the predicted and conserved energy of each trajectory. Given that energy is conserved for each trajectory regardless of time length, this measure can also be used for evaluating the extrapolation capability of each model. Figure 4 shows the results of the comparison between the performance of the HNN and BNN within the ranges [0, τ1] and [τ1, τ2] using the physical measure. In detail, as described in Sect. 2.4, the HNN and BNN are used in an IVP for the m unseen initial conditions reserved for model deployment to obtain Fig. 4a. The IVP is solved within the time range [0, τ1], which is the same as the time range of the model development data. The energy deviation error (Er_RMSE) is calculated for each trajectory. Then, the mean of this physics-based performance metric is compared for the BNN and HNN in Fig. 4a. It can be seen that the HNN shows a relative improvement of 13% with respect to the BNN in compliance with the conservation of energy.
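The energy deviation metric can be written down compactly. The sketch below reflects our reading of the definition above; `energy_fn` and the toy trajectory are illustrative placeholders, not the paper's data.

```python
import numpy as np

def energy_deviation_rmse(energy_fn, trajectory, true_energy):
    """RMS difference between the energy evaluated along a predicted
    trajectory (rows are states over time) and the conserved true value."""
    predicted = np.array([energy_fn(state) for state in trajectory])
    return float(np.sqrt(np.mean((predicted - true_energy) ** 2)))

# Toy 1-DOF example: every state on this trajectory has energy 0.5, so a
# perfectly energy-conserving prediction scores zero deviation.
energy_fn = lambda s: 0.5 * (s[0] ** 2 + s[1] ** 2)
perfect = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
assert energy_deviation_rmse(energy_fn, perfect, 0.5) == 0.0
```

Because the true energy of each trajectory is fixed by its initial condition, this metric can be evaluated over any horizon, which is what makes it suitable for probing the extrapolation range [τ1, τ2].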
The same analysis is carried out within the time range [τ1, τ2], which represents the extrapolation range. It can be seen that the relative improvement increases to 82% in this case. Comparing the value of this physics-based performance measure in the two time ranges shows a significant improvement in the extrapolation range. This demonstrates the notable capability of the hybrid model in providing physically plausible extrapolation, a strength conferred by the physics-based bias embedded in the development of the model. In contrast, analyzing the BNN shows that the energy error accumulates as the time range expands. This can be attributed to the absence of knowledge of the physics of the dynamic system, which brings about a violation of the governing physics and an inability to capture the long-term dynamics of the system. Therefore, the corresponding error grows as the model goes beyond the development range.
In order to provide further analysis, an unseen initial condition ([q_x0, p_x0, q_y0, p_y0]^T = [0.1, 0.1, 0.1, 0.1]^T) is selected, and the predictions of the models are investigated. The energy deviation of the HNN and BNN over time is shown in Fig. 5. Deviation values versus time steps are obtained after solving the IVP for the selected initial condition within the time ranges [0, τ1] and [τ1, τ2]. The latter time range, which represents the extrapolation, is shown in the yellow region. This figure shows that the BNN drifts off as time passes, which results in the accumulation of error in the extrapolation range. It can be concluded that using the BNN for the prediction of conditions outside the training range intensifies the deviation from the underlying physics of the problem. However, the HNN shows relatively higher compliance with energy conservation, which originates from the physics bias imposed on the development process. For the selected initial condition, the two phase spaces associated with each subsystem are illustrated in Fig. 6. Figure 6a, b shows the results of BNN predictions over the range [0, τ2] for the two subsystems. It can be seen that as time passes, the phase-space trajectory expands, which is in sharp contrast with the conservative characteristics of the dynamic system. Figure 6c, d shows the results of the same analysis for the HNN. Notwithstanding oscillations in the trajectories, it shows a higher consistency with the ground truth data compared to the BNN. The hybrid model shows 20, 6, 27, and 30% improvements in predicting q_x, p_x, q_y, and p_y, respectively, for the selected initial condition. These results are in accordance with the error of energy deviation, which again emphasizes the importance of incorporating physical knowledge into hybrid model development.

Discussion
One of the primary goals of dynamic system modeling is achieving an abstraction of a system with high predictive accuracy. In physics-based reasoning, the models aim to develop a generic representation of dynamic systems based on causalities and interaction of the underlying mechanisms. That results in achieving analytic abstraction and governing constitutive relations with high predictive accuracy. However, developing such models is not trivial (and, in fact, oftentimes impossible) for many real-world systems owing to uncertainty, high dimensionality, and complexity of systems.
On the other hand, data-driven models can extract statistical relationships based on correlations between observations with reduced demands for a detailed understanding of the underlying mechanisms. However, their predictive accuracy for scenarios outside the training range is a major weakness that hampers their range of application. This task becomes even more complicated when modeling a multidimensional dynamic system with coupled subsystems, where the interaction of the subsystems and the coupling between them further complicate the inference of the dynamics from data. The issue with predictive accuracy can be attributed to the fact that these models predict the future based on what they learn from historical observations. If these observations are not informative enough for the model to capture the system's long-lasting and generic characteristics, its predictive accuracy will be adversely affected.
The current study proposes using a dynamic system's invariant characteristics, which are unchanged under certain transformations, for hybrid modeling. These properties, known as symmetries, and the conservation laws they induce can benefit hybrid models. This study imposes an inductive bias on the hybrid model based on time-translation symmetry, which mandates energy conservation. The hybrid model targets accurately learning the dynamics of the multidimensional system and improving extrapolation capability.
In the analyzed example, the imposed bias improved the extrapolation capability of the model by enforcing it to comply with the underlying physics. Inadequacy of historical data for learning the underlying physics was compensated by the imposed bias. The proposed hybrid reasoning can be considered an example of integrating invariant characteristics of a system into modeling. They can be integrated into modeling in the form of fundamental priors, physical constraints, and inductive bias which reduce the hypothesis space of a model and result in physically plausible predictions.
Also, in this study, the computational graph of the model is customized by increasing the number of input/output channels, enabling it to learn the dynamics of a multidimensional system. This extension is critical given the facility of energy-based reasoning compared to the Newtonian approach. The energy of the dynamic system is the sum of the energies of the subsystems, and there is no need to decouple the subsystems and calculate the internal forces under the energy conservation assumption. The results of this study are scalable to high-dimensional systems in which energy-based analysis of a dynamic system is preferred.

Fig. 6 Comparison between the phase spaces generated by a BNN for the x degree of freedom, b BNN for the y degree of freedom, c HNN for the x degree of freedom, and d HNN for the y degree of freedom

Conclusion
The current study is concerned with hybrid modeling of a multidimensional coupled nonlinear system. The hybrid model is empowered with Hamiltonian mechanics concepts, given the relative ease of energy-based reasoning in modeling multidimensional systems. An artificial neural network with a modified computational graph is used as the hybrid model. The first modification includes an intermediate scalar function representing the Hamiltonian learned from data. The second modification increases the number of input and output channels, enabling the network to capture the dynamics of the multidimensional coupled system. The hybrid modeling is targeted toward achieving physically plausible extrapolation for data outside the training range. The performance of this model is compared to that of a similar standard neural network. The physics-based prior that originates from energy conservation enables the hybrid model to return long-term predictions without violating the underlying physics. The pure data-driven model predicts the future entirely based on observations from the history of the dynamic system; in contrast, the hybrid model is endowed with physical knowledge about the future of the system.
Results show that the hybrid model outperforms the pure data-driven model by up to 14% and 82% in the energy deviation metric within and outside the training range, respectively. The notable difference between the performances outside the training range demonstrates the high extrapolation capability of the hybrid model. In other words, the hybrid model remains consistent with the underlying physics over the range of data to which it has not been exposed. The model's predictive capability is critically important since it can be used for a wide range of applications, such as design, response prediction, optimization, control, diagnostics, and prognostics. Also, it has been shown that the proposed hybrid model is scalable to multidimensional dynamic systems with coupled subsystems. In high-dimensional systems, energy-based reasoning for modeling the dynamics of a system is preferred; unlike Newtonian methods, there is no need to decouple the subsystems and calculate the internal forces under the energy conservation assumption. Evaluating the scalability of the proposed modeling approach for learning the dynamics of systems with higher dimensions is the topic of our ongoing research.