Port-metriplectic neural networks: thermodynamics-informed machine learning of complex physical systems

We develop inductive biases for the machine learning of complex physical systems based on the port-Hamiltonian formalism. To satisfy by construction the principles of thermodynamics in the learned physics (conservation of energy, non-negative entropy production), we modify the port-Hamiltonian formalism accordingly, so as to achieve a port-metriplectic one. We show that the constructed networks are able to learn the physics of complex systems by parts, thus alleviating the burden associated with the experimental characterization and subsequent learning process of this kind of system. Predictions can be done, however, at the scale of the complete system. Examples show the performance of the proposed technique.


1 Introduction
Recently, the possibility of developing learned simulators has attracted important research activity in the computational mechanics community and beyond. By "learned simulators" we mean methodologies able to learn from data the dynamics of a physical system so as to perform accurate predictions about previously unseen situations, without the burden associated with the construction of numerical models by means of finite elements, finite volumes or similar techniques [1,2,3]. Among their advantages we can cite that they are based on reusable architectures, can be optimized to work under very stringent real-time feedback rates, and are especially well suited for optimization and inverse problems.
While original, black-box approaches showed great promise, both industry and academia are reluctant to generalize their use, since small modifications in the input data may produce nonsensical results. This is at the origin of the development and use of inductive biases during the learning process [3,4]. An inductive bias allows the learning algorithm to prioritize one particular solution over any other [5]. This is particularly interesting for physical phenomena for which previous knowledge exists. Paul Dirac once said that [6] 'The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble.' Therefore, in the presence of centuries of knowledge about virtually any physical phenomenon, it makes little sense to ignore it and to favor theory-blind, black-box approaches.
In this paper we develop a novel strategy based on the port-Hamiltonian formalism, which we extend so as to comply with the first and second principles of thermodynamics by construction [7,8,9]. Port-Hamiltonian formalisms extend the well-known Hamiltonian (thus, conservative) physics to open systems and introduce the possibility of dissipation and control through external actuation within this theory. We show here, however, that general port-Hamiltonian systems do not comply a priori with the laws of thermodynamics, and we modify them so as to ensure this fulfillment. Based on this new formalism, which we call port-metriplectic, since it is at the same time metric and symplectic, we construct a deep neural network methodology to learn the physics of complex systems from data. The resulting port-metriplectic networks comply by construction with the principles of thermodynamics (which can be enforced through hard or soft constraints), while allowing complex systems to be analyzed by parts. These parts then communicate through energy ports to construct the final, complex system.
The outline of the paper is as follows. In Section 2 we review the state of the art in the development of machine learning strategies that impose energy conservation by using a Hamiltonian formalism. We include here neural networks based upon port-Hamiltonian formalisms, which we show are not necessarily compliant with the principles of thermodynamics. We then develop the concept of port-metriplectic networks in Section 3. In Section 4 we analyze the performance of the developed neural networks, while in Section 5 we draw some conclusions.

2 Hamiltonian neural networks

2.1 Reversible dynamics as an inductive bias
Learning the physics of a given phenomenon from data can be seen as learning a dynamical problem [10]. If we assume that the problem is governed by a set of variables z, which we can measure experimentally (a detailed discussion on the limitations and implications of this assumption can be found at [11]), then the problem of learning the evolution of the system in time can be seen as finding the structure of the dynamical problem or, in other words, finding by regression the flow map

\dot{z} = f(z, t), \quad z(0) = z_0. \quad (1)

Equivalently, we must find the particular form of the function f governing the dynamics of the system. This is done by regression, where neural networks play an important role, but by no means constitute the only possibility, as in [12,13,14].
This particular way of seeing the problem has important advantages. For instance, if the system under scrutiny is known to be conservative, or reversible, we can impose as an inductive bias the Hamiltonian form of the sought function f,

\dot{z} = L \frac{\partial H}{\partial z}, \quad (2)

where the Hamiltonian, H, whose canonical form depends on the positions and momenta of the particles, is now the total energy of the system, E,

\dot{z} = L \frac{\partial E}{\partial z}. \quad (3)

Under this prism, problem (1) is now seen as finding the precise form of the skew-symmetric (symplectic) matrix L and the form of the energy of the system, E(z). If we enforce the particular form given by Eq. (3) during the regression procedure, it is straightforward to prove that the resulting evolution will be conservative.
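As an illustration, this conservative bias can be sketched numerically. In the minimal sketch below (our own illustrative example, not the architecture used in this paper), a hand-coded quadratic energy stands in for a learned network E, and the dynamics are constrained to the form of Eq. (3):

```python
import numpy as np

# Sketch of a Hamiltonian bias: the vector field is forced to be
# z_dot = L grad E(z), with L skew-symmetric. Here E is a known
# quadratic energy standing in for a learned network.

def grad_E(z):
    # E(z) = 0.5 * (q^2 + p^2), a harmonic-oscillator energy, so grad E = z
    return z

L = np.array([[0.0, 1.0],
              [-1.0, 0.0]])  # canonical symplectic (skew-symmetric) matrix

def step(z, dt=1e-3):
    # forward-Euler integration of z_dot = L grad E(z)
    return z + dt * L @ grad_E(z)

z = np.array([1.0, 0.0])
E0 = 0.5 * z @ z
for _ in range(1000):
    z = step(z)

# The continuous flow conserves E exactly, since
# dE/dt = grad E . (L grad E) = 0 by skew-symmetry of L; the small
# residual drift below is only the first-order integration error.
drift = abs(0.5 * z @ z - E0)
```

With a symplectic integrator instead of forward Euler, the discrete drift would remain bounded as well; the point of the sketch is only the structural constraint on the vector field.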
Many works have leveraged this approach. Several authors take advantage of the Hamiltonian structure to construct symplectic integrators to predict conservative dynamical systems [15,16,17]. Others use the Hamiltonian principles to design more expressive deep neural network architectures [18] or to find the Hamiltonian function and phase space from data [19,20]. The Hamiltonian paradigm is also widely used in quantum mechanics, where similar deep learning literature can be found in problems such as electron dynamics [21], learning ground states [22] or optimal control [23]. Alternative formulations can be developed by resorting to the equivalent Lagrangian formalism, see [4,24,25,26,27], among others.

2.2 Port-Hamiltonian neural networks
If the physical phenomenon at hand is known to be dissipative, or if the system is open and thus no guarantee on the conservation of energy exists, things become more intricate. For dissipative systems, the simplest form of the evolution Eq. (1) could be, perhaps, a gradient flow [28]. Its evolution can be established from some (dissipation) potential R in the form [29]

\dot{z} = -\frac{\partial R}{\partial z}.

Recently, the so-called symplectic ODE nets (SymODEN) [30,31] have tackled the issue of introducing dissipation in the learned description of the physics. This is also the approach followed in [32]. More recently, two distinct works have tackled the dissipation problem by relaxing equivariance in the networks [33,34].
These works are closely related to the vast body of literature on the port-Hamiltonian approach to dynamical systems [7,9,8]. Port-Hamiltonian systems assume an evolution of the system in the form

\begin{pmatrix} \dot{q} \\ \dot{p} \end{pmatrix} = \left( \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix} - \begin{bmatrix} 0 & 0 \\ 0 & D \end{bmatrix} \right) \begin{pmatrix} \partial H / \partial q \\ \partial H / \partial p \end{pmatrix} + \begin{pmatrix} 0 \\ g(q) \end{pmatrix} u, \quad (4)

where (q, p) are the generalized positions and momenta, dissipation is included by adding a symmetric, positive semi-definite matrix D, and control is considered through an actuation term u and a non-linear function of the position g(q). Eq. (4) reduces to the Hamiltonian description if neither dissipation nor control is considered. Here, we have assumed a canonical form for the Hamiltonian, i.e., that it depends on a set of variables z = {q, p}. More general forms can be expressed similarly.
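A hedged numerical sketch of this port-Hamiltonian structure for a damped harmonic oscillator follows; H, D and g are chosen by hand here, whereas in the learned setting they would be outputs of a network:

```python
import numpy as np

# Port-Hamiltonian evolution for a damped oscillator (illustrative choices):
# [q_dot; p_dot] = ([[0, 1], [-1, 0]] - [[0, 0], [0, D]]) grad H + [0; g(q)] u

D = 0.1  # scalar dissipation coefficient, positive semi-definite

def grad_H(z):
    # H = 0.5 * (q^2 + p^2), so grad H = (q, p)
    return z

def f(z, u=0.0, g=lambda q: 1.0):
    q, p = z
    J_minus_R = np.array([[0.0, 1.0],
                          [-1.0, -D]])
    return J_minus_R @ grad_H(z) + np.array([0.0, g(q) * u])

z = np.array([1.0, 0.0])
dt = 1e-3
for _ in range(5000):
    z = z + dt * f(z)  # no actuation: u = 0

# With u = 0, dH/dt = -D * p^2 <= 0, so the stored energy must decay.
H_final = 0.5 * z @ z
```

The same sketch shows why the formalism models open systems: with a non-zero actuation u, the boundary term g(q) u injects or extracts energy through the port.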
The true advantage of using port-Hamiltonian formalisms as inductive biases in the learning procedure stems from the fact that, on the one hand, they allow the introduction of dissipation and control and, on the other, they model open systems (as opposed to classical Hamiltonian descriptions, where energy conservation inherently assumes that the system is closed) [35].
Therefore, the use of port-Hamiltonian formalisms as inductive biases in learning processes is extremely interesting. However, as will be demonstrated in the next section, classical port-Hamiltonian schemes do not guarantee a priori compliance with the laws of thermodynamics, see [11].

2.3 Metriplectic biases for dissipative phenomena
In the case of dissipative phenomena, the first to propose the introduction of a second potential, the so-called Mathieu potential, seem to have been Morrison [36,37], Grmela [38,39] and Kaufman [40]. They suggested considering an evolution of the governing variables of the type

\dot{z} = L \frac{\partial E}{\partial z} + M \frac{\partial S}{\partial z}, \quad (5)

where S is precisely this second (dissipation) potential, the entropy.
This formulation is often referred to as metriplectic, since it is metric and symplectic at the same time. Here, M(z) is a symmetric, positive semi-definite dissipation matrix and L(z), the Poisson matrix, continues to be skew-symmetric.
However, for this formulation to be consistent with the principles of thermodynamics, two additional conditions must hold, the so-called degeneracy conditions,

L \frac{\partial S}{\partial z} = 0, \quad (6)

and

M \frac{\partial E}{\partial z} = 0, \quad (7)

which give rise to the General Equation for the non-Equilibrium Reversible-Irreversible Coupling (GENERIC) equations [41,42,43,44,45].
In a nutshell, Eqs. (6) and (7) state that the energy potential is independent of dissipation, whereas the entropy is unrelated to energy conservation. If they hold, it is straightforward to demonstrate that

\dot{E} = \frac{\partial E}{\partial z} \cdot \dot{z} = 0,

given the skew-symmetry of L, and that

\dot{S} = \frac{\partial S}{\partial z} \cdot \dot{z} \geq 0,

given the positive semi-definiteness of M.
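These properties can be checked numerically. The toy sketch below (our own illustrative choice of E, S, L and M, not taken from the paper) builds operators that satisfy the degeneracy conditions and verifies both thermodynamic statements:

```python
import numpy as np

# Toy metriplectic system with state z = (q, p, s): a damped oscillator
# whose dissipated mechanical energy is converted into entropy.

alpha = 0.1  # friction strength

def grad_E(z):
    q, p, s = z
    return np.array([q, p, 1.0])     # E = 0.5*(q^2 + p^2) + s

def grad_S(z):
    return np.array([0.0, 0.0, 1.0])  # S = s

L = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])       # skew-symmetric Poisson matrix

def M(z):
    # M = alpha * v v^T with v = (0, 1, -p): symmetric, positive semi-definite
    q, p, s = z
    v = np.array([0.0, 1.0, -p])
    return alpha * np.outer(v, v)

z = np.array([1.0, 0.5, 0.0])

# degeneracy conditions (6) and (7) hold by construction
assert np.allclose(L @ grad_S(z), 0.0)
assert np.allclose(M(z) @ grad_E(z), 0.0)

zdot = L @ grad_E(z) + M(z) @ grad_S(z)
E_dot = grad_E(z) @ zdot   # exactly zero: first principle
S_dot = grad_S(z) @ zdot   # alpha * p^2 >= 0: second principle
```

The construction M = alpha * v v^T is one simple way to guarantee positive semi-definiteness while keeping M grad E = 0; any learned M would have to preserve the same two properties.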
These properties have been leveraged in some of our former works to develop what we have coined as thermodynamics-informed neural networks [46,47,48].
Given experimental data sets D_i containing labelled pairs of a single-step state vector z_t and its evolution in time z_{t+1}, we construct a neural network by considering two different loss terms. First, a data-loss term takes into account the correctness of the network prediction of the state vector at subsequent time steps by integrating GENERIC in time,

\mathcal{L}^{\text{data}}_n = \left\| \dot{z}^{\text{GT}}_n - \dot{z}^{\text{net}}_n \right\|_2^2,

where \dot{z}^{\text{GT}}_n is the ground-truth solution and \dot{z}^{\text{net}}_n is the network prediction. The choice of the time derivative instead of the state vector itself is employed to regularize the global loss function to a uniform order of magnitude with respect to the degeneracy terms.
We then consider a second loss term to take into account the fulfillment of the degeneracy equations in a soft way,

\mathcal{L}^{\text{deg}}_n = \left\| L \frac{\partial S}{\partial z} \right\|_2^2 + \left\| M \frac{\partial E}{\partial z} \right\|_2^2.
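A minimal sketch of how the two loss terms could be combined follows; the function name, argument layout and weighting scheme are illustrative assumptions, not the paper's notation:

```python
import numpy as np

# Sketch of a two-term, thermodynamics-informed loss: a data term on the
# predicted time derivative plus a soft penalty on the degeneracy residuals.

def thermo_loss(zdot_gt, zdot_net, L, M, grad_E, grad_S, w_deg=1.0):
    l_data = np.sum((zdot_gt - zdot_net) ** 2)       # data term
    l_deg = (np.sum((L @ grad_S) ** 2)               # soft L dS/dz = 0
             + np.sum((M @ grad_E) ** 2))            # soft M dE/dz = 0
    return l_data + w_deg * l_deg

# toy check: a perfect prediction with operators that satisfy the
# degeneracy conditions exactly yields zero loss
L = np.array([[0.0, 1.0], [-1.0, 0.0]])
M = np.zeros((2, 2))
gE = np.array([1.0, 2.0])
gS = np.array([0.0, 0.0])
zdot = np.array([1.0, 0.0])
loss_val = thermo_loss(zdot, zdot, L, M, gE, gS)
```

In practice both terms are evaluated on mini-batches and weighted, as described below; here the snapshot dimension is dropped for clarity.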
These networks have demonstrated to work very well for physics perception and reasoning in combination with computer vision [49,50].
These two loss terms are weighted and averaged over the N_{\text{batch}} batched snapshots.
Alternative formulations of these thermodynamics-informed networks exist in which the degeneracy conditions are imposed in a hard way, see [51] and [52].
It is worth noting that, by comparing Eqs. (5) and (6)-(7), on one side, and Eq. (4), on the other, one readily concludes that port-Hamiltonian biases do not necessarily ensure the fulfillment of the principles of thermodynamics. Note that, since entropy does not enter the classical port-Hamiltonian formulation, it is difficult to impose the fulfillment of the second principle of thermodynamics. Therefore, we suggest extending the GENERIC formalism to open systems so as to develop alternative port-metriplectic biases. These are developed in the next section.

3 Port-metriplectic neural networks
Very few works exist, to the best of our knowledge, on the development of GENERIC formulations for open systems that may lead to the development of port-metriplectic formulations. Maybe the only exception is [53], later on revisited by [54,55], both published in conference proceedings and, of course, with no machine learning approximations. Both approaches are essentially identical, and start from the bracket formulation of GENERIC,

\dot{z} = \{z, E\} + [z, S],

with \{\cdot, \cdot\} the Poisson bracket and [\cdot, \cdot] the dissipative bracket. For open systems, these brackets take the form

\{z, E\} = \{z, E\}_{\text{bulk}} + \{z, E\}_{\text{bound}},

and

[z, S] = [z, S]_{\text{bulk}} + [z, S]_{\text{bound}}.

In other words, both brackets are decomposed additively into bulk and boundary contributions. With this decomposition in mind, the GENERIC principle (5) now reads

\dot{z} = \left( L \frac{\partial E}{\partial z} + M \frac{\partial S}{\partial z} \right)_{\text{bulk}} + \left( L \frac{\partial E}{\partial z} + M \frac{\partial S}{\partial z} \right)_{\text{bound}}. \quad (13)

The degeneracy conditions (6) and (7) must be satisfied by the bulk operators only, since it is possible, in general, that there is a reversible flux of entropy at the boundary or, equivalently, an irreversible flux of energy at the boundary [53],

L_{\text{bulk}} \frac{\partial S}{\partial z} = 0,

and

M_{\text{bulk}} \frac{\partial E}{\partial z} = 0.

The particular form of the boundary terms in Eq. (13) depends, of course, on the particular phenomenon under scrutiny, but in a general way it can be expressed using boundary L and M operators. More particular expressions can be developed if we know in advance some properties of the system at hand. For instance, in Section 4.1 we deal with a double pendulum by learning the behavior of each pendulum separately. If we know in advance that the only boundary term comes from the energy-entropy pair transmitted by the other pendulum, and no other external contribution is present, more detailed assumptions in the form of degeneracy conditions can be made. This may lead to a decrease in learning time or the use of less data.
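The bulk/boundary split can be illustrated with a small numerical sketch (the operators below are illustrative choices, not learned quantities): the bulk terms alone conserve energy, while the boundary port makes the subsystem open.

```python
import numpy as np

# State z = (q, p, s). Bulk operators satisfy the degeneracy conditions;
# the boundary operator models (illustratively) an entropy inflow port.

def grad_E(z):
    return np.array([z[0], z[1], 1.0])   # E = 0.5*(q^2 + p^2) + s

def grad_S(z):
    return np.array([0.0, 0.0, 1.0])     # S = s

L_bulk = np.array([[0.0, 1.0, 0.0],
                   [-1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])

def M_bulk(z):
    # symmetric PSD friction metric with M_bulk grad E = 0
    v = np.array([0.0, 1.0, -z[1]])
    return 0.1 * np.outer(v, v)

M_bound = np.diag([0.0, 0.0, 0.05])      # external entropy flux at the port

z = np.array([1.0, 0.5, 0.0])

zdot_bulk = L_bulk @ grad_E(z) + M_bulk(z) @ grad_S(z)
E_dot_bulk = grad_E(z) @ zdot_bulk       # zero: bulk conserves energy

zdot = zdot_bulk + M_bound @ grad_S(z)
E_dot = grad_E(z) @ zdot                 # non-zero: the subsystem is open
```

In the learned setting, the boundary operators would encode the energy-entropy pair exchanged with the neighboring subsystem through the port.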
Fig. 1 sketches the approach developed herein for complex systems. In the numerical results section below we explore the particular form that these terms may acquire for both finite- and infinite-dimensional problems.
We propose two learning procedures, which correspond to different levels of information available about the dynamics of the system. In the first example, we focus on two coupled subsystems, for which we learn the self and boundary contributions of both subsystems to the global dynamics of the problem. This is the case when the interest is focused on the complete system, divided into smaller subsystems.
In the second example, we suppose that the external influence is determined by a load vector resulting from an unknown external interaction with another subsystem. Thus, the learning procedure is focused on the self and boundary contributions of only one subsystem subject to an external interaction. This case is convenient for applications where only partial information about the system is available.
4 Numerical results

4.1 Double thermoelastic pendulum
The first example is a double thermoelastic pendulum consisting of two masses m_1 and m_2 connected by two springs of variable lengths \lambda_1 and \lambda_2 and natural lengths at rest \lambda_1^0 and \lambda_2^0, as depicted in Fig. 2.
Figure 2: Double thermoelastic pendulum. A single pendulum with external perturbations is learned, and the coupling between both systems is achieved via the port-metriplectic framework.
The set of variables describing each pendulum is here chosen to be

z = \{q, p, s\},

where q, p and s are the position, linear momentum and entropy of the pendulum mass, respectively.
The evolution of the state variables of the second pendulum is defined as

\dot{z}_2 = L_2 \frac{\partial E_2}{\partial z_2} + M_2 \frac{\partial S_2}{\partial z_2} + M_{\partial 2} \frac{\partial S_{\partial 2}}{\partial z_2}, \quad (16)

where the first two terms describe the self contribution of the single pendulum (conservative and dissipative effects) and the third term describes the dissipative effect produced by the first pendulum acting on the second pendulum.
On the other hand, the evolution of the state variables of the first pendulum is defined as

\dot{z}_1 = L_1 \frac{\partial E_1}{\partial z_1} + M_1 \frac{\partial S_1}{\partial z_1} + L_{\partial 1} \frac{\partial E_{\partial 1}}{\partial z_1} + M_{\partial 1} \frac{\partial S_{\partial 1}}{\partial z_1}, \quad (18)

where in this case the first two terms describe the self contribution of the first single pendulum (conservative and dissipative effects) and the third and fourth terms describe the external contributions to the conservative and dissipative parts, both produced by the influence of the second pendulum on the first pendulum.
Note that the first pendulum has no conservative contribution to the second pendulum, i.e., the term L_{\partial 2} \frac{\partial E_{\partial 2}}{\partial z_2} does not exist. However, there is a conservative contribution from the second pendulum on the first pendulum, see [57].
It is worth noting, as previously pointed out in [35], that the fact that every term in Eq. (18) depends on the state variables z_1 makes the learning procedure more intricate. This is caused by the non-separable structure of Eq. (18). This problem is not present if the port terms depend only on time, as is the case in Section 4.2 below. To overcome this limitation, we employ a structure-preserving neural network for each of the terms in Eq. (18). These networks share their weights for both pendula, however, if the pendula are known in advance to be identical.
Using individual approximations of the dynamics of each subsystem (each pendulum) allows the use of considerably smaller artificial neural networks with respect to an analysis of the whole problem, which would require a larger number of variables to describe the global state [46].
The database consists of 50 different simulations with random initial conditions of position q and linear momentum p of both masses m_1 and m_2, around mean positions and linear momenta of q_1 = (4.5, 4.5) m, p_1 = (2, 4.5) kg·m/s, and q_2 = (-0.5, 1.5) m, p_2 = (1.4, -0.2) kg·m/s, respectively. The masses of the double pendulum are set to m_1 = 1 kg and m_2 = 2 kg, joined by springs of natural lengths \lambda_1^0 = 2 m and \lambda_2^0 = 1 m, thermal constants C_1 = 0.02 J and C_2 = 0.2 J, and conductivity constant \kappa = 0.5. Note that the double pendulum constitutes a closed system as a whole, but this is not the case for each one of the single pendula. Both start from a temperature of 300 K. The simulation time of the movement is T = 60 s, in N_T = 200 time increments of \Delta t = 0.3 s.
The box plot in Fig. 3 shows the statistics of the relative L2 error of the train and test rollout simulations.

4.2 Interacting beams
In this example we consider two viscoelastic beams that can interact through contact, see Fig. 4, and whose physics are to be learned. Synthetic data come from finite element simulations, assuming a strain energy potential of the type

\Psi = C_{10}(I_1 - 3) + C_{01}(I_2 - 3) + \frac{1}{D_1}(J^{\text{el}} - 1)^2,

with J^{\text{el}} the elastic volume ratio, I_1 and I_2 the two invariants of the left Cauchy-Green deformation tensor, C_{10} and C_{01} shear material constants, and D_1 the material compressibility parameter. The viscoelastic behavior is described by a two-term Prony series of the dimensionless shear relaxation modulus, with relaxation coefficients \bar{g}_1 and \bar{g}_2 and relaxation times \tau_1 and \tau_2.
We assume that the necessary state variables for a proper description of the beams are the position q, its velocity v and the stress tensor \sigma at each node of the discretization of the beams. Since both beams are identical, see Fig. 4, we characterize only one of them and develop a port-metriplectic learned simulator for the joint system. To do so, we employ thermodynamics-informed graph neural networks [48].
Basically, a graph neural network is constructed on top of a graph structure G = (V, E, u), where V = {1, ..., n} is a set of |V| = n vertices, E ⊆ V × V is a set of |E| = e edges, and u is the global feature vector. Each vertex and edge in the graph is associated with a node in the finite element model from which data are obtained. The global feature vector defines the properties shared by all the nodes in the graph, such as constitutive properties. More details on the precise formulation can be found at [48].
To ensure translational invariance of the learned model, the position variables of the system, q_i, are assigned to the edge feature vector e_{ij}, so the edge features represent relative distances (q_{ij} = q_i - q_j) between nodes. The rest of the state variables are assigned to the node feature vector v_i. We employ an encode-process-decode scheme [3], built upon multilayer perceptrons (MLPs) shared between all the nodes and edges of the graph.
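The relative-position construction can be sketched as follows (the function name and data layout are our own, hypothetical choices):

```python
import numpy as np

# Edge features built from relative positions q_ij = q_i - q_j: absolute
# positions never enter the network, which makes the learned model
# invariant to rigid translations of the whole system.

def build_edge_features(q, edges):
    # q: (n, dim) array of nodal positions; edges: list of (i, j) index pairs
    return np.array([q[i] - q[j] for i, j in edges])

q = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0]])
edges = [(0, 1), (1, 2)]

e = build_edge_features(q, edges)

# Shifting every node by the same vector leaves the edge features unchanged:
e_shifted = build_edge_features(q + np.array([3.0, -2.0]), edges)
```

Rotational invariance, by contrast, would require further processing (e.g. using edge lengths instead of raw difference vectors), which is a separate design choice.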
We use this graph-based framework to learn the self contribution of the dynamics, i.e., the first two terms of Eq. (16). The boundary terms are learned using a standard structure-preserving neural network [46] with the additional input of the external forces applied to the beam. The results are presented in Fig. 5. The error magnitude is similar to that reported in previous work [48], while adding the consistent formulation of port-metriplectic dynamics.

5 Conclusions
In this paper we have made two main contributions. On the one hand, we have developed port-Hamiltonian-like approximations for dissipative open systems that communicate with other systems by exchanging energy and entropy through ports in their boundaries. This formulation extends the classical port-Hamiltonian approaches while guaranteeing the fulfillment of the laws of thermodynamics (conservation of energy in the bulk system, non-negative entropy production). The resulting formulation, which we refer to as port-metriplectic, since it consists of a metric term and a symplectic one, presents a rigorous thermodynamic description of the dissipative behavior of the system.
On the other hand, the just-developed formulation is employed as an inductive bias for the machine learning of the physics of complex systems from measured data. This bias is developed as a soft constraint in the loss term, although it can also be imposed straightforwardly as a hard constraint.
The resulting neural networks, for which we have formulated two distinct versions, one based on standard multilayer perceptrons and a second one based on graph neural networks, have shown excellent performance. Error bars are equivalent to those obtained in previous works of the authors employing a closed-system approach to the same physics. The new approach opens the door to the development of learned simulators for complex systems through piece-wise learning of the physical behavior of each of their components. The final, global simulator is then obtained by assembling the pieces through their ports.

Figure 1 :
Figure 1: Complex system created as a connected group of subsystems \Omega_i. The dynamics of each subsystem is described using a GENERIC formulation including conservative and dissipative terms, taking also into account the external contributions in the boundary terms \Gamma_i. Subsystems communicate with each other through ports by exchanging energy and entropy.

Figure 3 :
Figure 3: Box plots for the relative L2 error for all the rollout snapshots of the double pendulum in both train and test cases.The state variables represented are position (q), momentum (p), and entropy (s).

Figure 4 :
Figure 4: One single beam problem is analyzed from simulation data.The resulting system is composed of two of these beams interacting together.

Figure 5 :
Figure 5: Box plots for the relative L2 error for all the rollout snapshots of the interacting beams in both train and test cases. The state variables represented are position (q), velocity (v), and stress tensor (σ).