Information Geometry of Dynamics on Graphs and Hypergraphs

We introduce a new information-geometric structure associated with the dynamics on discrete objects such as graphs and hypergraphs. The presented setup consists of two dually flat structures built on the vertex and edge spaces, respectively. The former is the conventional duality between density and potential, e.g., the probability density and its logarithmic form induced by a convex thermodynamic function. The latter is the duality between flux and force induced by a convex and symmetric dissipation function, which drives the dynamics of the density. These two are connected topologically by the homological algebraic relation induced by the underlying discrete objects. The generalized gradient flow in this doubly dual flat structure is an extension of the gradient flows on Riemannian manifolds, which include Markov jump processes and nonlinear chemical reaction dynamics as well as the natural gradient and mirror descent. The information-geometric projections on this doubly dual flat structure lead to information-geometric extensions of the Helmholtz-Hodge decomposition and the Otto structure in $L^{2}$ Wasserstein geometry. The structure can be extended to non-gradient nonequilibrium flows, from which we also obtain the induced dually flat structure on cycle spaces. This abstract but general framework can extend the applicability of information geometry to various problems of linear and nonlinear dynamics.


I. INTRODUCTION
Information geometry is finding and establishing its firm position as the geometric language in various scientific disciplines [1,2]. Information geometry enables us to gain an intuitive understanding of the structures behind complicated problems of inference and estimation, for which Euclidean or Riemannian geometry is not sufficient. In addition, it can provide ways to devise new solutions and approaches for the problems[1]. While information geometry was originally developed for statistics, its applicability is now not limited to statistical problems. If notions of probability, information, or positive density appear in a problem, it is natural enough to consider its information-geometric structure.

A. Information geometry of dynamics
Dynamical systems and phenomena can be potential targets for information geometry, as we conventionally consider dynamics of probability distributions [3][4][5], e.g., the Fokker-Planck equations (FPE) and the Master equation, or those of positive densities, e.g, population dynamics, epidemic models, diffusion dynamics on networks, and chemical reaction dynamics [6][7][8]. Although applications of information geometry to those dynamics have been attempted almost since its birth, information geometry for dynamics is much less organized and principled compared with that for static problems in statistics, optimization, and others [1]. In connection with statistical inference, information geometry was employed by Amari and others to investigate Gaussian time series and autoregressive moving average (ARMA) models by representing their power spectrum as parametric manifolds [9][10][11]. This idea was also used to investigate linear systems [12]. Markov jump processes on finite states [1] were investigated information-geometrically by considering the hierarchical structure of joint or conditional probabilities at different time points, e.g., P θ (x 1 , x 2 , · · · , x t ) [13]. or by introducing exponential families of Markov kernel (transition matrices), T θ (x|x ′ ), via exponential tilting of the kernel [14? -20]. Furthermore, information geometry was applied to random walks and nonlinear diffusion equations of porous media [21,22]. In relation to mechanics, integrable systems were associated with the dualistic gradient flow of information geometry in seminal works [23,24], and other connections of information geometry with Lagrangian or [1] Also known as Markov chains Hamiltonian mechanics have been pursued [25][26][27].

B. Information measures for dynamics
Concurrently with and almost independently of these attempts within the community of information geometry, information measures relevant to information geometry have been employed and manifested in various problems of dynamical systems and stochastic processes in information theory [28], filtering theory [29,30], control theory [31][32][33], and non-equilibrium physics and chemistry [34][35][36]. Kullback-Leibler (KL) divergence [37] for probabilities and positive densities was found to work as a Lyapunov function of Markov jump processes (MJP) [5], FPE [3,38], deterministic chemical reaction networks (CRN) [39,40], and other dynamical systems [41,42], the origin of which can date back to Gibbs' H-theorem [43]. Among those topics, since the establishment of chemical thermodynamics by Gibbs [43] and chemical kinetics by Guldberg and Waage [44], CRN has worked as a seedbed for cultivating the theory between dynamics and divergence due to its close connection with thermodynamics [45][46][47].
In addition to the KL divergence, the Fisher-information-like quantity I F [p] := p(r)(∇ r ln p(r)) 2 dr ∈ R ≥0 (1) was also revealed to play important roles in characterizing dynamics for densities on a continuous space, e.g., Gaussian convolution, diffusion processes, and FPE [53][54][55]. Various governing equations in physics were claimed to be derived in a unified way from this quantity [35]. The quantity I F looks like the Fisher information [56] but is different from the conventional Fisher information matrix [57][58][59] because the derivative ∇ r ln p(r) is not for the parameters but for the base space variable of p(r) [2] . Because I F is a scalar, we follow [58] and call it Fisher information number. The Fisher information number I F was found to be related to the Kullback-Leibler divergence in additive Gaussian channels [53] and other [2] The relation between the two forms of Fisher information has been explained in multiple ways. For example, they are related as the shift of the base space via parameters [35,60]. The Fisher information number was introduced by Rao [57].
systems [55,61], which is known as the De Bruijn identity [53]. In addition, the logarithmic Sobolev inequality is also viewed as a relation between the Fisher information number and the KL divergence (or Shannon information) [62,63]. These results have recently been associated with the formal Riemannian geometric structure induced by the L 2 -Wasserstein geometry [64,65].

C. Information geometry and dynamics in machine learning
On top of these traditional trends, information geometry is now playing pivotal roles in machine learning to design and evaluate online optimization algorithms (dynamics) in the space of model parameters such as natural gradient [66] and mirror descent [67,68] as well as evolutionary computation (information-geometric optimization) [69]. Geometric interpretation allows us to understand the behaviors and efficiency of algorithms and their dynamics more intuitively in a principled manner [68][69][70].

D. Aim and contributions of this work
Despite the wide applicability and the history of information geometry, we still lack a solid theoretical framework to unify these outcomes that spread across different fields from the viewpoint of information geometry. In this work, we introduce a new information geometric structure for the dynamics of probability and positive densities. In this structure, we consider not only the single dually flat structure built on the space of densities as in [23,24] but also another structure constructed on the space of fluxes. The two of them are linked algebraically and topologically via the continuity equation and the gradient equation as illustrated in Fig. 1.
Under this doubly dual flat structure, we can consider the dynamics of densities as a generalized flow, and various results known so far can be unified in this framework. We exclusively consider dynamics of densities on finite-dimensional discrete manifolds, i.e., finite graphs or hypergraphs, because the structure introduced here can be explicitly manifested in this setup and also because we do not need the mathematically elaborated setup for infinite-dimensional information geometry on a smooth manifold [71]. For the case of FPE in a continuous state space, the dually flat structure built on the flux space is reduced to the formal Riemannian geometric structure of L 2 Wasserstein geometry because the convex functions that induce the dually flat structure become quadratic. Our structure generalizes the linear inner product on the tangent and cotangent spaces of the Riemannian structure with the nonlinear Legendre transform, thereby requiring information and Hessian geometry.
By elucidating this information geometric structure, we can easily see that some quantities such as the bilinear product, convex thermodynamic potential functions, the Fisher information matrix, and the Fisher information number are consolidated into one quantity for FPE with the quadratic convex functions (see Sec. V D and Sec. V E). Therefore, our structure provides a way to unify the dualistic gradient flow mentioned in Sec. I A and also the information-number related topics in Sec. I B. From the viewpoint of homological algebra, the structure we work on is a modification of the chain and cochain complexes of graphs or hypergraphs by replacing the usual inner product duality [72] on each pair of chains and cochains with Legendre duality. Moreover, the dually flat space built on the flux space is linked to a finite-dimensional version of Orlicz spaces [73], which have been employed for constructing infinite-dimensional information geometry [71]. From this generalization and the nice properties of the doubly dual flat structures, we can obtain generalized versions of the Helmholtz-Hodge-Kodaira (HHK) de-composition (Thm. 1), the Otto calculus (Thm. 2), and its extension to cycle spaces(Thm. 3).
Our construction of information geometry for dynamics is heavily based on the idea of  [74][75][76][77][78][79][80][81]. We clarified its information-geometric aspects in the context of CRN and thermodynamics in our previous work [82]. We also concurrently elucidated the intimate link of equilibrium chemical thermodynamics and information geometry on the density state space [83][84][85]. In light of those, the contribution of this work is three-fold. First, we integrate these results in terms of information geometry, which clarifies the underlying geometric nature of the problem, provides transparent interpretations for known results, and leads to new information geometric results and insights (Thm. 1-Thm. 3); Second, this structure substantially extends the applicability of information geometry to a wide variety of dynamical problems and can unify the results obtained in different disciplines; Lastly, the structure links information geometry to algebraic graph theory, discrete calculus, and algebraic homology, which was not appreciated before and provides a way to consider the topology of the underlying manifold in information geometry.

E. Organization of this paper
This work is organized as follows: In Sec. II, we introduce the models of dynamics on graphs and hypergraphs. In Sec. III, we outline the homological algebra of graphs and hypergraphs. In Sec. IV, we abstractly introduce the doubly dual flat structures on the density and flux spaces and define the generalized flow associated with the structures.
In Sec. V, we clarify that the introduced structures include a wide class of dynamics on graph and hypergraph. In Sec. VI and Sec. VII, we further define information-geometric objects and quantities, which naturally appear from the setup and play an integral role in the subsequent analysis of dynamics. In Sec. VIII and Sec. IX, we derive several results for equilibrium and nonequilibrium flows, respectively. Finally, we provide a summary and prospects of our work in Sec. X. The notation and symbols are listed in Appendix.

II. CLASSES OF MODELS FOR DYNAMICS ON GRAPH AND HYPERGRAPH
In this work, we focus on linear and nonlinear dynamics defined on graphs [86] and hypergraphs [87].
The linear dynamics on graphs (LDG) includes Markov jump processes (MJP) [88], consensus dynamics [89], monomolecular chemical reaction networks [90], linear electric circuits [91], and other linear dynamics on graphs [86,92]. We consider an extension of LDG to hypergraphs and nonlinear dynamics, common instances of which are chemical reaction networks (CRN) with the law of mass action (LMA) kinetics [8] and polynomial dynamical systems (PDS) [93]. Because the extension we deal with in this work is a subclass of nonlinear dynamical systems on hypergraphs, we use CRN to designate this subclass.
In the following subsections, LDG and CRN are introduced using the language of algebraic graph theory [86,94]. Then, we also give a brief and formal introduction of the Fokker-  [4] (Fig. 2 (a)). The incidence relation is represented by the incidence otherwise.
An edge-weighted finite graph G k ± := ({v i }, {e e }, B, {k ± e }) has two positive weighting parameters k ± e = (k + e , k − e ) ∈ R >0 for each edge e e . Parameters k + e and k − e are denoted as forward and reverse rates or weights of the edge e, respectively. [4] This means that we exclude loops. A reversible linear dynamics (rLDG) on a graph is defined on the edge-weighted finite graph G k ± : Definition 2 (Reversible linear dynamics on a graph G k ± ). The reversible linear dynamics of the non-negative density x(t) = (x 1 (t), · · · , x Nv (t)) T ∈ R Nv ≥0 on G k ± is defined by the continuity equationẋ and linear forward and reverse one-way fluxes j ± (x) = (j ± 1 (x), · · · , j ± Ne (x)) T ∈ R Ne ≥0 with the following specific functional form [5] : where j(x) := j + (x) − j − (x) is the total flux, the symbol • denotes the component-wise product of two vectors [6] , and B + and B − are the head and tail incidence matrices defined respectively as B + := max[B, 0] and B − := max[−B, 0]. The incidence matrix B in Eq. 2 is often regarded as the discrete divergence operator on a graph [72] and denoted also as [5] We may consider other functional forms for j(x), which can induce nonlinear dynamics on the graph.
In this work, we focus mainly on the linear case. [6] Also known as the Hadamard product or Schur product of vectors. div B = B to emphasize this interpretation in this work [7] .
A reversible Markov jump process (rMJP) is a representative class of the rLDG describing random jumps of noninteracting particles on G k ± . The weighting parameter k + e is interpreted as the forward jump rate from the head of the oriented edge e e to its tail, whereas k − e is the reverse jump rate from the tail to the head of e e [8] . For infinitely many such particles, we consider p i (t) ∈ [0, 1], the fraction of particles on vertex v i at time t, which is a non-negative density on vertices. Then, the forward and reverse one-way fluxes on the eth edge defined by Eq. 3 are represented as where v + e and v − e are the head and tail vertices of edge e e [9] . The linearity of j ± e (p) with respect to p comes from the independence of the dynamics of each particle on the graph.
Definition 3 (Weighted asymmetric graph Laplacian [94,95]). For G k ± , the corresponding weighted asymmetric graph Laplacian is defined by where θ := (k + , k − ). Using L θ , Eq. 2 and Eq. 3 are represented aṡ often called a discrete gradient operator [72]. For a Euclidean space, they are the same. [8] If we allow k ± e to be 0, we can include the irreversible MJP and also LDG in this formulation.
We leave this extension for future work because it should require additional assumptions on the Legendre duality introduced in the subsequent sections. [9] Here, we have abused the notation v + e to indicate the index of the vertex v + e . Eq. 3 is reduced to Eq.
4 because b ± i,e = +1 only when i is the index of the head (tail) vertex v ± e of e e and 0 otherwise.
The operator L θ is reduced to the weighted symmetric graph Laplacian if k + = k − and also to the conventional graph Laplacian if k + = k − = 1 [94,95]. Eq. 6 can cover linear transport on graphs, consensus dynamics on graphs [89], and other linear dynamics on graphs [86,92].
B. Chemical reaction network and polynomial dynamics on hypergraphs Next, we introduce a class of nonlinear dynamical systems on hypergraphs, which includes the rLDG (Eq. 2 and Eq. 3) as a special case. The most common instance is deterministic chemical reaction networks (CRN) with the law of mass action (LMA) kinetics [7,8,44,96], and this class is sometimes referred to as polynomial dynamical systems (PDS). Because the major part of the PDS theory has been developed for CRN, we use CRN to introduce and specify this class of nonlinear systems in this work. where N X , N e ∈ Z >0 (Fig. 2 (b)). Each hyperedge e e connects two different hyperverticeŝ v + e andv − e wherev + e =v − e [10] . The hypervertices are multisets of vertices is the number of the ith vertex included in the ℓth hypervertex. Thus, the nonnegative integer vector γ ℓ := (γ 1,ℓ , · · · , γ N X ,ℓ ) T ∈ Z N X ≥0 defines the ℓth hypervertex. Let Nv be the total number of the hypervertices and L := be the hypervertex matrix. The matrix B ∈ {0, ±} Nv×Ne is the incidence matrix encoding the incidence relations among the hypervertices and the hyperedges.
The hypergraph incidence matrix S ∈ Z N X ×Ne is then defined as If In the context of CRN theory, the vertices {X i } correspond to the molecular species involved in a CRN, and each hyperedge e e represents a pair of forward and reverse reactions: [10] This means that we exclude loop hyperedges.
where the forward and reverse reactions are from left to right and from right to left, respectively. Head and tail hyperverticesv + e := (γ + 1,e X 1 + · · · + γ + N X ,e X N X ) andv − e := (γ − 1,e X 1 + · · · + γ − N X ,e X N X ) in Eq. 8 are the sets of reactants and products of the eth forward reaction, respectively. More specifically, γ + i,e ∈ Z and γ − i,e ∈ Z are the numbers of the molecules X i involved as the reactants and products of the eth forward reaction, respectively.
For the reverse reaction,v − e andv + e are the reactants and products. The hypervertices are called complexes in CRN theory [8] [11] . From {γ + i,e } and {γ − i,e }, we can define where ∓s e specify the change in the number of molecules induced when the eth forward or reverse reaction occurs just once, respectively. The hypergraph incidence matrix S defined in Eq. 7 is represented as S = (s 1 , · · · , s Ne ) ∈ Z N X ×Ne . In chemistry, the negative of s e and S, i.e., −s e and −S, are called the stoichiometric vector and matrix, respectively [8].
Remark 1. To define a reversible CRN hypergraph, the hypergraph matrix S is not sufficient.
If the head and tail hypervertices of a hyperedge can contain the same vertex (molecule), the corresponding element in S of such a shared vertex becomes 0 by canceling out. Thus, the existence of shared vertices (molecules) is invisible in S, and the pair ( L , B) is required to define H. Such shared molecules are called catalysts in CRN.
For a CRN hypergraph, the continuity equation for CRN is defined: Definition 5 (CRN continuity equation). Let a vector of nonnegative densities x = (x 1 , · · · , x N X ) T ∈ R N X ≥0 represents the concentration of molecules {X i }. The CRN continuity equation is defined asẋ where j + e (x) ∈ R ≥0 and j − e (x) ∈ R ≥0 are the one-way fluxes of the eth forward and reverse reactions, j ± (x) := (j ± 1 (x), · · · , j ± Ne (x)) T ∈ R Ne ≥0 are their vector representations, and j(x) := j + (x) − j − (x) ∈ R Ne is the total reaction flux [7,8,96]. The hypergraph divergence operator div S := S is defined accordingly. [11] Because we use the term complex to for the cell complex in homological algebra, we use hypervertices to indicate the complexes of the CRN theory.
To define the dynamics of a CRN, the functional form of j ± e (x) is required [12] . Before introducing specific forms, we define two important properties of the flux and also other functions defined on edges: Definition 6 (Consistency of the flux j ± (x) with the hypergraph H). The one-way flux The consistency condition is indispensable to prohibit a reaction that can decrease x i from occurring when x i = 0. For j ± (x), the locality means that the flux of the eth reaction depends only on the concentrations of its reactants and products. The local flux is determined solely by the information stored on the vertices incident to the edge and plays a crucial role when we regard the structure introduced in this work as an extension of differential forms on continuous manifolds to graphs and hypergraphs. When we work on specific forms of fluxes, we consider only local fluxes consistent with the given hypergraph H.
In chemistry, we have a variety of candidates for the functional form of flux, e.g., the Michaelis-Menten function, Hill's function, and others [7,97]. Among those, the LMA kinetics is the most basic and well-established one.  [12] Even though the functional form of j ± e (x) is automatically determined in the case of LDG because of the linearity, we have multiple possibilities to define j ± e (x) for the nonlinear case.
where k + e ∈ R >0 and k − e ∈ R >0 are the reaction rate constants of the eth forward and reverse reactions, respectively. In this form, it can be compactly represented as where [13] . We use the subscript MA as in j ± MA (x) to discriminate this specific form of the flux from others. We can easily observe that j ± MA (x) is consistent and local with respect to H. Furthermore, j ± MA (x) is specified by the edge-weighted CRN hypergraph Owing to the functional form of Eq. 12, a CRN with LMA kinetics is also called a PDS in which we have the polynomials of x on the right-hand side of Eq. 10.
Remark 2 (Algebraic aspect of LMA kinetics). Because x L T is a vector of monomials of x, each one-way flux, j + e (x) and j − e (x) are monomials of x under Eq. 12 and thus the total flux j e (x) = j + e (x) − j − e (x) is a binomial. This fact links the real algebraic geometry of the toric varieties [98,99] to CRN [100,101] as it does in algebraic statistics [102,103].
Remark 3 (Extended LMA kinetics). While we mainly work on the normal LMA kinetics, we can extend it. The extended LMA kinetics defined on H is defined as where g(x) ∈ R Ne >0 for x ∈ R Ne ≥0 and is local with respect to H [14] . An example of the extended LMA kinetics is reversible Michaelis-Menten kinetics [105].
By combining the continuity equation (Eq. 10) and the LMA kinetics (Eq. 12), we have the following: where L θ is the weighted asymmetric graph Laplacian defined as in Eq. 5. Now, we can see that CRN contains rLDG (Eq. 6) as a special case if L = I. Owing to this inclusion relation, [13] We should note an important relation, Let r ∈ R d be a vector in a d dimensional Euclidean space. We consider infinitely many noninteracting particles randomly walking in the space and describe the dynamics by a probability density p t (r) of the particles. The continuity equation where j FP [p t (r)] ∈ R d is the probability flux, ∇ := (∂/∂r 1 , · · · , ∂/∂r d ) T is the gradient operator on the Euclidean space, and (∇·) : where F (r) ∈ R d is the drift force, D 0 ∈ R >0 is the diffusion constant.

AND HYPERGRAPHS
The algebraic and topological structure of the dynamics on graphs and hypergraphs can be explicitly and abstractly treated by using the language of discrete calculus and homological algebra. The discrete version of the gradient and divergence mentioned in Sec. II is also characterized. In this section, we briefly introduce the chain and cochain complexes defined for a finite graph or a hypergraph and discrete calculus [72,94,106,107]. We first introduce the complexes for a graph G and then extend them to a hypergraph H algebraically [15] . It [15] The complex used here should not be confused with complexes used in CRN theory [8] should be noted that the conventional discrete calculus (the discrete version of the theory for differential forms) presumes the metric structure in the dual space of chains and cochains or that of cochains on primal and dual complexes [108,109]. However, we are going to introduce Legendre duality instead. For this purpose, our introduction of chain and cochain complexes depends only on the topological (algebraic) information of the underlying graph and hypergraph [72] without specifying the metric information.

A. Chain and cochain complexes on graphs
The elements of a graph G are called cells in discrete calculus and algebraic graph theory [16] . A vertex and an edge are, respectively, called 0-cell and 1-cell, and the graph G is denoted as a cell-complex [17] . For each type of cell, we consider vectors (chains and cochains) defined on the cells. For G, a 0-chain with field R is an N v -tuple of real scalars, each of which is assigned to a vertex, i.e., a 0 cell. Thus, a 0-chain is a real vector defined on the vertices of G with the basis {v i }. This basis is called the standard basis. The components of the vector x ∈ C 0 (G) are given as x(v i ) := x i . The vector space of real 0-chains is called the vertex space and denoted as C 0 (G) = R Nv [94] [18] [19] . Similarly, a real 1-chain is a real vector defined on the edges of G. The real vector space of 1-chains is called the edge space and denoted as [16] We follow the terminology in [72]. While we use "cell", we do not presume any n-dimensional topological manifold underlying the graph. The graph is just treated algebraically as in algebraic graph theory and homological algebra. The definition of the higher-order elements than edges for a graph requires additional structural information to the incidence matrix of the graph, e.g., the edge-face incidence matrix. [18] In algebraic graph theory, the chain of a graph G is defined as an integer-valued vector space Z Nv to represent the discrete and combinatorial nature of G and also to specify the domain of integration. Here, we use R as the field of the vector space. Specifically, our chains C p (G; R) should be considered as the cochains on the dual complex C p (G * ; R), which is isomorphic to C p (G; R). In the theory of differential forms on N -dimensional manifolds, p-forms (p cochains) and (N − p)-forms (N − p cochains) have the same dimension. The pform and the (N − p)-form are linked by the Hodge star operator. For a cell complex, this duality be-C 1 (G) = R Ne . The standard basis is introduced by using edges {e e } accordingly. A flux j is a 1-chain: j(e e ) := j e [20] . The graph incidence matrix B induces the discrete differential δ 1 : C 1 (G) → C 0 (G) as δ 1 j := Bj [21] .
To obtain an exact sequence, we algebraically define the (−1) and 2 chains and the corresponding differentials δ 0 and δ 2 .
is a set of complete basis of KerB where v i ∈ {0, +1, −1} Ne . In algebraic graph theory, KerB is called a cycle subspace [86,94,110]. For a graph G, we can construct {v i } i∈[1,Nz] by, for example, using the fundamental cycle basis of G obtained from a fixed spanning tree of G [22] [86]. Thus, C 2 (G) is the vector space defined on the cycles of G and isomorphic to the cycle subspace. We define a matrix, V := (v 1 , · · · , v Nz ) [23] , and the differential From the matrix U := (u 1 , · · · , u N l ) T , the differential δ 0 : C 0 (G) → C −1 (G) is defined as  [72]. Here, to make the presentation simple without introducing the dual complex, we avoid these technical arguments. [19] In the theory of differential forms, the density is an N -form (cochain) of anN -dimensional manifold, which is linked to a 0-form via the Hodge star operator. [20] Again, in the theory of differential forms, the flux is an (N − 1)-form (cochain) of an N dimensional manifold. Through the Hodge star operator, we identify it with the 1-chain with the field R. [21] In algebraic graph theory, B is also identical to the discrete boundary operator from C 1 (G; Z) to C 0 (G; Z). [22] The spanning tree chosen defines a fundamental cycle and cocycle bases. [23] V T is called the fundamental tieset matrix in graph theory the exact chain sequence [24] [25] : Because C p (G) is a vector space for each p = {−1, 0, 1, 2}, we can consider its dual vector space C p (G) := C * p (G) consisting of the linear functions on C p (G). An element of C p (G) is called p-cochain. Let ·, · : C p (G)×C p (G) → R be the standard bilinear pairing of the p-chain and p-cochain defined with the standard basis. The transposes of U, B, and V induce the differentials between cochains as δ −1 := U T : and δ 1 := V T : C 1 (G) → C 2 (G). The differentials δ p on cochains are the adjoint of the differentials δ p on chains, which induce the exact cochain sequence: Note that the definition of chains, cochains, and differential operators are topological in the sense that we do not include any metric information in them.

B. Chain and cochain complexes on hypergraphs
The definitions of chain and cochain complexes introduced above are algebraically extended to hypergraphs H simply by replacing the graph incidence matrix B with the hypergraph incidence matrix S.
Definition 9 (Exact chain and cochain sequences on a hypergraph). The chain and cochain complexes on a hypergraph are defined by the following diagram: [ 24] We should note that the sequence is not canonical because U and V depend on the choice of bases. [25] Upon necessity, we can consider the harmonic components by employing an under-complete basis for

V.
The bases, V and U, are obtained as integral bases, i.e., the components of V and U can be chosen from Z because S is an integer-valued matrix [26] . As we will explain in Sec. VI and Sec. IX, the meaning of C 2 (H) can be retained as the space on generalized cycles. The meaning of C −1 (H) becomes the space of conserved quantities under the dynamics (Eq. 10).

C. Discrete calculus on graphs and hypergraphs
The p-cochain and p-chain introduced above are the algebraic abstraction of the pdifferential form and its Hodge dual on a differential manifold [72]. Accordingly, the discrete versions of gradient, divergence, and curl are associated with the differentials (exterior derivative).
Definition 10 (Discrete gradients, divergences, and curls). The discrete gradient is defined as grad B := δ 0 = B T for G and also as grad S := δ 0 = S T for H. The adjoints of the gradients are defined with the corresponding adjoint differentials: grad * B := δ 1 = B and grad * S := δ 1 = S. They are called discrete divergences and denoted as div B = grad * B and div S = grad * S [27][28] [29] . The discrete curl and its adjoint are defined as curl V := δ 1 = V T and curl * V := δ 2 = V, respectively.

D. Metric structure in discrete calculus
In the theory of LDG, a metric matrix M p and its associated inner product are typically endowed for each p. To contrast it with the Legendre duality introduced later, we briefly outline it here. For an edge-weighted graph G k ± and for the case that k + = k − = k ∈ R Ne >0 , [26] As far as we know, we do not have a systematic and widely-appreciated way to define these bases because we have multiple ways to extend the notion of spanning tree of a graph to a hypergraph. [27] These notations are consistent with those in Sec.
II [28] If we define 3-cells, then the proper divergence is defined as the differentials C 2 (G) → C 3 (G). [29] This convention is confusing because, in the the-ory of differential form, the divergence is identified with δ 2 : C 2 → C 3 . This is because C 3 can be isomorphic to C 0 via the Hodge star operator in the three dimensional Euclidean space [72]. In algebraic graph theory, the trivial metric, i.e., the identity matrix, is implicitly employed when grad * B is identified with div B [94]. M 0 = I and M 1 = diag[1/k] are conventionally employed. With these metric matrices, the graph Laplacian introduced in Eq. 5 can be described as where M p := M −1 p . By including such metric information, the following pair of metric gradient and divergence is often used in graph theory and network theory: This symmetric graph Laplacian L k induces a linear dynamics on graph via Eq. 6:ẋ The eigenvalues and eigenvectors of L k enable us to obtain spectral information of the underlying graph [95]. Even for nonlinear dynamics on a hypergraph as in Eq. 14, the same symmetric Laplacian can provide some information when k + = k − = k. We can also include other information in the metric such as the degree of vertices [111]. Various normalizations of the graph Laplacian can be attributed to the choice of metrics.
However, such a choice of metric matrices ends up only with linear dynamics and is relevant only when the weighting is symmetric: k + = k − = k. In addition, it may not always capture important aspects of the dynamics such as gradient flow properties and informationtheoretic properties, because nonlinear terms such as ln p appear in information-theoretic quantities. To extend the class of dynamics being covered and to enable the informationgeometric characterization of dynamics, we have to generalize the conventional inner product structure by replacing it with the Hessian-information geometric structure induced by convex functions.

FLOW
In this section, we introduce two pairs of dually flat spaces ( Fig. 1): one is associated with the vertex spaces, i.e., the dual spaces of 0-chains and 0-cochains. The other corresponds to the edge spaces, i.e., the dual spaces of 1-chains and 1-cochains. By combining them, the dynamics on graphs and hypergraphs are characterized as a generalized flow.

A. Dually flat spaces on vertices and thermodynamic functions
We work on the density x and the vertex space for CRN because its reduction to rLDG is straightforward. For a probability vector p, the introduction of dually flat spaces of p and ln p is natural from the information-geometric viewpoint. In CRN, x is the vector of concentrations of molecular species. As we recently clarified [47,101], the dually flat spaces, in this case, result from the Legendre duality between extensive and intensive variables in thermodynamics, which is also natural from the physical viewpoint.
Definition 11 (Primal vertex space (density spaces)). The primal vertex space (also called density space) is the positive orthant X := R N X >0 of the vector space R N X , together with an isomorphism R N X ≃ C 0 (H) (Fig. 1, lower left).
Remark 4. The density space X is defined as the positive orthant rather than as R N X ≥0 , which excludes the cases where some elements of x become 0. From the viewpoint of information geometry, this restriction is necessary to consider densities with the same support (all x in X should be equivalent in terms of absolute continuity of measures). From the viewpoint of dynamical systems, depending on the specific functional form of the flux j(x), the trajectory is known as persistence [30] . Without going into this intricate problem, we simply assume that x(t) ∈ X for t ∈ [0, ∞]. We call ∂X := R N X ≥0 \ X the boundary of X .
We define the dual vertex space by the Legendre transform via the thermodynamic function: Definition 12 (Primal thermodynamic function). A strictly convex differentiable function Φ(x) : X → R is called the primal thermodynamic function [31] [32] if the following two [30] The persistence of a dynamical system is a hard problem, and the persistence for a subclass of CRN was an open problem until very recently [112,113], which goes by the name of Global Attractor Conjecture since 1974. [31] In information geometry, the convex function inducing duality is often called a potential function. We avoid using the word "potential" to dis-criminate it with an element of the dual vertex space, which is called a potential (field) or chemical potential in physics and chemistry. In addition, "thermodynamic" implies the additional two conditions we introduced. [32] We may consider a convex function Φ(x), which conditions are satisfied: (1) the associated Legendre transformation (2) for any x in ∈ X and any point on the boundary holds where Definition 13 (Dual vertex space (potential space) and dual thermodynamic function).
The dual vertex space Y = R N X called potential (field) space is an affine space dual to X with the associated vector space C 0 (H)(( Fig. 1, upper left)) [33] . The dual thermodynamic function Φ * (y) : Y → R is the Legendre-Fenchel conjugate of the primal thermodynamic function: where ·, · : C 0 (H) × C 0 (H) → R is the bilinear pairing under the standard basis. From the properties of the primal function, Φ * (y) is also a strictly convex differentiable function.
The Legendre transformations, ∂Φ and ∂Φ * , are continuous and establish a bijection between X and Y, where ∂Φ * = ∂Φ −1 . In the following, we regard a pair (x, y) with the same decoration as a Legendre dual pair satisfying y = ∂Φ(x). For a pair, the Legendre-Fenchel-Young identity holds: Different pairs are discriminated with the difference of decorations as (x ′ , y ′ ) or (x p , y p ).
Based on the thermodynamic functions, the Bregman divergences can be defined: does not induce a bijection between X and Y, e.g., the one which is not strictly convex. Such a situation can arise if a phase transition occurs. It would be an important extension to include this class of functions in this framework. [33] Y is associated with the 0-cochain. In the theory of differential forms, a 0-form is often described as a potential field on a manifold. Our choice of the potential space is consistent with this convention.
The non-negativity of the Bregman divergence follows from the Fenchel-Young inequality for products [115,116]. Furthermore, from the strict convexity of the thermodynamic function, Bregman divergences are defined for (y, y ′ ) and also for (x, y ′ ) as Because (x, y) and (x ′ , y ′ ) are Legendre pairs, all the three representations are equivalent [34] : Finally, the Hessian matrices of the primal and dual thermodynamic functions are defined when they are twice differentiable [36] : Definition 15 (Hessian matrices). The primal and dual Hessian matrices, G x ∈ R Nv×Nv and G * y ∈ R Nv×Nv , of thermodynamic functions, Φ(x) and Φ * (y), are defined as In addition, they are positive definite [37] and G −1 x = G * y holds for a Legendre dual pair x and y.
The Hessian matrices induce linear mappings between the tangent spaces T x X and T y Y where T x X and T y Y are isomorphic to the corresponding cotangent spaces T * y Y and T * x X and also to copies of C 0 (H) and C 0 (H): [117]. [35] We here used the Legendre-Fenchel-Young identity (Eq. 25). [36] When we work on Hessian matrices, we always suppose additionally that they are twice-differentiable. [37] The positive-definiteness follows from the properties of thermodynamic functions being strictly convex and twice differentiable.
The typical example of the duality between x and y in statistics is that between x = p and y = ln p. Other than this typical one, depending on the purpose, we adopt or design different forms of thermodynamic functions (Φ(x), Φ * (y)), associated dual variables, and Bregman divergence to endow different properties to inference or estimation methods that we are designing[1]. In the case of CRN, the thermodynamic functions and Legendre duality are associated with the equilibrium thermodynamics [118]. Specifically, as we recently demonstrated [119], X and Y are the conjugate spaces of the extensive and intensive thermodynamic variables (density of molecules and their chemical potential), Φ(x) is the thermodynamic potential function of the system, and the Bregman divergence becomes the difference of the total entropy. These correspondences are derived directly from the axiomatic formulation of thermodynamics [118,119]. The explicit functional form of Φ(x) is then determined by the physical details of the thermodynamic system that we work on.
Before closing this subsection, we introduce the notion of separability, which will be linked to the locality of the flux.

Definition 16 (Separability of a thermodynamic function). A thermodynamic function
where c i > 0 and x o i > 0 are positive weights and φ(x) : R >0 → R is a scalar primal thermodynamic function without condition (2) in Def. 12 [38] [39] . [40] . If a thermodynamic function is separable, then the corresponding Bregman divergence is separable. The Hessian matrices become diagonal for a separable thermodynamic function. Most of our results can hold without the separability, but common thermodynamic functions and related quantities are typically separable. For example, the Kullback-Leibler divergence is an example of separable Bregman divergences. [38] Separability should not be confused with the separability of topological space. The separability here is motivated by the separability of the multidimensional convolution operation. [39] The condition (2) is not necessary because it automatically follows from the strict convexity and the condition (1). [40] One may further generalize the separability such

B. Dually flat spaces on edges and dissipation functions
Next, we introduce another dually flat structure onto the edge space of graphs and hypergraphs based on the flux-force relation.
Definition 17 (Primal and dual edge spaces (flux and force spaces)). The flux and force spaces on the edges, J x = R Ne and F x = R Ne , are a pair of the primal and dual vector spaces defined for each x ∈ X , which are isomorphic to the copies of C 1 (H) and C 1 (H), respectively ( Fig. 1, right) [41] . The bilinear pairing under the standard basis ·, · : To introduce Legendre duality on (J x , F x ), we use the dissipation functions: Definition 18 (Dissipation function [42] , is a strictly convex and continuously differentiable function with respect to f for all x ∈ X that also satisfies the following additional conditions: Symmetric: Bounded below by 0: Proposition 1 (Duality of dissipation functions). The Legendre-Fenchel conjugate of , is also the dissipation function on J x . Ψ x (j) and Ψ * x (f ) are called primal and dual dissipation functions.
Proof. For each x ∈ X , the function Ψ x (j) is strictly convex, continuously differentiable, . From the convexity and symmetry, the minimum of Ψ x (j) is attained at j = 0 and [41] While we introduced the affine spaces for X and Y associated with C 0 (H) and C 0 (H), we here directly associate J x and F x with C 1 (H) and C 1 (H). [42] The definition of dissipation functions is more strict than those used in the previous works, e.g., [74]. This is because we define generalized projections in this space as in [82].
From these properties, for each x ∈ X, the one-to-one Legendre duality via Legendre transformations is established for all over (J x , F x ): In the following, we abbreviate the Legendre transformations as [43] . Similarly to the Legendre dual pair (x, y) in the spaces X and Y, a pair of flux and force with the same decoration, e.g., (j, f ) x or (j 0 , f 0 ) x , represents a Legendre dual pair linked by Eq. 34 at x. We omit the x-dependency for simplicity. The Legendre dual pair (j, f ) satisfies the Legendre-Fenchel-Young identity for each x ∈ X : Furthermore, the additional conditions of dissipation functions endow the Legendre duality with the properties necessary to work as an extension of a Riemannian metric structure: Proposition 2 ( [74]). The Legendre transformations satisfy the following properties: (Pairing of 0 ∈ J and 0 ∈ F ): The first property means that zero force f = 0 and zero flux j = 0 are always Legendre dual regardless of x, and the second one indicates that if (j, f ) is a Legendre dual pair, then (−j, −f ) is one as well [44] . The third property, as well as the nonnegativity of the dissipation functions, enable them to work as extensions of the norm [45] .
With the dissipation functions, Ψ x (j) and Ψ * x (f ), we now have the second dually flat structure on the edge spaces (J x , F x ). In these dually flat spaces, we define the Bregman divergence and Hessian matrices: [43] We do not use differentiation of Ψ * x (f ) and Ψ x (j) with respect to other variables. [44] From the physical point of view, these conditions are consistent with the thermodynamic requirement that, if the force is zero, the corresponding flux becomes zero, and vice versa and that a signreversed force induced the sign-reversed flux. [45] In the context of thermodynamics, the nonnegativity of j, f is linked to the nonnegativity of the entropy production rate and thus the 2nd law of thermodynamics.

Definition 19 (Bregman divergence and Hessian matrices on the edge spaces). For each
x ∈ X , the Bregman divergence between j ∈ J x and f ′ ∈ F x is defined as are also defined analogously to the Bregman divergence on the vertex space (X , Y). For a Legendre conjugate pair of twice differentiable dissipation functions, the Hessian matrices, G x,j and G * x,f , are defined as These matrices are positive-definite.
The Legendre dual structure via the dissipation functions provides an extension of a Riemannian metric structure in the following sense. If the dissipation function is a quadratic function, i.e., a positive definite quadratic form as where M * x is a positive definite N e × N e matrix, the Legendre transformation is reduced to the linear mapping ∂Ψ q, * x (f ) = M * x f [46] . Then, the bilinear paring, j, f ′ = j, M x j ′ = M * x f , f ′ , becomes the inner product under the metric matrix M x . The dissipation functions are associated with the induced norms: x . Finally, we also introduce the notion of separability to the dissipation functions: Definition 20 (Separability and locality of dissipation functions). A dissipation function where ω e (x) > 0 and f o e (x) > 0 for x ∈ X are positive weights and ψ * (f ) : R → R is a scalar dissipation function, i.e., a strictly convex differentiable scalar function satisfying Eq. 32, Eq. 33, and Eq. 31. If ω(x) and f o e (x) are additionally local, then the dissipation function is separable and local. If Ψ * x (f ) is separable, then its dual Ψ x (j) is also separable. The same is true for the locality.

C. Generalized flow on graphs and hypergraphs and its steady state
Because of the one-to-one Legendre duality between (j, f ) x , the continuity equation (Eq. 10) can be represented as a generalized flow driven by the force f (x) conjugate to j(x) [76]: This representation is not dependent on the specific functional form of f (x) and Ψ * x (f ) and also on the definition of div S as long as the generated j(x) is consistent with H [47] [48] .
Thus, we can potentially apply this framework to various systems by choosing these functions appropriately depending on the system or the problem we work on.
The generalized flow naturally encompasses three types of steady states: [47] The consistency is required because of our choice of R NX ≥0 as the state space. [48] The consistency with H is assumed to hold. We discuss this matter in Discussion.
where we used j( The following proposition ensures that the generalized gradient flow behaves like the conventional gradient flow: Proof. F (x t ) is non-increasing over time as follows: where Eq. 38 is used. The equality holds iff It should be noted that, even if F (x) has a single minimum, the steady state x st := lim t→∞ x(t) may not be the minimum, becauseḞ(x t ) = 0 holds for any x ∈ M DB [50] .
The generalized gradient flow of this form (Eq. 48) was devised in the process to extend the conventional gradient flow to metric spaces [124,125] [51] . Furthermore, dissipation functions have been recognized since the seminal work of Onsager [126][127][128]. However, only quadratic dissipation functions have been investigated until very recently [74][75][76][77][78][79][80][81]. This may be partly because we lack an adequate geometric language to handle the non-quadratic cases, i.e., information geometry [52] . Actually, if the dissipation function is quadratic Ψ q, * x [f ] as in Eq. 41, then the generalized flow (Eq. 43) formally reduces to the flow on a Riemannian manifold with the metric SM * x S T . The non-negativity ofḞ (x t ) is essentially attributed to the fact thatḞ(x t ) = − j(x), f (x) holds in Eq. 48 for the generalized gradient flow. The converse also holds.
Proposition 4 (De Giorgi's formulation of generalized gradient flow [74,78]). Let x t be a generalized flow induced by a force f (x). x t is the generalized gradient flow of [49] M ST = M DB = ∅ can hold, e.g., when F (x) is a strictly monotonous function. [50] In addition, there exists the possibility that x(t) converges to the boundary of X . [51] The metric here means a general metric, which is not restricted to one associated with the inner product. [52] Another reason is that the generalized gradient flow was devised to investigate infinite-dimensional partial differential equation, in which the formally obtained results via the gradient flow structure should be rigorously proven via a different approach.
Proof. For a generalized flow x t driven by force f (x) as in Eq. 43 and for any F (x), the following inequality holds: where we define f ′ (x) := grad S ∂F (x). The last inequality becomes an equality if and only if f ′ (x t ) is the Legendre dual of j(x t ) [53] , i.e., Thus, Eq. 49 holds only when x t is the generalized gradient flow of F (x).
De Giorgi's formulation is a well-established approach for defining gradient flow in metric spaces [124].

E. Equilibrium and nonequilibrium flow
In this work, we mainly focus on the case that where D X Φ is the Bregman divergence associated with a thermodynamic function Φ.
Definition 24 (Equilibrium gradient force and equilibrium flow). The force generated by the gradient of Bregman divergence associated with a thermodynamic function Φ is called the equilibrium gradient force, and the following equation is denoted as the thermodynamic gradient equation: [53] The inequality and the equality condition are usually derived by using Cauchy-Schwarz inequality [129]. From the information-geometric framework, they are trivially attributed to the non-negativity of Bregman divergence. wherex ∈ X is a parameter. A generalized flow x(t) is an equilibrium flow if it is driven by the thermodynamic gradient force : Using the relation ∂D X Φ [x x] = ∂Φ(x) −ỹ whereỹ = ∂Φ(x), Eq. 56 can be rewritten aṡ which explicitly shows the contribution of both the thermodynamic function and the dissipation function to the dynamics ( Fig. 3 (a)).
Various properties of the equilibrium flow (Eq. 56) can be obtained from the dually flat structures as we will see in the following sections. In addition, the equilibrium flow captures the properties that the dynamics of thermodynamic equilibrium systems should hold. In this sense, the equilibrium flow is the mathematical representation of the dynamics of equilibrium systems.
Beyond the gradient equilibrium flow, we also consider the non-gradient nonequilibrium flow of the following type: Definition 25 (Nonequilibrium force and nonequilibrium flow). The force generated by the constant shift of the thermodynamic gradient force is called nonequilibrium force where f N E ∈ Im[S T ] [54] . If the shift parameter f N E satisfies f N E ∈ Im[S T ], then the nonequilibrium force is reduced to the equilibrium force f N E = 0 by appropriately changingx. The nonequilibrium flow is the flow induced by the nonequilibrium force ( Fig. 3 (b)):ẋ In the next section, we show that this equation can cover a sufficiently wide class of models, e.g., all types of rLDG and CRN with extended LMA kinetics. Equation 59 can also be associated with nonequilibrium dynamics with a constant environmental force. The techniques in information geometry, Hessian geometry, and convex analysis enable us to investigate such non-gradient dynamics.
Remark 6 (Variational modeling [130]). We introduced and characterized dynamics based on the thermodynamic functions and dissipation functions. While we employed a restricted definition to link dynamics to information geometry, we may further generalize this approach by appropriately choosing the state space, f (x), Ψ * x (f ), and div S . For example, we may consider a x-dependent and noninteger-valued stoichiometric matrix S(x). The equilibrium flow may not be restricted to F (x) = D X Φ [x x], and the nonequilibrium flow may be defined for x-dependent f N E (x). This type of approach for modeling dissipative dynamics has been known as variational modeling.
Before closing this section, we mention that the existence of DB states, i.e., M DB = ∅ is necessary and sufficient for a nonequilibrium flow to be the equilibrium flow.
The necessity follows basically from Prop. 3, but we have to show M ST = ∅. This will be shown in the following section (Lemma 1). [54] In physics, such f N E can be identified with a nonequilibrium force applied externally to the system.

TIONS
Before investigating the dynamics of the equilibrium (Eq. 56) and nonequilibrium (Eq. A. Thermodynamic and dissipation functions for symmetric rLDG The symmetric rLDG defined by Eq. 20 is treated as an illustrative example. From the form of the graph Laplacian (Eq. 19) [55] , it is easy to see that Eq. 20 coincides with Eq. 56 where Y = X , M 0 = I, M 1 = diag[k], andx = 0 [56] . Because both M 0 and M 1 are diagonal, these thermodynamic and dissipation functions are separable. [55] It should be noted that this representation holds only when k + = k − holds. [56] While Y = R Nv , no problem arises because M 0 is the trivial identity matrix in this case. Also, this dynamics can be extended to X = R NX .

B. Explicit form of thermodynamic functions for rLDG and CRN
For rLDG (Eq. 3) and CRN with LMA kinetics (Eq. 12), the following dual thermodynamic functions are particularly relevant [57] : which induce the following Legendre transformation: Here, Y = R N X , and x o ∈ X is a parameter determining the point in X that is associated with the origin of Y via the Legendre transformation. For these thermodynamic functions, the Bregman divergence is reduced to the generalized Kullback-Leibler divergence.
These thermodynamic functions and the Bregman divergence are separable.
If we choose x o = 1, then the conventional dual representation for the distribution p on a discrete space is recovered: where we use p instead of x. In this case, Y is the space of the logarithm of p. These representations hold even if p is not a probability density. If p satisfies 1 T p = 1, the generalized KL divergence becomes the normal KL divergence D X [p p ′ ] = ln p p ′ T p [58] . [57] For CRN, these forms of the thermodynamic functions are derived from the macroscopic thermodynamics of ideal gas or dilute solution with nonreactive solvent [119]. Mathematically, we may employ other functions for the same MJP or CRN with LMA kinetics as we introduce different information geometric structures onto a family of probability depending on the purpose. Such an exploitation is an interesting open problem. [58] As we will see later, the condition 1 T p(t) = 1 need not be assumed but is automatically satisfied due to the topological constraint of the graph and the initial condition 1 T p(0) = 1 when we work on rMJP.

C. Explicit form of dissipation functions for rLDG and CRN
To determine the dissipation functions, we need the definition of force, which may depend on the phenomena and purpose [59] . In physics, the flux-force relations, which are also called constitutive equations [131], are central because they determine what kind of change is induced by an incurred force [60] . For rMJP and CRNs, the flux and force are conventionally defined using the one-way fluxes, j + (x) and j − (x) as where the dependency of j − (x) on x is abbreviated for notational simplicity. In physics, assuming this form of force-flux relation goes by the name of the local detailed balance (LDB) assumption [61] , or the generalized detailed balance assumption [62] . By defining the frenetic activity [134]: we have a relation j = ω . For a fixed ω, this relation between the pair (j, f ) is a one-to-one Legendre duality induced by the following specific form of dissipation functions: which lead to the Legendre transformation: [59] This is parallel to the problem of how to define the dual of x. The choice of logarithm is contingent on the domain and knowledge of physics and statistics. [60] Actual forms of the relations depend on the respective phenomena. Some relations were obtained empirically through experiments and others were computed theoretically from microscopic models. [61] LDB is different from the DBC, which was discussed in Def. 22. [62] The validity of LDB was shown for rMJP and CRN with LMA kinetics via large deviation theory for the corresponding microscopic Markovian models or via its consistency with the macroscopic chemical thermodynamics [132,133].
We can easily verify that these functions satisfy the conditions for dissipation functions, i.e., Eq. 32, Eq. 33, and Eq. 31.
For the flux j MA (x) of LMA kinetics (Eq. 12) [63] , the force and activity become where we introduced a transformation of the kinetic parameters (k + , k − ) into the force part K and activity part κ as κ := √ k + • k − and K := k + /k − [64] . Because k ± = κ • K ±1/2 holds, (κ, K) has the same information as (k + , k − ). Moreover, we can verify that the force and activity are dependent only on K and κ, respectively. The dissipation functions of the forms above and their relations to LDG and CRN were derived from the large deviation function of the corresponding microscopic stochastic models for rMJP and CRN [74,135].
Actually, the Bregman divergence D J x [j; j MA (x)] of the dissipation functions is identical to the rate function of the flux for rMJP and CRN. Thus, these dissipation functions are keystones connecting macroscopic and microscopic dynamics.
If there existsỹ satisfying −S Tỹ = ln K, i.e., ln K ∈ ImS T , the force in Eq. 69 is represented as wherex is the Legendre conjugate ofỹ [65] . Thus, CRN (and rMJP) is an equilibrium flow of the generalized KL divergence D X [x x] when the parameter K satisfies ln K ∈ ImS T . In chemistry, the condition ln K ∈ ImS T is called Wegscheider's equilibrium condition [46,136], and the CRN satisfying this parametric condition is called equilibrium CRN [66] . Even if [63] The dissipation functions in Eq. 67 and the induced Legendre transformation in Eq. 68 are not necessarily restricted to these particular types of force and activity. Actually, the extended LMA kinetics (Eq. 13) can also be represented by re- Thus, Eq. 67 could be applied to a wider class of kinetics than Eq. 12.
[64] For CRN, K is referred as the equilibrium constant in chemistry. [65] It should be noted that, whileỹ is not uniquely determined by K in general, it does not cause problem. This is clarified in the following section (Sec. VIII) by introducing appropriate affine subspaces. [66] Historically, the equilibrium chemical systems were characterized by macroscopic thermodynamics.
The equilibrium condition was derived as the necessary and sufficient condition that the flux of the ln K ∈ ImS T is not satisfied, we can represent ln K = −S Tỹ + f N E with f N E ∈ ImS T . The force in Eq. 69 is always represented as which leads to the nonequilibrium flow (Eq. 59). Thus, CRN with LMA kinetics as well as rLDG are generally within the class of Eq. 59. is also often assumed in statistics when we design or analyze a random walk in parameter spaces, e.g., in the Markov Chain Monte Carlo simulations or in other random-walk-based optimization schemes [67] . These two are equivalent for (extended) LMA kinetics. Actually, From the Fredholm alternative, we obtain the Wegscheider's equilibrium condition ln K ∈ ImS T for the existence of x DB .

D. Some remarks on the dissipation functions for CRN and rLDG
The dissipation functions in Eq. 67 have several notable properties. First, they are separable: LMA kinetics (Eq. 12) should satisfy to have consistent properties with the thermodynamic equilibrium systems. It was found only recently that the equilibrium properties are mathematically attributed to the generalized gradient flow structure. [67] The DB condition is conventionally adopted be- where and ω(x) is local: 61 are also separable. The locality and separability of dissipation functions are important requisites for the cochain complexs on graphs and hypergraphs to function as the discrete versions of differential forms [68] .
Second, the scalar function ψ * (f ) is the N-function. The N-function of the (cosh(f ) − 1)-type and the associated Orlicz space have been employed for establishing the infinitedimensional information geometry by Pistone [71,137,138]. In functional analysis, Orlicz spaces are a generalization of L p spaces, which arise naturally when we work on L log + L spaces for the divergences and large deviation functions. Hence, the dissipation functions in Eq. 67 are tightly related to such topics.
This type of quadratic dissipation function has also been proposed even earlier than the non-quadratic ones [139][140][141] and been investigated [142][143][144]. Its advantage is that the induced geometry is Riemannian, and thus the Hessian information geometric argument is not necessarily required. In addition, this Riemannian geometric structure is related to the formal Riemannian geometric structure of FPE and other diffusion processes on continuous manifolds induced via the L 2 -Wasserstein geometry [64,65] (Fig. 4). Thus, this quadratic dissipation function provides a consistent extension of these results for FPE and diffusion processes to graphs and hypergraphs. Nevertheless, the Hessian information geometric struc- [68] The locality and separability may sound like natural properties. However, from the physical viewpoint, the Onsager matrix can have nondiagonal components, which implies nonseparable dissipa-tion functions. In addition, equilibrium thermodynamics does not preclude thermodynamic functions from being nonseparable ture with the non-quadratic dissipation functions that we introduce is also another sound generalization of the formal Riemannian geometry of FPE, as we see in the next subsection.
As long as we focus only on the trajectory of the generalized flow (Eq. 43), the difference does not matter because both induce the same dynamics. However, the Bregman divergence of the quadratic dissipation functions is not related to the rate function of the microscopic stochastic models, while that of nonquadratic ones in Eq. 67 is [135]. Thus, if we consider projections of fluxes and forces in the edge spaces, different choices of dissipation functions lead to different projections.

FIG. 4. A relationship between Wasserstein geometry and information geometry. The formal
Riemannian geometric structure appears at their intersection. It should be noted that, while the regions of L p -Wasserstein distance for p = 2 and nonquadratic dissipation function are not overlapping in this figure, this does not mean that they are unrelated. There may be undiscovered relations between these two regions.

E. Explicit forms of thermodynamic and dissipation functions for FPE
For FPE, the dual representation of the density p(r) and its logarithm y(r) = ln p(r) is also relevant. This duality is induced formally by the following thermodynamic functions [69] : [69] The base measure is omitted because this is just a formal one.
The Bregman divergence becomes the KL divergence D X [p p ′ ] = drp(r) ln p(r) p ′ (r) . In physics, the flux and force for FPE are defined conventionally as The dissipation functions associated with the force-flux relation above are The last quantity without D 0 is known as relative Fisher information [65,145] and Hyvärinen divergence [122,146]

A. Four affine subspaces
Two families of orthogonal affine subspaces are naturally introduced on X and Y, respectively, from the topological structure of graph and hypergraph, i.e., B and S.
Definition 26 (Stoichiometric subspaces in X ). The stoichiometric subspaces are defined as [70] P sc ( where x 0 is a parameter to specify the position of the subspace (Fig. 5, lower left) [71] . [70] In CRN theory, a stoichiometric subspace is called stoichiometric compatibility class [8]. [71] Because P sc (x 0 ) is restricted within the positive orthant X , P sc (x 0 ) is a polyhedron. If bounded, it is called a polytope in discrete geometry and also in combinatorial optimization [148]. However, we abuse the word (affine) subspace for P sc (x 0 ), and use polyhedron or polytope when we care about the boundary. Definition 27 (Equilibrium subspaces in Y). The equilibrium subspaces (Fig. 5, upper left) are defined as P eq (ỹ) := y ∈ Y|y −ỹ ∈ KerS T ,ỹ ∈ Y.
Two other families of orthogonal subspaces are introduced on J x and F x .
Definition 28 (Iso-velocity subspaces in J x ). The iso-velocity subspaces (Fig. 5, lower right) are defined as Definition 29 (Iso-force subspaces in F x ). The iso-external-force subspaces, iso-force subspaces in short, (Fig. 5, upper right) are defined as Again, from the correspondence of δ 1 = S and δ 0 = S T , P vl (ĵ) and P f r (f ′ ) are associated with the 1-cycle and 1-cocycle spaces, respectively [73] . We specifically call P vl (0) and P f r (0) zero-velocity subspace and gradient-force (GD-force) subspace, respectively. Their meaning is explained in the next subsection.

B. Meaning of the subspaces
All four subspaces are natural constituents in the theory of algebraic graph theory and homological algebra. Here, we provide their meaning in terms of the dynamics on graphs and hypergraphs.
The stoichiometric and iso-velocity subspaces, P sc (x 0 ) and P vl (ĵ), are related by the continuity equation (Eq. 10). From the continuity equation, P vl (ĵ) is the set of fluxes that [72] In algebraic graph theory, P sc (0) = ImS is called cocycle space. [73] In algebraic graph theory, P vl (0) is called the cycle space.
induce the same velocity as a referenceĵ does: j ∈ P vl (ĵ) ⇐⇒ẋ = −Sĵ = −Sj. Thereby, P vl (ĵ) is parametrized as follows: This subspace is crucial to characterize fluxes that can realize the same dynamics as the reference one.
The stoichiometric subspace P sc (x 0 ) determines the subspace in which the dynamics are algebraically constrained via the topology of the underlying graph or hypergraph. Becausė should hold, meaning that x(t) ∈ P sc (x 0 ). Thus, P sc (x 0 ) is the subspace in which the dynamics are restricted by the initial condition x 0 . P X sc (x 0 ) can also be represented parametrically, which provides it with another characterization as the quantities which are conserved by the dynamics. For any vector u ∈ KerS T , η(t) := u T x(t) is constant over time: In Sec. III A, we defined a matrix U by a complete basis of KerS T so that ImU T = KerS T .
Using U, the conserved quantities for a given initial condition x 0 are obtained as η = Ux 0 = Ux(t). Because ImU is isomorphic to C −1 (H), the stoichiometric subspace is explicitly parametrized by the conserved quantities (an element of C −1 (H)): For rMJP, the conserved quantity is reduced to the conservation of probability 1p(t) = 1 and P sc (p 0 ) becomes the probability simplex. Because KerB T determines the connected components of the graph G and we conventionally assume that the underlying graph is connected in rMJP, we only have the one-dimensional cokernel space and one conserved quantity, which is η = 1. Thus, the conservation of probability or, equivalently, the restriction of p in the probability simplex is automatically guaranteed from the topological constraint of the dynamics if we start from the initial state satisfying 1 T p 0 = 1.
The iso-force subspace P f r (f ′ ) and the equilibrium subspace P eq (ỹ) are related to the equilibrium and nonequilibrium force equations, Eq. 55 and Eq. 58. The equilibrium gradient force defined in Eq. 55 satisfies f (x) ∈ ImS T = P f r (0). Thus, the gradient-force subspace P f r (0) is the set of equilibrium gradient forces.
Thus, for f ′ ∈ P f r (0), P f r (f ′ ) is a quotient set of nonequilibrium forces by the equilibrium force P f r (0). Using V defined in Sec. III A, we can represent P f r parametrically as because F x /ImS T ∼ = F x /KerV T ∼ = ImV T ∼ = C 2 (H). Thus, ζ characterizes the type of nonequilibrium forces.
Finally, the equilibrium subspace P eq (ỹ) can also be regarded as the set of potentials y that generate the same gradient force because any y ∈ P eq (ỹ) satisfies f ′ = S T y = S Tỹ ∈ P f r (0). Due to this, the equilibrium subspace P eq is parameterized as The parametric forms of the subspaces are summarized as follows: From these subspaces, we can obtain dual foliations on the vertex and edge spaces.

C. Dual Manifold, Dual Foliation, and Pythagorean relation in vertex spaces
For the subspaces P sc and P eq in the vertex spaces, we introduce their Legendre transformation via the thermodynamic functions, Φ(x) and Φ * (y), which form the dual foliation with the orthogonal subspace (Fig. 6, left).
Lemma 1 (Dual foliation in vertex space [84]). P sc and M eq are foliations of X and M sc and P eq are foliations of Y. For each pair of (x 0 ,x), the intersection of P sc (x 0 ) and M eq (x) is unique and transversal. The same applies to M sc (y 0 ) and P eq (ỹ). Then, (P sc , M eq ) and (M sc , P eq ) form dual foliations (nonlinear coordinate systems) in X and Y spaces, respectively.

Dual foliation in
Proof. The polyhedron P sc (x 0 ) and the affine subspace P eq (ỹ) can cover the whole X and Y by changing x 0 andỹ, respectively. Similarly, M eq (x) and M sc (y 0 ) can cover the whole X and Y because Legendre transformations by the thermodynamic functions are one-to-one between X and Y. Consider the intersection of P sc (x 0 ) and M eq (x) in X space: The existence of x † is related to the existence of x ‡ defined by the following convex opti-mization problem: x ‡ := arg min Because of the properties of Φ(x), D X [x x] and its linear restriction to P sc (x 0 ) are strictly convex with respect to x. Thus, if x ‡ ∈ P sc (x 0 ), x ‡ is unique and either satisfies the stationarity condition x ‡ ∈ P sc (x 0 ) ∩ M eq (x) or locates at the boundary if x ‡ ∈ P sc (x 0 ), . Let x bry ∈ R N X ≥0 \ X and x in ∈ X be arbitrary points on the boundary and interior of X . From the condition Eq. 23 of the thermodynamic function, for Thus, x ‡ ∈ X is excluded, and the intersection exists, i.e., x ‡ ∈ P sc (x 0 ) ∩ M eq (x). The intersection x † is unique and transversal because, for any x sc ∈ P sc (x 0 ) and x eq ∈ M eq (x), x sc − x † , y eq − y † = 0 holds. The dimensions of P sc (x 0 ) and M eq (x) are complementary, the intersection x ‡ is unique. Thus, x ‡ = x † . As a result, x † ∈ P sc (x 0 ) ∩ M eq (x) always exists, and (P X sc , M X eq ) forms a dual foliation in X . Also (M Y sc , P Y eq ) does in Y because they are bijective Legendre duals of (P X sc , M X eq ).
This result is reduced to Birch's theorem [100,102]

and the seminal result by Horn and
Jackson [40] when the thermodynamic function is the generalized KL divergence.
With the dual foliation, we can consider the generalized Pythagorean relations and orthogonal decomposition. For any three points satisfying x ∈ P sc (x 0 ), x q ∈ M eq (x), and x † = P sc (x 0 ) ∩ M eq (x), we have the generalized Pythagorean relation: In Y space, we also have the dual version of the relations as and y eq = arg min These relations are used to characterize the steady state of equilibrium and nonequilibrium flow geometrically and also variationally.
Remark 8 (Interpretation in terms of statistical inference). The meaning of the equilibrium manifold in statistics can be clarified more explicitly by considering the specific form of thermodynamic function (Eq. 64). For this thermodynamic function, the equilibrium manifold M eq (p) is represented as where we use the fact S T U T = 0. Thus, M eq (p) is the exponential family with algebraic constraints via U T . In contrast, P sc (η) can be regarded as the data manifold, which constrains where U is sometimes called the design matrix[102] [74] .

Kodaira decomposition
For the edge spaces, we similarly introduce the iso-velocity and iso-force manifolds, which are the duals of P f r (f ′ ) and P vl (ĵ), respectively, via the Legendre transformation of the dissipation functions, Ψ x (j) and Ψ * x (f ) (Fig. 6, right): Definition 31 (Iso-velocity manifold in F x and iso-force manifold in J x ). The iso-velocity and iso-force manifolds (Fig. 6, right) are defined as follows: It should be noted that M vl x (f ) and M f r x (j ′ ) are dependent on x via the x dependence of the dissipation functions. We obtain the intersections in J x and F x : [74] In algebraic statistics, U is explicitly given as constraints of a statistical model. In the dynamics on graph and hypergraph, S is explicitly given, and U is implicitly defined as a complete basis of KerS T .
As a result, their connection is not apparently obvious.
which are also unique and transversal for each fixed x ∈ X because J x and F x are whole vector spaces and the Legendre transformation are one-to-one. Thus, similarly to the case of vertex space, we have the dual foliation: Lemma 2 (Dual foliation in edge space [82]). For each x ∈ X, (P vl , M f r x ) and (M vl x , P f r ) form dual foliations in J x and F x spaces, respectively.
Forĵ and f ′ , and their intersections j † and f † defined in Eq. 106, ĵ − j † , f † − f ′ = 0 holds. Thus, we have the generalized Pythagorean relations: In contrast to the thermodynamic functions (Φ, Φ * ), the dissipation functions have symmetry, which makes the origins 0 in J x and F x special and leads to the generalized Helmholtz-Hodge-Kodaira decomposition.
Theorem 1 (Generalized Helmholtz-Hodge-Kodaira (HHK) decomposition [82]). For a given flux-force Legendre pair (j, f ) ∈ (J x , F x ), we have their unique x-dependent decompositions: such that f eq (x), ∈ P f r (0), j − j eq (x) ∈ P vl (0), f − f st (x), ∈ P f r (0), and j st (x) ∈ P vl (0) hold. j eq (x) and f st (x) are characterized geometrically as Furthermore, j eq and f st are also characterized variationally as the minimizers of dissipation functions: j eq (x) = arg min Proof. The uniqueness of j eq (x) and f st (x) as intersections in Eq. 109 follows immediately from the property of the dual foliations. Because, for any j ′ ∈ P vl (j) and f ′′ ∈ P f r (f ), j ′ − j eq , f eq = 0 and j st , f ′′ − f st = 0 hold, the generalized Pythagorean relations lead to Then Eq. 110 follows.
The decomposed flux j eq and force f st play a particularly important role in dynamics.
From the definition, j eq is the gradient flux (the flux whose corresponding force f eq is an equilibrium gradient force, i.e. f eq ∈ P f r (0) = ImS T = Im[grad S ]), which induces the same instantaneous velocityẋ as j does, i.e.,ẋ = −div S j = −div S j eq . Thus, j eq is the gradient flux mimicking the dynamics of the nonequilibrium flux j. This gradient flux is uniquely determined owing to the information-geometric orthogonality of P vl (j) and M f r x (0). Moreover, the decomposition j = j eq + (j − j eq ) can be regarded as an information-geometric generalization of the Helmholtz-Hodge-Kodaira decomposition in vector calculus, because (j − j eq ) is divergence free, i.e., div S (j − j eq ) = 0, and f eq is a curl-free gradient force, i.e., On the contrary, by definition, j st ∈ P v (0) is the flux that makes the state x a steady state, i.e.,ẋ = 0, and is also induced by the force in the same quotient set of force P f r (f ) as is also a HHK decomposition because j st ∈ P vl (0) is divergence free, i.e., div S j st = 0, and f − f st is a curl-free gradient force, i.e., f − f st ∈ P f r (0) = Im[grad S ] = Ker[V T ]. [75] These decompositions are used in the subsequent sections (Sec. VIII and Sec. VIII).

VII. CENTRAL AFFINE MANIFOLD AND HILBERT ORTHOGONALITY
The dual foliation is an essential geometric object in information geometry. While less common than the dual foliation, the central affine manifold defined by a convex function also plays integral roles in information geometry [147]. [75] It should be noted that, in general, j st = j − j eq and f eq = f − f st and as a result j = j eq + j st and F x are defined as the level sets of Ψ x (j) and Ψ * x (f ), respectively [76] : where c ∈ R ≥0 . For a given j ′ ∈ J x or f ′ ∈ F x , the manifolds are also denoted as Their Legendre transformations are also called (dual) central affine manifolds: A. Pseudo-Hilbert-isosceles orthogonality and decomposition By employing the central affine manifold, we can introduce another type of generalized orthogonality: Definition 33 (Pseudo-Hilbert-isosceles orthogonality [77,78,80]). Pseudo-Hilbert-isosceles orthogonality between j S , j A ∈ J x and between f S , f A ∈ F x are defined as follows: This orthogonality is motivated by the relation j S + j A 2 = j S − j A 2 satisfied by an orthogonal pair j S ⊥ j A under a usual inner product structure and its induced norm · 2 [77] .
By employing this orthogonality, we obtain a pseudo-Hilbert isosceles decomposition j as follows: [76] While we introduce the central affine manifolds only on the edge space, they can be defined on the vertex space as well. The central affine manifolds on the vertex space become fundamental when we work on the isobaric processes in which the volume changes in conjunction with the reactions [85]. In this case, the volume is a global variable affecting all the reactions simultaneously. [77] If · 2 is a squared norm that is not necessarily induced from an inner product, the orthogonality is called isosceles or James orthogonality [149].
Here, we further replace · 2 with the dissipation function, which does not satisfy some conditions required to be a norm. Thus, we call the decomposition pseudo-Hilbert isosceles Lemma 3 (A positive decomposition of the bilinear pairing via pseudo-Hilbert-isosceles orthogonality [77,78,80,82]). For a given j ∈ J x and any j ′ on the same central affine manifold as j, i.e., j ′ ∈ C Ψ x (j), we obtain the Pseudo-Hilbert-isosceles orthogonal decomposition j = j S + j A : where j S ⊥ H j A and j ′ = j S − j A hold. In addition, this decomposition induces a positive decomposition of the bilinear product j, hold. Similarly, for f ∈ F x and f ′′ ∈ C Ψ * x (f ), the positive orthogonal decomposition f = f S + f A is obtained by f S := 1 2 (f + f ′′ ), and f A := 1 2 (f − f ′′ ) and the associated relations: hold.
These decompositions were introduced in [77] for MJP and extended to CRN in [78,80], whereas we pointed out its information geometric aspect in [82]. The decomposition plays the role of characterizing the gradient-flow-like property of non-gradient flows.

VIII. INFORMATION-GEOMETRIC PROPERTIES OF EQUILIBRIUM GRADI-ENT FLOW
In this section, we describe several properties of the equilibrium flow (Eq. 56) from the viewpoint of information geometry by employing the objects introduced in the previous sections. Such properties include the existence and uniqueness of the steady state (static property), convergence to the state (kinetic property), and the balance between informationgeometric quantities associated with the steady state and convergence along the trajectory (the connection between static and kinetic properties). These properties are consistent with those that thermodynamic equilibrium systems should have. In addition, several results are extensions of the results obtained for FPE in the context of functional analysis, in the theories of partial differential equations, and optimal transport.

A. Properties of equilibrium flow
The following property of the equilibrium state characterizes the static aspect of the equilibrium flow and is fundamentally ascribed to the dually flat structure of vertex spaces [78] : Proposition 6 (Equilibrium state and its geometric and variational characterizations).
The steady state of the equilibrium flow x t (Eq. 56) starting from x(0) = x 0 is called the equilibrium state x eq . For each x 0 , the equilibrium state is identical to the intersection i.e., x eq = x † , and thus uniquely exists for a given pair of the initial state x 0 and the parameter of equilibrium flowx. The equilibrium state x eq is also characterized variationally as Moreover, M eq (x) = M DB holds.
Proof. From Prop. 23, x eq ∈ M DB , from which f (x eq ) = 0 follows. For the equilibrium Thus, x eq ∈ M eq (x). Because the initial state is x 0 , x eq ∈ P sc (x 0 ). Thus, x eq = x † = P sc (x 0 ) ∩ M eq (x). The first equality of Eq. 120 is obvious from the proof of the dual foliation (Lemma 1). The second equality is from the generalized Pythagorean relation (Eq.

100).
The second property is kinetic in nature and characterizes the Bregman divergence as the generalized driving potential, which ensures the convergence of x t to the equilibrium state.
This property is attributed to the second dually flat structure on the edge spaces.

Proposition 7 (Bregman divergence and Gibb's H-Theorem).
For the trajectory of the equilibrium flow x t (Eq. 56) starting from x]/dt = 0 holds. Thus, the equilibrium state x eq ∈ P sc (x 0 ) ∩ M eq (x) is locally and asymptotically stable.
) ≤ 0 and the equality holds if and only if f (x t ) = 0(⇔ x t ∈ M eq (x). [78] Actually, this result is independent of the detail of the dissipation functions.
Because D X Φ [x x] can be identified with the difference of total entropy between x andx for thermodynamic systems such as CRN [47], dD X Φ [x t x eq ]/dt ≤ 0 corresponds to the nondecreasing property of thermodynamic entropy, which is also referred as Gibb's H-theorem [79] .
The third property provides a connection between the thermodynamic function and the dissipation function, which is immediately obtained from the De Giorgi's formulation of the generalized gradient flow (Eq. 50): Proposition 8 (Balancing of thermodynamic function and dissipation function). For the trajectory of the equilibrium flow x t (Eq. 56) starting from x(0) = x 0 , the following relation holds for the thermodynamic function and the dissipation function: In physics and chemistry, this relation means that the difference in the thermodynamic (potential) function between x t and x 0 (the left-hand side), i.e., the change in total entropy, is equal to the integral of dissipation along x t (the right-hand side), i.e., the entropy production, for equilibrium systems.
All these results indicate that the equilibrium flow and its properties mathematically abstract the properties of physical equilibrium systems. The equilibrium state x eq is characterized algebraically by the unique intersection of P sc (x 0 ) and M eq (x) and also variationally by Eq. 120. The convergence to x eq is guaranteed by dD X Φ [x t x]/dt ≤ 0. Furthermore, the entropy-dissipation balance relation (Eq. 121) itself defines the equilibrium system abstractly as the De Giorgi's formulation (Eq. 50) does.
B. Induced dually flat structure on tangent-cotangent spaces as a generalized Otto structure The equilibrium state is characterized geometrically and variationally via the informationgeometric structure on the vertex spaces (X , Y) as in Prop. 6. Similarly, the flux (kinetic [79] We have multiple types of H-theorems. The most famous one is Boltzmann's H theorem, in which the H function is derived from the microscopic dynamics of a system. In Gibb's H-theorem, H func-tion is obtained by coarse-graining the microscopic system [43] Then, j † t is generated by the gradient force, f † t = ∂Ψ x [j † t ] ∈ P f r (0). Thus, the minimum primal dissipation flux that generates the given {x t } is the equilibrium flux.
Proof. Because the minimization of Eq. 122 can be conducted pointwise-manner for each These functions are Legendre conjugate as follows: The inverse is also shown: where we used the fact that in the second line. The pair (v, u) x are Legendre dual of these functions: where we used ∂j † (x,v) They are dissipation functions: strict convexity and 1-coercivity follow from those of the original dissipation functions. Also, we have Using the induced dissipation functions, we define the Bregman divergence on (T x X ,T * x X ), which is associated with the Bregman divergence on (J x , F x ): where j † = j † (x, v) and f ′ ‡ = f ‡ (x, u ′ ). Therefore, we have the induced dually flat structure on (T x X ,T * x X ). This induced structure can be regarded as a generalization to discrete manifolds of the Otto structure [64,65]: the formal Riemannian structure induced by the L 2 -Wasserstein distance. This is also related to Pistone's infinite-dimensional information geometry [71,123].
C. Fisher information, natural gradient, mirror descent, evolutionary computation, and optimal transport In information geometry, it is conventional to use the Fisher information matrices, i.e., the Hessian matrices G x and G * y (Eq. 29) as the metric tensor (Fisher-Rao metric) on (T x X , T * x X ) or equivalently on (T y Y, T * y Y) ( Fig. 7 (b)). Gradient systems have been defined information-geometrically [24] as a Riemannian gradient flow using the Bregman divergence and the Fisher information matrix of Φ(x) as the gradient function and the metric tensor, Because both G x and D X Φ are derived from Φ(x), this gradient flow becomes a geodesic in Y space:ẏ = −(y −ỹ). In natural gradient descent [66,68,150], the Fisher information matrix is used to find the steepest descent gradient of a function F (θ) on a parameter space Θ asθ = −G −1 θ ∂F (θ), where G θ is determined independently of F (θ) by considering the underlying model parameter space.
In optimization, the natural gradient is fundamental in information-geometric optimization algorithms, which contain various evolutionary optimization schemes [69]. In relation to machine learning, the mirror descent is identified with the natural gradient descent by a naive continuous limit [68,151]. Furthermore, optimal transport has recently been employed to replace or integrate the Fisher-Rao metric with the Wasserstein metrics [152,153]. Because the Wasserstein metric can take the underlying manifold information into account, their integration may provide more amenable ways to accommodate various prior and structural information.
The doubly dual flat structure introduced in this work actually provides a solution to generalize those results and the associated problems. The base space X with the dually flat structure and the associated Fisher information matrix accommodates the conventional natural gradient. The graph or hypergraph structure endows the additional topological relation to the base space of X . The dissipation functions on the edge spaces or their induced versions bestow a more flexible way than the Fisher-Rao metric to represent the loss of the potential function, i.e., the dissipation, at each point in the state space. Upon necessity, we may combine both of them ( Fig. 7 (b)), for example, asẋ = −G −1 where F (1) (x) and F (2) (x) could be different. This flexibility may contribute to the design of new algorithms for machine learning. Actually, this integrated representation is quite relevant to the filtering equations [154] in sequential inference where the first term, i.e., −G −1 x ∂F (1) (x), can usually be associated with the update of posterior probability by observation and the second term, −div S ∂Ψ * x [F (2) (x)] can represent the prediction by the prior information on the dynamics. Our framework may provide a unified information-geometric perspective to various information-geometric analyses and extensions of filtering, e.g., projection-filters [155], information-geometric nonlinear filtering [156], and information geometric optimization [69] Furthermore, the generalized gradient flow can be regarded as a continuous time limit of the mirror descent where the nonlinear Legendre duality between primal and dual spaces is In this section, we consider the nonequilibrium flow defined by Eq. 59, i.e., with f N E = 0, and show how information geometry can be employed to analyze such dynamics. While we can obtain several properties of equilibrium flow independently of the detail of the thermodynamic function and the dissipation function, these functions should be related to obtaining nice properties for the nonequilibrium flow. We will observe that the thermodynamic function and the dissipation function of LMA kinetics actually have such a relation.

A. Gradient-flow-like property and Lyapunov function of nonequilibrium flow
For the dynamics of the equilibrium flow (Eq. 56), the Bregman divergence D X Φ [x x] is a Lyapunov function. The Bregman divergence can still be a Lyapunov function even for the nonequilibrium flow (Eq. 59) under the following conditions: is the unique steady state of Eq. 59 with the initial state x 0 that attains d dt D X Φ [x t x CB ] = 0 . Thus, x CB is locally and asymptotically stable [81] .

Proof. We can directly verify
where we used Eq. 119 andf (x) := f S (x) − f A (x). The equality holds if and only if , which means that Thus, if the pseudo-Hilbert-isosceles orthogonal decomposition exists, then the nonequilibrium flow behaves like the equilibrium gradient flow.
is as in Eq. 61. Then, for the dissipation functions in Eq. 67, the pseudo-Hilbert isosceles orthogonality f S (x) ⊥ H f A holds for all x ∈ X . In addition, M eq (x CB ) = M CB holds.
Proof. We can prove the orthogonality by direct computation. The orthogonality condition Consider the following equality: where γ ± e := L b ± e . By using this, we have the following: Thus, Eq. 139 holds for x ∈ X if Bj M A (x CB ) = 0 holds [82] . M eq (x CB ) = M CB can be proved by obtaining the parametric representation of M CB as M CB = {x ∈ X | ln x − lnx CB ∈ KerS T } via solving j M A (x) = 0 [83] . This representation is identical to that of M eq (x CB ) (Eq. 96). [82] The transformation of Eq. 140 here is strongly dependent on the specific functional form of j MA (x).
For a general nonequilibrium flow, M CB = ∅ does not necessarily imply the orthogonality. [83] We skip the derivation because it is involved.
See the original derivation [100] or our rephrased version [83] Remark 9 (Algebraic structure of detailed balanced and complex balanced manifolds). We here mention about the underlying algebraic source of why M eq (x CB ) = M CB holds. First, we already showed that M DB = M eq (x) holds generally if M DB = ∅. Under LMA kinetics (Eq. 12), the DB condition j M A (x) = 0 is nothing but the binomial equations because j ± M A (x) is a vector of monominals of x. Owing to this, M DB becomes a toric variety [84] . In contrast, the CB condition Bj M A (x CB ) = 0 is a set of polynomial equations for LMA kinetics. Nonetheless, it was shown that M CB is binomially generated and has the same structural matrix S T as the equilibrium manifold [100] . Because of that, they are equivalent as manifolds.
Because rLDG (Eq. 3) is a subclass of CRN where L = I and thus S = B holds, the complex-balanced condition is always satisfied for rLDG.
The properties described in this corollary are well-known for MJP and are usually obtained by using the Perron-Frobenius theorem for linear operators. The framework of the generalized flow enables us to extend them to the nonlinear regime.
C. Effective flow of the nonequilibrium flow by the primal information-geometric projection In general, the nonequilibrium force or flux has redundant degrees of freedom in terms of generating a specific vector field or trajectory x t on the vertex spaces. By using the generalized HHK projective decomposition (Thm. 1), we can carve out the effective part of the flux for the trajectory. In addition, we can obtain an effective time-dependent equilibrium flow that mimics a trajectory {x t } of time-independent nonequilibrium flow: [84] The real variety generated by a toric ideal, i.e., a binomial and prime ideal [100]. [85] Such a situation is called unconditionally complex balanced.  is the flux of a generalized flow (Eq. 43), and define the corresponding effective flux by Then, j eq (x) generates the same trajectory as j(x) does [87] . Furthermore, for a given trajectory {x t }, we can construct a time-dependent equilibrium flow j eq (t, x) that can generate the same {x t } as follows Proof. From Thm. 1, j(x) can be decomposed as j(x) = j eq (x) + (j(x) − j eq (x)). Because ] ∈ P f r (0), there exists u eq (x) satisfying S T u eq (x) = f eq (x). By employing the duality introduced in Thm. 2, u eq (x) can be represented as where v(x) = −Sj(x). Because −Sj eq (x) = v(x), j eq (x) generates the same dynamics or vector field as j(x) does. By solving u eq (x t ) = ∂Φ(x t ) − ∂Φ(x t ), we have Eq. 145.
The effective time-dependent flow is more explicitly obtained for CRN with LMA kinetics: Corollary 2 (Effective flow for LMA kinetics). Consider the following quantities of CRN with LMA kinetics: Force defined in Eq. 69 Thermodynamic function defined in Eq. 61 Dissipation functions defined in Eq. 67 with Eq. 69 where k ± = κ • K ±1/2 holds. For the trajectory x t generated by j M A (x; k ± ), the effective time-dependent equilibrium force f eq (t, x) can be described as f M A (x; K eq (t)) where K eq (t) is determined by Thus, the effective time-dependent flux j eq (t, x) is represented as j eq (t, x) = j M A (t, x; k ± eq (t)) where k ± eq (t) = κ • K ±1/2 eq (t). [87] j eq (x) may not be equilibrium flux because u eq (x) is not necessarily represented by the gradient of a certain function F (x) as u eq (x) = ∂ x F (x).
This corollary means that the effective time-dependent flux of LMA kinetics is always obtained by a time-dependent modulation of the parameter k ± of LMA kinetics [88] .
D. Characterization of the nonequilibrium flow by the dual information geometric projection The nonequilibrium flow is redundant in terms of generating a specific vector field or trajectory {x t }. However, such redundancy is crucial to characterize the extent of nonequilibrium. One approach for the characterization is to investigate the cycle force or flux, which has been employed in the linear theory of dynamics on graphs and also in graph-theoretic approaches to nonequilibrium phenomena [90,[157][158][159][160]. To extract such cyclic components, we can use V T = curl V , its adjoint V = curl * V , the associated cycle subspaces C 2 (H) and C 2 (H), and also the generalized HKK decomposition (Thm. 1).
Definition 34 (Cycle spaces). The cycle spaces at x ∈ X are defined as Z For a given nonequilibrium force f (x), we can obtain its cycle component ζ = ζ contains the information to categorize the force because P f r (f ) = P f r (ζ) is the quotient space of force by the gradient force. For each ζ ∈ Z x , we obtain the representative force f ♦ (x, ζ) via the following variational problem: Lemma 7 (Steady (zero-velocity) force as the minimizer of the dual dissipation). For a given ζ ∈ Z x , we define the force f ♦ (x, ζ) minimizing the dual dissipation: Then, f ♦ (x, ζ) is the steady (zero-velocity) force, i.e., Proof. From Eq. 110 in Thm. 1, f ♦ (x, ζ) ∈ P f r (ζ) ∩ M vl (0). Thus, j ♦ ∈ P vl (0). [88] While this result may not be so important mathematically, for physics and chemistry, the results means that the effective flux is physically realizable within the LMA kinetics by parameter modu-lation. If the effective flux is out of the LMA kinetics, its physicochemical realization becomes highly nontrivial. The same applies to rLDG.
For a given force f (x) of generalized flow (Eq. 43), Thus, the effective cycle flux j st (x) is represented as In modern nonequilibrium thermodynamics, the effective flux and force, j st (x) and f st (x), have been associated with the housekeeping EPRΣ hk := j st (x), f st (x) , which quantifies the part of dissipation in the nonequilibrium dynamics, which stays non-zero if the trajectory is kept at x or even at the real steady state x st ∈ M st . The remaining partΣ ex :=Σ −Σ hk is also called excess EPR, which is nonzero transiently and becomes zero at the steady state. [90] . Thus, the dually flat structure on the edge space and the HHK decomposition provide a way to dissect the steady and transient aspect in the nonequilibrium flow.

X. SUMMARY AND DISCUSSION
In this work, we have clarified that the doubly dual flat structure of the vertex and edge spaces on graphs and hypergraphs provides the information-geometric basis for the dynamics on graphs and hypergraphs as equilibrium and nonequilibrium flows. Two notions of orthogonality, pseudo-Hilbert isosceles orthogonality and information-geometric orthogonality, have been introduced and shown to dissect the equilibrium and nonequilibrium aspects of the dynamics as the induced structures on the tangent and cotangent spaces and the cycle spaces, the former of which is a generalization of the Otto structure. This structure naturally connects the topological information of underlying discrete manifolds, i.e., graphs and hypergraphs, with the dynamics on them and thus endows more flexibility and representation power to the information-geometric modeling of dynamics. Furthermore, the generalized equilibrium and nonequilibrium flow, as well as the generalized flow, accommodate a sufficiently wide range of models, which include the reversible Markov jump processes on finite graphs and CRN with LMA kinetics (a class of PDS). These results could substantially extend the applicability of information geometry to dynamical problems and open up a new connection between information geometry and homological algebra.

A. Extension of other relations involving information measures
While we demonstrated that the generalized flow and the two dual flat structures can generalize several results known for FPE and diffusion processes, we still have potentially relevant results and problems that could be explained and extended in our framework. For example, for FPE and diffusion processes, the Fisher information number I F was extended to the relative Fisher information (also known as Hyvärinen divergence [122,146]). The relative Fisher information of two trajectories p (1) t (r) and p (2) t (r) are known to satisfy informationtheoretic relations such as the De Brujin identity [53] and its extensions [55,61]. In addition, the logarithmic Sobolev inequality also constitutes a relationship between the Fisher information number and the KL divergence (or Shannon information) [62,63]. It would be an important future problem to integrate these results into the doubly dual flat structure.
Moreover, several relations potentially being related to De Giorgi's formulation (Eq. 50) have been known for mutual information in filtering and control theories. For example, Guo, Shamai, and Verdu found a relation between mutual information and the minimum mean square error (MMSE) in Gaussian channels [161]. Relations similar to these have also been reported by Mayer-Wolf and Zakai [162,163]. Our framework may offer a unified perspective behind these different types of relation involving information measures.

B. Extensions of the doubly dual flat structure
There is also room for extensions of the doubly dual flat structure. While we consider only strictly convex thermodynamic functions and dissipation functions, strict convexity is not necessarily required, at least for defining the generalized flow and the equilibrium and nonequilibrium flows [91] . Actually, in terms of thermodynamics, the thermodynamic function can be non-strictly convex when a phase transition of the system occurs [118]. The loss of bijectivity via the loss of the strict convexity can happen in complicated and degenerate statistical models [164]. Techniques from algebraic geometry could be employed to address such situations [165]. [91] The strict convexity of the dissipation functions is assumed to work on the projections in the edge spaces where the bijection between J x and F x are important. Injectivity of ∂Ψ * x and ∂Φ is sufficient to define the generalized flow and equilibrium and nonequilibrium flow.
Moreover, the structure introduced for reversible LDG and CRN may be extended to irreversible cases, where some edges have only either forward or reverse jumps or reactions.
For this purpose, we may take advantage of several results about the CB states obtained in CRN theory [8] where the reversibility is not necessarily assumed and those in stochastic thermodynamics for absolute irreversible processes [166].
While the nonequilibrium flow is general enough to cover at least all reversible CRN with LMA kinetics, the classes of nonlinear dynamics other than CRN or PDS are much wider than that in general. To further extend the range of models that can be covered,

GENERIC (General Equation for
Non-Equilibrium Reversible-Irreversible Coupling) would be a good candidate [167]. GENERIC is a theoretical framework to integrate dissipative dynamics (gradient flow dynamics) and conservative dynamics (Hamiltonian dynamics).
The extension of the generalized flow to GENERIC has already been attempted, but is still ongoing [76,78]. The information geometry could offer new insights and techniques to achieve this mission.

C. Homological algebraic and differential geometric formulations
From the viewpoint of the standard homological algebra, the doubly dual flat structure that we introduced is an extension of chain and cochain complexes with inner product structure to those with the Legendre dual structure. Because the homological algebra used here is an abstraction of the differential form, the doubly dual flat structure can also be viewed as an extension of the differential form and might be called dually flat form. It is an interesting mission to characterize this stricture under a more rigorous mathematical formulation and to investigate if the Legendre duality can be consistently introduced for chains and cochains higher than those of the edge space. From the viewpoint of differential geometry, the dual flat structure is characterized independently of the specific coordinate by the Hessian manifold [147]. While we stick to the standard basis of the graph or hypergraph on which the convex thermodynamic function is defined, we can formulate it by using Hessian geometry more generally. It would be an important future work to clarify how the doubly dual flat structure is formulated from a differential geometric perspective.

D. Consistency and Persistence
Finally, we mention the problem of consistency and persistence. In this work, we presume that the flux j(x) is consistent with H and that the trajectory is persistent. The explicit conditions when they are satisfied are still elusive. Actually, the sufficient condition for consistency is intricate, even for the separable cases. For an illustrative example, suppose that the ith molecule is involved in the eth reaction of a CRN with LMA kinetics. For x i → 0, j e (x) → 0 holds. However, their Legendre duals diverge as y i → −∞ and f e (x) → −∞.
j e (x) goes to 0 because ω e (x) → 0 holds. This example suggests that we have to consider a certain limit of relevant quantities to appropriately address the consistency condition.
The persistence of the nonequilibrium flow would be a much harder problem. While the persistence was proven for CB CRN with LMA kinetics using techniques from algebraic geometry [113], its connection to information geometry has yet to be clarified. Moreover, from the information-geometric viewpoint, the loss of persistence means the change in the support of probability or positive density, which is effectively accompanied by the change in the topology of the underlying graph or hypergraph. To resolve the problem, we may need a deeper understanding of the interrelationship among dynamics, information-geometric structure, and the underlying topology.