Geometric thermodynamics for the Fokker-Planck equation: Stochastic thermodynamic links between information geometry and optimal transport

We propose a geometric theory of non-equilibrium thermodynamics, namely geometric thermodynamics, based on our recent developments of differential-geometric aspects of the entropy production rate in non-equilibrium thermodynamics. By revisiting our recent results on geometric aspects of the entropy production rate in stochastic thermodynamics for the Fokker-Planck equation, we introduce a geometric framework of non-equilibrium thermodynamics in terms of information geometry and optimal transport theory. We show that the proposed geometric framework is useful for obtaining several non-equilibrium thermodynamic relations, such as thermodynamic trade-off relations between the thermodynamic cost and the fluctuation of an observable, optimal protocols for the minimum thermodynamic cost, and the decomposition of the entropy production rate for non-equilibrium systems. We clarify several stochastic-thermodynamic links between information geometry and optimal transport theory via the excess entropy production rate, based on a relation between the gradient flow expression and information geometry in the space of probability densities and on a relation between the velocity field in optimal transport and information geometry in the space of path probability densities.


Introduction
A geometric interpretation of thermodynamics originates from the geometric picture of the thermodynamic potential proposed by W. Gibbs in equilibrium thermodynamics and chemical thermodynamics [1]. In non-equilibrium thermodynamics, second-order thermodynamic fluctuations around the equilibrium or steady state have been studied [2-6]. A differential geometry for equilibrium thermodynamics has been proposed by F. Weinhold [7] and G. Ruppeiner [8] by considering the fluctuation around the equilibrium state, and the length called the thermodynamic length in Weinhold geometry has been proposed to quantify the dissipated availability [9]. Because this geometry for equilibrium thermodynamics is based on the second-order fluctuation of entropy, its generalization [10,11] has been regarded as information geometry [12,13], which is the differential geometry of the Fisher metric [14].
In this paper, we summarize our recent developments of differential geometry for non-equilibrium thermodynamics [22,24,25,29,31,37-45] and propose several relations between these studies by focusing on the non-equilibrium dynamics of the Fokker-Planck equation. Because entropy production for the Fokker-Planck equation can be discussed from the viewpoint of both information geometry and optimal transport theory, these relations provide links between information geometry and optimal transport theory. Our proposed geometric framework for non-equilibrium thermodynamics, namely geometric thermodynamics, offers a new perspective on links between information geometry and optimal transport theory [46-49] and on the unification of non-equilibrium thermodynamic geometry.
2 Fokker-Planck equation and stochastic thermodynamics

Setup
We consider the time evolution of a probability density described by the (overdamped) Fokker-Planck equation. Let t ∈ R and x ∈ R^d (d ∈ N) be time and the d-dimensional position, respectively. The probability density of x at time t will be denoted by P_t(x), which satisfies P_t(x) ≥ 0 and ∫dx P_t(x) = 1. The Fokker-Planck equation is given by the following continuity equation,

∂_t P_t(x) = −∇ · (ν_t(x) P_t(x)),   ν_t(x) = µ(F_t(x) − T∇ ln P_t(x)).   (1)

Here, ν_t(x) ∈ R^d and F_t(x) ∈ R^d are vector functions at position x, µ ∈ R_{>0} and T ∈ R_{>0} are positive constants, and ∇· and ∇ are the divergence and the gradient operators, respectively. Physically, the Fokker-Planck equation is used to describe the time evolution of the probability density of an overdamped Brownian particle. For Brownian motion, µ, T, and F_t(x) physically represent the mobility of the Brownian particle, the temperature of the medium scaled by the Boltzmann constant, and the force on the Brownian particle, respectively [33]. The vector field ν_t(x) is called the mean local velocity because it quantifies the ensemble average of the Brownian particle's velocity at x at time t [16]. This Fokker-Planck equation corresponds to the over-damped Langevin equation, which describes the position of the Brownian particle X(t) ∈ R^d at time t, that is,

Ẋ(t) = µF_t(X(t)) + √(2µT) ξ(t).   (2)

Here, Ẋ(t) is the time derivative of the position X(t), and ξ(t) is the white Gaussian noise that satisfies ⟨ξ_i(t)ξ_j(t′)⟩ = δ_ij δ(t − t′) and ⟨ξ_i(t)⟩ = 0, where ⟨•⟩, δ_ij, and δ(t − t′) stand for the ensemble average, the Kronecker delta, and the delta function, respectively (i, j ∈ {1, 2, ..., d}).
The product of √(2µT) and ξ(t) is given by the Ito integral. Mathematically, this correspondence between the Fokker-Planck equation and the over-damped Langevin equation indicates that these two descriptions provide the same transition probability density from position X(τ) = x_τ to position X(τ+dt) = x_{τ+dt} during the positive infinitesimal time interval dt > 0. The transition probability density from x_τ to x_{τ+dt} is given by the Onsager-Machlup theory [33],

T_τ(x_{τ+dt} | x_τ) = (4πµT dt)^{−d/2} exp(−‖x_{τ+dt} − x_τ − µF_τ(x_τ)dt‖² / (4µT dt)),

where ‖•‖ stands for the L² norm, and the transition probability density satisfies ∫dx_{τ+dt} T_τ(x_{τ+dt} | x_τ) = 1 and T_τ(x_{τ+dt} | x_τ) ≥ 0. Let X_τ be the random variable corresponding to state x_τ. The joint probability of X_{τ+dt} and X_τ being in x_{τ+dt} and x_τ is defined as

P(x_{τ+dt}, x_τ) = T_τ(x_{τ+dt} | x_τ) P_τ(x_τ),

which satisfies ∫dx_{τ+dt}∫dx_τ P(x_{τ+dt}, x_τ) = 1 and P(x_{τ+dt}, x_τ) ≥ 0. This joint probability P is called the forward path probability density because it is the probability of the forward path from time t = τ to time t = τ + dt.
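The correspondence between the two descriptions can be illustrated numerically. The following sketch integrates the over-damped Langevin equation by the Euler-Maruyama method for a harmonic force F(x) = −kx (an illustrative choice, as are all parameter values) and compares the empirical variance with the stationary Gaussian solution of the corresponding Fokker-Planck equation.

```python
import numpy as np

# Illustrative sketch: Euler-Maruyama integration of the over-damped
# Langevin equation dX = mu*F(X)*dt + sqrt(2*mu*T*dt)*xi for the
# harmonic force F(x) = -k*x.  Parameters are illustrative choices.
rng = np.random.default_rng(0)
mu, T, k = 1.0, 1.0, 2.0
dt, n_steps, n_particles = 1e-3, 3000, 10000

x = np.zeros(n_particles)          # all particles start at the origin
for _ in range(n_steps):
    force = -k * x
    x += mu * force * dt + np.sqrt(2 * mu * T * dt) * rng.normal(size=n_particles)

# For this force, the Fokker-Planck equation has a Gaussian solution whose
# variance relaxes to the equilibrium value T/k.
var_empirical = x.var()
var_equilibrium = T / k
print(var_empirical, var_equilibrium)
```

The harmonic case is convenient because the Fokker-Planck equation then closes on the Gaussian family, so the variance alone characterizes P_t(x).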

Entropy production rate
We introduce stochastic thermodynamics [15,16], which is a framework for non-equilibrium thermodynamics described by a stochastic process such as the Fokker-Planck equation. In stochastic thermodynamics, the entropy production rate is introduced as a measure of thermodynamic dissipation [34]. The entropy production rate is defined as follows.
Definition 1 For the Fokker-Planck equation (1), the entropy production rate at time τ is defined as

σ_τ = (1/(µT)) ∫dx ‖ν_τ(x)‖² P_τ(x).

Remark 1 This entropy production rate is always non-negative, and its non-negativity σ_τ ≥ 0 is known as the second law of thermodynamics [16].
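Definition 1 can be evaluated directly on a grid. The following sketch does so for a relaxing one-dimensional Ornstein-Uhlenbeck process, assuming the standard expression σ_τ = (1/(µT))∫dx ‖ν_τ(x)‖² P_τ(x); the force F(x) = −kx and all parameter values are illustrative choices.

```python
import numpy as np

# Illustrative sketch: entropy production rate of a 1D Ornstein-Uhlenbeck
# process with F(x) = -k*x, where P is Gaussian with variance v != T/k
# (so the system is relaxing).  nu(x) = mu*(F(x) - T*dlnP/dx).
mu, T, k, v = 1.0, 1.0, 2.0, 0.3
x = np.linspace(-6, 6, 4001)
dx = x[1] - x[0]
P = np.exp(-x**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

dlnP = -x / v                      # d/dx ln P for the Gaussian
nu = mu * (-k * x - T * dlnP)      # mean local velocity
sigma_numeric = np.sum(nu**2 * P) * dx / (mu * T)

# Closed form for the Gaussian case: sigma = mu*(T - k*v)**2 / (T*v)
sigma_exact = mu * (T - k * v)**2 / (T * v)
print(sigma_numeric, sigma_exact)
```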

Remark 2
The entropy production rate is regarded as the sum of the entropy changes in the heat bath and the system [16]. If we assume that P_τ(x) decays sufficiently rapidly at infinity, the entropy production rate can be rewritten as

σ_τ = ∂_τ S(P_τ) − Q̇_τ/T,

where we used Eq. (1), ∫dx P_τ(x)(∂_τ ln P_τ(x)) = ∂_τ ∫dx P_τ(x) = 0, and ∫dx ∇·(ν_τ(x)P_τ(x) ln P_τ(x)) = 0 because of the assumption that P_τ(x) decays sufficiently rapidly at infinity. The term ∂_τ S(P_τ), with S(P_τ) = −∫dx P_τ(x) ln P_τ(x), is the time derivative of the differential entropy [50], which is regarded as the entropy change of the system. The term Q̇_τ = −∫dx F_τ(x) · ν_τ(x) P_τ(x) is the heat dissipation rate, and −Q̇_τ/T is regarded as the entropy change of the heat bath. Thus, the entropy production rate is given by the sum of the entropy changes in the heat bath and the system. Its non-negativity σ_τ ≥ 0 provides the Clausius inequality for the Fokker-Planck equation, which is an expression of the second law of thermodynamics.
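The integration-by-parts step behind this decomposition can be sketched as follows, assuming the standard expression σ_τ = (1/(µT))∫dx ‖ν_τ(x)‖² P_τ(x) of Definition 1 and the sign convention Q̇_τ = −∫dx F_τ(x)·ν_τ(x)P_τ(x) (heat absorbed by the system), which is our reading of the text:

```latex
\begin{aligned}
\sigma_\tau
&= \frac{1}{\mu T}\int \mathrm{d}x\,\|\nu_\tau(x)\|^2 P_\tau(x)
 = \frac{1}{T}\int \mathrm{d}x\,\nu_\tau(x)\cdot\bigl(F_\tau(x)-T\nabla\ln P_\tau(x)\bigr)P_\tau(x)\\
&= \underbrace{-\int \mathrm{d}x\,P_\tau(x)\,\nu_\tau(x)\cdot\nabla\ln P_\tau(x)}_{\partial_\tau S(P_\tau)}
 \;+\;\frac{1}{T}\int \mathrm{d}x\,F_\tau(x)\cdot\nu_\tau(x)P_\tau(x)
 = \partial_\tau S(P_\tau) - \frac{\dot{Q}_\tau}{T}.
\end{aligned}
```

The identification of the first term uses ∂_τ S(P_τ) = −∂_τ ∫dx P_τ ln P_τ = −∫dx P_τ ν_τ · ∇ln P_τ, which follows from the continuity equation (1) and the decay of P_τ(x) at infinity.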

Kullback-Leibler divergence and entropy production rate
We introduce an expression of the entropy production rate in terms of the Kullback-Leibler divergence, which was discussed in the context of the fluctuation theorem [16,51-53]. We assume that the parity of state x_τ is even; in other words, the sign of x_τ does not change under the time-reversal transformation. Let P†(x_{τ+dt}, x_τ) be the backward path probability density defined as P†(x_{τ+dt}, x_τ) = T_τ(x_τ | x_{τ+dt}) P_{τ+dt}(x_{τ+dt}). Now, we consider the Kullback-Leibler divergence between P(x_{τ+dt}, x_τ) and P†(x_{τ+dt}, x_τ) defined as

D_KL(P ∥ P†) = ∫dx_{τ+dt} ∫dx_τ P(x_{τ+dt}, x_τ) ln [P(x_{τ+dt}, x_τ) / P†(x_{τ+dt}, x_τ)].

The entropy production rate σ_τ is given by this Kullback-Leibler divergence as follows.
Lemma 1 The entropy production rate σ_τ is given by

σ_τ = lim_{dt→0} D_KL(P ∥ P†)/dt.

Proof We rewrite the Kullback-Leibler divergence D_KL(P ∥ P†) up to O(dt²), where O(dt²) means a term that satisfies lim_{dt→0} O(dt²)/dt = 0, using ∫dx_τ P(x_{τ+dt}, x_τ) = P_{τ+dt}(x_{τ+dt}); here ◦ stands for the Stratonovich product defined by the midpoint rule. We obtain the claim from Eqs. (13) and (15). □
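Lemma 1 can be checked numerically by building the forward and backward path probability densities on a grid. The following sketch does so for a relaxing Ornstein-Uhlenbeck process, using the Gaussian Onsager-Machlup transition density; all parameter values are illustrative, and agreement is expected only up to O(dt).

```python
import numpy as np

# Illustrative sketch: D_KL(P || P†)/dt versus the entropy production rate
# for the 1D Ornstein-Uhlenbeck process, F(x) = -k*x.  The forward path
# density is P(x', x) = T(x'|x) P_tau(x), the backward one is
# P†(x', x) = T(x|x') P_{tau+dt}(x'), both built from Gaussian transitions.
mu, T, k, v, dt = 1.0, 1.0, 2.0, 0.3, 0.01
x = np.linspace(-4, 4, 801)
dx = x[1] - x[0]
X, Xp = x[:, None], x[None, :]           # current and next positions

def gauss(z, var):
    return np.exp(-z**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

P_now = gauss(x, v)
v_next = v + 2 * mu * (T - k * v) * dt   # variance after one step of Eq. (1)
P_next = gauss(x, v_next)

T_fwd = gauss(Xp - X - mu * (-k * X) * dt, 2 * mu * T * dt)   # T(x'|x)
T_bwd = gauss(X - Xp - mu * (-k * Xp) * dt, 2 * mu * T * dt)  # T(x|x')
Pf = T_fwd * P_now[:, None]              # forward path density
Pb = T_bwd * P_next[None, :]             # backward path density

m = (Pf > 0) & (Pb > 0)                  # avoid log of underflowed zeros
kl = np.sum(Pf[m] * np.log(Pf[m] / Pb[m])) * dx**2
sigma_from_kl = kl / dt
sigma_exact = mu * (T - k * v)**2 / (T * v)   # closed Gaussian form
print(sigma_from_kl, sigma_exact)
```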

Remark 3
The Kullback-Leibler divergence is always non-negative, D_KL(P ∥ P†) ≥ 0, and zero if and only if P = P†. Thus, σ_τ = 0 if and only if P = P†. Physically, P = P† means the reversibility of stochastic dynamics, and σ_τ = 0 means that the system is in equilibrium.

Remark 4
The time integral of the entropy production rate, Σ(τ′; τ) = ∫_τ^{τ′} dt σ_t, is called the entropy production from time t = τ to time t = τ′. Lemma 1 implies that the Kullback-Leibler divergence D_KL(P ∥ P†) is equivalent to the entropy production from time t = τ to t = τ + dt up to O(dt²), where O(dt²) means a term that satisfies lim_{dt→0} O(dt²)/dt = 0.
Remark 5 Because the Fokker-Planck equation describes a Markov process, the increments are independent, and the results in this paper can be generalized to the entire path from t = 0 to t = τ. Thus, the results for the entropy production rate σ_τ in this paper can be generalized to the entropy production Σ(τ; 0) based on the expression Σ(τ; 0) = D_KL(P ∥ P†), where P and P† are the forward and backward path probability densities for the entire path. This link between the entropy production rate and the Kullback-Leibler divergence in Lemma 1 leads to an information-geometric interpretation of the entropy production rate.
3 Information geometry and entropy production

Projection theorem and entropy production
We discuss an information-geometric interpretation of the entropy production based on the projection theorem [13], which is obtained for a general Markov jump process in Ref. [25]. The entropy production can be understood in terms of the information-geometric projection onto the backward manifold defined as follows.
Definition 2 Let Q(x_{τ+dt}, x_τ) be a probability density of the backward form Q(x_{τ+dt}, x_τ) = T̃_τ(x_τ | x_{τ+dt}) P_{τ+dt}(x_{τ+dt}) for a transition probability density T̃_τ. The set M_B(P) of such probability densities is called the backward manifold. The backward path probability density P† is given by the information-geometric projection from P onto M_B(P). This information-geometric projection is formulated based on the following generalized Pythagorean theorem.
Lemma 2 For any Q ∈ M_B(P), the generalized Pythagorean theorem

D_KL(P ∥ Q) = D_KL(P ∥ P†) + D_KL(P† ∥ Q)

holds.

Proof The claim follows by a direct calculation, where we used ∫dx_{τ+dt} T_τ(x_{τ+dt} | x_τ) = 1 and ∫dx_τ T_τ(x_τ | x_{τ+dt}) = 1. □

Remark 8 The generalized Pythagorean theorem Eq. (20) can be rewritten as Eq. (22). Information-geometrically, Eq. (22) implies that the m-geodesic between the two points P and P† is orthogonal to the e-geodesic between the two points P† and Q [12]. The m-geodesic between two points P and P† is given by (1 − θ)P + θP†, where θ ∈ [0, 1] is an affine parameter. The e-geodesic between two points P† and Q is given by (1 − θ) ln P† + θ ln Q up to a normalization constant.
The orthogonality based on the generalized Pythagorean theorem leads to the projection theorem, which provides a minimization problem of the Kullback-Leibler divergence. Thus, Lemma 2 implies that the entropy production rate can be obtained from a minimization problem of the Kullback-Leibler divergence.
Thus, Theorem 3 implies that the entropy production Σ(τ + dt; τ) can be obtained from the information-geometric projection onto the backward manifold M_B(P). This result is helpful for estimating the entropy production Σ(τ + dt; τ) numerically by solving the optimization problem that minimizes the Kullback-Leibler divergence D_KL(P ∥ Q).

Interpolated dynamics and Fisher information
We can consider not only the m-geodesic between P and P† in Lemma 2 but also the e-geodesic between P and P†. This geodesic can be discussed in terms of the interpolated dynamics, which has been essentially introduced in Refs. [40,54]. By considering this interpolation, we obtain an expression of the entropy production rate in terms of the Fisher metric, which provides a trade-off relation between the entropy production rate and the fluctuation of any observable. We start with the definition of interpolated dynamics as follows.
Definition 3 Dynamics described by the continuity equation

∂_t P_t(x) = −∇ · (((1 − θ)ν_t(x) + θν′_t(x)) P_t(x))

are called interpolated dynamics for the two force fields ν_t(x) = µ(F_t(x) − T∇ ln P_t(x)) and ν′_t(x), where θ ∈ [0, 1] is an interpolation parameter.

Remark 9
Interpolated dynamics also correspond to an over-damped Langevin equation with the interpolated drift.

Definition 4 The path probability density of interpolated dynamics for the two force fields ν_τ(x_τ) = µ(F_τ(x_τ) − T∇ ln P_τ(x_τ)) and ν′_τ(x_τ) is denoted by P^θ_{ν′_τ}(x_{τ+dt}, x_τ).

Remark 10 The parameter θ quantifies a difference between interpolated dynamics and the original Fokker-Planck dynamics (1), because θ = 0 recovers the transition probability density of the original dynamics.

Remark 11 Because of the form of the path probability density, the parameter θ can be regarded as a theta coordinate system for the exponential family in information geometry [13]. By neglecting O(dt), ln P^θ_{ν′_τ}(x_{τ+dt}, x_τ) can be rewritten as the linear interpolation (1 − θ) ln P^0_{ν′_τ}(x_{τ+dt}, x_τ) + θ ln P^1_{ν′_τ}(x_{τ+dt}, x_τ), which implies that P^θ_{ν′_τ} gives the e-geodesic between the two points P^0_{ν′_τ} and P^1_{ν′_τ}. We next consider the backward path probability density P† in terms of interpolated dynamics. The backward path probability density P† can be regarded as P^1_{−ν_τ} as follows.
Lemma 4 The backward path probability density P†(x_{τ+dt}, x_τ) is given by P†(x_{τ+dt}, x_τ) = P^1_{−ν_τ}(x_{τ+dt}, x_τ).

Proof The backward path probability density P†(x_{τ+dt}, x_τ) is calculated directly from the Onsager-Machlup expression of the transition probability density. □

Lemma 4 implies that the path probability density of interpolated dynamics P^θ_{−ν_τ} gives the e-geodesic between P and P†. We discuss an information-geometric interpretation of the entropy production based on P^θ_{−ν_τ}. To discuss it, we introduce the following lemma proposed in Ref. [55].

Remark 13
The entropy production can be regarded as half of the Fisher metric, and thus the entropy production can also be regarded as a particular Riemannian metric of differential geometry. Based on the Fisher metric, we can introduce the square of the line element ds²_path along the corresponding e-geodesic, where θ is the interpolation parameter for P^θ_{−ν_τ}.

Thermodynamic uncertainty relations
The link between the entropy production rate and the Fisher metric leads to thermodynamic trade-off relations between the entropy production rate and the fluctuation of any observable. A particular case of thermodynamic trade-off relations was proposed as the thermodynamic uncertainty relations [56,57]. In Refs. [24,40,43,58-60], several links between the Cramér-Rao bound and generalizations of the thermodynamic uncertainty relations have been discussed. Here, we newly propose a generalization of the thermodynamic uncertainty relations based on the fact that the entropy production is regarded as half of the Fisher information in Theorem 6. To obtain the proposed thermodynamic uncertainty relation, we start with the Cramér-Rao bound.
Here, |_{θ=0} stands for the substitution θ = 0, E_P[•] is the expected value with respect to P, and ∆_P R = R − E_P[R] is the deviation of the observable R. From the Cauchy-Schwarz inequality, we obtain the Cramér-Rao bound, where we used the normalization condition ∫dx_{τ+dt}∫dx_τ ∂_θ P^θ_{−ν_τ}(x_{τ+dt}, x_τ)|_{θ=0} = 0. By plugging Eq. (42) into the Cramér-Rao bound, the proposed generalized thermodynamic uncertainty relation, which is a trade-off relation between the entropy production rate σ_τ and the fluctuation of any observable Var[R] = E_P[(∆_P R)²], can be obtained as follows.
The entropy production rate σ_τ is bounded by the generalized thermodynamic uncertainty relation, where Var[R] is the variance of the observable R and J[w] is the generalized current defined for a weight function w(x). Here, ∇_x is a gradient operator for x ∈ R^d.
Remark 14 In Ref. [58], a special case of the proposed thermodynamic uncertainty relation (48) was discussed, for which Eq. (48) can be rewritten in a simpler form. This result can also be easily obtained from the Cauchy-Schwarz inequality.

4 Optimal transport theory and entropy production

L²-Wasserstein distance and minimum entropy production

We next discuss a relation between the L²-Wasserstein distance in optimal transport theory [36] and the entropy production. We start with the Benamou-Brenier formula [61], which gives the definition of the L²-Wasserstein distance.
Definition 5 Let P(x) and Q(x) be probability densities at position x ∈ R^d that satisfy ∫dx P(x) = ∫dx Q(x) = 1, P(x) ≥ 0 and Q(x) ≥ 0, and whose second-order moments ∫dx ‖x‖² P(x) and ∫dx ‖x‖² Q(x) are finite. Let ∆τ ≥ 0 be a non-negative time interval. The L²-Wasserstein distance between probability densities P and Q is defined by the Benamou-Brenier formula

[W₂(P, Q)]² = inf ∆τ ∫_τ^{τ+∆τ} dt ∫dx ‖v_t(x)‖² P_t(x),   (56)

where the infimum is taken among all paths (v_t(x), P_t(x))_{τ≤t≤τ+∆τ} satisfying the continuity equation ∂_t P_t(x) = −∇ · (v_t(x) P_t(x)) with the boundary conditions P_τ = P and P_{τ+∆τ} = Q.

Remark 15 This definition of the L²-Wasserstein distance is consistent with the definition used in the Monge-Kantorovich problem [35]. The definition in the Monge-Kantorovich problem is as follows. Let Π(x, x′) be the joint probability density at positions x ∈ R^d and x′ ∈ R^d that satisfies Π(x, x′) ≥ 0, ∫dx′ Π(x, x′) = P(x), and ∫dx Π(x, x′) = Q(x′). The L²-Wasserstein distance is also defined as

[W₂(P, Q)]² = inf_Π ∫dx ∫dx′ ‖x − x′‖² Π(x, x′).

Remark 16 The L²-Wasserstein distance is a distance [35], since it is symmetric, W₂(P, Q) = W₂(Q, P), non-negative, zero if and only if P = Q, and satisfies the triangle inequality. Based on the Benamou-Brenier formula, we can consider the minimum entropy production for the Fokker-Planck equation in terms of the L²-Wasserstein distance. This link between the L²-Wasserstein distance and the entropy production was initially pointed out in the field of optimal transport theory (for example, in Refs. [36,62]). After that, it was also discussed in the context of stochastic thermodynamics [19,63]. This link has been recently revisited in terms of thermodynamic trade-off relations such as the thermodynamic speed limit [31,64] and the thermodynamic uncertainty relation [40]. The decomposition of the entropy production rate based on optimal transport theory has also been proposed in Refs. [31,39].
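In one dimension, the Monge-Kantorovich problem of Remark 15 is solved by the monotone (quantile) coupling, so that [W₂(P, Q)]² = ∫₀¹ |F_P⁻¹(u) − F_Q⁻¹(u)|² du, where F_P⁻¹ denotes the inverse cumulative distribution function; this one-dimensional reduction and the Gaussian closed form used below are standard facts of optimal transport rather than statements from the text.

```python
import numpy as np
from statistics import NormalDist

# Illustrative sketch: 1D L2-Wasserstein distance via the quantile coupling,
# checked against the closed form for two Gaussians,
# [W2]^2 = (m1 - m2)^2 + (s1 - s2)^2.
m1, s1, m2, s2 = 0.0, 1.0, 2.0, 0.5
u = np.arange(1, 10000) / 10000.0                    # interior quantiles
qP = np.array([NormalDist(m1, s1).inv_cdf(t) for t in u])
qQ = np.array([NormalDist(m2, s2).inv_cdf(t) for t in u])
w2_numeric = np.sqrt(np.mean((qP - qQ)**2))
w2_exact = np.sqrt((m1 - m2)**2 + (s1 - s2)**2)
print(w2_numeric, w2_exact)
```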
For a stochastic process evolving according to the Fokker-Planck equation, the entropy production Σ(τ + ∆τ; τ) for a fixed initial probability density P_τ and a fixed final probability density P_{τ+∆τ} is bounded by the L²-Wasserstein distance as

Σ(τ + ∆τ; τ) ≥ [W₂(P_τ, P_{τ+∆τ})]² / (µT∆τ).   (61)

This result is regarded as a thermodynamic speed limit [31,64], which is a trade-off relation between the finite time interval ∆τ and the entropy production Σ(τ + ∆τ; τ).
By considering the geometry of the L²-Wasserstein distance and introducing the L²-Wasserstein path length, we can obtain another lower bound on the entropy production, which is tighter than Eq. (61). This bound is proposed as a tighter version of the thermodynamic speed limit in Ref. [31]. The L²-Wasserstein path length is defined as follows.
Definition 6 Let t ∈ R and s ∈ R indicate time. For a fixed trajectory of the probability density (P_t)_{τ≤t≤τ+∆τ}, the Wasserstein path length from time t = τ to time t = τ + ∆τ is defined as

L(τ + ∆τ; τ) = ∫_τ^{τ+∆τ} ds lim_{∆t→0} W₂(P_s, P_{s+∆t})/∆t.

Remark 17 We can obtain L(τ + ∆τ; τ) ≥ W₂(P_τ, P_{τ+∆τ}) by using the triangle inequality of the L²-Wasserstein distance. Thus, the L²-Wasserstein distance W₂(P_τ, P_{τ+∆τ}) can be regarded as the minimum Wasserstein path length, that is, the length of the geodesic between the two points P_τ and P_{τ+∆τ}.

Geometric decomposition of entropy production rate
Based on Eq. (66), we can obtain a decomposition of the entropy production rate into two non-negative parts, namely the housekeeping entropy production rate and the excess entropy production rate. This decomposition has been essentially obtained in Ref. [31], and discussed in Refs. [39,40] from the viewpoint of the thermodynamic uncertainty relation.
Here, we define the housekeeping entropy production rate and the excess entropy production rate based on Eq. (66).

Definition 7
The excess entropy production rate is defined as

σ^ex_τ = (1/(µT)) lim_{dt→0} [W₂(P_τ, P_{τ+dt})]²/dt²,

and the housekeeping entropy production rate is defined as σ^hk_τ = σ_τ − σ^ex_τ.

Remark 19 The excess entropy production rate is always non-negative, σ^ex_τ ≥ 0. The housekeeping entropy production rate is non-negative, σ^hk_τ ≥ 0, because of Eq. (66). The entropy production rate is decomposed as σ_τ = σ^ex_τ + σ^hk_τ.
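As an illustrative example of the decomposition (the linear rotational force and all parameter values are our choices, not taken from the text), consider the two-dimensional linear force F(x, y) = (−kx − ωy, ωx − ky). The isotropic Gaussian with covariance (T/k)I is then a steady state, so σ^ex_τ = 0 and the entire entropy production rate, σ_τ = 2µω²/k, is housekeeping.

```python
import numpy as np

# Illustrative sketch: a 2D rotational force F = (-k*x - w*y, w*x - k*y)
# keeps the isotropic Gaussian P ∝ exp(-k*(x^2+y^2)/(2T)) stationary, so
# the full entropy production rate is housekeeping: sigma = 2*mu*w^2/k.
mu, T, k, w = 1.0, 1.0, 2.0, 1.5
g = np.linspace(-5, 5, 401)
X, Y = np.meshgrid(g, g, indexing="ij")
dA = (g[1] - g[0])**2
P = np.exp(-k * (X**2 + Y**2) / (2 * T))
P /= P.sum() * dA

# mean local velocity nu = mu*(F - T*grad ln P) = mu*w*(-y, x) here,
# a pure rotation, so the density does not change in time (sigma_ex = 0)
nux, nuy = -mu * w * Y, mu * w * X
sigma_numeric = np.sum((nux**2 + nuy**2) * P) * dA / (mu * T)
sigma_hk_exact = 2 * mu * w**2 / k
print(sigma_numeric, sigma_hk_exact)
```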

Remark 20
The excess entropy production rate becomes zero, σ^ex_τ = 0, if and only if the system is in the steady state, ∂_t P_t(x) = 0, or equivalently P_τ = P_{τ+dt} for an infinitesimal time interval dt. The decomposition of the entropy production rate such that the excess entropy production rate becomes zero in the steady state is not unique, and another example of such a decomposition has been obtained in the study of steady-state thermodynamics [65]. Our definitions of the excess entropy production rate and the housekeeping entropy production rate are generally different from those proposed in Ref. [65]. We discussed this difference in Ref. [40].

Remark 21
The thermodynamic speed limit can be tightened by using the excess entropy production rate as follows.
The contribution of the housekeeping entropy production rate does not appear in this thermodynamic speed limit, and the lower bound becomes tighter when σ^hk_t = 0. The lower bound Σ(τ + ∆τ; τ) = [W₂(P_τ, P_{τ+∆τ})]²/(µT∆τ) is achieved when the probability density moves along the geodesic at a constant Wasserstein speed and σ^hk_t = 0 for τ ≤ t ≤ τ + ∆τ. This implies that the geodesic in the space of the L²-Wasserstein distance is related to the optimal protocol that minimizes the entropy production in a finite time. The condition σ^hk_t = 0 is related to the condition of the potential force as discussed below. Thus, if we want to minimize the entropy production in a finite time, the probability density P_t should be changed along the geodesic in the space of the L²-Wasserstein distance by the potential force [31].
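The role of the constant-speed geodesic can be illustrated with the simplest transport problem, translating a probability density by a distance L in time ∆τ; for a pure translation the mean local velocity equals the speed of the center, and W₂ = L. The protocols and parameter values below are illustrative choices.

```python
import numpy as np

# Illustrative sketch: entropy production for translating a density by L
# in time dtau along a path x_c(t).  For a pure translation,
# sigma_t = x_c'(t)^2/(mu*T), so Sigma = (1/(mu*T)) * integral x_c'^2 dt,
# minimized by the constant-speed protocol (the Wasserstein geodesic),
# which attains the speed-limit bound W2^2/(mu*T*dtau) with W2 = L.
mu, T, L, dtau = 1.0, 1.0, 3.0, 2.0
t = np.linspace(0.0, dtau, 200001)
dt = t[1] - t[0]

def total_ep(x_c):
    speed = np.gradient(x_c, dt)
    return np.sum(speed**2) * dt / (mu * T)

sigma_const = total_ep(L * t / dtau)          # constant-speed geodesic
sigma_accel = total_ep(L * (t / dtau)**2)     # accelerating protocol
bound = L**2 / (mu * T * dtau)                # [W2]^2/(mu*T*dtau)
print(sigma_const, sigma_accel, bound)
```

The accelerating protocol reaches the same endpoints but produces more entropy, illustrating why the geodesic traversed at constant speed is optimal.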
To discuss a physical interpretation of this decomposition, we focus on another expression of the optimal protocol proposed in Ref. [61].
Here, ν*_t(x) ∈ R^d is a vector field, namely an optimal mean local velocity, that satisfies ν*_t(x) = ∇φ_t(x) with a potential φ_t(x) ∈ R and a time evolution of P_t(x) that connects P_τ(x) and P_{τ+∆τ}(x).
Proof Using the method of Lagrange multipliers, the optimization problem in Eq. (56) with the constraint of the continuity equation can be solved by the calculus of variations for (P_t)_{τ<t<τ+∆τ} and (ν*_t)_{τ≤t<τ+∆τ}. □

From this optimal protocol, the excess entropy production rate and the housekeeping entropy production rate can be regarded as a potential contribution and a non-potential contribution to the entropy production rate, respectively. This fact is given by the following theorem.
Remark 23 Let us consider the case where the force F_τ(x) is a potential force, F_τ(x) = −∇U_τ(x), where U_τ(x) ∈ R is a potential. In this case, the mean local velocity is given by ν_τ(x) = ∇(−µU_τ(x) − µT ln P_τ(x)), and φ_τ(x) can be chosen as φ_τ(x) = −µU_τ(x) − µT ln P_τ(x). Thus, we obtain ν_τ(x) = ν*_τ(x), σ^ex_τ = σ_τ, and σ^hk_τ = 0 for a potential force. This fact implies that the excess entropy production rate and the housekeeping entropy production rate quantify the contributions of a potential force and a non-potential force to the entropy production rate, respectively.
Based on the expression in Theorem 12, we also obtain a thermodynamic uncertainty relation for the excess entropy production rate [39,40], which was essentially obtained in Refs. [24,67] for the entropy production rate.
Theorem 13 Let r(x) ∈ R be any time-independent function of x ∈ R^d. We assume that P_τ(x) decays sufficiently rapidly at infinity. The entropy production rate is bounded by

σ_τ ≥ σ^ex_τ ≥ [∂_τ E_{P_τ}(r)]² / [µT ∫dx ‖∇r(x)‖² P_τ(x)],   (88)

where E_{P_τ}(r) = ∫dx P_τ(x) r(x) is the expected value.

Proof The quantity ∂_τ E_{P_τ}(r) is calculated as ∂_τ E_{P_τ}(r) = ∫dx [∇r(x)] · ν*_τ(x) P_τ(x), where we used ∫dx ∇ · (ν*_τ(x)P_τ(x)r(x)) = 0 because of the assumption that P_τ(x) decays sufficiently rapidly at infinity. From the Cauchy-Schwarz inequality, we obtain Eq. (91). By combining Eq. (91) with σ_τ ≥ σ^ex_τ, we obtain Eq. (88). □
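The bound of Theorem 13 can be checked on a grid. For the relaxing Ornstein-Uhlenbeck process with the observable r(x) = x², the mean local velocity is proportional to ∇r, so the Cauchy-Schwarz step is saturated and the bound holds with equality; the force F(x) = −kx and all parameter values are illustrative choices.

```python
import numpy as np

# Illustrative sketch: the trade-off relation of Theorem 13 for the 1D
# Ornstein-Uhlenbeck process, F(x) = -k*x, with observable r(x) = x**2.
mu, T, k, v = 1.0, 1.0, 2.0, 0.3
x = np.linspace(-6, 6, 4001)
dx = x[1] - x[0]
P = np.exp(-x**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

nu = mu * (-k * x + T * x / v)        # mean local velocity mu*(F - T dlnP/dx)
sigma = np.sum(nu**2 * P) * dx / (mu * T)

r_prime = 2 * x                       # gradient of r(x) = x**2
dE_r = np.sum(r_prime * nu * P) * dx  # d/dtau E[r], via integration by parts
bound = dE_r**2 / (mu * T * np.sum(r_prime**2 * P) * dx)
print(sigma, bound)   # equal here, since nu is proportional to dr/dx
```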

Remark 24
The weaker inequality σ_τ ≥ [∂_τ E_{P_τ}(r)]²/[µT ∫dx ‖∇r(x)‖² P_τ(x)] can be regarded as the thermodynamic uncertainty relation Eq. (54) for w(x) = ∇r(x), where the generalized current is calculated as J[∇r(x)] = ∂_τ E_{P_τ}(r). Thus, this result is also regarded as a consequence of the Cramér-Rao bound for the path probability density. In the context of optimal transport theory, a mathematically equivalent inequality, namely the Wasserstein-Cramér-Rao bound, was also proposed in Ref. [68].
5 Thermodynamic links between information geometry and optimal transport

Gradient flow and information geometry in space of probability densities
In terms of the excess entropy production rate, we can obtain a thermodynamic link between information geometry in the space of probability densities and optimal transport theory. To discuss this thermodynamic link, we start with the definition of the pseudo energy U*_t(x) and the pseudo canonical distribution P^pcan_t(x) proposed in Ref. [40].
Definition 8 Let φ_t(x) ∈ R be a potential which provides the optimal mean local velocity ν*_t(x) = ∇φ_t(x) ∈ R^d for an infinitesimal time such that lim_{∆t→0} [W₂(P_t, P_{t+∆t})]²/(∆t)² = ∫dx ‖ν*_t(x)‖² P_t(x). The pseudo energy U*_t(x) ∈ R is defined as

U*_t(x) = −φ_t(x)/µ − T ln P_t(x),

and the pseudo canonical distribution P^pcan_t(x) is defined as

P^pcan_t(x) = exp[−U*_t(x)/T] / ∫dx′ exp[−U*_t(x′)/T],

which is a probability density that satisfies P^pcan_t(x) ≥ 0 and ∫dx P^pcan_t(x) = 1.

Remark 25
The pseudo energy can be defined for a non-potential force F_t(x) that satisfies ∇ · ((F_t(x) + ∇U*_t(x))P_t(x)) = 0. Thus, the pseudo energy U*_t(x) is not generally unique. If a force F_t(x) is given by a potential force F_t(x) = −∇U_t(x), the potential energy can trivially be a pseudo energy. The time evolution of P_t(x) is given by a gradient flow expression. The concept of the gradient flow originates from the Jordan-Kinderlehrer-Otto scheme [62]. We rewrite the gradient flow expression in Ref. [69] by using a functional derivative of the Kullback-Leibler divergence for general Markov jump processes [43]. The following proposition is a special case of a gradient flow expression [43] for the Fokker-Planck equation.

Proposition 14
The time evolution of P_t(x) under the Fokker-Planck equation is described by the gradient flow expression

∂_t P_t(x) = D[∂_{P_t(x)} D_KL(P_t ∥ P^pcan_t)],   (94)

where D[•] stands for the weighted Laplacian operator defined as D[f](x) = µT ∇ · (P_t(x)∇f(x)).

Proof The functional derivative ∂_{P_t(x)} D_KL(P_t ∥ P^pcan_t) is calculated as ∂_{P_t(x)} D_KL(P_t ∥ P^pcan_t) = ln P_t(x) + U*_t(x)/T + const., and its gradient is calculated as ∇∂_{P_t(x)} D_KL(P_t ∥ P^pcan_t) = ∇ ln P_t(x) + ∇U*_t(x)/T = −ν*_t(x)/(µT), where we used ∇ ∫dx exp[−U*_t(x)/T] = 0. Thus, the optimal mean local velocity provides Eq. (94). □

Remark 26 The Kullback-Leibler divergence is calculated as D_KL(P_t ∥ P^pcan_t) = ∫dx P_t(x) ln[P_t(x)/P^pcan_t(x)] by using the normalization of the probability densities ∫dx P_t(x) = ∫dx P^pcan_t(x) = 1. Note that the functional derivative of ∫dx P_t(x) ln[P_t(x)/P^pcan_t(x)] is different from the functional derivative of Eq. (95); the functional derivative ∂_{P_t(x)} D_KL(P_t ∥ P^pcan_t) = ln[P_t(x)/P^pcan_t(x)] is defined for Eq. (95).
Remark 27 If a force is given by a potential force F_t(x) = −∇U(x) with a time-independent potential energy U(x) ∈ R, a pseudo energy becomes a potential energy, U*_t(x) = U(x), and a pseudo canonical distribution P^pcan_t(x) becomes an equilibrium distribution, P^pcan_t(x) = P^eq(x), which satisfies the condition that P_t(x) → P^eq(x) in the limit t → ∞. In this case, the gradient flow expression Eq. (94) is given by ∂_t P_t(x) = D[∂_{P_t(x)} D_KL(P_t ∥ P^eq)], which describes a relaxation to an equilibrium distribution P^eq(x).
Remark 28 If a pseudo canonical distribution is given by a time-independent distribution P^pcan_t(x) = P^st(x), the Fokker-Planck equation can be rewritten as a heat equation for ∂_{P_t(x)} D_KL(P_t ∥ P^st), where div_{√|g|} is the operator defined with √|g| = P_t(x). Because the operator div_{√|g|} is a generalization of the divergence operator for non-Euclidean space with the absolute value of the determinant of the metric tensor |g|, the Fokker-Planck equation may be regarded as a kind of diffusion equation for ∂_{P_t(x)} D_KL(P_t ∥ P^st) = ln P_t(x) − ln P^st(x). As a consequence of the diffusion process, we may obtain ∂_{P_t(x)} D_KL(P_t ∥ P^st) = ln P_t(x) − ln P^st(x) → 0 in the limit t → ∞, which implies the relaxation to a steady state, P_t(x) → P^st(x).
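The Lyapunov property noted in Remark 27, the monotone decay of D_KL(P_t ∥ P^eq) under relaxation in a potential, can be observed with a simple finite-volume discretization of the Fokker-Planck equation; the scheme and all parameter values are illustrative choices.

```python
import numpy as np

# Illustrative sketch: relaxation under the potential force F(x) = -k*x.
# D_KL(P_t || P_eq) should decrease monotonically along the gradient flow.
mu, T, k = 1.0, 1.0, 1.0
x = np.linspace(-5.0, 5.0, 401)
dx = x[1] - x[0]
dt = 1e-4

P = np.exp(-(x - 1.5)**2 / 0.5); P /= P.sum() * dx        # displaced start
Peq = np.exp(-k * x**2 / (2 * T)); Peq /= Peq.sum() * dx  # equilibrium

def kl(P, Q):
    m = P > 0
    return np.sum(P[m] * np.log(P[m] / Q[m])) * dx

kls = []
for _ in range(2000):
    kls.append(kl(P, Peq))
    xm = 0.5 * (x[1:] + x[:-1])                  # cell interfaces
    Pm = 0.5 * (P[1:] + P[:-1])
    # probability current J = mu*F*P - mu*T*dP/dx at the interfaces
    J = mu * (-k * xm) * Pm - mu * T * (P[1:] - P[:-1]) / dx
    divJ = np.zeros_like(P)
    divJ[1:-1] = (J[1:] - J[:-1]) / dx           # conservative update
    P = P - dt * divJ
kls = np.array(kls)
print(kls[0], kls[-1])
```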
Based on the gradient flow expression (94), we obtain the following expression of the excess entropy production rate discussed in Ref. [40].
Theorem 15 We assume that P_τ(x) decays sufficiently rapidly at infinity. The excess entropy production rate is given by

σ^ex_τ = −∫dx [∂_τ P_τ(x)] ∂_{P_τ(x)} D_KL(P_τ ∥ P^pcan_τ).

Proof The excess entropy production rate is calculated by integration by parts, where we used ∫dx ∇ · (∂_{P_τ(x)} D_KL(P_τ ∥ P^pcan_τ)(∇φ_τ(x))P_τ(x)) = 0 because of the assumption that P_τ(x) decays sufficiently rapidly at infinity. □

Remark 29
If the pseudo canonical distribution does not depend on time, P^pcan_τ(x) = P^st(x), the non-negativity of the excess entropy production rate is related to the monotonicity of the Kullback-Leibler divergence, ∂_τ D_KL(P_τ ∥ P^st) ≤ 0, where P^st(x) is a steady-state distribution that satisfies ∂_τ P^st(x) = 0. Because σ^ex_τ = 0 if and only if the system is in a steady state, P_τ(x) = P^st(x), the excess entropy production rate can be given by a Lyapunov function D_KL(P_τ ∥ P^st), and this monotonicity ∂_τ D_KL(P_τ ∥ P^st) ≤ 0 gives the relaxation to a steady-state distribution, P_τ(x) → P^st(x), in the limit τ → ∞.
Remark 30 This expression of the excess entropy production rate in terms of the Kullback-Leibler divergence provides a link between optimal transport theory and information geometry. As discussed in Ref. [41], the excess entropy production can be expressed using the dual coordinate systems for the Kullback-Leibler divergence. By using the dual coordinate systems with an affine transformation, the Kullback-Leibler divergence between P and Q is given by

D_KL(P ∥ Q) = ϕ(η_P(x)) + ψ(θ_Q(x)) − ∫dx η_P(x)θ_Q(x),

where η_P(x) = P(x) − P^st(x) and θ_Q(x) = ln Q(x) − ln P^st(x) are the eta and theta coordinate systems that satisfy η_{P^st}(x) = θ_{P^st}(x) = 0, and ϕ(η_P(x)) = D_KL(P ∥ P^st) and ψ(θ_Q(x)) = D_KL(P^st ∥ Q) are the dual convex functions, respectively. Thus, if the pseudo canonical distribution does not depend on time, P^pcan_τ(x) = P^st(x), the excess entropy production rate is given by σ^ex_τ = −∂_τ ϕ(η_{P_τ}(x)).

Remark 31
The relaxation to an equilibrium distribution P^eq(x) for a time-independent potential force F_τ(x) = −∇U(x) was discussed from the viewpoint of information geometry based on the expression of the entropy production rate σ^ex_τ = σ_τ = −∂_τ D_KL(P_τ ∥ P^eq) in Refs. [30,70,71].
Near steady state, the Fisher metric for a probability density is also related to the entropy production rate. A thermodynamic interpretation of the Fisher metric was discussed in Ref. [11] as a generalization of the Weinhold geometry [7] or the Ruppeiner geometry [8,10] in a stochastic system near equilibrium. We also examined this Fisher metric for a far-from-equilibrium system in Refs. [22,24,29]. To discuss a thermodynamic interpretation of the Fisher metric, we start with the definition of the Fisher information of time for a probability density.
Definition 9 Let P_t(x) be a probability density of x ∈ R^d. The Fisher information of time is defined as

ds²/dt² = ∫dx P_t(x)(∂_t ln P_t(x))².

The positive square root v_info(t) = √(ds²/dt²) is called the intrinsic speed.
Remark 32 The Fisher information of time appears in the Taylor expansion of the Kullback-Leibler divergence as D_KL(P_t ∥ P_{t+dt}) = (1/2)(ds²/dt²)dt² + O(dt³). If we consider a time-dependent parameter θ ∈ R, the Fisher information of time is given by ds²/dt² = g_θθ(P_t)(dθ/dt)², where g_θθ(P_t) is the Fisher metric defined as g_θθ(P_t) = ∫dx P_t(x)(∂_θ ln P_t(x))². Thus, the intrinsic speed v_info(t) = √(ds²/dt²) means the speed on the manifold of the probability simplex, where the metric is given by the Fisher metric. This differential geometry is well discussed in information geometry [12].
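The Fisher information of time can be computed explicitly for Gaussian relaxation. For a zero-mean Gaussian with variance v(t), one finds ds²/dt² = (v̇(t))²/(2v(t)²); the following sketch checks this against a direct finite-difference evaluation of Definition 9, with illustrative parameter values for an Ornstein-Uhlenbeck relaxation.

```python
import numpy as np

# Illustrative sketch: Fisher information of time for a zero-mean Gaussian
# whose variance relaxes as in the Ornstein-Uhlenbeck process,
# v(t) = T/k + (v0 - T/k)*exp(-2*mu*k*t).  Closed form: (v')^2/(2 v^2).
mu, T, k = 1.0, 1.0, 2.0
v0, t0, eps = 0.3, 0.1, 1e-6

def var(t):
    return T / k + (v0 - T / k) * np.exp(-2 * mu * k * t)

x = np.linspace(-6, 6, 4001)
dx = x[1] - x[0]

def logP(t):
    v = var(t)
    return -x**2 / (2 * v) - 0.5 * np.log(2 * np.pi * v)

P = np.exp(logP(t0))
dlnP_dt = (logP(t0 + eps) - logP(t0 - eps)) / (2 * eps)   # central difference
fisher_numeric = np.sum(P * dlnP_dt**2) * dx

v = var(t0)
vdot = -2 * mu * k * (v0 - T / k) * np.exp(-2 * mu * k * t0)
fisher_exact = vdot**2 / (2 * v**2)
print(fisher_numeric, fisher_exact)
```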

Remark 33
The relaxation to a steady state can be discussed in terms of the monotonicity of the intrinsic speed (or the monotonicity of the Fisher information of time). When a force F_t(x) depends on time, the upper bound on ∂_t(v_info(t))² cannot be zero. The upper bound on ∂_t(v_info(t))² for the general case ∂_t F_t(x) ≠ 0 was discussed in Ref. [24].
If a pseudo canonical distribution is given by a time-independent steady-state distribution P^pcan_τ = P_st, the Fisher information of time is related to the excess entropy production rate. This was discussed in Ref. [29] for the entropy production rate with a general rate equation on chemical reaction networks. For the Fokker-Planck equation, we newly propose the following relation between the Fisher information of time and the excess entropy production rate as a correspondence of the result in Ref. [29].
Proposition 16 We assume that a pseudo canonical distribution does not depend on time, P^pcan_τ(x) = P_st(x). We assume that P_τ(x) decays sufficiently rapidly at infinity. Let η_{P_t}(x) = P_t(x) − P_st(x) be a difference from a steady-state distribution. The Fisher information of time is given by (v_info(t))² = −(1/2)∂_t σ^ex_t + O(η³_{P_t}), where O(η³_{P_t}) stands for a term such that O(η³_{P_t})/(η_{P_t}(x))² → 0 in the limit P_t → P_st.
Proof Let θ_{P_t}(x) = ln P_t(x) − ln P_st(x) be the theta coordinate that satisfies θ_{P_st}(x) = 0. The Fisher information of time is given by (v_info(t))² = ∫dx P_t(x)[∂_t θ_{P_t}(x)]² = ∫dx [∂_t P_t(x)][∂_t θ_{P_t}(x)] = −∫dx μT P_t(x)∇θ_{P_t}(x)·∇[∂_t θ_{P_t}(x)], where we used the gradient flow expression ∂_t P_t(x) = ∇·(μT P_t(x)∇θ_{P_t}(x)) and ∫dx ∇·((∇θ_{P_t}(x))P_t(x)[∂_t θ_{P_t}(x)]) = 0 because of the assumption that P_τ(x) decays sufficiently rapidly at infinity. The excess entropy production rate is given by σ^ex_t = −∫dx [∂_t P_t(x)]θ_{P_t}(x) = ∫dx μT P_t(x)|∇θ_{P_t}(x)|², where we used ∫dx ∇·((∇θ_{P_t}(x))P_t(x)θ_{P_t}(x)) = 0 because of the assumption that P_τ(x) decays sufficiently rapidly at infinity. Thus, we obtain ∂_t σ^ex_t = ∫dx μT [∂_t P_t(x)]|∇θ_{P_t}(x)|² + 2∫dx μT P_t(x)∇θ_{P_t}(x)·∇[∂_t θ_{P_t}(x)] = −2(v_info(t))² + O(η³_{P_t}), because ∂_t P_t(x) = O(η_{P_t}) and |∇θ_{P_t}(x)|² = O(η²_{P_t}) near the steady state.
Remark 34 Because the excess entropy production rate is defined in terms of the L²-Wasserstein distance, Eq. (108) implies a relation between the L²-Wasserstein distance and the Fisher information of time near steady state, up to O(η³_{P_t}) and O(dt).

Remark 35
We discussed a relation between the Fisher information and the excess entropy production rate proposed by Glansdorff and Prigogine near steady state in Ref. [38] from the viewpoint of the Glansdorff-Prigogine criterion for stability [5,6,72,73]. We remark that the definition of the excess entropy production rate by Glansdorff and Prigogine is slightly different from the definition based on the L²-Wasserstein distance in this paper.
Remark 36 As discussed in Ref. [41], the expression σ^ex_t = −∫dx [∂_t η_{P_t}(x)]θ_{P_t}(x) in Eq. (110) implies that the time derivative of the eta coordinate ∂_t η_{P_t}(x) corresponds to the thermodynamic flow and the theta coordinate −θ_{P_t}(x) corresponds to the conjugate thermodynamic force, respectively. The expression of the Fisher information of time in terms of the thermodynamic flow and the conjugate thermodynamic force, (v_info(t))² = ∫dx [∂_t η_{P_t}(x)][∂_t θ_{P_t}(x)], has been substantially obtained in Ref. [22]. The gradient of the thermodynamic force ∇θ_{P_t}(x) is also regarded as a thermodynamic force because the gradient is given by a linear combination of the thermodynamic force at position x + Δx and position x for the infinitesimal displacement Δx. The quantity μT P_st(x) is also regarded as the Onsager coefficient near equilibrium because Eq. (110) is a quadratic function of the thermodynamic force ∇θ_{P_t}(x) with proportionality coefficient μT P_st(x). The gradient flow expression of the Fokker-Planck equation Eq. (94) is given by the weighted Laplacian operator Eq. (96), where this weight is regarded as the Onsager coefficient near equilibrium. Based on the quadratic expression in Eq. (110), we can also consider a geometry where the weight of the Onsager coefficient gives a metric. We used the weight of the generalized Onsager coefficient in Ref. [29] to define the excess entropy production rate for general Markov processes based on optimal transport theory, and discussed a geometric interpretation of the excess entropy production rate.
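This flow-force structure can be illustrated on a finite-difference grid, where the integration by parts behind the quadratic form becomes an exact summation by parts once the current is written in flux form with zero-flux boundaries. The following Python sketch checks that −Σᵢ (∂_t P)ᵢ θᵢ dx equals Σ μT P |∇θ|² dx exactly; the grid, the distributions P and P_st, and the value of μT are arbitrary illustrative choices.

```python
import numpy as np

# Discrete sketch of the quadratic (Onsager-like) form of the excess entropy
# production rate: with theta = ln P - ln P_st and the gradient-flow current
# J = -mu_T * P * grad(theta), dP/dt = -div(J), one has
#   -sum_i (dP/dt)_i theta_i dx = sum mu_T * P * |grad theta|^2 dx
# exactly under summation by parts with zero-flux boundaries.

mu_T = 1.3                    # mobility times temperature (illustrative value)
n, dx = 64, 0.1
x = dx * np.arange(n)
P = np.exp(-((x - 3.0) ** 2)); P /= P.sum() * dx          # illustrative P_t
P_st = np.exp(-((x - 3.5) ** 2) / 2.0); P_st /= P_st.sum() * dx  # steady state

theta = np.log(P) - np.log(P_st)

# midpoint values and flux J_{i+1/2} = -mu_T * P_{i+1/2} * (grad theta)_{i+1/2}
P_mid = 0.5 * (P[1:] + P[:-1])
grad_theta = (theta[1:] - theta[:-1]) / dx
J = -mu_T * P_mid * grad_theta

# dP/dt = -div(J), with zero flux imposed outside the domain
J_full = np.concatenate(([0.0], J, [0.0]))
dP_dt = -(J_full[1:] - J_full[:-1]) / dx

sigma_flow_force = -float(np.sum(dP_dt * theta) * dx)          # -<dP/dt, theta>
sigma_quadratic = float(np.sum(mu_T * P_mid * grad_theta ** 2) * dx)
```

The flux-form discretization is chosen precisely so that the discrete analogue of the boundary term vanishes, mirroring the decay assumption used in the continuum argument.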
By using the Fisher information of time, the information-geometric speed limit discussed in Refs. [11,22,24,29] can be obtained in parallel with the derivation of the thermodynamic speed limit Eq. (68). The information-geometric speed limit provides a lower bound on the quantity ∫_τ^{τ+Δτ} dt (v_info(t))². This quantity can be regarded as a thermodynamic cost because it is related to the change of the excess entropy production rate near steady state up to O(η³_{P_t}) by using Eq. (108).

Theorem 17 The quantity ∫_τ^{τ+Δτ} dt (v_info(t))² is bounded from below as ∫_τ^{τ+Δτ} dt (v_info(t))² ≥ [D(P_τ, P_{τ+Δτ})]²/Δτ, where D(P_τ, P_{τ+Δτ}) is the twice of the Bhattacharyya angle ζ_B defined as D(P_τ, P_{τ+Δτ}) = 2ζ_B with cos ζ_B = ∫dx √(P_τ(x)P_{τ+Δτ}(x)).
Proof From the Cauchy-Schwarz inequality, we obtain Δτ ∫_τ^{τ+Δτ} dt (v_info(t))² ≥ (∫_τ^{τ+Δτ} dt v_info(t))². Thus, the tighter lower bound is obtained as ∫_τ^{τ+Δτ} dt (v_info(t))² ≥ (1/Δτ)(∫_τ^{τ+Δτ} dt v_info(t))². To solve the minimization of ∫_τ^{τ+Δτ} dt (v_info(t))² under the constraint ∫dx P_t(x) = 1 with fixed P_τ and P_{τ+Δτ}, we consider the Euler-Lagrange equation (116) for τ < t < τ + Δτ with the Lagrangian L_t = 4∫dx [∂_t √(P_t(x))]² + λ_t (∫dx P_t(x) − 1). The Euler-Lagrange equation can be rewritten as ∂²_t √(P_t(x)) ∝ √(P_t(x)), whose solution is generally given by √(P_t(x)) = α(x) cos(ωt) + β(x) sin(ωt) with a constant ω and α(x), β(x) ∈ R. The constraint ∫dx P_t(x) = 1 with fixed P_τ and P_{τ+Δτ} for this solution provides the optimal solution that minimizes ∫_τ^{τ+Δτ} dt (v_info(t))² under the constraint, √(P_t(x)) = [sin(ζ_B(τ+Δτ−t)/Δτ)√(P_τ(x)) + sin(ζ_B(t−τ)/Δτ)√(P_{τ+Δτ}(x))]/sin ζ_B, where the normalization of the probability is satisfied for τ ≤ t ≤ τ + Δτ. Thus, the weaker lower bound is calculated as ∫_τ^{τ+Δτ} dt (v_info(t))² ≥ [D(P_τ, P_{τ+Δτ})]²/Δτ. Remark 37 D(P_τ, P_{τ+Δτ}) is regarded as the geodesic distance on the hyper-sphere surface of radius 2. An interpretation of the Bhattacharyya angle as the geodesic on the hyper-sphere surface is related to the fact that information geometry can be regarded as the geometry of a hyper-sphere surface of radius 2 because the square of the line element is obtained from the Fisher metric as ds² = ∫dx (2 d√(P_t(x)))² with the constraint ∫dx (√(P_t(x)))² = 1. The Bhattacharyya angle ζ_B is given by the inner product of unit vectors on the hyper-sphere, cos ζ_B = ∫dx √(P_τ(x))√(P_{τ+Δτ}(x)).
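This speed limit is straightforward to test numerically for discrete distributions. The sketch below evaluates ∫dt (v_info(t))² along a straight-line mixture path with Δτ = 1 and compares it with the Bhattacharyya-angle bound; the endpoint distributions are arbitrary illustrative choices, and the mixture path is deliberately not the geodesic, so the bound holds strictly.

```python
import numpy as np

# Numerical check of the information-geometric speed limit:
#   int_0^1 dt (v_info(t))^2 >= D^2 / dtau,  with dtau = 1,
# where D = 2 * zeta_B is twice the Bhattacharyya angle of the endpoints.

def normalize(w):
    return w / w.sum()

P0 = normalize(np.array([0.7, 0.2, 0.1]))  # illustrative initial distribution
P1 = normalize(np.array([0.2, 0.3, 0.5]))  # illustrative final distribution

def p(s):
    # straight-line mixture path between P0 and P1, s in [0, 1] (not a geodesic)
    return (1.0 - s) * P0 + s * P1

zeta_B = float(np.arccos(np.sum(np.sqrt(P0 * P1))))
D = 2.0 * zeta_B

# integrate (v_info)^2 along the path by the midpoint rule, with t = tau + s
m = 2000
h = 1.0 / m
cost = 0.0
for k in range(m):
    s = (k + 0.5) * h
    dp = (p(s + 1e-6) - p(s - 1e-6)) / 2e-6     # dp/dt along the path
    cost += float(np.sum(dp ** 2 / p(s))) * h   # (v_info)^2 * dt
```

Equality would require the hyper-sphere geodesic interpolation in √P; for the mixture path the accumulated cost exceeds D²/Δτ.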

Remark 38
The quantity ∫_τ^{τ+Δτ} dt v_info(t) is called the thermodynamic length proposed in Ref. [11] as a generalization of the result in Ref. [9]. The thermodynamic length is bounded from below as ∫_τ^{τ+Δτ} dt v_info(t) ≥ D(P_τ, P_{τ+Δτ}) for the fixed initial distribution P_τ and final distribution P_{τ+Δτ}. The minimization of the thermodynamic length near equilibrium for a large time interval Δτ is related to an optimal protocol that minimizes a quadratic cost representing an observable fluctuation [11,20,74].
From the Cramér-Rao bound, the intrinsic speed is also related to the speed of an observable. From the viewpoint of thermodynamics, this fact was discussed in Ref. [24] for a time-independent observable, and in Ref. [26] for a time-dependent observable.
Definition 10 Let r(x) ∈ R be a time-independent observable, ∂_t r(x) = 0. The speed of the observable v_r(t) is defined as v_r(t) = |∂_t ⟨r⟩_{P_t}| / √(⟨r²⟩_{P_t} − ⟨r⟩²_{P_t}), where ⟨···⟩_{P_t} = ∫dx P_t(x)(···) denotes the expectation with respect to P_t.
Lemma 18 For any r(x) ∈ R, the speed of the observable v_r(t) is generally bounded by the intrinsic speed v_info(t), namely v_r(t) ≤ v_info(t).
Proof The Fisher information of time [v_info(t)]² is the Fisher metric for the parameter θ = t. As discussed in Lemma 7, the Cramér-Rao bound for the parameter θ = t is given by (∂_t ⟨r⟩_{P_t})² ≤ (⟨r²⟩_{P_t} − ⟨r⟩²_{P_t})[v_info(t)]², where we used the Cauchy-Schwarz inequality and ∫dx ∂_t P_t(x) = 0. By taking the square root of each side, we obtain v_r(t) ≤ v_info(t).
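Lemma 18 can likewise be checked numerically. The following sketch compares v_r(t) with v_info(t) for a three-state family and an observable r; both the family and the observable are arbitrary illustrative choices made for the example.

```python
import numpy as np

# Numerical check of the Cramer-Rao-type bound v_r(t) <= v_info(t)
# for a time-independent observable r on a three-state family p_t.

def p(t):
    # arbitrary smooth normalized family (illustrative choice)
    w = np.array([1.0 + 0.5 * np.sin(t), 1.0, 1.0 + 0.3 * np.cos(t)])
    return w / w.sum()

r = np.array([0.0, 1.0, 3.0])   # illustrative time-independent observable

t, h = 0.7, 1e-6
dp = (p(t + h) - p(t - h)) / (2.0 * h)   # dP_t/dt by central differences

v_info = float(np.sqrt(np.sum(dp ** 2 / p(t))))   # intrinsic speed
mean_r = float(np.sum(p(t) * r))
var_r = float(np.sum(p(t) * (r - mean_r) ** 2))
v_r = abs(float(np.sum(dp * r))) / np.sqrt(var_r)  # speed of the observable
```

Because Σᵢ (dp)ᵢ = 0, the numerator equals the covariance-type inner product in the Cauchy-Schwarz argument, so the bound holds for any choice of r.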
We newly propose that the intrinsic speed v_info(t) also provides an upper bound on the excess entropy production rate. This fact was substantially proposed in Refs. [29,38].

Proposition 19
The excess entropy production rate σ^ex_t is bounded as follows, σ^ex_t ≤ v_info(t) √(⟨(θ_{P_t})²⟩_{P_t} − ⟨θ_{P_t}⟩²_{P_t}), where θ_{P_t}(x) is the theta coordinate system defined as θ_{P_t}(x) = ln P_t(x) − ln P^pcan_t(x).
Proof The excess entropy production rate is given by σ^ex_t = −∫dx [∂_t P_t(x)][θ_{P_t}(x) − ⟨θ_{P_t}⟩_{P_t}], where we used ∫dx ∂_t P_t(x) = 0. From the Cauchy-Schwarz inequality, we obtain (σ^ex_t)² ≤ [∫dx P_t(x)(∂_t ln P_t(x))²][∫dx P_t(x)(θ_{P_t}(x) − ⟨θ_{P_t}⟩_{P_t})²]. By taking the square root of each side, we obtain Eq. (125).
Remark 40 Proposition 19 implies that the excess entropy production rate σ^ex_t is generally bounded by the intrinsic speed v_info(t). From the bound (125), the excess entropy production rate is zero, σ^ex_t = 0, if the intrinsic speed is zero, v_info(t) = 0. This result is consistent with the fact that the excess entropy production rate is zero in a steady state.
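The Cauchy-Schwarz mechanism behind Proposition 19 can be illustrated numerically: for any smooth normalized path p_t and fixed reference P_st, the quantity −∂_t D_KL(p_t ∥ P_st) is bounded in magnitude by v_info(t) times the standard deviation of θ. In the sketch below, the path and the reference are arbitrary illustrative choices; the path is not required to solve any Fokker-Planck dynamics, since the inequality is purely geometric.

```python
import numpy as np

# Numerical check of the Cauchy-Schwarz bound behind Proposition 19:
# with theta = ln p_t - ln P_st and sigma_ex = -d/dt D_KL(p_t || P_st),
#   |sigma_ex| <= v_info(t) * sqrt(Var_{p_t}[theta]).

def normalize(w):
    return w / w.sum()

P_st = normalize(np.array([0.5, 0.3, 0.2]))   # fixed reference distribution

def p(t):
    # arbitrary smooth normalized path (illustrative, no dynamics assumed)
    return normalize(np.array([0.5 + 0.2 * np.sin(t), 0.3, 0.2 + 0.1 * t]))

t, h = 0.3, 1e-6
dp = (p(t + h) - p(t - h)) / (2.0 * h)

theta = np.log(p(t)) - np.log(P_st)
sigma_ex = -float(np.sum(dp * theta))   # equals -d/dt D_KL(p_t || P_st)

v_info = float(np.sqrt(np.sum(dp ** 2 / p(t))))
mean_theta = float(np.sum(p(t) * theta))
std_theta = float(np.sqrt(np.sum(p(t) * (theta - mean_theta) ** 2)))
```

Along an arbitrary path σ^ex may take either sign; the bound constrains only its magnitude.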

Excess entropy production and information geometry in the space of path probability densities
Here we newly propose that the excess entropy production rate, which is given by the L²-Wasserstein distance in optimal transport theory, can also be obtained from the projection theorem in the space of path probability densities, as analogous to the entropy production rate. This projection theorem for the excess entropy production rate was substantially obtained in Ref. [44] for the general Markov process. This result also gives another link between information geometry in the space of path probability densities and optimal transport theory.
We start with the expressions of the entropy production rate, the excess entropy production rate and the housekeeping entropy production rate in terms of the Kullback-Leibler divergence between the path probability densities of the interpolated dynamics P^θ_{ν'_τ} defined in Definition 4.
Thus, the Kullback-Leibler divergence between the path probability densities can be calculated explicitly, and Proposition 20 follows by plugging it into the expressions above. Remark 41 Proposition 20 implies that the origin of the decomposition σ_τ = σ^ex_τ + σ^hk_τ comes from the generalized Pythagorean theorem for the Kullback-Leibler divergences between the path probability densities, which is consistent with the Pythagorean theorem in Remark 22.
Based on the projection theorem for the Pythagorean theorem in Remark 41, we obtain expressions of the excess entropy production rate and the housekeeping entropy production rate by the minimization problem of the Kullback-Leibler divergence.
Proposition 21 We assume that P_τ(x_τ) decays sufficiently rapidly at infinity. The excess entropy production rate and the housekeeping entropy production rate are given by minimization problems of the Kullback-Leibler divergence over the zero-divergence manifold M_ZD(P) and the gradient manifold M_G(P), where M_ZD(P) is the set of path probability densities whose mean local velocity ν_τ satisfies the zero-divergence condition ∇·(ν_τ(x_τ)P_τ(x_τ)) = 0, and M_G(P) is the set of path probability densities whose mean local velocity is given by a gradient, ν_τ(x_τ) = ∇φ_τ(x_τ). Proof Let ν*_τ(x_τ) = ∇φ_τ(x_τ) be the optimal mean local velocity. For any P¹_ν ∈ M_ZD(P), we obtain the generalized Pythagorean theorem by using Eq. (133) and the orthogonality ∫dx_τ P_τ(x_τ)∇φ_τ(x_τ)·ν_τ(x_τ) = 0 between gradient fields and zero-divergence fields under the decay assumption; the corresponding statement for M_G(P) follows in the same way. The excess entropy production rate and the housekeeping entropy production rate are also regarded as Fisher metrics for the path probability density P. This fact implies that optimal transport can be discussed from the viewpoint of information geometry in the space of path probability densities.

Remark 44
The expressions of the excess entropy production rate and the housekeeping entropy production rate in terms of the Fisher metric lead to the thermodynamic uncertainty relations for the excess entropy production rate and the housekeeping entropy production rate as a consequence of the Cramér-Rao inequality. The thermodynamic uncertainty relation for the excess entropy production rate was discussed in Theorem 13. These thermodynamic uncertainty relations for the excess entropy production rate and the housekeeping entropy production rate have been substantially obtained in Ref. [40]. They can be generalized based on the orthogonality, as discussed in Ref. [75].

Conclusion and discussion
We discussed stochastic thermodynamic links between information geometry and optimal transport theory via the excess entropy production rate. We can discuss a link between information geometry in the space of probability densities and optimal transport theory, and a link between information geometry in the space of path probability densities and optimal transport theory, because the excess entropy production rate is related to the L²-Wasserstein distance, the time derivative of the Kullback-Leibler divergence between probability densities, and the Kullback-Leibler divergence between the path probability densities. These links are useful for studying the mathematical properties of the entropy production rate in stochastic thermodynamics. For example, thermodynamic trade-off relations, namely the thermodynamic uncertainty relations and the thermodynamic speed limit, can be obtained from geometric inequalities such as the Cauchy-Schwarz inequality and the triangle inequality. The optimal protocol to minimize the thermodynamic cost can also be discussed in terms of the geodesic. We also remark on possible generalizations of the results in this paper. In this paper, we only focus on the stochastic dynamics described by the Fokker-Planck equation. Because stochastic thermodynamics has been discussed for the general Markov process described by the master equation, the generalization of the proposed results for the general Markov process is interesting. For example, generalizations of the results in this paper for the master equation can be found in Refs. [22,25,28,38,41,43,44,60,76]. Because such generalizations are not unique, rather different approaches of optimal transport theory for stochastic thermodynamics in the Markov jump process have also been developed in Refs. [77-80]. Unlike these generalizations, our generalizations [43,44] are related to the gradient flow expression and information-geometric projection discussed in this paper. The generalization for the deterministic 
chemical rate equation is also interesting for considering information geometry and optimal transport theory in chemical thermodynamics, which was proposed in Refs. [29,42-44,80-83]. For the deterministic chemical rate equation, we do not need stochasticity to obtain the generalized results, and geometric properties play a crucial role in their derivation, similarly to the stochastic case. This is the reason why we call our framework geometric thermodynamics instead of stochastic thermodynamics.
Finally, we point out that geometric thermodynamics is related to several fascinating topics, and has the potential to clarify the geometric properties of these topics. The classical correspondence of the control protocol called shortcuts to adiabaticity for the stochastic process [84-88] is related to the geometry of the probability distribution. Remarkably, the link between shortcuts and information geometry has been proposed in Ref. [89]. Indeed, a generalization of our framework for general Markov jump processes [43] is related to the stochastic correspondence of shortcuts called shortcuts in stochastic systems [86]. An application of shortcuts or our framework to low-power electronic circuits called adiabatic circuits [90] is promising. A connection between geometric thermodynamics and a geometrical interpretation of another excess entropy production rate proposed in Ref. [91] in terms of the Berry phase [92], which is related to the geometry of the cyclic path, is interesting. The cyclic path in information geometry and optimal transport theory was discussed in the optimal heat engine [27,31,32,93,94] and the geometric pump [95]. A geometric interpretation of the restricted path may also be interesting in the context of optimal limited control [96-99]. The dual coordinate systems in stochastic thermodynamics provide the duality in stochastic thermodynamics [41,42,44,82,83,100,101], which is related to variational calculus such as the maximum caliber principle [44,101-103] and the Schrödinger bridge [104,105]. Our results may also be useful for a machine learning technique based on non-equilibrium thermodynamics called the diffusion model [106], or score-based generative modeling, because our result provides a link between machine learning techniques based on optimal transport and non-equilibrium thermodynamics for diffusion dynamics described by the Fokker-Planck equation. Because the denoising matching [107,108] or the reversed 
stochastic differential equation [109] can be described by the proposed interpolated dynamics, our framework may be helpful in the learning process of the diffusion-based generative model. Because our framework can also provide a link between the reversed stochastic differential equation and optimal transport theory, which is widely used in generative models such as the Wasserstein generative adversarial networks [110], it might be interesting to consider a link between the diffusion model and learning based on the L²-Wasserstein distance in our framework. A connection to information thermodynamics [111] is also interesting. For example, the information-thermodynamic quantities called the partial entropy production and the transfer entropy [112-116] can be treated information-geometrically by the projection theorem [25,117], and optimal transport for a subsystem is related to the problem of finite bit erasure [118-121] and the problem of the minimum partial entropy production in a subsystem [31,122]. Applications to the evolutionary process [45,123-125] are also interesting because information geometry provides a geometric interpretation of the Price equation [126,127]. As a generalization of the gradient flow expression, an approach based on the general equation for non-equilibrium reversible-irreversible coupling (GENERIC) [128-131] or the transport Hessian Hamiltonian flow [132] might be promising. The experimental application of geometric thermodynamics to biological dynamics is interesting [37,133,134], and a quantitative discussion of the design principle of complex biological systems from the viewpoint of geometric thermodynamics could be a significant topic in the near future.