The information geometry of two-field functional integrals

Two-field functional integrals (2FFI) are an important class of solution methods for generating functions of dissipative processes, including discrete-state stochastic processes, dissipative dynamical systems, and decohering quantum densities. The stationary trajectories of these integrals describe a conserved current by Liouville's theorem, despite the absence of a conserved kinematic phase-space current in the underlying stochastic process. We develop the information geometry of generating functions for discrete-state classical stochastic processes in the Doi-Peliti 2FFI form, and exhibit two quantities conserved along stationary trajectories. One is a Wigner function, familiar as a semiclassical density from quantum-mechanical time-dependent density-matrix methods. The second is an overlap function between directions of variation in an underlying distribution and directions of variation in the relative large-deviation probability that can be used to interrogate the distribution; it is expressed as an inner product of vector fields in the Fisher information metric. To give an interpretation to the time invertibility implied by current conservation, we use generating functions to represent importance-sampling protocols, and show that the conserved Fisher information is the differential of a sample volume under deformations of the nominal distribution and the likelihood ratio. We derive a pair of dual affine connections particular to Doi-Peliti theory, reflecting the way that theory separates the roles of the nominal distribution and the likelihood ratio, and distinguish them from the standard dually flat connection of Nagaoka and Amari defined on the importance distribution. We show that dual flatness in the affine coordinates of the coherent-state basis captures the special role played by coherent states in Doi-Peliti theory.

equation between the advanced and retarded "response functions" to perturbations, and the correlation function of fluctuations of the coordinate variables.
The coordinate field, along with the response field introduced in [52], can be recast from operators back to ordinary classical fields if the classical fields become variables of integration in a path integral, through the equivalence of operator and path-integral formulations known in quantum mechanics since Feynman and Hibbs [28]. An especially useful observation by Kamenev [42] is that the response and correlation functions of classical stochastic processes have exactly the algebra of "observable" and "response" fields defined by Keldysh [43] as a rotation of field variables from the pairs of forward- and backward-evolving state fields in Schwinger's time-loop formulation [63] of time-dependent quantum density matrices. Through this algebraic equivalence, the pairing of coordinate fields with dual response fields is represented generally in what we may call two-field functional integrals (2FFI). The path-integral formulation of the Schwinger time loop was the original example; the operator methods of [52] show that it may be extended to quite general stochastically perturbed dynamical systems.
For a certain class of Markov jump processes on discrete state spaces, which we may generally describe as stochastic population processes, a direct construction of the 2FFI representation for generating functions, from the transition matrix of the master equation, was worked out by Doi [21,22] and Peliti [57,58]. (This case was the path integral studied in [42].) The direct construction [44,68,71] makes clear the correspondence between the response field and the jump operator in the master equation, giving a mechanistic meaning to the operator-field interpretation of [52]. The Doi-Peliti integral representation also shows that it is the Liouville operator (the representation of the generator of time translation acting in the space of generating functions [74]) that is the Hamiltonian of the Hamilton-Jacobi theory [13,69].
In the large-deviation limit, which will be the subject of this paper, the Liouville operator can be replaced with a function on deterministic phase-space trajectories, which becomes the generating function for the Lagrange-Hamilton duality between the tangent and cotangent bundles on the coordinate manifold of first moments of the dynamical distribution. Doi-Peliti theory is particularly expressive of the origin and nature of two-field duality because Peliti's construction [57,58] of the functional integral employs a representation of unity in the space of generating functions in terms of outer products of basis functions parametrized by the observable field, and projection operators parametrized by the response field introduced in [52]. The conservation of "data" entailed by a Liouville theorem is nothing other than the requirement of completeness in a representation of unity in a Hilbert space of generating functions.

The information geometry induced by symplectomorphisms acting on probability distributions evolved under stochastic processes
The 2FFI representation of Hamilton-Jacobi theory will give a direct mathematical interpretation of the conserved phase space density, as the Wigner function of the two-field integral (see Sect. 3). This function is known in quantum mechanics as a semiclassical approximation to quantum density matrices in time-dependent thermal ensembles.
We may better understand its statistical interpretation and its relation to time reversal, however, by going beyond the simple Hamiltonian volume conservation already inherent in 2FFI theories, to derive dual parallel transport relations for vector fields induced by symplectomorphisms on the cotangent bundle, as these relate to the Riemannian geometry induced by cumulant generating functions and their Legendre duals, the large-deviation functions [73]. The construction of those aspects of the information geometry of two-field integrals is the main work of this paper.
Information geometry derives from divergence functions between probability distributions [2,7], and provides coordinate-independent constructions for the metric distance between pairs of distributions, or the overlap between two directions of change in the form of the extended Pythagorean theorem [3,54]. In particular, these measures are invariant under the symplectomorphisms generated by time translation, so that dynamical conservation is implied by coordinate invariance.
The probability distributions corresponding to phase-space points along the large-deviation trajectories in Doi-Peliti theory may vary either through change of the underlying probability distribution (e.g., through change in initial conditions) or through change of the argument of the generating function at which that underlying distribution is being studied. The second source of change has an interpretation in terms of biased sampling from the underlying distribution, and establishes a connection between generating functions and methods of importance sampling [56] in statistical inference.
Inference is the pertinent context within which to understand the time-reversal of Hamilton-Jacobi equations for stochastic processes, because it relates directly to the adjoint duality between the Kolmogorov forward and backward equations [33], equivalently either evolving probability distributions forward in time or operators corresponding to observables backward in time. The representation of adjoint duality in terms of time-reversed evolution has been studied extensively [37,38,69] in the context of fluctuation theorems [64].
Here we will study shifts in the underlying distributions evolving under some stochastic process, and shifts in the sampling bias in generating functions, as sources of vector fields which are parallel transported under the canonical transformations on phase space induced by time translation. In particular we derive affine connections under which, as differences in underlying distributions are attenuated over time due to dissipation, differences between degrees of sampling bias are amplified in the Doi-Peliti integral. The exact compensation of these two effects in Hamilton-Jacobi theory identifies conserved phase-space densities as densities of the information relevant to inference between protocols for biased sampling and features of underlying distributions.
We derive a particular form of this relation, in which the information delivered under varying degrees of sampling bias about the difference between nearby distributions is the inner product of two vector fields in the Fisher information metric. The conserved densities of sample-bias information under time translation are none other than the eigenvalues of the Fisher information. Dual parallel transport of the vector fields describing shifts in the underlying distribution and shifts in the sample-bias factor preserves this inner product along the one-parameter family of symplectomorphisms in the phase space generated by time translation, making the sample information about distribution change, like the phase-space density, an invariant of motion.

Organization of the presentation
Section 2 opens with brief reviews of the three background areas needed in this work, beginning with spaces and dualities defined at a single time and then continuing to those added with time evolution. We first consider the way general probability distributions are represented by Laplace transforms as contours in phase space, on which the fundamental Riemannian metric of information geometry, and Legendre duality between conjugate phase-space coordinates, are defined. We then review the correspondence of a generating function to an exponential family of tilted distributions with the interpretation of importance distributions produced by biased sampling of the base distribution. Third, we introduce time evolution under a stochastic process generator. We review the construction of the Doi-Peliti two-field integral representation of generating functions, followed by the large-deviation limit in which the Liouville operator becomes a classical Hamiltonian and the generating function for duality between coordinates in the tangent and cotangent bundles over the configuration space.
The main results of the paper are derived in Sect. 3. The Wigner function is defined and shown to be the phase space density conserved under Liouville's theorem. The two vector fields associated with the dual coordinates in phase space are constructed, to describe variations within exponential families through change of the sample bias, and change across families through change in the base distribution. The transport law for the Fisher metric, and dual affine connections respecting the symplectic structure, are then derived. Section 4 contains a simple worked example illustrating all aspects of the 2FFI Liouville theorem and its associated dual geometry. Section 5 concludes, relating the familiar treatment of adjoint duality in terms of time reversal [37] to the interpretation developed here as a duality between dynamics and statistical inference.

Review: Legendre duality and information geometry, importance sampling, and Lagrange/Hamilton duality
Phase spaces with conjugate coordinates play several roles in the constructions to follow. At single times, they host the Legendre duality and Riemannian geometry of surfaces representing probability distributions through their Laplace transforms. Within the exponential family over any one distribution, conjugate fields separate the nominal distribution from other members of the family that take on the interpretation of importance distributions. When probability distributions are evolved in time, the phase spaces in which they are embedded become cotangent bundles of the dynamics, with Lagrangian-Hamiltonian dualities to the tangent bundles that depend on the generator of time translation but not on the states being evolved. In this section we establish notations for all three roles and show how they are instantiated in Doi-Peliti functional integrals.

Embedding of probability distributions as surfaces in phase space, and the geometry in those surfaces
The state spaces on which the Doi-Peliti construction can be carried out to produce generating functions and functionals are non-negative integer lattices in some number of dimensions D, which here will be assumed finite for simplicity. Dimensions are indexed i ∈ 1, ..., D, and lattice points in the state space are indexed by vectors with integer components $n \equiv (n_i)$, with each $n_i \geq 0$. Examples include stochastic population processes, in which i indexes the types of members in the population and each $n_i$ is the count of individuals of type i. Well-developed applications include evolutionary populations [70] and chemical reaction networks [44,71]. The object of study will be continuous-valued probability distributions ρ over states n, indexed $\rho_n$. The Laplace transform (for a countable basis {n}, also termed the z-transform) of ρ is the moment-generating function (MGF)

$$\Phi(z) \equiv \sum_n z^n \rho_n . \tag{1}$$

If we write each component $z_i \equiv e^{\theta_i}$, then $\psi(\theta) = \log \Phi(z)$ is the cumulant-generating function (CGF), which will be convex when written in coordinates θ. As a shorthand we have written $z^n \equiv \prod_i z_i^{n_i}$ for the component-wise nth power of z. If n is regarded as a column vector and θ as a row vector, the Cartesian inner product is abbreviated $\theta n \equiv \theta_i n_i$, adopting the Einstein summation convention. The gradient of ψ(θ), which we denote

$$\bar{n}_i(\theta) \equiv \frac{\partial \psi}{\partial \theta_i} = \sum_n n_i\, \tilde{\rho}^{(\theta)}_n , \tag{2}$$

is the expectation of n in a distribution $\tilde{\rho}$ tilted with weight $z^n$,

$$\tilde{\rho}^{(\theta)}_n \equiv e^{\theta n - \psi(\theta)}\, \rho_n , \tag{3}$$

which we will interpret later as a sampling bias or likelihood function. The continuous coordinate vectors n̄ make up the points in a manifold Q that will be the coordinate manifold with respect to which geometries and dynamics are defined. Pairs of values (θ, n̄) define a 2D-dimensional phase space over the coordinate manifold Q. Later, when dynamics is introduced, that phase space will become the cotangent bundle T*Q over manifold Q, on which a Lagrangian-Hamiltonian duality is defined.
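As a minimal numerical check of Eqs. (1)-(3), the sketch below assumes a single species (D = 1) and a Poisson base distribution truncated at a cutoff `n_max`; both are illustrative choices, not part of the construction. It computes ψ(θ) directly from the z-transform and compares a finite-difference gradient with the mean of the tilted distribution.

```python
import math

def poisson_pmf(mean, n_max):
    """Poisson probabilities rho_n for n = 0..n_max (the truncation is an illustrative assumption)."""
    return [math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1)) for n in range(n_max + 1)]

def cgf(theta, rho):
    """psi(theta) = log sum_n e^{theta n} rho_n, the log of the z-transform at z = e^theta."""
    return math.log(sum(math.exp(theta * n) * p for n, p in enumerate(rho)))

def tilted(theta, rho):
    """Tilted distribution e^{theta n - psi(theta)} rho_n of Eq. (3)."""
    psi = cgf(theta, rho)
    return [math.exp(theta * n - psi) * p for n, p in enumerate(rho)]

rho = poisson_pmf(mean=5.0, n_max=80)
theta, h = 0.3, 1e-5
grad = (cgf(theta + h, rho) - cgf(theta - h, rho)) / (2 * h)        # d psi / d theta
tilted_mean = sum(n * p for n, p in enumerate(tilted(theta, rho)))   # expectation under the tilt
print(grad, tilted_mean)
```

For a Poisson base with mean n_0 both numbers equal n_0 e^θ, which is the leaf θ(n̄) = log(n̄/n_0) drawn in Fig. 1.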
The first relation we study in this phase space, however, does not depend on its interpretation as a cotangent bundle; it is Legendre duality between values θ and n induced by a convex CGF ψ.
For convex ψ(θ), the function n̄(θ) is invertible, and the inverse function θ(n̄) is a D-dimensional surface, single-valued at each n̄, in the phase space. Each such surface has an intrinsic Riemannian geometry induced by the Hessian of ψ, known as the Fisher tensor, whose interpretation as a Riemannian metric was introduced by Rao [61] (reprinted as [62]). Using the differential-geometry notation in which

$$\partial_i \equiv \frac{\partial}{\partial \theta_i}$$

is the set of basis elements in the tangent space to the exponential family, the Fisher metric is an inner product, which we denote

$$g_{ij}(\theta) \equiv \left\langle \partial_i, \partial_j \right\rangle = \frac{\partial^2 \psi}{\partial \theta_i\, \partial \theta_j} . \tag{4}$$

The manifold Q with metric (4) admits a variety of dual transport relations induced by pairs of connections compatible with that metric, as developed by Amari and Nagaoka [3,54]. Each hypersurface θ(n̄) is generated by a single underlying distribution ρ (which we will call the "base distribution"). The whole CGF ψ(θ) is thus identified with an exponential family of distributions tilted from ρ at each value of $z = e^{\theta}$. The dual connections of information geometry provide a way to understand the information divergences of distributions in such a family with others either inside or outside the family, in familiar geometric terms such as least-distance projections onto the family.
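The identification of the Hessian (4) with a Fisher metric can be checked numerically: the Hessian of ψ equals the covariance of n under the tilted distribution. The sketch below is illustrative only; it assumes a hypothetical two-species base distribution on a small lattice, with correlated weights chosen so that the metric has off-diagonal entries.

```python
import itertools
import math

# Hypothetical base distribution on the 2-species lattice {0..4}^2; the correlated
# weights are an arbitrary choice so that g_ij has off-diagonal terms.
states = list(itertools.product(range(5), repeat=2))
weights = [math.exp(-0.5 * (n1 - 2) ** 2 - 0.3 * (n1 - n2) ** 2) for n1, n2 in states]
total = sum(weights)
rho = [w / total for w in weights]

def psi(theta):
    """CGF psi(theta) = log sum_n exp(theta . n) rho_n."""
    return math.log(sum(p * math.exp(theta[0] * n[0] + theta[1] * n[1]) for n, p in zip(states, rho)))

def fisher_from_hessian(theta, h=1e-4):
    """g_ij = d^2 psi / d theta_i d theta_j by central finite differences."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di
                t[j] += dj
                return psi(t)
            g[i][j] = (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return g

def fisher_from_covariance(theta):
    """Covariance of n under the tilted distribution e^{theta n - psi(theta)} rho_n."""
    ps = psi(theta)
    w = [p * math.exp(theta[0] * n[0] + theta[1] * n[1] - ps) for n, p in zip(states, rho)]
    mean = [sum(wk * n[i] for wk, n in zip(w, states)) for i in range(2)]
    return [[sum(wk * (n[i] - mean[i]) * (n[j] - mean[j]) for wk, n in zip(w, states))
             for j in range(2)] for i in range(2)]

theta = (0.2, -0.1)
hessian = fisher_from_hessian(theta)
covariance = fisher_from_covariance(theta)
```

The covariance form makes the positive-definiteness of g_ij manifest for any non-degenerate base distribution.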

Foliations in phase space and the inner product between variations within and between exponential families
We will be interested here in families of base distributions, denoted $\rho^{(n_0)}_n$, for which the exponential families defining the CGFs, over all values of the index $n_0$, provide a foliation of the phase space, as shown in Fig. 1. The CGFs and tilted distributions that make up each leaf are denoted

$$\psi(\theta, n_0) \equiv \log \sum_n e^{\theta n}\, \rho^{(n_0)}_n , \tag{5}$$

$$\tilde{\rho}^{(\theta, n_0)}_n \equiv e^{\theta n - \psi(\theta, n_0)}\, \rho^{(n_0)}_n . \tag{6}$$

The index $n_0$ is the value of n̄(θ) at θ = 0 in the distribution $\rho^{(n_0)}$.

Fig. 1 Foliation of a 1-species phase space (θ, n̄) by the contours θ(n̄) = log(n̄/n_0) in the exponential families over each of a sequence of Poisson base distributions with mean values $n_0$ (at θ = 0). The exponential family and its associated Legendre-dual function θ(n̄) for one value of $n_0$ is drawn bold and labeled. Overcomplete bases of such Poisson distributions with continuous Poisson parameter are used in the construction of a representation of unity in the Peliti [57,58] functional integral construction. Covariant vector fields corresponding to ∂/∂θ (within leaves) characterize change in tilted distributions driven through sampling bias, while contravariant vector fields corresponding to $\partial/\partial n_0$ (across leaves) characterize change in the base distribution. The inner product of δθ and $\delta n_0$ will be preserved by parallel transport of these vectors under maps of the phase space generated by time translation.

Equation (2) implies that the Fisher metric (4) is a coordinate transformation from contravariant to covariant coordinates:

$$\delta \bar{n}_i = \frac{\partial \bar{n}_i}{\partial \theta_j}\, \delta\theta_j = g_{ij}(\theta)\, \delta\theta_j . \tag{7}$$

If ψ(θ) is convex, the transformation (7) is invertible. The inverse of the coordinate transform, and with it the Fisher metric, is obtained from the Legendre transform of ψ(θ),

$$\psi^*(\bar{n}) \equiv \max_{\theta} \left\{ \theta \bar{n} - \psi(\theta) \right\} , \tag{8}$$

where θ(n̄) is the maximizer of the argument in Eq. (8) over θ values. ψ*(n̄) is the large-deviation function (LDF). By construction its gradient gives the inverse function

$$\frac{\partial \psi^*}{\partial \bar{n}_i} = \theta_i(\bar{n}) . \tag{9}$$

From the definition (1) of the generating function and the extremization condition (8), ψ*(n̄) is seen to be, in leading exponential approximation, a kind of convex approximation to $-\log \rho_n$ for n ≈ n̄, of which θ is the gradient.
These interpretations help to give meaning to the evolution of ψ(θ ) under a Hamilton-Jacobi equation derived below in Sect. 2.3.4.
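The Legendre transform (8) can be evaluated numerically for the Poisson leaves of Fig. 1, for which ψ(θ, n_0) = n_0(e^θ − 1). The sketch below assumes that Poisson base and arbitrary values n_0 = 4, n̄ = 7, maximizes θn̄ − ψ(θ) by golden-section search, and checks three things: the value against the closed form n̄ log(n̄/n_0) − n̄ + n_0, the maximizer against the leaf θ(n̄) = log(n̄/n_0), and that the curvature of ψ* is the inverse of the Fisher metric g(θ) = ψ''(θ).

```python
import math

def psi_poisson(theta, n0):
    """CGF of a Poisson base distribution with mean n0: psi(theta) = n0 (e^theta - 1)."""
    return n0 * math.expm1(theta)

def legendre(nbar, n0, lo=-20.0, hi=20.0, iters=200):
    """Return (psi*(nbar), maximizing theta) for Eq. (8), by golden-section search.
    The objective theta*nbar - psi(theta) is concave, so the bracket holds its unique maximum."""
    def objective(t):
        return t * nbar - psi_poisson(t, n0)
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    for _ in range(iters):
        if objective(c) > objective(d):
            b, d = d, c                 # maximum lies in [a, d]
            c = b - r * (b - a)
        else:
            a, c = c, d                 # maximum lies in [c, b]
            d = a + r * (b - a)
    t = 0.5 * (a + b)
    return objective(t), t

n0, nbar = 4.0, 7.0
value, theta_star = legendre(nbar, n0)
closed_form = nbar * math.log(nbar / n0) - nbar + n0   # Poisson LDF
h = 1e-2
curvature = (legendre(nbar + h, n0)[0] - 2.0 * value + legendre(nbar - h, n0)[0]) / h ** 2
fisher_g = n0 * math.exp(theta_star)                   # g(theta) = psi''(theta) on this leaf
```

The product `curvature * fisher_g` is 1 up to discretization error, which is the statement that the Hessian of ψ* inverts g.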
From Eq. (9), the inverse coordinate transform,

$$\delta\theta_i = \frac{\partial^2 \psi^*}{\partial \bar{n}_i\, \partial \bar{n}_j}\, \delta\bar{n}_j = \left(g^{-1}\right)_{ij} \delta\bar{n}_j , \tag{10}$$

is the inverse of g(θ) from Eq. (7). The distance between two distributions in the exponential family is then written in either contravariant coordinate differentials δθ or covariant coordinate differentials δn̄ as

$$ds^2 = \delta\theta_i\, g_{ij}\, \delta\theta_j = \delta\bar{n}_i \left(g^{-1}\right)_{ij} \delta\bar{n}_j . \tag{11}$$

The Fisher metric can be obtained as the projection of the Euclidean metric in $\mathbb{R}^D$ under a spherical embedding of the distribution $\rho_n$, briefly reviewed in Appendix A.1, providing a third set of coordinates for the tilted distribution $\tilde{\rho}^{(\theta)}$. We note this embedding because, for the exponential families that will play an important role in stationary-point methods later, the distances between distributions on the Fisher sphere in potentially infinitely many dimensions {n} reduce to divergences of the same form on the phase space, as shown in Appendix A.2.
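The spherical embedding can be illustrated for two nearby Poisson distributions. The sketch assumes the convention in which each distribution maps to the vector (2√ρ_n) on a sphere of radius 2; this normalization is an assumption about Appendix A.1, chosen because with it the pulled-back Euclidean metric equals the Fisher metric without rescaling. The squared chordal distance is compared with ds² from Eq. (11) in mean coordinates, where for a Poisson leaf (g⁻¹) = 1/n̄.

```python
import math

def poisson_pmf(mean, n_max):
    """Poisson probabilities for n = 0..n_max (the truncation is an illustrative assumption)."""
    return [math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1)) for n in range(n_max + 1)]

lam, delta, n_max = 10.0, 0.01, 80
p = poisson_pmf(lam, n_max)
q = poisson_pmf(lam + delta, n_max)

# Embed each distribution as (2 sqrt(rho_n)) on a radius-2 sphere and take the
# squared Euclidean (chordal) distance between the two embedded points.
chord_sq = sum((2.0 * math.sqrt(a) - 2.0 * math.sqrt(b)) ** 2 for a, b in zip(p, q))

# Eq. (11) in mean coordinates: for a Poisson leaf, (g^{-1}) = d^2 psi*/d nbar^2 = 1/nbar.
fisher_sq = delta ** 2 / lam
```

The two agree to relative order δ/n̄, as expected for a chord approximating an arc on the sphere.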
The distance element (11) measures the divergence between two nearby distributions in an exponential family in the geometry derived from ψ(θ). For two distributions related to a common base distribution through shifts respectively in θ or in $n_0$, we seek a measure of the degree to which the two changes overlap. The extended Pythagorean theorem [3,54] for Kullback-Leibler (KL) divergences provides such a measure, Eq. (12), which in the small-displacement limit becomes an inner product of the two coordinate displacements in the Fisher metric. Recognizing that $\partial\psi/\partial\theta_i = \bar{n}_i(\theta, n_0)$, the inner product in the final line of Eq. (12) is just the sensitivity of the mean in the tilted distribution to variations in the base.
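The appearance of this sensitivity can be seen in a short calculation. What follows is a sketch from definitions (5) and (6) alone, not necessarily the form in which Eq. (12) is written. Let $P \equiv \tilde{\rho}^{(\theta,n_0)}$, $Q \equiv \tilde{\rho}^{(\theta+\delta\theta,n_0)}$ (a shift in sampling bias), and $R \equiv \tilde{\rho}^{(\theta,n_0+\delta n_0)}$ (a shift in the base distribution), with $D(\cdot\,\|\,\cdot)$ the KL divergence. Then

$$
\begin{aligned}
D(R\,\|\,Q) - D(R\,\|\,P) - D(P\,\|\,Q) &= \sum_n \left(R_n - P_n\right) \log \frac{P_n}{Q_n} \\
&= -\,\delta\theta_i \sum_n \left(R_n - P_n\right) n_i \\
&= -\,\delta\theta_i \left[ \bar{n}_i(\theta, n_0 + \delta n_0) - \bar{n}_i(\theta, n_0) \right] \\
&\approx -\,\delta\theta_i\, \frac{\partial^2 \psi}{\partial \theta_i\, \partial n_{0j}}\, \delta n_{0j} .
\end{aligned}
$$

The second line uses $\log(P_n/Q_n) = -\delta\theta_i n_i + \psi(\theta+\delta\theta, n_0) - \psi(\theta, n_0)$, whose n-independent part drops out because both distributions are normalized. The third line is exact; only the last step linearizes in $\delta n_0$. The Pythagorean defect is thus the pairing of δθ with the shift of the tilted mean induced by $\delta n_0$.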

Preservation of the inner product in connection with Liouville's theorem
A coordinate change from the mean in the base distribution to the mean in the tilted exponential distribution produces the metric in mixed coordinates, which by construction is the Kronecker δ, Eq. (13), and the coordinate inner product (14). We may ask: for what one-parameter families of coordinate systems (θ(t), n̄(t)) is the coordinate inner product (14) conserved across the family? A one-parameter family of coordinates generates a one-parameter family of maps of vector fields δθ and δn̄ by its action on their components. The condition that the inner product (14) be constant along the family will be met whenever the relation (18) holds, where the dot denotes d/dt. Eq. (18) is satisfied if there is a symplectic form L(θ, n̄) in terms of which the velocity vectors along trajectories can be written in Hamiltonian form. Alternatively, if Eq. (18) holds everywhere, the form L(θ, n̄) can be constructed by integration.
Eq. (18) relates the dual contravariant and covariant coordinates under the Fisher metric as canonically conjugate variables in a Hamiltonian dynamical system. We return in Sect. 2.3 to derive symplectic forms L(θ, n) from the generators of stochastic processes.
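The conservation statement can be illustrated with an explicit flow. The sketch below rests on assumptions not taken from this section: it uses the pure death process n → n − 1 at rate k, whose CGF obeys ∂ψ/∂t = H(θ, ∂ψ/∂θ) with H(θ, n̄) = k n̄ (e^{−θ} − 1), and it integrates the characteristics dθ/dt = −∂H/∂n̄, dn̄/dt = ∂H/∂θ (this sign convention is an assumption). It computes the Jacobian of the time-t flow map by finite differences and checks that its determinant, the phase-space area factor, equals 1.

```python
import math

K = 1.0  # hypothetical death rate k for the process n -> n - 1

def velocity(theta, nbar):
    """Characteristic velocities for the assumed Hamiltonian H = k*nbar*(exp(-theta) - 1),
    with dtheta/dt = -dH/dnbar and dnbar/dt = dH/dtheta (sign convention assumed)."""
    return -K * math.expm1(-theta), -K * nbar * math.exp(-theta)

def flow(theta, nbar, t_end, steps=2000):
    """Map (theta, nbar) at t = 0 to its image at t_end by classical fourth-order Runge-Kutta."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = velocity(theta, nbar)
        k2 = velocity(theta + 0.5 * dt * k1[0], nbar + 0.5 * dt * k1[1])
        k3 = velocity(theta + 0.5 * dt * k2[0], nbar + 0.5 * dt * k2[1])
        k4 = velocity(theta + dt * k3[0], nbar + dt * k3[1])
        theta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        nbar += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, nbar

def jacobian(theta0, nbar0, t_end, h=1e-6):
    """J[a][b] = d(final coordinate a)/d(initial coordinate b), coordinates ordered (theta, nbar)."""
    cols = []
    for d_theta, d_nbar in ((h, 0.0), (0.0, h)):
        plus = flow(theta0 + d_theta, nbar0 + d_nbar, t_end)
        minus = flow(theta0 - d_theta, nbar0 - d_nbar, t_end)
        cols.append([(pl - mi) / (2 * h) for pl, mi in zip(plus, minus)])
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

def det(J):
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]

J_relax = jacobian(0.0, 5.0, t_end=2.0)    # along the relaxing trajectory theta = 0
J_general = jacobian(0.3, 5.0, t_end=2.0)  # a trajectory with nonzero sampling bias
```

Along θ = 0 the Jacobian has J_θθ = e^{kt}, J_θn̄ = 0 and J_n̄n̄ = e^{−kt}: a perturbation of the sampling bias is amplified while a perturbation of the base distribution relaxes, and their product is unchanged, an instance of the compensation described in the introduction.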

Tilted distributions as importance distributions, and particular application to large deviations
In order to furnish an interpretation of the 2FFI symplectic transport structure, we appeal to statistical inference to assign informational meanings to the conjugate fields z that exponentially tilt the base distributions in generating functions. In importance sampling, z takes on the interpretation of a sampling bias known as a likelihood ratio, which transforms the measure for samples. In the terminology of importance sampling [56], the base distribution $\rho^{(n_0)}$ corresponds to the nominal distribution, and the tilted distribution $\tilde{\rho}^{(\theta,n_0)}_n$ of Eq. (6) plays the role of an importance distribution. The normalized exponential tilt $e^{\theta n - \psi(\theta, n_0)}$ is the corresponding likelihood ratio, also called the Radon-Nikodym derivative of the measure between the base and the importance distributions. Importance distributions $\tilde{\rho}^{(\theta,n_0)}$ can be chosen to concentrate the density of samples away from the mode of $\rho^{(n_0)}$ to values of n that are more informative about observables of interest. Tilts are typically chosen to minimize some cost function, such as the variance of samples. The large-deviation function can be derived as a leading exponential approximation to the tail weight of the base distribution, in a protocol tuned to minimize sample variance, as shown in the following construction from [66].
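The change of measure can be checked exactly by summation. In this minimal sketch (assuming a Poisson nominal distribution with mean n_0 = 3, a tilt θ = 0.8, and a truncation `n_max`, all illustrative), expectations under the nominal distribution are recovered from the importance distribution by dividing by the likelihood ratio e^{θn − ψ(θ, n_0)}.

```python
import math

def poisson_pmf(mean, n_max):
    """Poisson probabilities for n = 0..n_max (the truncation is an illustrative assumption)."""
    return [math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1)) for n in range(n_max + 1)]

def observable(n):
    """Any observable of interest; n^2 is an arbitrary choice for the check."""
    return n ** 2

n0, theta, n_max = 3.0, 0.8, 80
rho = poisson_pmf(n0, n_max)                                              # nominal distribution
psi = math.log(sum(math.exp(theta * n) * p for n, p in enumerate(rho)))   # psi(theta, n0)
likelihood_ratio = [math.exp(theta * n - psi) for n in range(n_max + 1)]
rho_tilted = [lr * p for lr, p in zip(likelihood_ratio, rho)]             # importance distribution

nominal_expectation = sum(observable(n) * p for n, p in enumerate(rho))
# The same expectation computed under the importance distribution, reweighted by 1/likelihood ratio.
reweighted = sum(observable(n) * q / lr for n, (q, lr) in enumerate(zip(rho_tilted, likelihood_ratio)))
tilted_mean = sum(n * q for n, q in enumerate(rho_tilted))                # moved away from n0
```

The tilted mean is n_0 e^θ, so samples concentrate well above the nominal mode while the reweighted expectation still equals the nominal one.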
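To illustrate with an example in one dimension, an estimate of the probability that a particle count n exceeds some bound n̄ can be obtained by sampling values of the random variable $\mathbb{1}_{n > \bar{n}}$, the indicator function for n > n̄. In the base distribution $\rho^{(n_0)}$, the probability for n > n̄ is

$$P(n > \bar{n} \mid n_0) = \sum_{n > \bar{n}} \rho^{(n_0)}_n .$$

An unbiased estimator for $P(n > \bar{n} \mid n_0)$ can be obtained by sampling from the tilted distribution $\tilde{\rho}^{(\theta,n_0)}$ and weighting each value of the indicator by the inverse likelihood ratio $e^{-[\theta n - \psi(\theta, n_0)]}$. The tilted estimator is unbiased because

$$\sum_n \tilde{\rho}^{(\theta,n_0)}_n\, e^{-[\theta n - \psi(\theta, n_0)]}\, \mathbb{1}_{n > \bar{n}} = \sum_{n > \bar{n}} \rho^{(n_0)}_n = P(n > \bar{n} \mid n_0) .$$

A few lines of algebra, provided in Appendix B, show that an exponential bound (24) holds for the estimator at any choices of θ and n̄. The variance of the same sample estimator has a corresponding bound (25) (see Eq. (146)). The parameter θ that minimizes the bound on sample variance (25) also gives the tightest bound (24) on the tail weight. It is the maximizer θ(n̄) in Eq. (8), so the bound is given in terms of the LDF as $P(n > \bar{n} \mid n_0) \leq e^{-\psi^*(\bar{n})}$.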
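The protocol can be sketched as a Monte Carlo experiment. The sketch assumes a Poisson nominal distribution with n_0 = 5 and a bound n̄ = 15 (arbitrary values), for which the optimal tilt is θ(n̄) = log(n̄/n_0) and the tilted distribution is again Poisson, with mean n̄. The weighted estimate is compared with the exact tail weight and with the bound e^{−ψ*(n̄)}.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson(mean) variate by sequential inversion of the CDF."""
    u = rng.random()
    n, p = 0, math.exp(-mean)
    cdf = p
    while u > cdf and p > 0.0:
        n += 1
        p *= mean / n
        cdf += p
    return n

def tail_exact(mean, bound):
    """P(n > bound) for n ~ Poisson(mean)."""
    p = cdf = math.exp(-mean)
    for n in range(1, bound + 1):
        p *= mean / n
        cdf += p
    return 1.0 - cdf

def tilted_estimate(n0, bound, theta, samples, seed=0):
    """Importance-sampling estimate of P(n > bound | n0) for a Poisson(n0) nominal distribution.
    Samples come from the tilted distribution, which for a Poisson base is Poisson(n0 e^theta);
    each sample above the bound contributes the inverse likelihood ratio exp(-(theta n - psi))."""
    rng = random.Random(seed)
    psi = n0 * math.expm1(theta)
    tilted_mean = n0 * math.exp(theta)
    total = 0.0
    for _ in range(samples):
        n = poisson_sample(tilted_mean, rng)
        if n > bound:
            total += math.exp(-(theta * n - psi))
    return total / samples

n0, nbar = 5.0, 15
theta_star = math.log(nbar / n0)                  # maximizer in Eq. (8) for a Poisson base
ldf = nbar * math.log(nbar / n0) - nbar + n0      # psi*(nbar) for a Poisson base
estimate = tilted_estimate(n0, nbar, theta_star, samples=20000)
exact = tail_exact(n0, nbar)
chernoff = math.exp(-ldf)
```

With twenty thousand tilted samples the estimate typically lands within a few percent of the exact tail weight (about 7 × 10⁻⁵), whereas sampling the nominal distribution directly would see only a handful of exceedances at that sample size.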
Without further assumptions about $\rho^{(n_0)}$ it is not possible to say more about the ratio $h^{(\bar{n})}(n_0)/e^{-\psi^*(\bar{n})}$. The relevant additional property, which is also associated with the use and tightness of saddle-point approximations [36] in Doi-Peliti theory, is the onset of large-deviations scaling; that is, if $n_0$ and n̄ are increased together in proportion to some scale factor N as $\bar{n} = N\nu$, $n_0 = N\nu_0$, the following two limits should exist: Then the variance-minimizing tilt θ likewise has a limit, the variance $\partial^2\psi/\partial\theta_i\partial\theta_j$ in Eq. (4) scales as N, and the relative variance scales as 1/N. Appendix B shows that in this limit the log ratio $\log\left[h^{(\bar{n})}(n_0)/e^{-\psi^*(\bar{n})}\right] \leq O\left(N^{1/2}\right)$, compared to $\psi^*(\bar{n}) \sim N$.
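The scaling can be seen numerically for a Poisson base, where ψ*(n̄) = n̄ log(n̄/n_0) − n̄ + n_0 is homogeneous of degree one in (n̄, n_0). The sketch below is an illustration under that Poisson assumption, with ν_0 = 1 and ν = 3 chosen arbitrarily. It computes the exact log tail probability and compares it with −ψ*(n̄): the gap stays negative, as the bound requires, and grows far more slowly than N, consistent with the O(N^{1/2}) estimate quoted above.

```python
import math

def log_poisson_tail(mean, bound, terms=400):
    """log P(n > bound) for n ~ Poisson(mean), by log-sum-exp over the first `terms` tail terms
    (adequate when bound is well above mean, so the terms decay geometrically)."""
    logs = [-mean + n * math.log(mean) - math.lgamma(n + 1) for n in range(bound + 1, bound + 1 + terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))

nu0, nu = 1.0, 3.0
ldf_per_n = nu * math.log(nu / nu0) - nu + nu0   # psi*(nbar)/N for a Poisson base
gaps = []
for N in (10, 100, 1000):
    # log of the ratio (exact tail weight) / e^{-psi*(nbar)} at scale N
    gap = log_poisson_tail(N * nu0, int(N * nu)) + N * ldf_per_n
    gaps.append((N, gap))
```

For this example the gap per unit N falls from roughly 0.3 at N = 10 to below 0.01 at N = 1000, so the LDF captures the tail weight at leading exponential order.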

The Doi-Peliti integral construction of generating functions for time-dependent probability distributions
Canonical transformations on phase space [35] are introduced when distributions ρ are evolved under the generators of stochastic processes. A formally exact representation of the time-translation of distributions over finite intervals, acting directly in the representation by generating functions, is given by the Doi-Peliti construction [21,22,57,58]. (See [9,42,68] for other self-contained introductions.) Phase-space coordinates in Doi-Peliti theory take on interpretations in terms of distributions and dual projection operators, equivalent in the path integral to the non-commuting operator fields of [52], through Peliti's construction [57,58] of a representation of the identity in the Hilbert space of generating functions.
In the path integral, tangent vectors ṅ to configurations, and phase-space dual coordinates θ, are independent dummy variables of integration. It is the large-deviation limit, realized in Doi-Peliti theory by saddle-point evaluation of the functional integral, that couples the coordinate pairs (ṅ, n) and (θ, n) as Lagrangian and Hamiltonian dual coordinate systems, with the Liouville operator (the representation of the stochastic process generator) serving both as the Hamiltonian and as the generating function of the coordinate transformation. The tangent vectors to stationary trajectories generate a family of symplectomorphisms in phase space, and the LDF evolves as a solution to the corresponding Hamilton-Jacobi equation [13,69].
A tutorial on the defining steps in Doi-Peliti theory is given in this subsection. Supporting algebra, where needed to provide a complete self-contained definition of the method, is provided in Appendix C. Doi-Peliti theory was invented to solve Markov jump processes on integer lattices of the kind that describe population processes, and while providing a definite notation and formally-exact integral solution, neither the indexing nor the particular exponential generating functions used here are meant to be a general notation describing all 2FFIs. Appendix D, working in the simplifying limit of Gaussian fluctuations, provides derivations starting from Doi-Peliti theory of equivalent representations in terms of the Langevin stochastic differential equation and the Fokker-Planck equation [74], which manifestly have no restriction to the specific assumptions made in the Doi-Peliti construction. The derivations leading to them may be "inverted" to equivalence classes of 2FFI path integrals in the Gaussian limit, defined by a common integral kernel form explained in [42], and equivalent to the Dyson equations for general stochastic processes derived in [52]. Saddle-point methods in Doi-Peliti theory are seen to be one instance of the eikonal approximations for large-deviation functions, developed in general form by Freidlin and Wentzell [30], and the forms derived from the action here are related to specific forms arising in that work.

Generator, Hilbert space, and quadrature
The starting representation for time evolution is the continuous-time master equation of the probability distribution,

$$\frac{d\rho_n}{dt} = \sum_{n'} T_{nn'}\, \rho_{n'} .$$

Here $T_{nn'}$ is known as the transition matrix, and is the representation of the generator of the stochastic process acting in the space of probability distributions. Its columns sum to zero, $\sum_n T_{nn'} = 0$ for all n', ensuring conservation of probability. The matrix elements $T_{nn'}$ can depend on the time t, though in the example developed in Sect. 4 we will use a time-independent generator for simplicity. The z-transform (1) induces a representation of the generator of time translation acting in the space of moment-generating functions in the form of a Liouville equation

$$\frac{\partial}{\partial t}\Phi(z) = -\mathcal{L}\left(z, \frac{\partial}{\partial z}\right)\Phi(z) . \tag{29}$$

$\mathcal{L}(z, \partial/\partial z)$, called the Liouville operator, is conventionally defined with the minus sign of Eq. (29) because its spectrum is non-negative. For many purposes, the properties of the generating function as an analytic function of a complex variable [29] are not needed, and the algebra of the MGF as a formal power series [77] is sufficient. An operator algebra due to Doi [21,22] replaces the variable z and derivative ∂/∂z with formal raising and lowering operators

$$z_i \rightarrow a^\dagger_i , \qquad \frac{\partial}{\partial z_i} \rightarrow a_i . \tag{30}$$

In the condensed notation (1) for vector inner products, we will regard a as a column vector and a† as a row vector. Associated with operators $a^\dagger_i$ and $a_i$ are bilinear number operators $\hat{n}_i \equiv a^\dagger_i a_i$ (no Einstein sum), of which the basis monomials $z^n$ under the mapping (30) correspond to number eigenstates. Number states are denoted |n), and the MGF Φ(z) is represented as a state vector |Φ) defined in terms of number states as $\sum_n \rho_n |n) \equiv |\Phi)$.
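To make the substitution (30) concrete, the sketch below represents a generating function by its list of coefficients ρ_n, so that a† acts as multiplication by z and a as d/dz. As an assumed example, not one taken from this section, it uses the pure death process n → n − 1 at rate kn, whose Liouville operator under the sign convention of Eq. (29) is L = k(a† − 1)a. It checks the commutator [a, a†] = 1, the number operator, conservation of probability, and agreement between the master equation and the operator form.

```python
def raise_op(c):
    """a^dagger on MGF coefficients: multiplication by z shifts rho_n to index n + 1."""
    return [0.0] + list(c)

def lower_op(c):
    """a on MGF coefficients: d/dz sends n * rho_n to index n - 1."""
    return [n * c[n] for n in range(1, len(c))]

def pad(c, length):
    """Zero-pad a coefficient list to a common length."""
    return list(c) + [0.0] * (length - len(c))

def master_rhs(rho, k):
    """d rho_n / dt = k (n+1) rho_{n+1} - k n rho_n for pure death n -> n-1 at rate k n."""
    m = len(rho)
    return [k * ((n + 1) * rho[n + 1] if n + 1 < m else 0.0) - k * n * rho[n] for n in range(m)]

def doi_rhs(rho, k):
    """-L(a^dagger, a)|Phi) with the assumed L = k (a^dagger - 1) a, i.e. k (a - a^dagger a)|Phi)."""
    m = len(rho)
    a_rho = pad(lower_op(rho), m)
    ada_rho = pad(raise_op(lower_op(rho)), m)
    return [k * (x - y) for x, y in zip(a_rho, ada_rho)]

rho = [0.1, 0.2, 0.3, 0.25, 0.15]
# [a, a^dagger] applied to the coefficient list should return it unchanged.
commutator = [x - y for x, y in zip(lower_op(raise_op(rho)), pad(raise_op(lower_op(rho)), len(rho)))]
```

Because the death process never leaves the truncated state space, the operator and matrix forms agree exactly here, and the coefficients of −L|Φ) sum to zero, the generating-function image of zero column sums.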

E. Smith
The Liouville equation (29) becomes

$$\frac{\partial}{\partial t}|\Phi) = -\mathcal{L}\left(a^\dagger, a\right)|\Phi) ,$$

in which the Liouville operator $\mathcal{L}(a^\dagger, a)$ is the former function $\mathcal{L}(z, \partial/\partial z)$ under the substitution (30). An inner product must be defined to make generating functions states in a Hilbert space. In the Doi theory this is done formally by introducing a dual state (0| satisfying (0|0) = 1. Conjugate number states (m| with inner product $(m|n) = \delta_{mn}$ are then built up using lowering operators. The correspondence of the Doi dual ground state with a projector on analytic generating functions, and the inner product known as the Glauber norm, corresponding to the evaluation of any MGF at argument z = 1, are given in Appendix C.1. The analytic form (1) of the MGF can be recovered using a variant of the Glauber norm, as

$$\Phi(z) = (0|\, e^{z a}\, |\Phi) .$$

The objective in introducing the Doi operator formalism is to compute more conveniently the quadrature of the Liouville equation (29), formally written

$$\begin{aligned} |\Phi)_T &= \mathcal{T} \exp\left[ -\int_0^T dt\, \mathcal{L}\left(a^\dagger, a\right) \right] |\Phi)_0 \\ &\equiv \lim_{\delta t \to 0}\, e^{-\delta t\, \mathcal{L}_{T-\delta t}} \cdots e^{-\delta t\, \mathcal{L}_{\delta t}}\, e^{-\delta t\, \mathcal{L}_{0}}\, |\Phi)_0 , \end{aligned} \tag{34}$$

where $\mathcal{L}_{k\delta t}$ denotes $\mathcal{L}(a^\dagger, a)$ evaluated at time kδt. $|\Phi)_T$ is the generating function for the distribution $\rho_n$ evolved to time t = T from the generating function $|\Phi)_0$ for an initial distribution given at time t = 0. $\mathcal{T}$ denotes time-ordering of the exponential integral, defined operationally in the second line of Eq. (34), in terms of a time-ordered product of applications of $\mathcal{L}(a^\dagger, a)$ evaluated at the sequence of times kδt.
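The operational definition of the time-ordered exponential in Eq. (34) can be sketched by applying the factors e^{−δt L} ≈ 1 − δt L one time step at a time, here in the equivalent number-state representation where −L acts as the transition matrix. The death process, the time-dependent rate k(t) = 1 + 0.5 sin t, and the Poisson initial condition are illustrative assumptions. For this linear process a Poisson distribution stays Poisson with mean n_0 exp(−∫k dt), which gives an exact value to compare against.

```python
import math

def death_rhs(rho, k):
    """Transition-matrix action for pure death n -> n-1 at rate k n (the image of -L)."""
    m = len(rho)
    return [k * ((n + 1) * rho[n + 1] if n + 1 < m else 0.0) - k * n * rho[n] for n in range(m)]

def rate(t):
    """Hypothetical time-dependent death rate k(t) = 1 + 0.5 sin(t)."""
    return 1.0 + 0.5 * math.sin(t)

def time_ordered_evolution(rho0, t_end, dt):
    """Apply the factors (1 - dt L_{j dt}) in time order: the earliest factor acts first,
    matching its rightmost position in the product of Eq. (34)."""
    rho = list(rho0)
    for j in range(round(t_end / dt)):
        drho = death_rhs(rho, rate(j * dt))
        rho = [r + dt * d for r, d in zip(rho, drho)]
    return rho

n0, m = 8.0, 60
rho0 = [math.exp(-n0 + n * math.log(n0) - math.lgamma(n + 1)) for n in range(m)]
norm = sum(rho0)
rho0 = [p / norm for p in rho0]                  # truncated, renormalized Poisson(n0)

rho_T = time_ordered_evolution(rho0, t_end=1.0, dt=1e-4)
mean_T = sum(n * p for n, p in enumerate(rho_T))
exact_mean = n0 * math.exp(-(1.0 + 0.5 * (1.0 - math.cos(1.0))))   # n0 exp(-int_0^1 k dt)
```

The discretized product converges to the exact result with an error of order δt, and each factor conserves total probability exactly.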

The Peliti coherent-state expansion and two-field functional integral
The implications of the time-ordered operator algebra in the quadrature (34) for correlation functions may not be at all easy to derive or approximate, and the purpose of Peliti's functional-integral construction is to supply evaluation methods, and even more importantly, approximation methods including saddle-point evaluations and a systematic approach to perturbation theory such as exists for path-integral formulations in quantum mechanics [28]. A straightforward step in the Peliti construction is to expand arbitrary generating functions in a basis of eigenstates of the lowering operator a. However, whereas the indices {n} are countable, the eigenvalues of a are continuous, so such a basis is overcomplete. The important and non-trivial step in the Peliti construction is to identify a representation of the identity operator in the space of generating functions, which requires defining projectors that are left-eigenstates of the raising operator a † , and showing that an expansion in these two continuously-indexed sets of states is equivalent to the identity operator on number states n. It is in this second step that the response field in phase space enters the functional-integral construction for stochastic processes.
The eigenstates of the lowering operator will be the moment-generating functions of products of Poisson distributions, and these distributions will play a central role in the interpretation of stationary-point approximations in the functional integral. In the Poisson distribution with mean n̄, the expectations of factorial moments [10], defined (again, component-wise) as $n^{\underline{k}} \equiv n!/(n-k)!$, are $\bar{n}^k$ for all k. Products of Poisson marginals, or multinomial distributions that are sections through such products at fixed $N \equiv \sum_i n_i$, both serve as saddle-point approximations to more general distributions in the Doi-Peliti integral, and also form an important class of exact solutions for some applications such as chemical reaction network models [4]. Tilts of Poisson (35) or multinomial (36) distributions remain of the same form, so that nominal and importance distributions will have a uniform relation to phase-space points at all pairs (θ, n̄) in Fig. 1.
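Both properties quoted above are easy to confirm numerically. The sketch assumes arbitrary parameter values and a truncation `n_max`; it checks that the factorial moments of a Poisson distribution are n̄^k, and that the section of a product of two Poisson distributions at fixed N = n_1 + n_2 is a multinomial (here binomial) distribution.

```python
import math

def poisson_pmf(mean, n_max):
    """Poisson probabilities for n = 0..n_max (the truncation is an illustrative assumption)."""
    return [math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1)) for n in range(n_max + 1)]

nbar, n_max = 3.5, 80
rho = poisson_pmf(nbar, n_max)
# Factorial moments E[n!/(n-k)!]; math.perm(n, k) returns 0 when k > n.
factorial_moments = [sum(math.perm(n, k) * p for n, p in enumerate(rho)) for k in range(6)]

# Section of a product of two Poissons at fixed total N: the law of n1 given n1 + n2 = N.
phi1, phi2, N = 2.0, 3.0, 5
joint = [poisson_pmf(phi1, N)[n1] * poisson_pmf(phi2, N)[N - n1] for n1 in range(N + 1)]
section = [j / sum(joint) for j in joint]
multinomial = [math.comb(N, n1) * (phi1 / (phi1 + phi2)) ** n1 * (phi2 / (phi1 + phi2)) ** (N - n1)
               for n1 in range(N + 1)]
```

The multinomial parameters are the normalized Poisson means φ_i/Σ_j φ_j, which is why the same mean coordinates describe both families.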
The fixed form of all moments in Poisson and multinomial distributions as functions of the mean makes these minimum-information distributions. For these distributions the Fisher spherical embedding of divergences of distributions ρ on the infinite index set {n} can be reduced to only D dimensions in the coordinates $\{n_i/N\}$, as shown in Appendix A.2. Related simplifications, from forms of adjoint duality [38] that generally require infinitely many parameters to simple coordinate transforms in the Doi-Peliti functional integral, are reviewed below in Sect. 2.3.6.
The generating function of a product distribution with a vector of Poisson parameters φ = {φ_i} is the coherent state |φ) = exp[Σ_i φ_i (z_i − 1)], Eq. (37), as may be checked by an elementary expansion of the Taylor series for the exponential. These states are eigenstates of the lowering operators, a_i|φ) = φ_i|φ). From the inner product of the base projection operator (0| and number states (m| introduced above, a similar Taylor-series expansion of the exponential verifies that left eigenstates of the raising operator must take the form given in Eq. (39). The normalization in Eq. (39) has been chosen for convenience, so that the inner product of two such coherent states evaluates to the expression in Eq. (40), and in particular (φ|φ) = 1. Note that the inner product (40), if expanded in the power series for the two coherent states, identifies a tilted distribution ρ̃, Eq. (41), so we recognize that the coherent-state parameters φ index a space of base distributions, and the dual fields φ† are the biased-sampling weights in an exponential family of generating functions in which log φ† takes the place of the coordinate θ in Eq. (3). In the language of importance sampling from Sect. 2.2, Eq. (41) describes importance distributions with mean parameters n_i = φ*_{2i} φ_{1i}. The result enabling the Peliti functional-integral construction is that an integral of outer products of these states constitutes a representation of the identity in the space of generating functions, Eq. (42), a result demonstrated in Eq. (160) of Appendix C.2. From the insertion of copies of the identity (42) at a sequence of times t = kδt, for some small δt, into the quadrature (34), some algebra gives the generating function (1) in the form of a functional integral, Eq. (43), with what Feynman and Hibbs [28] term a "skeletonized" functional measure (defined in Eq. (162)). Here e^{ψ_0(log φ†_0)} is the initial generating function in Eq. (34). S in Eq. (43) has the form of a Lagrange-Hamilton action functional, in which the "kinetic term" that creates a Doi-Peliti Lagrangian comes from the inner product (40) between projectors and states in two expansions for the identity operator at closely-spaced times. While the CGF (43) is formally a function of the coordinate θ = log z, the functional integral (43) and action (44) are at the moment expressed in terms of the coherent-state fields φ†, φ. As noted in Eq. (41), these variables are on the one hand mathematically expressive, as they distinguish the mean in a base distribution from the likelihood ratio in an importance distribution. On the other hand, they make calculation of the mean in the importance distribution inconvenient, because it is a bilinear form in φ† and φ, and, more importantly, the Hessian of the function ψ_T is not generally positive-definite in coordinates z.
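A small numerical sketch of the coherent-state facts used above, assuming the generating-function representation in which a state is a power series in z and the lowering operator acts as d/dz (one species, for simplicity). It checks that the Poisson generating function exp(φ(z − 1)) is reproduced by its series, that it is an eigenfunction of d/dz with eigenvalue φ, and that reweighting a Poisson(φ) base distribution by the likelihood ratio (φ†)^n gives an importance distribution with mean φ†φ. All helper names are hypothetical.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Poisson probability of n with the given mean (log-space evaluation)."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def coherent_state(z: float, phi: float) -> float:
    """Closed form of the Poisson(phi) generating function, exp(phi * (z - 1))."""
    return math.exp(phi * (z - 1.0))


def generating_function_series(z: float, phi: float, n_max: int = 150) -> float:
    """sum_n P(n) z**n for P = Poisson(phi), truncated at n_max."""
    return sum(poisson_pmf(n, phi) * z**n for n in range(n_max))


def lowering(z: float, phi: float, h: float = 1e-6) -> float:
    """Lowering operator acting as d/dz, approximated by a central difference."""
    return (coherent_state(z + h, phi) - coherent_state(z - h, phi)) / (2 * h)


def tilted_mean(phi: float, phi_dag: float, n_max: int = 150) -> float:
    """Mean of Poisson(phi) reweighted by the likelihood ratio phi_dag**n."""
    weights = [poisson_pmf(n, phi) * phi_dag**n for n in range(n_max)]
    total = sum(weights)
    return sum(n * w for n, w in enumerate(weights)) / total
```

With φ = 2.0 and φ† = 1.7 the tilted mean comes out as φ†φ = 3.4, the single-species analogue of the mean parameters n_i = φ*_{2i} φ_{1i} stated above.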
Therefore we introduce the first canonical transformation from the Peliti coherent-state coordinates that will be of interest in this work, which is a simple logarithmic point transformation, Eq. (45). The fields n and θ would have the interpretation of a mean molecule number and conjugate chemical potential in a chemical-reaction model, so we refer to these as number-potential coordinates, distinguishing them from coherent-state coordinates. The action (44) becomes Eq. (46), in which L(θ, n) is L(φ†, φ) with φ† and φ written as functions of n and θ by Eq. (45).
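The logarithmic point transformation (45) is not reproduced in the text above. As an assumption for illustration, take the form φ† = e^θ, φ = e^{−θ} n, which makes n = φ†φ the bilinear mean noted after Eq. (41) and log φ† the tilt coordinate θ. The sketch checks numerically that this map has unit Jacobian, the area-preservation property expected of a canonical transformation in one dimension. The function names are hypothetical.

```python
import math
from typing import Tuple


def to_coherent(theta: float, n: float) -> Tuple[float, float]:
    """Assumed log point transform: phi_dag = e^theta, phi = e^-theta * n."""
    return math.exp(theta), math.exp(-theta) * n


def jacobian_det(theta: float, n: float, h: float = 1e-6) -> float:
    """det d(phi_dag, phi)/d(theta, n) by central finite differences."""
    pd_tp, p_tp = to_coherent(theta + h, n)
    pd_tm, p_tm = to_coherent(theta - h, n)
    pd_np, p_np = to_coherent(theta, n + h)
    pd_nm, p_nm = to_coherent(theta, n - h)
    dpd_dtheta, dp_dtheta = (pd_tp - pd_tm) / (2 * h), (p_tp - p_tm) / (2 * h)
    dpd_dn, dp_dn = (pd_np - pd_nm) / (2 * h), (p_np - p_nm) / (2 * h)
    return dpd_dtheta * dp_dn - dpd_dn * dp_dtheta
```

Analytically the determinant is e^θ · e^{−θ} = 1 at every point, so dφ† ∧ dφ = dθ ∧ dn under this assumed form.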

The large-deviation limit and Hamiltonian stationary trajectories
The large-deviation limit of the path integral (43) is a leading-exponential approximation in some value, such as φ, that becomes large to reflect a large population size or a similar scale variable. In that limit the integral is approximated by the value of the exponential kernel at its saddle point, which is any stationary solution of the action S together with the initial and final boundary terms. The stationary trajectories for S in the form (44) satisfy the pair of equations of motion (47). The final-time boundary value for the field φ† is given by the vanishing derivative of the exponent in Eq. (43) with respect to φ_T, resulting in φ†_T = z. The initial-time boundary value for φ is given, after an integration by parts, by the vanishing derivative with respect to φ†_0, and depends on the form of the CGF ψ_0 and the stationary-path value of its argument φ†_0. The two conditions are solved self-consistently through Eqs. (47).
Joint stationary values for φ † , φ at intermediate times t, in a generating function with argument z imposed at a final time T , represent the bundle of rays in the statistical model that dominate the contribution to the importance distribution at later times. For this reason the stationary trajectory in the base distribution is not generally independent of the trajectory for the tilt in systems with non-linear equations of motion.
The stationary-path equations corresponding to Eq. (47) in the logarithmic number-potential coordinates (45) are given by Eq. (48). These equations are instances of the symplectomorphisms (19). The triangle inequality of divergences (12) within and across exponential families is therefore an invariant of motion.
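Liouville's theorem for a Hamiltonian flow can be illustrated numerically in one dimension. The sketch below uses an illustrative function H(θ, n) = k n (e^{−θ} − 1), a hypothetical choice not taken from this section, with Hamilton's equations in the sign convention θ̇ = −∂H/∂n, ṅ = ∂H/∂θ, which may differ from that of Eq. (48). It integrates trajectories with classical fourth-order Runge-Kutta (accurate but not itself symplectic) and estimates the Jacobian determinant of the time-1 flow map by finite differences; any Hamiltonian flow should give a value of 1 and conserve H.

```python
import math
from typing import Tuple

RATE = 1.0  # hypothetical rate constant k


def hamiltonian(theta: float, n: float) -> float:
    """Illustrative function H(theta, n) = RATE * n * (exp(-theta) - 1)."""
    return RATE * n * (math.exp(-theta) - 1.0)


def rhs(theta: float, n: float) -> Tuple[float, float]:
    """Assumed convention: thetadot = -dH/dn, ndot = dH/dtheta."""
    thetadot = -RATE * (math.exp(-theta) - 1.0)
    ndot = -RATE * n * math.exp(-theta)
    return thetadot, ndot


def flow(theta: float, n: float, t_final: float = 1.0, steps: int = 1000) -> Tuple[float, float]:
    """Integrate a stationary trajectory with classical RK4."""
    dt = t_final / steps
    for _ in range(steps):
        a = rhs(theta, n)
        b = rhs(theta + 0.5 * dt * a[0], n + 0.5 * dt * a[1])
        c = rhs(theta + 0.5 * dt * b[0], n + 0.5 * dt * b[1])
        d = rhs(theta + dt * c[0], n + dt * c[1])
        theta += dt * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6
        n += dt * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6
    return theta, n


def flow_jacobian_det(theta0: float, n0: float, h: float = 1e-5) -> float:
    """det d(theta_t, n_t)/d(theta_0, n_0): phase-space area ratio of the flow map."""
    tp, np_ = flow(theta0 + h, n0)
    tm, nm = flow(theta0 - h, n0)
    t_np, n_np = flow(theta0, n0 + h)
    t_nm, n_nm = flow(theta0, n0 - h)
    dth_dth, dn_dth = (tp - tm) / (2 * h), (np_ - nm) / (2 * h)
    dth_dn, dn_dn = (t_np - t_nm) / (2 * h), (n_np - n_nm) / (2 * h)
    return dth_dth * dn_dn - dth_dn * dn_dth
```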

Duality with the Lagrangian, and the Hamilton-Jacobi equation
The equation of motion (48) for n introduces a coordinate transform from phase-space coordinates (θ, n) to coordinates (ṅ, n) in the tangent bundle TQ over the manifold Q of n values. −L(θ, n) is the generating function for the transformation and the Hamiltonian on phase space. The Lagrangian dual to −L is the argument of the action S up to a total time derivative of θn. By construction, the Lagrangian is the generating function of the inverse coordinate transform, from (ṅ, n) back to (θ, n). Thus Lagrange-Hamilton duality identifies the phase space with the cotangent bundle T*Q to the coordinate manifold. For any initial distribution ρ encoded in the initial-time generating function ψ_0(θ_0) in Eq. (43), we may write the action (46) evaluated along a stationary trajectory as Eq. (51). After performing the Legendre transform (8) to the LDF to cancel the surface term in Eq. (51), we find that ψ*(n), at any time, satisfies the pair of equations (52). The Hamiltonian equations (48) generate a family of maps of phase space, and under any of these maps the initial-value contour θ_0(n) for any distribution ρ is mapped to a contour θ_t(n), along which ψ*_t(n) is a solution to the Hamilton-Jacobi equation (52).
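The Legendre duality between the CGF and the LDF can be made concrete for a single Poisson base distribution with mean n̄, for which ψ(θ) = n̄(e^θ − 1). This is an illustrative assumption, not the general ψ_0 of Eq. (43). The Python sketch computes ψ*(n) = sup_θ [nθ − ψ(θ)] by golden-section search (the objective is concave), compares it with the closed form n log(n/n̄) − n + n̄, and checks that the maximizing θ equals dψ*/dn, so that θ and n are dual coordinates.

```python
import math

NBAR = 4.0  # hypothetical mean of a single Poisson base distribution


def psi(theta: float) -> float:
    """CGF of Poisson(NBAR): log E[exp(theta * n)] = NBAR * (exp(theta) - 1)."""
    return NBAR * (math.exp(theta) - 1.0)


def legendre_numeric(n: float, lo: float = -20.0, hi: float = 20.0, iters: int = 200) -> tuple:
    """Return (argmax, max) of n*theta - psi(theta) by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if n * c - psi(c) > n * d - psi(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    theta = 0.5 * (a + b)
    return theta, n * theta - psi(theta)


def psi_star(n: float) -> float:
    """Closed-form Legendre dual (the LDF), valid for n > 0."""
    return n * math.log(n / NBAR) - n + NBAR
```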

The commutative diagram of maps from Legendre and Lagrangian-Hamiltonian dualities
The Legendre and Lagrange-Hamilton dualities of Doi-Peliti theory define a system of D-dimensional coordinate transformations that form a commutative diagram, summarized by Loek and Zhang [47]. Equations (48, 49) define a map (ṅ, n) ↔ (θ, n) between the tangent and cotangent bundles, TQ ↔ T*Q, describing the same trajectories in Lagrangian and Hamiltonian coordinates, respectively. Integration of these equations over any finite time interval t defines a map from pairs of initial and final points (n_t, n_0) in Q × Q to points in TQ at either initial or final times, by assigning final and initial velocities (ṅ_t, ṅ_0) at those points. Combined with the maps TQ ↔ T*Q at both times, the Lagrangian map between times generates a symplectomorphism T*Q ↔ T*Q of the cotangent bundle, equal to the direct integration along stationary trajectories (48). The Legendre duality θ(n) at a single time is defined separately on the exponential families over each of the base distributions ρ(n_0) that form the leaves in the foliation of Fig. 1, but does not depend on the generator of time translation. The Lagrange-Hamilton duality depends on the generator of time translations and not on the choice of exponential families used to foliate the cotangent bundle. Because points (θ, n) on stationary trajectories in Doi-Peliti theory correspond to tilted distributions over Poisson base distributions (35), the canonical foliation of phase space in Fig. 1 is singled out by the Peliti representation of unity (42). Note that these distributions constitute a 2D-dimensional subspace of the space of all distributions on the lattice {n}. It is shown in [69] that these special distributions, because they are minimum-information distributions, have the interpretation of macrostates for a general equilibrium or nonequilibrium thermodynamics.
In the framework of [47], the Doi-Peliti integral (43) and other two-field functional integrals are defined on the Pontryagin bundle T Q ⊕ T * Q of triples (θ,ṅ, n) in which all three variables are independent. It is only with the reduction to saddle-point trajectories (and only if the Doi-Peliti integral is evaluated in its continuous-time limit, and not with a finite time-step in the inner product (40)), that these triples are reduced to Lagrangian and Hamiltonian descriptions of the same trajectories with no "duality gap".

Coordinate transformations in phase space expressing adjoint duality as time reversal
A large body of work on "fluctuation theorems" [64], although not required to derive the foregoing dualities or their implied conservation laws, is nonetheless helpful in interpreting the meaning of the conserved volumes in Doi-Peliti theory. Fluctuation theorems may be seen as extensions to finite time intervals of the study of instantaneous adjoint duality between the Kolmogorov-backward and -forward evolution [33] under the transition matrix T_{nn′}. Hatano and Sasa [38] recognized that application of a suitable similarity transform to the transition matrix exchanges the forms of T_{nn′} and its adjoint: that is, the similarity transform exchanges the forms of the Kolmogorov-backward and -forward equations, making the evolution of distributions appear like that of observables and vice versa. Performing the similarity transform at every moment in an extended-time quadrature of the form (34) has the effect of multiplying each path of the stochastic coordinate n by a weight [65], creating an extended-time generating functional for paths.¹ The required similarity transform is generally determined non-locally and requires solution for the stationary distributions on the whole state space.
An exception arises for cases in which the stationary distributions are minimum-information distributions such as the Poisson (35), for which the phase-space coordinates (θ, n) determine all fluctuation moments. In these cases the similarity transform in the potentially infinite-dimensional state space {n} can be projected to a coordinate transform in the phase space [69]. Moreover, the required path weights of [65] for all paths are given by a D-dimensional, time-local correction to the Liouville function L, making evolution under the adjoint generator appear like ordinary stochastic evolution in reverse time.
One such coordinate transformation, exchanging the roles of coordinate and response fields in phase space, was first used in Doi-Peliti integrals by Baish [11]. Let n̄ be the saddle-point value of the field n in Eq. (45) in the steady state that would be annihilated by T_{nn′} at the parameters it possesses at some time. If T_{nn′}, and thus L, is explicitly time-dependent, then the scale factor n̄ will generally be different for each time. Define coherent-state fields ϕ†, ϕ rescaled locally by n̄ as in Eq. (53). The form of the action in the fields ϕ†, ϕ remains as in Eq. (44), but the Liouville function L is replaced by a possibly-shifted function L̃, Eq. (54). If we follow the Baish transformation (53) with a logarithmic transform equivalent to (45), but in the descaled fields, Eq. (55), we coordinatize the base distributions in Fig. 1 as exponential families, and the dependence of the number n on θ in mixture families. The evaluation of the triangle inequality (12) is unchanged, but the coordinates in which vector fields are expanded are transformed. The action (44) in the new variables is written with the modified Liouville function L̃ from Eq. (54), expressed in terms of the new coordinates. The symmetric exponential families on phase space represented by the coordinate systems (θ, n) from Eq. (45) and (n, η) from Eq. (55) provide natural variables in which to expand the coefficients of affine connections constructed from the ψ-divergence below in Sect. 3.3.

¹ It is impossible to fairly represent the motivations and scope of what has now become a significant fraction of work spanning dynamical systems and statistical mechanics. The study of generating functions for reverse-time trajectories began in dynamical systems [16,25,31,32], and was later taken up in similar form for non-equilibrium stochastic processes [14,19,20,24,26,38-41,45,46]. Reviews of parts of this literature from different stages in its development and from different domain perspectives include [15,27,37,64].

The Liouville theorem connecting dynamics to inference induced by two-field stationary trajectories
In the review of standard results in Sect. 2, we have shown where a phase space with conserved volume under mappings along Hamiltonian stationary trajectories originates within Doi-Peliti theory, and identified a particular form (15) for the volume element preserved by a mapping of coordinate differentials along these trajectories. In this section we compute two quantities that are conserved under time translation and develop their interpretations as information measures. The first is a scalar phase-space density, which we identify as the Wigner function of semiclassical Liouville evolution. The second is an inner product of vector fields, which we express in coordinate-invariant terms by defining affine connections for dual parallel transport compatible with the Riemannian metric (4). The distinctive feature of the parallel transport derived here is that it is affine-flat in the coherent-state variables of Sect. 2.3.2, rather than in the exponential coordinates θ normally used to define dually-flat parallel transport, as in Ch. 6 of [3,54]. The importance-sampling interpretation of Sect. 2.2 identifies a geometric role for coherent-state variables that we believe has not been recognized. Dual parallel transport under these affine connections expresses conservation of the Liouville volume element as a result of compensation, in the flow of phase-space trajectories, between the resolution of underlying base distributions and the discrimination imposed by sampling bias. The information available from the large-deviation probability as a sample estimator, about differences between possible initial distributions, is carried by the eigenvalues of the Fisher information metric, which are transported as invariants along stationary trajectories.

E. Smith
The canonical transformation of Sect. 2.3.6, which exchanges the roles of fields representing the base distribution and the sampling bias in the Doi-Peliti integral, is used to express the conserved information volume in symmetric form in terms of the Fisher metric. The symmetric representation of adjoint duality [37,38] in terms of apparent time-reversal in the Doi-Peliti integral makes clear that the "reversibility" entailed by conserved phase-space volumes in the Hamilton-Jacobi equation should be understood as a reflection of duality between dynamics (the forward propagation of base distributions) and inference (the backward propagation of sample biases).
We begin with the conserved scalar density and its interpretation as a statistical model in the terms of importance sampling, and then define vector fields corresponding to the extended Pythagorean theorem (12) and derive appropriate affine connections for them.

The Peliti functional integral as a statistical model
Inserting the Peliti basis of coherent states (37) as a representation of unity at an intermediate time t in the functional integral gives Eq. (58). The argument log φ†_‡ is the CGF coordinate for an exponential family of distributions tilted from whatever base distribution the functional integral produces at time t. The inner product φ†_{t+dt} φ_‡ is a dual mixture coordinate, corresponding to the mean of n in the importance distribution with φ_‡ as the base distribution and φ†_{t+dt} as the likelihood ratio. It is the mean of this importance distribution, together with the log-likelihood that is its dual coordinate in the exponential family, that must transform under symplectomorphism to satisfy the condition (19) for preservation of inner products. We show next that the stationary-path conditions provide the necessary mapping.

The Wigner function from the two-field identity operator plays the role of a phase-space density
The scalar density in 2FFI that fills the role of a phase-space density in classical Hamiltonian mechanics is the Wigner function [76], of which versions exist for both classical and quantum systems.² It is defined in terms of the representation of unity in Eq. (58) by Eq. (59), which implies a relation, Eq. (60), for w_t(φ†_‡, φ_‡) at any time. In a saddle-point approximation, one identifies stationary-point arguments (φ†_‡, φ_‡) for which, to leading exponential order, Eq. (61) holds. Since Eq. (61) approximates the same function at any time t, its total time derivative along the sequence of stationary points must vanish, Eq. (62). Moreover, the stationary points should coincide with values along the stationary trajectories (47) of the functional integral (58). Equation (62) may thus be recast as the conservation law (64) for a 2D-dimensional current (φ̇† w, φ̇ w), which is Liouville's theorem. w_t is a density of rays for joint base distributions and likelihood ratios that is conserved along the Doi-Peliti stationary trajectories. log w_t is the leading exponential approximation to the value of the CGF. It therefore integrates information along the trajectory from the final-time imposed value of z and the initial-time structure of the generating function ψ_0(log φ†_0).
The indirect definition (59) of the Wigner function in terms of the functional integral is convenient to manipulate but perhaps not very self-explanatory. Appendix E gives a direct construction of the stationary-point approximation in terms of a density ρ(θ) over the basis of coherent states |φ) and their Laplace transforms, and verifies that the sequence of stationary points does indeed coincide with the equations of motion (47).

Constraints and conserved current flows in reduced dimensions
Often systems of interest will evolve under constraints arising from conservation laws, such as conserved quantities of the stoichiometry in chemical reaction networks [44,60,71]. Conserved quantities result in flat directions in the CGF and zero eigenvalues of the Fisher metric. Since the constraints will generally involve multiple species, and because the logarithmic canonical transform (45) is defined in the species basis, it will generally not be possible to factor out the non-dynamical combinations. The transport equation (64) for the current of the Wigner density will then occupy only a sub-manifold of the 2D-dimensional Doi-Peliti coordinate space needed to define the system.
A convenient way to handle constraints is to work in the eigenbasis of the Fisher metric, which we index with the subscript α, where a number-potential counterpart to the transport equation (64) reads as in Eq. (65). The picture of the Liouville equation as implying a conserved volume element, (d/dt) Π_α δθ_α δn_α = 0, with the product index α taken only over nonzero eigenvalues of the Fisher metric, remains nondegenerate and has a direct interpretation in terms of the product of eigenvectors of the Fisher inner product in independent dimensions.
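The zero eigenvalue produced by a conserved quantity is easy to see in the two-state worked example, whose two-argument CGF for a binomial distribution is N log(ν_a e^{θ_a} + ν_b e^{θ_b}). Shifting both tilts by the same constant c only adds the linear term N c, so the Hessian (the Fisher metric in exponential coordinates) is flat along (1, 1). The sketch below, with arbitrary illustrative parameter values, estimates the Hessian by finite differences and returns its two eigenvalues; analytically they are 0 and 2N p_a p_b, where p_a and p_b are the fractions in the tilted distribution.

```python
import math

N = 10  # hypothetical number of walkers
NU_A = 0.3  # hypothetical base-distribution fraction in state a
NU_B = 1.0 - NU_A


def psi2(theta_a: float, theta_b: float) -> float:
    """Two-argument binomial CGF: N log(nu_a e^theta_a + nu_b e^theta_b)."""
    return N * math.log(NU_A * math.exp(theta_a) + NU_B * math.exp(theta_b))


def hessian(theta_a: float, theta_b: float, h: float = 1e-4):
    """Finite-difference Hessian (haa, hab, hbb) of psi2, i.e. the Fisher metric."""
    f = psi2
    haa = (f(theta_a + h, theta_b) - 2 * f(theta_a, theta_b) + f(theta_a - h, theta_b)) / h**2
    hbb = (f(theta_a, theta_b + h) - 2 * f(theta_a, theta_b) + f(theta_a, theta_b - h)) / h**2
    hab = (f(theta_a + h, theta_b + h) - f(theta_a + h, theta_b - h)
           - f(theta_a - h, theta_b + h) + f(theta_a - h, theta_b - h)) / (4 * h**2)
    return haa, hab, hbb


def eigenvalues_2x2(haa: float, hab: float, hbb: float):
    """Eigenvalues of the symmetric matrix [[haa, hab], [hab, hbb]], ascending."""
    mid = 0.5 * (haa + hbb)
    rad = math.sqrt((0.5 * (haa - hbb)) ** 2 + hab ** 2)
    return mid - rad, mid + rad
```

Restricting the conserved-volume product to the nonzero eigenvalue, as in the text, keeps only the dynamical direction.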

The Fisher metric and cubic tensor in dual canonical coordinates
The leading-exponential equivalence of the Wigner density to the CGF from Eq. (61) suggests that the 2D-dimensional differential of the stationary-point CGF should likewise obey a symplectic transport law, implying a transport law for the Fisher metric.
To derive those results we return to the expression of the differential of the CGF in terms of the generalized Pythagorean theorem (12), and derive the Fisher metric from the ψ-divergence following Amari [2, Sec. 6.2].
Base distributions corresponding to points along stationary paths under the action (44) form exponential families, because they are in the class of coherent-state distributions described in Appendix A.2. Therefore label importance distributions (6) symmetrically as ρ̃(θ, η), with the exponential coordinates in the two logarithmic transformations (45) and (55). To study their independent variations about a reference value (θ_R, η_R), introduce two distinct exponential families about that reference. The ψ-divergence D_ψ(θ : η) = ψ(θ) + ψ*(n) − nθ, a Bregman divergence of the CGF, is related to the Kullback-Leibler divergence between the corresponding importance distributions. The mixed second partial derivative of −D_ψ gives the same variance that defines the Fisher metric. At general θ, η, it is labeled g^D, Eq. (69). At θ = θ_R, η = η_R, the second line of Eq. (69) recovers exactly the differential form of the Pythagorean theorem of Eq. (12) in dual exponential coordinates. Two third-order mixed partials define the connection coefficients for Amari's dually-flat connections on exponential and mixture coordinates. Written in all-contravariant indices,³ these are given by Eq. (70).

Evaluated at θ = θ_R and η = η_R, g_{ij} is the Fisher metric introduced in Eq. (4), and T_{kij} is the cubic tensor, also called the Amari-Chentsov tensor [2]. Below we drop the subscript R and write θ and η as the arguments of these tensors. Introduce two vector fields, corresponding to variations in θ at the final time T and to variations in η at the initial time 0. The first can be independently imposed through the argument z of ψ_T(z), while the second can be independently imposed in the initial data. The fields δθ_T and δη_0 are written in components as in Eq. (72).
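For a single Poisson base distribution with mean n̄ (an illustrative assumption), the quantities of this subsection reduce to familiar cumulants: the ψ-divergence ψ(θ) + ψ*(n) − nθ equals the Kullback-Leibler divergence KL(Poisson(n) ‖ Poisson(n̄ e^θ)), the second derivative of ψ is the variance of the tilted distribution (the one-dimensional Fisher metric), and the third derivative is its third central moment (the one-dimensional cubic tensor). The sketch checks all three numerically, computing the KL divergence and the moments by direct summation rather than from closed forms. Names and parameter values are hypothetical.

```python
import math

NBAR = 4.0  # hypothetical mean of the Poisson base distribution


def psi(theta: float) -> float:
    """CGF of Poisson(NBAR)."""
    return NBAR * (math.exp(theta) - 1.0)


def psi_star(n: float) -> float:
    """Legendre dual of psi, for n > 0."""
    return n * math.log(n / NBAR) - n + NBAR


def psi_divergence(theta: float, n: float) -> float:
    """D_psi(theta : n) = psi(theta) + psi_star(n) - n * theta."""
    return psi(theta) + psi_star(n) - n * theta


def log_pmf(k: int, mean: float) -> float:
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


def kl_poisson(mean_p: float, mean_q: float, k_max: int = 200) -> float:
    """KL(Poisson(mean_p) || Poisson(mean_q)) by direct summation."""
    total = 0.0
    for k in range(k_max):
        lp = log_pmf(k, mean_p)
        total += math.exp(lp) * (lp - log_pmf(k, mean_q))
    return total


def tilted_central_moment(theta: float, order: int, k_max: int = 200) -> float:
    """Central moment of the distribution proportional to exp(theta*k) * Poisson(NBAR)."""
    w = [math.exp(log_pmf(k, NBAR) + theta * k) for k in range(k_max)]
    z = sum(w)
    mu = sum(k * wk for k, wk in enumerate(w)) / z
    return sum((k - mu) ** order * wk for k, wk in enumerate(w)) / z


def d2_psi(theta: float, h: float = 1e-3) -> float:
    """Second derivative of psi (1-D Fisher metric) by central differences."""
    return (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / h**2


def d3_psi(theta: float, h: float = 1e-3) -> float:
    """Third derivative of psi (1-D cubic tensor) by central differences."""
    return (psi(theta + 2 * h) - 2 * psi(theta + h)
            + 2 * psi(theta - h) - psi(theta - 2 * h)) / (2 * h**3)
```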

The dual vector fields induced by base-distribution initial conditions, and final-time tilts
The stationary-path conditions map the dual initial and final coordinates to pairs of coordinates at any intermediate time, which we denote θ(θ_T, η_0, t), η(θ_T, η_0, t). A one-parameter family of vector fields is defined by assigning vectors to each such coordinate as in Eq. (73). Under the change of coordinates from (θ_T, η_0) to (θ, η) at each time t, the vector fields (73) may be written in terms of the local coordinate differentials as in Eq. (74). Below we suppress the explicit (θ, η, t) arguments of δθ^j(θ, η, t) and δη^j(θ, η, t), and indicate the time t in a subscript only where it is needed to avoid confusion. The vector fields (74) have a time dependence that can be defined through the dependences of θ(θ_T, η_0, t) and η(θ_T, η_0, t) on the boundary coordinates and then transformed to the local coordinate system, becoming Eq. (75). Eq. (48) is used to arrive at the third form of each equation in terms of mixed partials of L and L̃. The coordinate transformation (7) from contravariant exponential coordinates to covariant mixture coordinates may be used in two ways to write the inner product of vector fields δθ and δη in mixed form. From the definition of the inner product in terms of g^D in Eq. (69) and its equivalence to the Hessian definition of g in Eq. (71), one obtains Eq. (76). Although the field variable n is the same in either log-transform (45) or (55), the two displacements δ_η n and δ_θ n are independent vector fields.

The conserved inner product of dual vector fields, and directional transport of the metric
Equation (75) has a symmetric form but evolves δθ and δη using L and L̃, respectively, making it not immediately apparent that the inner product is preserved. Writing the field δη in its dual mixture coordinate as in the first line of Eq. (76), the time derivative becomes Eq. (77). The condition (18) is met, and we have Eq. (78). Using Eq. (78) to evaluate the change in the inner product written as (δθ)^i g_{ij} (δη)^j, substituting the derivatives (48) for θ̇ and η̇, and grouping terms, we obtain the transport equation (79) for the metric along stationary paths. The tensor transport equation (79) can be compared to Eq. (65) for the transport of the Wigner density.

Dual connections respecting the symplectic structure of canonical transformations in the two-field system
The transport relations derived so far make use of the symplectic structure of maps generated by time-translation along Doi-Peliti stationary paths, but they are not specifically geometric. We now turn to geometric constructions that respect the symplectic structure, render its maps coordinate invariant under canonical transformations, and express the special roles of affine transport in some coordinates such as coherent states through the definition of appropriate Riemannian connections.

Conservation of the inner product through the combined effects of two maps
The inner product (76) is preserved through the complementary action of two maps, one generated by the time dependence of θ, and the other by the time dependence of η. By construction, δθ depends on time only through θ̇, and δη only through η̇, while the metric has no explicit time dependence but changes under both maps as the location (θ, η) changes. Denoting by d/dt|_θ and d/dt|_η these separate components of change, the time derivative of the inner product can be partitioned into two canceling terms, Eq. (80). Connection coefficients may be added within either d/dt|_θ δ_θ n_j or d/dt|_η δ_η n_i to make the components of change in the vector field and metric coordinate-invariant, without altering the duality between independent variations in the base distribution and in the likelihood ratio.
To introduce a connection, we first replace the total derivative d/dt with a partial-derivative decomposition expressing the same transformation as a flow, Eq. (81). Connection coefficients are defined from the pullbacks ∂/∂θ_k or ∂/∂η_k of infinitesimally transformed basis vectors in the tangent spaces to the two exponential families, Eq. (82).

The covariant part of the flow decomposition in Eq. (81) is defined by subtraction of the nonzero connection coefficients from the total derivatives (75), as Compensating covariant derivatives of the metric are ∇ (θ) Equation (84)

Referencing arbitrary dual connections to dually flat connections in the exponential family
The manifold for a Doi-Peliti system with D independent components has dimension 2D, with parallel subspaces for the base distribution and likelihood ratio. The dual connections (82) act within these two independent subspaces, in contrast to the dually-flat connections D and D* of Eq. (70), which act within the same D-dimensional exponential family. Although the Fisher metric is a function only of the overall importance distribution, which aggregates dependence from the base distribution and likelihood, the symplectic transformations from translation along stationary paths separate components of variation from within the two independent subspaces. The subspace decomposition cannot be recovered from the importance distribution alone, and thus no connection defined only from the properties of the Fisher metric is sufficient to identify the dual symplectic connections for a Doi-Peliti system. Nonetheless, we may relate the symplectic dual connections to Amari's dually flat connections and the Amari-Chentsov tensor through the relation (see [2, Eq. (6.27)])

Flat connections for coherent-state coordinates
Of particular interest in Doi-Peliti theory will be the canonical transformations (45) and (55) between coherent-state and number-potential coordinates. We note that the forms of the connection coefficients for which affine transport in the fields φ† is flat in the likelihood subspace, and affine transport in the fields ϕ is flat in the base-distribution subspace, are those of Eq. (90).⁴

On the roles of coherent-state versus number-potential coordinates in the Doi-Peliti representation
The Doi-Peliti solution method is almost always introduced through the coherent-state representation [9,42,53], and for many applications such as chemical reaction networks [8,10,44,71] or evolutionary population processes [70], coherent states are also the "native" representation in the sense that the Liouville operator is a finite-order (generally low-order) polynomial in fields. Moreover, for the importance-sampling interpretation emphasized in this paper, the coherent-state representation separates the nominal distribution and likelihood ratio. On the other hand, Legendre duality is defined with respect to potential fields, which are the tilt coordinates θ in the exponential family of importance distributions, and it is in these coordinates, not the coherent-state coordinates, that the Fisher metric corresponds to the Hessian of the CGF. Indeed, it is not generally possible to define a dual coordinate system from the Hessian of the CGF in coherent-state fields, as we illustrate for the worked example in Sect. 4.5.
The use of Riemannian connections neatly expresses the role of each coordinate system. The elementary eigenvalues of divergence or convergence of bases and tilts, and of information susceptibilities, are often simple in coherent-state coordinates, where they are eigenvalues of coordinate divergence or convergence. In the dual connections (90), covariant derivatives retain these elementary eigenvalues, while inheriting from the exponential family the Fisher geometry that defines contravariant/covariant coordinate duality. A concrete example is given in the next section.
Therefore the connection coefficients (90) in the species basis are For various reasons it will, however, not be convenient to work in the species basis, and the resulting connection coefficients will not generally be constant.

A worked example: the two-state linear system
The foregoing constructions are nicely illustrated in minimal form in a simple, exactly solvable model. It is the stochastic process for N independent random walkers on a network with two states and bidirectional hopping between them. The statistical mechanics of transients, time-dependent generating functionals, and large deviations for this system has been didactically covered within the Doi-Peliti framework in [68].
Though simple, the model is nonetheless rich enough to illustrate the complementary roles of coherent states and number-potential coordinates in Doi-Peliti theory (the former as the "native" coordinates in which the system is simple, and the latter as the coordinate system carrying the Fisher geometry), and the way this relation is captured by the dual coherent-state connection (90), which differs from both the Levi-Civita connection and the dually-flat connections (71) of Nagaoka and Amari [3,54].

Two-argument and one-argument generating functions on distributions with a conserved quantity
The two-state model describes (for example) a one-particle chemical reaction in a well-mixed reactor with the schema (91). The probability per unit time for a reaction event is given by rate constants k_+ and k_− and proportional sampling (the microphysics underlying mass-action rate laws). A distribution initially in binomial form (36) will retain that form at all times under the master equation for the schema (91), even with time-dependent coefficients. Here, for simplicity, we take k_+ and k_− to be fixed. Therefore the distribution at any time is specified by the descaled mean values ν_a = n_a/N and ν_b = n_b/N, with ν_a + ν_b = 1.
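The statement that a binomial distribution stays binomial under the two-state master equation can be checked numerically. Because the schema (91) is not reproduced here, the sketch assumes that k_+ is the per-walker rate for a → b and k_− the rate for b → a; N, the rates, and the initial fraction are arbitrary illustrative values. It integrates the master equation for n_b with fourth-order Runge-Kutta and compares the result with the binomial distribution whose mean fraction ν_b(t) relaxes exponentially, at rate k_+ + k_−, to k_+/(k_+ + k_−).

```python
import math

N = 12  # number of walkers (illustrative)
K_PLUS = 0.7  # hypothetical per-walker rate for a -> b
K_MINUS = 0.4  # hypothetical per-walker rate for b -> a


def binomial_pmf(n_b: int, nu_b: float) -> float:
    return math.comb(N, n_b) * nu_b**n_b * (1.0 - nu_b) ** (N - n_b)


def master_rhs(p):
    """dp/dt for the probability of n_b walkers in state b."""
    dp = [0.0] * (N + 1)
    for n in range(N + 1):
        dp[n] -= (K_PLUS * (N - n) + K_MINUS * n) * p[n]  # outflow
        if n < N:
            dp[n + 1] += K_PLUS * (N - n) * p[n]  # a -> b hop
        if n > 0:
            dp[n - 1] += K_MINUS * n * p[n]  # b -> a hop
    return dp


def evolve(p, t_final: float, steps: int = 4000):
    """Classical RK4 integration of the master equation."""
    dt = t_final / steps
    for _ in range(steps):
        k1 = master_rhs(p)
        k2 = master_rhs([x + 0.5 * dt * d for x, d in zip(p, k1)])
        k3 = master_rhs([x + 0.5 * dt * d for x, d in zip(p, k2)])
        k4 = master_rhs([x + dt * d for x, d in zip(p, k3)])
        p = [x + dt * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def nu_b(t: float, nu_b0: float) -> float:
    """Mean fraction in b, relaxing to K_PLUS/(K_PLUS+K_MINUS) at rate K_PLUS+K_MINUS."""
    nu_eq = K_PLUS / (K_PLUS + K_MINUS)
    return nu_eq + (nu_b0 - nu_eq) * math.exp(-(K_PLUS + K_MINUS) * t)
```

The agreement reflects the fact that the walkers hop independently, so each occupies state b with the same time-dependent probability ν_b(t).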
Although the system has only one dynamical degree of freedom, it is instructive to compute both the two-argument generating function, with independent weights z_a on n_a and z_b on n_b, and the one-argument generating function for the difference coordinate n ≡ (n_b − n_a)/2, to illustrate the role of conservation laws and the geometry of the coherent-state connection. The two-argument CGF (1) for the binomial distribution is given in Eq. (92). Because the total number N = n_b + n_a is fixed, the normalized one-variable distribution may be written as a function of n alone, and the terms in the generating function (92) regrouped as in Eq. (94). Introducing rotated coordinates on the exponential family of tilts and dividing the two-argument MGF (94) by (z_a z_b)^{N/2}, we obtain an expression for the one-argument MGF in the difference coordinate n, Eq. (96). In what follows, ψ(log z) will always be used to refer to the two-argument CGF (92), and the one-argument generating function, when needed, will be written out explicitly as ψ(log z) − N h, as in Eq. (96).
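The regrouping that leads to the one-argument MGF can be verified directly: since n_a + n_b = N, dividing z_a^{n_a} z_b^{n_b} by (z_a z_b)^{N/2} leaves (z_b/z_a)^n with n = (n_b − n_a)/2. The sketch below (illustrative parameter values; function names are hypothetical) checks that the two-argument MGF is (ν_a z_a + ν_b z_b)^N and that the one-argument MGF equals (ν_a/r + ν_b r)^N with r = √(z_b/z_a). The rotated coordinates of the text may parametrize the ratio z_b/z_a differently.

```python
import math


def binomial_weights(N: int, nu_a: float):
    """P(n_b) for N independent walkers, each in state b with probability 1 - nu_a."""
    nu_b = 1.0 - nu_a
    return [math.comb(N, n_b) * nu_a ** (N - n_b) * nu_b**n_b for n_b in range(N + 1)]


def two_argument_mgf(z_a: float, z_b: float, N: int, nu_a: float) -> float:
    """E[z_a**n_a * z_b**n_b] by direct summation, with n_a = N - n_b."""
    return sum(p * z_a ** (N - n_b) * z_b**n_b
               for n_b, p in enumerate(binomial_weights(N, nu_a)))


def one_argument_mgf(ratio: float, N: int, nu_a: float) -> float:
    """E[ratio**n] for n = (n_b - n_a)/2, with ratio = z_b/z_a > 0."""
    return sum(p * ratio ** ((2 * n_b - N) / 2)
               for n_b, p in enumerate(binomial_weights(N, nu_a)))
```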

Generator and conserved volume element in coherent-state coordinates
The master equation for the two-state system is developed in [68], but it introduces further notation and will not be needed here. We move directly to the expression for the Liouville function of Eq. (44) after conversion to field variables, Eq. (97). In what follows, math boldface will be reserved for parameters in the generator such as k_±, or functions of these such as the associated steady states used in Eq. (53). Two descalings reduce the problem to parameters which are dimensionless ratios. The first defines a time coordinate τ in units of the sum of rate parameters. The second expresses the equilibrium steady state under generator (97) in terms of relative hopping rates. As for the discrete index n, define ν ≡ (ν_b − ν_a)/2.

Conservation of total number N results in a generator L that is a function only of the difference variable
Therefore it is natural to rotate the coherent-state fields to components corresponding to conserved N and dynamical n/N , and their dual coordinates in the generating-function argument z: In rotated fields (100) the action (44) becomes A descaled Liouville function has been introduced as N (k + + k − )L ≡ L. Absence of the field † fromL implies constancy of the expectation forˆ . 5

Splitting the symplectic structure between coherent-state conjugate field pairs
Although ˆ obeys certain time-translation invariances in correlation functions, its value even along stationary paths will not generally be 1. Therefore the coherent-state variables cannot directly be interpreted as mean values of number variables in the nominal distribution, or as mean weights in its dual likelihood ratio. To express the functions that are these expectation values, we recall the mean number components in the importance distribution, which are bilinear in φ† and φ, and then introduce a pair of dual number coordinates. These are not linear functions of the coherent-state fields, but are functions respectively of φ† or of φ alone, extracted by making use of the steady-state measure under the instantaneous value ν in the generator (97).
(Along stationary paths, where some components of φ† or φ are invariant, these dual number fields become linear functions of the remaining dynamical components of φ† or φ, as we show below.) The two components of the normalized number field in number-potential coordinates (45) are given by Eq. (102). Recall that the instantaneous steady state under the generating process is the scale variable for the dualizing canonical transform (53). To see how this reference steady state is used to separate the two conjugate variables (base and tilt) in the symplectic transformations, it is helpful to recast Eq. (102) in terms of this reference steady state. The action of the tilt alone can be isolated, without regard to the underlying nominal distribution, by referencing the action of the φ† fields to the steady state rather than to φ, which defines an offset ν as in Eq. (104). Likewise, the mean value ν̄ of n/N in the base (nominal) distribution is isolated by referencing the value of φ to the uniform measure 1 instead of the dynamic measure φ†, as in Eq. (105).

Stationary-path solutions and Liouville volume element
Solutions to the stationary-path equations of motion (47) for the Liouville function (97) are evaluated in Appendix F.1. Stationary-path approximations to the time-dependent density ρ would be binomial distributions even if the exact ρ were not (the stationary point is always a pure coherent state), so the CGF at any time has the form (92), with fields z replaced by the stationary-path values of φ† and the mean values ν from Eq. (93) replaced by corresponding components of φ.
In particular, the initial-time generating function $\psi_0(\log\phi^\dagger_{a0}, \log\phi^\dagger_{b0})$ appearing in Eq. (43) carries the mean value $\bar\nu_0$ in the starting density $\rho_0$, imposed as an initial-data parameter. It is through this function that the final-time tilt data, in the form of the parameter $\nu_T$ propagated backward to the stationary-path values of $\phi^\dagger_{a0}$ and $\phi^\dagger_{b0}$, determine the stationary-path values for the fields φ of the base distribution. This establishes the potential for information coupling between initial properties of the base distribution and final-time queries in the generating function $\psi_T$. $\psi_0$ is evaluated in Eq. (190), and its value is shown to depend only on an overlap parameter between initial and final data, defined in Eq. (106). Thus under independent variations of $\bar\nu_0$ and $\nu_T$, as described in Sect. 3.4, the trajectories of the coherent-state fields φ† and φ trace out an invariant volume, illustrated in Fig. 2.
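To illustrate the invariant-volume statement, the sketch assumes a standard Doi-Peliti Liouville function for the schema a ⇌ b, $L = (\phi^\dagger_b - \phi^\dagger_a)(k_+\phi_a - k_-\phi_b)$ with a → b at rate $k_+$; this form is our assumption and is not copied from Eq. (97). Stationary paths then obey Hamilton's equations with φ as coordinate and φ† as momentum, so the flow map M should be symplectic ($M^T J M = J$) and hence volume-preserving ($\det M = 1$). The code integrates the flow map with classical RK4 and checks both properties.

```python
import numpy as np


def rhs_matrix(k_plus, k_minus):
    """Linear stationary-path system for x = (phi_a, phi_b, phid_a, phid_b).

    With L = phid^T C phi: d phi/d tau = C phi and d phid/d tau = -C^T phid.
    """
    C = np.array([[-k_plus, k_minus], [k_plus, -k_minus]])
    Z = np.zeros((2, 2))
    return np.block([[C, Z], [Z, -C.T]])


def flow_map(A, tau, steps=2000):
    """Integrate dM/dtau = A M from M(0) = I with classical RK4."""
    M = np.eye(A.shape[0])
    dt = tau / steps
    for _ in range(steps):
        k1 = A @ M
        k2 = A @ (M + 0.5 * dt * k1)
        k3 = A @ (M + 0.5 * dt * k2)
        k4 = A @ (M + dt * k3)
        M = M + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return M


k_plus, k_minus = 0.6, 0.4            # descaled so that k_plus + k_minus = 1
M = flow_map(rhs_matrix(k_plus, k_minus), tau=3.0)
J = np.block([[np.zeros((2, 2)), np.eye(2)], [-np.eye(2), np.zeros((2, 2))]])
symplectic_error = np.abs(M.T @ J @ M - J).max()
volume_factor = np.linalg.det(M)      # Liouville: phase-space volume is preserved
print(symplectic_error, volume_factor)
```

The base fields relax as $e^{-\tau}$ while the response fields grow as $e^{\tau}$ in forward time, and the two rates cancel in the determinant; that cancellation is the linear-model version of the conserved Liouville volume.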

Invariant cumulant-generating function and the incompressible phase-space density
The stationary value of ˆ_0, obtained from the gradient of $\psi_0$ with respect to the components $\phi^\dagger_{0a}$ and $\phi^\dagger_{0b}$, is computed in Eq. (193). It differs from unity (the reason constructions (104) and (105) were needed), and it is equal to the stationary value of at all times as a consequence of conservation of total number N. The value depends only on and T, in the combination given in Eq. (108). Moreover, as a consequence of the conserved Liouville volume element from Eq. (107), the stationary-point evaluation of the CGF at all times takes the same form as Eq. (92) and evaluates to the constant given in Eq. (109). The quantity in Eq. (108) is the basis for all information densities in this simple linear system. Through the stationary-point relation (61) between the Wigner function and the CGF, −log ˆ_0 is the incompressible phase-space density convected along stationary trajectories by Eq. (64). As shown below, it is also the geometrically invariant part of the sole nonzero eigenvalue of the Fisher metric.

Fisher metric
The Fisher metric (5) for the two-state system evaluates, along the stationary path at any time, to Eq. (110). The nonzero eigenvalue, given in Eq. (111), comes from the single-argument generating function in the difference coordinate n, Eq. (96).
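For a binomial stationary-path density, the nonzero Fisher eigenvalue is the variance of n under the tilted distribution, that is, the θ-Hessian of the one-argument CGF. The sketch assumes the closed form $\log\sum_n\rho_n e^{\theta n} = N\log(\nu_a e^{-\theta/2} + \nu_b e^{\theta/2})$, which follows from the binomial but is not a transcription of Eq. (111), and checks that the Hessian equals $N(1/4 - \nu^2)$, with ν the tilted mean of n/N. The function names are hypothetical.

```python
import math


def psi_one_arg(theta, N, nu_a, nu_b):
    """One-argument CGF of n = (n_b - n_a)/2 for a binomial base distribution."""
    return N * math.log(nu_a * math.exp(-theta / 2) + nu_b * math.exp(theta / 2))


def tilted_nu(theta, nu_a, nu_b):
    """Mean of n/N under the tilt e^{theta n}: (nu_b' - nu_a')/2 for the tilted fractions."""
    wa, wb = nu_a * math.exp(-theta / 2), nu_b * math.exp(theta / 2)
    return 0.5 * (wb - wa) / (wa + wb)


N, nu_a, nu_b = 20, 0.35, 0.65
theta, step = 0.4, 1e-3

# Central second difference of the CGF in theta
hessian = (psi_one_arg(theta + step, N, nu_a, nu_b) - 2 * psi_one_arg(theta, N, nu_a, nu_b)
           + psi_one_arg(theta - step, N, nu_a, nu_b)) / step**2
nu = tilted_nu(theta, nu_a, nu_b)
fisher = N * (0.25 - nu**2)

# Brute-force variance of n under the tilted binomial, for comparison
weights = [math.comb(N, k) * nu_a ** (N - k) * nu_b ** k * math.exp(theta * (k - N / 2))
           for k in range(N + 1)]
Z = sum(weights)
mean_n = sum(w * (k - N / 2) for k, w in enumerate(weights)) / Z
var_n = sum(w * (k - N / 2 - mean_n) ** 2 for k, w in enumerate(weights)) / Z
print(hessian, fisher, var_n)
```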

Dual coordinates for base and tilt, and the additive exponential family
To relate the Fisher metric in Eq. (110) to the construction of Sect. 3.3 from the ψ-divergence and to dually-symplectic parallel transport, we first express the base and tilt displacements (104) and (105) in terms of the coordinates in their respective exponential families.

E. Smith
Introduce reference values for the fields θ and h defined in Eq. (95), corresponding to the steady-state measure under the parameters of the generating process. It is clear from the two-argument generating function (92) that one component of variation in z couples only to the conserved quantity N and is not needed. It is therefore sufficient to vary along an affine coordinate in z that couples to the dynamical argument n, and the natural choice is to fix the component of z corresponding to the component of φ† that is invariant under the stationary-path equations of motion, given in Eq. (189). The resulting contour for z at final time T is given in Eq. (113). The quantity in the first line of Eq. (113) is preserved at all times by Eq. (189), and the quantity in the second line obeys the exponential law of Eq. (194), repeated as (107). By the definition (104), the contour (113), which is affine in the coherent-state fields φ†, is written in the coordinates on the exponential family of tilts as Eq. (114). Likewise, in the dual exponential representation (55) of the family of base distributions, the definition (105) of the mean number offset ν̄ in the nominal distribution is expressed in the dual number-potential system (55), analogously to θ in Eq. (95). Because the two exponential coordinates (base and tilt) are additive, the mean of samples in the importance distribution can likewise be written in these coordinates. It follows that the eigenvalue (111) in the Fisher metric also has a simple expression in these coordinates, exhibiting the equivalence of the ψ-divergence expression (69) and the Hessian (71) for this quantity.
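The additivity of base and tilt coordinates can be seen directly for the binomial. Assuming a base natural coordinate $\eta = \log(\nu_b/\nu_a)$ (a convention we adopt for illustration), tilting by $e^{\theta n}$ produces the binomial at $\eta + \theta$, whose mean offset is $\nu = \tanh((\eta+\theta)/2)/2$. Its derivative $\partial\nu/\partial\theta = 1/4 - \nu^2$ is the per-particle measure factor that multiplies N in the Fisher eigenvalue. The helper names are hypothetical.

```python
import math


def base_fractions(eta):
    """Binomial base distribution in the assumed natural coordinate eta = log(nu_b / nu_a)."""
    nu_b = 1.0 / (1.0 + math.exp(-eta))
    return 1.0 - nu_b, nu_b


def tilted_fractions(eta, theta):
    """Fractions after multiplying the base by e^{theta n}, n = n_b - N/2, and renormalizing."""
    nu_a, nu_b = base_fractions(eta)
    wa, wb = nu_a * math.exp(-theta / 2), nu_b * math.exp(theta / 2)
    return wa / (wa + wb), wb / (wa + wb)


def nu_of(x):
    """Mean of n/N, (nu_b - nu_a)/2, for a binomial with natural coordinate x."""
    return 0.5 * math.tanh(x / 2)


eta, theta = -0.3, 0.8
ta, tb = tilted_fractions(eta, theta)      # tilt applied to the base at eta
ia, ib = base_fractions(eta + theta)       # importance distribution at eta + theta
step = 1e-5
dnu_dtheta = (nu_of(eta + theta + step) - nu_of(eta + theta - step)) / (2 * step)
measure = 0.25 - nu_of(eta + theta) ** 2
print(tb - ib, dnu_dtheta - measure)       # both ~ 0
```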

Why coherent-state fields do not generally produce invertible coordinate transformations
The Hessian matrix is not a tensor under coordinate transformations, so it is clear that the Hessian of ψ with respect to the argument z, equivalent to the coherent-state response field φ†, will not be the Fisher metric. However, since coherent states are in many ways a native basis for Doi-Peliti theory, as noted in Sect. 3.6, we may ask whether some other coordinate duality can be defined from the coherent-state Hessian of ψ. In fact such a duality cannot generally be defined, and it is instructive to see where it fails, to better understand why the affine connection (90), and not the Fisher geometry, captures the special role of coherent states.
A divergence under the Hessian of ψ in coherent-state variables, which we will denote δs² for reasons that will become clear in a moment, evaluates as Eq. (118) when converted from the coordinates ν to coordinates θ along the z-affine contour (114). Unlike the Fisher metric, Eq. (118) is negative-semidefinite, and it degenerates if ν = ν, which is shown in Eq. (197) to hold for all z ifν 0 = ν. At degenerate solutions, we cannot use the Hessian of ψ to define a base-field variation δφ as a dual coordinate for a variation produced by a field δφ†, as we could use the Hessian in the exponential family to produce a variation δn as a dual coordinate to a variation δθ. The source of the degeneration has a natural description in terms of intrinsic and extrinsic curvatures, and advection, in the natural geometry on the exponential family. The geometric distance element (4), with θ and h varied independently, is given by Eq. (119). The z-affine contour (114) specifies a function h(θ) with extrinsic curvature in the affine coordinate manifold of the exponential family, along which the distance element is given by Eq. (120). The second coherent-state coordinate derivative of ψ along the contour (114) can be decomposed as in Eq. (121). With some algebra, the expression (121) is shown to equal that in Eq. (118). The extrinsic curvature of the embedded contour h(θ) and the convected quantity −2νν cancel against the intrinsic Fisher-Rao curvature, rendering the duality invisible to the fields φ† at degenerate points.
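A small numerical illustration of why the coherent-state Hessian is not the Fisher metric: for the two-argument binomial CGF $\psi = N\log(\nu\cdot z)$, the Hessian in $\log z$ is a covariance (positive semidefinite), while the Hessian in z itself equals $-N\nu\nu^T/(\nu\cdot z)^2$, which is negative semidefinite and of rank one. The two differ by the first-derivative term of the chain rule, which is what prevents the Hessian from transforming as a tensor. This sketch uses the full two-argument CGF rather than the contour (114), so it illustrates the sign and rank rather than reproducing Eq. (118).

```python
import numpy as np


def psi(logz, nu, N):
    """Two-argument binomial CGF psi(log z) = N log(sum_i z_i nu_i)."""
    return N * np.log(np.dot(np.exp(logz), nu))


def hessian_fd(f, x, step=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * step, np.eye(d)[j] * step
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej) - f(x - ei + ej) + f(x - ei - ej)) / (4 * step**2)
    return H


N, nu = 8, np.array([0.3, 0.7])
z = np.array([1.2, 0.9])
H_log = hessian_fd(lambda s: psi(s, nu, N), np.log(z))      # Fisher metric: covariance of counts
H_z = hessian_fd(lambda w: psi(np.log(w), nu, N), z)         # Hessian in coherent-state coordinates
grad_log = N * z * nu / np.dot(z, nu)                        # d psi / d log z_i (mean counts)
# Chain rule for z_i = exp(s_i): the gradient term is what breaks tensorial transformation
H_z_chain = (H_log - np.diag(grad_log)) / np.outer(z, z)
print(np.linalg.eigvalsh(H_log), np.linalg.eigvalsh(H_z))
```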

Flat transport in the coherent-state connection
The correct way to capture the simplifying role of coherent-state coordinates for simple models such as the two-state system is with the dual connections of Sect. 3.5. We first recognize, from the forms (106) or (195) of , a completely-descaled coordinate system for the dynamical parts of the coherent-state fields, defined in Eq. (122). The eigenvalue of the Fisher metric in Eq. (111) then reduces to Eq. (123). The role of the factors 1 /4 − ν 2 = 1 /4 − ν 2 (∂v/∂θ) and 1 /4 −ν 2 = 1 /4 − ν 2 (∂u/∂η) in Eq. (111) as measure terms is now explicit, and they can be absorbed by a change of variables to u and v. By Eq. (195) and the definitions (122) and (108), [1 + uv] = 1 + e −T = 1/ˆ 0 , so the Fisher inner product (12) may be written as in Eq. (124).

Connection coefficients and absorption of measure terms
In this linear model, time evolution of φ † and φ has no cross-dependence once the initial values have been fixed through the gradients of ψ 0 as explained in Sect. 4.2.2.
Thus ∇ evaluates as in Eqs. (125) and (126), in which only the dependence in the Fisher eigenvalue ˆ_0² from Eq. (124) appears. The two lines of Eq. (126), which are equal and opposite, scale as ∼e^{−T} and have an interpretation similar to that of a Le Chatelier principle. The term e^{−T} in Eq. (108) for ˆ_0 is a susceptibility of the initial stationary value φ_0 to the perturbation by the tilt variable φ†_T = z, attenuated exponentially from time T to time 0. The role of this attenuation, which takes ˆ_0 → 1 as T → ∞, becomes clearer as a constraint on the total extractable information when we consider in Sect. 4.7 the range of all initial distributions ρ_0 and all tilts z.

Duality of dynamics and inference in Doi-Peliti theory
The natural separation of the coordinate transformation of the inner product of vector fields δθ and δη generated by time translation is not between exponential and mixture coordinates, as in the dually-flat connections of Amari [2], but rather between the symplectically dual contributions from changes in θ and in η. The two contributions group as

The Fisher information density and large-deviation ratios as sample estimators
The interpretation of the vector inner product as a convected density of information can be illustrated by using ratios of large-deviation probabilities to define a sample estimator for differences in the tilt coordinate η between two base distributions. Suppose that we sample from a binomial nominal distribution at a parameter η that is to be estimated. Recall from Eq. (24) that the probability for the value n of a sample to exceed a threshold n̄ is given in terms of the large-deviation function. In a one-dimensional system,⁶ for two threshold values $n_B > n_A$, the conditional probability for n to surpass $n_B$ given that it has surpassed $n_A$ is the ratio given in Eq. (130). The ratio (130) can be estimated from samples of the indicator function $h_{\bar n}$ for thresholds n̄, as described in Sect. 2.2. Appendix F.5 shows that if two such conditional probabilities are compared from distributions at unknown parameters $\eta_2$ and $\eta_1$, the log ratio is related to the large-deviation thresholds and the η values as in Eq. (131), where $d\theta_{\bar n}/d\eta$ is one of the two forms of the (differential) inner product appearing in Eq. (76). Thus the quantity in Eq. (132) is a sample estimator for the difference of exponential parameters in the two underlying distributions.
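The sketch below shows the estimator at work for binomial nominal distributions, with two simplifications that are ours: thresholds are placed on $n_b$ (shifting to $n = n_b - N/2$ does not change threshold differences), and exact tail sums stand in for sample means of the indicator function so that the result is deterministic. We take the leading large-deviation form of the log ratio to be $(n_B - n_A)(\eta_2 - \eta_1)$, which is our reading of the rectangle in Eq. (131), with $\eta = \log(p/(1-p))$ the assumed natural parameter.

```python
import math


def log_tail(N, p, threshold):
    """Exact log P(n_b > threshold) for n_b ~ Binomial(N, p), by log-sum-exp over the tail."""
    logs = [
        math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        + k * math.log(p) + (N - k) * math.log(1 - p)
        for k in range(threshold + 1, N + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def p_of(eta):
    return 1.0 / (1.0 + math.exp(-eta))


N, n_A, n_B = 400, 240, 280      # thresholds in the large-deviation regime (n/N = 0.6, 0.7)
eta1, eta2 = 0.0, 0.1            # natural parameters of the two nominal distributions


def log_conditional(eta):
    """log P(n_b > n_B | n_b > n_A): a ratio of two threshold-indicator expectations."""
    p = p_of(eta)
    return log_tail(N, p, n_B) - log_tail(N, p, n_A)


log_ratio = log_conditional(eta2) - log_conditional(eta1)
eta_estimate = log_ratio / (n_B - n_A)
print(eta_estimate, eta2 - eta1)
```

The estimate falls a few percent below the true difference. The shortfall comes from the O(1) overshoot of n beyond each threshold, which the large-deviation approximation neglects and which becomes relatively smaller as N grows at fixed n/N.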
The quantity (131) may be computed at any time, for instance the final time T when the thresholds $n_B$ and $n_A$ are imposed as experimental conditions, with $\eta_2$ and $\eta_1$ characterizing the evolved nominal distributions at time T from any pair of initial conditions at some earlier time t = 0. If we use the stationary-path conditions to propagate values of θ and η through time, and define V(τ) to be the area inside the image of the rectangle in Eq. (131) along these stationary trajectories, then time-invariance of the inner product and the Liouville conservation of volume elements in dual coordinates imply Eq. (133). Note that, with a coordinate transform to coherent-state variables and a corresponding redefinition of the boundary of V, the relation (133) could be recast using Eq. (124) as

d/dτ ∫_V dv du ˆ_0² = 0,   (134)

which is the conserved integral graphed in Fig. 2. In Eq. (134), ˆ_0², the two-dimensional differential of the scaled CGF ψ/N = −log ˆ_0, appears explicitly as the density of overlap of dv with du that, like ψ itself, is constant along stationary paths. The factor ˆ_0² is not independent of the position (v, u) within the volume V, but because the volume element moves along with the conserved density, the integral measures a fixed quantity of Fisher information as it is transported through different domains of base and tilt.
Although the limits of integration for dv du in Eq. (134) are bounded, the limits on (η 2 − η 1 ) in Eq. (131) are not, so formally the range of the sample estimator (132) remains unbounded over any duration T . However, for any fixed values of (n B − n A ) t=T and starting uncertainty (η 2 − η 1 ) t=0 , the total information obtainable from large-deviations sampling about differences in the initial conditions is finite and decreases as e −T . In Fig. 2 this limit is seen in the way any fixed ranges are squeezed exponentially at the "waist" as T → ∞. The contraction of boundaries, rather than the asymptotic behavior of the eigenvalue in the Fisher metric, measures the loss of information between initial distributions and final observations with increasing separation between the two.
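The exponential squeezing can be illustrated with the nominal distribution alone. Assuming the two-state relaxation $\nu_b(T) = \nu_b^* + (\nu_b(0) - \nu_b^*)e^{-T}$ in descaled time and the natural parameter $\eta = \log(\nu_b/\nu_a)$ (both our assumptions for this sketch), a fixed initial separation in η maps to a final separation that decays as $e^{-T}$ times a constant. At fixed threshold spacing, the log-ratio estimator of Eq. (131) therefore shrinks in the same way.

```python
import math


def logit(x):
    return math.log(x / (1.0 - x))


def eta_at(eta0, T, nu_star):
    """Natural parameter of the binomial nominal distribution after descaled time T.

    Assumes nu_b(T) = nu_star + (nu_b(0) - nu_star) e^{-T} and eta = log(nu_b / nu_a).
    """
    nu0 = 1.0 / (1.0 + math.exp(-eta0))
    return logit(nu_star + (nu0 - nu_star) * math.exp(-T))


nu_star, eta0, d_eta0 = 0.6, -0.5, 1e-4
ratios = []
for T in (2.0, 5.0, 10.0):
    d_eta_T = eta_at(eta0 + d_eta0, T, nu_star) - eta_at(eta0, T, nu_star)
    ratios.append(d_eta_T / d_eta0 * math.exp(T))   # approaches a constant, so d_eta_T ~ e^{-T}

nu0 = 1.0 / (1.0 + math.exp(-eta0))
limit = nu0 * (1 - nu0) / (nu_star * (1 - nu_star))
print(ratios, limit)
```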

Conclusions: the duality of dynamics and inference for irreversible and reversible processes
The three-part structure of the Fisher metric, dual Riemannian connections, and symplectic parallel transport of the Wigner density, vector fields, and the metric tensor, elegantly expresses the transport properties along 2FFI stationary paths in terms of geometric invariants. It resolves a feature of two-field constructions that at first seems paradoxical: if memory of initial conditions is continuously lost to dissipation, what concept of time-reversal is implied by invertibility of the map along stationary rays?
The answer from the perspective of importance sampling is that, even if samples are finite, their expectations are computed in continuous-valued distributions, and deformations of measure through the Radon-Nikodym derivative can locally compensate for concentration of measure in the nominal distribution by expanding sensitivity of likelihood ratios. Locally in sampling space, then, time is immaterial as it is in Hamiltonian mechanics; the mappings along stationary trajectories make it possible to interpret sampling protocols from different times in an evolving distribution simply as coordinate transformations of a fixed sampling protocol on the original distribution.
On the other hand, for any fixed ranges of parameter variation in the initial conditions, and fixed large-deviation thresholds compared at late time, the integrated Liouville density contracts monotonically with the separation between the two times, reflecting the absolute loss of information that can be recovered.
We have wanted to establish a concrete interpretation of time-duality in 2FFI theories as a duality of dynamics and inference, to provide an alternative to the interpretation in terms of physical reversal of paths that is the starting point in most of the literature on fluctuation theorems in stochastic thermodynamics. Microscopic reversibility can always be added later to any class of 2FFI constructions as a restriction on the scope of phenomena under study, and both stronger conclusions and additional interpretations will then follow from the added constraints. Where the existence of a duality in the mathematics itself does not depend on any such additional assumptions, taking the inference interpretation to reflect the core concepts, directly expressing Kolmogorov's forward/backward adjoint duality, frames the special case of microscopic reversibility as one in which the system's own dynamics contains an image of certain sampling protocols over itself.
Even if one only cares about microscopically reversible processes, making explicit the step of self-modeling, and having a concrete interpretation of conserved densities such as the Fisher information constructed here, provides a bridge between trajectory reversal in low-level mechanics and operations for sample estimation of the kind that are used by control systems. Linking limitations from path probability in a system's autonomous dynamics to concepts of information capacity in control loops [5,6,18] promises a way to study the limits on spontaneous emergence of dynamical hierarchy, which has been a desired application for stochastic thermodynamics [23,59]. These are intended topics for future work.

A.2 Embeddings in reduced dimension for exponential families on the multinomial
The Poisson (35) and multinomial (36) distributions are both in a class recognized by Anderson, Craciun, and Kurtz (ACK) [4] in connection with uniqueness of stationary solutions for chemical reaction networks. All factorial moments are powers of their first moments, causing the CGF for many particles to scale as a multiple of a single-particle CGF. It is therefore not surprising that, for the ACK distributions, the expression (4) for the Fisher metric in terms of a distribution $\rho_n$ with possibly indefinitely many independent terms projects to a function of the same form in terms of the expected numbers $n_i$ over the D independent species. To see how this works for a distribution $\rho^{(n_0)}_n$ with multinomial form (36), express the expected number fractions in terms of exponential coordinates. The mean in the distribution ρ and the CGF $\psi(\theta; n_0)$ then evaluate (up to a constant offset) in the same coordinates, and the Hessian giving the Fisher metric is Eq. (141), where $\nu_i$ is the function of η + θ in Eq. (139). Inverting Eq. (141), and projecting onto the plane $\sum_i \theta_i = 0$ to fix the undetermined component of θ, gives the inverse $g^{-1}(n)$. One can check that both $g(\theta)$ and $g^{-1}(n)$ sum to zero on either index, and that their product is the identity on the subspace $\sum_{j=1}^D \theta_j = 0$ or $\sum_{j=1}^D n_j = N$.
If a shift of the tilted distribution in the exponential family is indexed with coordinate δn, with $\sum_{j=1}^D \delta n_j = 0$, the Fisher distance element from Eq. (11) becomes the same function of n as the function of ρ in the third line of Eq. (4).
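The properties stated for $g(\theta)$ and its inverse can be checked numerically. For the multinomial, the Hessian of $\psi = N\log\sum_i\nu_i e^{\theta_i}$ is $N(\mathrm{diag}(\nu) - \nu\nu^T)$ at the tilted fractions ν. As a hedged sketch we use $P\,\mathrm{diag}(1/n)\,P$, with P the projector onto the sum-zero subspace, as one inverse with the stated properties (not necessarily the form written in the text), and verify that the distance element for a sum-zero shift is $\sum_j \delta n_j^2/n_j$.

```python
import numpy as np


def fisher_metric(nu, N):
    """Hessian of psi(theta) = N log sum_i nu_i e^{theta_i}, evaluated at the tilted fractions nu."""
    return N * (np.diag(nu) - np.outer(nu, nu))


def fisher_inverse(n):
    """P diag(1/n) P: an inverse of g on the sum-zero subspace (one construction, assumed here)."""
    D = len(n)
    P = np.eye(D) - np.ones((D, D)) / D
    return P @ np.diag(1.0 / n) @ P


N = 30
nu = np.array([0.2, 0.5, 0.3])
n = N * nu                                   # expected numbers, summing to N
g, g_inv = fisher_metric(nu, N), fisher_inverse(n)
P = np.eye(3) - np.ones((3, 3)) / 3          # identity on the sum-zero subspace

dn = np.array([0.4, -0.1, -0.3])             # a shift with sum zero
ds2 = dn @ g_inv @ dn                        # Fisher distance element in mean coordinates
chi2 = np.sum(dn**2 / n)                     # the form sum_j dn_j^2 / n_j
print(ds2, chi2)
```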

B Sample means and variances in the large-deviation approximation to threshold indicator expectations
The providing Eq. (24) in the text. The variance of the same sample estimator has a corresponding bound giving Eq. (25) in the text.
To estimate the tightness of the bounds, begin by observing that in the large-deviation scaling regime (27), with all cumulants generated as derivatives of ψ ∼ N, the expansion of central moments in terms of cumulants bounds the scaling of the kth central moment. The log ratio we wish to bound is the Bregman divergence

$$\log \frac{\sum_{n>\bar n} e^{-\theta(n-\bar n)}\, e^{\theta n}\rho_n}{\sum_n e^{\theta n}\rho_n} = \theta\bar n - \psi(\theta, n_0) + \log\left\langle h_{\bar n}\right\rangle. \qquad (148)$$

The maximum of Eq. (148) occurs at θ(n̄) from Eq. (8), and the width of the transition for the log ratio to change by O(N⁰) is given by Eq. (149). To estimate its maximum value we rewrite Eq. (148) in terms of the two derivatives (151). Next, observe the leading-order scaling of the expectation of (n − n̄)² restricted to the half-line n > n̄ (as for any second moment), and likewise for (n − n(θ)). At θ(n̄), where n(θ) = n̄, the two derivatives (151) are equal and opposite, and the boundary of the half-line n > n̄ is also the symmetry point of (n − n̄)². Because the skewness and higher-order central moments grow more slowly than n̄^k by the bound on central moments above
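Since $e^{-\theta(n-\bar n)} \le 1$ on the half-line for θ ≥ 0, Eq. (148) implies the Chernoff-type bound $\log\langle h_{\bar n}\rangle \le \psi(\theta) - \theta\bar n$, tightest at θ(n̄). The sketch checks this for a binomial with p = 1/2 and n̄/N = 0.6, and shows that the gap between the exact log tail and the bound grows much more slowly than the N-linear exponent, so the relative gap shrinks with N. The helper names are hypothetical.

```python
import math


def log_tail(N, p, threshold):
    """Exact log P(n > threshold) for n ~ Binomial(N, p), by log-sum-exp over the tail."""
    logs = [
        math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        + k * math.log(p) + (N - k) * math.log(1 - p)
        for k in range(threshold + 1, N + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def rate(N, p, threshold):
    """Large-deviation exponent theta*nbar - psi(theta) at the tilt theta(nbar) whose mean is nbar."""
    theta = math.log(threshold / (N - threshold)) - math.log(p / (1 - p))
    psi = N * math.log(1 - p + p * math.exp(theta))
    return theta * threshold - psi


results = {}
for N in (400, 4000):
    nbar = int(0.6 * N)
    exact, bound = log_tail(N, 0.5, nbar), -rate(N, 0.5, nbar)
    results[N] = (exact, bound)     # exact <= bound; the gap is O(log N) while the bound is O(N)
    print(N, exact, bound)
```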

C.1 Doi operator algebra and inner product
The main constructs in the Doi operator formulation [21,22] of moment-generating functions as formal power series are as follows. The identification (30) of z and ∂/∂z with raising and lowering operators a† and a allows the commutator algebra $[a_i, a^\dagger_j] = \delta_{ij}$ to stand for the commutator algebra between components of ∂/∂z and factors of z, applied by function composition acting to the right on MGFs. Monomials $z^n$ in Eq. (1) are basis elements in a linear space of MGFs, built up by multiplication on the number 1. A bracket notation for states and an inner product are introduced by a pair of denotations. Each monomial $z^n$ is denoted as a number state $|n)$. The number states are eigenstates of the set of number operators $\hat n_i \equiv a^\dagger_i a_i$ (no Einstein sum): $\hat n_i |n) = n_i |n)$.
Dual to each number state is a conjugate projection operator which defines the asymmetric inner product on the Hilbert space of generating functions.
Replacing the uniform measure $\sum_i a_i \equiv \mathbf{1}^T a$ in Eq. (159) with the scalar product $z a$ gives the map (33) to (z) in the main text. (1) ≡ 1; the Glauber norm of the Laplace transform of any normalized distribution is the trace of the probability distribution, $\sum_n \rho_n$.
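The operator identifications can be made concrete for a single species by storing an MGF as its list of power-series coefficients. In this hypothetical sketch, `raise_op` multiplies by z (playing a†), `lower_op` differentiates (playing a), and the checks confirm $[a, a^\dagger] = 1$ on an MGF, the number-state eigenvalue $a^\dagger a\, z^3 = 3z^3$, and that evaluation at z = 1 returns $\sum_n\rho_n = 1$ for a normalized distribution.

```python
from fractions import Fraction


def raise_op(c):
    """a-dagger: multiply the power series sum_n c[n] z^n by z."""
    return [Fraction(0)] + list(c)


def lower_op(c):
    """a: differentiate the power series with respect to z."""
    return [n * c[n] for n in range(1, len(c))]


def sub(c, d):
    """Coefficient-wise difference of two power series of possibly different lengths."""
    L = max(len(c), len(d))
    c = list(c) + [0] * (L - len(c))
    d = list(d) + [0] * (L - len(d))
    return [x - y for x, y in zip(c, d)]


def glauber_norm(c):
    """Evaluate the generating function at z = 1, i.e. sum_n c[n]."""
    return sum(c)


# MGF of a normalized distribution rho = (1/4, 1/2, 1/4) on n = 0, 1, 2
rho = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
commutator = sub(lower_op(raise_op(rho)), raise_op(lower_op(rho)))   # [a, a^dagger] rho
number_state = [Fraction(0)] * 3 + [Fraction(1)]                      # z^3, the state |3)
n_hat = raise_op(lower_op(number_state))                              # a^dagger a z^3
print(commutator, n_hat, glauber_norm(rho))
```

Exact rational arithmetic is used so that the operator identities hold without round-off.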

C.2 Coherent states and Peliti functional integral
The uniform measure in the 2D-dimensional integral for the representation of unity (42) in the main text is known as the Haar measure. Using the definition (38) for coherent states and (39) for their dual projectors, and expanding the exponential functions as sums, one obtains a product of integrals over each pair $d\phi^\dagger_i d\phi_i$. The phase component in each integral $d\phi^\dagger_i d\phi_i$ vanishes unless $n_i = m_i$, and the remaining modulus component produces a Gamma function canceling the $n_i!$. Thus the Haar measure on coherent states is equivalent to the uniform measure on states n of the classical probability distribution.⁷ Mapping backward through the correspondences between analytic functions and Doi state vectors from Appendix C.1, an evaluation of the integral in terms of Dirac δ-functions⁸ shows that the effect of the representation of unity is the map (161).

To keep track of the scoping rules for application of complex functions and derivatives would require introducing a distinct set of variables $z_{k\delta t}$ for each interval in the quadrature (34). In the Doi operator algebra this scoping is handled by the bracket inner product, and the map (161) becomes simply the identity map on a† and a. The integration measure that results from inserting a copy of the representation of unity (42) between each interval of time evolution in Eq. (34) is called a skeletonized measure. Its limit as the interval length δt → 0 defines the functional integration measure used in Eq. (43) and elsewhere.

⁷ Aaronson [1, p. 123] has raised this equivalence as one of the reasons only the complex L2 norm of quantum mechanics results in a correspondence principle with the classical laws of probability. It is interesting that the representation of probability components $\rho_n$ as squared amplitudes (though only real-valued) also underlies the natural spherical embedding of Appendix A for the Fisher metric.

⁸ There is a notational subtlety in writing the complex area integral $\int d\phi^\dagger d\phi \equiv \int_0^\infty d|\phi| \int_0^{2\pi} |\phi|\, d\arg\phi$ with respect to δ-functions evaluated as complex contour integrals. For example, in D = 1, the integral kernel in Eq. (161) is written in the two notations as
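The orthogonality argument above can be verified numerically. Assuming the common Glauber normalization $d^2\phi/\pi$ with Gaussian weight $e^{-|\phi|^2}$ (the text's Eqs. (38), (39), and (42) are not reproduced here, so constant factors may differ), the integral of $\phi^n\bar\phi^m$ vanishes for n ≠ m through the phase integral, and for n = m the modulus integral gives $\Gamma(n+1) = n!$, the Gamma-function factor that cancels the $n!$ in the argument. The function name `haar_overlap` is hypothetical.

```python
import math
import numpy as np


def haar_overlap(n, m, n_phase=64, n_radial=40):
    """(1/pi) * integral over the complex plane of exp(-|phi|^2) phi^n conj(phi)^m.

    Polar coordinates: the phase integral uses an equispaced grid (exact for |n - m| < n_phase),
    the radial integral uses Gauss-Laguerre quadrature in x = |phi|^2.
    """
    angles = 2 * np.pi * np.arange(n_phase) / n_phase
    phase = 2 * np.pi * np.mean(np.exp(1j * (n - m) * angles))   # integral of e^{i(n-m) arg phi}
    x, w = np.polynomial.laguerre.laggauss(n_radial)
    radial = 0.5 * np.sum(w * x ** ((n + m) / 2))                # int_0^inf r^(n+m+1) e^(-r^2) dr
    return (phase * radial / np.pi).real


check = [haar_overlap(n, n) / math.factorial(n) for n in range(6)]   # each ratio should be 1
print(check, haar_overlap(2, 5))
```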

D Gaussian-order fluctuations, diffusions, and stochastic differential equations
The Doi-Peliti construction used in this paper affords a point of departure to many other representations of perturbed dynamical systems. It is important to emphasize that the motivation and form of 2FFI representations do not depend on the integer-lattice state space and exponential generating function that enable the specific Doi-Peliti construction. To show this we reproduce from [42,68] a variety of reductions from the full field theory to the limit of Gaussian-order fluctuations, which is specified by a drift velocity and diffusion tensor that are functions only of the configuration variable n. These are "classical" representations of perturbed dynamical systems, ostensibly lacking two-field structure. We will show how the inverse of the steps reducing Doi-Peliti to these classical limits has been used in [30,52] to construct two-field theories, or their large-deviation asymptotics, in the form produced by the Doi-Peliti 2FFI. These classical descriptions, in terms of functions of n, no longer reference lattice state spaces or population processes, and the 2FFI theories to which they correspond, like the operator formulation of [30,52], are expected to apply to much more general perturbed dynamical systems.
(Continuation of footnote 8.) The measure $\int d\phi^\dagger d\phi$ over a single complex variable must therefore be used in evaluating δ-functions in this way; a factor of 2π would be required if φ† and φ were distinct complex variables integrated over independent contours. This use of the measure will be needed to understand the normalization of the Wigner function in later sections.
The integral (166) is over an ensemble of trajectories, each satisfying Eq. (167). Eq. (167) is the approximation, to linear order in the perturbing field ξ, of the stochastic differential equation (168). Because the sole remaining Gaussian kernel in Eq. (166) is time-local, ξ is a Langevin white noise with correlation function (169). The noise covariance (169), although expressed in terms of the value n̄ along a stationary trajectory at θ = 0, may be understood as a function of the local coordinates n in the underlying manifold, since for classical solutions stationary trajectories form a foliation of the configuration manifold. The Gaussian-order fluctuation limit corresponding to the Langevin equation (168, 169) produces, in the continuum limit of the lattice of states n = (n_i), the diffusion equation for a density indexed by the continuous index n = (n_i): the master equation known as the Fokker-Planck equation (170) [74]. (This form may be obtained directly from the transition matrix corresponding to the Liouville operator, for example as written in [44,69,71] for chemical reaction networks, by writing discrete shifts of the index $n_i$ as if they were exponentiated operators $e^{\partial/\partial n_i}$ acting on a continuously-indexed density $\rho_n$, and then "expanding the Taylor's series for the exponential" to second order.) Here we have derived the three equivalent representations of the Gaussian-fluctuation limit specified entirely by a drift velocity and diffusion kernel as functions of n (the quadratic action (163), the Langevin equation (168, 169), and the Fokker-Planck equation (170)) from a Doi-Peliti starting point. The reverse of the derivation in Eqs. (163)-(170) may be followed to arrive at either two-field forms or their large-deviation asymptotics, starting from the stochastic differential equation or the Fokker-Planck equation. Kamenev [42] has shown that the "tri-diagonal" arrangement of response and correlation functions in Eq.
(4.1) of [52], equivalent to the quadratic action (163) here, is in fact the common motif in 2FFI for stochastic classical mechanics, and for the Schwinger-Keldysh time-loop in quantum mechanics [43,63], allowing us to bypass the operator formalism altogether and proceed directly from stochastic differential equations (168, 169) to the path integral up to Gaussian order. For a development of correspondence principles, where both the quantum Schwinger-Keldysh time loop and the stochastic two-field integral appear, see [67].
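A numerical sketch of the Gaussian-order limit for the two-state model follows. Assuming a → b at rate $k_+$ and b → a at rate $k_-$ per particle (our convention), the Langevin drift is $v(n_b) = k_+(N - n_b) - k_- n_b$ and the noise intensity, evaluated at the fixed point as in the covariance (169), is $D = k_+(N - n_b^*) + k_- n_b^*$. The stationary variance $D/(2\gamma)$ of the linear Fokker-Planck equation, with $\gamma = k_+ + k_-$, then equals the binomial variance $N\nu^*(1-\nu^*)$, and an Euler-Maruyama simulation agrees within sampling error.

```python
import numpy as np

N, k_plus, k_minus = 200, 1.0, 0.5
gamma = k_plus + k_minus                      # relaxation rate of the (linear) drift
n_star = N * k_plus / gamma                   # fixed point of v(n) = k_plus (N - n) - k_minus n
D = k_plus * (N - n_star) + k_minus * n_star  # noise intensity at the fixed point
var_fokker_planck = D / (2 * gamma)           # stationary variance of the linear Fokker-Planck equation
var_binomial = N * (k_plus / gamma) * (k_minus / gamma)

# Euler-Maruyama integration of dn = v(n) dt + sqrt(D) dW for many independent walkers
rng = np.random.default_rng(0)
dt, steps, walkers = 0.01, 1000, 20000
x = np.full(walkers, n_star)
for _ in range(steps):
    drift = k_plus * (N - x) - k_minus * x
    x = x + drift * dt + np.sqrt(D * dt) * rng.standard_normal(walkers)
var_simulated = x.var()
print(var_fokker_planck, var_binomial, var_simulated)
```

The small residual difference in the simulated variance is the O(γ dt) bias of the Euler scheme plus sampling noise.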
It is central to the aims of [52] that the tri-diagonal response and correlation functions are not limited to the "bare" constitutive parameters of the underlying stochastic differential equation, but rather are a general form for Gaussian order fluctuations in an effective theory where the drift velocity and fluctuation covariance are generally renormalized quantities, in this way subsuming higher-order fluctuation effects, which may appear explicitly in the Doi-Peliti action. Renormalization of the stochastic action as an effective action may also be done directly from the path integral with diagrammatic methods of the type appearing in [52]. For an application to evolutionary games see [70,Ch. 7].
Alternatively, the large-deviation asymptotics for excursions within, or for escapes from, a domain, may be derived directly from the stochastic differential equation, and related to boundary-value problems for solutions of diffusions such as the Fokker-Planck equation. This program, starting from representations identical to Eqs. (168-170) for Gaussian fluctuations, and covering several other classes of perturbations as well, is carried out by Freidlin and Wentzell [30], and has been applied to a wide range of first-passage and escape problems [48][49][50][51].
The connection to 2FFI methods is that an action arises as their large-deviation function from the drift and diffusion functions of the Gaussian limit, which is simply obtained from the Doi-Peliti action. By completing the square in θ (which is θ in a neighborhood ofθ = 0), the quadratic action (163) is put into the form (171). The quadratic form in θ is integrated out in the functional integral to produce a functional determinant that depends on the background n̄ only logarithmically. Again, the term in n in Eq. (171) is the leading-linear expansion of the equations of motion in small fluctuations, so at the Gaussian order to which it is defined, Eq. (171) may be approximated by a form first due to Onsager and Machlup [55], which appears in Theorem 1.1 on p. 86 of [30]. Two-field constructions originating in classical perturbed dynamical systems (168-170) in the continuous variable n may readily be extended to continuously-indexed state spaces with topologies, on which the drift velocity may include spatial gradients, as in the case of reaction-diffusion systems, without altering the program for construction of the corresponding two-field representation.
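The completing-the-square step can be written out in one common sign convention. This is an assumption: the quadratic action (163) may differ by signs or factors of i. Here θ̃ is the response field, v(n) the drift, and D(n) the symmetric diffusion matrix:

$$
S[\tilde\theta, n] = \int dt \left[ \tilde\theta^{T}\left(\dot n - v(n)\right) - \tfrac{1}{2}\,\tilde\theta^{T} D(n)\,\tilde\theta \right]
= \int dt \left[ \tfrac{1}{2}\left(\dot n - v\right)^{T} D^{-1}\left(\dot n - v\right) - \tfrac{1}{2}\left(\tilde\theta - D^{-1}(\dot n - v)\right)^{T} D \left(\tilde\theta - D^{-1}(\dot n - v)\right) \right]
$$

Expanding the second term reproduces $\tilde\theta^T(\dot n - v) - \tfrac12\tilde\theta^T D\tilde\theta$ exactly. Integrating θ̃ over the shifted Gaussian then leaves, up to the determinant factor, the Onsager-Machlup integrand $\tfrac12(\dot n - v)^T D^{-1}(\dot n - v)$.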

E Stationary-point approximations to the Wigner function
The functional integral provides the most direct route to the current conservation law (62) for the Wigner function. It is possible, with somewhat more work, to derive the same relations directly from stationary variation of the generating function, and in the process to gain some more intuition for what the Wigner function quantifies.

The Wigner function in terms of an explicit density over coherent-state parameters
Begin by writing any state vector as the integral of a density in the coherent-state basis: The generalized Glauber norm (33) returns the analytic representation of the MGF: Now evaluate the integral in Eq. (59) at time t → T, where the stationary value of the field φ† will coincide with the imposed argument z, It then follows that A second integral over the φ‡ fields yields the two equivalent expressions (60) and (174).

Stationary-point approximations
The stationary value $\bar\phi(z)$ of the tilted density $e^{(z-1)\phi}\rho(\phi)$ is given by the stationary-point condition (177). To identify the time dependence of the stationary argument φ‡, we work directly from the stationary-point condition (177). The total time derivative of that equation is given in Eq. (182). In passing from the second to the third line of Eq. (182), to obtain an explicit expression for $\partial \log\rho(\phi)/\partial t|_{\phi}$ in terms of $L_{z,\phi}$, we evaluate z as an inverse function of $\bar\phi$ from Eq. (177). This functional dependence contributes the term ∂L/∂z in the final line, from which we obtain the stationary-path equation (183) for the trajectory of $\bar\phi(z)$ along which φ‡ is to be evaluated. Equations (181) and (183) imply the 2FFI counterpart to conservation of energy in Hamiltonian mechanics: dL/dt = 0 along the stationary path if ∂L/∂t = 0.

The stochastic effective action in stationary-path evaluations
Note from Eq. (180) how w_T varies along the contour identified to preserve its stationarity. Therefore the extension of the Wigner function to times t > T must include the stationary-path contribution from the action, which was present for t < T in the functional-integral definition (59). With this extension, w_T satisfies the current-conservation law, recovering Eq. (62).
We have termed the stationary-path evaluation of S the stochastic effective action [68]. It is the functional Legendre transform of the large-deviation functional for trajectories in Doi-Peliti integrals. The approximation (179), with φ‡ set equal to φ̄(z) given ρ(φ), together with the contribution from S_T in Eq. (185), provides the desired interpretation of the Wigner function in terms of densities in the statistical model provided by coherent states, and their exponential tilts by likelihood functions.
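The Legendre duality invoked here can be illustrated in finite dimensions. The sketch below is only a one-variable analog of the functional statement: it takes the cumulant-generating function of a Poisson distribution with a hypothetical mean λ, ψ(θ) = λ(e^θ − 1), computes its Legendre transform numerically, and compares with the closed-form rate function n log(n/λ) − n + λ; it then transforms back to recover ψ.

```python
import numpy as np

LAM = 3.0   # hypothetical Poisson mean


def cgf(theta):
    """Cumulant-generating function of Poisson(LAM)."""
    return LAM * np.expm1(theta)


def rate_function(n):
    """Closed-form Legendre transform of cgf: n log(n/LAM) - n + LAM."""
    return n * np.log(n / LAM) - n + LAM


def legendre(f, grid, y):
    """Numerical Legendre transform sup_x [x*y - f(x)], taken over a finite grid."""
    return float(np.max(grid * y - f(grid)))


theta_grid = np.linspace(-10.0, 10.0, 200001)
n_grid = np.linspace(1e-9, 200.0, 400001)
for n in (0.5, 3.0, 7.0):
    print(n, legendre(cgf, theta_grid, n), rate_function(n))
for theta in (-1.0, 0.0, 1.0):
    print(theta, legendre(rate_function, n_grid, theta), cgf(theta))
```

The rate function vanishes at the mean n = λ, and the transform is involutive for this convex pair, which is the property that lets a stationary-path action and a large-deviation functional be exchanged.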

F Stationary-path solutions for the two-state system
The stationary-path equations and both initial and final values for the fields are obtained from the vanishing of all terms in the variational derivative of the exponential argument in Eq. (43). We begin with solutions in coherent-state variables, and then present the forms for the descaled number coordinates ν, ν, and ν.
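As a reference point for the two-state solutions, this sketch assumes N independent particles, each hopping a → b at a hypothetical rate W_BA and b → a at rate W_AB, started from a binomial distribution. The generating function then keeps the binomial form (z_a p_a(t) + z_b p_b(t))^N, with p_a(t) relaxing to its stationary value at rate W_AB + W_BA. The code checks this against the generating-function form of the two-state master equation, ∂G/∂t = (z_b − z_a)(W_BA ∂G/∂z_a − W_AB ∂G/∂z_b), written in a standard convention that may differ in notation from Eq. (43).

```python
import numpy as np

W_BA, W_AB = 1.5, 0.5   # hypothetical rates for a -> b and b -> a
N = 20                  # hypothetical number of independent particles


def prob_a(t, pa0):
    """Single-particle probability of state a solving dp_a/dt = -W_BA p_a + W_AB p_b."""
    pa_star = W_AB / (W_AB + W_BA)
    return pa_star + (pa0 - pa_star) * np.exp(-(W_AB + W_BA) * t)


def pgf(za, zb, t, pa0):
    """Binomial generating function (za p_a + zb p_b)**N at time t."""
    pa = prob_a(t, pa0)
    return (za * pa + zb * (1.0 - pa)) ** N


def pgf_equation_residual(za, zb, t, pa0, h=1e-5):
    """Residual of dG/dt = (zb - za)(W_BA dG/dza - W_AB dG/dzb), by central differences."""
    dG_dt = (pgf(za, zb, t + h, pa0) - pgf(za, zb, t - h, pa0)) / (2 * h)
    dG_dza = (pgf(za + h, zb, t, pa0) - pgf(za - h, zb, t, pa0)) / (2 * h)
    dG_dzb = (pgf(za, zb + h, t, pa0) - pgf(za, zb - h, t, pa0)) / (2 * h)
    return dG_dt - (zb - za) * (W_BA * dG_dza - W_AB * dG_dzb)


print(pgf(1.0, 1.0, 0.3, 0.8))                     # normalization, 1 up to rounding
print(pgf_equation_residual(0.9, 1.1, 0.3, 0.8))    # near zero up to finite-difference error
```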

F.1.1 Stationary-path equations and final-time conditions for response fields
The stationary-path equations of motion for the components of the field φ† from Eq. (47) are evaluated in the rotated basis (100). The final-time values φ†_T are given by variation of φ_T. Fixing the magnitude of the combination φ†_b ν_b + φ†_a ν_a in Eq. (104) requires varying z along the contour giving Eq. (113) in the main text. The remaining time-dependent solutions, with the time argument denoted explicitly here by the subscript τ, are given by Eq. (189). Initial data are specified in the generating function ψ_0; evaluated at the solutions for φ†_0, they become Eq. (190). The final line of Eq. (190) is obtained by combining the two solutions (189) at τ = 0, and introduces the combination defined in Eq. (106).

F.1.2 Stationary-path equations and initial-time conditions for observable fields
The stationary-path equations of motion for the components of the field φ from Eq. (47) are obtained by removing a total derivative d_t(φ†φ) from the action (44) to shift the derivative onto φ. In the rotated basis (100), they evaluate to Eq. (191). The total derivative cancels the final-time term φ†φ_T from the exponential in Eq. (43) and introduces an initial-time term φ†φ_0. Variation of this term against ψ_0 with respect to φ†_0 gives the initial-value conditions for the components of φ_0, from which the solutions to the equations of motion (191) follow directly. The two displacements defined in Eqs. (104) and (105), characterizing respectively the mean in the nominal distribution and the likelihood ratio applied to the stationary measure, evaluate to Eq. (194). These results are reproduced (dropping the explicit subscripts τ) as Eq. (107) in the text. It follows from Eq. (194) that the combination is invariant at its initial value. The CGF for a binomial distribution at any time retains the form (92), with ν_a/ν_a and ν_b/ν_b replacing z_a and z_b, and ν_a and ν_b replacing ν_a, and ν

F.2 Fisher spherical embedding
In one dimension, the Pythagorean theorem (12) for K-L divergences loses the interpretation of a direction cosine between vector fields, but still reflects scale changes between coherent-state or exponential families and the geometric coordinate.
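In one dimension the cross term that would carry a direction cosine reduces to a product of coordinate differences. This sketch checks the standard three-point identity for an exponential family, D(p‖r) = D(p‖q) + D(q‖r) + (η_p − η_q)(θ_q − θ_r), with η the mean coordinate and θ the natural coordinate; we assume it is the one-variable content of Eq. (12), and the Poisson family and parameter values are illustrative choices.

```python
import math


def kl_poisson(lam_p, lam_q):
    """K-L divergence D(Poisson(lam_p) || Poisson(lam_q))."""
    return lam_p * math.log(lam_p / lam_q) - lam_p + lam_q


def cross_term(lam_p, lam_q, lam_r):
    """(eta_p - eta_q) * (theta_q - theta_r), with eta = mean and theta = log(mean)."""
    return (lam_p - lam_q) * (math.log(lam_q) - math.log(lam_r))


lp, lq, lr = 2.0, 3.5, 5.0
lhs = kl_poisson(lp, lr)
rhs = kl_poisson(lp, lq) + kl_poisson(lq, lr) + cross_term(lp, lq, lr)
print(lhs, rhs, cross_term(lp, lq, lr))
```

The cross term vanishes only in the degenerate cases λ_q = λ_p or λ_q = λ_r, so in one variable no orthogonality condition is available to remove it.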
For one variable, the mean-value Fisher-sphere construction of Sect. A.2 is an embedding on a circle. The coordinate differential shows the variation of the embedding coordinate of the importance distribution with the tilt, multiplied by its variation with the base.
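A familiar one-variable example of an embedding on a circle is the square-root map of a Bernoulli family, p ↦ 2(√p, √(1 − p)), whose induced Euclidean metric equals the Fisher metric 1/(p(1 − p)). This sketch checks that property numerically; we assume it conveys the idea of the construction in Sect. A.2 but not its exact coordinates, which are built from mean values.

```python
import numpy as np


def embedding(p):
    """Square-root embedding of Bernoulli(p) onto a circle of radius 2."""
    return 2.0 * np.array([np.sqrt(p), np.sqrt(1.0 - p)])


def fisher_bernoulli(p):
    """Fisher information of the Bernoulli family in the mean coordinate p."""
    return 1.0 / (p * (1.0 - p))


def induced_metric(p, h=1e-6):
    """Squared length of d(embedding)/dp, by central difference."""
    dx = (embedding(p + h) - embedding(p - h)) / (2 * h)
    return float(dx @ dx)


for p in (0.1, 0.5, 0.9):
    print(p, induced_metric(p), fisher_bernoulli(p), np.linalg.norm(embedding(p)))
```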

F.3 Evaluation of the Amari-Chentsov tensor
From the form (122) of the Fisher metric in coordinates (v, u), the Amari-Chentsov tensor T with all indices contravariant can be computed directly. T is symmetric under u ↔ v, but its magnitude is not conserved along the stationary-path trajectories. The measure changes (dv/dθ) and (du/dη) are the same as those in the Fisher metric, but in addition the term u + v is not invariant.
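How measure changes enter the Amari-Chentsov tensor differently from the metric can be seen in a one-parameter family, where T is the expected cube of the score and so picks up the cube of a coordinate Jacobian, whereas the metric picks up its square. This sketch uses a Poisson family as an illustrative assumption, not the (v, u) coordinates of Eq. (122): in the natural coordinate θ = log λ both g and T equal λ, and changing to the mean coordinate λ multiplies T by (dθ/dλ)³ = λ⁻³.

```python
import math


def poisson_pmf(n, lam):
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def expected_score_power(lam, power, scale=1.0, n_max=400):
    """E[(scale * (n - lam))**power] under Poisson(lam).

    With scale = 1 the score is d log p / d theta for theta = log(lam);
    with scale = 1/lam it is d log p / d lam.
    """
    return sum(poisson_pmf(n, lam) * (scale * (n - lam)) ** power for n in range(n_max))


lam = 4.0
g_theta = expected_score_power(lam, 2)                   # Fisher metric in theta
T_theta = expected_score_power(lam, 3)                   # Amari-Chentsov tensor in theta
T_lam = expected_score_power(lam, 3, scale=1.0 / lam)    # same tensor in the mean coordinate
print(g_theta, T_theta, T_lam, T_theta / lam**3)
```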

F.4 Connection coefficients in the coherent-state connection
Among the nonzero connection coefficients for the dual coherent-state connections (90), the only independent components are those for the rotated variables θ. By definition of θ(n; η) as the inverse function of n(θ; η) ≡ ∂ψ(θ; η)/∂θ, the term (n − ∂ψ/∂θ) ≡ 0 at all n. Therefore its second derivative with respect to n also vanishes. The third line of Eq. (210) uses the additivity of θ and η to cancel the two factors, which equal g^{−1} and g respectively.
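The cancellation of the factors g^{−1} and g rests on the inverse-function rule dθ/dn = 1/(∂²ψ/∂θ²) = g^{−1}. This sketch checks that rule for a hypothetical CGF depending only on θ + η, ψ(θ; η) = e^{θ+η} (a Poisson family with log-mean η, tilted by θ), for which θ(n; η) = log n − η; it also shows ∂θ/∂η = −1 at fixed n, the additivity used in the text.

```python
import math


def psi(theta, eta):
    """Hypothetical CGF depending only on theta + eta."""
    return math.exp(theta + eta)


def n_of_theta(theta, eta, h=1e-5):
    """n(theta; eta) = d psi / d theta, by central difference."""
    return (psi(theta + h, eta) - psi(theta - h, eta)) / (2.0 * h)


def theta_of_n(n, eta):
    """Inverse function of n(theta; eta) for this psi: theta = log(n) - eta."""
    return math.log(n) - eta


def fisher_metric(theta, eta, h=1e-4):
    """g = d^2 psi / d theta^2, by second central difference."""
    return (psi(theta + h, eta) - 2.0 * psi(theta, eta) + psi(theta - h, eta)) / h**2


theta, eta, step = 0.3, -0.2, 1e-6
n = n_of_theta(theta, eta)
dtheta_dn = (theta_of_n(n + step, eta) - theta_of_n(n - step, eta)) / (2.0 * step)
dtheta_deta = (theta_of_n(n, eta + step) - theta_of_n(n, eta - step)) / (2.0 * step)
print(theta_of_n(n, eta), dtheta_dn * fisher_metric(theta, eta), dtheta_deta)
```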
The log-ratio of the conditional large-deviation probabilities in Eq. (130), log[P(n_B | n_A; η_2)/P(n_B | n_A; η_1)], may be written as the integral in Eq. (211), giving Eq. (131) in the text. The conversion from the fifth to the sixth line of Eq. (211) makes use of the two alternative ways of expressing the inner product in contravariant/covariant coordinates given in Eq. (76).