Information Geometry of Reversible Markov Chains

We analyze the information geometric structure of time reversibility for parametric families of irreducible transition kernels of Markov chains. We define and characterize reversible exponential families of Markov kernels, and show that irreducible and reversible Markov kernels form both a mixture family and, perhaps surprisingly, an exponential family in the set of all stochastic kernels. We propose a parametrization of the entire manifold of reversible kernels, and inspect reversible geodesics. We define information projections onto the reversible manifold, and derive closed-form expressions for the e-projection and m-projection, along with Pythagorean identities with respect to information divergence, leading to a new notion of reversiblization of Markov kernels. We show that the family of edge measures pertaining to irreducible and reversible kernels also forms an exponential family among distributions over pairs. We further explore geometric properties of the reversible family, comparing it with other remarkable families of stochastic matrices. Finally, we show that reversible kernels are, in a sense we define, the minimal exponential family generated by the m-family of symmetric kernels, and the smallest mixture family that comprises the e-family of memoryless kernels.


Introduction
Time reversibility is a fundamental property of many statistical laws of nature. Inspired by Schrödinger [1931], Kolmogorov was the first [Dobrushin et al., 1988], in his celebrated work [Kolmogorov, 1936, 1937], to investigate this notion in the context of Markov chains and diffusion processes. Reversible chains also find numerous applications in computer science, for instance in queuing networks [Kelly, 2011] or Markov Chain Monte Carlo sampling algorithms [Brooks et al., 2011]. In particular, a random walk over a weighted network corresponds to a reversible Markov chain [Aldous and Fill, 2002, Section 3.2].
Reversible Markov operators enjoy a considerably richer mathematical structure than their non-reversible counterparts, enabling a wide range of analytical tools and techniques. Indeed, the significance of reversibility spans surprisingly many areas of mathematics, from spectral theory [Levin et al., 2009, Chapter 12] to abstract algebra [Pistone and Rogantin, 2013]. For instance, the mixing time of a reversible Markov chain, i.e. the time to guarantee closeness to stationarity, is controlled up to logarithmic factors by the inverse of its absolute spectral gap (the difference between 1 and the second-largest eigenvalue magnitude). The diversity of the existing tools and analyses prompts our first question of whether reversibility can also be treated from an information geometry perspective.
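As a concrete illustration of this spectral handle (a numerical sketch of ours, not from the paper; the weight matrix W and all names are illustrative), a reversible kernel P is similar to the symmetric matrix D_π^{1/2} P D_π^{−1/2}, so its spectrum is real and the absolute spectral gap can be read off with a symmetric eigensolver:

```python
import numpy as np

# Random walk on a weighted undirected graph: a standard example of a
# reversible chain, with stationary distribution proportional to the degrees.
W = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()

# Reversibility makes S = D^{1/2} P D^{-1/2} symmetric, hence the spectrum
# of P is real and a symmetric eigensolver applies.
S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # descending; lam[0] = 1

# Absolute spectral gap: 1 minus the second-largest eigenvalue magnitude.
gap = 1.0 - max(abs(lam[1]), abs(lam[-1]))
print(lam, gap)
```

The similarity transform is exactly where reversibility enters: for a non-reversible kernel, S would not be symmetric and the spectrum could be complex.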
Through the lens of information geometry, the manifold of all irreducible Markov kernels forms both an exponential family (e-family) and a mixture family (m-family).
Our natural second question is whether we can find subfamilies of irreducible kernels that enjoy similar geometric properties, or in other words, can we find submanifolds that are autoparallel with respect to affine connections of interest? For instance, the set of doubly-stochastic matrices is known to form an m-family [Hayashi and Watanabe, 2016], while a tree model is an e-family of Markov kernels, if and only if it is an FSMX model [Takeuchi and Nagaoka, 2017b].
In this article, we will answer these two questions, see that reversible irreducible Markov chains enjoy the structure of both exponential and mixture families, and explore their geometric properties.

Related work
The concept of exponential tilting of stochastic matrices using Perron-Frobenius (PF) theory can be traced back to the work of Miller [1961]. The large deviation theory for Markov chains, whose crowning achievement shows that the convex conjugate of the log-PF root of the tilted kernel essentially controls the large deviation rate, was further developed by Donsker and Varadhan [1975], Gärtner [1977], and Dembo and Zeitouni [1998]. Csiszár et al. [1987] seem to be the first to have recognized the exponential structure of the set of irreducible Markov kernels, in the context of information projections. Independently, Ito and Amari [1988] implicitly introduced the notion of asymptotic exponential families, and exhibited irreducible Markov kernels as an example. Takeuchi and Barron [1998] later formalized this definition (see also Takeuchi and Kawabata [2007]), and Takeuchi and Nagaoka [2017a] subsequently proved that exponential families and their asymptotic counterparts are equivalent. Nakagawa and Kanaya [1993] formally defined the exponential family of irreducible Markov chains, and Nagaoka [2005] later gave a full treatment in the language of information geometry, proving its dually flat structure. A notable collection of works has also explored the implications of this geometric structure for problems related to parameter estimation [Hayashi and Watanabe, 2016], hypothesis testing [Nakagawa and Kanaya, 1993, Watanabe and Hayashi, 2017], large deviation theory [Moulos and Anantharam, 2019], and hidden Markov models [Hayashi, 2019].
We refer the reader to Levin et al. [2009] and Amari and Nagaoka [2007] for thorough treatments of the theory of Markov chains and information geometry.

Outline and main results
In Section 2, we begin with a primer on reversible Markov chains, define exponential and mixture families, and briefly discuss the importance of affine structures for our analysis of exponential families. In Section 3, we define a time-reversal operation on parametric families, and show in Proposition 3.1 that both m-families and e-families are closed under this transformation. In Section 4 we introduce the concept of a reversible e-family, and provide a characterization (Theorem 4.2) of such family in terms of its carrier kernel and set of generator functions. Adapting the Kolmogorov criterion, we show that the necessary and sufficient conditions can be verified in a time that depends polynomially on the number of states. In Section 5, we prove that the set of all reversible and irreducible transition kernels is both an m-family, and an e-family (Theorem 5.1), construct a basis (Theorem 5.2), and derive a parametrization (Theorem 5.3) of the entire set of reversible kernels. In Section 6, we investigate information projections of an irreducible Markov chain onto its reversible submanifold. We show that the projections verify Pythagorean identities, and obtain closed-form expressions (Theorem 6.1). Additionally, we prove that the projections are always equidistant from an irreducible Markov kernel and its time-reversal (bisection property, Proposition 6.1). In Section 7, we show that reversible edge measures also form an e-family in distributions over pairs (Theorem 7.1). In Section 8, we briefly compare the geometric properties of reversible chains with several other natural families of Markov kernels. Finally, in Section 9, we characterize the reversible family as both the smallest exponential family that comprises symmetric kernels (Theorem 9.1), and the smallest mixture family that contains memoryless Markov kernels (Theorem 9.2).

Preliminaries
For m ∈ N we write [m] = {1, 2, . . . , m}. Let X be a set such that |X| = m < ∞, identified with [m], where, to avoid trivialities, we also assume that m > 1. We denote by P(X) the probability simplex over X, and P_+(X) = {μ ∈ P(X) : ∀x ∈ X, μ(x) > 0}. All vectors are written as row vectors, unless otherwise stated. For real matrices A and B, ρ(A) is the spectral radius of A; f[A], for f : R → R, is the entry-wise application of f to A; A • B is the Hadamard product of A and B; A > 0 (resp. A ≥ 0) means that A is an entry-wise positive (resp. non-negative) matrix. We will routinely identify a function f : X² → R with the linear operator f : R^X → R^X.

Irreducible Markov chains
We let (X, E) be a strongly connected directed graph, where X is the set of vertices and E ⊂ X² the set of edges. Let F(X, E) be the set of all real functions over the set E, identified with the totality of functions over X² that are null outside of E, and let F_+(X, E) ⊂ F(X, E) be the subset of positive functions over E. Similarly, we define P(E) = P(X²) ∩ F_+(X, E), the set of distributions whose mass is concentrated on the edge set E. We write W(X) for the set of row-stochastic transition kernels over the state space X, and W(X, E) for the subset of irreducible kernels whose support is E, i.e.

W(X, E) = {P ∈ W(X) : ∀(x, x') ∈ X², P(x, x') > 0 ⇔ (x, x') ∈ E},

where P(x, x') corresponds to the transition probability from state x to state x'. For P ∈ W(X, E), there exists a unique π ∈ P_+(X) such that πP = π [Levin et al., 2009, Corollary 1.17], which we call the stationary distribution of P. When E = X² and there is no ambiguity about the space under consideration, we may write more simply F, F_+ instead of F(X, X²), F_+(X, X²) (a similar notation will apply to all subsequently defined spaces).

Reversibility
For an irreducible kernel P, we write Q = diag(π)P for the edge measure matrix [Levin et al., 2009, (7.5)], which corresponds to the stationary pair-probabilities of P, i.e. Q(x, x') = P_π(X_t = x, X_{t+1} = x'), and denote the set of irreducible edge measures by Q(X, E). We further write P* for the uniquely defined time-reversal of P, which verifies P*(x, x') = π(x')P(x', x)/π(x), and write Q* = Q^T for its corresponding edge measure, where ^T denotes matrix transposition. When Q is symmetric (i.e. Q = Q^T), the chain verifies the detailed balance equation, π(x)P(x, x') = π(x')P(x', x), i.e. P* = P, and we say that the Markov chain is reversible. Observe that in this case, for P irreducible over E, the edge set must also be symmetric. We write W_rev(X, E) for the set of all reversible kernels that are irreducible over (X, E). For f, g ∈ R^X, ⟨f, g⟩_π ≜ Σ_{x∈X} f(x)g(x)π(x) defines an inner product; we call ℓ²(π) the corresponding Hilbert space. The time-reversal is the adjoint operator of P in ℓ²(π), i.e. the unique linear operator that verifies ⟨Pf, g⟩_π = ⟨f, P*g⟩_π for all f, g ∈ R^X (represented here as column vectors). As a consequence, when P is reversible, it is also self-adjoint in ℓ²(π), and the spectrum of P is real.
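The objects just defined can be made concrete numerically; the following sketch (our own code, not from the paper; all function names are illustrative) computes the stationary distribution, the edge measure Q = diag(π)P, the time-reversal, and tests detailed balance:

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi, sum(pi) = 1, for an irreducible row-stochastic P."""
    m = P.shape[0]
    # Stack the fixed-point equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def time_reversal(P, pi):
    """P*(x, x') = pi(x') P(x', x) / pi(x)."""
    return (P.T * pi[None, :]) / pi[:, None]

def is_reversible(P, tol=1e-10):
    """Detailed balance test: the edge measure Q = diag(pi) P is symmetric."""
    pi = stationary_distribution(P)
    Q = pi[:, None] * P
    return np.allclose(Q, Q.T, atol=tol)

# A random walk over a weighted undirected graph is reversible.
W = np.array([[0., 2., 1.], [2., 0., 3.], [1., 3., 0.]])
P_rev = W / W.sum(axis=1, keepdims=True)
print(is_reversible(P_rev))   # detailed balance holds
```

For a non-reversible chain, such as a deterministic cycle, `time_reversal` returns a genuinely different kernel (the cycle traversed backwards).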

Mixture family and exponential family
For later convenience we consider the following three equivalent definitions of a mixture family.
Definition 2.1 (m-family of transition kernels). We say that a family of irreducible transition kernels V m is a mixture family (m-family) of irreducible transition kernels on (X , E ) when one of the following (equivalent) statements (i), (ii), (iii) holds.
(iii) [Hayashi and Watanabe, 2016, Section 4.2] There exist k ∈ N, g_1, . . . , g_k ∈ F(X, E) and c_1, . . . , c_k ∈ R, such that V_m = {P ∈ W(X, E) : Σ_{(x,x')∈E} Q(x, x') g_i(x, x') = c_i, ∀i ∈ [k]}, where Q denotes the edge measure of P. Note that Ξ is an open set; ξ is called the mixture parameter, and d is the dimension of the family V_m.
Definition 2.2 (e-family of transition kernels). Let Θ ⊂ R^d be some connected parameter space that contains an open ball centered at 0. We say that the parametric family of irreducible transition kernels V_e = {P_θ : θ = (θ_1, . . . , θ_d) ∈ Θ} is an exponential family (e-family) of transition kernels on (X, E) with natural parameter θ, whenever there exist a carrier kernel K ∈ F(X, E), generator functions g_1, . . . , g_d ∈ F(X, E), and functions R : Θ × X → R, ψ : Θ → R, such that

P_θ(x, x') = exp( K(x, x') + Σ_{i=1}^d θ_i g_i(x, x') + R(θ, x') − R(θ, x) − ψ(θ) )    (2)

when (x, x') ∈ E, and P_θ(x, x') = 0 otherwise.
When fixing some θ ∈ Θ, we may later write for convenience ψ θ for ψ(θ) and R θ for R(θ, ·) ∈ R X . The carrier kernel K, the collection of generator functions g 1 , . . . , g d and the parameter range Θ define the family entirely. The remaining functions R θ and ψ θ will be determined uniquely by PF theory, from the constraint of P θ being row-stochastic (see for example the proof of Proposition 3.1). In fact, we can define the mapping s that constructs a proper irreducible stochastic matrix from any linear operator defined by an irreducible matrix over (X , E ).
s(P̃)(x, x') = P̃(x, x') v(x') / (ρ(P̃) v(x)),    (3)

where ρ(P̃) and v are respectively the PF root and right PF eigenvector of the non-negative irreducible matrix P̃.
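The mapping s can be sketched numerically (our own code, with illustrative names): divide out the PF root and conjugate by the right PF eigenvector to turn any non-negative irreducible matrix into a row-stochastic kernel with the same support.

```python
import numpy as np

def pf_pair(A):
    """PF root and positive right PF eigenvector of an irreducible A >= 0."""
    w, V = np.linalg.eig(A)
    k = np.argmax(w.real)              # the PF root dominates all real parts
    return w[k].real, np.abs(V[:, k].real)

def stochastic_rescaling(Ptilde):
    """s(Ptilde)(x, x') = Ptilde(x, x') v(x') / (rho v(x)):
    row-stochastic, with the same support as Ptilde."""
    rho, v = pf_pair(Ptilde)
    return Ptilde * v[None, :] / (rho * v[:, None])

A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.]])           # non-negative and irreducible
P = stochastic_rescaling(A)
print(P.sum(axis=1))                   # each row sums to 1
```

Row-stochasticity follows directly from the eigenvector equation: summing s(P̃)(x, ·) gives (P̃v)(x)/(ρ v(x)) = 1.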
Following the information geometry philosophy [Amari and Nagaoka, 2007], we view the e-families or m-families that we defined as d-dimensional submanifolds of W(X, E), with corresponding chart maps θ, ξ : W(X, E) → R^d. We can give more geometrical, parametrization-free definitions of e-families and m-families of irreducible transition kernels over (X, E), as autoparallel submanifolds of W(X, E) with respect to the e-connection and m-connection [Nagaoka, 2005, Section 6]. We will prefer, however, to mostly cast our analysis in the language of linear algebra, and defer analysis of the relationship with differential geometry concepts to Section 5.3. This choice is motivated by the existence of a known correspondence between affine subspaces of functions over E and the manifold W(X, E) [Nagaoka, 2005], which we now describe. Denote

N(X, E) ≜ {h ∈ F(X, E) : ∃(f, c) ∈ R^X × R, ∀(x, x') ∈ E, h(x, x') = f(x') − f(x) + c},    (4)

and write G(X, E) ≜ F(X, E)/N(X, E) for the quotient space [Nagaoka, 2005, Section 3]. Introducing the mapping ∆ : G(X, E) → W(X, E) such that ∆(g) = s(exp[g]), we see from the expression at (3) that ∆ gives a diffeomorphism from the quotient linear space G(X, E) to W(X, E), and a subset V of W(X, E) is an e-family if and only if there exists an affine subspace A of the quotient space G(X, E) such that V = ∆(A) (we identify a coset with a representative function in that coset). In this case, the correspondence is one-to-one, and the dimension of the affine space and the submanifold coincide [Nagaoka, 2005, Theorem 2]. In particular, this entails that dim W(X, E) = |E| − |X| [Nagaoka, 2005, Corollary 1].
Remark 2.2. For Definition 2.2, unless stated otherwise, we will henceforth assume that the g i form an independent family in G(X , E ). This will ensure that the family is well-behaved in the sense of Hayashi and Watanabe [2016, Lemma 4.1].

Time-reversal of parametric families
We begin by extending the definition of a time-reversal to families of Markov chains.
Definition 3.1 (Time-reversal family). We say that the family of irreducible transition kernels V* is the time-reversal of the family of irreducible transition kernels V when V* = {P* : P ∈ V}, where P* denotes the time-reversal of P.
We now state the fundamental fact that the property of being an e-family or an m-family of transition kernels is preserved under this time-reversal operation.
Proposition 3.1. The following statements hold.
Time reversal of m-family: Let V_m be an m-family over (X, E); then V_m* is an m-family over (X, E*). Furthermore, if V_m is the m-family generated by Q_1, . . . , Q_d ∈ Q(X, E) (following the notation at Definition 2.1-(i)), then the time-reversal m-family is generated by the transposed edge measures Q_1^T, . . . , Q_d^T, where Q_ξ pertains to P_ξ.

Time reversal of e-family: Let V_e be an e-family over (X, E); then V_e* is an e-family over (X, E*). Furthermore, if V_e is the e-family generated by K and g_1, . . . , g_d (following the notation at Definition 2.2), then the time-reversal e-family is given by V_e* = {P_θ* : θ ∈ Θ}, with P_θ*(x, x') = 0 when (x, x') ∉ E*, and where exp[L_θ] is the left PF eigenvector of the non-negative irreducible matrix with entries exp(K(x, x') + Σ_i θ_i g_i(x, x')) on E and 0 elsewhere.

Proof. Since the edge measure Q_ξ* of the time-reversal P_ξ* is the transpose of the edge measure Q_ξ corresponding to P_ξ, it is easy to obtain the expression of the time-reversal, and to see that V_m* is a mixture family. It remains to show that this also holds true for e-families. From the definition of an exponential family (2), and the requirement that P_θ be row-stochastic, it must be that for any x ∈ X,

Σ_{x' : (x, x') ∈ E} exp( K(x, x') + Σ_i θ_i g_i(x, x') + R_θ(x') ) = exp( ψ_θ + R_θ(x) ).

By positivity of the exponential function, the vector exp[R_θ] ∈ R^X is positive. Thus, from the PF theorem, e^{ψ_θ} corresponds to the spectral radius of the tilted matrix, and exp[R_θ] is its associated right eigenvector. There must therefore also exist a positive left eigenvector, which we denote by exp[L_θ]. Defining the positive normalized measure π_θ(x) ∝ exp(L_θ(x) + R_θ(x)), it is easily verified that π_θ is the stationary distribution of P_θ. Notice that θ, K and the g_i determine L_θ, R_θ, ψ_θ and π_θ uniquely by the PF theorem. Recall that the adjoint of a transition kernel P can be written P*(x, x') = π(x')P(x', x)/π(x); thus we can compute the time-reversal as

P_θ*(x, x') = exp( K(x', x) + Σ_i θ_i g_i(x', x) + L_θ(x') − L_θ(x) − ψ_θ )

when (x, x') ∈ E*, and 0 otherwise. The requirements of Definition 2.2 for an e-family are all fulfilled, which concludes the proof.
Remark 3.1. Recall that for a distribution μ ∈ P(X), we can by exponential change of measure, also known as exponential tilting, construct the natural exponential family of μ: μ_θ(x) = μ(x) exp(θx − A(θ)), where A(θ) is a normalization function that ensures μ_θ ∈ P(X) for all θ ∈ R. The idea of exponential change of measure for distributions can be traced back to Chernoff [Chernoff, 1952], and was later termed tilting [Gallager, 1968, Van Campenhout and Cover, 1981]. Similarly, given some function g : X² → R, we can tilt an irreducible kernel P (e.g. Miller [1961]) by first constructing P̃_θ(x, x') = P(x, x') e^{θ g(x, x')}, and then rescaling the newly obtained irreducible matrix with the mapping s. When θ = 0, notice that we recover the original P. But while in our definition the kernel is tilted using the right PF eigenvector v_θ, we could alternatively define the Markov kernel P̂_θ by tilting P with the left PF eigenvector u_θ. Observe that the right and left tilted versions of P with identical θ share the same stationary distribution π_θ ∝ u_θ • v_θ (6), and that they are in fact each other's time-reversal (P̂_θ = P_θ*), i.e. they form a pair of adjoint linear operators over the space ℓ²(π_θ).
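The right/left tilting duality of Remark 3.1 can be checked numerically; in this sketch (our own code; the base kernel P, direction g and all names are illustrative), both tilted kernels are row-stochastic, share the stationary distribution π_θ ∝ u_θ • v_θ, and are each other's time-reversal.

```python
import numpy as np

def pf_triple(A):
    """PF root, right and left positive PF eigenvectors of A > 0."""
    w, V = np.linalg.eig(A)
    k = np.argmax(w.real)
    rho, v = w[k].real, np.abs(V[:, k].real)
    wl, U = np.linalg.eig(A.T)
    u = np.abs(U[:, np.argmax(wl.real)].real)
    return rho, v, u

# toy base kernel and tilting direction
P = np.array([[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.3, 0.2]])
g = np.array([[0., 1., -1.], [2., 0., 1.], [0., -2., 0.]])
theta = 0.7

Pt = P * np.exp(theta * g)                          # tilted, non-stochastic
rho, v, u = pf_triple(Pt)
P_right = Pt * v[None, :] / (rho * v[:, None])      # right-tilted kernel
P_left = Pt.T * u[None, :] / (rho * u[:, None])     # left-tilted kernel

pi = u * v / np.sum(u * v)                          # shared stationary law
print(np.allclose(pi @ P_right, pi), np.allclose(pi @ P_left, pi))
```

The reversal claim follows by direct substitution: π(x')P_right(x', x)/π(x) simplifies to exactly the left-tilted expression.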

Reversible exponential families
The previous section extended the time-reversal operation to parametric families of transition kernels. It then seems natural to investigate fixed points, i.e. parametric families that remain invariant under this transformation. We say that an irreducible e-family V(X, E) is reversible when every P ∈ V(X, E) is reversible. In this case, E* coincides with E, and V*(X, E) with V(X, E). Observe first that an e-family obtained from tilting a reversible P is not generally reversible, making it clear that the reversible nature of the family cannot be determined solely by the properties of the carrier kernel K. It is, however, easy to see that an e-family is reversible when K and all the generator functions g_1, . . . , g_d are symmetric. Moreover, for a state space X of size 2, any exponential family is reversible regardless of symmetry, showing that this condition is not always necessary. In this section, we give a complete characterization of this invariant set. Additionally, we explore the algorithmic cost of checking whether this property holds from the description of the carrier kernel and generators of a given e-family. Before diving into the general theory of reversible e-families, let us consider the following simple examples.

Example 4.2 (Birth-and-death chains). For E_bd ≜ {(x, x') ∈ X² : |x − x'| ≤ 1}, a Markov kernel having its support on E_bd is referred to as a birth-and-death chain. Since every birth-and-death chain is reversible [Levin et al., 2009, Section 2.5], W(X, E_bd) is a reversible e-family.
We first recall Kolmogorov's characterization of reversibility, which will be instrumental in our argument. For E ⊂ X² such that (X, E) is a strongly connected directed graph, we write Γ(X, E) for the set of finite directed closed paths in the graph (X, E). Formally, we treat γ as a map [n] → E such that γ(t) = (x_t, x_{t+1}), with x_{n+1} = x_1, and we write |γ| = n for the length of the path. For each γ ∈ Γ(X, E), we also introduce the reverse closed path γ̄ ∈ Γ(X, E*), given by γ̄(t) = (x_{t+1}, x_t). Namely, if γ ∈ Γ(X, E), we can write γ informally as a succession of edges such that the starting and finishing states agree (i.e. as an element of E^n).
Note that γ is not necessarily a cycle, i.e. in our definition, multiple occurrences of the same point of the space are allowed.
We now extend the definition of reversibility to arbitrary irreducible functions F + (X , E ) (non-negative on X 2 and positive exactly on E ) based on Kolmogorov's criterion, and further introduce the concept of log-reversibility for F (X , E ), that considers sums instead of products.

Definition 4.1 (Reversible and log-reversible functions). We say that h ∈ F_+(X, E) is reversible when

∏_{t=1}^{|γ|} h(γ(t)) = ∏_{t=1}^{|γ|} h(γ̄(t)),

and that h ∈ F(X, E) is log-reversible when

Σ_{t=1}^{|γ|} h(γ(t)) = Σ_{t=1}^{|γ|} h(γ̄(t)),

for all finite directed closed paths γ ∈ Γ(X, E).
Remark: These definitions do not rely on connectedness properties of E per se, but we will assume irreducibility nonetheless. Observe that when h is represented by an irreducible row-stochastic matrix, the definitions of reversibility of h as a function and as a Markov operator coincide, by Kolmogorov's criterion (Theorem 4.1). Clearly, for h ∈ F(X, E), exp[h] being reversible is equivalent to h being log-reversible. We could endow the set of positive reversible functions on E with a group structure by considering the standard multiplicative operation on functions; we will choose, however (Lemma 5.1), to construct and focus on the vector space of log-reversible functions.
Theorem 4.2 (Characterization of reversible e-family). Let V (X , E ) be an irreducible efamily of Markov chains, with natural parametrization θ, generated by K and (g i ) i∈ [d] . The following two statements are equivalent.
(i) V(X, E) is a reversible e-family.
(ii) E* = E, and the carrier kernel K and the generator functions g_i, for all i ∈ [d], are log-reversible functions.
Proof. We apply Kolmogorov's criterion to an arbitrary family member P_θ. Let γ be some finite closed path in (X, E); reversibility of P_θ requires ∏_{t=1}^{|γ|} P_θ(γ(t)) = ∏_{t=1}^{|γ|} P_θ(γ̄(t)). Rewriting the left-hand side, and noting that the terms R_θ telescope along a closed path,

∏_{t=1}^{|γ|} P_θ(γ(t)) = exp( Σ_{t=1}^{|γ|} K(γ(t)) + Σ_{i=1}^d θ_i Σ_{t=1}^{|γ|} g_i(γ(t)) − |γ| ψ_θ ).

Proceeding in a similar way with the right-hand side, we obtain that reversibility of every member of the family is equivalent to

Σ_t K(γ(t)) + Σ_i θ_i Σ_t g_i(γ(t)) = Σ_t K(γ̄(t)) + Σ_i θ_i Σ_t g_i(γ̄(t)), for all θ ∈ Θ and γ ∈ Γ(X, E).

When K and the g_i are log-reversible, this equality is verified for any closed path, and every member of the family is therefore reversible. Conversely, taking θ = 0 yields the log-reversibility requirement for K, and further taking θ_i = δ_i(j) for j ∈ [d] similarly yields the requirement for g_j.
This path-checking approach, although mathematically convenient, is not algorithmically efficient. In order to determine whether a full-support kernel (or function) is reversible, the number of distinct Kolmogorov equations that must be checked is given in [Brill et al., 2018, Proposition 2.1], and corresponds to the maximal number of cycles (i.e. closed paths such that the only repeated vertices are the first and last one) in a complete graph over |X| nodes. Such a testing algorithm rapidly becomes intractable as |X| increases. However, for Markov kernels, we know that reversibility is equivalent to the detailed balance equation, which can be verified in (at most) polynomial time O(|X|³) by solving a linear system in order to find π. We show that this idea naturally extends to verifying reversibility of functions, enabling us to design an algorithm of time complexity O(|X|³).

Lemma 4.2. Let h ∈ F_+(X, E) with E symmetric, and let Π_h denote the PF projection of h. Then h is reversible if and only if Π_h • h is symmetric.

Proof. Treat h as the linear operator h : R^X → R^X. Suppose first that h is reversible. We apply PF theory, which guarantees that the following Cesàro averages converge [Meyer, 2000, Example 8.3.2] to a positive projection,

Π_h ≜ lim_{n→∞} (1/(n+1)) Σ_{k=0}^{n} (h/ρ(h))^k.

For any such cycle, it holds (perhaps vacuously if

Summing this equality over all possible paths in
In the case where h(x, x') = 0, the above equation holds by symmetry of E. For n ∈ N, appropriately rescaling both sides with the PF root, summing over all k ∈ {0, . . . , n} and taking the limit as n → ∞, (7) yields detailed balance equations with respect to the projection Π_h; in other words, reversibility of h implies symmetry of Π_h • h.
To prove the converse, we suppose now that this symmetry holds, with Π_h the PF projection of h. We know that rank(Π_h) = rank(v_h u_h^T) = 1, and that Π_h is positive. Consider some finite directed closed path γ. Rearranging products yields an identity whose first factor on the right-hand side vanishes, from the fact that rank-one functions are always reversible (Lemma 4.1). This concludes the proof of the lemma.
Notice that we can define π_h(x) ≜ u_h(x)/v_h(x), the positive entry-wise ratio of the PF eigenvectors. We can then restate Lemma 4.2 in terms of the familiar detailed balance equations π_h(x) h(x, x') = π_h(x') h(x', x).

Remark: when h is known to be reversible, one can compute π_h in O(|X|) operations by adapting the technique of Suomela [1979]; unfortunately, it is not possible to check for reversibility using this method. If the space becomes large, the reader can consider iterative (power) methods to compute the PF projector, potentially further reducing the verification time cost.

We end this section with a technical lemma that will allow us in later sections to swiftly compute expectations of functions under certain reversibility or skew-symmetry properties.

Lemma 4.3. Let P be irreducible, with associated edge measure matrix Q, and let g : X² → R (the following claims hold regardless of P being reversible).
Proof. Claim (iii) follows by property of edge measure Q.
From Corollary 4.1, claim (iii), and re-indexing, the first claim follows. The remaining claim then follows by re-indexing and symmetry of Q.
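The O(|X|³) reversibility test for positive functions can be sketched as follows (our own code, not from the paper); it uses the detailed-balance restatement of Lemma 4.2 with π_h = u_h/v_h, and the test matrices are illustrative.

```python
import numpy as np

def pf_right(A):
    """Positive right PF eigenvector of a positive matrix A."""
    w, V = np.linalg.eig(A)
    return np.abs(V[:, np.argmax(w.real)].real)

def is_reversible_function(h, tol=1e-9):
    """Check reversibility of a positive irreducible function h in O(|X|^3):
    with pi_h = u_h / v_h, h is reversible iff diag(pi_h) h is symmetric."""
    v = pf_right(h)       # right PF eigenvector
    u = pf_right(h.T)     # left PF eigenvector
    pi_h = u / v
    B = pi_h[:, None] * h
    return np.allclose(B, B.T, atol=tol)

# Reversible by construction: h(x, x') = s(x, x') mu(x') / mu(x), s symmetric,
# so that detailed balance holds with respect to mu squared.
mu = np.array([1., 2., 4.])
s = np.array([[1., 2., 3.], [2., 1., 1.], [3., 1., 2.]])
h_rev = s * mu[None, :] / mu[:, None]
print(is_reversible_function(h_rev))
# An asymmetric perturbation breaks Kolmogorov's criterion.
print(is_reversible_function(h_rev + np.triu(np.ones(3), 1)))
```

This replaces the exponential enumeration of Kolmogorov cycle equations with two eigenvector computations and one symmetry test.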

The e-family of reversible Markov kernels
In Section 5.1, we begin by analyzing the affine structure of the space of log-reversible functions, derive its dimension, construct a basis, and deduce that the manifold of all irreducible reversible Markov kernels forms an exponential family. The dimension of this family confirms the well-known fact that the number of free parameters for a reversible kernel is only about half of what is required for the general case, hence that reversible chains serve, in a sense, as a "natural intermediate" [Diaconis et al., 2006, Section 5] in terms of model complexity. In Section 5.2, we proceed to derive a systematic parametrization of the manifold W_rev(X, E), similar in spirit to the one given in Ito and Amari [1988], and in Nagaoka [2005, Example 1]. In Section 5.3, we connect our results to general differential geometry, and point out that the reversible kernels W_rev(X, E) form a doubly autoparallel submanifold in W(X, E). Finally, we conclude with a brief discussion on reversible geodesics (Section 5.4).

Affine structures
Identifying X with [m], we can endow the set with the natural order induced from N. In this section, we will henceforth assume that E is symmetric, and consider the following subsets of E:

T_0(E) ≜ {(x, x) ∈ E}, T_+(E) ≜ {(x, x') ∈ E : x < x'}, T(E) ≜ (T_0(E) ∪ T_+(E)) \ {(x⋆, m)}, where x⋆ ≜ min{x ∈ X : (m, x) ∈ E}.

We immediately observe that the cardinality relations

|T(E)| = |T_0(E)| + |T_+(E)| − 1 = (|E| + |T_0(E)|)/2 − 1    (8)

hold, and that from irreducibility, x⋆ ≠ m. The last expression in (8) highlights the fact that |T(E)| is independent of any ordering of the elements of X. Note also that the element (x⋆, m) in the definition of T(E) plays no special role, and could be replaced with any other element of T_0(E) ∪ T_+(E). We define the sets of symmetric and log-reversible functions (Definition 4.1) over the graph (X, E), respectively, by F_sym(X, E) and F_rev(X, E). We note that F_sym(X, E) is isomorphic to the vector space of symmetric matrices whose entries are null outside of E, thus dim F_sym(X, E) = |T_+(E)| + |T_0(E)|. We now show that F_rev(X, E) is also a vector space, and that it contains N(X, E) defined at (4).
Lemma 5.1. The following vector subspace inclusions hold: (i) N(X, E) ⊂ F_rev(X, E); (ii) F_rev(X, E) ⊂ F(X, E).

Proof. To verify (ii), we argue that F_rev(X, E) is closed under linear combinations by properties of the sum; the fact that the null function is trivially log-reversible concludes this claim. For (i), consider an element h ∈ N(X, E), so that h(x, x') = f(x') − f(x) + c for some (f, c); from Corollary 4.1, h ∈ F_rev(X, E), thus the inclusion holds. The set is closed under linear combinations by properties of sums again, and taking f = 0, c = 0 is allowed, whence claim (i).
Remark 5.1. It is then possible to further define the quotient space of reversible generator functions G_rev(X, E) ≜ F_rev(X, E)/N(X, E).
Theorem 5.1. The following statements hold.
(i) The set of reversible generators G rev (X , E ) can be endowed with a |T(E )|-dimensional vector space structure.
(ii) The set W rev (X , E ) of irreducible and reversible Markov kernels over (X , E ) forms an e-family of dimension dim W rev (X , E ) = |T(E )|.
Theorem 5.2. The family of functions g ij = δ i δ j + δ j δ i , for (i, j) ∈ T(E ), forms a basis of G rev (X , E ).
Proof. We begin by proving the independence of the family in the quotient space G rev (X , E ).
Since g_ij is symmetric in the sense that g_ij(x, x') = g_ij(x', x), it trivially verifies the log-reversibility property, and thus belongs to G_rev(X, E). Let now g ∈ G_rev(X, E) be such that g = Σ_{(i,j)∈T(E)} α_ij g_ij, with α_ij ∈ R for any (i, j) ∈ T(E), and suppose that g = 0 in G_rev(X, E). Our first step is to observe that necessarily g = 0 in F(X, E), i.e. g must be the null vector in the ambient space. Let us suppose for contradiction that there exists (f, c) ∈ R^X × R such that g(x, x') = f(x') − f(x) + c, and either c ≠ 0 or f is not constant over X. Since by definition (x⋆, m) ∉ T(E), we have g(m, x⋆) = g(x⋆, m) = 0; summing the latter equalities yields c = 0, thus f cannot be constant. But then g is both symmetric and skew-symmetric, which leads to a contradiction, and g = 0 in F(X, E). Since the family {g_ij : (i, j) ∈ T(E)} is independent in the ambient space F(X, E), the coefficients α_ij, (i, j) ∈ T(E), must be null, and as a result the family is also linearly independent in G_rev(X, E). Finally, since from Theorem 5.1, |T(E)| = dim G_rev(X, E), the family is maximally independent, hence constitutes a basis of the quotient vector space.
Remark 5.2. An alternative way of showing the linear independence of the family g ij : (i, j) ∈ T(E ) in Theorem 5.2 consists in verifying that (i) the family is independent in F sym , (ii) R ⊂ span g ij : (i, j) ∈ T(E ) , and then invoking (10).
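The independence argument can be checked numerically; the sketch below (our own, not from the paper) instantiates T(E) for the complete graph on m = 3 states (zero-indexed, with the pair (x⋆, m) = (0, 2) removed), vectorizes the g_ij together with a basis of N(X, E), and verifies that the stacked family has full rank, i.e. the g_ij are independent modulo N(X, E).

```python
import numpy as np
from itertools import combinations_with_replacement

m = 3
# T(E) on the complete graph: diagonal plus upper-triangular pairs,
# minus the single pair (x_star, m) = (0, 2) in zero-indexed terms.
T = list(combinations_with_replacement(range(m), 2))
T.remove((0, 2))

g_vecs = []
for i, j in T:                       # g_ij = delta_i delta_j + delta_j delta_i
    g = np.zeros((m, m))
    g[i, j] += 1.0
    g[j, i] += 1.0
    g_vecs.append(g.ravel())

# Basis of N(X, E): functions h(x, x') = f(x') - f(x) + c.
n_vecs = [np.ones(m * m)]            # the constant c
for k in range(m - 1):               # m - 1 independent difference functions
    f = np.zeros(m); f[k] = 1.0
    n_vecs.append((f[None, :] - f[:, None]).ravel())

M = np.array(g_vecs + n_vecs)
print(np.linalg.matrix_rank(M))      # full rank: len(T) + m
```

Full rank of the stacked matrix means no nonzero combination of the g_ij lies in N(X, E), which is exactly linear independence in the quotient G_rev(X, E); note also that len(T) recovers the dimension count m(m+1)/2 − 1.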

Parametrization of the manifold of reversible kernels
Recall from [Nagaoka, 2005, Example 1] that, in the complete graph case (E = X²), we can find an explicit parametrization of W(X, X²): picking any x⋆ ∈ X, one easily verifies the parametrization separately for the two cases x' = x⋆ and x' ≠ x⋆. In the remainder of this section, we show how to derive a similar parametrization for W_rev(X, E). We start by recalling the definition of the expectation parameter of an exponential family of kernels. For an e-family V_e, following the notation of Definition 2.2, we define

η_i(θ) ≜ Q_θ[g_i] ≜ Σ_{(x,x')∈E} Q_θ(x, x') g_i(x, x'),

and call η = (η_1, . . . , η_d) the expectation parameter of the family. We will first derive η, and later convert to the natural parameter θ using the following lemma.
Lemma 5.2. For a given exponential family, we can express the chart transition maps between the expectation and natural parameters, θ ∘ η⁻¹ and η ∘ θ⁻¹. Extending the notation of Lemma 4.3: (i) η_i(θ) = ∂ψ_θ/∂θ_i = Q_θ[g_i]; (ii) θ_i(η) = ∂φ(η)/∂η_i, where φ(η) ≜ Σ_i θ_i(η) η_i − ψ(θ(η)) is the potential function. In particular, when the carrier kernel verifies K = 0, the expression of the potential function simplifies.

Proof. It is well known that η_i(θ) = ∂ψ_θ/∂θ_i = Q_θ[g_i] [Hayashi and Watanabe, 2016, Lemma 5.1], [Nagaoka, 2005, Theorem 4], [Nakagawa and Kanaya, 1993, (28)]; therefore, we only need to show (ii). Let g_1, g_2, . . . , g_d be a collection of independent functions of G(X, E), and consider the exponential family as in Definition 2.2. Recall that for two transition kernels P_1, P_2, respectively irreducible over (X, E_1) and (X, E_2) with E_1 ⊂ E_2, and with stationary distributions π_1 and π_2, the information divergence of P_1 from P_2 is given by

D(P_1 ‖ P_2) ≜ Σ_{x∈X} π_1(x) Σ_{x'∈X} P_1(x, x') log ( P_1(x, x') / P_2(x, x') ).

Writing P_0 for P_θ when θ = 0,

D(P_θ ‖ P_0) = Σ_i θ_i η_i(θ) − ψ(θ) + ψ(0),

where for the last equality we used (i) of the present lemma and Lemma 4.3-(iii). Moreover, by a direct computation, the potential function is given by

φ(η) = Σ_i θ_i η_i − ψ(θ) = D(P_η ‖ P_0) − ψ(0).

By taking the derivative, we recover ∂φ(η)/∂η_i = θ_i(η) [Nagaoka, 2005, (17)]. Moreover, from (12), where for the last equality we used the fact that Q_η[∂K/∂η_i] = 0, and that P_η is stochastic, this finishes proving (ii) of the lemma.
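The information divergence rate appearing in the proof can be computed directly; the following sketch (our own code; the example kernels are illustrative, and we assume the support of P1 is contained in that of P2) implements D(P1 ‖ P2) = Σ_x π1(x) Σ_x' P1(x, x') log(P1(x, x')/P2(x, x')).

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an irreducible row-stochastic P."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def kl_rate(P1, P2):
    """Information divergence of P1 from P2 (supports: E1 contained in E2)."""
    pi1 = stationary(P1)
    mask = P1 > 0
    # ratio is set to 1 off the support of P1, so those terms contribute 0
    ratio = np.where(mask, P1 / np.where(mask, P2, 1.0), 1.0)
    return float(np.sum(pi1[:, None] * P1 * np.log(ratio)))

P1 = np.array([[0.2, 0.8], [0.5, 0.5]])
P2 = np.array([[0.6, 0.4], [0.3, 0.7]])
print(kl_rate(P1, P2), kl_rate(P1, P1))
```

As expected of a divergence, the rate is non-negative and vanishes exactly when the two kernels coincide on the support of P1.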
Theorem 5.3. Let P ∈ W_rev(X, E), with stationary distribution π. Using the basis g_ij = δ_i δ_j + δ_j δ_i, we can write Q, the edge measure matrix associated with P, as a member of the m-family of reversible kernels, where g⋆ = δ_m δ_{x⋆} + δ_{x⋆} δ_m, and we can write P as a member of the e-family, with P(x, x') = 0 when (x, x') ∉ E, and where x⋆ = min{x ∈ X : (m, x) ∈ E}.
Proof. Let us consider the basis g_ij = δ_i δ_j + δ_j δ_i and take K = 0; we are looking for a parametrization in which exp(ψ_θ) and exp[R_θ] are respectively the PF root and right PF eigenvector of the tilted matrix. We first derive a parametrization of the edge measure Q_η as a member of an m-family (following Definition 2.1-(ii), with respect to the expectation parameter η). For (i, j) ∈ X², by Lemma 5.2-(i), η_ij = Q_η[g_ij]; thus, from the symmetry of Q_η and since Q_η ∈ P(X²), we obtain an expression of Q_η in terms of η and the function g⋆ defined in the statement of the theorem. We differentiate with respect to η_ij for (i, j) ∈ T(E), to obtain ∂Q_η/∂η_ij = (g_ij − g⋆) / (2(1 + δ_i(j))).
Invoking (ii) of Lemma 5.2, we convert the expectation parametrization into a natural one. Notice that P_θ is symmetric, hence the right and left PF eigenvectors are identical, i.e. R_θ = L_θ, and as is known (see (6)), the stationary distribution is given by π_θ = exp[2R_θ]/∑_{x∈X} exp(2R_θ(x)).
In fact, we can easily verify that the right PF eigenvector is given by exp[R θ ] = √ π, and that the PF root is exp ψ θ = (P(m, x )P(x , m)) −1/2 .
Indeed, letting x ∈ X , from detailed balance of P, we have

The doubly autoparallel submanifold of reversible kernels
Recall that we can view W = W(X, E) as a smooth manifold of dimension d = dim W = |E| − |X|. For each P ∈ W, we can then consider the tangent space T_P at P, endowed with a d-dimensional vector space structure. Together with the manifold, we define an information geometric structure consisting of a Riemannian metric g, called the Fisher information metric, and a pair of torsion-free affine connections ∇(e) and ∇(m), respectively called the e-connection and the m-connection, that are dual with respect to g, i.e. for any vector fields X, Y, Z ∈ Γ(TW), where Γ(TW) is the set of all sections of the tangent bundle. We now review an explicit construction for g, ∇(m), ∇(e).

Construction in the natural chart map. Consider a parametric family
with Θ an open subset of R^d. For any n ∈ N, we define the path measure Q. Nagaoka [2005] defines the Fisher metric as and the dual affine e/m-connections of {P_θ : θ ∈ Θ} by their Christoffel symbols,

Autoparallelity. Connections allow us to talk about covariant derivatives and parallelism of vector fields.
Definition 5.1. A submanifold V is called autoparallel in W with respect to a connection ∇ when, for any vector fields X, Y ∈ Γ(TV), it holds that ∇_X Y ∈ Γ(TV).
A submanifold V of W is then an e-family (resp. m-family) if and only if it is autoparallel with respect to ∇ (e) (resp. ∇ (m) ) [Nagaoka, 2005, Theorem 6]. As the manifold of reversible kernels is both an e-family and an m-family, it is called doubly autoparallel [Ohara and Ishi, 2016, Definition 1].
Theorem 5.4. The manifold W_rev(X, E) of irreducible and reversible Markov chains over (X, E) is a doubly autoparallel submanifold in W(X, E) with dimension
Proof. The set of reversible Markov chains is an e-family (Theorem 5.1), and an m-family (Theorem 5.3).

Reversible geodesics
In this section, we consider two irreducible reversible kernels P_0 and P_1 over (X, E), and discuss the geodesics that connect them with respect to ∇(e) and ∇(m). Although this is already guaranteed (see for example Ohara and Ishi [2016, Proposition 1]), we offer alternative elementary proofs that any kernel lying on either e/m-geodesic is irreducible and reversible. m-geodesics. By irreducibility, there exist unique Q_0, Q_1 ∈ Q(X, E) corresponding to P_0, P_1. Moreover, by reversibility, Q_0 and Q_1 are symmetric. We let be the m-geodesic (autoparallel curve with respect to the m-connection) connecting P_0 and P_1. Then G_m(P_0, P_1) forms an m-family of dimension 1. For any ξ ∈ [0, 1], the matrix Q_ξ is symmetric as a convex combination of two symmetric matrices, and Q_ξ takes value 0 exactly when Q_0, Q_1, i.e. P_0, P_1, take value 0. Furthermore, writing π_0 (resp. π_1) for the unique stationary distribution of P_0 (resp. P_1), thus Q_ξ always defines a properly defined irreducible reversible stochastic kernel P_ξ.

e-geodesics.
We consider the autoparallel curve with respect to the e-connection that connects P_0 and P_1. The set G_e(P_0, P_1) forms an e-family of dimension 1. Indeed, from Theorem 4.2, and since P_0 and P_1 are reversible by hypothesis, it suffices to verify that (x, x′) → P_1(x, x′)/P_0(x, x′) is a reversible function over (X, E). This follows from a simple application of the Kolmogorov criterion (Theorem 4.1).
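Both geodesic constructions above can be traced numerically. The sketch below (helper names are ours; full support and irreducibility are assumed) mixes edge measures for the m-geodesic, and takes entrywise log-linear combinations rescaled back to stochastic matrices for the e-geodesic. For symmetric (hence reversible) endpoints, every intermediate kernel should satisfy detailed balance.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an irreducible stochastic matrix P."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmax(np.real(w))])
    return v / v.sum()

def rescale(H):
    """Stochastic rescaling s(H): divide out the Perron root and right eigenvector."""
    w, V = np.linalg.eig(H)
    i = np.argmax(np.real(w))
    rho, v = np.real(w[i]), np.real(V[:, i])
    v = v if v.sum() > 0 else -v  # Perron eigenvector, made positive
    return H * v[None, :] / (rho * v[:, None])

def m_geodesic(P0, P1, xi):
    """Mix edge measures Q_xi = (1 - xi) Q0 + xi Q1, then renormalize the rows."""
    Q = (1 - xi) * stationary(P0)[:, None] * P0 + xi * stationary(P1)[:, None] * P1
    return Q / Q.sum(axis=1, keepdims=True)

def e_geodesic(P0, P1, xi):
    """Entrywise P0^(1-xi) * P1^xi, rescaled back to a stochastic matrix."""
    return rescale(P0 ** (1 - xi) * P1 ** xi)
```

One can check that, for reversible endpoints, diag(π_ξ) P_ξ stays symmetric along both curves.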

Reversible information projections
Reversible Markov kernels, as self-adjoint linear operators, enjoy a set of powerful yet brittle spectral properties. Their eigenvalues are real, the second largest in magnitude controls the time to stationarity of the Markov process [Levin et al., 2009, Chapter 12], and all are stable under perturbation and estimation [Hsu et al., 2019]. However, any deviation from reversibility carries steep consequences: the spectrum can suddenly become complex, and one partially loses control over the mixing time. Furthermore, eigenvalue perturbation results that were dimensionless [Stewart, 1990, Corollary 4.10 (Weyl's inequality)] now come at a cost possibly exponential in the dimension [Stewart, 1990, Theorem 1.4 (Ostrowski-Elsner)]. For some irreducible P with stationary distribution π, it is therefore interesting to find the closest reversible representative, so as to enable Hilbert space techniques. Computing the closest reversible transition kernel with respect to a norm induced by an inner product was considered in Nielsen and Weber [2015], who showed that the problem reduces to solving a convex minimization problem with a unique solution.
In this section, we examine this problem under a different notion of distance. We consider information projections onto the reversible family of transition kernels W_rev(X, E′), for some symmetric edge set E′. We define the m-projection and the e-projection of P onto the set of reversible transition kernels W_rev(X, E′) respectively as P_m ≜ argmin_{P̄ ∈ W_rev(X, E′)} D(P̄ || P) and P_e ≜ argmin_{P̄ ∈ W_rev(X, E′)} D(P || P̄), where D(· || ·) is the informational divergence defined at (11). These two generally distinct projections (D is not symmetric in its arguments) correspond to the closest reversible chains when considering information divergence as a measure of distance. Under a careful choice of the connection graph of the reversible family, we derive closed-form expressions for P_m and P_e, along with Pythagorean identities, as illustrated in Figure 1.
Theorem 6.1. Let P be irreducible over (X , E ).
m-projection. The m-projection P_m of P onto W_rev(X, E ∪ E*), where E* ≜ {(x′, x) : (x, x′) ∈ E} denotes the reversed edge set, is given by
P_m = (P + P*)/2,
where P*(x, x′) = π(x′)P(x′, x)/π(x) is the time-reversal of P. Moreover, for any P̄ ∈ W_rev(X, E ∪ E*), P_m satisfies the following Pythagorean identity:
D(P̄ || P) = D(P̄ || P_m) + D(P_m || P).
e-projection. When E ∩ E* is a strongly connected directed graph, the e-projection P_e of P onto W_rev(X, E ∩ E*) is given by
P_e = s(P̂_e), with P̂_e(x, x′) = √(P(x, x′)P(x′, x)),
where s is the stochastic rescaling mapping defined at (3). Moreover, for any P̄ ∈ W_rev(X, E ∩ E*), P_e satisfies the following Pythagorean identity:
D(P || P̄) = D(P || P_e) + D(P_e || P̄).
Proof. Our first order of business is to show that P_m and P_e belong respectively to W_rev(X, E ∪ E*) and W_rev(X, E ∩ E*), where E* denotes the reversed edge set {(x′, x) : (x, x′) ∈ E}. It is easy to see that P_m(x, x′) > 0 exactly when (x, x′) or (x′, x) belongs to E, hence P_m ∈ W_rev(X, E ∪ E*), and that P_e(x, x′) > 0 whenever (x, x′) belongs to both E and E*. Moreover, since the time-reversal operation preserves the stationary distribution of an irreducible chain, P_m has the same stationary distribution π_m = π, and a straightforward computation shows that P_m satisfies the detailed balance equation. To prove reversibility of P_e, we rewrite log P_e(x, x′) = (1/2) log(P(x, x′)P(x′, x)) − log ρ(P̂_e) From Corollary 4.1, log[P_e] ∈ F_rev(X, E ∩ E*), thus P_e ∈ W_rev(X, E ∩ E*).
To prove optimality of P_m, it suffices to verify the following Pythagorean identity: D(P̄ || P) = D(P̄ || P_m) + D(P_m || P). Writing Q_m = diag(π)P_m, notice that P_m = (P + P*)/2 is equivalent to Q_m = (Q + Q⊤)/2. We then have where the last equality stems from (i) of Lemma 4.3 and reversibility of P_m and P̄. Similarly, to prove optimality of P_e, it suffices to verify that D(P || P̄) = D(P || P_e) + D(P_e || P̄). By reorganizing terms From the definition of P_e(x, x′), The first three terms being skew-symmetric, reversibility of P̄ and (ii) of Lemma 4.3 yield that Q̄[log(P/P_e)] = log ρ(P̂_e).
By a similar argument, Q e [log(P/P e )] = log ρ( P e ), which concludes the proof.
In other words, the m-projection is given by the natural additive reversiblization [Fill, 1991, (2.4)] of P, while the e-projection is achieved by a newly defined exponential reversiblization of P.
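In the full-support case, both reversiblizations reduce to a few lines of linear algebra. A minimal sketch (function names are ours; the spectral helpers assume irreducibility):

```python
import numpy as np

def perron(M):
    """Perron root and a positive right Perron eigenvector of a nonnegative irreducible matrix."""
    w, V = np.linalg.eig(M)
    i = np.argmax(np.real(w))
    v = np.real(V[:, i])
    return np.real(w[i]), (v if v.sum() > 0 else -v)

def stationary(P):
    """Stationary distribution: normalized left Perron eigenvector of P."""
    _, v = perron(P.T)
    return v / v.sum()

def m_projection(P):
    """Additive reversiblization P_m = (P + P*)/2, with P* the time-reversal of P."""
    pi = stationary(P)
    P_star = (P.T * pi[None, :]) / pi[:, None]  # P*(x,x') = pi(x') P(x',x) / pi(x)
    return 0.5 * (P + P_star)

def e_projection(P):
    """Exponential reversiblization: entrywise sqrt(P(x,x') P(x',x)), rescaled to stochastic."""
    H = np.sqrt(P * P.T)                        # symmetric, nonnegative
    rho, v = perron(H)
    return H * v[None, :] / (rho * v[:, None])  # stochastic rescaling s(H)
```

As in Remark 6.1, one can observe that `m_projection` preserves π and satisfies detailed balance, while `e_projection` returns a kernel that is reversible with respect to its own stationary distribution, generally different from π.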
The difference between the m-projection and the e-projection is illustrated in the following example.
Remark 6.1. We observe that, although the m-projection preserves the stationary distribution, this is not true for P_e, which exhibits a stationary distribution π_e generally different from π. Furthermore, while the solution for the m-projection is always properly defined by taking the union of the edge sets, our expression for the e-projection requires additional constraints on the connection graph of P. Indeed, taking the intersection of E with its reversed edge set, we always obtain a symmetric set, but we can lose strong connectedness. We note, but do not pursue, the fact that reversibility can be defined for the less well-behaved set of reducible chains. In this case, π need not be unique, it could take null values, and the kernel could have a complex spectrum.
Finally, we show that for any irreducible P, both its reversible projections P_m and P_e are equidistant from P and its time-reversal P* (see also Figure 1). Proposition 6.1 (Bisection property). Let P be irreducible, and let P_m (resp. P_e) be the m-projection (resp. e-projection) of P onto W_rev(X, E′).
Proof. For P 1 irreducible over (X , E 1 ) and P 2 irreducible over (X , E 2 ), it is easy to see that Then take P 2 = P m for the first equality, and P 1 = P e for the second.
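The bisection property is easy to observe numerically in the full-support case, using the divergence D(P_1 || P_2) = ∑ Q_1(x, x′) log(P_1(x, x′)/P_2(x, x′)) from (11). A hedged sketch (helper names and the test kernel are ours):

```python
import numpy as np

def perron(M):
    """Perron root and positive right Perron eigenvector of a nonnegative irreducible matrix."""
    w, V = np.linalg.eig(M)
    i = np.argmax(np.real(w))
    v = np.real(V[:, i])
    return np.real(w[i]), (v if v.sum() > 0 else -v)

def stationary(P):
    _, v = perron(P.T)
    return v / v.sum()

def divergence(P1, P2):
    """Information divergence D(P1 || P2) = sum_{x,x'} Q1(x,x') log(P1/P2)."""
    Q1 = stationary(P1)[:, None] * P1
    return np.sum(Q1 * np.log(P1 / P2))

def reversal(P):
    """Time-reversal P*(x,x') = pi(x') P(x',x) / pi(x)."""
    pi = stationary(P)
    return (P.T * pi[None, :]) / pi[:, None]

P = np.array([[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.25, 0.25]])
P_star = reversal(P)
P_m = 0.5 * (P + P_star)                   # m-projection (Theorem 6.1)
H = np.sqrt(P * P.T)
rho, v = perron(H)
P_e = H * v[None, :] / (rho * v[:, None])  # e-projection (Theorem 6.1)

# Bisection: both projections should be equidistant from P and its reversal.
lhs_m, rhs_m = divergence(P_m, P), divergence(P_m, P_star)
lhs_e, rhs_e = divergence(P, P_e), divergence(P_star, P_e)
```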

The e-family of reversible edge measures
Recall that P(X²), the set of all distributions over X², forms an e-family [Amari and Nagaoka, 2007, Example 2.8]. For some e-family of irreducible transition kernels V_e ⊂ W(X, X²), one may wonder whether the corresponding family of edge measures also forms an e-family of distributions in P(X²). We begin by illustrating that this holds in particular for the e-family obtained by tilting a memoryless Markov kernel.
Example 7.1. Consider the degenerate Markov kernel corresponding to an iid process, P(x, x′) = π(x′) for π ∈ P(X). For a given function g : X → R and θ ∈ R, construct P_θ(x, x′) = P(x, x′)e^{θg(x′)} = π(x′)e^{θg(x′)}. Then v_θ = 1 is a right eigenvector of P_θ with eigenvalue ρ(θ) = ∑_{x′∈X} π(x′)e^{θg(x′)}. Letting π_θ(x) = π(x)e^{θg(x)}/ρ(θ), we see that π_θ is the left PF eigenvector of P_θ, and the stationary distribution of the rescaled P_θ. We can therefore write, thus {Q_θ}_{θ∈Θ} forms an exponential family of distributions over X². This fact can be further understood in the following manner. An e-family of distributions {π_θ}_θ induces an e-family of memoryless Markov kernels {P_θ}_θ with P_θ(x, x′) = π_θ(x′) (see Lemma 8.1 for a proof of this fact for the set of all memoryless kernels), and thus with edge measures Q_θ(x, x′) = π_θ(x)π_θ(x′). Since the 2-iid extension {π_θ(x)π_θ(x′)}_θ of the e-family {π_θ}_θ is also an e-family, it follows that {Q_θ(x, x′)}_θ forms an e-family.
Figure 1: Information projections P_e and P_m of P onto W_rev(X, X²) in the full support case (E = X²) (Theorem 6.1), Pythagorean identities (Theorem 6.1), and the bisection property (Proposition 6.1).
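The computation of Example 7.1 can be reproduced in a few lines. A small sketch (names and toy values are ours): we tilt an iid kernel, rescale, and check that the edge measure is the product π_θ ⊗ π_θ, so that log Q_θ is affine in θ.

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])   # base distribution (toy values)
g = np.array([1.0, -0.5, 2.0])   # tilting function g : X -> R

def edge_measure(theta):
    """Edge measure of the rescaled tilt of P(x,x') = pi(x') by exp(theta g(x'))."""
    rho = np.sum(pi * np.exp(theta * g))       # PF root (the right eigenvector is constant)
    pi_theta = pi * np.exp(theta * g) / rho    # stationary distribution of the rescaled kernel
    P_theta = np.tile(pi_theta, (len(pi), 1))  # rescaled kernel: all rows equal pi_theta
    return pi_theta[:, None] * P_theta         # Q_theta(x,x') = pi_theta(x) pi_theta(x')

# Q_theta is the product distribution pi_theta (x) pi_theta:
Q = edge_measure(0.7)
pi_theta = Q.sum(axis=1)

# e-family check: the normalized exp of the midpoint of log Q_0 and log Q_1
# recovers Q_{1/2}, i.e. the family is log-affine in theta.
L = 0.5 * (np.log(edge_measure(0.0)) + np.log(edge_measure(1.0)))
M = np.exp(L)
M /= M.sum()
```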
In the remainder of this section, we show that the subset of positive reversible edge measures Q_rev = Q_rev(X, X²), induced from the e-family of reversible positive kernels, forms a submanifold of P(X²) that is autoparallel with respect to the e-connection, i.e. Q_rev is an e-family of distributions over pairs. Our proof will rely on the definition of a Markov map.
Definition 7.1 (e.g. Nagaoka [2017]). We say that M : P (X ) → P (Y ) is a Markov map, when there exists a transition kernel P M from X to Y (also called a channel) such that for any µ ∈ P (X ), Let U and V be smooth submanifolds (statistical models) of P (X ) and P (Y ) respectively. When there exists a pair of Markov maps M : P (X ) → P (Y ), N : P (Y ) → P (X ) such that their restrictions M| U , N| V are bijections between U and V, and are the inverse mappings of each other, we say that U and V are Markov equivalent, and write U ∼ = V.
Proof. Identify X = [m], and consider Q ∈ Q rev such that We flatten the definition of Q.
Let the matrix E with m(m − 1)/2 columns and m(m − 1) rows be such that,
Theorem 7.1. The set Q_rev forms an e-family and an m-family of P(X²) with dimension |X|(|X| + 1)/2 − 1. Moreover, Q does not form an e-family in P(X²) (except when |X| = 2).
Proof. Since Q_rev ⊂ P(X²), the claim stems from the equivalence between (i) and (ii) of Nagaoka [2017, Theorem 1], the application of Lemma 7.1, and a dimension count. In order to prove that Q is not an e-family in P(X²), we first construct the following family of edge measures over three states.
Computing the point on the e-geodesic in P(X²) at parameter value 1/2 yields which does not belong to Q. We can readily extend the above example to general state space size m > 3 by considering the one-padded versions of the above Q.
Remark 7.1.
(ii) We note but do not pursue here the fact that a more refined treatment over some irreducible edge set E X 2 is possible.

Comparison of remarkable families of Markov chains
We briefly compare the geometric properties of reversible kernels with those of several other remarkable families of Markov chains, and compile a summary in Table 1.
Family of all kernels irreducible over (X , E ): W (X , E ). This family is known to form both an e-family and an m-family of dimension |E | − |X | [Nagaoka, 2005, Corollary 1].
Family of all reversible kernels irreducible over (X, E): W_rev(X, E). We show in Theorem 5.1 and Theorem 5.4 that W_rev(X, E) is both an e-family and an m-family.
Family of positive memoryless (iid) kernels: W_iid(X, X²). This family comprises degenerate irreducible kernels that correspond to iid processes, i.e. where all rows are equal to the stationary distribution. Notice that for P ∈ W_iid, irreducibility forces P to be positive. We show that W_iid is an e-family of dimension |X| − 1 (Lemma 8.1), but not an m-family (Lemma 8.2).
Lemma 8.1. W_iid forms an e-family of dimension |X| − 1.
Proof. For X = [m], let us consider the following parametrization proposed by Ito and Amari [1988]: This corresponds to the basis with parameters Let P be irreducible with stationary distribution π. Suppose first that P is memoryless, i.e. for all x, x′ ∈ X, P(x, x′) = π(x′). In this case, for all i, j ∈ [m − 1], the coefficient θ_ij vanishes, and for all i ∈ [m − 1], it holds that θ_i = π(i)/π(m), so that we can write more simply Conversely, suppose now that θ_ij = 0 for all i, j ∈ [m − 1]. Then the matrix has rank one, the right PF eigenvector is constant, and P is memoryless. As a result, W_iid is the e-subfamily of W obtained by setting θ_ij = 0 for every i, j ∈ [m − 1].
Lemma 8.2. W iid does not form an m-family.
Proof. We prove the case |X| = 2 with p ≠ 1/2. Computing the corresponding edge measures, But then if we let , we see that the stationary distribution is π_{1/2} = (1/2, 1/2), and But for p ≠ 1/2, P_{1/2} does not belong to W_iid, hence the family is not an m-family. The proof can be extended to the more general X = [m], m > 2, by considering instead the two kernels defined by π_p = (p, 1 − p, 1, . . . , 1)/(m − 1) and π_{1−p}, for p ∈ (0, 1), p ≠ 1/2.
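The obstruction is easy to see numerically (toy value of p below is ours): mixing the edge measures of two binary iid kernels and renormalizing rows yields a kernel whose rows differ, hence it leaves W_iid.

```python
import numpy as np

def iid_kernel(pi):
    """Memoryless kernel with all rows equal to pi; its edge measure is pi (x) pi."""
    return np.tile(pi, (len(pi), 1))

p = 0.3  # any p in (0,1) with p != 1/2 works
pi_p, pi_q = np.array([p, 1 - p]), np.array([1 - p, p])

# Midpoint of the m-geodesic: mix the edge measures, then renormalize rows.
Q = 0.5 * (np.outer(pi_p, pi_p) + np.outer(pi_q, pi_q))
P_half = Q / Q.sum(axis=1, keepdims=True)
# The rows of P_half differ, so P_half is not memoryless: W_iid is not m-flat.
```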
For simplicity, in the remainder of this section, we mostly consider the full support case.
Family of positive doubly-stochastic kernels: W_bis(X, X²). Recall that a kernel P is said to be doubly stochastic, or bi-stochastic, when P and P⊤ are both stochastic matrices. In this case, the stationary distribution is always uniform. It is known that the set of doubly stochastic Markov chains forms an m-family of dimension (|X| − 1)² [Hayashi and Watanabe, 2016, Example 4]. However, as a consequence of Lemma 8.4, it does not form an e-family (except when |X| = 2).
Family of positive symmetric kernels: W_sym(X, X²). A Markov kernel is symmetric when P = P⊤, hence this family lies at the intersection of the reversible and doubly-stochastic families of Markov kernels, which are both m-families. This implies that symmetric kernels also form an m-family. In fact, Lemma 8.3 shows that the dimension of this family is |X|(|X| − 1)/2. Lemma 8.4, however, shows that W_sym only forms an e-family for |X| = 2.
Proof. To prove the claim, we rely on Definition 2.1-(ii) of a mixture family. Consider the functions s_0 : X² → R and s_ij : X² → R, for i, j ∈ X, i > j, such that for any It remains to show that the s_0, s_0 + s_ij, for i > j, are affinely independent, or equivalently, that the s_ij, for i > j, are linearly independent. Let s = ∑_{i>j} α_ij s_ij, with α_ij ∈ R for any i > j, be such that s = 0. For any i > j, taking x = i, x′ = j yields α_ij = 0, thus the family is independent, hence constitutes a basis, and the dimension is |{i, j ∈ X : i > j}| = |X|(|X| − 1)/2. (ii) The set W_bis does not form an e-family, unless |X| = 2.
Proof. We first treat the case |X| = 2 for (i) and (ii). Notice that for θ ∈ R, P_θ ∈ W_sym, and that the latter expression exhausts all irreducible symmetric chains. We can therefore write which follows the definition at (2) of an e-family with carrier kernel K = 0, generator g(x, x′) = δ_x(x′), natural parameter θ, R_θ = 0 and potential function ψ_θ = log(e^θ + 1).
Furthermore, for |X | = 2, it is easy to see that symmetric and doubly-stochastic families coincide, hence W bis is also an e-family.
We now prove (i) for |X| = 3. We will consider two positive symmetric Markov kernels P_0 and P_1, and look at the e-geodesic G_e(P_0, P_1) ∋ P_θ, where the map s, defined in (3), enforces stochasticity. The matrix P_θ = s(P̂_θ) is symmetric if and only if the right PF eigenvector of P̂_θ is constant. This, in turn, is equivalent to the row sums of P̂_θ being all equal. Consider the two symmetric kernels with free parameter α ≠ 1/3, and let us inspect the curve at parameter θ = 1/2. For P_{1/2} to be symmetric, it is necessary that whose unique solution is precisely α = 1/3. Invoking Nagaoka [2005, Corollary 3] finishes proving (i) for |X| = 3. We extend the proof to |X| ≥ 4 using the padding argument of Theorem 7.1, considering the padded kernels (1/m)[3P_i 1; 1 1], where 1 denotes all-ones blocks. Suppose for contradiction that (ii) is false, i.e. that bi-stochastic matrices form an e-family. Take then any e-geodesic between two arbitrary symmetric kernels. The latter operators being reversible, so is the geodesic. But then this curve must also be composed entirely of bi-stochastic matrices; a reversible bi-stochastic kernel is symmetric, hence the geodesic is symmetric, which contradicts (i).
Table 1 (|X| ≥ 3): We also include, for completeness, the one-dimensional manifolds defined by the e-geodesic G_e(P_0, P_1) and m-geodesic G_m(P_0, P_1) between two irreducible kernels P_0 and P_1, as defined in Section 5.4. Note that for the binary case |X| = 2, W_sym = W_bis forms an e-family.
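The failure of e-flatness for W_sym can be traced numerically. A sketch with our own stand-ins for the elided 3 × 3 kernels: take two symmetric kernels, form the entrywise geometric mean (the e-geodesic midpoint before normalization), rescale to stochastic, and observe that the resulting kernel is no longer symmetric.

```python
import numpy as np

def rescale(H):
    """Stochastic rescaling s(H) via the Perron root and right Perron eigenvector."""
    w, V = np.linalg.eig(H)
    i = np.argmax(np.real(w))
    rho, v = np.real(w[i]), np.real(V[:, i])
    v = v if v.sum() > 0 else -v
    return H * v[None, :] / (rho * v[:, None])

# Two symmetric (hence reversible, doubly stochastic) kernels on 3 states.
P0 = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
P1 = np.array([[0.1, 0.3, 0.6], [0.3, 0.4, 0.3], [0.6, 0.3, 0.1]])

# Midpoint of the e-geodesic: sqrt(P0 * P1) entrywise, rescaled to stochastic.
P_half = rescale(np.sqrt(P0 * P1))
# P_half is stochastic and reversible, but not symmetric: W_sym is not e-flat.
```

The unnormalized midpoint is symmetric, but its Perron eigenvector is not constant (its row sums differ), and the rescaling s destroys symmetry.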

Generation of the reversible family
In this final section, we consider the family of positive Markov kernels W = W (X , X 2 ) i.e. where the support E = X 2 . We first show that W rev is in a sense the smallest exponential family that contains W sym , the family of symmetric Markov kernels. Our notion of minimality relies on the following definition of the exponential hull of some submanifold of W.
where s is defined in (3).
Remark: When U = (1/|X|) 1 1⊤ ∈ V, the constraint ∑_{i=1}^k α_i = 1 is redundant. Indeed, since U corresponds to the origin in e-coordinates, the linear hull and the affine hull coincide in this case.
Proof. We begin by proving the inclusion e-hull(W_sym) ⊂ W_rev. Let P ∈ e-hull(W_sym); then there exist a positive P̂ ∈ F_+, k ∈ N, α_1, . . . , α_k ∈ R, and P_1, . . . , P_k ∈ W_sym such that log P̂(x, x′) = ∑_{i=1}^k α_i log P_i(x, x′). Observe that the function ∑_{i=1}^k α_i log P_i(x, x′) is symmetric in x and x′, thus log P̂ is a reversible function, and P is reversible.
We now prove the second inclusion W rev ⊂ e-hull(W sym ). We let H = span ({log[P] : P ∈ W sym } ∪ N ) .
Recall from Theorem 5.2 that the functions g_ij = δ_i δ_j + δ_j δ_i, for (i, j) ∈ T(X²), form a basis of the quotient space G_rev = F_rev/N. It therefore suffices to show that {g_ij : (i, j) ∈ T(X²)} ⊂ H. Introduce a free parameter t ∈ (0, 1), t ≠ 1/2, and let us fix (i, j) ∈ T+(X). Consider P_ij,t ∈ W_sym defined as follows and the functions ĥ_ij, ȟ_ij, with ĥ_ij = log |X| + log P_ij,t = a(δ_i δ_i + δ_j δ_j) + b(δ_i δ_j + δ_j δ_i), where for simplicity we wrote a = log 2(1 − t) and b = log 2t ≠ a. Since the function ((x, x′) → log |X|) belongs to N, we have ĥ_ij, ȟ_ij ∈ H. Notice that we can write hence also g_ij ∈ H. Introduce the function h_ij = (a ĥ_ij − b ȟ_ij)/(a² − b²) = δ_i δ_i + δ_j δ_j ∈ H, and observe that we can rewrite the identity I = 1 1⊤ − ∑_{(i,j)∈T+(X²)} g_ij, with 1 1⊤ being a constant function. It follows that I ∈ H, and for any j ∈ X, we can express As a result, {g_ij : (i, j) ∈ T(X²)} ⊂ H, and the theorem follows.
Secondly, we show that W rev is also the smallest mixture family that contains W iid , the family of Markov kernels that correspond to iid processes. For this, we define minimality in terms of a mixture hull.
Proof. Let P ∈ m-hull(W_iid); then the corresponding edge measure can be expressed as a linear combination ∑_{i=1}^k α_i Q_i, with k ∈ N, α_1, . . . , α_k ∈ R, and where each Q_i pertains to some degenerate iid kernel P_i = 1 π_i⊤. This implies that Q_i(x, x′) = π_i(x)π_i(x′), hence Q_i is symmetric. In turn, Q is symmetric, i.e. P is reversible, and m-hull(W_iid) ⊂ W_rev.
A direct computation yields that the pair probabilities of the iid process can be written as We first show that {Q_ij,0 : i ≥ j} forms a basis of F_sym. Let {α_ij ∈ R : i ≥ j} be such that ∑_{i≥j} α_ij Q_ij,0 = 0, and consider first x, x′ ∈ X such that x > x′.
By a similar argument for the case x < x′, we obtain that α_{xx′} = 0 for any x ≠ x′. Inspecting now the diagonal, for x ∈ X, This implies that the family {Q_ij,0 : i ≥ j} is independent. Since dim F_sym = |X|(|X| + 1)/2 = |{Q_ij,0 : i ≥ j}|, it is maximally so, thus forms a basis. However, the basis elements are not in W_iid. We therefore examine the case ε > 0, and leverage the property that in normed vector spaces, finite linearly independent systems are stable under small perturbations (see Lemma 9.1, reported below for convenience), in order to show the existence of a basis in W_iid.
Let us consider (F_sym, ‖·‖_{1,1}), the space of real symmetric matrices equipped with the entry-wise ℓ1 norm. For any i ≥ j and for any ε ∈ (0, 1), Let η be as defined in Lemma 9.1, with respect to the basis {Q_ij,0 : i ≥ j}, and choose 0 < ε < η/(5|X|²). Then ‖Q_ij,ε − Q_ij,0‖_{1,1} < η, thus the family {Q_ij,ε : i ≥ j} is also a basis for F_sym that lies in W_iid, whence the theorem.