Asymptotic dependency structure of multiple signals

We formalize the notion of the dependency structure of a collection of multiple signals, relevant from the perspective of information theory, artificial intelligence, neuroscience, complex systems and other related fields. We model multiple signals by commutative diagrams of probability spaces with measure-preserving maps between some of them. We introduce the asymptotic entropy (pseudo-)distance between diagrams, expressing how much two diagrams differ from an information-processing perspective. If the distance vanishes, we say that two diagrams are asymptotically equivalent. In this context, we prove an asymptotic equipartition property: any sequence of tensor powers of a diagram is asymptotically equivalent to a sequence of homogeneous diagrams. This sequence of homogeneous diagrams expresses the relevant dependency structure.


Introduction
According to usual modeling assumptions in information theory, a discrete signal is cut into a collection of long words of length n, whose particular representation is irrelevant (each word is considered as an atomic object without inner structure), and small errors are allowed. If the signal is modeled as a sequence of independent, identically distributed random variables, there is only one relevant quantity determining the signal, namely the entropy: the exponential growth rate of the number of typical words of length n. We elaborate on this point of view below in Sect. 1.1. Similarly, if one probes a measure-preserving dynamical system at a discrete sequence of times with a finite-output measurement device and counts measurement trajectories of length n, while discarding rarely appearing, untypical ones, one arrives at the notion of entropy of a system-measurement pair. Entropy, in this case, is the exponential growth rate of the number of typical trajectories with respect to the length n. The supremum of such entropies over varying measurement devices is the Kolmogorov-Sinai entropy of a measure-preserving dynamical system. According to a theorem of Ornstein [17], the entropy is the only invariant of the isomorphism classes of certain types of dynamical systems (Bernoulli shifts).
In information theory, but also in artificial intelligence, neuroscience and the theory of complex systems, one usually studies multiple signals at once. Likewise, a dynamical system is often observed with multiple measurement devices simultaneously. In these cases, one assumes in addition that the relations between the signals are essential. In this article we characterize, under these modeling assumptions, the relevant invariants in multiple signals that are obtained as i.i.d. samples from random variables. We explain this in more detail in Sects. 1.3 and 1.4.
We will now explain our point of view on entropy for a single signal, that is, for a single probability space.

Probability spaces and their entropy
First we consider a finite probability space X = (S, p), where S is a finite set and p is a probability measure on S. For simplicity, assume for now that the measure p has full support. Next, we consider the so-called Bernoulli sequence of probability spaces X^{⊗n} := (S^n, p^{⊗n}), n ∈ N, where S^n denotes the n-fold Cartesian product of S and p^{⊗n} is the n-fold product measure.
The entropy of X is the exponential growth rate of the observable cardinality of tensor powers of X. The observable cardinality, loosely speaking, is the cardinality of the set X^{⊗n} after the biggest possible subset of small measure has been removed. It turns out that the observable cardinality of X^{⊗n} might be much smaller than |S|^n, the cardinality of the whole of X^{⊗n}, in the following sense.
The Asymptotic Equipartition Property states that for every ε > 0 and every sufficiently large n one can find a so-called typical subset A_ε^{(n)} ⊂ S^n: a subset that takes up almost all of the mass of X^{⊗n} and on which the probability distribution is almost uniform on the normalized logarithmic scale, as stated in the following theorem, see [8].
Theorem 1.1 (Asymptotic equipartition property) Suppose X = (S, p) is a finite probability space. Then, for every ε > 0 and every sufficiently large n there exists a subset A_ε^{(n)} ⊂ S^n such that

(i) p^{⊗n}(A_ε^{(n)}) ≥ 1 − ε;
(ii) for every pair of points x, y ∈ A_ε^{(n)},

|1/n · ln p^{⊗n}(x) − 1/n · ln p^{⊗n}(y)| ≤ ε.

Moreover, if A_ε^{(n)} and B_ε^{(n)} are two subsets of X^{⊗n} satisfying the two conditions above, then their cardinalities satisfy

|1/n · ln |A_ε^{(n)}| − 1/n · ln |B_ε^{(n)}|| ≤ 2ε. (1)

The cardinality |A_ε^{(n)}| may be much smaller than |S|^n, but it will still grow exponentially with n. Even though there are generally many choices for such a set A_ε^{(n)}, in view of the property (1) in Theorem 1.1, the exponential growth rate with respect to n is well-defined up to 2ε.
The limit of the growth rate as ε → 0+ is called the entropy of X:

Ent(X) := lim_{ε→0+} lim_{n→∞} 1/n · ln |A_ε^{(n)}|. (2)

This point of view on entropy goes back to the original idea of Boltzmann [3,4], according to which entropy is the logarithm of the number of equiprobable states that a system, comprised of many identical weakly interacting subsystems, may take on. It was further developed and applied to Information Theory by Shannon [19], and in the context of dynamical systems by Kolmogorov and Sinai [12,13,21].
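The growth-rate definition of entropy is easy to probe numerically. The following sketch (a toy computation; the parameters and function names are our own, not from the paper) builds a typical set of a two-point space X = ({0, 1}, (0.7, 0.3)) and compares the exponential growth rate of its cardinality with Ent(X):

```python
from math import comb, log

p = 0.3
H = -(p * log(p) + (1 - p) * log(1 - p))  # Ent(X), roughly 0.611 nats

def typical_set_stats(n, eps):
    """Mass and cardinality of a typical set A_eps^(n): all sequences whose
    normalized log-probability is within eps of -Ent(X)."""
    mass, card = 0.0, 0
    for k in range(n + 1):  # k = number of ones; such sequences are equiprobable
        logp_per_symbol = (k * log(p) + (n - k) * log(1 - p)) / n
        if abs(-logp_per_symbol - H) <= eps:
            mass += comb(n, k) * p**k * (1 - p)**(n - k)
            card += comb(n, k)
    return mass, card

for n in (100, 1000):
    mass, card = typical_set_stats(n, eps=0.05)
    # the set carries most of the mass, and its growth rate is within eps of H
    print(n, mass, log(card) / n)
```

For n = 1000 the typical set already carries almost all of the mass while its growth rate stays within ε of Ent(X), even though the set is exponentially smaller than the full space of 2^1000 sequences.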
Entropy is especially easy to evaluate if the space is uniform, since for any finite probability space with the uniform distribution the observable cardinality is equal to the cardinality of the whole space and therefore

Ent(X) = ln |X|. (3)

For non-uniform spaces, the entropy can be evaluated by the well-known formula

Ent(X) = − Σ_{x∈S} p_X(x) ln p_X(x).
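As a sanity check of the two formulas above, here is a minimal implementation (the sample distributions are illustrative):

```python
from math import log

def ent(p):
    """Entropy (in nats) of a finite probability space given as {atom: weight}."""
    return -sum(w * log(w) for w in p.values() if w > 0)

uniform6 = {x: 1 / 6 for x in range(6)}
print(ent(uniform6))                          # ln 6, as for any uniform space
print(ent({"a": 0.5, "b": 0.25, "c": 0.25}))  # 1.5 ln 2
```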

Asymptotic equivalence
If X_1 and X_2 are probability spaces with the same entropy, there is a bijection between their typical sets of sequences of length n, for the plain reason that they can be chosen to have the same cardinality. It means that up to a change of code (of representation) and an error that becomes small as n gets large, the spaces X_1^{⊗n} and X_2^{⊗n} are equivalent. In the same sense, both X_1^{⊗n} and X_2^{⊗n} are equivalent to a uniform measure space with cardinality e^{n·Ent(X_i)}.
In [10], Gromov formalized this concept of asymptotic equivalence. With his definition, two Bernoulli sequences of measure spaces X_1^{⊗n} and X_2^{⊗n} are asymptotically equivalent if there exists an "almost-measure-preserving" "almost-bijection" between subsets of almost-full measure in them. Even though we were greatly influenced by ideas in [10], we found that Gromov's definition does not extend easily to situations in which multiple signals are processed at the same time, or when a dynamical system is probed with several measurement devices at once.

Diagrams of probability spaces
We model multiple signals by diagrams of probability spaces. By a diagram of probability spaces we mean a commutative diagram of probability spaces and measure-preserving maps between some of them. We will give a precise definition in Sect. 2.4, but will now consider particular examples of diagrams called two-fans.
A two-fan is a triple of probability spaces X = (X, p_X), Y = (Y, p_Y) and U = (U, p_U), and two measure-preserving maps π_X : U → X and π_Y : U → Y. We will restrict ourselves for now to the case in which the underlying set of U is the Cartesian product of the underlying sets of X and Y, U = X × Y, and π_X and π_Y are the ordinary projections. Such a situation arises, for example, when a complex dynamical system, such as a living cell or a brain, is observed via two measuring devices. Generalizing from the case of single signals, we might want to say that two two-fans are asymptotically equivalent if for large n there exist almost measure-preserving almost-bijections between the constituent spaces of their tensor powers. Without additional assumptions, asymptotic equivalence classes for two-fans would be completely determined by the entropies of the constituent spaces. However, such an asymptotic equivalence relation would be too coarse. Consider the three examples of two-fans shown in Fig. 1, which is to be interpreted in the following way. Each of the spaces X_i and Y_i, i = 1, 2, 3, has cardinality six and a uniform distribution, where the weight of each atom is 1/6. The spaces U_i have cardinality 12 and the distribution is also uniform, with all weights being 1/12. The support of the measure on the U_i's is colored grey in the pictures. The maps from U_i to X_i and Y_i are coordinate projections.

Fig. 1 Examples of pairs of probability spaces together with joint distributions
In view of Eq. (3), we have Ent(X_i) = Ent(Y_i) = ln 6 and Ent(U_i) = ln 12 for each i = 1, 2, 3. However, common information-processing techniques can still differentiate between the two-fans, by calculating solutions to information-optimization problems. This observation is sometimes expressed by saying that mutual information does not capture the full dependency structure that is relevant from an information-processing perspective. Information-optimization problems play an important role in information theory [25], causal inference [18], artificial intelligence [23], information decomposition [5], robotics [1], and neuroscience [9].
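Since Fig. 1 is not reproduced here, the following sketch builds two two-fans of the kind just described, with different uniform supports of cardinality 12 in a 6 × 6 grid (the particular supports are illustrative stand-ins for those in the figure), and checks that all constituent entropies, and hence the mutual information Ent(X) + Ent(Y) − Ent(U), coincide:

```python
from math import log
from collections import Counter

# two different supports, each with uniform marginals on both factors
cycle  = {(i, j) for i in range(6) for j in (i, (i + 1) % 6)}
blocks = {(i, j) for i in range(6) for j in range(6) if i // 2 == j // 2}

def ent(p):
    return -sum(w * log(w) for w in p.values() if w > 0)

def fan_entropies(support):
    """Entropies of the two-fan (X <- U -> Y) with U uniform on the support."""
    pU = {u: 1 / len(support) for u in support}
    pX, pY = Counter(), Counter()
    for (x, y), w in pU.items():
        pX[x] += w
        pY[y] += w
    return ent(pX), ent(pY), ent(pU)

for support in (cycle, blocks):
    hx, hy, hu = fan_entropies(support)
    print(hx, hy, hu, hx + hy - hu)  # ln 6, ln 6, ln 12 and ln 3 in both cases
```

Both fans have mutual information ln 3, yet their dependency structures differ: in the second support, U decomposes into three independent fully-occupied 2 × 2 blocks.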
The additional assumption that the relations between the signals are relevant and should be preserved by the asymptotic equivalence results in the requirement that the corresponding diagram of almost-bijections commutes. However, with this generalization of asymptotic equivalence to diagrams, we were not able to prove a corresponding Asymptotic Equipartition Property, or even to prove the transitivity of the relation.

The entropy distances and asymptotic equivalence for diagrams
Instead of finding an almost measure-preserving bijection between large parts of the two spaces, we consider a stochastic coupling (transportation plan, joint distribution) between a pair of spaces and measure its deviation from being an isomorphism of probability spaces (a measure-preserving bijection). Such a measure of deviation from being an isomorphism then leads to the notion of intrinsic entropy distance, and its stable version, the asymptotic entropy distance, as explained in Sect. 3. We say two sequences of diagrams are asymptotically equivalent if the asymptotic entropy distance between them vanishes.
The intrinsic entropy distance is an intrinsic version of a distance between random variables going by many different names, such as entropy distance, shared information distance and variation of information. It was reinvented many times by different people, among them Shannon [20], Kolmogorov, Sinai and Rokhlin. It appears in the proof of the theorem about generating partitions for ergodic systems by Kolmogorov and Sinai, see for example [22].
The intrinsic version of the entropy distance between probability spaces was introduced by Kovacevic et al. [14] and by Vidyasagar [24]. They showed that the involved minimization problem is NP-hard. Methods to find approximate solutions are discussed in [6,11].

Asymptotic equipartition property
With the notion of asymptotic equivalence induced by the asymptotic entropy distance, we can prove an asymptotic equipartition property for diagrams. Whereas the asymptotic equipartition property for single probability spaces states that high tensor powers of probability spaces can be approximated by uniform measure spaces, the Asymptotic Equipartition Property Theorem for diagrams, Theorem 6.1, states that sequences of successive tensor powers of a diagram can be approximated in the asymptotic entropy distance by a sequence of homogeneous diagrams.
Homogeneous diagrams have the property that the symmetry group acts transitively on the support of the measures of the constituent spaces. The two-fans shown in Fig. 1 are particular examples of homogeneous diagrams.
Homogeneous probability spaces are just uniform probability spaces, while homogeneous diagrams are, unlike homogeneous probability spaces, rather complex objects. Nonetheless, they seem to be simpler than arbitrary diagrams of probability spaces for the types of problems that we would like to address.
In a subsequent article we show that the optimal values in Information-Optimization problems only depend on the asymptotic class of a diagram and that they are continuous with respect to the asymptotic entropy distance; in many cases, the optimizers are continuous as well. The Asymptotic Equipartition Property implies that for the purposes of calculating optimal values and approximate optimizers, one only needs to consider homogeneous diagrams, and this can greatly simplify computations.
Summarizing, the Asymptotic Equipartition Property and the continuity of Information-Optimization problems are important justifications for the choice of asymptotic equivalence relation and the introduction of the intrinsic and asymptotic Kolmogorov-Sinai distances.

Definitions and results in random variable context
In this article, we use the language of probability spaces and their commutative diagrams rather than the language of random variables, because we often encounter situations in which their joint distributions are not defined, are variable, or even do not exist. Some relations between the probability spaces can be easily represented by commutative diagrams of probability spaces, such as by a diamond diagram, Sect. 2.5.5, while the description with random variables is complex and not easily interpretable. The diagrams also provide a geometric overview of various entropy identities and inequalities.
Since the language of random variables will be more familiar to many readers, we now present our main result in these terms.
For random variables X, Y, Z, etc., we denote by 𝒳, 𝒴, 𝒵 the target sets, and by X, Y, Z the probability spaces with the induced distributions.
In general, there is a correspondence between k-tuples of random variables and diagrams of a certain type, involving a space for every non-empty subset I ⊂ {1, . . ., k}.
For example, a pair of random variables X, Y (defined on the same probability space) gives rise to a two-fan, where X , Y and X × Y are the target spaces of the random variables X, Y and (X, Y) endowed with their respective laws (i.e. the pushforward of the probability measure).
However, not every type of diagram corresponds to a tuple of random variables. The entropy distance between two k-tuples of random variables is defined as the entropy distance between the corresponding diagrams, and a k-tuple is called homogeneous if the corresponding diagram is homogeneous. The latter condition is strictly stronger than the requirement that all the distributions are uniform.
It is difficult to formulate our main result, Theorem 6.1, in full generality using the language of random variables.However, the following theorem is an immediate corollary.
Theorem 1 Let (X(i) : i ∈ N) be a sequence of i.i.d. random k-tuples defined on a standard probability space. Define random k-tuples Y(n) by collecting the first n samples, Y(n) := (X(1), . . ., X(n)). Then, there exists a sequence of homogeneous random k-tuples H(n) whose entropy distance to Y(n) grows sublinearly in n.

Category of probability spaces and diagrams
In this section we present the basic setup used throughout the article. We will start by explaining how probability spaces and (equivalence classes of) measure-preserving maps between them form a category. This point of view on probability theory was already advocated in [2,10]. Category theory yields simple definitions of diagrams of probability spaces and morphisms between them and allows for precise and relatively short proofs. The setup is also convenient when couplings (joint distributions) between probability spaces are absent or variable.

Categories
Below we briefly review elementary category theory.We refer the reader to the first chapter of [15] for a more extensive introduction.
A category C is an abstract mathematical structure that captures the idea of a collection of spaces and structure-preserving maps between them, such as groups and homomorphisms, vector spaces and linear maps, and topological spaces and continuous maps. Categories consist of a collection of objects (which need not be sets), a collection of morphisms (which need not be maps), and a rule for composing morphisms.
More formally, a category consists of
- a class of objects Obj_C;
- a class of morphisms Hom_C(A, B) for every pair of objects A, B ∈ Obj_C; for a morphism f ∈ Hom_C(A, B) one usually writes f : A → B, the object A is called the domain and B the target of f, and we say that f is a morphism from A to B;
- for each triple of objects A, B and C, a binary, associative operation, called composition, ∘ : Hom_C(A, B) × Hom_C(B, C) → Hom_C(A, C);
- for every object A ∈ Obj_C, an identity morphism 1_A : A → A, with the property that for every f : A → B and every g : B → A one has f ∘ 1_A = f and 1_A ∘ g = g.

A morphism f : A → B is an isomorphism if there exists a morphism g : B → A such that g ∘ f = 1_A and f ∘ g = 1_B. Category theory becomes a very powerful tool when functors and their natural transformations are considered. Functors can be seen as homomorphisms between categories. In turn, natural transformations are homomorphisms between functors.
A (covariant) functor X : C → D between two categories C and D maps objects and morphisms in C to objects and morphisms in D, respectively. It satisfies the following additional properties: X(1_A) = 1_{X(A)} for every object A ∈ Obj_C, and X(g ∘ f) = X(g) ∘ X(f) for any pair of morphisms f : A → B and g : B → C.
A natural transformation between functors X, Y : C → D is a family η of morphisms in the category D, indexed by objects in C: for every A ∈ Obj_C there is a morphism η_A : X(A) → Y(A), such that for every morphism f : A → B the corresponding diagram commutes, that is, η_B ∘ X(f) = Y(f) ∘ η_A.

Probability spaces and reductions
We will now describe the category Prob. The objects in Prob are finite probability spaces. A finite probability space X is a pair (S, p), where S is a (not necessarily finite) set and p : 2^S → [0, 1] is a probability measure such that there is a finite subset of S with full measure. We denote by X := supp p the support of the measure and by |X| := |supp p_X| its cardinality. Slightly abusing the language, we call this quantity the cardinality of X. We will no longer explicitly mention that the probability spaces we consider are finite. We will also write p_X where we truly mean its density with respect to the counting measure.
We say that a map f : X → Y between two probability spaces X and Y is measure-preserving if the push-forward f_* p_X equals p_Y. This means that for every A ⊂ Y, p_X(f^{-1}(A)) = p_Y(A). We say that two measure-preserving maps f, g : X → Y are equivalent if they agree on a set of full measure. We call an equivalence class of measure-preserving maps from X to Y a reduction.
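The push-forward condition can be sketched in a few lines (the spaces and the map below are made up for illustration):

```python
from collections import Counter

pX = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}   # a probability space X = (S, p)
f = {0: "a", 1: "a", 2: "b", 3: "b"}    # a map f : X -> Y

def pushforward(p, f):
    """The push-forward measure f_* p on the target of f."""
    q = Counter()
    for x, w in p.items():
        q[f[x]] += w
    return dict(q)

# f is measure-preserving from (X, pX) to (Y, f_* pX) by construction:
# for every A in Y, pX(f^{-1}(A)) = (f_* pX)(A)
print(pushforward(pX, f))  # {"a": 0.5, "b": 0.5}
```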
The morphisms in the category Prob are exactly the reductions between finite probability spaces. At this stage one might want to check that Prob is indeed a category, and this is guaranteed as the composition of two reductions is again a reduction.

Isomorphisms, automorphisms and homogeneity
Now that we have organized probability spaces and reductions into a category, we get concepts such as isomorphism for free: two probability spaces X and Y are isomorphic in the category Prob if and only if there exists a measure-preserving bijection between the supports of the measures on X and Y. If X and Y are isomorphic, they have the same cardinality. The automorphism group Aut(X) is the group of all self-isomorphisms of X.
A probability space X is called homogeneous if the automorphism group Aut(X) acts transitively on the support X of the measure. For the category Prob, this turns out to be a complicated way of saying that the measure on X is uniform on its support, but when we consider diagrams later, there will be no such simple implication. Homogeneity is an isomorphism invariant and we will denote the subcategory of homogeneous spaces by Prob_h.
There is a product in Prob (which is not a product in the sense of category theory!) given by the Cartesian product of probability spaces, which we denote by X ⊗ Y. For a pair of reductions f_i : X_i → Y_i, i = 1, 2, the product f_1 ⊗ f_2 : X_1 ⊗ X_2 → Y_1 ⊗ Y_2 is equal to the class of the Cartesian product of maps representing the f_i's. The product leaves the subcategory of homogeneous spaces invariant. If one of the factors in the product is replaced by an isomorphic space, then the product stays in the same isomorphism class.
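A quick sketch of this product; as a check, entropy is additive under it, a fact recalled later in the entropy section (the distributions are illustrative):

```python
from math import log

def tensor(p, q):
    """Product X ⊗ Y: Cartesian product of the sets with the product measure."""
    return {(x, y): wx * wy for x, wx in p.items() for y, wy in q.items()}

def ent(p):
    return -sum(w * log(w) for w in p.values() if w > 0)

pX = {"a": 0.5, "b": 0.5}
pY = {0: 0.25, 1: 0.75}
pXY = tensor(pX, pY)
print(len(pXY), abs(ent(pXY) - ent(pX) - ent(pY)) < 1e-9)  # 4 True
```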
We close this section with a technical remark. The category Prob is not a small category. However, it has a small full subcategory that contains an object for every isomorphism class in Prob, contains all the available morphisms between every pair of its objects, and is closed under the product. From now on we imagine that such a subcategory was chosen and fixed and replaces Prob in all considerations below.

Diagrams of probability spaces
Essentially, a diagram X = {X_i; f_{ij}} is a commutative diagram in Prob consisting of a finite number of probability spaces and reductions between some of them. We have seen an example, the two-fan diagram, in the introduction. We require the diagram to be commutative, that is, f_{jk} ∘ f_{ij} = f_{ik} whenever both sides are defined. We need to keep track of the combinatorial structure of the collection of reductions within a diagram. There are several possibilities for doing so:
- the reductions form a directed, acyclic graph which is transitively closed;
- the spaces in the diagram form a poset;
- the underlying combinatorial structure could be recorded as a finite category.
The last option seems to be most convenient, since it has many operations that are necessary for our analysis already built-in. Besides, we need at times to iterate the construction of commutative diagrams, to create diagrams of diagrams, which is readily available in the category-theory framework but is cumbersome in the other contexts.
A (finite) poset category G is a finite category such that for every two objects O_1 and O_2 there is at most one morphism between them in either direction. For instance, the poset category Λ_2 is a category with three objects {O_1, O_12, O_2} and two non-identity morphisms O_12 → O_1 and O_12 → O_2. A two-fan is then a diagram indexed by Λ_2: we assign to each object in Λ_2 a probability space and to each morphism in Λ_2 a reduction.
In general, then, a diagram of probability spaces indexed by a poset category G is a functor X : G → Prob. The requirement that X is a functor and not just a map between objects and morphisms (combined with the assumption that there is at most one morphism between objects) is exactly the requirement that the diagrams should be commutative.
The collection of all diagrams of probability spaces indexed by a fixed poset category G forms the so-called category of functors Prob^G. The objects of Prob^G are diagrams, that is, functors from G to Prob, while morphisms in Prob^G are natural transformations between them. We will refer to the morphisms in Prob^G as reductions as well.
Let us go through the simple example of two-fans: we look at a reduction η : X → Y between two-fan diagrams X, Y : Λ_2 → Prob. The reduction η, being a natural transformation between X and Y, is illustrated by a commutative diagram. Thus, a reduction of a two-fan is a family of reductions of probability spaces indexed by the objects in the poset category Λ_2 such that the diagram commutes.
For a diagram X ∈ Prob^G, the poset category G will be called the combinatorial type of X. For a poset category G or a diagram X ∈ Prob^G we denote by [[G]] the number of objects in the category G.
An object O in a poset category G will be called a source, if it is not a target of any morphism except for the identity.Likewise a sink object is not a domain of any morphism, except for the identity morphism.If a category contains a unique source object, the object is called the initial object and such a category will be called complete.
The above terminology transfers to diagrams indexed by G: a source space in X ∈ Prob^G is one that is not a target space of any reduction within the diagram, a sink space is not the domain of any non-trivial reduction, and X is called complete if G is, i.e. if it has a unique source space.
The tensor product of probability spaces extends to a tensor product of diagrams.
The construction of the category of commutative diagrams could be applied to any category, not just Prob.Two additional cases will be of interest to us.
Denote by Set the category of finite sets and surjective maps. Then all of the above constructions can be repeated for sets instead of probability spaces. Thus we can talk about the category of diagrams of sets Set^G.
Given a reduction f : X → Y between two probability spaces, the restriction f : X → Y to the supports is a well-defined surjective map. Given a diagram X = {X_i; f_{ij}} of probability spaces, there is an underlying diagram of sets, obtained by taking the supports of the measures on each level and restricting the reductions to these supports. We will denote it by X = {X_i; f_{ij}}, where X_i := supp p_{X_i}. Thus we have a forgetful functor from Prob^G to Set^G. We could also repeat the construction of commutative diagrams to form a category of diagrams of diagrams. Thus, given two poset categories G and H, we can form a category Prob^{G,H} := (Prob^G)^H. We will rarely need anything beyond a two-fan of diagrams. There is a natural isomorphism Prob^{G,H} ≅ Prob^{H,G}. Thus, for example, a two-fan of G-diagrams can be equivalently considered as a G-diagram of two-fans, see also Sect. 2.5.3.

Examples of diagrams
We now consider some examples of poset categories and corresponding diagrams, that will be important in what follows.

Singleton
We denote by • the poset category with a single object. Clearly, diagrams indexed by • are just probability spaces and we have Prob ≡ Prob^•.

Chains
The chain C_n is the poset category with n objects {O_1, . . ., O_n} and a single morphism between each comparable pair of objects, so that a diagram indexed by a chain is a sequence of n probability spaces connected by consecutive reductions.

Two-fan
The two-fan Λ_2 is a category with three objects {O_1, O_12, O_2} and two non-identity morphisms O_12 → O_1 and O_12 → O_2. A diagram indexed by a two-fan will also be called a two-fan.
Essentially, a two-fan (X ← Z → Y ) is a triple of probability spaces and a pair of reductions between them.
A reduction of a two-fan is illustrated in Fig. 2a. For any two-fan (X ← Z → Y) of probability spaces there always exists a unique (up to isomorphism) minimal two-fan (X ← Z′ → Y) that can be included in the diagram shown in Fig. 2b. The minimization can be constructed by taking Z′ := X × Y as a set and considering the probability distribution on Z′ induced by the map Z → Z′ that is the Cartesian product of the reductions Z → X and Z → Y in the original two-fan. Thus, the inclusion of a pair of probability spaces X and Y as sink vertices in a minimal two-fan is equivalent to specifying a joint distribution on X × Y.
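The minimization can be sketched directly: push the measure on Z forward along the pair of reductions to the product of the underlying sets (the fan below is illustrative):

```python
from collections import Counter

pZ = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}   # top space Z of a two-fan
piX = {0: "x0", 1: "x0", 2: "x1", 3: "x1"}  # reduction Z -> X
piY = {0: "y0", 1: "y0", 2: "y1", 3: "y0"}  # reduction Z -> Y

def minimize(pZ, piX, piY):
    """Measure on X x Y induced by z -> (piX(z), piY(z))."""
    q = Counter()
    for z, w in pZ.items():
        q[(piX[z], piY[z])] += w
    return dict(q)

# atoms of Z with the same image pair are merged, so the minimal top space
# is exactly a joint distribution on X x Y
print(minimize(pZ, piX, piY))
```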
Note that minimality of a two-fan is defined in purely categorical terms. Even though the definition applies to two-fans of morphisms in any category, the minimization need not exist. However, as the next proposition asserts, if the minimization of any two-fan exists in a category C, then it also exists in the category of diagrams over C.

Proposition 2.1 Let G be a poset category, and let (X ← Z → Y) be a two-fan of G-diagrams. It is minimal if and only if the constituent two-fans of probability spaces (X_i ← Z_i → Y_i) are minimal.

The proof of Proposition 2.1 can be found on page 274.

Co-fan
A co-fan V is a category with three objects and two non-identity morphisms pointing from the two source objects into the common sink object.

Fig. 3 Diamond and two-tents categories: a diamond category, b two-tents category

A diamond diagram
A "diamond" diagram is indexed by a diamond category that consists of a two-fan and a co-fan, as shown in Fig. 3a.
Of course, there is also a morphism O_12 → O_•, which lies in the transitive closure of the given four morphisms. We will often skip writing morphisms that are implied by the transitive closure.
A diamond diagram will be called minimal if the top two-fan in it is minimal.

"Two-tents" diagram
The "two-tents" category M_2 consists of five objects, of which two are sources and three are sinks, with morphisms as in Fig. 3b. Thus, a typical two-tents diagram consists of five probability spaces U, V, X, Y and Z with reductions U → X, U → Y, V → Y and V → Z. The probability spaces U and V are sources and X, Y and Z are sinks.

Full diagram
The full category Λ_n on n objects is a category with objects {O_I} indexed by all non-empty subsets I ∈ 2^{{1,...,n}} \ {∅} and a morphism from O_I to O_J whenever J ⊆ I.
A diagram X indexed by a full category will be called minimal if, for every two-fan in it, it also contains a minimal two-fan with the same sink vertices. If X ∈ Prob^{Λ_n} is a minimal full diagram of probability spaces, then the set X(O_I) can be considered as a subset of the product ∏_{i∈I} X(O_i), while the reductions are just coordinate projections.
For an n-tuple of random variables X_1, . . ., X_n one may construct a minimal full diagram X ∈ Prob^{Λ_n} by considering all joint distributions and "marginalization" reductions. We denote such a diagram by ⟨X_1, . . ., X_n⟩. On the other hand, the reductions from the initial space to the sink vertices of a full diagram can be viewed as random variables on the domain of definition given by the (unique) initial space. Suppose X ∈ Prob^{Λ_n} is a minimal full diagram with sink vertices X_1, . . ., X_n. It is convenient to view X as a distribution on the Cartesian product of the underlying sets of the sink vertices, that is, as an element of Δ(X_1 × · · · × X_n), where ΔS stands for the space of all probability distributions on a finite set S.
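The passage from a joint distribution to the minimal full diagram is just marginalization over all non-empty subsets of coordinates; a sketch for n = 3 (the joint distribution is illustrative):

```python
from collections import Counter
from itertools import chain, combinations

joint = {(0, 0, 0): 0.5, (1, 1, 0): 0.25, (1, 0, 1): 0.25}  # law of (X1, X2, X3)

def marginal(p, I):
    """Push-forward onto the coordinates listed in I."""
    q = Counter()
    for atom, w in p.items():
        q[tuple(atom[i] for i in I)] += w
    return dict(q)

n = 3
subsets = chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))
diagram = {I: marginal(joint, I) for I in subsets}  # one space X(O_I) per subset

# the reductions O_I -> O_J for J ⊆ I are the corresponding coordinate projections
print(len(diagram))     # 7 spaces, one per non-empty subset of {1, 2, 3}
print(diagram[(0, 1)])  # the joint law of (X1, X2)
```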
Once the underlying sets of the sink spaces are fixed, there is a one-to-one correspondence between the full minimal diagrams and distributions as above.
As a corollary of Proposition 2.1 we also obtain the corresponding characterization of minimal full diagrams of G-diagrams of probability spaces.

Constant diagrams
Suppose X is a probability space and G is a poset category. One may form a constant G-diagram by considering the functor that maps all objects in G to X and all the morphisms to the identity morphism Id : X → X. We denote such a constant diagram by X_G, or simply by X when G is clear from the context. Any constant diagram is automatically minimal.
If Y = {Y_i; f_{ij}} is another G-diagram, then a reduction ρ : Y → X_G (which we sometimes write simply as ρ : Y → X) is a collection of reductions ρ_i : Y_i → X that commute with the reductions in Y. Let X = {X_i; f_{ij}} be a complete diagram with the initial space X_0. Then there is a canonical reduction (X_0)_G → X with components f_{0i} : X_0 → X_i. By {•} we denote a one-point probability space. The constant G-diagram {•}_G is a unit with respect to the product in Prob^G.

Homogeneous diagrams
A diagram X ∈ Prob^G indexed by some poset category G is called homogeneous if its automorphism group Aut(X) acts transitively on every probability space in X. Three examples of homogeneous diagrams were given in the introduction. The subcategory of all homogeneous diagrams indexed by G will be denoted Prob_h^G.
In fact, for X to be homogeneous it is sufficient that Aut(X) acts transitively on every source space in X. Thus, if X is complete with initial space X_0, to check homogeneity it is sufficient to check the transitivity of the action of the symmetries of X on X_0.
Any subdiagram of a homogeneous diagram is also homogeneous. In particular, all the individual spaces of a homogeneous diagram are homogeneous. However, homogeneity of the whole diagram is a stronger property than homogeneity of the individual spaces in the diagram; thus, in general, a diagram of homogeneous spaces need not be homogeneous. Two examples of non-homogeneous two-fans are shown in Fig. 4. The pictures are to be interpreted in the same way as the pictures in Fig. 1.
A single probability space is homogeneous if and only if there is a representative in its isomorphism class with uniform measure, and the same holds true for chain diagrams, for the co-fan, or for any other diagram that does not contain a two-fan. However, for more complex diagrams, for example for two-fans, no such simple description is available.

Universal construction of homogeneous diagrams
Examples of homogeneous diagrams can be constructed in the following manner. Suppose Γ is a finite group and {H_i} is a collection of subgroups. Consider the collection of sets X_i := Γ/H_i and the natural surjection f_{ij} : X_i → X_j whenever H_i is a subgroup of H_j. Equipping each X_i with the uniform distribution, one can turn the diagram of sets {X_i; f_{ij}} into a homogeneous diagram of probability spaces. It will be complete if there is a smallest subgroup (under inclusion) among the H_i's.
Such a diagram will be complete and minimal if, together with any pair of subgroups H_i and H_j in the collection, their intersection H_i ∩ H_j also belongs to the collection {H_i}.
In fact, any homogeneous diagram arises this way. Suppose the diagram X = {X_i; f_{ij}} is homogeneous; then we set Γ = Aut(X), choose a collection of points x_i ∈ X_i such that f_{ij}(x_i) = x_j, and denote H_i := Stab(x_i) ⊂ Γ. Then, if one applies the construction of the previous paragraph to Γ with the collection of subgroups {H_i}, one recovers the original diagram X up to isomorphism.
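A small worked instance of this construction, with Γ = Z/12 and the nested subgroups H_1 = {0, 6} ⊂ H_2 = {0, 3, 6, 9} (our own choice, for illustration):

```python
GAMMA = range(12)      # the cyclic group Z/12
H1 = {0, 6}
H2 = {0, 3, 6, 9}      # H1 is a subgroup of H2

def coset(g, H):
    return frozenset((g + h) % 12 for h in H)

X1 = {coset(g, H1) for g in GAMMA}  # Gamma/H1: 6 cosets, uniform weight 1/6
X2 = {coset(g, H2) for g in GAMMA}  # Gamma/H2: 3 cosets, uniform weight 1/3

def f12(c):
    """Natural surjection Gamma/H1 -> Gamma/H2."""
    return coset(min(c), H2)

# every coset of H2 is covered by exactly |H2|/|H1| = 2 cosets of H1,
# so f12 is measure-preserving for the uniform measures
fibre_sizes = {c2: sum(1 for c1 in X1 if f12(c1) == c2) for c2 in X2}
print(len(X1), len(X2), sorted(fibre_sizes.values()))  # 6 3 [2, 2, 2]
```

The translation action of Γ on the cosets is transitive on every level and commutes with f12, which is exactly the homogeneity of the resulting chain diagram of probability spaces.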

Conditioning
Suppose a diagram X contains a fan F = (X ← Z → Y) with reduction π : Z → X. Given a point x ∈ X with a non-zero weight, one may consider the conditional probability distribution p_Z(·|x) on Z, given by p_Z(z|x) = p_Z(z)/p_X(x) if π(z) = x, and p_Z(z|x) = 0 otherwise. Recall that if F is minimal, the underlying set of Z can be assumed to be the product X × Y. In that case the conditional distribution on Y is given by p_Y(y|x) = p_Z(x, y)/p_X(x), and we denote the corresponding space by Y|x. Under some assumptions it is possible to condition a whole sub-diagram of X. More specifically, if a diagram X contains a sub-diagram Y and a probability space X satisfying the condition that there exists a space Z in X that reduces to all the spaces in Y and to X, then we may condition the whole of Y on x ∈ X, given that p_X(x) > 0.
For x ∈ X with positive weight we denote by Y|x the diagram of spaces in Y conditioned on x ∈ X. The diagram Y|x has the same combinatorial type as Y and will be called the slice of Y over x ∈ X. Note that the space X itself may or may not belong to Y. The conditioning Y|x may depend on the choice of a fan between Y and X; however, when X is complete, the conditioning Y|x is well-defined and independent of the choice of fans.
Suppose now that there are two sub-diagrams Y and Z in X and that, in addition, Z is a constant diagram, Z = Z_G for some poset category G. Let z ∈ Z; then Y|z is well defined and is independent of the choice of the space in Z in which the element z is considered.
If X is homogeneous, then Y | x is also homogeneous and its isomorphism class does not depend on the choice of x ∈ X .

Entropy
We define entropy by the limit in Eq. (2). Entropy satisfies the so-called Shannon inequality, see for example [8]: for any minimal diamond diagram with top space Z, sink spaces X and Y, and bottom space W the following inequality holds,
Ent(Z) + Ent(W) ≤ Ent(X) + Ent(Y). (5)
Furthermore, entropy is additive with respect to the tensor product, that is, for a pair of probability spaces X, Y ∈ Prob,
Ent(X ⊗ Y) = Ent(X) + Ent(Y). (6)
Conditional entropy Ent(X|Y) is defined for a pair X, Y of probability spaces included in a minimal two-fan (X ← Z → Y) by Ent(X|Y) := Ent(Z) − Ent(Y). The above quantity is always non-negative in view of Shannon inequality (5). Moreover, the following identity holds, see [8],
Ent(X|Y) = ∫_Y Ent(X|y) dp_Y(y). (7)
For a G-diagram X = {X_i; f_ij} define the entropy homomorphism Ent_* : Prob G → R^[[G]] by Ent_*(X) := (Ent(X_i))_{i∈[[G]]}. It will be convenient for us to equip the target R^[[G]] with the ℓ1-norm; thus ‖Ent_*(X)‖₁ = Σ_i Ent(X_i). If X is a complete G-diagram with initial space X_0, then by Shannon inequality (5) there is the obvious estimate ‖Ent_*(X)‖₁ ≤ |[[G]]| · Ent(X_0).
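The entropy identities above can be sanity-checked numerically. The sketch below uses an arbitrary joint distribution as the top space of a hypothetical minimal two-fan and natural logarithms throughout; it is an illustration, not the paper's formalism.

```python
import math

# Numerical check of additivity (6), non-negativity of Ent(X|Y) and the
# integral identity (7); the joint distribution is an arbitrary example.
def H(p):
    """Entropy (natural logarithm) of a distribution given as a dict."""
    return -sum(w * math.log(w) for w in p.values() if w > 0)

joint = {('a', 0): 0.4, ('a', 1): 0.1, ('b', 0): 0.2, ('b', 1): 0.3}
pX, pY = {}, {}
for (x, y), w in joint.items():
    pX[x] = pX.get(x, 0) + w
    pY[y] = pY.get(y, 0) + w

# Conditional entropy via the minimal two-fan with top space the joint:
# Ent(X|Y) = Ent(Z) - Ent(Y), non-negative by the Shannon inequality.
H_X_given_Y = H(joint) - H(pY)

# Integral formula: Ent(X|Y) = sum_y p_Y(y) * Ent(X|y).
H_integral = sum(pY[y] * H({x: joint[(x, y)] / pY[y] for x in pX}) for y in pY)

# Additivity: Ent(X (x) Y) = Ent(X) + Ent(Y) for the tensor product.
product = {(x, y): pX[x] * pY[y] for x in pX for y in pY}
```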

The entropy distance
We turn the space of diagrams into a pseudo-metric space by introducing the intrinsic entropy distance and asymptotic entropy distance.The intrinsic entropy distance is obtained by taking an infimum of the entropy distance over all possible joint distributions on two probability spaces.

Entropy distance in the case of single probability spaces
For a two-fan F = (X ← Z → Y ) define a "distance" kd(F ) between probability spaces X and Y with respect to F by kd(F If a two-fan F satisfies kd(F ) = 0, then both reductions in F are isomorphisms.Thus, essentially kd(F ) is some measure of the deviation of the statistical map defined by F from being a deterministic bijection between X and Y .
Passing to the minimal reduction of a two-fan does not increase kd. For a pair of probability spaces X, Y define the intrinsic entropy distance as
k(X, Y) := inf {kd(F) : F = (X ← Z → Y)}, (9)
where the optimization takes place over all two-fans with sink spaces X and Y. In view of inequality (8) one could as well optimize over the space of minimal two-fans, which we will also refer to as couplings between X and Y. The tensor product of X and Y trivially provides a coupling, and the set of couplings is compact; therefore an optimum is always achieved and is finite.
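As a small illustration, assuming that for a minimal two-fan kd equals Ent(Z|X) + Ent(Z|Y), which rewrites as 2·Ent(Z) − Ent(X) − Ent(Y), one can compare two couplings of a fair coin with itself; the diagonal coupling witnesses k(X, X) = 0.

```python
import math

# Sketch comparing two couplings of a fair coin with itself, assuming
# that for a minimal two-fan (X <- Z -> Y) one has
# kd = Ent(Z|X) + Ent(Z|Y) = 2*Ent(Z) - Ent(X) - Ent(Y).
def H(p):
    return -sum(w * math.log(w) for w in p.values() if w > 0)

def kd(joint):
    """kd of the minimal two-fan determined by a joint distribution."""
    pX, pY = {}, {}
    for (x, y), w in joint.items():
        pX[x] = pX.get(x, 0) + w
        pY[y] = pY.get(y, 0) + w
    return 2 * H(joint) - H(pX) - H(pY)

# Tensor-product coupling of two fair coins: kd = 2*ln(2) > 0.
independent = {(x, y): 0.25 for x in 'HT' for y in 'HT'}
# Diagonal coupling (a deterministic bijection): kd = 0, so k(X, X) = 0.
diagonal = {('H', 'H'): 0.5, ('T', 'T'): 0.5}
```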
The bivariate function k : Prob × Prob → R ≥0 defines a notion of pseudo-distance and it vanishes exactly on pairs of isomorphic probability spaces.This follows directly from the Shannon inequality (5), and a more general statement will be proven in Proposition 3.1 below.

Entropy distance for complete diagrams
The definition of the entropy distance for complete diagrams repeats almost literally the definition for single spaces. We fix a complete poset category G and will consider diagrams from Prob G.
Consider three such diagrams X = {X_i; f_ij}, Y = {Y_i; g_ij} and Z = {Z_i; h_ij} from Prob G. Recall that a two-fan F = (X ← Z → Y) can also be viewed as a G-diagram of two-fans F_i = (X_i ← Z_i → Y_i), and we set kd(F) := Σ_i kd(F_i). The quantity kd(F) vanishes if and only if the fan F provides isomorphisms between all individual spaces in X and Y that commute with the inner structure of the diagrams, that is, it provides an isomorphism between X and Y in Prob G.
The intrinsic entropy distance between diagrams is defined in analogy with the case of single probability spaces, k(X, Y) := inf kd(F), where the infimum is over all two-fans of G-diagrams with sink vertices X and Y.
The following proposition records that the intrinsic entropy distance is in fact a pseudo-distance on Prob G, provided that G is a complete poset category (that is, when G has a unique initial object).

Proposition 3.1 Let G be a complete poset category. Then the bivariate function k : Prob G × Prob G → R≥0 is a pseudo-distance, and it vanishes exactly on pairs of isomorphic diagrams.
The idea of the proof is very simple. In the case of single probability spaces X, Y, Z, a coupling between X and Z can be constructed from a coupling between X and Y and a coupling between Y and Z by adhesion on Y, see [16]. The triangle inequality then follows from the Shannon inequality. However, since we are dealing with diagrams, the combinatorial structure requires careful treatment; therefore, we provide a detailed proof on page 276.
It is important to note that the proof uses the fact that G is complete. In fact, even though the definition of k could easily be extended to a bivariate function on the space of diagrams of any fixed combinatorial type, it fails to satisfy the triangle inequality in general, because the composition of couplings requires completeness of G.

The asymptotic entropy distance
Let G be a complete poset category. We will show in Corollary 3.5 below that the sequence n ↦ k(X^n, Y^n) is subadditive and therefore the following limit exists.

κ(X, Y) := lim_{n→∞} (1/n) · k(X^n, Y^n). (10)
We call its value, κ(X, Y), the asymptotic entropy distance between the two diagrams X, Y ∈ Prob G. As a corollary of Proposition 3.1 and definition (10) we immediately obtain that the asymptotic entropy distance is a homogeneous pseudo-distance on Prob G.
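The existence of the limit in (10) is an instance of Fekete's lemma for subadditive sequences: if a(m+n) ≤ a(m) + a(n), then a(n)/n converges to inf_m a(m)/m. The sketch below illustrates this with an assumed stand-in sequence; it is not computed from actual diagrams.

```python
import math

# Fekete's lemma behind definition (10): a subadditive sequence a(n)
# has a(n)/n -> inf_m a(m)/m. The sequence below is an assumed stand-in
# for n |-> k(X^n, Y^n); it is not computed from actual diagrams.
def a(n):
    return 2 * n + math.sqrt(n)   # subadditive, since sqrt is subadditive

# spot-check subadditivity a(m+k) <= a(m) + a(k) on a grid
subadditive = all(a(m + k) <= a(m) + a(k) + 1e-12
                  for m in range(1, 40) for k in range(1, 40))

limit = 2.0                        # the limit of a(n)/n for this sequence
gap = abs(a(4096) / 4096 - limit)  # already small at n = 4096
```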

Corollary 3.2 Let G be a complete poset category. Then the bivariate function κ : Prob G × Prob G → R≥0 is a homogeneous pseudo-distance.
We will see later that there are instances where κ < k; moreover, there are pairs of non-isomorphic diagrams with vanishing asymptotic entropy distance between them.
In the next subsection we derive some elementary properties of the intrinsic entropy distance and the asymptotic entropy distance.

Tensor product
We show that the tensor product on the space of diagrams is 1-Lipschitz. Later this will allow us to give a simple description of tropical diagrams, that is, of points in the asymptotic cone of Prob G, as limits of certain sequences of "classical" diagrams, as will be discussed in a subsequent article.

Proposition 3.3 Let G be a complete poset category. Then with respect to the Kolmogorov distance on Prob G the tensor product ⊗ : (Prob G, k)² → (Prob G, k) is 1-Lipschitz in each variable; that is, for every triple X, Y, Y′ ∈ Prob G the following bound holds: k(X ⊗ Y, X ⊗ Y′) ≤ k(Y, Y′).
This statement is a direct consequence of the additivity of entropy with respect to the tensor product. Details can be found on page 279.
It follows directly from definition (10) and Proposition 3.3 that the asymptotic entropy distance enjoys a similar property.

Corollary 3.4 Let G be a complete poset category. Then with respect to the asymptotic entropy distance on Prob G the tensor product is 1-Lipschitz in each variable: κ(X ⊗ Y, X ⊗ Y′) ≤ κ(Y, Y′) for every X, Y, Y′ ∈ Prob G.
As another corollary we obtain the subadditivity properties of the intrinsic entropy distance and asymptotic entropy distance.

Corollary 3.5 Let G be a complete poset category and let X, Y, X′, Y′ ∈ Prob G. Then k(X ⊗ X′, Y ⊗ Y′) ≤ k(X, Y) + k(X′, Y′), and the same subadditivity holds for κ.
It implies in particular that shifts are non-expanding maps in (Prob G , k) or (Prob G , κ).

Corollary 3.6 For any Z ∈ Prob G the shift X ↦ X ⊗ Z is a non-expanding map with respect to either the intrinsic entropy distance or the asymptotic entropy distance.
Less obvious is the fact that κ is, in fact, translation invariant; in particular, (Prob G, κ) satisfies the cancellation property. This is the subject of Proposition 3.7 below, which was communicated to us by Tobias Fritz.

Proposition 3.7 For any triple of diagrams X, Y, Z ∈ Prob G, κ(X ⊗ Z, Y ⊗ Z) = κ(X, Y).
The proof of the proposition can be found on page 280.

Entropy
Recall that we defined the entropy function by evaluating the entropy of all individual spaces in a G-diagram. The target space R^[[G]] will be endowed with the ℓ1-norm with respect to the natural coordinate system. With this choice, the entropy function is 1-Lipschitz with respect to the Kolmogorov distance on Prob G.

Proposition 3.8 Suppose G is a complete poset category and δ = k, κ is either the intrinsic entropy distance or the asymptotic entropy distance on Prob G. Then the entropy function Ent_* : (Prob G, δ) → (R^[[G]], ‖·‖₁) is 1-Lipschitz.
Again, the proof of the proposition above is an application of Shannon's inequality; see page 281 for details.

The Slicing Lemma
The Slicing Lemma, Proposition 3.9 below, allows one to estimate the intrinsic entropy distance between two diagrams by the integrated intrinsic entropy distance between "slices", which are diagrams obtained by conditioning on another probability space. It turns out to be a very powerful tool for estimating the intrinsic entropy distance and will be used below on several occasions.
As described in Sect. 2.6, by a reduction of a diagram X = {X_i; f_ij} to a single space U we mean a collection of reductions ρ_i : X_i → U from the individual spaces in X to U that commute with the reductions within X. Alternatively, whenever a single probability space appears as the domain or the target of a morphism to or from a G-diagram, it should be replaced by a constant G-diagram.

Proposition 3.9 (Slicing Lemma) Suppose G is a complete poset category and we are given four G-diagrams X, X̂, Y, Ŷ ∈ Prob G and three probability spaces U, V, W ∈ Prob that are included into the following three-tents diagram.

The idea of the proof of the Slicing Lemma (page 281) is as follows. For every pair (u, v) ∈ W we consider an optimal two-fan G_uv coupling X|u and Y|v. These fans have the same underlying diagram of sets. Then we construct a coupling between X and Y as a convex combination of the distributions of the G_uv's weighted by p_W(u, v). The estimates on the resulting two-fan then imply the proposition.
Various implications of the Slicing Lemma are summarized in the next corollary.

Corollary 3.10
Let G be a complete poset category, X , Y ∈ Prob G and U ∈ Prob.

Given a fan
3. Let X → U be a reduction, then

Distributions and types
In this section we recall some elementary inequalities for (relative) entropies and the total variation distance for distributions on finite sets. Furthermore, we generalize the notion of a probability distribution on a set to a distribution on a diagram of sets. Finally, we give a perspective on the theory of types, and also introduce types in the context of complete diagrams.

Single probability spaces
For a finite set S we denote by ΔS the collection of all probability distributions on S. It is the unit simplex in the real vector space R^S. We often use the fact that it is a compact, convex set whose interior points correspond to fully supported probability measures on S.
For π_1, π_2 ∈ ΔS denote by |π_1 − π_2|₁ the total variation of the signed measure (π_1 − π_2), and define the entropy of the distribution π_1 by
h(π_1) := −Σ_{s∈S} π_1(s) · ln π_1(s). (11)
If, in addition, π_2 lies in the interior of ΔS, define the relative entropy by
D(π_1‖π_2) := Σ_{s∈S} π_1(s) · ln (π_1(s)/π_2(s)).
The entropy of a probability space is often defined through formula (11). It is a standard fact, and can be verified with the help of Lemma 4.2 below, that for π ∈ ΔS
Ent(S, π) = h(π), (12)
which justifies the name "entropy" for the function h : ΔS → R. Define a divergence ball of radius ε > 0 centered at π ∈ Interior ΔS as
B_ε(π) := {ρ ∈ ΔS : D(ρ‖π) ≤ ε}. (13)
For a fixed π and ε ≪ 1 the ball B_ε(π) also lies in the interior of ΔS. The total variation norm and the relative entropy are related by the following inequality.

Lemma 4.1 Let S be a finite set. For any π_1, π_2 ∈ ΔS, Pinsker's inequality holds:
|π_1 − π_2|₁ ≤ √(2 · D(π_1‖π_2)).
Pinsker's inequality is well known in information theory; a proof can be found in [8].
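Pinsker's inequality is easy to check numerically; the two distributions below are arbitrary illustrative choices, and natural logarithms are used throughout.

```python
import math

# Check of Pinsker's inequality |pi1 - pi2|_1 <= sqrt(2 D(pi1||pi2))
# (natural logarithms); the two distributions are arbitrary examples.
def total_variation_l1(p1, p2):
    return sum(abs(p1[s] - p2[s]) for s in p1)

def relative_entropy(p1, p2):
    return sum(p1[s] * math.log(p1[s] / p2[s]) for s in p1 if p1[s] > 0)

pi1 = {'a': 0.5, 'b': 0.3, 'c': 0.2}
pi2 = {'a': 0.2, 'b': 0.5, 'c': 0.3}   # interior point of the simplex

tv = total_variation_l1(pi1, pi2)
div = relative_entropy(pi1, pi2)
```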

Distributions on diagrams
A map f : S → S′ between two finite sets induces an affine map f_* : ΔS → ΔS′.
For a diagram of sets S = {S_i; f_ij} we define the space of distributions on the diagram S by
ΔS := {(π_i) ∈ ∏_i ΔS_i : (f_ij)_* π_i = π_j}.
Essentially, an element of ΔS is a collection of distributions on the sets S_i in S that is consistent with respect to the maps f_ij. The consistency conditions (f_ij)_* π_i = π_j form a collection of linear equations with integer coefficients with respect to the standard convex coordinates in the ΔS_i. Thus ΔS is a rational affine subspace in the product of simplices; in particular, ΔS carries a convex structure.
If S is complete with initial set S_0, then specifying a distribution π_0 ∈ ΔS_0 uniquely determines distributions on all of the S_i's by setting π_i := (f_0i)_* π_0. In such a situation we have ΔS ≅ ΔS_0. If S is not complete and S_0, ..., S_k is a collection of its source sets, then ΔS is isomorphic to an affine subspace of the product ΔS_0 × ··· × ΔS_k cut out by linear equations with integer coefficients corresponding to co-fans in S with source sets among S_0, ..., S_k.
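A minimal sketch of this determination, assuming a complete two-fan of sets with initial set S_0 = X × Y and the two projections (an illustrative diagram): the initial distribution π_0 determines the push-forward distributions on both sinks.

```python
from fractions import Fraction

# A complete two-fan of sets, S0 = X x Y with projections to X and Y
# (an illustrative diagram): a distribution on the initial set S0
# determines consistent distributions on both sinks by push-forward.
pi0 = {('a', 0): Fraction(1, 2), ('a', 1): Fraction(1, 6),
       ('b', 0): Fraction(1, 6), ('b', 1): Fraction(1, 6)}

def push(pi, f):
    out = {}
    for s, w in pi.items():
        out[f(s)] = out.get(f(s), 0) + w
    return out

piX = push(pi0, lambda s: s[0])   # (f_{0X})_* pi0
piY = push(pi0, lambda s: s[1])   # (f_{0Y})_* pi0
```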
To simplify notation, for a probability space X or a diagram X we will write

Types
We now discuss briefly the theory of types. Types are special subspaces of tensor powers that consist of sequences with the same "empirical distribution", as explained in detail below. For a more detailed discussion the reader is referred to [7,8]. We generalize the theory of types to complete diagrams of sets and complete diagrams of probability spaces. The theory of types for diagrams that are not complete is more complex and will be addressed in a subsequent article.

Types for single spaces
Let S be a finite set. For n ∈ N denote by Δ^(n)S the collection of rational points in ΔS with denominator n. (We say that a rational number r has denominator n if n·r is an integer.) Define the empirical distribution map q : S^n → ΔS that sends s = (s_i)_{i=1}^n ∈ S^n to the empirical distribution q(s) ∈ ΔS given by
q(s)(a) := (1/n) · |{i : s_i = a}| for a ∈ S.
Clearly the image of q lies in Δ^(n)S.
For π ∈ Δ^(n)S, the space T^n_π S := q⁻¹(π) equipped with the uniform measure is called a type over π. The symmetric group S_n acts on S^n by permuting the coordinates. This action leaves the empirical distribution invariant and therefore can be restricted to each type, where it acts transitively. Thus, for π ∈ Δ^(n)S the probability space (T^n_π S, u), with u the uniform (S_n-invariant) distribution, is a homogeneous space. Suppose X = (X, p) is a probability space. Let τ_n be the push-forward of p^⊗n under the empirical distribution map q : X^n → ΔX. Clearly supp τ_n ⊂ Δ^(n)X, thus (ΔX, τ_n) is a finite probability space. Therefore we have a reduction
q : X^n → (ΔX, τ_n), (14)
which we call the empirical reduction.
In particular, it follows that the right-hand side does not depend on the probability p on X as long as π is "compatible" with it.
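For a tiny alphabet the empirical map and the resulting types can be enumerated exhaustively. The sketch below (alphabet {a, b} and n = 4, both arbitrary choices) verifies that each type has the expected multinomial cardinality, consistent with the transitive S_n-action.

```python
import itertools
import math
from collections import Counter
from fractions import Fraction

# Exhaustive enumeration of types over S = {a, b} with n = 4 (both
# arbitrary choices): fibers of the empirical map q : S^n -> Delta^(n)S.
S, n = ['a', 'b'], 4

def q(word):
    """Empirical distribution of a word, with exact rational weights."""
    counts = Counter(word)
    return tuple(Fraction(counts[a], n) for a in S)

types = {}
for word in itertools.product(S, repeat=n):
    types.setdefault(q(word), []).append(word)

def multinomial(pi):
    """Cardinality of the type over pi: n! / prod_s (n*pi(s))!."""
    out = math.factorial(n)
    for w in pi:
        out //= math.factorial(int(n * w))
    return out
```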
The following lemma records some standard facts about types, which can be checked by elementary combinatorics and can be found in [8].

Lemma 4.2 Let X = (X, p) be a probability space and x ∈ X^n; then
p^⊗n(x) = exp(−n · (h(q(x)) + D(q(x)‖p))).

If X = (X, p_X) is a probability space with a rational probability distribution with denominator n, then the type over p_X will be called the true type of X, T^n X := T^n_{p_X} X.
As a corollary to Lemma 4.2 and equation (12) we obtain the following.
Corollary 4.3 For a finite set S and π ∈ Δ^(n)S,
Ent(T^n_π S) = n · h(π) + O(ln n).
Also, for a finite probability space X = (S, p) with a rational distribution p with denominator n,
Ent(T^n X) = n · Ent(X) + O(ln n).
In particular, the entropies of the true type T^n X and of the tensor power X^n differ by at most O(ln n).

The following important theorem is known as Sanov's theorem. It can be easily derived from Lemma 4.2, or a proof can be found in [8].
Theorem 4.4 (Sanov's theorem) Let X = (S, p) be a finite probability space and let q : X^n → (ΔX, τ_n) be the empirical reduction. Then for every r > 0,
τ_n(ΔX \ B_r(p)) ≤ |Δ^(n)S| · e^{−n·r} ≤ (n + 1)^{|S|} · e^{−n·r},
where B_r(p) is the divergence ball (relative entropy ball) defined in (13).
Combining the estimate in Theorem 4.4 with Pinsker's inequality in Lemma 4.1 we obtain the following corollary.

Corollary 4.5 For a finite probability space X = (S, p) and every ε > 0,
τ_n{π ∈ ΔX : |π − p|₁ ≥ ε} ≤ (n + 1)^{|S|} · e^{−n·ε²/2}.
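The Sanov-type tail bound can be verified exactly for a biased coin; p = 0.7, n = 100 and r = 0.1 below are arbitrary illustrative parameters, and the τ_n-mass outside the divergence ball is computed by direct summation.

```python
import math

# Exact check of a Sanov-type bound for a biased coin: the tau_n-mass
# of empirical distributions outside the divergence ball B_r(p) is at
# most (n+1)^|S| * exp(-n r). p, n, r are illustrative parameters.
p, n, r = 0.7, 100, 0.1

def D(q):
    """Relative entropy D((q, 1-q) || (p, 1-p)), natural logarithm."""
    out = 0.0
    for a, b in ((q, p), (1 - q, 1 - p)):
        if a > 0:
            out += a * math.log(a / b)
    return out

# tau_n-mass of empirical distributions outside the divergence ball
tail = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
           for k in range(n + 1) if D(k / n) > r)

bound = (n + 1) ** 2 * math.exp(-n * r)   # |S| = 2 here
```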

Types for complete diagrams
In this subsection we generalize the theory of types to diagrams indexed by a complete poset category. The theory for non-complete diagrams is more complex and will be addressed in our future work. Before we describe our approach we need some preparatory material.
Suppose we have a reduction f : X → Y between a pair of probability spaces. Then for any n ∈ N there is an induced reduction f_* : (ΔX, τ_n) → (ΔY, τ_n) that can be included in a diamond diagram satisfying a special condition, namely that the sides of the diamond are independent conditioned on the bottom space. In particular, for any π ∈ ΔX with τ_n(π) > 0 and π′ = f_*π ∈ ΔY there is a well-defined reduction T^n_π X → T^n_{π′} Y. Now we are ready to give the definitions of types. Let X ∈ Prob G be a complete diagram, X = {X_i; f_ij} with initial space X_0, and let π ∈ Δ^(n)X.
Define the type T^n_π X as the G-diagram whose individual spaces are the types of the individual spaces of X over the corresponding push-forwards of π. Consider the symmetric group S_n acting on X^n by automorphisms permuting the coordinates. The action leaves the types T^n_π X invariant, and it is transitive on the initial space T^n_π X_0. Thus each type T^n_π X is a homogeneous diagram.

The empirical two-fan
Unlike in the case of single probability spaces, there is no empirical reduction from the power of X to ΔX. It will be convenient for us to view the types as the power of the diagram conditioned on a distribution. This is achieved by including the power of the diagram into an empirical two-fan.
Given a G-diagram X with initial space X_0, we construct the associated empirical two-fan with sink vertices X^n and (ΔX, τ_n)_G as the "composition" of the canonical reduction (X_0)_G → X, Eq. (4) in Sect. 2.6, and the empirical reduction X_0^n → ΔX_0 ≅ ΔX in Eq. (14).
The two-fan Q_n is not necessarily minimal, but its minimal reduction can be constructed using Lemma 2.2 on page 252, in view of Eq. (15). For every n ∈ N and π ∈ Δ^(n)X_0 the type T^n_π X is a homogeneous diagram. Suppose that a complete diagram X is such that the probability distribution p_0 on the initial set is rational with denominator n; then we call T^n_{p_0} X the true type of X and denote T^n X := T^n_{p_0} X.

Distance between types
Our goal in this section is to estimate the intrinsic entropy distance between two types over two different distributions π_1, π_2 ∈ Δ^(n)S in terms of the total variation distance |π_1 − π_2|₁. For this purpose we use a "lagging" technique, which is explained below. Practically, we couple different types by randomly removing and inserting an appropriate number of symbols to pass from a trajectory of the one type to a trajectory of the other.

The lagging trick
Let Λ_α be a binary probability space, Λ_α := ({•, □}; p_{Λ_α}(□) = α), and let X = {(X_i, p_i); f_ij} and Z = {(Z_i, q_i); g_ij} be two diagrams indexed by a poset category G and included in a minimal two-fan, i.e. a coupling. Assume further that the distribution q on Z is rational with denominator n ∈ N, that is, q ∈ Δ^(n)Z. It follows that p and p_{Λ_α} are also rational with the same denominator n.
We construct a lagging two-fan L as follows. The right leg T^n ρ of L is induced by the right leg ρ of the original two-fan. The left leg is obtained by erasing the symbols that reduce to □ and applying ρ to the remaining symbols. The target space for the reduction l is the true type of X|•, which is "lagging" behind T^n Z by a factor of (1 − α). More specifically, the reduction l is constructed as follows.
Let λ_j : Z_j → Λ_α be the components of the reduction λ : Z → Λ_α. Given z = (z_i)_{i=1}^n ∈ T^n Z_j, define the subset of indices that are not erased, and define the jth component of l by applying ρ coordinate-wise over this subset. By equivariance each l_j is a reduction of homogeneous spaces, since the inverse image of any point has the same cardinality. Moreover, the reductions l_j commute with the reductions in T^n Z, as explained in Sect. 4.3, and therefore l is a reduction of diagrams.
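The erasure step of the lagging construction can be sketched on a single word. The alphabet, the marker symbols 'k' (keep) and 'e' (erase) standing in for • and □, and the word below are all illustrative choices, not notation from the text.

```python
# Sketch of the erasure step in the lagging construction. The word, the
# marker symbols 'k' (keep) and 'e' (erase) and alpha are illustrative
# choices; each letter of the word carries a Lambda_alpha component.
word = [('a', 'k'), ('b', 'k'), ('a', 'e'), ('a', 'k'), ('b', 'e'),
        ('a', 'k'), ('a', 'k'), ('b', 'k'), ('a', 'e'), ('b', 'k')]
n = len(word)
alpha = 3 / 10   # fraction of erased symbols; rational with denominator n

def l(z):
    """Erase the symbols reducing to 'e' and project the rest to X."""
    return tuple(x for (x, lam) in z if lam == 'k')

lagged = l(word)   # a word of length (1 - alpha) * n
```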
The next lemma uses the lagging two-fan to estimate the intrinsic entropy distance between its sink diagrams.

Lemma 5.1 Let X, Z ∈ Prob G be two diagrams indexed by a complete poset category G and included in a minimal two-fan, where the distribution on Z is rational with denominator n ∈ N. Then the distance between the corresponding true types admits an explicit bound in terms of α, n and the entropies of the diagrams involved.

It is an immediate consequence of the Slicing Lemma, in particular Corollary 3.10 part (2), that a similar bound holds for tensor powers; by the subadditivity of the intrinsic entropy distance, this bound is almost the estimate in Lemma 5.1, except that Lemma 5.1 estimates the distance between types rather than tensor powers. We will soon see that tensor powers and types are very close in the intrinsic entropy distance. However, for the purpose of the proof of Lemma 5.1 it suffices to know that their entropies are close, an estimate that is provided by Corollary 4.3.

Proof of Lemma 5.1
We will use the lagging two-fan L constructed in Eq. (17) as a coupling to estimate the intrinsic entropy distance. Recall that by Corollary 4.3, for a probability space X with a rational distribution, the entropy of the true type T^n X differs from n · Ent(X) by at most O(ln n). Thus we can estimate kd(L) as follows. By the minimality of the original two-fan and Shannon inequality (5) we have a bound on the first summand; the second summand can be estimated using relation (7). Combining all of the above we obtain the estimate in the conclusion of the lemma.

Distance between types
In this section we use the lagging trick described above to estimate the distance between types over two different distributions in ΔS, where S is a complete diagram of sets.

Proposition 5.2 Suppose S is a complete G-diagram of sets with initial set S_0. Suppose p, q ∈ Δ^(n)S and let α = ½ |p_0 − q_0|₁.

The idea of the proof is to write p and q as convex combinations of a common distribution p̄ and "small amounts" of p⁺ and q⁺, respectively. Then we use the lagging trick to estimate the distances between the types over p and p̄, as well as between the types over q and p̄. We now present the details of the proof.

Proof of Proposition 5.2
Recall that for a complete diagram S with initial set S_0 we have ΔS ≅ ΔS_0. Our goal now is to write p and q as convex combinations of three other distributions p̄, p⁺ and q⁺, as in
p = (1 − α) · p̄ + α · p⁺ and q = (1 − α) · p̄ + α · q⁺.
We can do this in the following way. Let α := ½ |p_0 − q_0|₁. If α = 1, then the proposition follows trivially by constructing a tensor-product fan, so from now on we assume that α < 1. Define three probability distributions p̄_0, p⁺_0 and q⁺_0 on S_0 by setting, for every x ∈ S_0,
p̄_0(x) := (1/(1 − α)) · min{p_0(x), q_0(x)},
p⁺_0(x) := (1/α) · (p_0(x) − min{p_0(x), q_0(x)}),
q⁺_0(x) := (1/α) · (q_0(x) − min{p_0(x), q_0(x)}).
Combining the estimates for the corresponding lagging couplings then yields the bound in the proposition.
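The decomposition of p and q into a common part p̄ and the remainders p⁺, q⁺ can be computed exactly with rational arithmetic; the distributions below are illustrative choices with denominator n = 12.

```python
from fractions import Fraction

# Exact computation of the decomposition: alpha = |p0 - q0|_1 / 2,
# pbar = min(p0, q0)/(1 - alpha), and the remainders pplus, qplus.
# The distributions below are illustrative, with denominator n = 12.
n = 12
p = {'a': Fraction(6, n), 'b': Fraction(3, n), 'c': Fraction(3, n)}
q = {'a': Fraction(4, n), 'b': Fraction(6, n), 'c': Fraction(2, n)}

alpha = sum(abs(p[s] - q[s]) for s in p) / 2
pbar  = {s: min(p[s], q[s]) / (1 - alpha) for s in p}
pplus = {s: (p[s] - min(p[s], q[s])) / alpha for s in p}
qplus = {s: (q[s] - min(p[s], q[s])) / alpha for s in p}
```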

Technical proofs
This section contains some proofs that did not make it into the main text. The numbering of the claims in this section coincides with the numbering in the main text. Lemmas that first appear in this section are numbered within the section.
Proposition 2.1 Let G be a poset category, and let X = {X_i; a_ij}, Y = {Y_i; b_ij} and Z = {Z_i; c_ij} be three G-diagrams. Then (1) a two-fan F = (X ← Z → Y) of G-diagrams is minimal if and only if each of its constituent two-fans F_i is minimal; and (2) any two-fan of G-diagrams admits a minimal reduction.

Before we turn to the proof of Proposition 2.1, we will need the following lemma.
Lemma 7.1 Suppose we are given a pair of two-fans of probability spaces, let μ : F → F̄ be a minimal reduction of F, and let ρ : F → F′ be a reduction to a minimal two-fan F′. Then there exists a reduction ρ̄ : F̄ → F′ such that ρ̄ ∘ μ = ρ.

Proof of Lemma 7.1 We define ρ̄ on the sink spaces of F̄ to coincide with ρ.
To prove the lemma we just need to provide a dashed arrow making the corresponding diagram commutative. The reduction ρ̄ is constructed by simple diagram chasing, using the minimality of F′. Suppose z̄ ∈ Z̄ and z_1, z_2 ∈ Z are such that z̄ = μ(z_1) = μ(z_2). By the commutativity of the solid arrows in the diagram, the points ρ(z_1) and ρ(z_2) have the same images in the sink spaces of F′. Thus, by the minimality of F′, it follows that ρ(z_1) = ρ(z_2). Hence ρ̄ can be constructed by setting ρ̄(z̄) := ρ(z_1). This finishes the proof of Lemma 7.1.

Proof of Proposition 2.1
First we address claim (1) of the proposition. Let G = {O_i; m_ij} be a poset category, let X, Y, Z ∈ Prob G be three G-diagrams and let F = (X ← Z → Y) be a two-fan. Recall that it can also be considered as a G-diagram of two-fans F_i for all i in the index set I. It follows that if all the F_i's are minimal, then so is F. Now we prove the implication in the other direction. Suppose F is minimal; we have to show that all the F_i are minimal as well. Suppose, to the contrary, that there exists a non-minimal fan among the F_i's. For an index i ∈ I let Ĵ(i) ⊂ I denote the set of indices receiving a morphism from i. Choose an index i_0 such that (1) F_{i_0} is not minimal, and (2) for any j ∈ Ĵ(i_0)\{i_0} the two-fan F_j is minimal.
Consider now the minimal reduction μ : F_{i_0} → F̄_{i_0} and construct a two-fan G = {G_i; g_ij} of G-diagrams by replacing F_{i_0} with F̄_{i_0}, where the reduction out of F̄_{i_0} is the one provided by Lemma 7.1 applied to the corresponding diagram. We have thus constructed a non-trivial reduction F → G which is the identity on the sink G-diagrams X and Y. This contradicts the minimality of F. To address the second assertion of Proposition 2.1, observe that the argument above gives an algorithm for the construction of a minimal reduction of any two-fan of G-diagrams.

Proposition 3.1 Let G be a complete poset category. Then the bivariate function k : Prob G × Prob G → R≥0 is a pseudo-distance, and it vanishes exactly on pairs of isomorphic diagrams.
Proof The symmetry of k is immediate. The non-negativity of k follows from the fact that the entropy of the target space of a reduction is not greater than the entropy of the domain, which is a particular instance of the Shannon inequality (5).
We proceed to prove the triangle inequality. We will make use of the following lemma.

Lemma 7.2 For a minimal full diagram of probability spaces
Lemma 7.2 follows immediately from the Shannon inequality. Suppose for now that G = • and that we are given three probability spaces X, Y, Z together with the optimal couplings U = (X ← U → Y) and V = (Y ← V → Z) in the sense of the optimization problem (9). Together they form a two-tents diagram T = (X ← U → Y ← V → Z). If we can extend T to a minimal full diagram Q as in the assumption of Lemma 7.2, the triangle inequality will follow. The diagram Q = ad(T) can be constructed by the so-called adhesion, as explained below.
As explained in Sect. 2.5.7, to construct a minimal full diagram with sink vertices X, Y and Z it is sufficient to provide a distribution on Q := X × Y × Z with the correct push-forwards. We do this by setting

q(x, y, z) := p_U(x, y) · p_V(y, z) / p_Y(y) whenever p_Y(y) > 0, and q(x, y, z) := 0 otherwise.
It is straightforward to check that the appropriate restriction of the full diagram defined in this manner is indeed the original two-tents diagram. Essentially, to extend the two-tents diagram we need to provide a relationship (coupling) between the spaces X and Z, and we do it by declaring X and Z independent conditioned on Y. This is an instance of the operation called adhesion, see [16]. Thus we have shown that k : Prob × Prob → R is a pseudo-distance. Assume now that G is an arbitrary complete poset category. Suppose X = {X_i; f_ij}, Y = {Y_i; g_ij} and Z = {Z_i; h_ij} are G-diagrams, with initial spaces X_0, Y_0 and Z_0, respectively. Let Û and V̂ be two optimal minimal two-fans.
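Adhesion itself is a concrete operation on distributions. The sketch below glues two hypothetical couplings along their common Y-marginal, making X and Z conditionally independent given Y, and checks that both original couplings are recovered as marginals.

```python
# Adhesion on distributions: glue a coupling p1 on X x Y and a coupling
# p2 on Y x Z along their common Y-marginal by declaring X and Z
# conditionally independent given Y. The couplings are example choices.
p1 = {('x0', 'y0'): 0.3, ('x0', 'y1'): 0.2,
      ('x1', 'y0'): 0.1, ('x1', 'y1'): 0.4}
p2 = {('y0', 'z0'): 0.4, ('y1', 'z0'): 0.1, ('y1', 'z1'): 0.5}

pY = {}
for (x, y), w in p1.items():
    pY[y] = pY.get(y, 0) + w

glued = {(x, y, z): p1[(x, y)] * p2[(y2, z)] / pY[y]
         for (x, y) in p1 for (y2, z) in p2 if y == y2}

# both marginals of the glued distribution recover p1 and p2
gXY, gYZ = {}, {}
for (x, y, z), w in glued.items():
    gXY[(x, y)] = gXY.get((x, y), 0) + w
    gYZ[(y, z)] = gYZ.get((y, z), 0) + w
```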
Recall that each two-fan of G-diagrams is a G-diagram of two-fans between the individual spaces, that is

We construct a coupling Ŵ between X and Z in the following manner. Starting with the two-tents diagram between the initial spaces, we use adhesion to extend it to a full diagram, thus constructing a coupling between X_0 and Z_0. This full diagram can then be "pushed down" to provide full extensions of the two-tents on all lower levels. Thus we can "compose" the couplings Û and V̂ and use the Shannon inequality to establish the triangle inequality for the intrinsic entropy distance. The details are as follows.
Consider the two-tents diagram between the initial spaces and extend it by adhesion, as described above, to a Λ_3-diagram, together with the reductions to the lower levels. The diagram obtained this way is not necessarily minimal, and we therefore pass to the "minimization" of the Λ_3-diagram (23), as provided by Lemma 2.2. Since it is minimal by Lemma 2.2(1), we obtain the required inequality, concluding the proof of the triangle inequality. Finally, if k(X, Y) = 0, then there is a two-fan F of G-diagrams between X and Y with kd(F) = 0, from which it follows that X and Y are isomorphic.

Proof of Proposition 3.3 The claim follows easily from the additivity of entropy in equation (6). Suppose that X = {X_i; f_ij}, Y = {Y_i; g_ij} and Y′ = {Y′_i; g′_ij} are three G-diagrams and F is an optimal fan, so that kd(F) = k(Y, Y′). Consider the fan X ⊗ F. Then, by the additivity of entropy in equation (6), we have kd(X ⊗ F) = kd(F), and therefore k(X ⊗ Y, X ⊗ Y′) ≤ k(Y, Y′).

Proof of Proposition 3.7 Fix ε > 0 and choose an ε-optimal coupling between X and Y. Then for any n ∈ N we can estimate as follows. For i = 0, ..., n set T_i := X^{n−i} ⊗ Y^i ⊗ Z, so that T_0 = X^n ⊗ Z and T_n = Y^n ⊗ Z. Also, for each i = 0, ..., n − 1, the pair (T_i, T_{i+1}) is a translation of the pair (X, Y) by X^{n−i−1} ⊗ Y^i ⊗ Z. Summing the resulting estimates and dividing by n, the first summand on the right-hand side can be dropped since n is arbitrarily large; then, by choosing ε > 0 arbitrarily small, we obtain the required inequality.
Proposition 3.8 Suppose G is a complete poset category and δ = k, κ is either the intrinsic entropy distance or the asymptotic entropy distance on Prob G. Then the entropy function is 1-Lipschitz.
Proof Let X, Y ∈ Prob G and let F be an optimal fan with components G_i = (X_i ← Z_i → Y_i). For a fixed index i we can estimate the difference of entropies by kd(G_i); by symmetry, the same bound holds with the roles of X and Y exchanged.

Proof of Proposition 3.9 Since the two-fan (U ← W → V) is minimal, the probability space W can be considered as having its underlying set a subset of the Cartesian product of the underlying sets of U and V. For any pair (u, v) ∈ W with positive weight consider an optimal two-fan coupling X|u and Y|v with top diagram Z_uv = {Z_uv,i; ρ_ij}. Let p_uv,i be the probability distributions on Z_uv,i, the individual spaces in the diagram Z_uv. The next step is to take a convex combination of the distributions p_uv,i weighted by p_W to construct a coupling X ← Z → Y. First we extend the 7-vertex diagram to a full Λ_4-diagram of G-diagrams, such that the top vertex has the distribution p_i(x, y, u, v) := p_uv,i(x, y) · p_W(u, v), as described in Sect. 2.5.7.
If we integrate over y, we obtain
∫_Y dp_i(x, u, v, y) = ((π_{X,i})_* p_uv,i)(x) · p_W(u, v).
Equation (24) implies that (π_{X,i})_* p_uv,i = p_{X_i}(·|u) and therefore
∫_Y dp_i(x, y, u, v) = p_{X_i}(x|u) · p_W(u, v).
In the same way one computes the marginal over x. The extended diagram contains a two-fan of diagrams F = (X ← Z → Y) with sink vertices X and Y. We call its initial vertex Z = {XY_i; f_ij}.
The following estimates conclude the proof of the Slicing Lemma. First we use the definitions of the intrinsic entropy distance k and of kd(F) to estimate k(X, Y) ≤ kd(F). Next, we apply the definition of the conditional entropy to rewrite the right-hand side. We then use (26) and rearrange terms. By the integral formula for conditional entropy (7) applied to the first three terms, and in view of (25), the expression simplifies to the required bound.

Fig. 4 Non-homogeneous two-fans consisting of uniform spaces

A careful check shows that for n ≥ |S_0|, the constants in O only depend on |S_0| and [[G]].

Proposition 3.3 Let G be a complete poset category. Then with respect to the Kolmogorov distance on Prob G the tensor product ⊗ : (Prob G, k)² → (Prob G, k) is 1-Lipschitz in each variable; that is, for every triple X, Y, Y′ ∈ Prob G the following bound holds: k(X ⊗ Y, X ⊗ Y′) ≤ k(Y, Y′).

|Ent(X_i) − Ent(Y_i)| ≤ kd(G_i). Adding the above inequalities for all i we have |Ent_*(X) − Ent_*(Y)|₁ ≤ kd(G) = k(X, Y). By the additivity of entropy we also obtain the 1-Lipschitz property of the entropy function with respect to the asymptotic entropy distance κ.

Proposition 3.9 (Slicing Lemma) Suppose G is a complete poset category and we are given four G-diagrams X, X̂, Y, Ŷ ∈ Prob G and three probability spaces U, V, W ∈ Prob, included into the following three-tents diagram, such that the two-fan (U ← W → V) is minimal. Then the following estimate holds:
k(X, Y) ≤ ∫_W k(X|u, Y|v) dp_W(u, v).

∫_X dp_i(x, y, u, v) = p_{Y_i}(y|v) · p_W(u, v). It follows that
X|uv = X|u and Y|uv = Y|v (25)
and
Ent(X_i|UV) = Ent(X_i|U) and Ent(Y_i|UV) = Ent(Y_i|V). (26)

1. A full diagram of G-diagrams is minimal if and only if the constituent full diagrams of probability spaces F_i are all minimal.
2. For any full diagram F ∈ Prob (G, Λ_n) of G-diagrams there exists another minimal full diagram F̄ ∈ Prob (G, Λ_n) with the same sink entries and a reduction μ : F → F̄, such that μ restricts to an isomorphism on the sink entries of F. Moreover, F̄ is unique up to isomorphism.