Exploding Markov operators

A special class of doubly stochastic (Markov) operators is constructed. These operators come from measure preserving transformations and inherit some of their properties, namely ergodicity and positivity of entropy, yet they may have no pointwise factors.


Introduction
The subject of the current paper lies in the border zone between ergodic theory and operator theory. The main motivation was the desire to enlarge the stock of examples of doubly stochastic operators which escape the scope of classical ergodic theory (because they are not induced by measure preserving maps as their Koopman operators) but still reveal nontrivial dynamical behavior. By a doubly stochastic or a Markov operator we understand an operator $P : L^p(\mu) \to L^p(\nu)$, where $(X, \mu)$ and $(Y, \nu)$ are probability spaces, which fulfills the following conditions: (i) $Pf$ is positive for every positive $f \in L^p(\mu)$, (ii) $P\mathbf{1} = \mathbf{1}$ (where $\mathbf{1}$ is the function constantly equal to 1), (iii) $\int Pf\,d\nu = \int f\,d\mu$ for every $f \in L^p(\mu)$.
For example, the well-studied class of quasi-compact doubly stochastic operators on $L^2$ lies rather far from the theory of measure preserving maps. The domain of a quasi-compact operator decomposes into the direct sum of two reducing subspaces, called the reversible and the almost weakly stable part, respectively, such that the first one is finite dimensional, while on the other one orbits of functions converge to zero in the $L^2$ norm. The restriction of such an operator to the reversible part is Markov isomorphic to a rotation of a compact abelian group (which is finite in this case). The transition probability associated to the operator forces points of the underlying space to ramble periodically through finitely many sets of states (in a fixed order), randomly choosing the succeeding state from the set which is next in the queue. These operators are null, meaning that their sequence entropy is always zero (see [8] for details). As another example one may think of a convex combination of finitely many measure preserving maps, which leads to studying a rich class of iterated function systems. Unfortunately, such operators are hard to handle by the entropy theory as defined in [3]; for example, it is possible that a combination of maps with positive entropy has entropy equal to zero. In the current paper another class of examples which stem from pointwise maps is proposed and some of their properties are investigated.

The definition
Let $(X, \Sigma, \mu)$ be a standard probability space and let $T : X \to X$ be a measure preserving surjection. Let $(a_n)$ be a strictly decreasing sequence such that $\sum_{n=1}^{\infty} a_n = 1$. Define a probability distribution $m = m_{(a_n)}$ on $\mathbb{N}$ by $m(\{k\}) = a_k$. Now let $Y$ be the Cartesian product $X \times \mathbb{N}$ with the product σ-algebra and let $\nu = \mu \times m$. For a fixed $k \in \mathbb{N}$ let $\xi_k$ be the partition of $X$ into the sets $T^{-k}\{T^k x\}$, $x \in X$, and denote by $\xi_k(x)$ the element of the partition $\xi_k$ which contains $x$. For every $k$ this partition is measurable (see Appendix 1 in [1] for the precise definition), in other words, the quotient space $X/\xi_k$ is countably separated, so $X/\xi_k$ is also a standard probability space with the measure transported by the map $x \mapsto \xi_k(x)$. Indeed, let $\{B_1, B_2, \ldots\}$ be a separating collection in $X$. If $\xi_k(x)$ and $\xi_k(y)$ are disjoint then $T^k x \neq T^k y$. Without loss of generality one may assume that $T^k x \in B_i$ and $T^k y \notin B_i$ for some $i$; then $\xi_k(x) \subset T^{-k}B_i$ and $\xi_k(y) \cap T^{-k}B_i = \emptyset$, so the countable collection $\{T^{-k}B_i\}$ separates the elements of $\xi_k$.

Let $\{\mu_C : C \in \xi_k\}$ be the disintegration of $\mu$ over $X/\xi_k$, that is, there is a map $C \mapsto \mu_C$ defined on $X/\xi_k$ with range in the space of all probability measures on $X$, such that each measure $\mu_C$ satisfies $\mu_C(X \setminus C) = 0$, and there is a measure $\hat\mu_k$ on $X/\xi_k$ with the property that for any function $f \in L^1(\mu)$,
\[ \int_X f\,d\mu = \int_{X/\xi_k} \int_C f\,d\mu_C\,d\hat\mu_k(C) \]
(see [6]). In addition, the map $C \mapsto \mu_C$ is measurable when the space of probability measures on $X$ is endowed with the Borel σ-algebra for the weak* topology.

An operator $E_k : L^1(X, \mu) \to L^1(X, \mu)$ given by the formula
\[ E_k f(x) = \int f \circ T\,d\mu_{\xi_k(x)} \]
is doubly stochastic. Indeed, it is clear that it is positive and preserves constant functions. Moreover, the function $x \mapsto \mu_{\xi_k(x)}$ is constant on atoms of $\xi_k$, so $E_k f(x) = \int_C f \circ T\,d\mu_C$ for $x \in C$. Therefore, for every $f \in L^1(X, \mu)$ it holds that
\[ \int_X E_k f\,d\mu = \int_{X/\xi_k} \int_C f \circ T\,d\mu_C\,d\hat\mu_k(C) = \int_X f \circ T\,d\mu = \int_X f\,d\mu. \]
Define a sequence $(b_k)$ of positive numbers by
\[ b_k = \frac{a_k - a_{k+1}}{a_1}. \]
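The relation between $(a_k)$ and $(b_k)$ can be checked numerically. The following is only a minimal sketch in exact rational arithmetic: the helper names `explosion_weight`, `geometric` and `telescoping` are hypothetical, and the non-geometric sequence $a_k = 1/(k(k+1))$ is an illustrative assumption, not a choice made in the paper.

```python
from fractions import Fraction

def explosion_weight(a, k):
    """b_k = (a_k - a_{k+1}) / a_1 for a sequence given as a function a(k), k >= 1."""
    return (a(k) - a(k + 1)) / a(1)

def geometric(q):
    """a_k = (1 - q) q^(k-1), a geometric probability sequence."""
    return lambda k: (1 - q) * q ** (k - 1)

def telescoping(k):
    """Hypothetical non-geometric choice: a_k = 1/(k(k+1)), which sums to 1."""
    return Fraction(1, k * (k + 1))

geo = geometric(Fraction(1, 3))
# For a geometric sequence the sequences (a_k) and (b_k) coincide.
geometric_fixed = all(explosion_weight(geo, k) == geo(k) for k in range(1, 40))

# For the non-geometric choice they differ, but the b_k still sum to 1:
# the partial sums telescope to 1 - a_{n+1}/a_1.
n = 9
partial = sum(explosion_weight(telescoping, k) for k in range(1, n + 1))
```

Exact fractions are used so that the telescoping identity holds without rounding error.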
As the simplest example one may consider $a_k = b_k = 2^{-k}$ or, more generally, $a_k = b_k = (1 - a)a^{k-1}$ ($0 < a < 1$), but other choices are also possible (though if $a_k = b_k$ for all $k$ then one automatically obtains a geometric sequence). Let $\delta_y$ be the Dirac measure concentrated at $y$, that is, $\delta_y(A) = 1_A(y)$, and let sections of a set $A \subset Y$ and of a function $f$ on $Y$ be denoted by
\[ A|_k = \{x \in X : (x, k) \in A\}, \qquad f|_k(x) = f(x, k). \]

Definition 2.1. An exploding operator induced by $T$ is a Markov integral operator $\Upsilon_T$ on $L^1(Y, \nu)$ defined by
\[ \Upsilon_T f(y) = \int_Y f(z)\,P_T(y, dz), \tag{1} \]
where the probability kernel $P_T$ is given by:
\[ P_T((x, k), A) = \begin{cases} \delta_{Tx}(A|_{k-1}) & \text{if } k > 1, \\ \sum_{j=1}^{\infty} b_j\,\mu_{\xi_j(x)}\bigl(T^{-1}(A|_j)\bigr) & \text{if } k = 1. \end{cases} \]
In other words,
\[ \Upsilon_T f(x, k) = \begin{cases} f(Tx, k-1) & \text{if } k > 1, \\ \sum_{j=1}^{\infty} b_j \int f|_j \circ T\,d\mu_{\xi_j(x)} & \text{if } k = 1. \end{cases} \]

One may visualize the action of $\Upsilon_T$ via the transition probability $P_T$ in the following way. Each point of $Y$ is a pair consisting of some $x \in X$ and a positive integer $k$. The integer coordinate represents the indication of a clock which counts down the time to an explosion. As long as this indication is greater than 1, the point is mapped according to the action of the pointwise transformation $T$ and the counter goes down by one. When the counter is to be reduced from 1 to 0, the point $x$ explodes and its images are spread over the space (more precisely, over the set of points which would share the common future with $x$, if one considered the evolution by $T$) with counters reset to $k$ with probability $b_k$. This class of operators is a generalization of the following example described in [3]. It has positive entropy, yet it is strictly non-pointwise, meaning that the only pointwise factor of it is the trivial one (see Section 3 for definitions).
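The clock mechanism described above can be made concrete in a finite toy model. The sketch below rests on assumptions that are not part of the paper: $X$ is a finite set with the uniform measure, $T$ is a permutation (so every atom of $\xi_k$ is a single point and the explosion only resets the clock), and the clock is truncated at a hypothetical maximal value $K$ by putting $a_{K+1} = 0$. The function names are illustrative only.

```python
from fractions import Fraction

def exploding_matrix(perm, a):
    """Transition matrix of a finite toy exploding operator.

    perm : list with perm[x] = T(x), a permutation of range(n)
    a    : strictly decreasing positive weights a_1..a_K summing to 1
           (assumption: the clock is truncated, i.e. a_{K+1} = 0)
    State (x, k), k = 1..K, has index x * K + (k - 1).
    """
    n, K = len(perm), len(a)
    padded = list(a) + [Fraction(0)]
    b = [(padded[k] - padded[k + 1]) / padded[0] for k in range(K)]
    idx = lambda x, k: x * K + (k - 1)
    P = [[Fraction(0)] * (n * K) for _ in range(n * K)]
    for x in range(n):
        for k in range(2, K + 1):
            # clock above 1: move by T and count down
            P[idx(x, k)][idx(perm[x], k - 1)] = Fraction(1)
        for j in range(1, K + 1):
            # clock equal to 1: move by T and reset the clock to j with probability b_j
            P[idx(x, 1)][idx(perm[x], j)] = b[j - 1]
    return P

def product_measure(n, a):
    """nu = (uniform measure on X) x m, in the same state order as the matrix."""
    return [w / n for _ in range(n) for w in a]

def preserves_constants(P):
    return all(sum(row) == 1 for row in P)

def preserves_measure(P, nu):
    size = len(P)
    return all(sum(nu[s] * P[s][t] for s in range(size)) == nu[t] for t in range(size))

weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
P = exploding_matrix([1, 2, 0], weights)
nu = product_measure(3, weights)
```

In this truncated setting `preserves_constants(P)` and `preserves_measure(P, nu)` express conditions (ii) and (iii) of a doubly stochastic operator; positivity is immediate because all entries are nonnegative.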
Example 2.2. Let $(X, \Sigma, \mu)$ consist of the set $X = \{0, 1\}^{\mathbb{N}}$ of one-sided 0-1 sequences with the product σ-algebra and with the uniform product measure $\mu = (\tfrac12, \tfrac12)^{\mathbb{N}}$. Let $m$ be the geometric distribution on the natural numbers $\mathbb{N}$ and let $P$ be the operator on $L^1(\mu \times m)$ defined in [3]. This example is an instance of our construction if $T$ is the $(\tfrac12, \tfrac12)$-Bernoulli shift. We will recover some of its features in the more general case.
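Let us first prove that $\Upsilon_T f$ defined by (1) is Markovian. Clearly, $\Upsilon_T f \ge 0$ for $f \ge 0$ and $\Upsilon_T$ preserves constants, so the only thing left to show is the fact that $\Upsilon_T$ preserves measure. This is checked in the following calculation, which uses $\Upsilon_T f(x, k) = f(Tx, k-1)$ for $k > 1$, $\Upsilon_T f(x, 1) = \sum_j b_j E_j(f|_j)(x)$, the invariance of $\mu$ under $T$ and under each $E_j$, and $a_1 b_j = a_j - a_{j+1}$:
\[ \int_Y \Upsilon_T f\,d\nu = \sum_{k=2}^{\infty} a_k \int_X f|_{k-1} \circ T\,d\mu + a_1 \sum_{j=1}^{\infty} b_j \int_X E_j(f|_j)\,d\mu = \sum_{j=1}^{\infty} a_{j+1} \int_X f|_j\,d\mu + \sum_{j=1}^{\infty} (a_j - a_{j+1}) \int_X f|_j\,d\mu = \sum_{j=1}^{\infty} a_j \int_X f|_j\,d\mu = \int_Y f\,d\nu. \]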
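The above construction is particularly simple for invertible (injective) maps. In this case every atom of $\xi_k$ is a single point, so $\mu_{\xi_k(x)} = \delta_x$ and $\Upsilon_T$ is closely related to the Koopman operator of $T$; precisely,
\[ \Upsilon_T f(x, k) = f(Tx, k-1) \ \text{ for } k > 1, \qquad \Upsilon_T f(x, 1) = \sum_{j=1}^{\infty} b_j f(Tx, j). \]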
(1) $U$ is a Markov embedding if it is a lattice homomorphism (i.e., $|Uf| = U|f|$ for every $f \in L^1(\mu_2)$) or, equivalently, there is a Markov operator $S$ such that $SU$ is the identity. This definition identifies factors as certain subspaces of the domain. It can be shown that these subspaces have the form $L^1(X, \Sigma_F, \mu)$, where $\Sigma_F$ is a sub-σ-algebra of the σ-algebra of all measurable subsets of $X$, which is invariant in the sense that $P1_A$ is $\Sigma_F$-measurable for each $A \in \Sigma_F$. Moreover, the representation is unique if one assumes that $\Sigma_F$ is complete with respect to the measure $\mu$. Motivated by the theory of classical dynamical systems one may say that if $P_1$ is a doubly stochastic operator on $L^1(\mu_1)$ and $P_2$ is a doubly stochastic operator on $L^1(\mu_2)$ then $P_2$ is a factor of $P_1$ if there is a Markov embedding $U : L^1(\mu_2) \to L^1(\mu_1)$ such that $UP_2 = P_1U$. This definition is consistent with Definition 3.3; the appropriate sublattice is obtained as the image $U(L^1(\mu_2))$.
In the case of standard Borel spaces an operator $P_2$ is a factor of $P_1$ if and only if there is a measure-preserving surjection $\pi : X_1 \to X_2$ satisfying, for every $f \in L^1(\mu_2)$, the condition $(P_2 f) \circ \pi = P_1(f \circ \pi)$. Indeed, a Markov embedding sends characteristic functions of sets to characteristic functions of other sets, hence it defines a homomorphism between measure algebras. In the case of standard Borel spaces such a homomorphism $\Pi$ is always induced by a pointwise measure preserving map $\pi : X_1 \to X_2$ by the formula $\Pi = \pi^{-1}$, so the general definition boils down to the pointwise one. Furthermore, if a measure preserving map $T_2$ is a factor of a map $T_1$ (in the classical sense) then the Koopman operator of $T_2$ is a factor of the Koopman operator of $T_1$.
A factor of an operator $P$ is pointwise if it is a Koopman operator of a measure preserving map. This may be understood in two equivalent ways: either there is a sublattice of the form $L^1(X, \Sigma_F, \mu)$, where $\Sigma_F$ is a complete σ-algebra, such that for every $A \in \Sigma_F$ the function $P1_A$ is the characteristic function of a set in $\Sigma_F$, or there is a dynamical system $(Z, \lambda, S)$, where $S : Z \to Z$ is a measure preserving map, and a Markov embedding $U : L^1(\lambda) \to L^1(\mu)$ such that $UU_S = PU$, where $U_S f = f \circ S$ is the Koopman operator of $S$. A factor is trivial if the measure is concentrated at one point. Obviously, such factors are pointwise, and a factor of $P$ is trivial if and only if it can be represented as a subspace $L^1(X, \Sigma', \mu)$, where $\Sigma'$ consists solely of sets of measure zero or one.
Given a map $T$ we define the following relation on $X$:
\[ x \sim x' \iff \exists n \in \mathbb{N}:\ T^n(x) = T^n(x'). \]
Equivalently, one can replace the condition above by $\xi_n(x) = \xi_n(x')$ for some $n$ or by
\[ \exists i, j \in \mathbb{N}\ \exists z \in X:\ x \in \xi_i(z) \text{ and } x' \in \xi_j(z). \tag{2} \]
Clearly, this is an equivalence relation, so it decomposes the space $X$ into disjoint equivalence classes. The equivalence class of $x \in X$ will be denoted by $[x]$ and the corresponding quotient space by $\tilde X$. All such classes are both measurable in $X$ and $\xi_k$-measurable for each $k$.
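The relation $\sim$ is purely combinatorial, so it can be illustrated on a finite set. The sketch below is a toy under stated assumptions: the map is an arbitrary self-map of a finite set given as a list (measure preservation is ignored, since a non-injective map cannot preserve the uniform measure on a finite set), and the names `same_future` and `future_classes` are hypothetical.

```python
def same_future(T, x, y):
    """Return True if T^n(x) == T^n(y) for some n >= 0.

    T is a list with T[s] the image of state s. After len(T) steps both
    orbits run on cycles, where T is injective, so no later merging can occur.
    """
    for _ in range(len(T) + 1):
        if x == y:
            return True
        x, y = T[x], T[y]
    return False

def future_classes(T):
    """Partition range(len(T)) into the equivalence classes of the relation."""
    classes = []
    for x in range(len(T)):
        for cls in classes:
            if same_future(T, x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

# Hypothetical example: 0 -> 1 -> 2 -> 0 is a cycle, 3 -> 0, 4 -> 3, and 5, 6 swap.
T = [1, 2, 0, 0, 3, 6, 5]
classes = future_classes(T)
```

Here the points 2 and 3 are equivalent because both are mapped to 0, and 1 and 4 are equivalent because their orbits merge after two steps, while distinct points of a cycle are never equivalent.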
Consider the space $(\tilde X, \tilde\Sigma, \tilde\mu)$, where $\tilde\Sigma$ and $\tilde\mu$ are the σ-algebra and the measure transported from $(X, \Sigma, \mu)$ by the canonical map $x \mapsto [x]$. There is a natural pointwise action $\tilde T$ on $(\tilde X, \tilde\Sigma, \tilde\mu)$ given by $\tilde T[x] = [Tx]$. It is obvious that this is a factor of $T$, but by a straightforward calculation one verifies that it also is a factor of $\Upsilon_T$.

Lemma 3.4. Let $L^1(Y, \Sigma_F, \nu)$ be a pointwise factor of $\Upsilon_T$. Then for every $A \in \Sigma_F$ the function $(\Upsilon_T 1_A)|_1$ is a characteristic function of a set being a union of some equivalence classes of $\sim$.

Proof. Since the factor is pointwise, for every $A \in \Sigma_F$ one has $\Upsilon_T 1_A = 1_B$ for some set $B \in \Sigma_F$; in particular, $\sum_k b_k\,\mu_{\xi_k(x)}(T^{-1}A|_k) = 1_B(x, 1)$. Since all $b_k$ are positive and sum to one, given $x$ one has $\mu_{\xi_k(x)}(T^{-1}A|_k) = 0$ for all $k$ or $\mu_{\xi_k(x)}(T^{-1}A|_k) = 1$ for all $k$. The value of $\mu_{\xi_k(x)}(A)$ is constant on each element of $\xi_k$. But if $x \sim x'$ then $x$ and $x'$ belong to the same atom of $\xi_i$ for some $i$, so for every $k$ the function $x \mapsto \mu_{\xi_k(x)}(A)$ is also constant on each equivalence class of $\sim$. Thus, $B|_1$ is a union of equivalence classes on which $\mu_{\xi_k(x)}(A) = 1$.
Theorem 3.5. The exploding operator $\Upsilon_T$ has no non-trivial pointwise factor if and only if the σ-algebra $\tilde\Sigma$ consists solely of sets of measure zero or one.

Proof. If $\tilde\Sigma$ contains a set with measure in $(0, 1)$ then $(\tilde X, \tilde\Sigma, \tilde\mu, \tilde T)$ is a non-trivial pointwise factor. If not and if $L^1(Y, \Sigma_F, \nu)$ is a pointwise factor of $\Upsilon_T$ then by the above lemma for every $A \in \Sigma_F$ it holds that $(\Upsilon_T 1_A)|_1$ is either equal to zero $\mu$-a.e. or it is equal to one $\mu$-a.e.
Proof. The equivalence relation identifies all points which are sequences having the same tail. It is not hard to show that any set which belongs to the σ-algebra defined by such a relation is a member of the tail σ-algebra, therefore it has measure zero or one by Kolmogorov's zero-one law.

Ergodicity
A doubly stochastic operator is ergodic if constant functions are the only functions invariant under the action of the operator.
Let us start with the following operator-theoretic restatement of a classical equivalent definition of ergodicity. Though it is probably well known to the experts, we include the proof.

Lemma. A doubly stochastic operator $P$ is ergodic if and only if $\sum_{k=1}^{\infty} P^k f > 0$ almost everywhere for every nonnegative function $f$ which is not equal to zero almost everywhere.

Proof. Assume first that $P$ is ergodic. Let $f$ be a nonnegative function. By the Chacon-Ornstein theorem, the averages $\frac{1}{n}\sum_{k=1}^{n} P^k f$ converge almost everywhere to an invariant function. If $\sum_{k=1}^{\infty} P^k f = 0$ on a set of positive measure then the limit function is zero on this set, hence by ergodicity it must be equal to zero almost everywhere. Since $f$ and the limit function have the same integral, also $f$ must be zero almost everywhere.
Conversely, assume that $f$ is a nonconstant invariant function for $P$. Then $f < \int f\,d\mu$ on a set of positive measure. For any functions $g$ and $h$ denote by $g \vee h$ the function being the pointwise maximum of $g$ and $h$.
Since

Proof. Assume that $T$ is not ergodic, so there is $A \subset X$ such that $0 < \mu(A) < 1$ and $T^{-1}A = A$. Clearly, for $k > 1$:
\[ \Upsilon_T 1_{A \times \mathbb{N}}(x, k) = 1_A(Tx) = 1_A(x). \]
Therefore, $0 \le \Upsilon_T 1_{A \times \mathbb{N}} \le 1_{A \times \mathbb{N}}$. Both these functions have equal integrals, so they must be equal, and $1_{A \times \mathbb{N}}$ is a non-constant function which is invariant under the action of $\Upsilon_T$.
For the converse statement, let $T$ be ergodic and let $f$ be a non-negative function which is not equal almost everywhere to zero. If $f|_1$ is strictly positive on a set $A \subset X$ with $\mu(A) > 0$ then, since $f|_1 \circ T$ is constant on $\xi_1(x)$, it holds that for almost every $x$ and every $k > 1$.
If $f$ equals zero almost everywhere on $X \times \{1\}$ then $f(x, k) > 0$ on a set $A \subset X$ of positive measure $\mu$ for some $k > 1$. By definition, and the right hand side is positive on a set of positive measure $\mu$. Indeed, positive on a set of positive measure $\hat\mu_k$. Hence, $\Upsilon_T f(x, 1)$ is positive on a set of positive measure $\mu$ and the claim follows as before.
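The relation between ergodicity of $T$ and of $\Upsilon_T$ discussed in this section can be illustrated in a finite toy model. The sketch rests on assumptions that are not part of the paper: $X$ is finite with the uniform measure, $T$ is a permutation, and the clock is truncated at a hypothetical value $K$ with all reset probabilities positive. In this finite setting, where the invariant measure charges every state, ergodicity is equivalent to every state being reachable from every other one, which is what the hypothetical helper `is_ergodic` tests.

```python
def exploding_edges(perm, K):
    """Successors of each state (x, k) in the finite toy model.

    perm[x] = T(x) is a permutation; the clock takes values 1..K and every
    reset probability b_j is assumed to be positive.
    """
    succ = {}
    for x in range(len(perm)):
        # clock equal to 1: explode, i.e. move by T and reset the clock to any j
        succ[(x, 1)] = [(perm[x], j) for j in range(1, K + 1)]
        for k in range(2, K + 1):
            # clock above 1: move by T and count down
            succ[(x, k)] = [(perm[x], k - 1)]
    return succ

def reaches_everything(succ, start):
    """Depth-first search: is every state reachable from start?"""
    seen, stack = {start}, [start]
    while stack:
        for t in succ[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return len(seen) == len(succ)

def is_ergodic(succ):
    """Finite-state criterion: every state reaches every other state."""
    return all(reaches_everything(succ, s) for s in succ)

cyclic = exploding_edges([1, 2, 0], 3)         # T is a single 3-cycle
two_cycles = exploding_edges([1, 0, 3, 2], 3)  # T has two invariant 2-cycles
```

For the single cycle the toy operator is ergodic, while for the permutation with two invariant cycles it is not, in line with the non-ergodic half of the argument above.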

Entropy
For the sequence $(b_n)$ let us denote its $i$th partial sum by $S(i) = \sum_{k=1}^{i} b_k$ and its $i$th tail by $R(i) = \sum_{k=i+1}^{\infty} b_k$. We will prove that if $\sum_i R(i)$ converges then the entropy of $\Upsilon_T$ is bounded from below by the entropy of $T$. This assumption is satisfied for example by geometric sequences (but not only by them). Let $\|\cdot\|_\infty$ denote the norm in $L^\infty(Y, \nu)$.
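For a geometric sequence the quantities $S(i)$ and $R(i)$ are explicit, which makes the summability assumption easy to check. The sketch below takes $b_k = (1-q)q^{k-1}$ with the illustrative value $q = 1/2$ and uses $R(i) = 1 - S(i)$, which holds because the $b_k$ sum to one; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def geometric_weights(q, n):
    """First n terms of b_k = (1 - q) q^(k-1)."""
    return [(1 - q) * q ** (k - 1) for k in range(1, n + 1)]

def partial_sums(b):
    """S(1), ..., S(n) for the listed terms."""
    return list(accumulate(b))

def tails(b):
    """R(1), ..., R(n), computed as 1 - S(i) since the full sequence sums to 1."""
    return [1 - s for s in partial_sums(b)]

q = Fraction(1, 2)
n = 10
R = tails(geometric_weights(q, n))
tail_sum = sum(R)           # R(1) + ... + R(n)
limit = q / (1 - q)         # value of the full series, since R(i) = q^i
```

Here $R(i) = q^i$, so the partial sums of $\sum_i R(i)$ increase to $q/(1-q)$ and the series converges.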
converges to 0, when i goes to infinity.
Proof. Let i be a positive integer. For k i the set for k i.

Consequently,
for k 2, using the above inequality one Assume inductively that for some n. Since $\Upsilon_T$ is an $L^\infty$-contraction, it holds that Hence, Υ n k=i R(k) for every n, which ends the proof.

The definition of entropy of a doubly stochastic operator is not widely known, so I devote the next few lines to a short introduction to the subject; a detailed exposition may be found in [2] or [3] and an alternative approach in [7]. Similarly to the classical case of the Kolmogorov-Sinai invariant, the entropy of a doubly stochastic operator on $L^1(Y, \nu)$ is defined in several steps. First, the entropy $H_\nu(\mathcal{F})$ of a finite collection $\mathcal{F}$ of measurable functions with range contained in $[0, 1]$ is defined (such collections replace partitions in the operator-theoretic definition). Simultaneously, an operation of joining such collections is introduced; for instance, one can define the join of collections $\mathcal{F}$ and $\mathcal{G}$ as the concatenation of $\mathcal{F}$ and $\mathcal{G}$. Then, the entropy $h_\nu(P, \mathcal{F})$ of an operator $P$ with respect to a collection $\mathcal{F}$ is obtained as an upper limit (or a limit, if it exists) $\limsup_{n\to\infty} \frac{1}{n} H_\nu(\mathcal{F}^n)$, where $\mathcal{F}^n$ stands for the join of $\mathcal{F}, P\mathcal{F}, \ldots, P^{n-1}\mathcal{F}$ and $P^k\mathcal{F} = \{P^k f : f \in \mathcal{F}\}$. Finally, $h_\nu(P)$ is the supremum $\sup_{\mathcal{F}} h_\nu(P, \mathcal{F})$ over all collections under consideration. It was proved in [3] that any specification of the joining operation and of the 'static' entropy $H_\nu(\mathcal{F})$ which satisfies a certain set of axioms leads to a common value of the final notion $h_\nu(P)$. In addition, the conditional entropy of a collection $\mathcal{F}$ with respect to $\mathcal{G}$ is defined as $H_\nu(\mathcal{F}|\mathcal{G}) = H_\nu(\mathcal{F} \vee \mathcal{G}) - H_\nu(\mathcal{G})$. The explicit formula for $H_\nu(\mathcal{F})$ will not be needed in the current paper, but I will recall some of the properties of operator entropy which I use in the forthcoming argument. Both the Kolmogorov-Sinai entropy and the operator entropy will be denoted by the same symbols $H_\nu$ and $h_\nu$.
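Moreover, the same symbol $\vee$ will be used for both the joining of partitions and the joining of collections of functions; in either case the meaning will be clear from the context. Below there is a list of some of the properties of entropy which can be found in [3].
(ii) For any finite collections $\mathcal{F}$ and $\mathcal{G}$ it holds that $H_\nu(\mathcal{F} \vee \mathcal{G}) \le H_\nu(\mathcal{F}) + H_\nu(\mathcal{G})$.
(iii) For any finite collections $\mathcal{F}_1, \ldots, \mathcal{F}_n$ and $\mathcal{G}_1, \ldots, \mathcal{G}_n$ it holds that $H_\nu\bigl(\bigvee_{i=1}^{n} \mathcal{F}_i \,\big|\, \bigvee_{i=1}^{n} \mathcal{G}_i\bigr) \le \sum_{i=1}^{n} H_\nu(\mathcal{F}_i|\mathcal{G}_i)$.
(iv) The entropy is continuous with respect to $\mathcal{F}$ in the following sense. For two collections $\mathcal{F} = \{f_1, \ldots, f_r\}$ and $\mathcal{G} = \{g_1, \ldots, g_r\}$ one defines their distance by $\mathrm{dist}(\mathcal{F}, \mathcal{G}) = \min_\pi \max_{1 \le i \le r} \int |f_i - g_{\pi(i)}|\,d\nu$, where the minimum ranges over all permutations $\pi$ of the set $\{1, 2, \ldots, r\}$. For every $\varepsilon > 0$ there is $\delta > 0$ such that if $\mathcal{F}$ and $\mathcal{G}$ have cardinalities at most $r$ and $\mathrm{dist}(\mathcal{F}, \mathcal{G}) < \delta$ then $|H_\nu(\mathcal{F}|\mathcal{G})| < \varepsilon$.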
Proof. Fix a partition $\xi$ of $X$ and $\varepsilon > 0$. Denote by $\mathrm{Id} : X \to X$ the identity map on $X$. Given $i \in \mathbb{N}$ one calculates: By the preceding lemma and the continuity of entropy with respect to a collection of functions, the expression under the sum is smaller than $\varepsilon$ provided $i$ is large enough. Moreover, for every $i$ one has $h_\mu(T, \xi) = h_\mu(T, T^{-i}\xi)$. Therefore, $h_\mu(T, \xi) \le h_\nu(\Upsilon_T, 1_{T^{-i}\xi \times \mathbb{N}}) + \varepsilon \le h_\nu(\Upsilon_T) + \varepsilon$ and the inequality follows by taking the supremum over $\xi$ and the infimum over $\varepsilon$.
Final remarks.
(1) It seems very unlikely that the entropy of $\Upsilon_T$ could ever be strictly higher than the entropy of $T$, but, surprisingly, I was not able to prove the equality. However, my conjecture is that the equality holds at least if $T$ is a Bernoulli shift.
(2) Let $X$ be a compact or, more generally, Polish space. An operator $P : C(X) \to C(X)$ is a Markov operator if it is positive and preserves constants (in the non-compact case $C(X)$ is understood as the space of bounded continuous functions). For a continuous map $T$ our Definition 2.1 gives a Markov operator on $C(Y)$ if for every continuous $g \in C(X)$ the map $x \mapsto \int g\,d\mu_{\xi_k(x)}$ is everywhere defined and continuous, i.e., if $x \mapsto \mu_{\xi_k(x)}$ is continuous in the weak* topology. It seems reasonable to ask how restrictive these demands are. In [9] one finds an interesting non-classical approach to the idea of disintegration, which yields the same result as the usual disintegration, if well-defined. In particular, Theorem 7.1 there gives (together with the preceding definitions and construction) a set of assumptions guaranteeing that our definition of $\Upsilon_T$ is possible in the topological setup. It states that if $X$ and $Z$ are both locally compact and σ-compact Hausdorff spaces with Radon measures $\mu$ and $\nu$, respectively, $t : X \to Z$ is a continuous map and $Z$ is the support of $\nu$, then the disintegration $\mu_z$ of $\mu$ along $t$ is defined for all $z \in Z$ and the map $z \mapsto \mu_z$ is continuous. In our case, for a given $k$ we consider the partition of $X$ into closed sets $C_{x,k} = T^{-k}T^k(x)$ and the role of $Z$ is played by the quotient space $X/\xi_k$. By the definition of the identification topology in $X/\xi_k$, this space is a $T_1$-space and the canonical surjection $x \mapsto C_{x,k}$ is continuous (e.g., see [4]). If this map were open, then compactness of $X$ would imply that $X/\xi_k$ is a Hausdorff space and compactness of $X/\xi_k$ would follow easily. It is indeed open if $X$ is a subshift: it is easy to see that the image of a cylinder under the identification map is a set in $X/\xi_k$ such that the union of its elements (treated as subsets of $X$) is also a cylinder. Any measure on $X$ with full support transports to a measure with full support on $X/\xi_k$. So Definition 2.1 makes sense in the topological setup at least in the class of all subshifts having an invariant measure with full support. However, to study the entropy of this operator one either needs to extend the entropy theory introduced in [3] beyond compact spaces or to define the operator on some compactification of $Y$.