Exploding Markov operators

A special class of doubly stochastic (Markov) operators is constructed. In a sense these operators come from measure preserving transformations and inherit some of their properties, namely ergodicity and positivity of entropy, yet they may have no pointwise factors.


Introduction
The subject of the current paper lies in the border zone between ergodic theory and operator theory. The main motivation for the study was the desire to enlarge the stock of examples of doubly stochastic operators which escape the scope of classical ergodic theory (because they are not induced by measure preserving maps as their Koopman operators), yet still exhibit nontrivial dynamical behavior. By a doubly stochastic or a Markov operator we understand an operator $P : L^p(\mu) \to L^p(\nu)$, where $(X, \mu)$ and $(Y, \nu)$ are probability spaces, which fulfills the following conditions: (i) $Pf$ is positive for every positive $f \in L^p(\mu)$, (ii) $P1 = 1$ (where $1$ is the function constantly equal to 1), (iii) $\int Pf\,d\nu = \int f\,d\mu$ for every $f \in L^p(\mu)$. For example, the well-studied class of quasi-compact doubly stochastic operators on $L^2$ lies pretty far from the theory of measure preserving maps. But the domain of a

The definition
Let us start with the following example described in [3]. It has positive entropy, yet it is strictly non-pointwise, meaning that its only pointwise factor is the trivial one (see Sect. 3 for definitions). and define the operator $P$ on $L^1(\mu \times \nu)$ as follows: In other words, the operator is obtained by integrating with respect to the transition probability defined by: where $\delta_y$ is the Dirac measure concentrated at $y$, that is, $\delta_y(A) = 1_A(y)$.
One may visualize the action of $P$ via the transition probability $P_T$ in the following way. Each point of $Y$ is a pair consisting of some $x \in X$ and a positive integer $k$. The integer coordinate represents the reading of a clock which counts down the time to an explosion. As long as this reading is greater than 1, the point is mapped according to the shift and the counter goes down by one. When the counter is to be reduced from 1 to 0, the point $x$ explodes and its images are spread over the space (more precisely, over some set of cofinal points, which share a common future with $x$) with counters reset to $k$ with probability $b_k$. Aiming to generalize the above example, we notice that its definition may easily be rewritten using the adjoint operator. Indeed, if $T_\sigma f = f \circ \sigma$ is the Koopman operator of the shift, then the adjoint is given by where $ax$ denotes $x$ shifted to the right and completed by the symbol $a$ at the first coordinate. Thus, writing we arrive at the following general definition. Let $T : L^1(X, \mu) \to L^1(X, \mu)$ be a Markov embedding (or a formal Koopman operator), i.e., a doubly stochastic operator satisfying $|Tf| = T|f|$ for every $f \in L^1(\mu)$ (a lattice homomorphism); in particular, it may be the Koopman operator of a measure preserving transformation. Bearing in mind that a doubly stochastic operator defined on $L^p(\mu)$ extends uniquely to the whole of $L^1(\mu)$, we denote by $T^* : L^1(\mu) \to L^1(\mu)$ the operator adjoint to $T$. Let: Clearly, $P_k$ and $E_k$ are doubly stochastic and $E_1 = T T^* T = T$. Let $(a_n)$ be a strictly decreasing sequence such that $\sum_{n=1}^{\infty} a_n = 1$. Define a sequence $(b_k)$ of positive numbers by $b_k = (a_k - a_{k+1})/a_1$.
As the simplest example one may consider $a_k = b_k = 1/2^k$ or, more generally, $a_k = b_k = (1 - a)a^{k-1}$ ($0 < a < 1$), but other choices are also possible (though if $a_k = b_k$ then one automatically obtains a geometric sequence). We denote by $V$ the direct sum $\ell^1$-$\bigoplus_{n=1}^{\infty} L^1(X, a_n\mu)$, i.e., the Cartesian product of countably many copies of $L^1(X, \mu)$ with the norm of $f = (f_1, f_2, \dots)$ given by $\|f\| = \sum_{n=1}^{\infty} a_n \int |f_n|\,d\mu$. Define a probability distribution $m = m_{(a_n)}$ on $\mathbb{N}$ by $m(\{k\}) = a_k$. Now let $Y$ be the Cartesian product $X \times \mathbb{N}$ with the product $\sigma$-algebra and let $\nu = \mu \times m$. Clearly, the space $V$ is isomorphic to $L^1(X \times \mathbb{N}, \mu \times m)$. We will freely use both the representation of elements of $V$ as a sequence $(f_n)$ and the element $f$ of $L^1(X \times \mathbb{N}, \mu \times m)$, identifying $f_n(\cdot)$ with $f(\cdot, n)$. It is convenient to define an operator $J$ embedding $L^1(X)$ in $V$ by We will also say that an operator $P$ on the direct sum is doubly stochastic if:

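$Pf \ge 0$ for every $f \ge 0$, $P(1, 1, \dots) = (1, 1, \dots)$ and $\sum_{k=1}^{\infty} a_k \int (Pf)_k\,d\mu = \sum_{k=1}^{\infty} a_k \int f_k\,d\mu$; this is consistent with the definition for operators on $L^1(X \times \mathbb{N})$.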
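The relation between the weights $(a_k)$ and the reset probabilities $(b_k)$ can be checked numerically. The following Python sketch is only an illustration: the function names and the choice $a = 1/3$ are hypothetical, and exact rational arithmetic is used so that the identities hold without rounding. It confirms that for a geometric sequence $b_k = a_k$ and that the partial sums of $(b_k)$ telescope.

```python
from fractions import Fraction

def geometric_weights(a, n_terms):
    """Return [a_1, ..., a_{n_terms + 1}] with a_k = (1 - a) * a**(k - 1)."""
    return [(1 - a) * a ** (k - 1) for k in range(1, n_terms + 2)]

def reset_probabilities(weights):
    """b_k = (a_k - a_{k+1}) / a_1 for every k that has a successor in `weights`."""
    a1 = weights[0]
    return [(weights[k] - weights[k + 1]) / a1 for k in range(len(weights) - 1)]

a = Fraction(1, 3)                # hypothetical ratio, any 0 < a < 1 works
weights = geometric_weights(a, 10)
b = reset_probabilities(weights)

# Geometric case: b_k coincides with a_k.
assert b == weights[:10]
# Partial sums telescope: b_1 + ... + b_n = 1 - a_{n+1} / a_1.
assert sum(b) == 1 - weights[10] / weights[0]
```

Since $a_{n+1} \to 0$, the telescoping identity shows that $(b_k)$ is a probability vector for any admissible choice of $(a_k)$, not only for geometric ones.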

Definition 1 An exploding operator induced by T is an operator
where $V_T : V \to L^1(X, \mu)$ is given by . By a straightforward calculation one easily verifies that both these operators are doubly stochastic and, consequently, so is $\Upsilon_T$.
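The double stochasticity can be illustrated on the counter coordinate alone. The sketch below is a simplification under stated assumptions, not the general construction: $X$ is taken to be a single point (so $T$ is the identity and only the clock remains), the first coordinate is assumed to be the $b_k$-weighted average suggested by the clock description, and the counter is truncated at a hypothetical level `K` with renormalised weights and $a_{K+1} = 0$.

```python
from fractions import Fraction

K = 6  # hypothetical truncation level for the counter

# Truncated, renormalised geometric weights a_1 > ... > a_K > a_{K+1} = 0.
raw = [Fraction(1, 2 ** k) for k in range(1, K + 1)]
total = sum(raw)
a = [w / total for w in raw] + [Fraction(0)]
b = [(a[k] - a[k + 1]) / a[0] for k in range(K)]

def step(g):
    """Apply the counter part of the operator to g = (g_1, ..., g_K).

    Coordinate 1 averages g over the reset distribution (b_k);
    coordinate n >= 2 reads g_{n-1}, i.e. the clock has ticked once.
    """
    first = sum(bk * gk for bk, gk in zip(b, g))
    return [first] + list(g[:-1])

def integral(g):
    """Integral of g with respect to m({k}) = a_k (zip drops the padding a_{K+1})."""
    return sum(ak * gk for ak, gk in zip(a, g))

ones = [Fraction(1)] * K
g = [Fraction(k * k) for k in range(1, K + 1)]  # arbitrary nonnegative test function

assert sum(b) == 1                       # (b_k) is a probability vector
assert step(ones) == ones                # constants are preserved
assert integral(step(g)) == integral(g)  # the measure m is preserved
```

The last assertion is the identity $a_1 b_k + a_{k+1} = a_k$ in disguise, which is exactly why the weights $b_k = (a_k - a_{k+1})/a_1$ are the natural choice.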

Remark 1
If T is a Koopman operator of a measure preserving map, then T * is the Perron-Frobenius operator of this map.
Remark 2 (Topological model) By passing to a faithful topological model, e.g., the Stone model (see [5, Chapter 12]), one may without loss of generality assume that $X$ is a compact Hausdorff space, $\mu$ is a regular Borel probability measure with full support and $T$ is induced by a continuous surjective map $\varphi : X \to X$ which preserves the measure $\mu$. Each element of $L^1(\mu)$ contains a unique continuous function, so both $T$ and its adjoint $T^*$ may be treated as operators on $C(X)$. Therefore, the Riesz representation theorem allows one to define for each $k \in \mathbb{N}$ and $y \in X$ a conditional measure $\mu^k_y$ on $X$ by the formula It is easy to verify that the measure is supported on the closed set $\varphi^{-k}(\{y\})$. We have and for $n \ge 2$ for $f = (f_n)$. This means that $\Upsilon_T$ is induced by a transition kernel given by where $\varphi^*$ is the map adjoint to $\varphi$, acting on regular Borel measures.

Remark 3
For a standard probability space $(X, \Sigma, \mu)$ and a measure preserving surjection $T : X \to X$ the operator may also be defined via the notion of a disintegration of a measure. For a fixed $k \in \mathbb{N}$ let $\xi_k$ be the partition of $X$ into sets $T^{-k}\{T^k x\}$, $x \in X$, and denote by $\xi_k(x)$ the element of the partition $\xi_k$ which contains $x$. For every $k$ this partition is measurable (see Appendix 1 in [1] for the precise definition), in other words, the quotient space $X/\xi_k$ is countably separated, so $X/\xi_k$ is also a standard probability space with the measure transported by the map $x \mapsto \xi_k(x)$. Indeed, let $\{B_1, B_2, \dots\}$ be a separating collection in $X$. If $\xi_k(x)$ and $\xi_k(y)$ are disjoint then $T^k x \ne T^k y$. Without loss of generality one may assume that $T^k x \in B_i$ and $T^k y \notin B_i$ for some $i \in \mathbb{N}$.
range in the space of all probability measures on X , such that each measure μ k C satisfies μ k C (X \C) = 0 and there is a measureμ k on X /ξ k with the property that for any measurable function f ∈ L 1 (μ), (see [8]). In addition, the map C → μ k C is measurable in the sense that for each measurable A the map x → μ k ξ k (x) (A) is measurable. Let sections of a set A ⊂ Y and of a function f on Y be denoted by We have and where the probability kernel P T is given by:

Remark 4
The construction is particularly simple for Koopman operators of invertible maps, say S :

Pointwise factors
We begin with the definition of a factor of a doubly stochastic operator.

Definition 2
1. A unital sublattice of $L^1(X, \mu)$ is a closed linear subspace of $L^1(X, \mu)$ which contains the constant function 1 and, together with each $f$, contains its conjugate $\bar{f}$ and its absolute value $|f|$.
2. A factor of a doubly stochastic operator $P$ on $L^1(X, \mu)$ is a unital sublattice of $L^1(X, \mu)$ which is invariant under the action of $P$ (see [5], Definition 13.26).
This definition identifies factors as certain subspaces of the domain. It can be shown that these subspaces have the form $L^1(X, \Sigma_F, \mu)$, where $\Sigma_F$ is a sub-$\sigma$-algebra of the $\sigma$-algebra of all measurable subsets of $X$ which is invariant in the sense that $P1_A$ is $\Sigma_F$-measurable for each $A \in \Sigma_F$. Moreover, the representation is unique if one assumes that $\Sigma_F$ is complete with respect to the measure $\mu$. Motivated by the theory of classical dynamical systems, one may say that if $P_1$ is a doubly stochastic operator on $L^1(\mu_1)$ and $P_2$ is a doubly stochastic operator on $L^1(\mu_2)$, then $P_2$ is a factor of $P_1$ if there is a Markov embedding $U : L^1(\mu_2) \to L^1(\mu_1)$ such that $U P_2 = P_1 U$. This definition is consistent with Definition 2: the appropriate sublattice is obtained as the image $U(L^1(\mu_2))$.
In the case of standard Borel spaces, an operator $P_2$ is a factor of $P_1$ if and only if there is a measure-preserving surjection $\pi : X_1 \to X_2$ satisfying, for every $f \in L^1(\mu_2)$, the condition $(P_2 f) \circ \pi = P_1(f \circ \pi)$. Indeed, a Markov embedding sends characteristic functions of sets to characteristic functions of other sets, hence it defines a homomorphism between measure algebras. In the case of standard Borel spaces such a homomorphism $\Pi$ is always induced by a pointwise measure preserving map $\pi : X_1 \to X_2$ by the formula $\Pi = \pi^{-1}$, so the general definition boils down to the pointwise one. Furthermore, if a measure preserving map $T_2$ is a factor of a map $T_1$ (in the classical sense) then the Koopman operator of $T_2$ is a factor of the Koopman operator of $T_1$.
A factor F ⊂ L 1 (X , μ) of an operator P is pointwise if the restriction P| F is a Markov embedding. If F = L 1 (X , Σ F , μ) and Σ F is a complete σ -algebra then A factor is trivial if it consists only of constant functions or, in other words, if it can be represented as a subspace L 1 (X , Σ F , μ), where Σ F consists solely of sets of measure zero or one. Obviously, a trivial factor is pointwise.
For a Markov embedding $T : L^1(X, \mu) \to L^1(X, \mu)$ let us consider the unital sublattice $R_T = \bigcap_{n=1}^{\infty} \operatorname{ran}(T^n)$. Note that $T$ is an isometry, so $\operatorname{ran}(T^n)$ is closed for every $n$. The sequence $\operatorname{ran}(T^n)$ is descending, so $R_T$ is invariant, thus it is a factor of $T$. The restriction $T|_{R_T}$ is invertible, the restriction of the adjoint being the inverse. Moreover, $R_T$ is also a factor of $\Upsilon_T$ in the sense that $JTf = \Upsilon_T Jf$ for every $f \in R_T$, so $J(R_T)$ is a unital sublattice of $V \cong L^1(Y)$, invariant under the action of $\Upsilon_T$. Since $T$ is a Markov embedding, it is a pointwise factor of $\Upsilon_T$.

Lemma 1 If F is a pointwise factor of Υ T then V T f ∈ R T for every f ∈ F.
Proof Let f = 1 A ∈ F and let (1 A 1 , 1 A 2 , . . .) be its direct sum representation. Then for each k ∈ N. By linearity and continuity of V T , one has V T f ∈ R T for all f ∈ F.

Theorem 1 The exploding operator Υ T has no non-trivial pointwise factor if and only if R T is trivial.
Proof If R T contains a non-constant function f then it is a non-trivial pointwise factor.
On the other hand, let $F$ be a pointwise factor of $\Upsilon_T$ and let $1_A \in F$. If $R_T$ is trivial then $V_T 1_A$ is constant, by the above lemma. But then it is equal to zero or to one, because the restriction $\Upsilon_T|_F$ is a Markov embedding. If it is zero, representing $1_A$ by $(1_{A_1}, 1_{A_2}, \dots)$ and integrating $V_T 1_A$ with respect to $\mu$ one gets $\sum_{k=1}^{\infty} (a_k - a_{k+1}) \int 1_{A_k}\,d\mu = 0$, so the fact that the sequence $(a_k)$ is strictly decreasing implies that $\int 1_{A_k}\,d\mu = 0$ for all $k$, which means that $1_A$ is constantly zero. Performing similar calculations for the second case, we obtain $\sum_{k=1}^{\infty} (a_k - a_{k+1}) \int 1_{A_k}\,d\mu = a_1$, which happens only if $\int 1_{A_k}\,d\mu = 1$ for all $k$. Hence, $1_A \equiv 1$ and, consequently, $F$ consists only of constant functions.

Corollary 1 If T is a Koopman operator of a one-sided (noninvertible) Bernoulli shift then Υ T is strictly non-pointwise.
Proof Any characteristic function 1 A which belongs to R T is measurable with respect to the tail σ -algebra. Therefore, A has measure zero or one, by Kolmogorov's zero-one law, so R T is trivial.

Ergodicity
A doubly stochastic operator is ergodic if constant functions are the only functions invariant under the action of the operator.
Let us start with the following operator-theoretic restatement of a classical equivalent definition of ergodicity. Though it is probably well known to the experts, we include the proof.

Theorem 2 A doubly stochastic operator $P$ is ergodic if and only if for any nonnegative function $f$ which is not equal almost everywhere to zero, the sum $\sum_{k=1}^{\infty} P^k f$ is positive almost everywhere.
Proof Assume first that $P$ is ergodic. Let $f$ be a nonnegative function. By the Chacon-Ornstein theorem, the averages $\frac{1}{n} \sum_{k=1}^{n} P^k f$ converge almost everywhere to an invariant function. If $\sum_{k=1}^{\infty} P^k f = 0$ on a set of positive measure then the limit function is zero on this set, hence by ergodicity it must be equal to zero almost everywhere. Since $f$ and the limit function have the same integral, $f$ must also be zero almost everywhere.
Conversely, assume that $f$ is a nonconstant invariant function for $P$. Then $f < \int f\,d\mu$ on a set of positive measure. For any functions $g$ and $h$ denote by $g \vee h$ the pointwise maximum of $g$ and $h$. Since $P(f \vee \int f\,d\mu) \ge Pf \vee \int f\,d\mu = f \vee \int f\,d\mu$ and both functions have the same integrals, one has $P(f \vee \int f\,d\mu) = f \vee \int f\,d\mu$. The function $g = (f \vee \int f\,d\mu) - \int f\,d\mu$ is a positive invariant function (not identically zero, since $f$ is nonconstant), which is zero on a set of positive measure. But then $\sum_{k=1}^{\infty} P^k g$ is zero on the same set, which contradicts our assumption.
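On a finite state space the criterion of Theorem 2 can be tested mechanically. The following Python sketch assumes the simplest possible setting, which is not the one of the paper: $n$ points with the uniform measure and $P$ given by a doubly stochastic matrix. In that setting it suffices to test indicators of single points and to sum only the first $n$ powers, because a state that can be reached at all is reached along a path of length at most $n$. All function names are hypothetical.

```python
def apply(P, f):
    """Matrix-vector product: (P f)(i) = sum_j P[i][j] * f[j]."""
    return [sum(pij * fj for pij, fj in zip(row, f)) for row in P]

def orbit_sum(P, f):
    """Return P f + P^2 f + ... + P^n f, where n is the number of states."""
    n = len(P)
    total = [0] * n
    current = f
    for _ in range(n):
        current = apply(P, current)
        total = [t + c for t, c in zip(total, current)]
    return total

def passes_positivity_test(P):
    """Check that the orbit sum of every point indicator is strictly positive."""
    n = len(P)
    for j in range(n):
        indicator = [1 if i == j else 0 for i in range(n)]
        if min(orbit_sum(P, indicator)) <= 0:
            return False
    return True

# Koopman matrix of the rotation i -> i + 1 (mod 3): ergodic.
rotation = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Two independent two-point swaps: not ergodic.
two_blocks = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

assert passes_positivity_test(rotation)
assert not passes_positivity_test(two_blocks)
```

For `two_blocks` the indicator of the first block is a nonconstant invariant function, and the orbit sum of the indicator of state 0 vanishes on the second block, in line with both directions of the theorem.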
We will say that an element ( f 1 , f 2 , f 3 , . . .) of V is constant if the corresponding function f ∈ L 1 (X ×N) is constant, i.e., f 1 = f 2 = . . . = const. With this convention we can study ergodicity of Υ T using its direct sum representation.

Theorem 3 The operator Υ T is ergodic if and only if the Markov embedding T is ergodic.
Proof Assume that $T$ is not ergodic and let $f$ be a non-constant invariant function. It is also invariant with respect to the adjoint $T^*$ and, consequently, $E_k f = f$. Then, For the converse statement, let $T$ be ergodic and let $f$ be a nonnegative function on $X \times \mathbb{N}$ which is not equal almost everywhere to zero. Representing $f$ by $(f_1, f_2, \dots)$, we obtain that $(\Upsilon_T^k f)_1$, the first coordinate of $\Upsilon_T^k f$, is bounded from below by $b_1^k T^k(f_1)$ for every $k \in \mathbb{N}$. Indeed, Since $T$ is ergodic, $\sum_{k=1}^{\infty} T^k(f_1)$ is strictly positive provided $f_1$ is not identically equal to zero, hence so is $\sum_{k=1}^{\infty} b_1^k T^k(f_1)$, implying that $\sum_{k=1}^{\infty} (\Upsilon_T^k f)_1$ is strictly positive. For $n \ge 2$ and $k \ge n - 1$ we have 1 is not identically equal to zero and $T$ is a Markov embedding. Hence, $\sum_{k=1}^{\infty} \Upsilon_T^k f$ is positive. If $f_1$ is identically equal to zero, but $f$ is not, there exists $k$ such that $f_k$ does not vanish. Then $(\Upsilon_T f)_1 = V_T f \ge b_k E_k f_k$ and the last function does not vanish. By the first part of the argument we obtain

Entropy
We will prove that the entropy of $\Upsilon_T$ is bounded from below by the entropy of $T$. The definition of entropy of a doubly stochastic operator is not widely known, so I will devote the next few lines to a short introduction to the subject; a detailed exposition may be found in [2] or [3] and an alternative approach in [6]. Similarly to the classical case of the Kolmogorov-Sinai invariant, the entropy of a doubly stochastic operator on $L^1(Y, \nu)$ is defined in several steps. First, the entropy $H_\nu(F)$ of a finite collection $F$ of measurable functions with range contained in $[0, 1]$ is defined (such collections replace partitions in the operator-theoretic definition). Furthermore, an operation of joining such collections is introduced. For instance, one can define the join of collections $F$ and $G$ as the union of $F$ and $G$ (or as their concatenation if collections are interpreted as finite sequences of sets). Then, the entropy $h_\nu(P, F)$ of an operator $P$ with respect to a collection $F$ is obtained as the upper limit (or the limit, if it exists) $\limsup_{n \to \infty} \frac{1}{n} H_\nu(F^n)$, where $F^n$ stands for the join of $F, PF, \dots, P^{n-1}F$ and $P^k F = \{P^k f : f \in F\}$. Finally, $h_\nu(P)$ is the supremum $\sup_F h_\nu(P, F)$ over all collections under consideration. It was proved in [3] that any specification of the joining operation and the 'static' entropy $H_\nu(F)$ which satisfies a certain set of axioms leads to a common value of the final notion $h_\nu(P)$. In addition, the conditional entropy of a collection $F$ with respect to $G$ is defined as The explicit formula for $H_\nu(F)$ will not be needed in the current paper, but I will recall some of the properties of operator entropy which I use in the proof of Theorem 4. Below is a list of some of the properties of entropy which can be found in [3].
i. Let ξ and ξ be partitions of Y and let Then F 1 , . . . , F n and G 1 , . . . , G n one has
iv. Entropy H ν (F) is continuous with respect to F in the following sense. For two collections F and G both having r elements, F = { f 1 , . . . , f r }, G = {g 1 , . . . , g r }, one defines their L 1 -distance dist(F, G) by the formula where the minimum ranges over all permutations π of the set {1, 2, . . . , r }. For every ε > 0 and r ∈ N there is δ > 0 such that if F and G have cardinalities at most r and dist(F, G) < δ then |H ν (F|G)| < ε.
We turn to the proof of the aforementioned result.

Lemma 2
For every measurable function f ∈ L 1 (X , μ) the sequence converges to 0, when i goes to infinity.
Proof Let $i$ be a positive integer. First, we evaluate the above norm for $n = 1$. Note Since $(\Upsilon_T J T^i f)_k = T^{i+1} f$ for $k \ge 2$, using the above inequality one gets $\sum_{j=i+1}^{\infty} a_j$ for every $n$, which ends the proof.

Theorem 4 If $T$ is a Markov embedding then $h_\mu(T) \le h_\nu(\Upsilon_T)$.
Proof Fix a collection $F$ and $\varepsilon > 0$. Denote $JF = \{Jf : f \in F\}$. Given $i \in \mathbb{N}$ one calculates: By the preceding lemma and the continuity of entropy with respect to a collection of functions, the expression under the sum is smaller than $\varepsilon$ provided $i$ is big enough. Moreover, for every $i$ one has $h_\mu(T, F) = h_\mu(T, T^i F)$. Therefore, $h_\mu(T, F) \le h_\nu(\Upsilon_T, T^i JF) + \varepsilon \le h_\nu(\Upsilon_T) + \varepsilon$ and the inequality follows by taking the supremum over $F$ and the infimum over $\varepsilon$.

Final remarks
1. It seems very unlikely that the entropy of $\Upsilon_T$ could ever be strictly higher than the entropy of $T$, but I was not able to prove the equality. However, my conjecture is that the equality holds at least if $T$ is the Koopman operator of a Bernoulli shift.
2. By passing to a topological model one gains the possibility of defining exploding operators in a purely topological setup. However, the underlying space $Y = X \times \mathbb{N}$ is not compact, so to study the entropy of such operators one needs either to extend the entropy theory introduced in [3] beyond compact spaces or to define the operator on some compactification of $Y$.