1 Background and main results

In this paper we concentrate on two seemingly unrelated areas:

  1. (A)

a multiplicative version of Furstenberg’s classical problem of defiltering a noisy signal,

  2. (B)

    open questions related to invariant measures for so-called \({\mathscr {B}}\)-free systems.

We will now give some background on both (A) and (B). Then we will present the main technical result and its consequences. Finally, since the paper mixes probabilistic and ergodic tools, we present in a separate section a dictionary allowing for their simultaneous use. The remainder of the paper is devoted to the proofs and to examples illustrating our results. In the appendix we give some more detailed comments on \({\mathscr {B}}\)-free systems that may be of independent interest.

1.1 Furstenberg’s filtering problem

Furstenberg’s classical filtering problem from the celebrated paper [17] concerns two stationary real processes: \(\textbf{X}\) (the emitted signal) and \(\textbf{Y}\) (the noise), with \(\textbf{X}\amalg \textbf{Y}\) (i.e. \(\textbf{X}\) and \(\textbf{Y}\) independent), and asks the following question:

Question 1

([17]) When is \(\textbf{X}\) measurable with respect to the \(\sigma \)-algebra generated by \(\textbf{X}+\textbf{Y}\)? In other words, when is it possible to recover \(\textbf{X}\) from the received signal \(\textbf{X}+\textbf{Y}\)?

In order to address this problem, Furstenberg [17] introduced the notion of disjointness of dynamical systems, which even today remains one of the central concepts in ergodic theory. Recall that measure-theoretic dynamical systems \((X,\mathcal {B},\mu ,T)\) and \((Y,\mathcal {C},\nu ,S)\) are disjoint if the product measure \(\mu \otimes \nu \) is the only \((T\times S)\)-invariant measure projecting as \(\mu \) and \(\nu \) onto the first and second coordinate, respectively.Footnote 1 Recall also that each measure-theoretic dynamical system \((X,\mathcal {B},\mu ,T)\) yields a family of bilateral, real, stationary processes in the following way: for any measurable function \(f:X\rightarrow {\mathbb {R}}\), the process \(\textbf{X}=(f\circ T^i)_{i\in {\mathbb {Z}}}\) is stationary. In particular, each measurable partition of X into finitely many pieces yields a finitely-valued stationary process. On the other hand, each real stationary process \(\textbf{X}\) yields a (symbolic) measure-theoretic dynamical system by taking the left shift S on the product space \({\mathbb {R}}^{\mathbb {Z}}\), with the invariant measure given by the distribution of \(\textbf{X}\) (if the state space of \(\textbf{X}\) is smaller than \({\mathbb {R}}\), we can consider the left shift S on the appropriate smaller product space). A crucial observation is that whenever the family of functions \(\{f\circ T^i:i\in {\mathbb {Z}}\}\) generates \(\mathcal {B}\), the resulting symbolic (measure-theoretic) dynamical system is isomorphic to \((X,\mathcal {B},\mu ,T)\). Last but not least, we say that processes \(\textbf{X}\) and \(\textbf{Y}\) are absolutely independent whenever the resulting dynamical systems are disjoint. Furstenberg showed that absolute independence is a sufficient condition for a positive answer to Question 1:

Theorem 1.1

([17]) Suppose that \(\textbf{X}\) and \(\textbf{Y}\) are integrable and that \(\textbf{X}\) is absolutely independent from \(\textbf{Y}\). Then \(\textbf{X}\) is measurable with respect to the \(\sigma \)-algebra generated by \(\textbf{X}+\textbf{Y}\).

Garbit [19] showed that the integrability assumption can be dropped and the assertion of Theorem 1.1 still holds.

We are interested in the following modification of Question 1: instead of the sum of processes \(\textbf{X}\) and \(\textbf{Y}\), we consider their product

$$\begin{aligned} \textbf{M}:=\textbf{X}\cdot \textbf{Y}=(X_i\cdot Y_i)_{i\in {\mathbb {Z}}}. \end{aligned}$$

Notice that if \(\textbf{X}\) and \(\textbf{Y}\) take only positive values, we can define the processes \(\log \textbf{X}\) and \(\log \textbf{Y}\). Since \(\log \textbf{M}=\log \textbf{X}+\log \textbf{Y}\), by the result of Garbit, \(\textbf{X}\) can be recovered from \(\textbf{M}\) whenever \(\textbf{X}\) is absolutely independent of \(\textbf{Y}\). Therefore, it is natural to ask whether the same conclusion as in Theorem 1.1 holds for processes that admit zero as a value. The simplest instance of this is when the state space of, say, \(\textbf{Y}\) equals \(\{0,1\}\). One can then think of \(\textbf{M}\) as the original signal \(\textbf{X}\) in which some of the information was lost (due to \(Y_i=0\)), instead of just being perturbed (by adding \(Y_i\) to \(X_i\)). Thus, we deal with the following problem:

Question 2

Let \(\textbf{X}\) and \(\textbf{Y}\) be bilateral, real, finitely-valued, stationary processes, with \(Y_i\in \{0,1\}\). Suppose that \(\textbf{X}\amalg \textbf{Y}\). Is it possible to recover \(\textbf{X}\) and / or \(\textbf{Y}\) from \(\textbf{M}\)?
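
To get a feeling for Question 2, consider the following toy example (ours, for illustration only): let \(\textbf{X}\) be an i.i.d. sequence of fair coin tosses with values in \(\{1,2\}\), and let \(\textbf{Y}\) be the alternating sequence \(\ldots 0101\ldots \) with a uniformly random phase, independent of \(\textbf{X}\) (a stationary ergodic process with \({\textbf{H}}\,({\textbf{Y}})=0\)). Then

$$\begin{aligned} M_i=X_i\cdot Y_i={\left\{ \begin{array}{ll} X_i, &{} Y_i=1,\\ 0, &{} Y_i=0, \end{array}\right. } \end{aligned}$$

so \(\textbf{Y}\) is always recoverable from \(\textbf{M}\) (via \(Y_i=\mathbb {1}_{M_i\ne 0}\)), whereas the values of \(\textbf{X}\) at the positions with \(Y_i=0\) are lost irretrievably; in particular, \({\textbf{H}}\,({\textbf{M}})=\frac{1}{2}\,{\textbf{H}}\,({\textbf{X}})\).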

A similar (in fact, much more general) problem of retrieving a lost signal was studied by Furstenberg, Peres and Weiss in [18]. Let \(\textbf{X}^{(i)}=\left( X_j^{(i)}\right) _{j\in {\mathbb {Z}}}\), where \(i\in {\mathbb {N}}\), be a family of processes and let \(\textbf{U}\) be an \({\mathbb {N}}\)-valued process. Suppose that all these processes are stationary and define

$$\begin{aligned} \textbf{X}^{(\textbf{U})}=\left( X_i^{(U_i)} \right) _{i\in {\mathbb {Z}}} \end{aligned}$$

(informally, \(\textbf{U}\) chooses among the family of processes).

Question 3

Is it possible to recover \(\textbf{U}\) from \(\textbf{X}^{(\textbf{U})}\)?

In order to answer this question the authors of [18] introduce the notion of double disjointness. We say that a process \(\textbf{Y}\) is doubly disjoint from \(\textbf{X}\) if every self-joining of \(\textbf{Y}\) is absolutely independent of \(\textbf{X}\): in other words, if \((\textbf{X}',\textbf{Y}', \textbf{Y}'')\) is a stationary process such that \(\textbf{X}' \sim \textbf{X}\) and \(\textbf{Y}', \textbf{Y}'' \sim \textbf{Y}\) then \(\textbf{X}'\amalg (\textbf{Y}', \textbf{Y}'')\). The most basic example of doubly disjoint processes arises when \(\textbf{Y}\) is of zero entropy rate (then every self-joining of \(\textbf{Y}\) has zero entropy) and \(\textbf{X}\) has trivial tail \(\sigma \)-field (let us add that, in fact, if \(\textbf{Y}\) is doubly disjoint from \(\textbf{X}\) then necessarily \({\textbf{H}}\,({\textbf{Y}}) = 0\) and \(\textbf{X}\) is ergodic). (For the definition of the entropy rate, see (1.2) below.) Now, the main result of [18] can be summarized (roughly) as follows. Suppose that \(\textbf{X}^{(i)}\), \(i\in {\mathbb {N}}\), and \(\textbf{U}\) are jointly stationary. If \(\textbf{U}\) is doubly disjoint from each \(\textbf{X}^{(i)}\), \(i\in {\mathbb {N}}\), then one can retrieve \(\textbf{U}\) from \(\textbf{X}^{(\textbf{U})}\).

Let us explain how to fit this theorem to our setting from Question 2 (and retrieve \(\textbf{Y}\) from \(\textbf{M}\)). Consider two processes \(\textbf{X}^{(i)}\), for \(i \in \{0, 1\}\), where

$$\begin{aligned} X_j^{(i)} = iX_j \end{aligned}$$
(1.1)

and take \(\textbf{U} = \textbf{Y}\). Then \(\textbf{X}^{(\textbf{U})} = \textbf{X} \cdot \textbf{Y}\) (indeed, by (1.1), \(X_j^{(U_j)}=U_jX_j=Y_jX_j=M_j\) for every \(j\)) and the theorem states that we can retrieve \(\textbf{Y}\) from \(\textbf{X} \cdot \textbf{Y}\) as soon as \(\textbf{Y}\) is doubly disjoint from \(\textbf{X}\). Since the roles of \(\textbf{X}\) and \(\textbf{Y}\) are here not symmetric (and \(\textbf{M}\) together with \(\textbf{Y}\) does not determine \(\textbf{X}\), unlike when one studies the sum \(\textbf{X}+\textbf{Y}\)), it is interesting to ask whether one can also retrieve \(\textbf{X}\). To stay compatible with the notion of double disjointness, we will assume that \({\textbf{H}}\,({\textbf{X}})>{\textbf{H}}\,({\textbf{Y}})=0\). Then, clearly, a necessary condition for a positive answer to Question 2 is that \({\textbf{H}}\,({\textbf{M}})={\textbf{H}}\,({\textbf{X}})\). Having this in mind, we will deal with the following three more specific problems:

Question 4

  1. (A)

    Is there a general formula for the entropy rate \(\textbf{H}(\textbf{M})\) of \(\textbf{M}=\textbf{X}\cdot \textbf{Y}\)?

  2. (B)

    Do we always have \(\textbf{H}(\textbf{M})>0\) whenever \(\textbf{H}(\textbf{X})>0\)?

  3. (C)

    Can we have \(\textbf{H}(\textbf{M})=\textbf{H}(\textbf{X})\) with \({\textbf{H}}\,({\textbf{X}})>0\)?

Remark 1.2

Notice that the answers to Question 1 in [17] and to Question 3 in [18] depend only on the properties of the underlying dynamical systems corresponding to \(\textbf{X}\) and \(\textbf{Y}\). In this paper the situation will be different and the ability to defilter \(\textbf{X}\) from \(\textbf{M}\) will depend heavily on the properties of the stochastic processes under consideration, cf. Example 2.16.

1.2 Invariant measures for \({\mathscr {B}}\)-free systems

Question 4 is a generalization of some questions asked in [28] in the context of \({\mathscr {B}}\)-free systems. For \({\mathscr {B}}\subset {\mathbb {N}}\setminus \{1\}\), consider the corresponding sets of multiples and \({\mathscr {B}}\)-free integers:

$$\begin{aligned} \mathcal {M}_{\mathscr {B}}:=\bigcup _{b\in {\mathscr {B}}}b{\mathbb {Z}}\text { and }\mathcal {F}_{\mathscr {B}}:={\mathbb {Z}}\setminus \mathcal {M}_{\mathscr {B}}. \end{aligned}$$

Such sets were studied already in the 1930s from the number-theoretic viewpoint (see, e.g., [2, 5, 7–9, 14]). The most prominent example of \(\mathcal {F}_{\mathscr {B}}\) is the set of square-free integers (with \({\mathscr {B}}\) being the set of squares of all primes). The dynamical approach to \({\mathscr {B}}\)-free sets was initiated by Sarnak [35] who proposed to study the dynamical system given by the orbit closures of the Möbius function \(\varvec{\mu }\) and its square \(\varvec{\mu }^2\) under the left shift S in \(\{-1,0,1\}^{\mathbb {Z}}\).Footnote 2 For an arbitrary \({\mathscr {B}}\subset {\mathbb {N}}\setminus \{1\}\), let \(X_\eta \) be the orbit closure of \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\in \{0,1\}^{\mathbb {Z}}\) under the left shift, i.e. we deal with a subshift of \((\{0,1\}^{\mathbb {Z}},S)\).Footnote 3 We say that \((X_\eta ,S)\) is a \({\mathscr {B}}\)-free system. In the so-called Erdös case (when the elements of \({\mathscr {B}}\) are pairwise coprime, \({\mathscr {B}}\) is infinite and \(\sum _{b\in {\mathscr {B}}}1/b<\infty \)), \(X_\eta \) is hereditary: for \(y\leqslant x\) coordinatewise, with \(x\in X_\eta \) and \(y\in \{0,1\}^{\mathbb {Z}}\), we have \(y\in X_\eta \). In other words, \(X_\eta =M(X_\eta \times \{0,1\}^{\mathbb {Z}})\), where M stands for the coordinatewise multiplication of sequences. For a general \({\mathscr {B}}\subset {\mathbb {N}}\setminus \{1\}\), \({X}_\eta \) may no longer be hereditary and we consider its hereditary closure \({\widetilde{X}}_\eta :=M(X_\eta \times \{0,1\}^{\mathbb {Z}})\) instead. Usually, one assumes at least the primitivity of \({\mathscr {B}}\) (i.e. \(b\nmid b'\) for \(b\ne b'\) in \({\mathscr {B}}\)).
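
For a quick illustration (our toy case; it is not an Erdös case since \({\mathscr {B}}\) is finite), take \({\mathscr {B}}=\{2,3\}\). Then

$$\begin{aligned} \mathcal {M}_{\mathscr {B}}=2{\mathbb {Z}}\cup 3{\mathbb {Z}}\quad \text {and}\quad \mathcal {F}_{\mathscr {B}}=\{n\in {\mathbb {Z}} : n\equiv \pm 1 \bmod 6\}, \end{aligned}$$

so \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\) is periodic of period 6 and \(X_\eta \) consists of the six shifts of \(\eta \); in particular, the support of \(\eta \) contains two-sided infinite arithmetic progressions such as \(1+6{\mathbb {Z}}\) (cf. Proposition 1.20 below).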

Given a topological dynamical system (X, T), i.e. a homeomorphism T acting on a compact metric space X, let \(\mathcal {B}\) be the \(\sigma \)-algebra of Borel subsets of X. By \(\mathcal {M}(X,T)\) we will denote the set of all Borel probability T-invariant measures on X and \(\mathcal {M}^e(X,T)\) will stand for the subset of ergodic measures. Each choice of \(\mu \in \mathcal {M}(X,T)\) results in a measure-theoretic dynamical system, i.e. a 4-tuple \((X,\mathcal {B},\mu ,T)\), where \((X,\mathcal {B},\mu )\) is a standard Borel probability space, with an automorphism T. We often skip \(\mathcal {B}\) and write \((X,\mu ,T)\). Recall also that \(x\in X\) is said to be generic for \(\mu \in \mathcal {M}(X,T)\) whenever \(\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n\leqslant N}\delta _{T^nx}=\mu \) in the weak topology. If the convergence takes place only along a subsequence \(({N_k})_{k\geqslant 1}\) then we say that x is quasi-generic for \(\mu \).

A central role in the theory of \({\mathscr {B}}\)-free systems is played by the so-called Mirsky measure, denoted by \(\nu _\eta \). In the Erdös case, \(\eta \) is a generic point for \(\nu _\eta \) (in general, \(\eta \) is quasi-generic along some natural sequence \((N_k)\)), see [12]. It was shown in [12, 28] that all invariant measures for \({\widetilde{X}}_\eta \) are of the following special form:

Theorem 1.3

(cf. Sect. 1) For any \(\nu \in \mathcal {M}({\widetilde{X}}_\eta ,S)\), there exists \(\rho \in \mathcal {M}(X_\eta \times \{0,1\}^{\mathbb {Z}}, S\times S)\) such that \(\rho |_{X_\eta }=\nu _\eta \) and \(M_*(\rho )=\nu \).Footnote 4

Recall that given a measure-theoretic dynamical system \((X,\mathcal {B},\mu , T)\), any T-invariant sub-\(\sigma \)-algebra \(\mathcal {A}\subset \mathcal {B}\) is called a factor of \((X,\mathcal {B},\mu ,T)\).Footnote 5 Notice that given \(\nu \) and \(\rho \) as in Theorem 1.3, \(({\widetilde{X}}_\eta ,\nu ,S)\) is a factor of \((X_\eta \times \{0,1\}^{\mathbb {Z}},\rho ,S\times S)\).

The measure-theoretic entropy of \((X,\mathcal {B},\mu ,T)\) will be denoted by \(h_\mu (T,\mathcal {B})\). If no confusion arises, we will also write \(h(\mu ,T)\) or even \(h(\mu )\). If \(\textbf{X}\) is a finitely-valued stationary process determining \((X,\mu ,T)\) (as described in Sect. 1.1) then \({\textbf{H}}\,({\textbf{X}})=h(\mu )\).

The Mirsky measure \(\nu _\eta \) is of zero entropy. Moreover, it was shown in [28] in the Erdös case that \(({X}_\eta ,S)\) is intrinsically ergodic (it has exactly one measure realizing the topological entropy). Its measure of maximal entropy equals \(M_*(\nu _\eta \otimes B(1/2,1/2))\), where \(B(1/2,1/2)\) stands for the Bernoulli measure on \(\{0,1\}^{\mathbb {Z}}\) of entropy \(\log 2\). These results were extended in [12] to a general \({\mathscr {B}}\) (one needs to replace \(X_\eta \) with \({\widetilde{X}}_\eta \)). In the Erdös case, the topological entropy of \(({X}_\eta ,S)\) is equal to \(d(\mathcal {F}_{\mathscr {B}})\)Footnote 6 (in general, the topological entropy of \(({\widetilde{X}}_\eta ,S)\) equals \({\overline{d}}(\mathcal {F}_{\mathscr {B}})\) [12]).Footnote 7 This led to the study of product type measures (or multiplicative convolutions):

$$\begin{aligned} \nu _\eta *\kappa :=M_*(\nu _\eta \otimes \kappa ). \end{aligned}$$

In particular, it was proved there that the measure of maximal entropy is itself of this form. Moreover, it was shown that for each value \(h\) between zero and the topological entropy there is an ergodic measure \(\kappa \) satisfying \(h(X_\eta ,\nu _\eta *\kappa )=h\). However, some fundamental questions related to such measures were left open – they turn out to be a special instance of Question 4 (see Question 1 in [28]):

Question 5

  1. (A)

    Is there a general formula for the entropy \(h(\nu _\eta *\kappa )\) of \(\nu _\eta *\kappa \)?

  2. (B)

    Do we always have \(h(\nu _\eta *\kappa )>0\) whenever \(h(\kappa )>0\)?

  3. (C)

    Can we have \(h(\nu _\eta *\kappa )=h(\kappa )\) with \(h(\kappa )>0\)?

1.3 Main technical result

Our main tool used to answer Questions 4 and 5 is concerned with the entropy rate of stationary processes. Before we can formulate it, we need some definitions and notation that will be used throughout the whole paper.

All random variables and processes will be defined on a fixed probability space \(({\Omega , \mathcal {F},\mathbb {P}})\). Sometimes, we will replace the underlying probability measure \(\mathbb {P}\) by its conditioned version, \(\mathbb {P}_A(\cdot ) = \mathbb {P}(\cdot \cap A)/\mathbb {P}(A)\), where \(A\in \mathcal {F}\) with \(\mathbb {P}(A)>0\). In particular, \(\mathbb {E}_A\) will stand for the expectation taken with respect to \(\mathbb {P}_A\). For convenience’s sake, we will write AB instead of \(A\cap B\) for any \(A,B\in \mathcal {F}\): for example, \(\mathbb {E}_{A, B}\) stands for \(\mathbb {E}_{A\cap B}\). A central role will be played by the Shannon entropy of a random variable X, denoted by \({\textbf{H}}\,(X)\). Although we will recall basic definitions and properties related to \({\textbf{H}}\,(X)\), some well-known facts will be taken for granted (all of them can be found in [22]). All random processes will be bilateral and real. Usually, they will also be finitely-valued and stationary; however, sometimes we will need auxiliary countably-valued, non-stationary processes. Recall that a process \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) is stationary if \({({{X}_i})_{i\in {\mathbb {Z}}}}\) has the same distribution as \(\left( X_{i + 1}\right) _{i\in {\mathbb {Z}}}\), and finitely-valued if, for every \(i\in {\mathbb {Z}}\), \(X_i \in \mathcal {X}\) with \(\left| \mathcal {X}\right| < \infty \).

Now let X and Y be random variables taking values in finite state spaces \(\mathcal {X}\), \(\mathcal {Y}\), respectively, and fix \(A\in \mathcal {F}\) with \(\mathbb {P}(A) > 0\). We put \({\textbf{H}}_{A}(X) = -\sum _{x\in \mathcal {X}} \mathbb {P}_{A}(X=x)\log _{2}\mathbb {P}_{A}(X=x)\). Moreover, \({\textbf{H}}_{A}(X\,|\,Y)=\sum _{y\in \mathcal {Y}}\mathbb {P}_{A}(Y=y)\,{\textbf{H}}_{Y=y,A}(X)\) will stand for the conditional Shannon entropy of X with respect to Y. When \(\mathbb {P}(A) = 1\), we will omit the subscript A and write \({\textbf{H}}\,(X)\) and \({\textbf{H}}\,(X\,|\,Y)\), respectively.

To shorten the notation, we will use the following convention. For a subset \(A = \left\{ i_1, \ldots , i_n\right\} \subset {\mathbb {Z}}\) with \(i_1< i_2< \cdots < i_n\) and a process \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\), we will write

$$\begin{aligned} X_A = \left( X_{i_1}, X_{i_2},\ldots , X_{i_n}\right) . \end{aligned}$$

Moreover, for any \(k \leqslant \ell \) in \({\mathbb {Z}}\), we define integer intervals:

$$\begin{aligned}{}[k, \infty ) {:=} \left\{ k, k + 1, \ldots \right\} , \quad (-\infty , \ell ] {:=} \left\{ \ell , \ell -1, \ldots \right\} , \quad [k,\ell ] {:=} \left\{ k, k + 1, \ldots , \ell \right\} . \end{aligned}$$

For example, \(X_{[0, n]} = \left( X_0, \ldots , X_n\right) \) for \(n\in {\mathbb {N}}\). It is natural and convenient to interpret \([k, \ell ]\) as \(\varnothing \) if \(\ell < k\), and to set \({\textbf{H}}\,(X_{\varnothing }) = 0\) and \({\textbf{H}}\,(X\,|\,Y_{\varnothing }) = {\textbf{H}}\,(X)\).

Consider now two random processes \(\textbf{X}= {({{X}_i})_{i\in {\mathbb {Z}}}}\) and \(\textbf{Y} = {({{Y}_i})_{i\in {\mathbb {Z}}}}\) such that \((\textbf{X}, \textbf{Y}) := \left( (X_i, Y_i)\right) _{i\in {\mathbb {Z}}}\) is stationary. Then

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}}) = \lim \limits _{n\rightarrow \infty }\frac{1}{n}{\textbf{H}}(X_{[0, n - 1]}), \quad {\textbf{H}}\,(\textbf{X}\,|\,\textbf{Y}) = \lim \limits _{n \rightarrow \infty } \frac{1}{n} {\textbf{H}}\,(X_{[0, n - 1]}\,|\,Y_{[0, n - 1]})\qquad \end{aligned}$$
(1.2)

will denote, respectively, the entropy rate of \(\textbf{X}\) and the relative entropy rate of \(\textbf{X}= {({{X}_i})_{i\in {\mathbb {Z}}}}\) with respect to \(\textbf{Y}={({{Y}_i})_{i\in {\mathbb {Z}}}}\). By the stationarity of \(\textbf{X}\), \({\textbf{H}}\,({\textbf{X}}) = \lim \nolimits _{n\rightarrow \infty }{\textbf{H}}\,(X_0\,|\,X_{[-n, -1]})\). Note that both limits in (1.2) exist due to the subadditivity of appropriate sequences.
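
As a quick sanity check of (1.2) (ours): if \(\textbf{X}\) is i.i.d. then \({\textbf{H}}(X_{[0, n - 1]})=n\,{\textbf{H}}\,(X_0)\), whence

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}})=\lim _{n\rightarrow \infty }\frac{1}{n}\,n\,{\textbf{H}}\,(X_0)={\textbf{H}}\,(X_0). \end{aligned}$$

At the other extreme, if \(X_i=X_0\) for all \(i\in {\mathbb {Z}}\) then \({\textbf{H}}(X_{[0, n - 1]})={\textbf{H}}\,(X_0)\) for every n and \({\textbf{H}}\,({\textbf{X}})=0\).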

Remark 1.4

Sometimes it is convenient to extend the classical definition of the conditional entropy, \({\textbf{H}}\,(X\,\,|\,\,Y)\), to \({\textbf{H}}\,(X\,|\,\mathcal {G})\), where X is a finitely-valued random variable and \(\mathcal {G} \subset \mathcal {F}\) is a sub-\(\sigma \)-algebra (see [20], Chapter 14, for a precise construction and proofs). This extension is justified by the following facts. If \(\mathcal {G} =\sigma (Y)\) then \({\textbf{H}}\,(X\,|\,\sigma (Y)) = {\textbf{H}}\,(X\,\,|\,\,Y)\) for any random variable Y.Footnote 8 If \(\mathcal {H}\subset \mathcal {G}\subset \mathcal {F}\) are sub-\(\sigma \)-algebras then \({\textbf{H}}\,(X\,|\,\mathcal {G}) \leqslant {\textbf{H}}\,(X\,|\,\mathcal {H})\). Moreover, if \(\mathcal {G}_n \searrow \mathcal {G}\) or \(\mathcal {G}_n \nearrow \mathcal {G}\) then \({\textbf{H}}\,(X\,|\,\mathcal {G}_n) \nearrow {\textbf{H}}\,(X\,|\,\mathcal {G})\) or \({\textbf{H}}\,(X\,|\,\mathcal {G}_n) \searrow {\textbf{H}}\,(X\,|\,\mathcal {G})\), respectively. Thus, for example, it makes sense to write \({\textbf{H}}\,(\textbf{X}) = {\textbf{H}}\,(X_0\,|\,X_{(-\infty , - 1]}) = \lim \nolimits _{n\rightarrow \infty } {\textbf{H}}\,(X_0\,|\,X_{(-n, - 1]})\). The chain rule is still valid, namely, if X and Y are finitely-valued then

$$\begin{aligned} {\textbf{H}}\,(X, Y\,|\,\mathcal {G}) = {\textbf{H}}\,(X\,|\,\mathcal {G}) + {\textbf{H}}\,(Y\,|\,\sigma (\mathcal {G}, \sigma (X))). \end{aligned}$$
(1.3)

Furthermore, \({\textbf{H}}\,(X\,|\,\mathcal {G}) = 0\) if and only if X is \(\mathcal {G}\)-measurable and \({\textbf{H}}\,(X\,|\,\mathcal {G}) = {\textbf{H}}\,(X)\) if and only if X is independent of \(\mathcal {G}\).

Remark 1.5

We will often omit some technicalities concerning events of zero probability. First, we tacitly assume that \(\mathcal {F}\) is complete (i.e. all subevents of zero-measure events are measurable). Secondly, when considering sub-\(\sigma \)-fields associated with random processes, we think of them as measure-\(\sigma \)-algebras (intuitively, we look at them “up to events of probability zero”). Given sub-\(\sigma \)-fields \(\mathcal {G},\mathcal {H}\subset \mathcal {F}\), sometimes we will write

$$\begin{aligned} \mathcal {G}\,\,{\mathop {\subset }\limits ^{\mathbb {P}}}\,\,\mathcal {H} \end{aligned}$$

to stress that for every \(G \in \mathcal {G}\) there is \(H\in \mathcal {H}\) such that \(\mathbb {P} (G\triangle H) = 0\) but not necessarily \(\mathcal {G}\subset \mathcal {H}\) (with obvious modifications for \({\mathop {\supset }\limits ^{\mathbb {P}}}\) and \({\mathop {=}\limits ^{\mathbb {P}}}\)). However, in most cases, we will skip such considerations, cf. the last sentence of the previous remark.

Given processes \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) and \(\textbf{Y} = {({{Y}_i})_{i\in {\mathbb {Z}}}}\), we will be interested in the entropy rate \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y})\) of their product \(\textbf{X}\cdot \textbf{Y} = (X_i \cdot Y_i)_{i\in {\mathbb {Z}}}\). Our standing assumptions (unless stated otherwise) will be that:

  1. (i)

    \(\textbf{X}\) is finitely-valued, \(\textbf{Y}\) is binary (\(Y_i\in \{0,1\}\) for \(i\in {\mathbb {Z}}\)) and \(\mathbb {P}(Y_0=1)>0\),

  2. (ii)

    \(\textbf{X}\amalg \textbf{Y}\), i.e. \(\textbf{X}\) and \(\textbf{Y}\) are independent.

Notice that, by the independence of \(\textbf{X}\) and \(\textbf{Y}\), the process \((\textbf{X}, \textbf{Y})\) is stationary. Moreover, \(\textbf{X}\cdot \textbf{Y}\) is a factor of \((\textbf{X}, \textbf{Y})\).Footnote 9 The quantity \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,{\textbf{Y}})\) turns out to be easier to deal with than \({\textbf{H}}(\textbf{X}\cdot \textbf{Y})\). A particular emphasis will be put on the case when \({\textbf{H}}\,({\textbf{Y}})=0\), in which \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\textbf{H}}\,(\textbf{X}\cdot \textbf{Y})\)Footnote 10 and \({\textbf{H}}\,({\textbf{X}}\cdot {\textbf{Y}}) \leqslant {\textbf{H}}\,({\textbf{X}})\).Footnote 11

Let \(\textbf{R}=\textbf{R}(\textbf{Y})={({{R}_i})_{i\in {\mathbb {Z}}}}\) be the return process, i.e. the process of consecutive arrival times of \(\textbf{Y}\) to 1:

$$\begin{aligned} R_i={\left\{ \begin{array}{ll} \inf \{j \geqslant 0 : Y_j = 1\}, &{} i=0, \\ \inf \{j> R_{i - 1} : Y_j = 1\}, &{} i\geqslant 1, \\ \sup \{j < R_{i + 1} : Y_j = 1\}, &{} i\leqslant -1. \end{array}\right. } \end{aligned}$$
(1.4)

Note that, in general, \(\textbf{R}\) can be countably-valued. If \(\textbf{Y}\) is ergodic then it visits 1 infinitely often, both in the future and in the past, and thus \(\textbf{R}\) is well-defined almost everywhere. However, we do not need to assume the ergodicity of \(\textbf{Y}\) to be able to speak of \(\textbf{R}\); we will just assume that:

  1. (iii)

    \(\textbf{Y}\) is such that the definition of \(\textbf{R}\) makes sense.

Whenever (i), (ii) and (iii) hold, we will say that the pair \((\textbf{X},\textbf{Y})\) is good. If \(\textbf{Y}\) is binary, with \(\mathbb {P}(Y_0=1)>0\) and such that (iii) holds, we will say that \(\textbf{Y}\) is good.
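
To illustrate (1.4) and the notion of goodness, consider the following toy example (ours): let \(Y_j=1\) if and only if \(3\,|\,j+\Phi \), where \(\Phi \) is uniformly distributed on \(\{0,1,2\}\). Then \(\textbf{Y}\) is good (it is stationary and ergodic, with \(\mathbb {P}(Y_0=1)=1/3\) and \({\textbf{H}}\,({\textbf{Y}})=0\)) and, on the event \(Y_0=1\),

$$\begin{aligned} R_i=3i\quad \text {for all } i\in {\mathbb {Z}}, \end{aligned}$$

i.e. the corresponding realizations of the return process run through the arithmetic progression \(3{\mathbb {Z}}\). We will refer to this period-3 example a few more times below.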

Remark 1.6

We will use lowercase letters to denote realizations of the corresponding random processes (denoted by uppercase letters). Recall that \(\textbf{x}={({{x}_i})_{i\in {\mathbb {Z}}}}\) is a realization of \(\textbf{X} ={({{X}_i})_{i\in {\mathbb {Z}}}}\) if there exists \(\omega \in \Omega \) such that \(x_i = X_i(\omega )\) for all \(i\in {\mathbb {Z}}\). Moreover, we will tacitly assume that \(\omega \) belongs to some “good” subset of \(\Omega \) of probability 1. For example, for \(\textbf{R}\), our standing assumption will be that \(\omega \) realizing \(\textbf{r}\) belongs to the set where \(\textbf{Y}\) visits 1 infinitely often in both directions. In general, if some property of a process \(\textbf{X}\) has probability 1, then realization \(\textbf{x}\) inherits it. For example, if we consider \(\textbf{Y}\) under \(\mathbb {P}_{Y_0 = 1}\) then every realization \(\textbf{y}\) will satisfy \(y_0 = 1\).

The main technical result contains entropy formulas for good processes.

Theorem 1.7

(answer to Question 4(A)) Let \(\textbf{X}=(X_{n})_{n\in {\mathbb {Z}}}\), \(\textbf{Y}=(Y_{n})_{n\in {\mathbb {Z}}}\) be a pair of mutually independent stationary processes, where \(\textbf{X}\) is finitely valued and \(\textbf{Y}\) is binary and such that \(\mathbb {P}(Y_{0}=1)>0\). Assume also that \(\textbf{Y}\) is such that the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense. Then

  1. (A)

    \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\mathbb {P}}(Y_0=1)\,{\mathbb {E}}_{Y_0=1}\,{\textbf{H}}\,(X_0\,|\,X_{\{r_{-1},r_{-2},\ldots \}})|_{r_{-i}= R_{-i}}.\)

If additionally \(\textbf{Y}\) is ergodic then

  1. (B)

    \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\textbf{H}}(\textbf{X})-{\mathbb {P}}(Y_0=1)\,{\mathbb {E}}_{Y_0=1}\,{\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]}, X_{\{r_1,r_2,\ldots \}})|_{r_i=R_i}\).

Remark 1.8

The above expectations are to be understood in the following way:

  • we compute \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \cdots \}})\) or \({\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]}, X_{\{r_1,r_2,\ldots \}})\) for all realizations \(\textbf{r} = {({{r}_i})_{i\in {\mathbb {Z}}}}\) thus obtaining a function \(f(\textbf{r})\) depending on \(\textbf{r}\);

  • we find \(\mathbb {E}_{Y_0 = 1} f(\textbf{R})\).
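
As a sanity check of formula (A) (ours), suppose additionally that \(\textbf{X}\) is i.i.d. Then \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})={\textbf{H}}\,(X_0)\) for every realization \(\textbf{r}\), so formula (A) reduces to

$$\begin{aligned} {\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\mathbb {P}}(Y_0=1)\,{\textbf{H}}\,(X_0), \end{aligned}$$

in accordance with the intuition that an i.i.d. signal delivers fresh information exactly at those times at which it is not erased (cf. the upper bound in Corollary 1.9 below).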

1.4 Consequences of the main technical result

Clearly, Theorem 1.7 gives an answer to Questions 4(A) and 5(A). We will now explain how it is related to Questions 4(B), 5(B), 4(C) and 5(C). The details and longer proofs are included in Sect. 2.3.

1.4.1 Answer to Questions 4(B) and 5(B)

Notice first that

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}})\leqslant {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \cdots \}})\leqslant {{\textbf{H}}\,({X}_0)} \end{aligned}$$

for each choice of negative integers \(\dots<r_{-2}<r_{-1}<0\). Therefore, by Theorem 1.7 (A), we obtain immediately the following:

Corollary 1.9

(positive answer to Question 4(B)) Suppose that \((\textbf{X},\textbf{Y})\) is good, i.e. \(\textbf{X}=(X_{n})_{n\in {\mathbb {Z}}}\) and \(\textbf{Y}=(Y_{n})_{n\in {\mathbb {Z}}}\) form a pair of mutually independent stationary processes, where \(\textbf{X}\) is finitely valued and \(\textbf{Y}\) is binary, such that \(\mathbb {P}(Y_{0}=1)>0\) and the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense. Assume additionally that \({\textbf{H}}\,({\textbf{Y}})=0\). Then

$$\begin{aligned} \mathbb {P}\,({Y_0=1})\,{\textbf{H}}\,({\textbf{X}})\leqslant {\textbf{H}}\,({\textbf{M}})\leqslant \mathbb {P}\,({Y_0=1})\,{\textbf{H}}\,(X_0). \end{aligned}$$

In particular,

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}})>0\,\, \textrm{whenever} \,\,{\textbf{H}}\,({\textbf{X}})>0. \end{aligned}$$
(1.5)

Remark 1.10

The lower bound in Corollary 1.9 is attained for exchangeable processes (see Proposition 2.5), whereas the upper bound is attained for i.i.d. processes. If \(\textbf{X}\) is a Markov chain (which is not i.i.d.), both inequalities are strict, see Sect. 2.1.2.

Remark 1.11

(positive answer to Question 5(B)) Implication (1.5) means, in particular, that the answer to Question 5(B) is positive whenever \(\nu _\eta \ne \delta _{(\ldots 0,0,0\ldots )}\). In Sect. 1 we present an alternative ergodic-theoretic approach to this problem. The proof presented therein is much shorter; on the other hand, it addresses Question 5(B) directly, without providing any explicit formulas.

Remark 1.12

If one drops the assumption that \(\textbf{X} \amalg \textbf{Y}\) then the situation changes completely and one can get \({\textbf{H}}\,({\textbf{M}}) = 0\) (with \({\textbf{H}}\,({\textbf{X}})>0\) and \(\,{\mathbb {P}}\,(Y_0=1)>0\)). To see how far this can go, consider

$$\begin{aligned} \textbf{X} = \textbf{Z}\cdot \textbf{W} \text { and }\textbf{Y} = \mathbf {1 - W} = ({1 - W_i})_{i \in {\mathbb {Z}}}, \end{aligned}$$

where

$$\begin{aligned} \textbf{Z} \amalg \textbf{W}, {\textbf{H}}\,({\textbf{W}}) = 0\text { and }\mathbb {P}(W_0=0)\cdot \mathbb {P}(W_0=1)>0. \end{aligned}$$

Then \(\textbf{M}\) is a trivial zero process, in particular, we have \({\textbf{H}}\,({\textbf{M}}) = 0\). On the other hand, by Corollary 1.9, \({\textbf{H}}\,(\textbf{X}) = {\textbf{H}}\,(\textbf{Z}\cdot \textbf{W}) > 0 = {\textbf{H}}\,(\textbf{W}) = {\textbf{H}}\,({\textbf{Y}})\). Cf. also Sect. 1 for more examples of ergodic-theoretic flavour.

1.4.2 Answer to Questions 4(C) and 5(C)

Answers to Questions 4(C) and 5(C) are more complex and they are related to the notion of a bilaterally deterministic process.

Definition 1.13

We say that a stationary process \(\textbf{Z} = \left( Z_i\right) _{i \in {\mathbb {Z}}}\) is bilaterally deterministic if, for all \(k\in {\mathbb {N}}\),

$$\begin{aligned} {\textbf{H}}\,(Z_{[0,k]}\,|\,Z_{(-\infty , -1]},Z_{[k+1, \infty )}) = 0. \end{aligned}$$

Remark 1.14

The notion of a bilaterally deterministic process was introduced by Ornstein and Weiss [32], in terms of the following (double) tail sigma-algebra:

$$\begin{aligned} {{\mathcal {T}}_d} := \bigcap _{n\geqslant 1} \sigma \left( Z_{(-\infty , -n]}, Z_{[n, \infty )}\right) . \end{aligned}$$

Notice that the following conditions are equivalent:

  • \(\textbf{Z}\) is bilaterally deterministic,

  • \(Z_{[-k,k]}\in {{\mathcal {T}}_d} \text { for each }k\geqslant 1\),

  • \(\sigma (\textbf{Z})={{\mathcal {T}}_d}\).

Indeed, e.g., if \(\textbf{Z}\) is bilaterally deterministic then \({\textbf{H}}\,(Z_{[0,k]}\,|\,Z_{(-\infty , -\ell ]},Z_{[k+1 + m, \infty )})=0\) for any \(k,\ell ,m\in {\mathbb {N}}\) and by taking \(\ell ,m\rightarrow \infty \), we easily obtain \(Z_{[-k,k]}\in {{\mathcal {T}}_d} \text { for each }k\geqslant 1\). Cf. also Remark 1.4. Informally, “given the far past and the distant future, the present can be reconstructed” [32].

Remark 1.15

Given a stationary finitely-valued process \(\textbf{Z}\), let

$$\begin{aligned} {{\mathcal {T}}_p} := \bigcap _{n\geqslant 1} \sigma \left( Z_{(-\infty , -n]}\right) , \qquad {{\mathcal {T}}_f} := \bigcap _{n\geqslant 1} \sigma \left( Z_{[n, \infty )}\right) \end{aligned}$$

denote, respectively, the tail \(\sigma \)-algebra corresponding to the past and to the future. By a celebrated result of Pinsker [34], \({{\mathcal {T}}_{p}}\,\,{\mathop {=}\limits ^{\mathbb {P}}}\,\,{{\mathcal {T}}_{f}}\,\,{\mathop {=}\limits ^{\mathbb {P}}}\,\,\Pi \), where \(\Pi \) denotes the Pinsker \(\sigma \)-algebra (i.e., the largest zero entropy sub-\(\sigma \)-algebra). Thus, the following conditions are equivalent (cf. Remark 1.14):

  • \({\textbf{H}}\,(\textbf{Z})=0\),

  • \(Z_{[-k,k]}\in {{\mathcal {T}}_p}\) for each \(k\geqslant 1\),

  • \(\sigma (\textbf{Z})={{\mathcal {T}}_p}\).

A direct consequence of Remark 1.14 and Remark 1.15 is the following observation:

Corollary 1.16

Suppose that \({\textbf{H}}\,(\textbf{Z})>0\). Then \(\textbf{Z}\) is not bilaterally deterministic whenever \({{\mathcal {T}}_d}={{\mathcal {T}}_p}\). In particular, this happens if \({{\mathcal {T}}_d}\) is trivial.

Notice that from this point of view, stationary processes can be split into three pairwise disjoint classes:

  1. (a)

    of zero entropy rate (they are automatically bilaterally deterministic),

  2. (b)

    of positive entropy rate that are bilaterally deterministic,

  3. (c)

    of positive entropy rate but not bilaterally deterministic.

Class (c) includes the following positive entropy rate processes:

  • exchangeable processes,

  • Markov chains,

  • weakly Bernoulli processes (here \({{\mathcal {T}}_d}\) is trivial),

for more details, see Sect. 2.1. Theorem 1.7 allows us to “compare” a large subclass of processes from class (a) with processes from class (c), see Corollaries 1.17 and 1.19 below. In particular, the zero entropy class that we have in mind contains all \({\mathscr {B}}\)-free systems (considered with the Mirsky measure), cf. Proposition 1.20 and Corollary 1.21. We leave it as an open problem to find answers to the analogous questions on the relations between classes (a) and (b).

Notice that

$$\begin{aligned} \mathbb {E}_{Y_0=1}{\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]},X_{\{r_1,r_2,\dots \}})_{|r_i = R_i}&\geqslant \mathbb {E}_{Y_0=1}{\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]},X_{[r_1,\infty )})_{|r_1 = R_1}\\&=\sum _{k\geqslant 1} \mathbb {P}_{Y_0=1}(R_1=k+1)\,{\textbf{H}}\left( X_{[1,k]}\,|\, X_{(-\infty ,0]},X_{[k+1,\infty )}\right) . \end{aligned}$$

Moreover, if \(\textbf{X}\) fails to be bilaterally deterministic, then, for all k sufficiently large, we have

$$\begin{aligned} {\textbf{H}}\,(X_{[1,k]}\,|\,X_{(-\infty ,0]},X_{[k+1,\infty )})>0. \end{aligned}$$
(1.6)

Thus, using Theorem 1.7(B), we obtain immediately the following:

Corollary 1.17

(answer to Question 4(C)) Suppose that \((\textbf{X},\textbf{Y})\) is good and \(\textbf{Y}\) is ergodic of zero entropy rate (i.e. \(\textbf{X}\) and \(\textbf{Y}\) form a pair of mutually independent stationary processes, \(\textbf{X}\) is finitely-valued, \(\textbf{Y}\) is binary and ergodic, with \(\textbf{H}(\textbf{Y})=0\)). If additionally

$$\begin{aligned} \mathbb {P}(R_1=k)>0\ \text { for infinitely many } k\in {\mathbb {N}}\end{aligned}$$
(1.7)

and \(\textbf{X}\) is not bilaterally deterministic then \({\textbf{H}}\,({\textbf{M}}) < {\textbf{H}}\,({\textbf{X}})\).

Remark 1.18

In fact, if we know more about \(\textbf{X}\) than just (1.6), then the assumption that \(\mathbb {P}(R_1 = k) > 0\) for infinitely many \(k\in {\mathbb {N}}\) can be relaxed and we can still have \({\textbf{H}}\,({\textbf{M}}) < {\textbf{H}}\,({\textbf{X}})\). For example, if \(\textbf{X}\) is Bernoulli then we will always have \({\textbf{H}}\,({\textbf{M}})<{\textbf{H}}\,({\textbf{X}})\) whenever \((\textbf{X},\textbf{Y})\) is good (i.e. \(\textbf{X}\) and \(\textbf{Y}\) form a pair of mutually independent stationary processes, \(\textbf{X}\) is finitely-valued, \(\textbf{Y}\) is binary and such that the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense) and \(\textbf{Y}\) is of zero entropy rate.

A natural question arises: what happens when (1.7) fails to hold? Suppose that our processes are of dynamical origin and the underlying dynamical system is a transitive symbolic dynamical system. Namely, take \(\textbf{w}\in \{0,1\}^{\mathbb {Z}}\) whose support is unbounded both from below and from above, and suppose that \(\textbf{w}\) is quasi-generic along some subsequence for an invariant zero entropy measure \(\nu \). Let Y be the orbit closure of \(\textbf{w}\) under the left shift S and let \(\textbf{Y}\sim \nu \) be the corresponding stationary process. Clearly,

$$\begin{aligned} (1.7) \implies \text { the support of } \textbf{w} \text { does not contain a two-sided infinite arithmetic progression}. \end{aligned}$$

It turns out that if we assume that the support of \(\textbf{w}\) does contain a two-sided (infinite) arithmetic progression then one can obtain a complementary result to Corollary 1.17:

Corollary 1.19

Let \(\textbf{Y}\) be a good, ergodic process (i.e. \(\textbf{Y}\) is a stationary binary ergodic process, with \(\mathbb {P}(Y_0=1)>0\)) of zero entropy rate. Assume additionally that there exists \(L\geqslant 1\) such that for a.e. realization \(\textbf{y}\), the corresponding return time sequence \(\textbf{r}\) contains an arithmetic progression of difference L. Then there exists a stationary binary process \(\textbf{X}\) that is not bilaterally deterministic and such that \({\textbf{H}}\,({\textbf{M}})={\textbf{H}}\,({\textbf{X}})\).

Let us turn now to the interpretation of Corollaries 1.17 and 1.19 from the point of view of \({\mathscr {B}}\)-free systems. Recall that a topological dynamical system (X, T) is said to be proximal whenever for any \(x,y\in X\) there exists \(n_{k}\rightarrow \infty \) such that \(d(T^{n_{k}}x,T^{n_{k}}y)\rightarrow 0\). It turns out that in the \({\mathscr {B}}\)-free setting we have the following dichotomy:

Proposition 1.20

Let \({\mathscr {B}}\subset \mathbb {N}\) and let \(\eta \) be the characteristic function of the corresponding \({\mathscr {B}}\)-free set. Then exactly one of the following holds:

  • \((X_\eta ,S)\) is proximal and then, for infinitely many \(k\geqslant 1\), the block \(10\ldots 01\) (with \(k\) zeros between the 1’s) has positive Mirsky measure \(\nu _\eta \),

  • \((X_\eta ,S)\) is not proximal and then \(\eta \) contains a two-sided infinite arithmetic progression.

As a direct consequence of Corollaries 1.17 and 1.19 and Proposition 1.20, for \({\mathscr {B}}\)-free systems we have the following result:

Corollary 1.21

Let \({\mathscr {B}}\subset \mathbb {N}\). Then \((X_\eta ,S)\) is proximal if and only if for any \(\textbf{X}\) that is not bilaterally deterministic, such that \(\textbf{X}\amalg \textbf{Y}\), we have \({\textbf{H}}\,({\textbf{M}})<{\textbf{H}}\,({\textbf{X}})\).

Finally, let us remark that \(\textbf{X}\) in Corollary 1.19 can be chosen to be very weakly Bernoulli (i.e., as a dynamical system, isomorphic to a Bernoulli process [31]); compare Example 2.16 below and Remark 1.18. That is, for \(\textbf{Y}\) as in Corollary 1.19, we can find a measure-theoretic dynamical system \((X,\mathcal {B},\mu ,T)\) with two stochastic representations \(\textbf{X}\) and \(\textbf{X}'\) (both not bilaterally deterministic!) such that \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y})<{\textbf{H}}\,({\textbf{X}})={\textbf{H}}\,(\textbf{X}')={\textbf{H}}\,(\textbf{X}'\cdot \textbf{Y})\). Moreover, in some cases \(\textbf{X}'\) can be retrieved from \(\textbf{X}'\cdot \textbf{Y}\). This matches well the fact that the notion of a bilaterally deterministic process is not stable under taking various process representations of a given dynamical system [32]. It makes the situation completely different from the one in [18], where the results are purely ergodic-theoretic.

1.5 Dictionary between ergodic theory and probability theory

In our paper, both ergodic-theoretic and stochastic questions and tools are often intertwined. Let us now give some samples of ergodic-theory results translated into the language of stochastic processes. Our basic object is an ergodic-theoretic dynamical system \((\mathcal {X}^{\mathbb {Z}}, \mu , S)\), where S stands, as usual, for the left shift, together with a subset \(A \subset \mathcal {X}^{\mathbb {Z}}\) satisfying \(\mu (A) > 0\). Recall that for \(x\in A\), the first return time \(n_A\) is defined as \(n_A(x) = \inf \left\{ n\geqslant 1\;|\; S^n x \in A\right\} \) and the corresponding induced transformation as \(S_A(x) = S^{n_A(x)}(x)\), with the corresponding conditional measure \(\mu _A:=\mu (\,\cdot \,\cap A)/\mu (A)\) being invariant under \(S_A\).

Fix now a stationary process \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) on \((\Omega ,\mathcal {F},\mathbb {P})\), with distribution \(\mu \), i.e. \(\textbf{X}\sim \mu \). This is a stochastic counterpart of \((\mathcal {X}^{\mathbb {Z}},\mu ,S)\), cf. also Sect. 1.1. The left shift S naturally acts on processes by \(S \textbf{X} = \left( X_{i + 1}\right) _{i \in {\mathbb {Z}}}\). In particular,

$$\begin{aligned} S_*(\mu ) = \mu \text { precisely if }S\textbf{X} \sim \textbf{X}. \end{aligned}$$

Similarly, \(\mu _A\) corresponds to the distribution of \(\textbf{X}\) under \(\mathbb {P}_{\textbf{X} \in A}\). To see how one should interpret \(S_A\) in terms of stochastic processes, let \(R_{A} = \inf \left\{ n \geqslant 1\;|\; S^n\textbf{X} \in A\right\} \) be the first return time, defined on \(\textbf{X}\in A\), cf. (1.4). Now, we set \(S_A \textbf{X} = \left( X_{i + R_A}\right) _{i \in {\mathbb {Z}}}\) and one can easily check that

$$\begin{aligned} S_A \mu _A = \mu _A \text { precisely if }S_A \textbf{X} \sim \textbf{X} \text { under }\mathbb {P}_{\textbf{X} \in A}. \end{aligned}$$

Finally, recall that \(h(\mu )={\textbf{H}}\,({\textbf{X}})\).
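
As a quick instance of this dictionary (our example): for a binary process \(\textbf{Y}\sim \mu \), take \(A=[1]:=\{\textbf{y} : y_0=1\}\). Then \(R_A=R_1\) is the first return time to the state 1 and Kac’s lemma (see the summary below) reads

$$\begin{aligned} {\mathbb {P}}\,(Y_0=1)\,\mathbb {E}_{Y_0=1}R_1=1; \end{aligned}$$

e.g., for the period-3 example from Sect. 1.3 we get \(\frac{1}{3}\cdot 3=1\).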

Let us present a summary of some classical ergodic theorems (formulated for \((\mathcal {X}^{\mathbb {Z}},\mu ,S)\)), with their counterparts for random processes.

 

  • Ergodicity of \(\mu \).\(^{1}\) Ergodic: \(\frac{1}{n}\sum _{i=0}^{n-1}S^i f \rightarrow \int f\, d\mu \); probabilistic: \(\frac{1}{n}\sum _{i=0}^{n-1}f(S^{i}\textbf{X}) \rightarrow \mathbb {E}f(\textbf{X})\).

  • Poincaré recurrence. Ergodic: \(\mu _{A}\left( \{x : {S^{k}}x \in A \text { i.o.}\}\right) = 1\); probabilistic: \({{\mathbb {P}}_{\textbf{X} \in A}}(S^k\textbf{X} \in A\text { i.o.}) = 1\).

  • Kac’s Lemma. Ergodic: \(\int _A n_A\, d\mu =1\); probabilistic: \({\mathbb {P}}(\textbf{X} \in A)\, \mathbb {E}_{\textbf{X} \in A} R_A = 1\).

  • Invariance of \(\mu _A\). Ergodic: \(S_A\mu _A =\mu _{A}\); probabilistic: \(S_A\textbf{X} \sim \textbf{X}\) under \(\mathbb {P}_{\textbf{X} \in A}\).

  • Ergodicity of \(\mu _A\). Ergodic: \(\frac{1}{n}\sum _{i=0}^{n-1}S_A^i f \rightarrow \int f\, d\mu _A\); probabilistic: \(\frac{1}{n}\sum _{i=0}^{n-1}f(S_A^{i}\textbf{X}) \rightarrow \mathbb {E}_{\textbf{X}\in A} f(\textbf{X})\).

  • Maker’s ET.\(^{2}\) Ergodic: \(\frac{1}{n}\sum _{i=0}^{n-1}S^{i}f_{n - i} \rightarrow \int f\, d\mu \); probabilistic: \(\frac{1}{n}\sum _{i=0}^{n-1}f_{n - i}(S^{i}\textbf{X}) \rightarrow \mathbb {E}f(\textbf{X})\).

  1. \(^{1}\)Here, in fact, we state Birkhoff’s ergodic theorem under the assumption that \(\mu \) is ergodic
  2. \(^{2}\)ET stands for “ergodic theorem”

We owe the reader a word of explanation concerning the abbreviations in the summary above. The convergence of ergodic averages is always meant a.e./a.s. with respect to the appropriate underlying measure (\(\mu \) or \(\mu _A\) / \(\mathbb {P}\) or \(\mathbb {P}_{\textbf{X} \in A}\)). Also, we tacitly assume that all required assumptions are satisfied, e.g. the functions appearing in ergodic averages are integrable with respect to the underlying measure. Finally, let us give some details concerning Maker’s ergodic theorem [29], which will play a central role in the proof of Theorem 1.7 (A). We recall it now (in the ergodic-theoretic language, i.e. as in [25], under the extra assumption that T is ergodic).

Theorem 1.22

(Maker’s ergodic theorem) Let \((X,\mu ,T)\) be an ergodic measure-theoretic dynamical system. Let \(f\in L_1(\mu )\) and \(f_n\rightarrow f\) \(\mu \)-a.e. Suppose that \(\sup _n |f_n|\in L_1(\mu )\). Then

$$\begin{aligned} \frac{1}{n}\sum _{i=0}^{n-1}T^if_{n-i} \rightarrow \mathbb {E}_\mu f \text { a.e.} \end{aligned}$$

Let us now return to our general setting, with standing assumptions (i) and (ii) on \(\textbf{X}\) and \(\textbf{Y}\). Consider the inter-arrival process \(\textbf{T} = {({{T}_i})_{i\in {\mathbb {Z}}}}\), where

$$\begin{aligned} T_i = R_i - R_{i - 1} \end{aligned}$$
(1.8)

and the return process \(\textbf{R}\) is as in (1.4). Thus, \(T_i\) tells us how much time elapses between the \((i-1)\)-st and the \(i\)-th visit of \(\textbf{Y}\) to the state 1.
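
In the period-3 example from Sect. 1.3 (ours) we simply have \(T_i\equiv 3\). In general, under \(\mathbb {P}_{Y_0=1}\) we have \(R_0=0\), so \(T_1=R_1\) and Kac’s lemma takes the form

$$\begin{aligned} \mathbb {E}_{Y_0=1}T_1=\frac{1}{{\mathbb {P}}\,(Y_0=1)}. \end{aligned}$$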

Remark 1.23

(Factor of a random process) Recall that whenever \(\textbf{Y}\) is ergodic, the return process \(\textbf{R}\), and thus also \(\textbf{T}\), is well-defined. Moreover, \(\textbf{T}\) can be regarded as a factor of \(\textbf{Y}\) in the ergodic-theoretic sense. More precisely, by the very definition of \(\textbf{T}\), there is a natural measurable function \(\pi :\left( \{0, 1\}^{\mathbb {Z}}, S_{[1]}, \mathcal {L}(\textbf{Y}\;|\; \mathbb {P}_{Y_0 = 1})\right) \rightarrow \left( {\mathbb {Z}}^{\mathbb {Z}}, S, \mathcal {L}(\textbf{T}\;|\; \mathbb {P}_{Y_0 = 1})\right) \) such that \(\pi (\textbf{Y}) = \textbf{T}\) almost surely, where \(\mathcal {L}(\cdot \;|\; \cdot )\) stands for the “distribution of \(\cdot \) under \(\cdot \)”, \([1] = \{\textbf{y}\; |\; y_0 = 1\}\) and \(S_{[1]}\) is the corresponding induced shift operator (cf. the beginning of this section). Clearly, \(\pi S_{[1]} = S \pi \). In particular, since \(\textbf{Y} \sim S_{[1]}\textbf{Y}\) under \(\mathbb {P}_{Y_0 = 1}\) and \(\textbf{Y}\) is ergodic (under \(\mathbb {P}_{Y_0 = 1}\)), we get that \(\textbf{T}\) is stationary and ergodic (under \(\mathbb {P}_{Y_0 = 1}\)) as well.

As a consequence of the above remark, we can apply Maker’s ergodic theorem to \(\textbf{T}\), which results in the following corollary:

Corollary 1.24

Suppose that \(\sup _{i\in {\mathbb {N}}} g_i(\textbf{T}) \in L_1(\mathbb {P}_{Y_0 = 1})\) and \(g_i\xrightarrow {\mathbb {P}_{Y_0 = 1}\;a.s.} g\). Then, \(\mathbb {P}_{Y_0 = 1}\) a.s.,

$$\begin{aligned} \frac{1}{n}\sum _{i=0}^{n-1}g_{n-i}(S^{i}\textbf{T}) \rightarrow \mathbb {E}_{Y_0=1}\,g(\textbf{T}). \end{aligned}$$

2 Examples, comments and proofs

2.1 Examples of non-bilaterally deterministic processes

In the subsections below we tacitly assume that \((\textbf{X},\textbf{Y})\) is good, i.e. \(\textbf{X}\) and \(\textbf{Y}\) form a pair of mutually independent stationary processes, where \(\textbf{X}\) is finitely-valued and \(\textbf{Y}\) is binary, with \(\mathbb {P}(Y_{0}=1)>0\) and such that the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense.

2.1.1 Exchangeable processes

Definition 2.1

We say that a process \(\textbf{X}\) is exchangeable if for any \(n\in {\mathbb {N}}\) and distinct times \(i_1, i_2, \ldots , i_n\),

$$\begin{aligned} \left( X_{i_1}, X_{i_2}, \ldots , X_{i_n}\right) \sim \left( X_{1}, X_{2}, \ldots , X_{n}\right) . \end{aligned}$$

In other words, the distribution of \(\textbf{X}\) is invariant under finite permutations.

Remark 2.2

Let \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) be exchangeable. By a celebrated result of de Finetti [10] (cf. also [24]), this is equivalent to \(\textbf{X}\) being a convex combination of i.i.d. processes. Thus, there exists a random variable \(\Theta \) such that, conditionally on \(\Theta \), \(\textbf{X}\) is i.i.d. Note that this ensures that \({\textbf{H}}\,(\textbf{X}) > 0\), unless \(X_i = f_i(\Theta )\) for some Borel functions \(f_i\). Indeed,

$$\begin{aligned} {\textbf{H}}\,(X_1, \ldots , X_n) \geqslant {\textbf{H}}\,(X_1, \ldots , X_n\,|\,\Theta ) = \sum _{i = 1}^{n}{\textbf{H}}\,(X_i\,|\,\Theta ) = n{\textbf{H}}\,(X_1\,|\,\Theta ), \end{aligned}$$

which gives \({\textbf{H}}\,(\textbf{X}) \geqslant {\textbf{H}}\,(X_1\,|\,\Theta )\). Therefore, \({\textbf{H}}\,(\textbf{X}) = 0\) implies \(X_i = f_i(\Theta )\).
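
A minimal non-i.i.d. instance (ours): let \(\Theta \) take the values 1/4 and 3/4 with probability 1/2 each and, conditionally on \(\Theta \), let \(\textbf{X}\) be i.i.d. Bernoulli(\(\Theta \)). Then \(\textbf{X}\) is exchangeable but not i.i.d. and, by the affinity of the entropy rate over the de Finetti decomposition,

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}})={\textbf{H}}\,(X_1\,|\,\Theta )=-\tfrac{1}{4}\log _2\tfrac{1}{4}-\tfrac{3}{4}\log _2\tfrac{3}{4}\approx 0.811>0, \end{aligned}$$

so, by Corollary 2.4 below, \(\textbf{X}\) is not bilaterally deterministic.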

Remark 2.3

Olshen in [30] showed that if \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) is exchangeable then

$$\begin{aligned} \mathcal {I} = \mathcal {E} = {{\mathcal {T}}_d} = {{\mathcal {T}}_f} = {{\mathcal {T}}_p}, \end{aligned}$$

(as measure-algebras), where \(\mathcal {I}\) and \(\mathcal {E}\) denote the \(\sigma \)-algebras of shift-invariant and finite-permutation-invariant sets, respectively, and \({{\mathcal {T}}_d}\), \({{\mathcal {T}}_f}\), \({{\mathcal {T}}_p}\) are the double, future and past tails, respectively.

As an immediate consequence of Remark 2.3 and Corollary 1.16, we obtain the following:

Corollary 2.4

Suppose that \(\textbf{X}\) is exchangeable. Then \({\textbf{H}}\,({\textbf{X}})>0\) if and only if \(\textbf{X}\) is not bilaterally deterministic.

Proposition 2.5

Suppose that \(\textbf{X}\) is exchangeable. Then \({\textbf{H}}\,({\textbf{M}}\,|\,{\textbf{Y}}) ={\mathbb {P}}\left( Y_0 = 1\right) {\textbf{H}}\,(\textbf{X})\).

Proof

It follows from the exchangeability of \(\textbf{X}\) that for any negative distinct times \( r_{-i}\), \(i\in {\mathbb {N}}\),

$$\begin{aligned} {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1},r_{-2}, \ldots \}}) = {\textbf{H}}\,(X_0\,|\,X_{\{-1, -2, \ldots \}}) ={\textbf{H}}\,({\textbf{X}}). \end{aligned}$$

It remains to use Theorem 1.7 (A). \(\square \)

2.1.2 Markov chains

Recall that a process \(\textbf{X}\) is a Markov chain if, for every time \(i\in {\mathbb {Z}}\), conditionally on \(X_i\), \(X_{(-\infty , i - 1]}\) is independent of \(X_{[i + 1, \infty )}\). Colloquially: given the present, the past and the future are independent. In particular, \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1},r_{-2},\ldots \}})={\textbf{H}}\,(X_0\,|\,X_{r_{-1}})\), which immediately leads to the following corollary of Theorem 1.7 (A):

Corollary 2.6

If \(\textbf{X}\) is a Markov chain (and \((\textbf{X},\textbf{Y})\) is good) then

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}})={\mathbb {P}}\,(Y_{0}=1)\sum _{k = 1}^{\infty }{\mathbb {P}}_{Y_{0}=1}(R_{1}=k)\,{\textbf{H}}\,(X_{k}\,|\,X_{0}). \end{aligned}$$
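
For a concrete instance (ours): let \(\textbf{X}\) be the stationary Markov chain on \(\{0,1\}\) that flips its state at each step with probability \(p\in (0,1/2)\). Then \({\textbf{H}}\,(X_k\,|\,X_0)=h(p^{(k)})\), where h is the binary entropy function and

$$\begin{aligned} p^{(k)}={\mathbb {P}}\,(X_k\ne X_0)=\frac{1-(1-2p)^k}{2}\nearrow \frac{1}{2}\quad (k\rightarrow \infty ), \end{aligned}$$

so each summand lies in \([h(p),1)=[{\textbf{H}}\,({\textbf{X}}),{\textbf{H}}\,(X_0))\), strictly above \({\textbf{H}}\,({\textbf{X}})\) for \(k\geqslant 2\), in accordance with Remark 1.10.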

Remark 2.7

Corollary 2.6 easily extends to the case of k-Markov chains but, for simplicity’s sake, we decided to present it for \(k = 1\).

Remark 2.8

Let \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {N}}}}\) be a finitely-valued Markov chain, \(X_i \in \mathcal {X}\). It is well-known (see [15], Chapter XV, Section 6, Theorem 3, page 392) that we can uniquely decompose the state space \(\mathcal {X}\) into the disjoint union

$$\begin{aligned} \mathcal {X} = C \sqcup D_1 \sqcup D_2 \sqcup \cdots \sqcup D_k, \end{aligned}$$
(2.1)

where C is the set of transient states and \(D_i\) are closed sets. If \(\textbf{X}\) starts in \(D_j\) (i.e. \(X_0 \in D_j\)) then it remains in \(D_j\) forever. If \(X_0 \in C\) then \(\textbf{X}\) stays in C for a finite time and then jumps to some \(D_j\) (and never leaves \(D_j\) afterwards). Moreover (see [15], Chapter XV, Section 7, Criterion, page 395), if \(\pi \) is a stationary measure then necessarily \(\pi (C) = 0\).

Now suppose that a bilateral, finitely-valued Markov chain \(\textbf{X} ={({{X}_i})_{i\in {\mathbb {Z}}}}\) is stationary (thus, \(C = \emptyset \) in (2.1)). Fix \(1\leqslant j\leqslant k\) and let \(\textbf{X}_{D_j}\) stand for \(\textbf{X}\) conditioned on \(X_0 \in D_j\). By the definition of \(D_j\), process \(\textbf{X}_{D_j}\) is an irreducible (equivalently, ergodic), stationary Markov chain. Now, let \(p_j\) be the period of \(\textbf{X}_{D_j}\). Then \(D_j\) can be decomposed into \(p_j\) disjoint sets (see [6], Chapter 1, Section 3, Theorem 4)

$$\begin{aligned} D_j = D_{j, 0} \sqcup \cdots \sqcup D_{j, p_j - 1} \end{aligned}$$

such that \(\,{\mathbb {P}}\,\,(X_1 \in D_{j, (\ell + 1)\bmod p_j}\;|\; X_0 \in D_{j, \ell }) = 1\). Using Corollary 2 from [3], we get that

$$\begin{aligned} {{\mathcal {T}}_d}\left( \textbf{X}_{D_j}\right) = {{\mathcal {T}}_p}\left( \textbf{X}_{D_j}\right) = {{\mathcal {T}}_f}\left( \textbf{X}_{D_j}\right) = \sigma \left\{ \left\{ X_0 \in D_{j, 0}\right\} , \left\{ X_0 \in D_{j, 1}\right\} , \ldots , \left\{ X_0 \in D_{j, p_j - 1}\right\} \right\} . \end{aligned}$$

Note that Corollary 2 from [3] is stated only for \({{\mathcal {T}}_f}\) but a perusal of the proofs of Theorem 1 and Corollaries 1 and 2 therein gives the same result for \({{\mathcal {T}}_d}\). Thus, \(\textbf{X}\), conditionally on \(X_0 \in D_{j, \ell }\), has trivial tail \(\sigma \)-algebras. This immediately leads to

$$\begin{aligned} {{\mathcal {T}}_d}\left( \textbf{X}\right) = {{\mathcal {T}}_p}\left( \textbf{X}\right) = {{\mathcal {T}}_f}\left( \textbf{X}\right) = \sigma \left\{ \left\{ X_0 \in D_{j, \ell }\right\} \;|\;1\leqslant j \leqslant k,\ 0 \leqslant \ell \leqslant p_j - 1\right\} . \end{aligned}$$
(2.2)

Indeed, if for example \(A \in {{\mathcal {T}}_d}\left( \textbf{X}\right) \) then, for all \(j, \ell \), \({\mathbb {P}}\,(A\;|\;X_0 \in D_{j, \ell }) \in \{0, 1\}\) which yields (2.2). As a consequence of (2.2), we obtain the following:

Corollary 2.9

Suppose that \(\textbf{X}\) is a stationary finitely-valued Markov chain. Then \({\textbf{H}}\,({\textbf{X}})>0\) if and only if \(\textbf{X}\) is not bilaterally deterministic.

Remark 2.10

Since \({\textbf{H}}\,({\textbf{X}})={\textbf{H}}\,(X_1\,|\,X_0)={\textbf{H}}\,(X_{i + 1}\,|\,X_i)\), it follows that \({\textbf{H}}\,({\textbf{X}})=0\) if and only if, for every \(i\in {\mathbb {Z}}\), \(X_i = f_i(X_{0})\) for some functions \(f_i\). It is not hard to see that if \({\mathbb {P}}\,(X_0 = x) >0\) for every \(x\in \mathcal {X}\), then every \(f_i\) must be a bijection on \(\mathcal {X}\). Moreover, setting \(y = f_1(x)\) and using the stationarity of \(\textbf{X}\), we get

$$\begin{aligned} {\mathbb {P}}\,(X_0 = x)&= {\mathbb {P}}\,(X_0 = x,\, f_1(X_0) = y) = {\mathbb {P}}\,(X_i = x,\, X_{i+1} = y)\\&= {\mathbb {P}}\,(f_i(X_0) = x,\, f_{i + 1}(X_0) = y) = {\mathbb {P}}\,(X_0 = f_i^{-1}(x))\,\mathbb {1}_{f_{i + 1}\left( f_i^{-1}(x)\right) = f_1(x)}. \end{aligned}$$

Thus, necessarily, \(f_{i + 1}\left( z\right) = f_1(f_i(z))\). Consequently, if we set \(f:=f_1\) then \(f_{i} = f^{\circ i}\). Moreover, f must be such that, for all x, \({\mathbb {P}}\,(X_0 = x) = {\mathbb {P}}\,(X_0 = f(x))\).

Therefore, if \(\textbf{X}\) is bilateral, finitely-valued, stationary Markov chain, with \({\mathbb {P}}\,(X_0 = x) >0\) for all \(x \in \mathcal {X}\), then the following are equivalent:

  • \(\textbf{X}\) is bilaterally deterministic;

  • there exists a bijection \(f:\mathcal {X} \rightarrow \mathcal {X}\) such that \(X_i = f^{\circ i}(X_0)\) and, for all \(x\in \mathcal {X}\), \({\mathbb {P}}\,(X_0 = x) = {\mathbb {P}}\,(X_0 = f(x))\).

2.1.3 Weakly Bernoulli processes

Weakly Bernoulli processes were introduced by Friedman and Ornstein [16] and belong to the classics of ergodic theory. Recall that any process \(\textbf{X}\) that is weakly Bernoulli is also very weakly Bernoulli, equivalently, finitely determined (i.e., as a dynamical system, it is isomorphic to a Bernoulli process [31]). In particular, \({\textbf{H}}\,({\textbf{X}})>0\). We refer the reader, e.g., to [36] for more information on the subject.

Suppose now that \(\textbf{X}\) is weakly Bernoulli. Then \({{\mathcal {T}}_d}\) is trivial (see, e.g., Proposition 5.17 in [4]). Therefore, as an immediate consequence of Corollary 1.16, we obtain the following:

Corollary 2.11

Suppose that \(\textbf{X}\) is weakly Bernoulli. Then \(\textbf{X}\) is not bilaterally deterministic.

In fact, the results in [4] are formulated in a different language. One more notion, equivalent to the weak Bernoulli property, is absolute regularity. It first appeared in works of Volkonskii and Rozanov [37, 38] who, in turn, attribute it to Kolmogorov. Fix a probability space \((\Omega ,\mathcal {F},\mathbb {P})\). Let \(\mathcal {A},\mathcal {B}\subset \mathcal {F}\) be sub-\(\sigma \)-algebras and let

$$\begin{aligned} \beta (\mathcal {A},\mathcal {B}):=\sup \frac{1}{2}\sum _{i=1}^{I}\sum _{j=1}^{J}|\mathbb {P}(A_i\cap B_j) - \mathbb {P}(A_i)\mathbb {P}(B_j)|, \end{aligned}$$

where the supremum is taken over all (finite) partitions \(\{A_1,\dots , A_I\}\), \(\{B_1,\dots , B_J\}\) of \(\Omega \), with \(A_i\in \mathcal {A}\), \(B_j\in \mathcal {B}\) for each ij. Now, given a process \(\textbf{X}\), for \(-\infty \leqslant J < L \leqslant \infty \), we define the \(\sigma \)-algebra

$$\begin{aligned} \mathcal {F}_J^L:=\sigma (X_k : J\leqslant k\leqslant L). \end{aligned}$$

Then, for each \(n\geqslant 1\), we define the following \(\beta \)-dependence coefficients:

$$\begin{aligned} \beta (n):=\sup _{j\in {\mathbb {Z}}}\beta (\mathcal {F}_{-\infty }^{j},\mathcal {F}_{j+n}^{\infty }). \end{aligned}$$

We say that \(\textbf{X}\) is absolutely regular (or \(\beta \)-mixing) if \(\beta (n)\rightarrow 0\) as \(n\rightarrow \infty \).
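
For instance (our illustration): if \(\textbf{X}\) is i.i.d. then \(\mathcal {F}_{-\infty }^{j}\) and \(\mathcal {F}_{j+n}^{\infty }\) are independent for all \(j\in {\mathbb {Z}}\) and \(n\geqslant 1\), so every term \(\mathbb {P}(A\cap B) - \mathbb {P}(A)\mathbb {P}(B)\) in the defining sum vanishes and

$$\begin{aligned} \beta (n)=0\quad \text {for all } n\geqslant 1, \end{aligned}$$

i.e. i.i.d. processes are absolutely regular (in Berbee’s dichotomy below, \(\beta =0\), corresponding to \(p=1\)).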

Berbee, in [1], studied \(\beta \)-dependence coefficients for stationary ergodic processes. He showed that

$$\begin{aligned} \lim _{n\rightarrow \infty } \beta (n)= \beta = 1-\frac{1}{p}\text { for some }p\in {\mathbb {N}}\cup \{\infty \}. \end{aligned}$$

Moreover, he proved that if \(\beta <1\) then \({{\mathcal {T}}_d}={{\mathcal {T}}_p}\). As a consequence of his result and of Corollary 1.16, we have:

Corollary 2.12

Suppose that \(\textbf{X}\) is a stationary ergodic process with \(\beta <1\). Then \(\textbf{X}\) is not bilaterally deterministic.

2.2 Proof of the main technical result (Theorem 1.7)

2.2.1 Part (A)

By the chain rule (cf. (1.3)), we have

$$\begin{aligned} {\textbf{H}}\,(M_{[0, n]}\,|\,Y_{[0, n]}) = \sum _{k = 0}^{n} {\textbf{H}}\,(M_k\,|\,Y_{[0, n]}, M_{[0, k)}) =: \sum _{k = 0}^{n} H_{k, n}. \end{aligned}$$
(2.3)

Fix \(0 \leqslant k \leqslant n\). Since \(M_k = X_k \cdot Y_k\) and \(\textbf{X} \amalg \textbf{Y}\), we easily get that conditionally on \(\left( Y_{[0, k]}, M_{[0, k)}\right) \), \(M_k\) is independent of \(Y_{[k + 1, n]}\). In other words,

$$\begin{aligned} H_{k, n} = H_k = {\textbf{H}}\,(M_k\,|\,Y_{[0, k]}, M_{[0, k)}). \end{aligned}$$

Now, using the definition of Shannon conditional entropy, the fact that on the event \(Y_k=0\) we have \(M_k\equiv 0\), whereas on \(Y_k = 1\) we have \(M_k = X_k\), and the stationarity of \((\textbf{X}, \textbf{Y})\), we get

$$\begin{aligned} H_k&= {\mathbb {P}}\,(Y_k = 1)\,{\textbf{H}}_{Y_k=1}\,(X_k\,|\,Y_{[0,k)},M_{[0,k)})\\&= {\mathbb {P}}\,(Y_0 = 1)\,{\textbf{H}}_{Y_0=1}\,(X_0\,|\,Y_{[-k,0)},M_{[-k,0)}). \end{aligned}$$

Moreover, if \(Y = Y_{[-k,0)}\), \(M = M_{[-k,0)}\), \(y = y_{[-k,0)}\), \(m = m_{[-k,0)}\), \(s_{-k} = \sum _{i = -k}^{-1}y_i\), \(r_{-s_{-k}}< \cdots < r_{-1}\) are such that \(y_{r_{-i}} = 1\), then

$$\begin{aligned} {{\mathbb {P}}_{Y_{0} = 1}}\,(Y = y , M = m) = {\left\{ \begin{array}{ll} {{\mathbb {P}}_{Y_{0} = 1}}\,(Y = y)\,{\mathbb {P}}\,\left( X_{\{r_{-1}, \ldots , r_{-s_{-k}}\}} = m_{\{r_{-1}, \ldots , r_{-s_{-k}}\}}\right) , \;\; &{} s_{-k} > 0, \\ {{\mathbb {P}}_{Y_{0} = 1}}\,(Y = y), \;\; &{} s_{-k} = 0, \end{array}\right. } \end{aligned}$$

whenever \(m\leqslant y\) coordinatewise (otherwise, we get zero). This implies that

$$\begin{aligned} H_k&= {\mathbb {P}}\,(Y_0 = 1)\,{\mathbb {P}}_{Y_0 = 1}\,(S_{-k} = 0)\,{\textbf{H}}\,(X_0) \\&\quad +\,{\mathbb {P}}\,(Y_0 = 1)\,\mathbb {E}_{Y_0 = 1} \mathbb {1}_{S_{-k} > 0}\,{\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, \ldots , r_{-s_{-k}}\}})_{|_{r_{-i} = R_{-i}, s_{-k} = S_{-k}}}. \end{aligned}$$

Since \(\textbf{Y}\) visits 1 a.s. infinitely many times (in the past),

$$\begin{aligned} {\mathbb {P}}_{Y_0 = 1}(S_{-k} = 0) \rightarrow 0 \text { as }k\rightarrow \infty . \end{aligned}$$

Moreover, \(\mathbb {P}_{Y_0 = 1}\) a.s., we have \(\mathbb {1}_{S_{-k} > 0} \rightarrow 1\) and

$$\begin{aligned} {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, \ldots , r_{-s_{-k}}\}})_{|_{r_{-i} = R_{-i}, s_{-k} = S_{-k}}} \rightarrow {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})_{|_{r_{-i} = R_{-i}}}. \end{aligned}$$

Thus, by the bounded convergence theorem, we get that

$$\begin{aligned} H_k \rightarrow {\mathbb {P}}\,(Y_0 = 1)\,\mathbb {E}_{Y_0 = 1}{\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})_{|_{r_{-i} = R_{-i}}}, \end{aligned}$$

which, by (2.3), concludes the proof of Theorem 1.7 (A).
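As a sanity check of this limit, consider the simplest situation (assumed only for this illustration): \(\textbf{X}\) i.i.d. uniform on \(\{0,1\}\) and \(\textbf{Y}\) i.i.d. Bernoulli(p), with \(\textbf{X} \amalg \textbf{Y}\). Then each conditional entropy \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})\) equals \({\textbf{H}}\,(X_0)=\log 2\), so the limit is \(p\log 2\); the Python sketch below confirms this by exact enumeration.

```python
# Exact computation of H(M_{[0,n]} | Y_{[0,n]}) / (n+1) for X i.i.d. uniform
# on {0,1} and Y i.i.d. Bernoulli(p): given Y = y, the nonzero coordinates of
# M are i.i.d. fair bits, so H(M | Y = y) = (#ones in y) * log 2.
from itertools import product
from math import log

p, n = 0.3, 7                         # illustrative noise level, block length

total = 0.0
for y in product((0, 1), repeat=n + 1):
    s = sum(y)
    total += p**s * (1 - p)**(n + 1 - s) * s * log(2)

print(total / (n + 1))                # equals p * log 2 exactly here
print(p * log(2))
```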

2.2.2 Part (B)

First, we will prove a technical lemma.

Lemma 2.13

We have

$$\begin{aligned} {\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,\,|\,\,\textbf{Y})=\lim \limits _{n \rightarrow \infty }\frac{1}{n}\mathbb {E}\mathbb {1}_{S_n > 0}{\textbf{H}}\,(X_{r_0}, X_{r_1},\ldots , X_{r_{s_n - 1}})_{|_{r_i = R_i, s_n = S_n}}. \end{aligned}$$

Proof

Since for any \(k\in {\mathbb {Z}}\), on the event \(Y_k=0\), we have \(M_k\equiv 0\), it follows that

$$\begin{aligned} {\textbf{H}}\,(M_{[0, n]}\,|\,Y_{[0, n]}) = {\mathbb {P}}\,(S_n> 0)\sum _{y_{[0,n]}} {{\mathbb {P}}_{{S}_n > 0}}(Y_{[0, n]} = y_{[0, n]})\,{{\textbf{H}}_{Y_{[0, n]} = y_{[0, n]}}}\,(M_{[0, n ]}). \end{aligned}$$

Moreover, if \(s_n = \sum _{i = 0}^n y_i>0\) then

$$\begin{aligned} {{\mathbb {P}}_{Y_{[0,n]}=y_{[0,n]}}}(M_{[0,n]}=m_{[0,n]})={\mathbb {P}}(X_{r_0}=m_{r_0},\dots , X_{r_{s_n - 1}}=m_{r_{s_n - 1}}), \end{aligned}$$

whenever \(m_{[0,n]}\leqslant y_{[0,n]}\) coordinatewise (otherwise, we get zero). Hence,

$$\begin{aligned} {\textbf{H}}_{{Y}_{[0, n]} = {y}_{[0, n]}}\,(M_{[0, n ]}) = {\textbf{H}}\,(X_{r_0},\dots , X_{r_{s_n - 1}}), \end{aligned}$$

which results in

$$\begin{aligned} {\textbf{H}}\,(M_{[0, n]}\,|\,Y_{[0, n]})&= {\mathbb {P}}\,(S_n> 0)\,\mathbb {E}_{S_n> 0}{\textbf{H}}\,(X_{r_0},\dots , X_{r_{s_n - 1}})_{|_{r_i = R_i, s_n = S_n}}\\&= \mathbb {E}\mathbb {1}_{S_n > 0}{\textbf{H}}\,(X_{r_0},\dots , X_{r_{s_n - 1}})_{|_{r_i = R_i, s_n = S_n}}. \end{aligned}$$

This completes the proof. \(\square \)
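To see the lemma at work, take \(\textbf{X}\) i.i.d. uniform on \(\{0,1\}\) (assumed only for this illustration), so that \({\textbf{H}}\,(X_{r_0},\ldots ,X_{r_{s_n-1}})=s_n\log 2\) and the right-hand side becomes \({\mathbb {P}}\,(Y_0=1)\log 2\). The Python sketch below estimates this by Monte Carlo, with \(\textbf{Y}\) a two-state Markov chain (an arbitrary illustrative choice).

```python
# Monte Carlo sketch of the right-hand side of Lemma 2.13 for X i.i.d.
# uniform on {0,1}: then it reduces to lim (1/n) E[1_{S_n>0} S_n log 2],
# which equals P(Y_0 = 1) * log 2.
import random
from math import log

q01, q10 = 0.3, 0.6                   # transition probabilities of Y
pi1 = q01 / (q01 + q10)               # stationary P(Y_0 = 1)

def sample_S_n(n):
    """Sample S_n = Y_0 + ... + Y_n from the stationary two-state chain."""
    y = 1 if random.random() < pi1 else 0
    s = y
    for _ in range(n):
        if y == 0:
            y = 1 if random.random() < q01 else 0
        else:
            y = 0 if random.random() < q10 else 1
        s += y
    return s

n, trials = 200, 2000
est = sum(sample_S_n(n) for _ in range(trials)) * log(2) / (trials * n)
print(est)                            # close to pi1 * log 2
print(pi1 * log(2))                   # (1_{S_n>0} omitted: S_n > 0 w.h.p.)
```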

Notice now that

$$\begin{aligned} \frac{1}{n}{\textbf{H}}\,(X_{r_0},\dots ,X_{r_{s_n-1}})=\frac{1}{n}{\textbf{H}}\,(X_{[0,n]})- \frac{1}{n}{\textbf{H}}\,(X_{[0,n]\setminus \{r_0,\dots , r_{s_n-1}\}}\,|\,X_{r_0},\dots ,X_{r_{s_n-1}}), \end{aligned}$$

\(\lim _{n\rightarrow \infty }\frac{1}{n}{\textbf{H}}\,(X_{[0,n]})={\textbf{H}}\,({\textbf{X}})\) and that (by the ergodicity of \(\textbf{Y}\)) we have \(\mathbb {1}_{S_n > 0 } \rightarrow 1\). Thus, in order to conclude the proof, it remains to find \(\lim \nolimits _{n \rightarrow \infty }\mathbb {E}\mathbb {1}_{S_n > 0} H(n, \textbf{R})\), where

$$\begin{aligned} H(n, \textbf{r}) := \frac{1}{n}{\textbf{H}}\,(X_{[0,n]\setminus \{r_0,\dots , r_{s_n-1}\}}\,|\,X_{r_0},\dots ,X_{r_{s_n-1}}), \quad \textbf{r} = {({{r}_i})_{i\in {\mathbb {Z}}}}. \end{aligned}$$

More precisely, let \(A_i=[Y_0=\ldots = Y_{i-1}=0,\,Y_i=1]\) for \(i\geqslant 0\) (in particular, \(A_0=[Y_0=1]\)). If we show that

$$\begin{aligned} \lim \limits _{n \rightarrow \infty } H(n, \textbf{R}) = \mathbb {P}(A_0)\mathbb {E}_{A_0} {\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-\infty , r_0]}, X_{\{r_{ 1}, r_{2}, \ldots \}})|_{r_i=R_i} \end{aligned}$$
(2.4)

holds a.e., then by the bounded convergence theorem (as \(H(n,\textbf{R})\leqslant {\textbf{H}}\,(X_0)\)) we will have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {E}\mathbb {1}_{S_n>0}H(n,\textbf{R}) = \mathbb {P}(A_0)\mathbb {E}_{A_0} {\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-\infty , r_0]}, X_{\{r_{ 1}, r_{2}, \ldots \}})|_{r_i=R_i} \end{aligned}$$

since \(\lim _{n\rightarrow \infty }\mathbb {1}_{S_n>0}=1\) a.e. by the ergodicity of \(\textbf{Y}\).


Fix \(\textbf{y}\) and \(n\in {\mathbb {N}}\). By the chain rule, we get

$$\begin{aligned} nH(n, \textbf{r})&= \underbrace{{\textbf{H}}\,(X_{[0,r_0-1]}\,|\,X_{\{r_0,\dots , r_{s_n-1}\}})}_{\Sigma _1(n)} +\underbrace{{\textbf{H}}\,(X_{[r_{s_n-1}+1,n]}\,|\,X_{[0,r_{s_n-1}]})}_{\Sigma _3(n)} \\&\quad +\underbrace{\sum _{i=0}^{s_n-2}{\textbf{H}}\,(X_{[r_i+1,r_{i+1}-1]}\,|\,X_{[0,r_i]},X_{\{r_{i+1},\dots ,r_{s_n-1}\}})}_ {\Sigma _2(s_n-1)} . \end{aligned}$$

We will deal first with the summands \(\Sigma _1(n)\) and \(\Sigma _3(n)\). Clearly,

$$\begin{aligned} \frac{1}{n}\Sigma _1(n) \leqslant \frac{1}{n}{\textbf{H}}\,(X_{[0,r_0-1]}) \leqslant \frac{r_0}{n}H(X_0)\rightarrow 0 \end{aligned}$$
(2.5)

as \(n\rightarrow \infty \). Since \(s_n=s_{r_{s_n-1}}\), \(\frac{s_n}{n} \rightarrow {\mathbb {P}}\,(Y_0 = 1) > 0\) (by the ergodicity of \(\textbf{Y}\)) and \(r_{s_n - 1} \rightarrow \infty \), it follows that

$$\begin{aligned} \frac{\Sigma _3(n)}{n} \leqslant \frac{n-r_{s_n-1}}{n}H(X_0)=\left( 1-\frac{r_{s_n-1}}{s_{r_{s_n-1}}}\cdot \frac{s_n}{n}\right) H(X_0) \rightarrow 0. \end{aligned}$$
(2.6)

In order to deal with \(\Sigma _2(s_n-1)\), notice that

$$\begin{aligned} \frac{1}{n}\Sigma _2(s_n-1) =\frac{s_n}{n}\frac{1}{s_n}\Sigma _2(s_n-1). \end{aligned}$$
(2.7)

Since \(\frac{s_n}{n} \rightarrow {\mathbb {P}}\,(Y_0 = 1)\), it suffices to show that \(\mathbb {P}_{A_0}\)-a.e. we have

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\frac{1}{n}\Sigma _2(n) = \mathbb {E}_{A_0}{\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-\infty , r_0]}, X_{\{r_{ 1}, r_{2}, \ldots \}}). \end{aligned}$$
(2.8)

Using the stationarity of \(\textbf{X}\), for \(t_i = r_i - r_{i - 1}\), we obtain

$$\begin{aligned} \Sigma _2(n)&=\sum _{i=0}^{n-1}{\textbf{H}}\,(X_{[r_i+1,r_{i+1}-1]}\,|\,X_{[0,r_i]},X_{\{r_{i+1},\dots ,r_{n}\}}) \\&=\sum _{i = 0}^{n-1}{\textbf{H}}\,(X_{[1, t_{i + 1} - 1]}\,|\,X_{[-r_i, 0]}, X_{\{t_{i + 1},\, t_{i+1}+t_{i+2},\, \dots ,\, t_{i + 1} + \dots + t_{n}\}}). \end{aligned}$$

We would like to apply Maker’s ergodic theorem to study the above sum. However, we cannot do it directly due to the term \(X_{[-r_i, 0]}\) appearing in the conditional entropies. This obstacle will be overcome by estimating each summand from below and above.

Fix \(k\in {\mathbb {N}}\). Then for every i such that \(r_i \geqslant k\) and for every \(j \in {\mathbb {N}}\), we have

$$\begin{aligned} H_{\infty , j}\left( t_{i + 1}, t_{i + 2}, \ldots \right)&\leqslant {\textbf{H}}\,(X_{[1, t_{i + 1} - 1]}\,|\,X_{[-r_i, 0]}, X_{\{t_{i + 1},\, t_{i+1}+t_{i+2},\, \ldots ,\, t_{i + 1} + \cdots + t_{i + j}\}}) \nonumber \\&\leqslant H_{k, j}\left( t_{i + 1}, t_{i + 2}, \ldots \right) , \end{aligned}$$
(2.9)

where \(H_{k, j}\left( t_{i + 1}, t_{i + 2}, \ldots \right) = {\textbf{H}}\,(X_{[1, t_{i + 1} - 1]}\,|\,X_{(-k, 0]}, X_{\{t_{i + 1},\, t_{i+1}+t_{i+2},\, \ldots ,\, t_{i + 1} + \cdots + t_{i + j}\}})\) for \(k \in {\mathbb {N}}\cup \{\infty \}\). Clearly,

$$\begin{aligned} H_{k, j}\left( t_{1}, t_{2}, \ldots \right) \xrightarrow {j\rightarrow \infty } H_{k}\left( t_{1}, t_{2}, \ldots \right)&:= {\textbf{H}}\,(X_{[1, t_{1} - 1]}\,|\,X_{(-k, 0]}, X_{\{t_{ 1}, t_{1} + t_{2}, \ldots \}})\\&={\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-k, r_0]}, X_{\{r_{1}, r_2, \ldots \}}). \end{aligned}$$

By the entropy chain rule and Kac’s lemma,

$$\begin{aligned} \sup _{k, j \in {\mathbb {N}}} H_{k, j}(T_{[1, \infty )}) \leqslant {{\textbf{H}}\,({X}_0)} T_1 \in L_1(\mathbb {P}_{A_0}). \end{aligned}$$
(2.10)

Therefore, Maker’s ergodic theorem implies that, for every \(k\in {\mathbb {N}}\cup \{\infty \}\), \(\mathbb {P}_{A_0}\) a.s., we have

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\frac{1}{n}\sum _{i = 0}^{n-1} H_{k, n - i}\left( t_{i + 1}, t_{i + 2}, \ldots \right) = \mathbb {E}_{A_0} H_{k}\left( T_{1}, T_{2}, \ldots \right) . \end{aligned}$$
(2.11)

Using (2.9), it follows from the definition of \(\Sigma _2\) (and the chain rule) that

$$\begin{aligned} \begin{aligned} \frac{1}{n}\sum _{i=0}^{n-1}H_{\infty ,n-i}(t_{i+1},t_{i+2},\dots )&\leqslant \frac{1}{n}\Sigma _2(n)\\&\leqslant \frac{t_1+\dots +t_k}{n}H(X_0)+ \frac{1}{n}\sum _{i=k}^{n-1}H_{k,n-i}(t_{i+1},t_{i+2},\dots )\\&\leqslant \frac{t_1+\dots +t_k}{n}H(X_0)+\frac{1}{n}\sum _{i=0}^{n-1}H_{k,n-i}(t_{i+1},t_{i+2},\dots ), \end{aligned} \end{aligned}$$
(2.12)

with \(\frac{t_1+\dots +t_k}{n}H(X_0)\xrightarrow {n \rightarrow \infty } 0\). Thus, due to (2.11),

$$\begin{aligned} \mathbb {E}_{A_0} H_{\infty }\left( T_{1}, T_{2}, \ldots \right) \leqslant \lim \limits _{n \rightarrow \infty }\frac{1}{n}\Sigma _2(n) \leqslant \mathbb {E}_{A_0} H_{k}\left( T_{1}, T_{2}, \ldots \right) . \end{aligned}$$

Notice that \(H_k \rightarrow H_\infty \) as \(k\rightarrow \infty \). Hence, combining (2.10) and the dominated convergence theorem, we obtain

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\frac{1}{n} \Sigma _2(n)= \mathbb {E}_{A_0} H_{\infty }\left( T_{1}, T_{2}, \ldots \right) \end{aligned}$$
(2.13)

\(\mathbb {P}_{A_0}\)-a.s., which is exactly (2.8) under \(\mathbb {P}_{A_0}\).

It remains to show (2.8) under \(\mathbb {P}_{A_i}\) for \(i\geqslant 1\). However, it is a direct consequence of the above and the following lemma:

Lemma 2.14

Suppose that we have a sequence of measurable functions \((f_n)_{n\geqslant 1}\) depending on \((T_n)_{n\geqslant 1}\) and a measurable function f depending on \(\textbf{Y}\) such that

$$\begin{aligned} f_n((T_n)_{n\geqslant 1}) \rightarrow f(\textbf{Y}) \end{aligned}$$
(2.14)

\(\mathbb {P}_{A_0}\)-a.e. Then (2.14) holds also \(\mathbb {P}_{A_i}\)-a.e. for each \(i\geqslant 1\).

Proof

For the sake of simplicity, we assume that \(\textbf{Y}\) is a canonical process. Let \(B_0\subset A_0\) be the set where (2.14) holds. We claim that \(B_i:=A_i\cap S^{-i}B_0\) has full \(\mathbb {P}_{A_i}\)-measure and that (2.14) holds on \(B_i\). Indeed, since \(S^iA_i\subset A_0\), we have

$$\begin{aligned} \mathbb {P}_{A_i}(A_i\setminus B_i)=\frac{1}{\mathbb {P}(A_i)}\mathbb {P}(A_i\setminus S^{-i}B_0)=\frac{1}{\mathbb {P}(A_i)}\mathbb {P}(S^iA_i\setminus B_0)\leqslant \frac{1}{\mathbb {P}(A_i)}\mathbb {P}(A_0\setminus B_0)=0. \end{aligned}$$

Moreover, if \(\textbf{y}\in B_i\) then \(S^i\textbf{y}\in S^iA_i\cap B_0\subset A_0\cap B_0=B_0\). Since \(\textbf{y}\in A_i\), it follows immediately that \(T_n(\textbf{y})=T_n(S^i\textbf{y})\) for all \(n\geqslant 1\), which completes the proof. \(\square \)

2.3 General setting: proof of Corollary 1.19 and related examples

In this section we will study a certain class of good \((\textbf{X},\textbf{Y})\) with no entropy drop. We begin with the proof of Corollary 1.19.

Proof of Corollary 1.19

Let \(L\geqslant 1\) be such that \(\text {supp}\ \textbf{y} \supset L\mathbb {Z} +a\) for some a and for a.e. realization \(\textbf{y}\) of \(\textbf{Y}\). Let \((X,\mathcal {B},\mu ,T)\) be a measure-theoretic dynamical system with entropy less than \(\frac{1}{L}\log 2\) and take a measurable partition \(X=J \cup J^c\) that is generating for the map \(T^L\). Let Y be the orbit closure of \(\textbf{y}\) in \(\{0,1\}^\mathbb {Z}\) under the left shift.

The process \(\textbf{M}\) corresponds to the coding of points in \((X\times Y,T\times S)\) with respect to the partition into \(J\times C\) (with \(C=[1]\subset Y\)) and its complement. Using Theorem 1.7 (B), we obtain

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}}) ={\textbf{H}}\,({\textbf{X}})-\mathbb {P}(A_0)\mathbb {E}_{A_0}{\textbf{H}}\,(X_{[r_0+1,r_1-1]}\,|\,X_{(-\infty , r_0]},X_{\{r_1,r_2,\dots \}})|_{r_i=R_i}={\textbf{H}}\,({\textbf{X}}). \end{aligned}$$

(Indeed, a.e. realization \(\textbf{r}\) contains a two-sided infinite arithmetic progression with difference L, and the partition \(\{J,J^c\}\) is generating for \(T^L\); thus the conditional entropy in the above formula is equal to zero.) \(\square \)

It would be interesting to know whether in the above example \(\textbf{X}\) can be recovered from \(\textbf{M}\). Let us now see that this can be the case when \(\textbf{Y}\) arises from the rotation on two points \(\{0,1\}\). We will look at it both from the probabilistic and the ergodic-theoretic perspective.

Example 2.15

Let \(\left( \xi _i\right) _{i\in {\mathbb {Z}}}\) be a sequence of i.i.d. random variables such that

$$\begin{aligned} {\mathbb {P}}\,(\xi _0 = 0)\, = \,{\mathbb {P}}\,(\xi _0 = 1)\, = \frac{1}{2}, \end{aligned}$$

let \(F:\{0,1\}^2 \rightarrow \{0, 1, 2, 3\}\) be an arbitrary (relabelling) 1-1 function and put

$$\begin{aligned} X_i = F(\xi _i, \xi _{i + 1}), \qquad \textbf{Y} \sim \frac{1}{2}(\delta _a+\delta _{Sa}), \end{aligned}$$

where \(a=(\dots ,0,1,0,1,\dots )\) is the 2-periodic 0-1 sequence, S stands for the left shift and \(\textbf{X} \amalg \textbf{Y}\). Since \(\textbf{X}\) is a Markov chain and F is 1-1, we have

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}}) = {\textbf{H}}\,(X_1\,|\,X_0) = {\textbf{H}}\,(\xi _1, \xi _2\,|\,\xi _0, \xi _1) = {\textbf{H}}\,(\xi _2\,|\,\xi _0, \xi _1) = {\textbf{H}}\,(\xi _2)=\log 2. \end{aligned}$$

Moreover, \(\mathbb {P}_{Y_0=1}(R_{-1}=-2)=1\) and therefore

$$\begin{aligned} \mathbb {E}_{Y_0=1} {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})|_{r_i=R_i}={\textbf{H}}\,(X_0\,|\,X_{-2})={{\textbf{H}}\,({X}_0)}=2\log 2. \end{aligned}$$

Clearly, for every \(j\in {\mathbb {Z}}\), \(\left( X_i\right) _{i \leqslant j} \amalg \left( X_i\right) _{i \geqslant j + 2}\), yielding

$$\begin{aligned} \frac{1}{n}f(y_{[0,n]}) = \frac{1}{n}{\textbf{H}}\,(X_{r_1}, \ldots , X_{r_m}) = \frac{m}{n} {{\textbf{H}}\,({X}_0)} \rightarrow \frac{1}{2}{{\textbf{H}}\,({X}_0)}. \end{aligned}$$

Thus, by Theorem 1.7 (A), \({\textbf{H}}\,({\textbf{M}}) = \frac{1}{2}{{\textbf{H}}\,({X}_0)} = \frac{1}{2}\cdot 2 \log 2 =\log 2 = {\textbf{H}}\,({\textbf{X}})\). In fact, notice that since F is 1-1, knowing all even (resp. all odd) coordinates of a realization \(\textbf{x}\) of \(\textbf{X}\) determines all of its coordinates. In other words, \(\textbf{M}\) contains full information about \(\textbf{X}\).
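The recovery procedure can be made completely explicit; in the Python sketch below, the particular relabelling F and the choice of the realization of \(\textbf{Y}\) with 1's on the even coordinates are illustrative assumptions.

```python
# Recovering X from M = X * Y in Example 2.15, on a finite window.
import random

F = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}   # a 1-1 relabelling
F_inv = {v: k for k, v in F.items()}

N = 20
xi = [random.randint(0, 1) for _ in range(N + 1)]  # i.i.d. fair bits
X = [F[(xi[i], xi[i + 1])] for i in range(N)]
Y = [1 - (i % 2) for i in range(N)]                # 1 exactly on even i
M = [x * y for x, y in zip(X, Y)]                  # the observed signal

# On even coordinates M_i = X_i, and F^{-1}(M_i) = (xi_i, xi_{i+1}) reveals
# the underlying bits; X is then rebuilt on the odd coordinates as well.
xi_rec = [None] * (N + 1)
for i in range(0, N, 2):
    xi_rec[i], xi_rec[i + 1] = F_inv[M[i]]

X_rec = [F[(xi_rec[i], xi_rec[i + 1])] for i in range(N - 1)]
assert X_rec == X[:N - 1]    # (the last coordinate needs the next even
                             #  observation; bilaterally, all are covered)
```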

We will now see how to use an ergodic-theoretic approach to modify the above idea so that \(X_i \in \{0,1\}\), while keeping the property \({\textbf{H}}\,({\textbf{M}})={\textbf{H}}\,({\textbf{X}})\) and the ability to recover \(\textbf{X}\) from \(\textbf{M}\).

Example 2.16

Let \((X,\mathcal {B},\mu ,T)\) be an ergodic automorphism, with \(h(\mu )\in (0,\log 2)\) and let S be the rotation on \(Y=\{0,1\}\), with the unique invariant measure denoted by \(\nu \). Let \(\{J,J^c\}\) be a (measurable) generating partition of X for T (the existence of such a partition follows by Krieger's finite generator theorem [27]) and let \(C:=\{1\}\subset Y\). We consider the following stationary processes:

$$\begin{aligned} \textbf{X}=({\mathbb {1}_J \circ T^i})_{i\in {\mathbb {Z}}} \text { and }\textbf{Y}=({\mathbb {1}_C \circ S^i})_{i\in {\mathbb {Z}}}. \end{aligned}$$

Then \(\textbf{M}:=\textbf{X}\cdot \textbf{Y}\) corresponds to coding of points in the dynamical system \((X\times Y,T\times S)\) with respect to the partition into \(J\times C\) and its complement.

Equivalently, \(\textbf{M}\) corresponds to the dynamical system that is a tower of height two above the factor of \(T^2\) corresponding to the partition \(\{J,J^c\}\).

Assume now additionally that \(h(T)<\frac{1}{2}\log 2\) and the partition \(\{J,J^c\}\) is generating for \(T^2\) (e.g. T can be a Bernoulli automorphism, with entropy less than \(\frac{1}{2}\log 2\)). Then \(\textbf{M}\) corresponds to a tower of height two above \(T^2\), denoted by R, and given by

$$\begin{aligned} R(x,0)=(x,1),\ R(x,1)=(T^2x,0). \end{aligned}$$

Notice that R is isomorphic to \(T\times S\) via the map \(\Phi \) given by

$$\begin{aligned} \Phi (x,0)=(x,0),\ \Phi (x,1)=(Tx,1) \end{aligned}$$

(we easily check that \(\Phi \circ R=(T\times S)\circ \Phi \)). It follows that

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}})=h(\mu \otimes \nu )=h(\mu )={\textbf{H}}\,({\textbf{X}})>0. \end{aligned}$$
(2.15)

In fact, since \(\Phi \) is an isomorphism, one can filter out \(\textbf{X}\) from \(\textbf{M}\).
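The intertwining relation \(\Phi \circ R=(T\times S)\circ \Phi \) is purely algebraic, so it can be checked mechanically; in the Python sketch below, T is modelled by an arbitrary bijection of a finite set (an illustrative stand-in, as the relation uses no further structure of T).

```python
# A finite check of Phi ∘ R = (T × S) ∘ Phi for the tower R of height two.
base = list(range(7))
T = {x: (x + 3) % 7 for x in base}         # some bijection playing T

def R(x, j):                               # R(x,0)=(x,1), R(x,1)=(T^2 x,0)
    return (x, 1) if j == 0 else (T[T[x]], 0)

def Phi(x, j):                             # Phi(x,0)=(x,0), Phi(x,1)=(Tx,1)
    return (x, 0) if j == 0 else (T[x], 1)

def TxS(x, j):                             # product map; S = rotation on {0,1}
    return (T[x], 1 - j)

assert all(Phi(*R(x, j)) == TxS(*Phi(x, j)) for x in base for j in (0, 1))
```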

2.4 \({\mathscr {B}}\)-free systems: proof of Proposition 1.20

Let \({\mathscr {B}}\subset \mathbb {N}\), let \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\) and let \((X_\eta ,S)\) be the corresponding \({\mathscr {B}}\)-free system, with the underlying Mirsky measure \(\nu _\eta \). Recall that:

$$\begin{aligned} h({\widetilde{X}}_\eta ,S)={\overline{d}}(\mathcal {F}_{\mathscr {B}})=\nu _\eta (1), \end{aligned}$$

so \(\nu _\eta \ne \delta _{(\dots ,0,0,0,\dots )}\) is equivalent to \(h({\widetilde{X}}_\eta ,S)>0\). Thus, \(\nu _\eta \ne \delta _{(\dots ,0,0,0,\dots )}\) is necessary and sufficient for the existence of \(\kappa \) with \(h(\nu _\eta *\kappa )>0\).

Proof of Proposition 1.20

It was shown in Theorem 3.7 in [12] that the following are equivalent:

  • \((X_\eta ,S)\) is proximal,

  • \({\mathscr {B}}\) contains an infinite pairwise coprime subset,

  • the support of \(\eta \) does not contain a two-sided infinite arithmetic progression.

Thus, in order to complete the proof of Proposition 1.20, we need to show that, in the proximal case, for infinitely many \(k\geqslant 1\) the block \(10\ldots 01\) (with k zeros between the 1's) has positive Mirsky measure \(\nu _\eta \). An important notion in the theory of \({\mathscr {B}}\)-free systems is that of tautness [23], defined in terms of the logarithmic density of sets of multiples. We say that \({\mathscr {B}}\) is taut if for any \(b\in {\mathscr {B}}\), we have

$$\begin{aligned} \varvec{\delta }(\mathcal {M}_{{\mathscr {B}}}) > \varvec{\delta }(\mathcal {M}_{{\mathscr {B}}\setminus \{b\}}), \end{aligned}$$

where \(\varvec{\delta }(A)=\lim _{N\rightarrow \infty }\frac{1}{\log N}\sum _{n\leqslant N}\frac{1}{n}\textbf{1}_{A}(n)\) for any \(A\subset \mathbb {Z}\). It was proved in [12] (see Theorem C and Lemma 4.11 therein) that given any \({\mathscr {B}}\), there exists a taut set \({\mathscr {B}}'\) such that \(\mathcal {M}_{{\mathscr {B}}'}\subset \mathcal {M}_{\mathscr {B}}\) and \(\nu _{\eta '}=\nu _\eta \). Keller [26] proved that the Mirsky measure of any taut set has full support. Therefore, whenever \(\nu _\eta =\nu _{\eta '}\ne \delta _{(\dots ,0,0,0,\dots )}\), any block of the form \(10\dots 01\) appearing in \(\eta '\) (and there are infinitely many such blocks, as we exclude the Dirac measure at \((\dots ,0,0,0,\dots )\)!) is in fact of positive \(\nu _\eta \)-measure. \(\square \)
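To make the combinatorics in this proof concrete, the following Python sketch computes a window of \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\) for the classical choice \({\mathscr {B}}=\{p^2: p\text { prime}\}\) (so \(\mathcal {F}_{\mathscr {B}}\) is the set of square-free numbers; this \({\mathscr {B}}\) is infinite and pairwise coprime, i.e. we are in the proximal case) and lists the lengths k of the blocks \(10\ldots 01\) occurring in the window. The truncation of \({\mathscr {B}}\) below is harmless on this window, since \(17^2>200\).

```python
# A window of eta = 1_{F_B} for B = {p^2 : p prime} (square-free numbers),
# together with the gap lengths k of the observed blocks 1 0...0 1.
def eta(N, B):
    """Characteristic function of the B-free numbers in [1, N]."""
    out = [1] * (N + 1)
    for b in B:
        for m in range(b, N + 1, b):
            out[m] = 0
    return out[1:]

B = [p * p for p in (2, 3, 5, 7, 11, 13)]   # truncation, exact up to N = 200
e = eta(200, B)

ones = [i for i, v in enumerate(e) if v == 1]
gaps = sorted({b - a - 1 for a, b in zip(ones, ones[1:])})
print(gaps)   # several values of k >= 0 already occur in this short window
```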