1 Background and main results

In this paper we concentrate on two seemingly unrelated areas:

  1. (A)

a multiplicative version of Furstenberg’s classical problem of defiltering a noisy signal,

  2. (B)

    open questions related to invariant measures for so-called \({\mathscr {B}}\)-free systems.

We will now give some background on both (A) and (B). Then we will present the main technical result and its consequences. Finally, since the paper mixes probabilistic and ergodic tools, we present in a separate section a dictionary allowing for their simultaneous use. The remainder of the paper is devoted to the proofs and to examples illustrating our results. In the appendix we give some more detailed comments on \({\mathscr {B}}\)-free systems that may be of independent interest.

1.1 Furstenberg’s filtering problem

Furstenberg’s classical filtering problem from the celebrated paper [17] concerns two stationary real processes: \(\textbf{X}\) (the emitted signal) and \(\textbf{Y}\) (the noise), with \(\textbf{X}\amalg \textbf{Y}\) (i.e. \(\textbf{X}\) and \(\textbf{Y}\) independent), and asks the following question:

Question 1

([17]) When is \(\textbf{X}\) measurable with respect to the \(\sigma \)-algebra generated by \(\textbf{X}+\textbf{Y}\)? In other words, when is it possible to recover \(\textbf{X}\) from the received signal \(\textbf{X}+\textbf{Y}\)?

In order to address this problem, Furstenberg [17] introduced the notion of disjointness of dynamical systems, which even today remains one of the central concepts in ergodic theory. Recall that measure-theoretic dynamical systems \((X,\mathcal {B},\mu ,T)\) and \((Y,\mathcal {C},\nu ,S)\) are disjoint if the product measure \(\mu \otimes \nu \) is the only \((T\times S)\)-invariant measure projecting as \(\mu \) and \(\nu \) onto the first and second coordinate, respectively.Footnote 1 Recall also that each measure-theoretic dynamical system \((X,\mathcal {B},\mu ,T)\) yields a family of bilateral, real, stationary processes in the following way: for any measurable function \(f:X\rightarrow {\mathbb {R}}\), the process \(\textbf{X}=(f\circ T^i)_{i\in {\mathbb {Z}}}\) is stationary. In particular, each measurable partition of X into finitely many pieces yields a finitely-valued stationary process. On the other hand, each real stationary process \(\textbf{X}\) yields a (symbolic) measure-theoretic dynamical system by taking the left shift S on the product space \({\mathbb {R}}^{\mathbb {Z}}\), with the invariant measure given by the distribution of \(\textbf{X}\) (if the state space of \(\textbf{X}\) is smaller than \({\mathbb {R}}\), we can consider the left shift S on the appropriate smaller product space). A crucial observation is that whenever the family of functions \(\{f\circ T^i:i\in {\mathbb {Z}}\}\) generates \(\mathcal {B}\), the resulting symbolic (measure-theoretic) dynamical system is isomorphic to \((X,\mathcal {B},\mu ,T)\). Last but not least, we say that processes \(\textbf{X}\) and \(\textbf{Y}\) are absolutely independent whenever the resulting dynamical systems are disjoint. Furstenberg showed that absolute independence is a sufficient condition for a positive answer to Question 1:

Theorem 1.1

([17]) Suppose that \(\textbf{X}\) and \(\textbf{Y}\) are integrable and that \(\textbf{X}\) is absolutely independent from \(\textbf{Y}\). Then \(\textbf{X}\) is measurable with respect to the \(\sigma \)-algebra generated by \(\textbf{X}+\textbf{Y}\).

Garbit [19] showed that the integrability assumption can be dropped and the assertion of Theorem 1.1 still holds.

We are interested in the following modification of Question 1: instead of the sum of processes \(\textbf{X}\) and \(\textbf{Y}\), we consider their product

$$\begin{aligned} \textbf{M}:=\textbf{X}\cdot \textbf{Y}=(X_i\cdot Y_i)_{i\in {\mathbb {Z}}}. \end{aligned}$$

Notice that if \(\textbf{X}\) and \(\textbf{Y}\) take only positive values, we can define the processes \(\log \textbf{X}\) and \(\log \textbf{Y}\). Since \(\log \textbf{M}=\log \textbf{X}+\log \textbf{Y}\), by the result of Garbit, \(\textbf{X}\) can be recovered from \(\textbf{M}\) whenever \(\textbf{X}\) is absolutely independent of \(\textbf{Y}\). Therefore, it is natural to ask whether the same conclusion as in Theorem 1.1 holds for processes that admit zero as a value. The simplest instance of this is when the state space of, say, \(\textbf{Y}\) equals \(\{0,1\}\). One can then think of \(\textbf{M}\) as the original signal \(\textbf{X}\) in which some of the information was lost (due to \(Y_i=0\)), instead of just being perturbed (by adding \(Y_i\) to \(X_i\)). Thus, we deal with the following problem:

Question 2

Let \(\textbf{X}\) and \(\textbf{Y}\) be bilateral, real, finitely-valued, stationary processes, with \(Y_i\in \{0,1\}\). Suppose that \(\textbf{X}\amalg \textbf{Y}\). Is it possible to recover \(\textbf{X}\) and / or \(\textbf{Y}\) from \(\textbf{M}\)?
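
To get a feeling for Question 2, consider the following toy example (ours, for illustration only): let \(\textbf{X}\) be an i.i.d. sequence of fair coin tosses with values in \(\{1,2\}\), and let \(\textbf{Y}\) be the alternating sequence \(\ldots 0101\ldots \) with a uniformly random phase, independent of \(\textbf{X}\) (a stationary ergodic process with \({\textbf{H}}\,({\textbf{Y}})=0\)). Then

$$\begin{aligned} M_i=X_i\cdot Y_i={\left\{ \begin{array}{ll} X_i, &{} Y_i=1,\\ 0, &{} Y_i=0, \end{array}\right. } \end{aligned}$$

so \(\textbf{Y}\) is always recoverable from \(\textbf{M}\) (via \(Y_i=\mathbb {1}_{M_i\ne 0}\)), whereas the values of \(\textbf{X}\) at the positions with \(Y_i=0\) are lost irretrievably; in particular, \({\textbf{H}}\,({\textbf{M}})=\frac{1}{2}\,{\textbf{H}}\,({\textbf{X}})\).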

A similar (in fact, much more general) problem of retrieving a lost signal was studied by Furstenberg, Peres and Weiss in [18]. Let \(\textbf{X}^{(i)}=\left( X_j^{(i)}\right) _{j\in {\mathbb {Z}}}\), where \(i\in {\mathbb {N}}\), be a family of processes and let \(\textbf{U}\) be an \({\mathbb {N}}\)-valued process. Suppose that all these processes are stationary and define

$$\begin{aligned} \textbf{X}^{(\textbf{U})}=\left( X_i^{(U_i)} \right) _{i\in {\mathbb {Z}}} \end{aligned}$$

(informally, \(\textbf{U}\) chooses among the family of processes).

Question 3

Is it possible to recover \(\textbf{U}\) from \(\textbf{X}^{(\textbf{U})}\)?

In order to answer this question the authors of [18] introduce the notion of double disjointness. We say that a process \(\textbf{Y}\) is doubly disjoint from \(\textbf{X}\) if every self-joining of \(\textbf{Y}\) is absolutely independent of \(\textbf{X}\): in other words, if \((\textbf{X}',\textbf{Y}', \textbf{Y}'')\) is a stationary process such that \(\textbf{X}' \sim \textbf{X}\) and \(\textbf{Y}', \textbf{Y}'' \sim \textbf{Y}\) then \(\textbf{X}'\amalg (\textbf{Y}', \textbf{Y}'')\). The most basic example of doubly disjoint processes arises when \(\textbf{Y}\) is of zero entropy rate (then every self-joining of \(\textbf{Y}\) has zero entropy) and \(\textbf{X}\) has trivial tail \(\sigma \)-field (let us add that, in fact, if \(\textbf{Y}\) is doubly disjoint from \(\textbf{X}\) then necessarily \({\textbf{H}}\,({\textbf{Y}}) = 0\) and \(\textbf{X}\) is ergodic). (For the definition of the entropy rate, see (1.2) below.) Now, the main result of [18] can be summarized (roughly) as follows. Suppose that \(\textbf{X}^{(i)}\), \(i\in {\mathbb {N}}\), and \(\textbf{U}\) are jointly stationary. If \(\textbf{U}\) is doubly disjoint from each \(\textbf{X}^{(i)}\), \(i\in {\mathbb {N}}\), then one can retrieve \(\textbf{U}\) from \(\textbf{X}^{(\textbf{U})}\).

Let us explain how to fit this theorem to our setting from Question 2 (and retrieve \(\textbf{Y}\) from \(\textbf{M}\)). Consider two processes \(\textbf{X}^{(i)}\), for \(i \in \{0, 1\}\), where

$$\begin{aligned} X_j^{(i)} = iX_j \end{aligned}$$
(1.1)

and take \(\textbf{U} = \textbf{Y}\). Then \(\textbf{X}^{(\textbf{U})} = \textbf{X} \cdot \textbf{Y}\) (indeed, by (1.1), \(X_j^{(U_j)}=U_jX_j=Y_jX_j=M_j\) for every \(j\)) and the theorem states that we can retrieve \(\textbf{Y}\) from \(\textbf{X} \cdot \textbf{Y}\) as soon as \(\textbf{Y}\) is doubly disjoint from \(\textbf{X}\). Since the roles of \(\textbf{X}\) and \(\textbf{Y}\) are here not symmetric (and \(\textbf{M}\) together with \(\textbf{Y}\) does not determine \(\textbf{X}\), unlike when one studies the sum \(\textbf{X}+\textbf{Y}\)), it is interesting to ask whether one can also retrieve \(\textbf{X}\). To stay compatible with the notion of double disjointness, we will assume that \({\textbf{H}}\,({\textbf{X}})>{\textbf{H}}\,({\textbf{Y}})=0\). Then, clearly, a necessary condition for a positive answer to Question 2 is that \({\textbf{H}}\,({\textbf{M}})={\textbf{H}}\,({\textbf{X}})\). Having this in mind, we will deal with the following three more specific problems:

Question 4

  1. (A)

    Is there a general formula for the entropy rate \(\textbf{H}(\textbf{M})\) of \(\textbf{M}=\textbf{X}\cdot \textbf{Y}\)?

  2. (B)

    Do we always have \(\textbf{H}(\textbf{M})>0\) whenever \(\textbf{H}(\textbf{X})>0\)?

  3. (C)

    Can we have \(\textbf{H}(\textbf{M})=\textbf{H}(\textbf{X})\) with \({\textbf{H}}\,({\textbf{X}})>0\)?

Remark 1.2

Notice that the answers to Question 1 in [17] and to Question 3 in [18] depend only on the properties of the underlying dynamical systems corresponding to \(\textbf{X}\) and \(\textbf{Y}\). In this paper the situation will be different and the ability to defilter \(\textbf{X}\) from \(\textbf{M}\) will depend heavily on the properties of the stochastic processes under consideration, cf. Example 2.16.

1.2 Invariant measures for \({\mathscr {B}}\)-free systems

Question 4 is a generalization of some questions asked in [28] in the context of \({\mathscr {B}}\)-free systems. For \({\mathscr {B}}\subset {\mathbb {N}}\setminus \{1\}\), consider the corresponding sets of multiples and \({\mathscr {B}}\)-free integers:

$$\begin{aligned} \mathcal {M}_{\mathscr {B}}:=\bigcup _{b\in {\mathscr {B}}}b{\mathbb {Z}}\text { and }\mathcal {F}_{\mathscr {B}}:={\mathbb {Z}}\setminus \mathcal {M}_{\mathscr {B}}. \end{aligned}$$

Such sets were studied already in the 1930s from the number-theoretic viewpoint (see, e.g., [2, 5, 7–9, 14]). The most prominent example of \(\mathcal {F}_{\mathscr {B}}\) is the set of square-free integers (with \({\mathscr {B}}\) being the set of squares of all primes). The dynamical approach to \({\mathscr {B}}\)-free sets was initiated by Sarnak [35] who proposed to study the dynamical system given by the orbit closures of the Möbius function \(\varvec{\mu }\) and its square \(\varvec{\mu }^2\) under the left shift S in \(\{-1,0,1\}^{\mathbb {Z}}\).Footnote 2 For an arbitrary \({\mathscr {B}}\subset {\mathbb {N}}\setminus \{1\}\), let \(X_\eta \) be the orbit closure of \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\in \{0,1\}^{\mathbb {Z}}\) under the left shift, i.e. we deal with a subshift of \((\{0,1\}^{\mathbb {Z}},S)\).Footnote 3 We say that \((X_\eta ,S)\) is a \({\mathscr {B}}\)-free system. In the so-called Erdös case (when the elements of \({\mathscr {B}}\) are pairwise coprime, \({\mathscr {B}}\) is infinite and \(\sum _{b\in {\mathscr {B}}}1/b<\infty \)), \(X_\eta \) is hereditary: for \(y\leqslant x\) coordinatewise, with \(x\in X_\eta \) and \(y\in \{0,1\}^{\mathbb {Z}}\), we have \(y\in X_\eta \). In other words, \(X_\eta =M(X_\eta \times \{0,1\}^{\mathbb {Z}})\), where M stands for the coordinatewise multiplication of sequences. For a general \({\mathscr {B}}\subset {\mathbb {N}}\setminus \{1\}\), \({X}_\eta \) may no longer be hereditary and we consider its hereditary closure \({\widetilde{X}}_\eta :=M(X_\eta \times \{0,1\}^{\mathbb {Z}})\) instead. Usually, one assumes at least the primitivity of \({\mathscr {B}}\) (i.e. \(b\nmid b'\) for \(b\ne b'\) in \({\mathscr {B}}\)).
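
For a quick illustration (our toy case; it is not an Erdös case since \({\mathscr {B}}\) is finite), take \({\mathscr {B}}=\{2,3\}\). Then

$$\begin{aligned} \mathcal {M}_{\mathscr {B}}=2{\mathbb {Z}}\cup 3{\mathbb {Z}}\quad \text {and}\quad \mathcal {F}_{\mathscr {B}}=\{n\in {\mathbb {Z}} : n\equiv \pm 1 \bmod 6\}, \end{aligned}$$

so \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\) is periodic of period 6 and \(X_\eta \) consists of the six shifts of \(\eta \); in particular, the support of \(\eta \) contains two-sided infinite arithmetic progressions such as \(1+6{\mathbb {Z}}\) (cf. Proposition 1.20 below).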

Given a topological dynamical system (X, T), i.e. a homeomorphism T acting on a compact metric space X, let \(\mathcal {B}\) be the \(\sigma \)-algebra of Borel subsets of X. By \(\mathcal {M}(X,T)\) we will denote the set of all Borel probability T-invariant measures on X and \(\mathcal {M}^e(X,T)\) will stand for the subset of ergodic measures. Each choice of \(\mu \in \mathcal {M}(X,T)\) results in a measure-theoretic dynamical system, i.e. a 4-tuple \((X,\mathcal {B},\mu ,T)\), where \((X,\mathcal {B},\mu )\) is a standard Borel probability space, with an automorphism T. We often skip \(\mathcal {B}\) and write \((X,\mu ,T)\). Recall also that \(x\in X\) is said to be generic for \(\mu \in \mathcal {M}(X,T)\) whenever \(\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{n\leqslant N}\delta _{T^nx}=\mu \) in the weak topology. If the convergence takes place only along a subsequence \(({N_k})_{k\geqslant 1}\) then we say that x is quasi-generic for \(\mu \).

A central role in the theory of \({\mathscr {B}}\)-free systems is played by the so-called Mirsky measure, denoted by \(\nu _\eta \). In the Erdös case, \(\eta \) is a generic point for \(\nu _\eta \) (in general, \(\eta \) is quasi-generic along some natural sequence \((N_k)\)), see [12]. It was shown in [12, 28] that all invariant measures for \({\widetilde{X}}_\eta \) are of the following special form:

Theorem 1.3

(cf. Sect. 1) For any \(\nu \in \mathcal {M}({\widetilde{X}}_\eta ,S)\), there exists \(\rho \in \mathcal {M}(X_\eta \times \{0,1\}^{\mathbb {Z}}, S\times S)\) such that \(\rho |_{X_\eta }=\nu _\eta \) and \(M_*(\rho )=\nu \).Footnote 4

Recall that given a measure-theoretic dynamical system \((X,\mathcal {B},\mu , T)\), any T-invariant sub-\(\sigma \)-algebra \(\mathcal {A}\subset \mathcal {B}\) is called a factor of \((X,\mathcal {B},\mu ,T)\).Footnote 5 Notice that given \(\nu \) and \(\rho \) as in Theorem 1.3, \(({\widetilde{X}}_\eta ,\nu ,S)\) is a factor of \((X_\eta \times \{0,1\}^{\mathbb {Z}},\rho ,S\times S)\).

The measure-theoretic entropy of \((X,\mathcal {B},\mu ,T)\) will be denoted by \(h_\mu (T,\mathcal {B})\). If no confusion arises, we will also write \(h(\mu ,T)\) or even \(h(\mu )\). If \(\textbf{X}\) is a finitely-valued stationary process determining \((X,\mu ,T)\) (as described in Sect. 1.1) then \({\textbf{H}}\,({\textbf{X}})=h(\mu )\).

The Mirsky measure \(\nu _\eta \) is of zero entropy. Moreover, it was shown in [28] in the Erdös case that \(({X}_\eta ,S)\) is intrinsically ergodic (it has exactly one measure realizing the topological entropy). Its measure of maximal entropy equals \(M_*(\nu _\eta \otimes B(1/2,1/2))\), where \(B(1/2,1/2)\) stands for the Bernoulli measure on \(\{0,1\}^{\mathbb {Z}}\) of entropy \(\log 2\). These results were extended in [12] to a general \({\mathscr {B}}\) (one needs to replace \(X_\eta \) with \({\widetilde{X}}_\eta \)). In the Erdös case, the topological entropy of \(({X}_\eta ,S)\) is equal to \(d(\mathcal {F}_{\mathscr {B}})\)Footnote 6 (in general, the topological entropy of \(({\widetilde{X}}_\eta ,S)\) equals \({\overline{d}}(\mathcal {F}_{\mathscr {B}})\) [12]).Footnote 7 This led to the study of product type measures (or multiplicative convolutions):

$$\begin{aligned} \nu _\eta *\kappa :=M_*(\nu _\eta \otimes \kappa ). \end{aligned}$$

In particular, it was proved there that the measure of maximal entropy is itself of this form. Moreover, it was shown that for each value \(h\) between zero and the topological entropy there is an ergodic measure \(\kappa \) satisfying \(h(X_\eta ,\nu _\eta *\kappa )=h\). However, some fundamental questions related to such measures were left open – they turn out to be a special instance of Question 4 (see Question 1 in [28]):

Question 5

  1. (A)

    Is there a general formula for the entropy \(h(\nu _\eta *\kappa )\) of \(\nu _\eta *\kappa \)?

  2. (B)

    Do we always have \(h(\nu _\eta *\kappa )>0\) whenever \(h(\kappa )>0\)?

  3. (C)

    Can we have \(h(\nu _\eta *\kappa )=h(\kappa )\) with \(h(\kappa )>0\)?

1.3 Main technical result

Our main tool used to answer Questions 4 and 5 is concerned with the entropy rate of stationary processes. Before we can formulate it, we need some definitions and notation that will be used throughout the whole paper.

All random variables and processes will be defined on a fixed probability space \(({\Omega , \mathcal {F},\mathbb {P}})\). Sometimes, we will replace the underlying probability measure \(\mathbb {P}\) by its conditioned version, \(\mathbb {P}_A(\cdot ) = \mathbb {P}(\cdot \cap A)/\mathbb {P}(A)\), where \(A\in \mathcal {F}\) with \(\mathbb {P}(A)>0\). In particular, \(\mathbb {E}_A\) will stand for the expectation taken with respect to \(\mathbb {P}_A\). For convenience’s sake, we will write AB instead of \(A\cap B\) for any \(A,B\in \mathcal {F}\): for example, \(\mathbb {E}_{A, B}\) stands for \(\mathbb {E}_{A\cap B}\). A central role will be played by the Shannon entropy of a random variable X, denoted by \({\textbf{H}}\,(X)\). Although we will recall basic definitions and properties related to \({\textbf{H}}\,(X)\), some well-known facts will be taken for granted (all of them can be found in [22]). All random processes will be bilateral and real. Usually, they will also be finitely-valued and stationary; however, sometimes we will need auxiliary countably-valued, non-stationary processes. Recall that a process \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) is stationary if \({({{X}_i})_{i\in {\mathbb {Z}}}}\) has the same distribution as \(\left( X_{i + 1}\right) _{i\in {\mathbb {Z}}}\), and finitely-valued if, for every \(i\in {\mathbb {Z}}\), \(X_i \in \mathcal {X}\) with \(\left| \mathcal {X}\right| < \infty \).

Now let X and Y be random variables taking values in finite state spaces \(\mathcal {X}\), \(\mathcal {Y}\), respectively, and fix \(A\in \mathcal {F}\) with \(\mathbb {P}(A) > 0\). We put \({\textbf{H}}_{A}(X) = -\sum _{x\in \mathcal {X}} \mathbb {P}_{A}(X=x)\log _{2}\mathbb {P}_{A}(X=x)\). Moreover, \({\textbf{H}}_{A}(X\,|\,Y)=\sum _{y\in \mathcal {Y}}\mathbb {P}_{A}(Y=y)\,{\textbf{H}}_{Y=y,A}(X)\) will stand for the conditional Shannon entropy of X with respect to Y. When \(\mathbb {P}(A) = 1\), we will omit the subscript A and write \({\textbf{H}}\,(X)\) and \({\textbf{H}}\,(X\,|\,Y)\), respectively.

To shorten the notation, we will use the following convention. For a subset \(A = \left\{ i_1, \ldots , i_n\right\} \subset {\mathbb {Z}}\) with \(i_1< i_2< \cdots < i_n\) and a process \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\), we will write

$$\begin{aligned} X_A = \left( X_{i_1}, X_{i_2},\ldots , X_{i_n}\right) . \end{aligned}$$

Moreover, for any \(k \leqslant \ell \) in \({\mathbb {Z}}\), we define integer intervals:

$$\begin{aligned}{}[k, \infty ) {:=} \left\{ k, k + 1, \ldots \right\} , \quad (-\infty , \ell ] {:=} \left\{ \ell , \ell -1, \ldots \right\} , \quad [k,\ell ] {:=} \left\{ k, k + 1, \ldots , \ell \right\} . \end{aligned}$$

For example, \(X_{[0, n]} = \left( X_0, \ldots , X_n\right) \) for \(n\in {\mathbb {N}}\). It is natural and convenient to interpret \([k, \ell ]\) as \(\varnothing \) if \(\ell < k\), and to set \({\textbf{H}}\,(X_{\varnothing }) = 0\) and \({\textbf{H}}\,(X\,|\,Y_{\varnothing }) = {\textbf{H}}\,(X)\).

Consider now two random processes \(\textbf{X}= {({{X}_i})_{i\in {\mathbb {Z}}}}\) and \(\textbf{Y} = {({{Y}_i})_{i\in {\mathbb {Z}}}}\) such that \((\textbf{X}, \textbf{Y}) := \left( (X_i, Y_i)\right) _{i\in {\mathbb {Z}}}\) is stationary. Then

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}}) = \lim \limits _{n\rightarrow \infty }\frac{1}{n}{\textbf{H}}(X_{[0, n - 1]}), \quad {\textbf{H}}\,(\textbf{X}\,|\,\textbf{Y}) = \lim \limits _{n \rightarrow \infty } \frac{1}{n} {\textbf{H}}\,(X_{[0, n - 1]}\,|\,Y_{[0, n - 1]})\qquad \end{aligned}$$
(1.2)

will denote, respectively, the entropy rate of \(\textbf{X}\) and the relative entropy rate of \(\textbf{X}= {({{X}_i})_{i\in {\mathbb {Z}}}}\) with respect to \(\textbf{Y}={({{Y}_i})_{i\in {\mathbb {Z}}}}\). By the stationarity of \(\textbf{X}\), \({\textbf{H}}\,({\textbf{X}}) = \lim \nolimits _{n\rightarrow \infty }{\textbf{H}}\,(X_0\,|\,X_{[-n, -1]})\). Note that both limits in (1.2) exist due to the subadditivity of appropriate sequences.
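
As a quick sanity check of (1.2) (ours): if \(\textbf{X}\) is i.i.d. then \({\textbf{H}}(X_{[0, n - 1]})=n\,{\textbf{H}}\,(X_0)\), whence

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}})=\lim _{n\rightarrow \infty }\frac{1}{n}\,n\,{\textbf{H}}\,(X_0)={\textbf{H}}\,(X_0). \end{aligned}$$

At the other extreme, if \(X_i=X_0\) for all \(i\in {\mathbb {Z}}\) then \({\textbf{H}}(X_{[0, n - 1]})={\textbf{H}}\,(X_0)\) for every n and \({\textbf{H}}\,({\textbf{X}})=0\).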

Remark 1.4

Sometimes it is convenient to extend the classical definition of the conditional entropy, \({\textbf{H}}\,(X\,\,|\,\,Y)\), to \({\textbf{H}}\,(X\,|\,\mathcal {G})\), where X is a finitely-valued random variable and \(\mathcal {G} \subset \mathcal {F}\) is a sub-\(\sigma \)-algebra (see [20], Chapter 14, for a precise construction and proofs). This extension is justified by the following facts. If \(\mathcal {G} =\sigma (Y)\) then \({\textbf{H}}\,(X\,|\,\sigma (Y)) = {\textbf{H}}\,(X\,\,|\,\,Y)\) for any random variable Y.Footnote 8 If \(\mathcal {H}\subset \mathcal {G}\subset \mathcal {F}\) are sub-\(\sigma \)-algebras then \({\textbf{H}}\,(X\,|\,\mathcal {G}) \leqslant {\textbf{H}}\,(X\,|\,\mathcal {H})\). Moreover, if \(\mathcal {G}_n \searrow \mathcal {G}\) or \(\mathcal {G}_n \nearrow \mathcal {G}\) then \({\textbf{H}}\,(X\,|\,\mathcal {G}_n) \nearrow {\textbf{H}}\,(X\,|\,\mathcal {G})\) or \({\textbf{H}}\,(X\,|\,\mathcal {G}_n) \searrow {\textbf{H}}\,(X\,|\,\mathcal {G})\), respectively. Thus, for example, it makes sense to write \({\textbf{H}}\,(\textbf{X}) = {\textbf{H}}\,(X_0\,|\,X_{(-\infty , - 1]}) = \lim \nolimits _{n\rightarrow \infty } {\textbf{H}}\,(X_0\,|\,X_{(-n, - 1]})\). The chain rule is still valid, namely, if X and Y are finitely-valued then

$$\begin{aligned} {\textbf{H}}\,(X, Y\,|\,\mathcal {G}) = {\textbf{H}}\,(X\,|\,\mathcal {G}) + {\textbf{H}}\,(Y\,|\,\sigma (\mathcal {G}, \sigma (X))). \end{aligned}$$
(1.3)

Furthermore, \({\textbf{H}}\,(X\,|\,\mathcal {G}) = 0\) if and only if X is \(\mathcal {G}\)-measurable and \({\textbf{H}}\,(X\,|\,\mathcal {G}) = {\textbf{H}}\,(X)\) if and only if X is independent of \(\mathcal {G}\).

Remark 1.5

We will often omit some technicalities concerning events of zero probability. First, we tacitly assume that \(\mathcal {F}\) is complete (i.e. all subevents of zero-measure events are measurable). Secondly, when considering sub-\(\sigma \)-fields associated with random processes, we think of them as measure-\(\sigma \)-algebras (intuitively, we look at them “up to events of probability zero”). Given sub-\(\sigma \)-fields \(\mathcal {G},\mathcal {H}\subset \mathcal {F}\), sometimes we will write

$$\begin{aligned} \mathcal {G}\,\,{\mathop {\subset }\limits ^{\mathbb {P}}}\,\,\mathcal {H} \end{aligned}$$

to stress that for every \(G \in \mathcal {G}\) there is \(H\in \mathcal {H}\) such that \(\mathbb {P} (G\triangle H) = 0\) but not necessarily \(\mathcal {G}\subset \mathcal {H}\) (with obvious modifications for \({\mathop {\supset }\limits ^{\mathbb {P}}}\) and \({\mathop {=}\limits ^{\mathbb {P}}}\)). However, in most cases, we will skip such considerations, cf. the last sentence of the previous remark.

Given processes \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) and \(\textbf{Y} = {({{Y}_i})_{i\in {\mathbb {Z}}}}\), we will be interested in the entropy rate \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y})\) of their product \(\textbf{X}\cdot \textbf{Y} = (X_i \cdot Y_i)_{i\in {\mathbb {Z}}}\). Our standing assumptions (unless stated otherwise) will be that:

  1. (i)

    \(\textbf{X}\) is finitely-valued, \(\textbf{Y}\) is binary (\(Y_i\in \{0,1\}\) for \(i\in {\mathbb {Z}}\)) and \(\mathbb {P}(Y_0=1)>0\),

  2. (ii)

    \(\textbf{X}\amalg \textbf{Y}\), i.e. \(\textbf{X}\) and \(\textbf{Y}\) are independent.

Notice that, by the independence of \(\textbf{X}\) and \(\textbf{Y}\), the process \((\textbf{X}, \textbf{Y})\) is stationary. Moreover, \(\textbf{X}\cdot \textbf{Y}\) is a factor of \((\textbf{X}, \textbf{Y})\).Footnote 9 The quantity \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,{\textbf{Y}})\) turns out to be easier to deal with than \({\textbf{H}}(\textbf{X}\cdot \textbf{Y})\). A particular emphasis will be put on the case when \({\textbf{H}}\,({\textbf{Y}})=0\), in which \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\textbf{H}}\,(\textbf{X}\cdot \textbf{Y})\)Footnote 10 and \({\textbf{H}}\,({\textbf{X}}\cdot {\textbf{Y}}) \leqslant {\textbf{H}}\,({\textbf{X}})\).Footnote 11

Let \(\textbf{R}=\textbf{R}(\textbf{Y})={({{R}_i})_{i\in {\mathbb {Z}}}}\) be the return process, i.e. the process of consecutive arrival times of \(\textbf{Y}\) to 1:

$$\begin{aligned} R_i={\left\{ \begin{array}{ll} \inf \{j \geqslant 0 : Y_j = 1\}, &{} i=0, \\ \inf \{j> R_{i - 1} : Y_j = 1\}, &{} i\geqslant 1, \\ \sup \{j < R_{i + 1} : Y_j = 1\}, &{} i\leqslant -1. \end{array}\right. } \end{aligned}$$
(1.4)

Note that, in general, \(\textbf{R}\) can be countably-valued. If \(\textbf{Y}\) is ergodic then it visits 1 infinitely often, both in the future and in the past, and thus \(\textbf{R}\) is well-defined almost everywhere. However, we do not need to assume the ergodicity of \(\textbf{Y}\) to be able to speak of \(\textbf{R}\); we will just assume that:

  1. (iii)

    \(\textbf{Y}\) is such that the definition of \(\textbf{R}\) makes sense.

Whenever (i), (ii) and (iii) hold, we will say that the pair \((\textbf{X},\textbf{Y})\) is good. If \(\textbf{Y}\) is binary, with \(\mathbb {P}(Y_0=1)>0\) and such that (iii) holds, we will say that \(\textbf{Y}\) is good.
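
To illustrate (1.4) and the notion of goodness, consider the following toy example (ours): let \(Y_j=1\) if and only if \(3\,|\,j+\Phi \), where \(\Phi \) is uniformly distributed on \(\{0,1,2\}\). Then \(\textbf{Y}\) is good (it is stationary and ergodic, with \(\mathbb {P}(Y_0=1)=1/3\) and \({\textbf{H}}\,({\textbf{Y}})=0\)) and, on the event \(Y_0=1\),

$$\begin{aligned} R_i=3i\quad \text {for all } i\in {\mathbb {Z}}, \end{aligned}$$

i.e. the corresponding realizations of the return process run through the arithmetic progression \(3{\mathbb {Z}}\). We will refer to this period-3 example a few more times below.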

Remark 1.6

We will use lowercase letters to denote realizations of the corresponding random processes (denoted by uppercase letters). Recall that \(\textbf{x}={({{x}_i})_{i\in {\mathbb {Z}}}}\) is a realization of \(\textbf{X} ={({{X}_i})_{i\in {\mathbb {Z}}}}\) if there exists \(\omega \in \Omega \) such that \(x_i = X_i(\omega )\) for all \(i\in {\mathbb {Z}}\). Moreover, we will tacitly assume that \(\omega \) belongs to some “good” subset of \(\Omega \) of probability 1. For example, for \(\textbf{R}\), our standing assumption will be that \(\omega \) realizing \(\textbf{r}\) belongs to the set where \(\textbf{Y}\) visits 1 infinitely often in both directions. In general, if some property of a process \(\textbf{X}\) has probability 1, then realization \(\textbf{x}\) inherits it. For example, if we consider \(\textbf{Y}\) under \(\mathbb {P}_{Y_0 = 1}\) then every realization \(\textbf{y}\) will satisfy \(y_0 = 1\).

The main technical result contains entropy formulas for good processes.

Theorem 1.7

(answer to Question 4(A)) Let \(\textbf{X}=(X_{n})_{n\in {\mathbb {Z}}}\), \(\textbf{Y}=(Y_{n})_{n\in {\mathbb {Z}}}\) be a pair of mutually independent stationary processes, where \(\textbf{X}\) is finitely valued and \(\textbf{Y}\) is binary and such that \(\mathbb {P}(Y_{0}=1)>0\). Assume also that \(\textbf{Y}\) is such that the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense. Then

  1. (A)

    \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\mathbb {P}}(Y_0=1)\,{\mathbb {E}}_{Y_0=1}\,{\textbf{H}}\,(X_0\,|\,X_{\{r_{-1},r_{-2},\ldots \}})|_{r_{-i}= R_{-i}}.\)

If additionally \(\textbf{Y}\) is ergodic then

  1. (B)

    \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\textbf{H}}(\textbf{X})-{\mathbb {P}}(Y_0=1)\,{\mathbb {E}}_{Y_0=1}\,{\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]}, X_{\{r_1,r_2,\ldots \}})|_{r_i=R_i}\).

Remark 1.8

The above expectations are to be understood in the following way:

  • we compute \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \cdots \}})\) or \({\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]}, X_{\{r_1,r_2,\ldots \}})\) for all realizations \(\textbf{r} = {({{r}_i})_{i\in {\mathbb {Z}}}}\) thus obtaining a function \(f(\textbf{r})\) depending on \(\textbf{r}\);

  • we find \(\mathbb {E}_{Y_0 = 1} f(\textbf{R})\).
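
As a sanity check of formula (A) (ours), suppose additionally that \(\textbf{X}\) is i.i.d. Then \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})={\textbf{H}}\,(X_0)\) for every realization \(\textbf{r}\), so formula (A) reduces to

$$\begin{aligned} {\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,|\,\textbf{Y})={\mathbb {P}}(Y_0=1)\,{\textbf{H}}\,(X_0), \end{aligned}$$

in accordance with the intuition that an i.i.d. signal delivers fresh information exactly at those times at which it is not erased (cf. the upper bound in Corollary 1.9 below).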

1.4 Consequences of the main technical result

Clearly, Theorem 1.7 gives an answer to Questions 4(A) and 5(A). We will now explain how it is related to Questions 4(B), 5(B), 4(C) and 5(C). The details and longer proofs are included in Sect. 2.3.

1.4.1 Answer to Questions 4(B) and 5(B)

Notice first that

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}})\leqslant {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \cdots \}})\leqslant {{\textbf{H}}\,({X}_0)} \end{aligned}$$

for each choice of negative integers \(\dots<r_{-2}<r_{-1}<0\). Therefore, by Theorem 1.7 (A), we obtain immediately the following:

Corollary 1.9

(positive answer to Question 4(B)) Suppose that \((\textbf{X},\textbf{Y})\) is good, i.e. \(\textbf{X}=(X_{n})_{n\in {\mathbb {Z}}}\) and \(\textbf{Y}=(Y_{n})_{n\in {\mathbb {Z}}}\) form a pair of mutually independent stationary processes, where \(\textbf{X}\) is finitely valued and \(\textbf{Y}\) is binary, such that \(\mathbb {P}(Y_{0}=1)>0\) and the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense. Assume additionally that \({\textbf{H}}\,({\textbf{Y}})=0\). Then

$$\begin{aligned} \mathbb {P}\,({Y_0=1})\,{\textbf{H}}\,({\textbf{X}})\leqslant {\textbf{H}}\,({\textbf{M}})\leqslant \mathbb {P}\,({Y_0=1})\,{\textbf{H}}\,(X_0). \end{aligned}$$

In particular,

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}})>0\,\, \textrm{whenever} \,\,{\textbf{H}}\,({\textbf{X}})>0. \end{aligned}$$
(1.5)

Remark 1.10

The lower bound in Corollary 1.9 is attained for exchangeable processes (see Proposition 2.5), whereas the upper bound is attained for i.i.d. processes. If \(\textbf{X}\) is a Markov chain (which is not i.i.d.), both inequalities are strict, see Sect. 2.1.2.

Remark 1.11

(positive answer to Question 5(B)) Implication (1.5) means, in particular, that the answer to Question 5(B) is positive whenever \(\nu _\eta \ne \delta _{(\ldots 0,0,0\ldots )}\). In Sect. 1 we present an alternative ergodic-theoretic approach to this problem. The proof presented therein is much shorter; on the other hand, it addresses Question 5(B) directly, without providing any explicit formulas.

Remark 1.12

If one drops the assumption that \(\textbf{X} \amalg \textbf{Y}\) then the situation changes completely and one can get \({\textbf{H}}\,({\textbf{M}}) = 0\) (with \({\textbf{H}}\,({\textbf{X}})>0\) and \(\,{\mathbb {P}}\,(Y_0=1)>0\)). To see how far this can go, consider

$$\begin{aligned} \textbf{X} = \textbf{Z}\cdot \textbf{W} \text { and }\textbf{Y} = \mathbf {1 - W} = ({1 - W_i})_{i \in {\mathbb {Z}}}, \end{aligned}$$

where

$$\begin{aligned} \textbf{Z} \amalg \textbf{W}, {\textbf{H}}\,({\textbf{W}}) = 0\text { and }\mathbb {P}(W_0=0)\cdot \mathbb {P}(W_0=1)>0. \end{aligned}$$

Then \(\textbf{M}\) is a trivial zero process, in particular, we have \({\textbf{H}}\,({\textbf{M}}) = 0\). On the other hand, by Corollary 1.9, \({\textbf{H}}\,(\textbf{X}) = {\textbf{H}}\,(\textbf{Z}\cdot \textbf{W}) > 0 = {\textbf{H}}\,(\textbf{W}) = {\textbf{H}}\,({\textbf{Y}})\). Cf. also Sect. 1 for more examples of ergodic-theoretic flavour.

1.4.2 Answer to Questions 4(C) and 5(C)

Answers to Questions 4(C) and 5(C) are more complex and they are related to the notion of a bilaterally deterministic process.

Definition 1.13

We say that a stationary process \(\textbf{Z} = \left( Z_i\right) _{i \in {\mathbb {Z}}}\) is bilaterally deterministic if, for all \(k\in {\mathbb {N}}\),

$$\begin{aligned} {\textbf{H}}\,(Z_{[0,k]}\,|\,Z_{(-\infty , -1]},Z_{[k+1, \infty )}) = 0. \end{aligned}$$

Remark 1.14

The notion of a bilaterally deterministic process was introduced by Ornstein and Weiss [32], in terms of the following (double) tail sigma-algebra:

$$\begin{aligned} {{\mathcal {T}}_d} := \bigcap _{n\geqslant 1} \sigma \left( Z_{(-\infty , -n]}, Z_{[n, \infty )}\right) . \end{aligned}$$

Notice that the following conditions are equivalent:

  • \(\textbf{Z}\) is bilaterally deterministic,

  • \(Z_{[-k,k]}\in {{\mathcal {T}}_d} \text { for each }k\geqslant 1\),

  • \(\sigma (\textbf{Z})={{\mathcal {T}}_d}\).

Indeed, e.g., if \(\textbf{Z}\) is bilaterally deterministic then \({\textbf{H}}\,(Z_{[0,k]}\,|\,Z_{(-\infty , -\ell ]},Z_{[k+1 + m, \infty )})=0\) for any \(k,\ell ,m\in {\mathbb {N}}\) and by taking \(\ell ,m\rightarrow \infty \), we easily obtain \(Z_{[-k,k]}\in {{\mathcal {T}}_d} \text { for each }k\geqslant 1\). Cf. also Remark 1.4. Informally, “given the far past and the distant future, the present can be reconstructed” [32].

Remark 1.15

Given a stationary finitely-valued process \(\textbf{Z}\), let

$$\begin{aligned} {{\mathcal {T}}_p} := \bigcap _{n\geqslant 1} \sigma \left( Z_{(-\infty , -n]}\right) , \qquad {{\mathcal {T}}_f} := \bigcap _{n\geqslant 1} \sigma \left( Z_{[n, \infty )}\right) \end{aligned}$$

denote, respectively, the tail \(\sigma \)-algebra corresponding to the past and to the future. By a celebrated result of Pinsker [34], \({{\mathcal {T}}_{p}}\,\,{\mathop {=}\limits ^{\mathbb {P}}}\,\,{{\mathcal {T}}_{f}}\,\,{\mathop {=}\limits ^{\mathbb {P}}}\,\,\Pi \), where \(\Pi \) denotes the Pinsker \(\sigma \)-algebra (i.e., the largest zero entropy sub-\(\sigma \)-algebra). Thus, the following conditions are equivalent (cf. Remark 1.14):

  • \({\textbf{H}}\,(\textbf{Z})=0\),

  • \(Z_{[-k,k]}\in {{\mathcal {T}}_p}\) for each \(k\geqslant 1\),

  • \(\sigma (\textbf{Z})={{\mathcal {T}}_p}\).

A direct consequence of Remark 1.14 and Remark 1.15 is the following observation:

Corollary 1.16

Suppose that \({\textbf{H}}\,(\textbf{Z})>0\). Then \(\textbf{Z}\) is not bilaterally deterministic whenever \({{\mathcal {T}}_d}={{\mathcal {T}}_p}\). In particular, this happens if \({{\mathcal {T}}_d}\) is trivial.

Notice that from this point of view, stationary processes can be split into three pairwise disjoint classes:

  1. (a)

    of zero entropy rate (they are automatically bilaterally deterministic),

  2. (b)

    of positive entropy rate that are bilaterally deterministic,

  3. (c)

    of positive entropy rate but not bilaterally deterministic.

Class (c) includes the following positive entropy rate processes:

  • exchangeable processes,

  • Markov chains,

  • weakly Bernoulli processes (here \({{\mathcal {T}}_d}\) is trivial),

for more details, see Sect. 2.1. Theorem 1.7 allows us to “compare” a large subclass of processes from class (a) with processes from class (c), see Corollaries 1.17 and 1.19 below. In particular, the zero entropy class that we have in mind contains all \({\mathscr {B}}\)-free systems (considered with the Mirsky measure), cf. Proposition 1.20 and Corollary 1.21. We leave it as an open problem to find answers to the analogous questions on the relations between classes (a) and (b).

Notice that

$$\begin{aligned} \mathbb {E}_{Y_0=1}{\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]},X_{\{r_1,r_2,\dots \}})_{|r_i = R_i}&\geqslant \mathbb {E}_{Y_0=1}{\textbf{H}}\,(X_{[1,r_1-1]}\,|\,X_{(-\infty ,0]},X_{[r_1,\infty )})_{|r_1 = R_1}\\&=\sum _{k\geqslant 1} \mathbb {P}_{Y_0=1}(R_1=k+1)\,{\textbf{H}}\left( X_{[1,k]}\,|\, X_{(-\infty ,0]},X_{[k+1,\infty )}\right) . \end{aligned}$$

Moreover, if \(\textbf{X}\) fails to be bilaterally deterministic, then, for all k sufficiently large, we have

$$\begin{aligned} {\textbf{H}}\,(X_{[1,k]}\,|\,X_{(-\infty ,0]},X_{[k+1,\infty )})>0. \end{aligned}$$
(1.6)

Thus, using Theorem 1.7(B), we obtain immediately the following:

Corollary 1.17

(answer to Question 4(C)) Suppose that \((\textbf{X},\textbf{Y})\) is good and \(\textbf{Y}\) is ergodic of zero entropy rate (i.e. \(\textbf{X}\) and \(\textbf{Y}\) form a pair of mutually independent stationary processes, \(\textbf{X}\) is finitely-valued, \(\textbf{Y}\) is binary and ergodic, with \(\textbf{H}(\textbf{Y})=0\)). If additionally

$$\begin{aligned} \mathbb {P}(R_1=k)>0\ \text { for infinitely many } k\in {\mathbb {N}}\end{aligned}$$
(1.7)

and \(\textbf{X}\) is not bilaterally deterministic then \({\textbf{H}}\,({\textbf{M}}) < {\textbf{H}}\,({\textbf{X}})\).

Remark 1.18

In fact, if we know more about \(\textbf{X}\) than just (1.6), then the assumption that \(\mathbb {P}(R_1 = k) > 0\) for infinitely many \(k\in {\mathbb {N}}\) can be relaxed and we can still have \({\textbf{H}}\,({\textbf{M}}) < {\textbf{H}}\,({\textbf{X}})\). For example, if \(\textbf{X}\) is Bernoulli then we will always have \({\textbf{H}}\,({\textbf{M}})<{\textbf{H}}\,({\textbf{X}})\) whenever \((\textbf{X},\textbf{Y})\) is good (i.e. \(\textbf{X}\) and \(\textbf{Y}\) form a pair of mutually independent stationary processes, \(\textbf{X}\) is finitely-valued, \(\textbf{Y}\) is binary and such that the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense) and \(\textbf{Y}\) is of zero entropy rate.

A natural question arises: what happens when (1.7) fails to hold? Suppose that our processes are of dynamical origin and the underlying dynamical system is a transitive symbolic dynamical system. Namely, take \(\textbf{w}\in \{0,1\}^{\mathbb {Z}}\) whose support is unbounded both from below and from above, and suppose that \(\textbf{w}\) is quasi-generic along some subsequence for an invariant zero entropy measure \(\nu \). Let Y be the orbit closure of \(\textbf{w}\) under the left shift S and let \(\textbf{Y}\sim \nu \) be the corresponding stationary process. Clearly,

$$\begin{aligned} (1.7) \implies \text { the support of } \textbf{w} \text { does not contain a two-sided infinite arithmetic progression}. \end{aligned}$$

It turns out that if we assume that the support of \(\textbf{w}\) does contain a two-sided (infinite) arithmetic progression then one can obtain a complementary result to Corollary 1.17:

Corollary 1.19

Let \(\textbf{Y}\) be a good, ergodic process (i.e. \(\textbf{Y}\) is a stationary binary ergodic process, with \(\mathbb {P}(Y_0=1)>0\)) of zero entropy rate. Assume additionally that there exists \(L\geqslant 1\) such that for a.e. realization \(\textbf{y}\), the corresponding return time sequence \(\textbf{r}\) contains an arithmetic progression of difference L. Then there exists a stationary binary process \(\textbf{X}\) that is not bilaterally deterministic and such that \({\textbf{H}}\,({\textbf{M}})={\textbf{H}}\,({\textbf{X}})\).

Let us turn now to the interpretation of Corollaries 1.17 and 1.19 from the point of view of \({\mathscr {B}}\)-free systems. Recall that a topological dynamical system (X, T) is said to be proximal whenever for any \(x,y\in X\) there exists \(n_{k}\rightarrow \infty \) such that \(d(T^{n_{k}}x,T^{n_{k}}y)\rightarrow 0\). It turns out that in the \({\mathscr {B}}\)-free setting we have the following dichotomy:

Proposition 1.20

Let \({\mathscr {B}}\subset \mathbb {N}\) and let \(\eta \) be the characteristic function of the corresponding \({\mathscr {B}}\)-free set. Then exactly one of the following holds:

  • \((X_\eta ,S)\) is proximal and then, for infinitely many \(k\geqslant 1\), the block \(10\ldots 01\) (with \(k\) zeros between the 1’s) has positive Mirsky measure \(\nu _\eta \),

  • \((X_\eta ,S)\) is not proximal and then \(\eta \) contains a two-sided infinite arithmetic progression.

As a direct consequence of Corollaries 1.17 and 1.19 and Proposition 1.20, for \({\mathscr {B}}\)-free systems we have the following result:

Corollary 1.21

Let \({\mathscr {B}}\subset \mathbb {N}\). Then \((X_\eta ,S)\) is proximal if and only if for any \(\textbf{X}\) that is not bilaterally deterministic, such that \(\textbf{X}\amalg \textbf{Y}\), we have \({\textbf{H}}\,({\textbf{M}})<{\textbf{H}}\,({\textbf{X}})\).

Finally, let us remark that \(\textbf{X}\) in Corollary 1.19 can be chosen to be very weakly Bernoulli (i.e., as a dynamical system, isomorphic to a Bernoulli process [31]); compare Example 2.16 below and Remark 1.18. That is, for \(\textbf{Y}\) as in Corollary 1.19, we can find a measure-theoretic dynamical system \((X,\mathcal {B},\mu ,T)\) with two stochastic representations \(\textbf{X}\) and \(\textbf{X}'\) (both not bilaterally deterministic!) such that \({\textbf{H}}\,(\textbf{X}\cdot \textbf{Y})<{\textbf{H}}\,({\textbf{X}})={\textbf{H}}\,(\textbf{X}')={\textbf{H}}\,(\textbf{X}'\cdot \textbf{Y})\). Moreover, in some cases \(\textbf{X}'\) can be retrieved from \(\textbf{X}'\cdot \textbf{Y}\). This matches well the fact that the notion of a bilaterally deterministic process is not stable under taking various process representations of a given dynamical system [32]. It makes the situation completely different from the one in [18], where the results are purely ergodic-theoretic.

1.5 Dictionary between ergodic theory and probability theory

In our paper, both ergodic-theoretic and stochastic questions and tools are often intertwined. Let us now give some samples of ergodic-theory results translated into the language of stochastic processes. Our basic object is an ergodic-theoretic dynamical system \((\mathcal {X}^{\mathbb {Z}}, \mu , S)\), where S stands, as usual, for the left shift, together with a subset \(A \subset \mathcal {X}^{\mathbb {Z}}\) satisfying \(\mu (A) > 0\). Recall that for \(x\in A\), the first return time \(n_A\) is defined as \(n_A(x) = \inf \left\{ n\geqslant 1\;|\; S^n x \in A\right\} \) and the corresponding induced transformation as \(S_A(x) = S^{n_A(x)}(x)\), with the corresponding conditional measure \(\mu _A:=\mu (\,\cdot \,\cap A)/\mu (A)\) being invariant under \(S_A\).

Fix now a stationary process \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) on \((\Omega ,\mathcal {F},\mathbb {P})\), with distribution \(\mu \), i.e. \(\textbf{X}\sim \mu \). This is a stochastic counterpart of \((\mathcal {X}^{\mathbb {Z}},\mu ,S)\), cf. also Sect. 1.1. The left shift S naturally acts on processes by \(S \textbf{X} = \left( X_{i + 1}\right) _{i \in {\mathbb {Z}}}\). In particular,

$$\begin{aligned} S_*(\mu ) = \mu \text { precisely if }S\textbf{X} \sim \textbf{X}. \end{aligned}$$

Similarly, \(\mu _A\) corresponds to the distribution of \(\textbf{X}\) under \(\mathbb {P}_{\textbf{X} \in A}\). To see how one should interpret \(S_A\) in terms of stochastic processes, let \(R_{A} = \inf \left\{ n \geqslant 1\;|\; S^n\textbf{X} \in A\right\} \) be the first return time, defined on \(\textbf{X}\in A\), cf. (1.4). Now, we set \(S_A \textbf{X} = \left( X_{i + R_A}\right) _{i \in {\mathbb {Z}}}\) and one can easily check that

$$\begin{aligned} S_A \mu _A = \mu _A \text { precisely if }S_A \textbf{X} \sim \textbf{X} \text { under }\mathbb {P}_{\textbf{X} \in A}. \end{aligned}$$

Finally, recall that \(h(\mu )={\textbf{H}}\,({\textbf{X}})\).
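
As a quick instance of this dictionary (our example): for a binary process \(\textbf{Y}\sim \mu \), take \(A=[1]:=\{\textbf{y} : y_0=1\}\). Then \(R_A=R_1\) is the first return time to the state 1 and Kac’s lemma (see the summary below) reads

$$\begin{aligned} {\mathbb {P}}\,(Y_0=1)\,\mathbb {E}_{Y_0=1}R_1=1; \end{aligned}$$

e.g., for the period-3 example from Sect. 1.3 we get \(\frac{1}{3}\cdot 3=1\).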

Let us present a summary of some classical ergodic theorems (formulated for \((\mathcal {X}^{\mathbb {Z}},\mu ,S)\)), with their counterparts for random processes.

 

  • Ergodicity of \(\mu \).\(^{1}\) Ergodic: \(\frac{1}{n}\sum _{i=0}^{n-1}S^i f \rightarrow \int f\, d\mu \); probabilistic: \(\frac{1}{n}\sum _{i=0}^{n-1}f(S^{i}\textbf{X}) \rightarrow \mathbb {E}f(\textbf{X})\).

  • Poincaré recurrence. Ergodic: \(\mu _{A}\left( \{x : {S^{k}}x \in A \text { i.o.}\}\right) = 1\); probabilistic: \({{\mathbb {P}}_{\textbf{X} \in A}}(S^k\textbf{X} \in A\text { i.o.}) = 1\).

  • Kac’s Lemma. Ergodic: \(\int _A n_A\, d\mu =1\); probabilistic: \({\mathbb {P}}(\textbf{X} \in A)\, \mathbb {E}_{\textbf{X} \in A} R_A = 1\).

  • Invariance of \(\mu _A\). Ergodic: \(S_A\mu _A =\mu _{A}\); probabilistic: \(S_A\textbf{X} \sim \textbf{X}\) under \(\mathbb {P}_{\textbf{X} \in A}\).

  • Ergodicity of \(\mu _A\). Ergodic: \(\frac{1}{n}\sum _{i=0}^{n-1}S_A^i f \rightarrow \int f\, d\mu _A\); probabilistic: \(\frac{1}{n}\sum _{i=0}^{n-1}f(S_A^{i}\textbf{X}) \rightarrow \mathbb {E}_{\textbf{X}\in A} f(\textbf{X})\).

  • Maker’s ET.\(^{2}\) Ergodic: \(\frac{1}{n}\sum _{i=0}^{n-1}S^{i}f_{n - i} \rightarrow \int f\, d\mu \); probabilistic: \(\frac{1}{n}\sum _{i=0}^{n-1}f_{n - i}(S^{i}\textbf{X}) \rightarrow \mathbb {E}f(\textbf{X})\).

  1. \(^{1}\)Here, in fact, we state Birkhoff’s ergodic theorem under the assumption that \(\mu \) is ergodic
  2. \(^{2}\)ET stands for “ergodic theorem”

We owe the reader a word of explanation concerning the abbreviations in the summary above. The convergence of ergodic averages is always meant a.e./a.s. with respect to the appropriate underlying measure (\(\mu \) or \(\mu _A\) / \(\mathbb {P}\) or \(\mathbb {P}_{\textbf{X} \in A}\)). Also, we tacitly assume that all required assumptions are satisfied, e.g. the functions appearing in ergodic averages are integrable with respect to the underlying measure. Finally, let us give some details concerning Maker’s ergodic theorem [29], which will play a central role in the proof of Theorem 1.7 (A). We recall it now (in the ergodic-theoretic language, i.e. as in [25], under the extra assumption that T is ergodic).

Theorem 1.22

(Maker’s ergodic theorem) Let \((X,\mu ,T)\) be an ergodic measure-theoretic dynamical system. Let \(f\in L_1(\mu )\) and \(f_n\rightarrow f\) \(\mu \)-a.e. Suppose that \(\sup _n |f_n|\in L_1(\mu )\). Then

$$\begin{aligned} \frac{1}{n}\sum _{i=0}^{n-1}T^if_{n-i} \rightarrow \mathbb {E}_\mu f \text { a.e.} \end{aligned}$$

Let us now return to our general setting, with standing assumptions (i) and (ii) on \(\textbf{X}\) and \(\textbf{Y}\). Consider the inter-arrival process \(\textbf{T} = {({{T}_i})_{i\in {\mathbb {Z}}}}\), where

$$\begin{aligned} T_i = R_i - R_{i - 1} \end{aligned}$$
(1.8)

and the return process \(\textbf{R}\) is as in (1.4). Thus, \(T_i\) tells us how much time elapses between the \((i-1)\)-st and the \(i\)-th visit of \(\textbf{Y}\) to the state 1.
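
In the period-3 example from Sect. 1.3 (ours) we simply have \(T_i\equiv 3\). In general, under \(\mathbb {P}_{Y_0=1}\) we have \(R_0=0\), so \(T_1=R_1\) and Kac’s lemma takes the form

$$\begin{aligned} \mathbb {E}_{Y_0=1}T_1=\frac{1}{{\mathbb {P}}\,(Y_0=1)}. \end{aligned}$$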

Remark 1.23

(Factor of a random process) Recall that whenever \(\textbf{Y}\) is ergodic, the return process \(\textbf{R}\), and thus also \(\textbf{T}\), is well-defined. Moreover, \(\textbf{T}\) can be regarded as a factor of \(\textbf{Y}\) in the ergodic-theoretic sense. More precisely, by the very definition of \(\textbf{T}\), there is a natural measurable function \(\pi :\left( \{0, 1\}^{\mathbb {Z}}, S_{[1]}, \mathcal {L}(\textbf{Y}\;|\; \mathbb {P}_{Y_0 = 1})\right) \rightarrow \left( {\mathbb {Z}}^{\mathbb {Z}}, S, \mathcal {L}(\textbf{T}\;|\; \mathbb {P}_{Y_0 = 1})\right) \) such that \(\pi (\textbf{Y}) = \textbf{T}\) almost surely, where \(\mathcal {L}(\cdot \;|\; \cdot )\) stands for the “distribution of \(\cdot \) under \(\cdot \)”, \([1] = \{\textbf{y}\; |\; y_0 = 1\}\) and \(S_{[1]}\) is the corresponding induced shift operator (cf. the beginning of this section). Clearly, \(\pi S_{[1]} = S \pi \). In particular, since \(\textbf{Y} \sim S_{[1]}\textbf{Y}\) under \(\mathbb {P}_{Y_0 = 1}\) and \(\textbf{Y}\) is ergodic (under \(\mathbb {P}_{Y_0 = 1}\)), we get that \(\textbf{T}\) is stationary and ergodic (under \(\mathbb {P}_{Y_0 = 1}\)) as well.

As a consequence of the above remark, we can apply Maker’s ergodic theorem to \(\textbf{T}\), which results in the following corollary:

Corollary 1.24

Suppose that \(\sup _{i\in {\mathbb {N}}} g_i(\textbf{T}) \in L_1(\mathbb {P}_{Y_0 = 1})\) and \(g_i\xrightarrow {\mathbb {P}_{Y_0 = 1}\;a.s.} g\). Then, \(\mathbb {P}_{Y_0 = 1}\) a.s.,

$$\begin{aligned} \frac{1}{n}\sum _{i=0}^{n-1}g_{n-i}(S^{i}\textbf{T}) \rightarrow \mathbb {E}_{Y_0=1}\,g(\textbf{T}). \end{aligned}$$

2 Examples, comments and proofs

2.1 Examples of non-bilaterally deterministic processes

In the subsections below we tacitly assume that \((\textbf{X},\textbf{Y})\) is good, i.e. \(\textbf{X}\) and \(\textbf{Y}\) form a pair of mutually independent stationary processes, where \(\textbf{X}\) is finitely-valued and \(\textbf{Y}\) is binary, with \(\mathbb {P}(Y_{0}=1)>0\) and such that the definition of the corresponding return process \(\textbf{R}\) to state 1 makes sense.

2.1.1 Exchangeable processes

Definition 2.1

We say that a process \(\textbf{X}\) is exchangeable if for any \(n\in {\mathbb {N}}\) and distinct times \(i_1, i_2, \ldots , i_n\),

$$\begin{aligned} \left( X_{i_1}, X_{i_2}, \ldots , X_{i_n}\right) \sim \left( X_{1}, X_{2}, \ldots , X_{n}\right) . \end{aligned}$$

In other words, the distribution of \(\textbf{X}\) is invariant under finite permutations.

Remark 2.2

Let \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) be exchangeable. By a celebrated result of de Finetti [10] (cf. also [24]), this is equivalent to \(\textbf{X}\) being a convex combination of i.i.d. processes. Thus, there exists a random variable \(\Theta \) such that, conditionally on \(\Theta \), \(\textbf{X}\) is i.i.d. Note that this ensures that \({\textbf{H}}\,(\textbf{X}) > 0\), unless \(X_i = f_i(\Theta )\) for some Borel functions \(f_i\). Indeed,

$$\begin{aligned} {\textbf{H}}\,(X_1, \ldots , X_n) \geqslant {\textbf{H}}\,(X_1, \ldots , X_n\,|\,\Theta ) = \sum _{i = 1}^{n}{\textbf{H}}\,(X_i\,|\,\Theta ) = n{\textbf{H}}\,(X_1\,|\,\Theta ), \end{aligned}$$

which gives \({\textbf{H}}\,(\textbf{X}) \geqslant {\textbf{H}}\,(X_1\,|\,\Theta )\). Therefore, \({\textbf{H}}\,(\textbf{X}) = 0\) implies \(X_i = f_i(\Theta )\).
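
A minimal non-i.i.d. instance (ours): let \(\Theta \) take the values 1/4 and 3/4 with probability 1/2 each and, conditionally on \(\Theta \), let \(\textbf{X}\) be i.i.d. Bernoulli(\(\Theta \)). Then \(\textbf{X}\) is exchangeable but not i.i.d. and, by the affinity of the entropy rate over the de Finetti decomposition,

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}})={\textbf{H}}\,(X_1\,|\,\Theta )=-\tfrac{1}{4}\log _2\tfrac{1}{4}-\tfrac{3}{4}\log _2\tfrac{3}{4}\approx 0.811>0, \end{aligned}$$

so, by Corollary 2.4 below, \(\textbf{X}\) is not bilaterally deterministic.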

Remark 2.3

Olshen in [30] showed that if \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {Z}}}}\) is exchangeable then

$$\begin{aligned} \mathcal {I} = \mathcal {E} = {{\mathcal {T}}_d} = {{\mathcal {T}}_f} = {{\mathcal {T}}_p}, \end{aligned}$$

(as measure-algebras), where \(\mathcal {I}\) and \(\mathcal {E}\) denote the \(\sigma \)-algebras of shift-invariant and finite-permutation-invariant sets, respectively, and \({{\mathcal {T}}_d}\), \({{\mathcal {T}}_f}\), \({{\mathcal {T}}_p}\) are the double, future and past tails, respectively.

As an immediate consequence of Remark 2.3 and Corollary 1.16, we obtain the following:

Corollary 2.4

Suppose that \(\textbf{X}\) is exchangeable. Then \({\textbf{H}}\,({\textbf{X}})>0\) if and only if \(\textbf{X}\) is not bilaterally deterministic.

Proposition 2.5

Suppose that \(\textbf{X}\) is exchangeable. Then \({\textbf{H}}\,({\textbf{M}}\,|\,{\textbf{Y}}) ={\mathbb {P}}\left( Y_0 = 1\right) {\textbf{H}}\,(\textbf{X})\).

Proof

It follows from the exchangeability of \(\textbf{X}\) that for any negative distinct times \( r_{-i}\), \(i\in {\mathbb {N}}\),

$$\begin{aligned} {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1},r_{-2}, \ldots \}}) = {\textbf{H}}\,(X_0\,|\,X_{\{-1, -2, \ldots \}}) ={\textbf{H}}\,({\textbf{X}}). \end{aligned}$$

It remains to use Theorem 1.7 (A). \(\square \)

2.1.2 Markov chains

Recall that a process \(\textbf{X}\) is a Markov chain if, for every time \(i\in {\mathbb {Z}}\), conditionally on \(X_i\), \(X_{(-\infty , i - 1]}\) is independent of \(X_{[i + 1, \infty )}\). Colloquially: given the present, the past and the future are independent. In particular, \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1},r_{-2},\ldots \}})={\textbf{H}}\,(X_0\,|\,X_{r_{-1}})\), which immediately leads to the following corollary of Theorem 1.7 (A):

Corollary 2.6

If \(\textbf{X}\) is a Markov chain (and \((\textbf{X},\textbf{Y})\) is good) then

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}})={\mathbb {P}}\,(Y_{0}=1)\sum _{k = 1}^{\infty }{\mathbb {P}}_{Y_{0}=1}(R_{1}=k)\,{\textbf{H}}\,(X_{k}\,|\,X_{0}). \end{aligned}$$
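
For a concrete instance (ours): let \(\textbf{X}\) be the stationary Markov chain on \(\{0,1\}\) that flips its state at each step with probability \(p\in (0,1/2)\). Then \({\textbf{H}}\,(X_k\,|\,X_0)=h(p^{(k)})\), where h is the binary entropy function and

$$\begin{aligned} p^{(k)}={\mathbb {P}}\,(X_k\ne X_0)=\frac{1-(1-2p)^k}{2}\nearrow \frac{1}{2}\quad (k\rightarrow \infty ), \end{aligned}$$

so each summand lies in \([h(p),1)=[{\textbf{H}}\,({\textbf{X}}),{\textbf{H}}\,(X_0))\), strictly above \({\textbf{H}}\,({\textbf{X}})\) for \(k\geqslant 2\), in accordance with Remark 1.10.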

Remark 2.7

Corollary 2.6 easily extends to the case of k-Markov chains but, for simplicity’s sake, we decided to present it for \(k = 1\).

Remark 2.8

Let \(\textbf{X} = {({{X}_i})_{i\in {\mathbb {N}}}}\) be a finitely-valued Markov chain, \(X_i \in \mathcal {X}\). It is well-known (see [15], Chapter XV, Section 6, Theorem 3, page 392) that we can uniquely decompose the state space \(\mathcal {X}\) into the disjoint union

$$\begin{aligned} \mathcal {X} = C \sqcup D_1 \sqcup D_2 \sqcup \cdots \sqcup D_k, \end{aligned}$$
(2.1)

where C is the set of transient states and \(D_i\) are closed sets. If \(\textbf{X}\) starts in \(D_j\) (i.e. \(X_0 \in D_j\)) then it remains in \(D_j\) forever. If \(X_0 \in C\) then \(\textbf{X}\) stays in C for a finite time and then jumps to some \(D_j\) (and never leaves \(D_j\) afterwards). Moreover (see [15], Chapter XV, Section 7, Criterion, page 395), if \(\pi \) is a stationary measure then necessarily \(\pi (C) = 0\).

Now suppose that a bilateral, finitely-valued Markov chain \(\textbf{X} ={({{X}_i})_{i\in {\mathbb {Z}}}}\) is stationary (thus, \(C = \emptyset \) in (2.1)). Fix \(1\leqslant j\leqslant k\) and let \(\textbf{X}_{D_j}\) stand for \(\textbf{X}\) conditioned on \(X_0 \in D_j\). By the definition of \(D_j\), process \(\textbf{X}_{D_j}\) is an irreducible (equivalently, ergodic), stationary Markov chain. Now, let \(p_j\) be the period of \(\textbf{X}_{D_j}\). Then \(D_j\) can be decomposed into \(p_j\) disjoint sets (see [6], Chapter 1, Section 3, Theorem 4)

$$\begin{aligned} D_j = D_{j, 0} \sqcup \cdots \sqcup D_{j, p_j - 1} \end{aligned}$$

such that \(\,{\mathbb {P}}\,\,(X_1 \in D_{j, (\ell + 1)\bmod p_j}\;|\; X_0 \in D_{j, \ell }) = 1\). Using Corollary 2 from [3], we get that

$$\begin{aligned} {{\mathcal {T}}_d}\left( \textbf{X}_{D_j}\right) = {{\mathcal {T}}_p}\left( \textbf{X}_{D_j}\right) = {{\mathcal {T}}_f}\left( \textbf{X}_{D_j}\right) = \sigma \left\{ \left\{ X_0 \in D_{j, 0}\right\} , \left\{ X_0 \in D_{j, 1}\right\} , \ldots , \left\{ X_0 \in D_{j, p_j - 1}\right\} \right\} . \end{aligned}$$

Note that Corollary 2 from [3] is stated only for \({{\mathcal {T}}_f}\) but a perusal of the proofs of Theorem 1 and Corollaries 1 and 2 therein gives the same result for \({{\mathcal {T}}_d}\). Thus, \(\textbf{X}\), conditionally on \(X_0 \in D_{j, \ell }\), has trivial tail \(\sigma \)-algebras. This immediately leads to

$$\begin{aligned} {{\mathcal {T}}_d}\left( \textbf{X}\right) = {{\mathcal {T}}_p}\left( \textbf{X}\right) = {{\mathcal {T}}_f}\left( \textbf{X}\right) = \sigma \left\{ \left\{ X_0 \in D_{j, \ell }\right\} \;|\;1\leqslant j \leqslant k,\ 0 \leqslant \ell \leqslant p_j - 1\right\} . \end{aligned}$$
(2.2)

Indeed, if for example \(A \in {{\mathcal {T}}_d}\left( \textbf{X}\right) \) then, for all \(j, \ell \), \({\mathbb {P}}\,(A\;|\;X_0 \in D_{j, \ell }) \in \{0, 1\}\) which yields (2.2). As a consequence of (2.2), we obtain the following:

Corollary 2.9

Suppose that \(\textbf{X}\) is a stationary finitely-valued Markov chain. Then \({\textbf{H}}\,({\textbf{X}})>0\) if and only if \(\textbf{X}\) is not bilaterally deterministic.

Remark 2.10

Since \({\textbf{H}}\,({\textbf{X}})={\textbf{H}}\,(X_1\,|\,X_0)={\textbf{H}}\,(X_{i + 1}\,|\,X_i)\), it follows that \({\textbf{H}}\,({\textbf{X}})=0\) if and only if, for every \(i\in {\mathbb {Z}}\), \(X_i = f_i(X_{0})\) for some functions \(f_i\). It is not hard to see that if \({\mathbb {P}}\,(X_0 = x) >0\) for every \(x\in \mathcal {X}\), then every \(f_i\) must be a bijection on \(\mathcal {X}\). Moreover, setting \(y = f_1(x)\) and using the stationarity of \(\textbf{X}\), we get

$$\begin{aligned} {\mathbb {P}}\,(X_0 = x)&= {\mathbb {P}}\,(X_0 = x,\, f_1(X_0) = y) = {\mathbb {P}}\,(X_i = x,\, X_{i+1} = y)\\&= {\mathbb {P}}\,(f_i(X_0) = x,\, f_{i + 1}(X_0) = y) = {\mathbb {P}}\,(X_0 = f_i^{-1}(x))\,\mathbb {1}_{f_{i + 1}\left( f_i^{-1}(x)\right) = f_1(x)}. \end{aligned}$$

Thus, necessarily, \(f_{i + 1}\left( z\right) = f_1(f_i(z))\). Consequently, if we set \(f:=f_1\) then \(f_{i} = f^{\circ i}\). Moreover, f must be such that, for all x, \({\mathbb {P}}\,(X_0 = x) = {\mathbb {P}}\,(X_0 = f(x))\).

Therefore, if \(\textbf{X}\) is bilateral, finitely-valued, stationary Markov chain, with \({\mathbb {P}}\,(X_0 = x) >0\) for all \(x \in \mathcal {X}\), then the following are equivalent:

  • \(\textbf{X}\) is bilaterally deterministic;

  • there exists a bijection \(f:\mathcal {X} \rightarrow \mathcal {X}\) such that \(X_i = f^{\circ i}(X_0)\) and, for all \(x\in \mathcal {X}\), \({\mathbb {P}}\,(X_0 = x) = {\mathbb {P}}\,(X_0 = f(x))\).

2.1.3 Weakly Bernoulli processes

Weakly Bernoulli processes were introduced by Friedman and Ornstein [16] and belong to the classics of ergodic theory. Recall that any process \(\textbf{X}\) that is weakly Bernoulli is also very weakly Bernoulli, equivalently, finitely determined (i.e., as a dynamical system, it is isomorphic to a Bernoulli process [31]). In particular, \({\textbf{H}}\,({\textbf{X}})>0\). We refer the reader, e.g., to [36] for more information on the subject.

Suppose now that \(\textbf{X}\) is weakly Bernoulli. Then \({{\mathcal {T}}_d}\) is trivial (see, e.g., Proposition 5.17 in [4]). Therefore, as an immediate consequence of Corollary 1.16, we obtain the following:

Corollary 2.11

Suppose that \(\textbf{X}\) is weakly Bernoulli. Then \(\textbf{X}\) is not bilaterally deterministic.

In fact, the results in [4] are formulated in a different language. One more notion, equivalent to the weak Bernoulli property, is absolute regularity. It first appeared in works of Volkonskii and Rozanov [37, 38] who, in turn, attribute it to Kolmogorov. Fix a probability space \((\Omega ,\mathcal {F},\mathbb {P})\). Let \(\mathcal {A},\mathcal {B}\subset \mathcal {F}\) be sub-\(\sigma \)-algebras and let

$$\begin{aligned} \beta (\mathcal {A},\mathcal {B}):=\sup \frac{1}{2}\sum _{i=1}^{I}\sum _{j=1}^{J}|\mathbb {P}(A_i\cap B_j) - \mathbb {P}(A_i)\mathbb {P}(B_j)|, \end{aligned}$$

where the supremum is taken over all (finite) partitions \(\{A_1,\dots , A_I\}\), \(\{B_1,\dots , B_J\}\) of \(\Omega \), with \(A_i\in \mathcal {A}\), \(B_j\in \mathcal {B}\) for each ij. Now, given a process \(\textbf{X}\), for \(-\infty \leqslant J < L \leqslant \infty \), we define the \(\sigma \)-algebra

$$\begin{aligned} \mathcal {F}_J^L:=\sigma (X_k : J\leqslant k\leqslant L). \end{aligned}$$

Then, for each \(n\geqslant 1\), we define the following \(\beta \)-dependence coefficients:

$$\begin{aligned} \beta (n):=\sup _{j\in {\mathbb {Z}}}\beta (\mathcal {F}_{-\infty }^{j},\mathcal {F}_{j+n}^{\infty }). \end{aligned}$$

We say that \(\textbf{X}\) is absolutely regular (or \(\beta \)-mixing) if \(\beta (n)\rightarrow 0\) as \(n\rightarrow \infty \).
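
For instance (our illustration): if \(\textbf{X}\) is i.i.d. then \(\mathcal {F}_{-\infty }^{j}\) and \(\mathcal {F}_{j+n}^{\infty }\) are independent for all \(j\in {\mathbb {Z}}\) and \(n\geqslant 1\), so every term \(\mathbb {P}(A\cap B) - \mathbb {P}(A)\mathbb {P}(B)\) in the defining sum vanishes and

$$\begin{aligned} \beta (n)=0\quad \text {for all } n\geqslant 1, \end{aligned}$$

i.e. i.i.d. processes are absolutely regular (in Berbee’s dichotomy below, \(\beta =0\), corresponding to \(p=1\)).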

Berbee, in [1], studied \(\beta \)-dependence coefficients for stationary ergodic processes. He showed that

$$\begin{aligned} \lim _{n\rightarrow \infty } \beta (n)= \beta = 1-\frac{1}{p}\text { for some }p\in {\mathbb {N}}\cup \{\infty \}. \end{aligned}$$

Moreover, he proved that if \(\beta <1\) then \({{\mathcal {T}}_d}={{\mathcal {T}}_p}\). As a consequence of his result and of Corollary 1.16, we have:

Corollary 2.12

Suppose that \(\textbf{X}\) is a stationary ergodic process with \(\beta <1\). Then \(\textbf{X}\) is not bilaterally deterministic.

2.2 Proof of the main technical result (Theorem 1.7)

2.2.1 Part (A)

By the chain rule (cf. (1.3)), we have

$$\begin{aligned} {\textbf{H}}\,(M_{[0, n]}\,|\,Y_{[0, n]}) = \sum _{k = 0}^{n} {\textbf{H}}\,(M_k\,|\,Y_{[0, n]}, M_{[0, k)}) =: \sum _{k = 0}^{n} H_{k, n}. \end{aligned}$$
(2.3)

Fix \(0 \leqslant k \leqslant n\). Since \(M_k = X_k \cdot Y_k\) and \(\textbf{X} \amalg \textbf{Y}\), we easily get that conditionally on \(\left( Y_{[0, k]}, M_{[0, k)}\right) \), \(M_k\) is independent of \(Y_{[k + 1, n]}\). In other words,

$$\begin{aligned} H_{k, n} = H_k = {\textbf{H}}\,(M_k\,|\,Y_{[0, k]}, M_{[0, k)}). \end{aligned}$$

Now, using the definition of Shannon conditional entropy, the fact that on the event \(Y_k=0\) we have \(M_k\equiv 0\), whereas on \(Y_k = 1\) we have \(M_k = X_k\), and the stationarity of \((\textbf{X}, \textbf{Y})\), we get

$$\begin{aligned} H_k&= {\mathbb {P}}\,(Y_k = 1)\,{\textbf{H}}_{Y_k=1}\,(X_k\,|\,Y_{[0,k)},M_{[0,k)})\\&= {\mathbb {P}}\,(Y_0 = 1)\,{\textbf{H}}_{Y_0=1}\,(X_0\,|\,Y_{[-k,0)},M_{[-k,0)}). \end{aligned}$$

Moreover, if \(Y = Y_{[-k,0)}\), \(M = M_{[-k,0)}\), \(y = y_{[-k,0)}\), \(m = m_{[-k,0)}\), \(s_{-k} = \sum _{i = -k}^{-1}y_i\), \(r_{-s_{-k}}< \cdots < r_{-1}\) are such that \(y_{r_{-i}} = 1\), then

$$\begin{aligned} {{\mathbb {P}}_{Y_{0} = 1}}\,(Y = y , M = m) = {\left\{ \begin{array}{ll} {{\mathbb {P}}_{Y_{0} = 1}}\,(Y = y)\,{\mathbb {P}}\,\left( X_{\{r_{-1}, \ldots , r_{-s_{-k}}\}} = m_{\{r_{-1}, \ldots , r_{-s_{-k}}\}}\right) , \;\; &{} s_{-k} > 0, \\ {{\mathbb {P}}_{Y_{0} = 1}}\,(Y = y), \;\; &{} s_{-k} = 0, \end{array}\right. } \end{aligned}$$

whenever \(m\leqslant y\) coordinatewise (otherwise, we get zero). This implies that

$$\begin{aligned} H_k&= {\mathbb {P}}\,(Y_0 = 1)\,{\mathbb {P}}_{Y_0 = 1}\,(S_{-k} = 0)\,{\textbf{H}}\,(X_0) \\&\quad +\,{\mathbb {P}}\,(Y_0 = 1)\,\mathbb {E}_{Y_0 = 1} \mathbb {1}_{S_{-k} > 0}\,{\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, \ldots , r_{-s_{-k}}\}})_{|_{r_{-i} = R_{-i}, s_{-k} = S_{-k}}}. \end{aligned}$$

Since \(\textbf{Y}\) visits 1 a.s. infinitely many times (in the past),

$$\begin{aligned} {\mathbb {P}}_{Y_0 = 1}(S_{-k} = 0) \rightarrow 0 \text { as }k\rightarrow \infty . \end{aligned}$$

Moreover, \(\mathbb {P}_{Y_0 = 1}\) a.s., we have \(\mathbb {1}_{S_{-k} > 0} \rightarrow 1\) and

$$\begin{aligned} {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, \ldots , r_{-s_{-k}}\}})_{|_{r_{-i} = R_{-i}, s_{-k} = S_{-k}}} \rightarrow {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})_{|_{r_{-i} = R_{-i}}}. \end{aligned}$$

Thus, by the bounded convergence theorem, we get that

$$\begin{aligned} H_k \rightarrow {\mathbb {P}}\,(Y_0 = 1)\,\mathbb {E}_{Y_0 = 1}{\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})_{|_{r_{-i} = R_{-i}}}, \end{aligned}$$

which, by (2.3), concludes the proof of Theorem 1.7 (A).
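As a sanity check of this limit, consider the simplest situation (assumed only for this illustration): \(\textbf{X}\) i.i.d. uniform on \(\{0,1\}\) and \(\textbf{Y}\) i.i.d. Bernoulli(p), with \(\textbf{X} \amalg \textbf{Y}\). Then each conditional entropy \({\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})\) equals \({\textbf{H}}\,(X_0)=\log 2\), so the limit is \(p\log 2\); the Python sketch below confirms this by exact enumeration.

```python
# Exact computation of H(M_{[0,n]} | Y_{[0,n]}) / (n+1) for X i.i.d. uniform
# on {0,1} and Y i.i.d. Bernoulli(p): given Y = y, the nonzero coordinates of
# M are i.i.d. fair bits, so H(M | Y = y) = (#ones in y) * log 2.
from itertools import product
from math import log

p, n = 0.3, 7                         # illustrative noise level, block length

total = 0.0
for y in product((0, 1), repeat=n + 1):
    s = sum(y)
    total += p**s * (1 - p)**(n + 1 - s) * s * log(2)

print(total / (n + 1))                # equals p * log 2 exactly here
print(p * log(2))
```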

2.2.2 Part (B)

First, we will prove a technical lemma.

Lemma 2.13

We have

$$\begin{aligned} {\textbf{H}}\,(\textbf{X}\cdot \textbf{Y}\,\,|\,\,\textbf{Y})=\lim \limits _{n \rightarrow \infty }\frac{1}{n}\mathbb {E}\mathbb {1}_{S_n > 0}{\textbf{H}}\,(X_{r_0}, X_{r_1},\ldots , X_{r_{s_n - 1}})_{|_{r_i = R_i, s_n = S_n}}. \end{aligned}$$

Proof

Since for any \(k\in {\mathbb {Z}}\), on the event \(Y_k=0\), we have \(M_k\equiv 0\), it follows that

$$\begin{aligned} {\textbf{H}}\,(M_{[0, n]}\,|\,Y_{[0, n]}) = {\mathbb {P}}\,(S_n> 0)\sum _{y_{[0,n]}} {{\mathbb {P}}_{{S}_n > 0}}(Y_{[0, n]} = y_{[0, n]})\,{{\textbf{H}}_{Y_{[0, n]} = y_{[0, n]}}}\,(M_{[0, n ]}). \end{aligned}$$

Moreover, if \(s_n = \sum _{i = 0}^n y_i>0\) then

$$\begin{aligned} {{\mathbb {P}}_{Y_{[0,n]}=y_{[0,n]}}}(M_{[0,n]}=m_{[0,n]})={\mathbb {P}}(X_{r_0}=m_{r_0},\dots , X_{r_{s_n - 1}}=m_{r_{s_n - 1}}), \end{aligned}$$

whenever \(m_{[0,n]}\leqslant y_{[0,n]}\) coordinatewise (otherwise, we get zero). Hence,

$$\begin{aligned} {\textbf{H}}_{{Y}_{[0, n]} = {y}_{[0, n]}}\,(M_{[0, n ]}) = {\textbf{H}}\,(X_{r_0},\dots , X_{r_{s_n - 1}}), \end{aligned}$$

which results in

$$\begin{aligned} {\textbf{H}}\,(M_{[0, n]}\,|\,Y_{[0, n]})&= {\mathbb {P}}\,(S_n> 0)\,\mathbb {E}_{S_n> 0}{\textbf{H}}\,(X_{r_0},\dots , X_{r_{s_n - 1}})_{|_{r_i = R_i, s_n = S_n}}\\&= \mathbb {E}\mathbb {1}_{S_n > 0}{\textbf{H}}\,(X_{r_0},\dots , X_{r_{s_n - 1}})_{|_{r_i = R_i, s_n = S_n}}. \end{aligned}$$

This completes the proof. \(\square \)
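To see the lemma at work, take \(\textbf{X}\) i.i.d. uniform on \(\{0,1\}\) (assumed only for this illustration), so that \({\textbf{H}}\,(X_{r_0},\ldots ,X_{r_{s_n-1}})=s_n\log 2\) and the right-hand side becomes \({\mathbb {P}}\,(Y_0=1)\log 2\). The Python sketch below estimates this by Monte Carlo, with \(\textbf{Y}\) a two-state Markov chain (an arbitrary illustrative choice).

```python
# Monte Carlo sketch of the right-hand side of Lemma 2.13 for X i.i.d.
# uniform on {0,1}: then it reduces to lim (1/n) E[1_{S_n>0} S_n log 2],
# which equals P(Y_0 = 1) * log 2.
import random
from math import log

q01, q10 = 0.3, 0.6                   # transition probabilities of Y
pi1 = q01 / (q01 + q10)               # stationary P(Y_0 = 1)

def sample_S_n(n):
    """Sample S_n = Y_0 + ... + Y_n from the stationary two-state chain."""
    y = 1 if random.random() < pi1 else 0
    s = y
    for _ in range(n):
        if y == 0:
            y = 1 if random.random() < q01 else 0
        else:
            y = 0 if random.random() < q10 else 1
        s += y
    return s

n, trials = 200, 2000
est = sum(sample_S_n(n) for _ in range(trials)) * log(2) / (trials * n)
print(est)                            # close to pi1 * log 2
print(pi1 * log(2))                   # (1_{S_n>0} omitted: S_n > 0 w.h.p.)
```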

Notice now that

$$\begin{aligned} \frac{1}{n}{\textbf{H}}\,(X_{r_0},\dots ,X_{r_{s_n-1}})=\frac{1}{n}{\textbf{H}}\,(X_{[0,n]})- \frac{1}{n}{\textbf{H}}\,(X_{[0,n]\setminus \{r_0,\dots , r_{s_n-1}\}}\,|\,X_{r_0},\dots ,X_{r_{s_n-1}}), \end{aligned}$$

\(\lim _{n\rightarrow \infty }\frac{1}{n}{\textbf{H}}\,(X_{[0,n]})={\textbf{H}}\,({\textbf{X}})\) and that (by the ergodicity of \(\textbf{Y}\)) we have \(\mathbb {1}_{S_n > 0 } \rightarrow 1\). Thus, in order to conclude the proof, it remains to find \(\lim \nolimits _{n \rightarrow \infty }\mathbb {E}\mathbb {1}_{S_n > 0} H(n, \textbf{R})\), where

$$\begin{aligned} H(n, \textbf{r}) := \frac{1}{n}{\textbf{H}}\,(X_{[0,n]\setminus \{r_0,\dots , r_{s_n-1}\}}\,|\,X_{r_0},\dots ,X_{r_{s_n-1}}), \quad \textbf{r} = {({{r}_i})_{i\in {\mathbb {Z}}}}. \end{aligned}$$

More precisely, let \(A_i=[Y_0=\ldots = Y_{i-1}=0,\,Y_i=1]\) for \(i\geqslant 0\) (in particular, \(A_0=[Y_0=1]\)). If we show that

$$\begin{aligned} \lim \limits _{n \rightarrow \infty } H(n, \textbf{R}) = \mathbb {P}(A_0)\mathbb {E}_{A_0} {\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-\infty , r_0]}, X_{\{r_{ 1}, r_{2}, \ldots \}})|_{r_i=R_i} \end{aligned}$$
(2.4)

holds a.e., then by the bounded convergence theorem (as \(H(n,\textbf{R})\leqslant {\textbf{H}}\,(X_0)\)) we will have

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {E}\mathbb {1}_{S_n>0}H(n,\textbf{R}) = \mathbb {P}(A_0)\mathbb {E}_{A_0} {\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-\infty , r_0]}, X_{\{r_{ 1}, r_{2}, \ldots \}})|_{r_i=R_i} \end{aligned}$$

since \(\lim _{n\rightarrow \infty }\mathbb {1}_{S_n>0}=1\) a.e. by the ergodicity of \(\textbf{Y}\).


Fix \(\textbf{y}\) and \(n\in {\mathbb {N}}\). By the chain rule, we get

$$\begin{aligned} nH(n, \textbf{r})&= \underbrace{{\textbf{H}}\,(X_{[0,r_0-1]}\,|\,X_{\{r_0,\dots , r_{s_n-1}\}})}_{\Sigma _1(n)} +\underbrace{{\textbf{H}}\,(X_{[r_{s_n-1}+1,n]}\,|\,X_{[0,r_{s_n-1}]})}_{\Sigma _3(n)} \\&\quad +\underbrace{\sum _{i=0}^{s_n-2}{\textbf{H}}\,(X_{[r_i+1,r_{i+1}-1]}\,|\,X_{[0,r_i]},X_{\{r_{i+1},\dots ,r_{s_n-1}\}})}_ {\Sigma _2(s_n-1)} . \end{aligned}$$

We will deal first with the summands \(\Sigma _1(n)\) and \(\Sigma _3(n)\). Clearly,

$$\begin{aligned} \frac{1}{n}\Sigma _1(n) \leqslant \frac{1}{n}{\textbf{H}}\,(X_{[0,r_0-1]}) \leqslant \frac{r_0}{n}H(X_0)\rightarrow 0 \end{aligned}$$
(2.5)

as \(n\rightarrow \infty \). Since \(s_n=s_{r_{s_n-1}}\), \(\frac{s_n}{n} \rightarrow {\mathbb {P}}\,(Y_0 = 1) > 0\) (by the ergodicity of \(\textbf{Y}\)) and \(r_{s_n - 1} \rightarrow \infty \), it follows that

$$\begin{aligned} \frac{\Sigma _3(n)}{n} \leqslant \frac{n-r_{s_n-1}}{n}H(X_0)=\left( 1-\frac{r_{s_n-1}}{s_{r_{s_n-1}}}\cdot \frac{s_n}{n}\right) H(X_0) \rightarrow 0. \end{aligned}$$
(2.6)

In order to deal with \(\Sigma _2(s_n-1)\), notice that

$$\begin{aligned} \frac{1}{n}\Sigma _2(s_n-1) =\frac{s_n}{n}\frac{1}{s_n}\Sigma _2(s_n-1). \end{aligned}$$
(2.7)

Since \(\frac{s_n}{n} \rightarrow {\mathbb {P}}\,(Y_0 = 1)\), it suffices to show that \(\mathbb {P}_{A_0}\)-a.e. we have

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\frac{1}{n}\Sigma _2(n) = \mathbb {E}_{A_0}{\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-\infty , r_0]}, X_{\{r_{ 1}, r_{2}, \ldots \}}). \end{aligned}$$
(2.8)

Using the stationarity of \(\textbf{X}\), for \(t_i = r_i - r_{i - 1}\), we obtain

$$\begin{aligned} \Sigma _2(n)&=\sum _{i=0}^{n-1}{\textbf{H}}\,(X_{[r_i+1,r_{i+1}-1]}\,|\,X_{[0,r_i]},X_{\{r_{i+1},\dots ,r_{n}\}}) \\&=\sum _{i = 0}^{n-1}{\textbf{H}}\,(X_{[1, t_{i + 1} - 1]}\,|\,X_{[-r_i, 0]}, X_{\{t_{i + 1},\, t_{i+1}+t_{i+2},\, \dots ,\, t_{i + 1} + \dots + t_{n}\}}). \end{aligned}$$

We would like to apply Maker’s ergodic theorem to study the above sum. However, we cannot do it directly due to the term \(X_{[-r_i, 0]}\) appearing in the conditional entropies. This obstacle will be overcome by estimating each summand from below and above.

Fix \(k\in {\mathbb {N}}\). Then for every i such that \(r_i \geqslant k\) and for every \(j \in {\mathbb {N}}\), we have

$$\begin{aligned} H_{\infty , j}\left( t_{i + 1}, t_{i + 2}, \ldots \right)&\leqslant {\textbf{H}}\,(X_{[1, t_{i + 1} - 1]}\,|\,X_{[-r_i, 0]}, X_{\{t_{i + 1},\, t_{i+1}+t_{i+2},\, \ldots ,\, t_{i + 1} + \cdots + t_{i + j}\}}) \nonumber \\&\leqslant H_{k, j}\left( t_{i + 1}, t_{i + 2}, \ldots \right) , \end{aligned}$$
(2.9)

where \(H_{k, j}\left( t_{i + 1}, t_{i + 2}, \ldots \right) = {\textbf{H}}\,(X_{[1, t_{i + 1} - 1]}\,|\,X_{(-k, 0]}, X_{\{t_{i + 1},\, t_{i+1}+t_{i+2},\, \ldots ,\, t_{i + 1} + \cdots + t_{i + j}\}})\) for \(k \in {\mathbb {N}}\cup \{\infty \}\). Clearly,

$$\begin{aligned} H_{k, j}\left( t_{1}, t_{2}, \ldots \right) \xrightarrow {j\rightarrow \infty } H_{k}\left( t_{1}, t_{2}, \ldots \right)&:= {\textbf{H}}\,(X_{[1, t_{1} - 1]}\,|\,X_{(-k, 0]}, X_{\{t_{ 1}, t_{1} + t_{2}, \ldots \}})\\&={\textbf{H}}\,(X_{[r_0+1, r_{1} - 1]}\,|\,X_{(-k, r_0]}, X_{\{r_{1}, r_2, \ldots \}}). \end{aligned}$$

By the entropy chain rule and Kac’s lemma,

$$\begin{aligned} \sup _{k, j \in {\mathbb {N}}} H_{k, j}(T_{[1, \infty )}) \leqslant {{\textbf{H}}\,({X}_0)} T_1 \in L_1(\mathbb {P}_{A_0}). \end{aligned}$$
(2.10)

Therefore, Maker’s ergodic theorem implies that, for every \(k\in {\mathbb {N}}\cup \{\infty \}\), \(\mathbb {P}_{A_0}\) a.s., we have

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\frac{1}{n}\sum _{i = 0}^{n-1} H_{k, n - i}\left( t_{i + 1}, t_{i + 2}, \ldots \right) = \mathbb {E}_{A_0} H_{k}\left( T_{1}, T_{2}, \ldots \right) . \end{aligned}$$
(2.11)

Using (2.9), it follows from the definition of \(\Sigma _2\) (and the chain rule) that

$$\begin{aligned} \begin{aligned} \frac{1}{n}\sum _{i=0}^{n-1}H_{\infty ,n-i}(t_{i+1},t_{i+2},\dots )&\leqslant \frac{1}{n}\Sigma _2(n)\\&\leqslant \frac{t_1+\dots +t_k}{n}H(X_0)+ \frac{1}{n}\sum _{i=k}^{n-1}H_{k,n-i}(t_{i+1},t_{i+2},\dots )\\&\leqslant \frac{t_1+\dots +t_k}{n}H(X_0)+\frac{1}{n}\sum _{i=0}^{n-1}H_{k,n-i}(t_{i+1},t_{i+2},\dots ), \end{aligned} \end{aligned}$$
(2.12)

with \(\frac{t_1+\dots +t_k}{n}H(X_0)\xrightarrow {n \rightarrow \infty } 0\). Thus, due to (2.11),

$$\begin{aligned} \mathbb {E}_{A_0} H_{\infty }\left( T_{1}, T_{2}, \ldots \right) \leqslant \lim \limits _{n \rightarrow \infty }\frac{1}{n}\Sigma _2(n) \leqslant \mathbb {E}_{A_0} H_{k}\left( T_{1}, T_{2}, \ldots \right) . \end{aligned}$$

Notice that \(H_k \rightarrow H_\infty \) as \(k\rightarrow \infty \). Hence, combining (2.10) and the dominated convergence theorem, we obtain

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }\frac{1}{n} \Sigma _2(n)= \mathbb {E}_{A_0} H_{\infty }\left( T_{1}, T_{2}, \ldots \right) \end{aligned}$$
(2.13)

\(\mathbb {P}_{A_0}\)-a.s., which is exactly (2.8) under \(\mathbb {P}_{A_0}\).

It remains to show (2.8) under \(\mathbb {P}_{A_i}\) for \(i\geqslant 1\). However, it is a direct consequence of the above and the following lemma:

Lemma 2.14

Suppose that we have a sequence of measurable functions \((f_n)_{n\geqslant 1}\) depending on \((T_n)_{n\geqslant 1}\) and a measurable function f depending on \(\textbf{Y}\) such that

$$\begin{aligned} f_n((T_n)_{n\geqslant 1}) \rightarrow f(\textbf{Y}) \end{aligned}$$
(2.14)

\(\mathbb {P}_{A_0}\)-a.e. Then (2.14) holds also \(\mathbb {P}_{A_i}\)-a.e. for each \(i\geqslant 1\).

Proof

For the sake of simplicity, we assume that \(\textbf{Y}\) is a canonical process. Let \(B_0\subset A_0\) be the set where (2.14) holds. We claim that \(B_i:=A_i\cap S^{-i}B_0\) has full \(\mathbb {P}_{A_i}\)-measure and that (2.14) holds on \(B_i\). Indeed, since \(S^iA_i\subset A_0\), we have

$$\begin{aligned} \mathbb {P}_{A_i}(A_i\setminus B_i)=\frac{1}{\mathbb {P}(A_i)}\mathbb {P}(A_i\setminus S^{-i}B_0)=\frac{1}{\mathbb {P}(A_i)}\mathbb {P}(S^iA_i\setminus B_0)\leqslant \frac{1}{\mathbb {P}(A_i)}\mathbb {P}(A_0\setminus B_0)=0. \end{aligned}$$

Moreover, if \(\textbf{y}\in B_i\) then \(S^i\textbf{y}\in S^iA_i\cap B_0\subset A_0\cap B_0=B_0\). Since \(\textbf{y}\in A_i\), it follows immediately that \(T_n(\textbf{y})=T_n(S^i\textbf{y})\) for all \(n\geqslant 1\), which completes the proof. \(\square \)

2.3 General setting: proof of Corollary 1.19 and related examples

In this section we will study a certain class of good \((\textbf{X},\textbf{Y})\) with no entropy drop. We begin with the proof of Corollary 1.19.

Proof of Corollary 1.19

Let \(L\geqslant 1\) be such that \(\text {supp}\ \textbf{y} \supset L\mathbb {Z} +a\) for some a and for a.e. realization \(\textbf{y}\) of \(\textbf{Y}\). Let \((X,\mathcal {B},\mu ,T)\) be a measure-theoretic dynamical system with entropy less than \(\frac{1}{L}\log 2\) and take a measurable partition \(X=J \cup J^c\) that is generating for the map \(T^L\). Let Y be the orbit closure of \(\textbf{y}\) in \(\{0,1\}^\mathbb {Z}\) under the left shift.

The process \(\textbf{M}\) corresponds to the coding of points in \((X\times Y,T\times S)\) with respect to the partition into \(J\times C\) (with \(C=[1]\subset Y\)) and its complement. Using Theorem 1.7 (B), we obtain

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}}) ={\textbf{H}}\,({\textbf{X}})-\mathbb {P}(A_0)\mathbb {E}_{A_0}{\textbf{H}}\,(X_{[r_0+1,r_1-1]}\,|\,X_{(-\infty , r_0]},X_{\{r_1,r_2,\dots \}})|_{r_i=R_i}={\textbf{H}}\,({\textbf{X}}). \end{aligned}$$

(Indeed, a.e. realization \(\textbf{r}\) contains a two-sided infinite arithmetic progression with difference L, and the partition \(\{J,J^c\}\) is generating for \(T^L\); thus the conditional entropy in the above formula is equal to zero.) \(\square \)

It would be interesting to know whether in the above example \(\textbf{X}\) can be recovered from \(\textbf{M}\). Let us now see that this can be the case when \(\textbf{Y}\) arises from the rotation on two points \(\{0,1\}\). We will look at it both from the probabilistic and the ergodic-theoretic perspective.

Example 2.15

Let \(\left( \xi _i\right) _{i\in {\mathbb {Z}}}\) be a sequence of i.i.d. random variables such that

$$\begin{aligned} {\mathbb {P}}\,(\xi _0 = 0)\, = \,{\mathbb {P}}\,(\xi _0 = 1)\, = \frac{1}{2}, \end{aligned}$$

let \(F:\{0,1\}^2 \rightarrow \{0, 1, 2, 3\}\) be an arbitrary (relabelling) 1-1 function and put

$$\begin{aligned} X_i = F(\xi _i, \xi _{i + 1}), \qquad \textbf{Y} \sim \frac{1}{2}(\delta _a+\delta _{Sa}), \end{aligned}$$

where \(a=(\dots ,0,1,0,1,\dots )\) is the 2-periodic 0-1 sequence, S stands for the left shift and \(\textbf{X} \amalg \textbf{Y}\). Since \(\textbf{X}\) is a Markov chain and F is 1-1, we have

$$\begin{aligned} {\textbf{H}}\,({\textbf{X}}) = {\textbf{H}}\,(X_1\,|\,X_0) = {\textbf{H}}\,(\xi _1, \xi _2\,|\,\xi _0, \xi _1) = {\textbf{H}}\,(\xi _2\,|\,\xi _0, \xi _1) = {\textbf{H}}\,(\xi _2)=\log 2. \end{aligned}$$

Moreover, \(\mathbb {P}_{Y_0=1}(R_{-1}=-2)=1\) and therefore

$$\begin{aligned} \mathbb {E}_{Y_0=1} {\textbf{H}}\,(X_0\,|\,X_{\{r_{-1}, r_{-2}, \ldots \}})|_{r_i=R_i}={\textbf{H}}\,(X_0\,|\,X_{-2})={{\textbf{H}}\,({X}_0)}=2\log 2. \end{aligned}$$

Clearly, for every \(j\in {\mathbb {Z}}\), \(\left( X_i\right) _{i \leqslant j} \amalg \left( X_i\right) _{i \geqslant j + 2}\), yielding

$$\begin{aligned} \frac{1}{n}f(y_{[0,n]}) = \frac{1}{n}{\textbf{H}}\,(X_{r_1}, \ldots , X_{r_m}) = \frac{m}{n} {{\textbf{H}}\,({X}_0)} \rightarrow \frac{1}{2}{{\textbf{H}}\,({X}_0)}. \end{aligned}$$

Thus, by Theorem 1.7 (A), \({\textbf{H}}\,({\textbf{M}}) = \frac{1}{2}{{\textbf{H}}\,({X}_0)} = \frac{1}{2}\cdot 2 \log 2 =\log 2 = {\textbf{H}}\,({\textbf{X}})\). In fact, notice that since F is 1-1, knowing all even (resp. all odd) coordinates of a realization \(\textbf{x}\) of \(\textbf{X}\) determines all of its coordinates. In other words, \(\textbf{M}\) contains full information about \(\textbf{X}\).
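The recovery procedure can be made completely explicit; in the Python sketch below, the particular relabelling F and the choice of the realization of \(\textbf{Y}\) with 1's on the even coordinates are illustrative assumptions.

```python
# Recovering X from M = X * Y in Example 2.15, on a finite window.
import random

F = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}   # a 1-1 relabelling
F_inv = {v: k for k, v in F.items()}

N = 20
xi = [random.randint(0, 1) for _ in range(N + 1)]  # i.i.d. fair bits
X = [F[(xi[i], xi[i + 1])] for i in range(N)]
Y = [1 - (i % 2) for i in range(N)]                # 1 exactly on even i
M = [x * y for x, y in zip(X, Y)]                  # the observed signal

# On even coordinates M_i = X_i, and F^{-1}(M_i) = (xi_i, xi_{i+1}) reveals
# the underlying bits; X is then rebuilt on the odd coordinates as well.
xi_rec = [None] * (N + 1)
for i in range(0, N, 2):
    xi_rec[i], xi_rec[i + 1] = F_inv[M[i]]

X_rec = [F[(xi_rec[i], xi_rec[i + 1])] for i in range(N - 1)]
assert X_rec == X[:N - 1]    # (the last coordinate needs the next even
                             #  observation; bilaterally, all are covered)
```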

We will now see how to use an ergodic-theoretic approach to modify the above idea so that \(X_i \in \{0,1\}\), while keeping the property \({\textbf{H}}\,({\textbf{M}})={\textbf{H}}\,({\textbf{X}})\) and the ability to recover \(\textbf{X}\) from \(\textbf{M}\).

Example 2.16

Let \((X,\mathcal {B},\mu ,T)\) be an ergodic automorphism, with \(h(\mu )\in (0,\log 2)\) and let S be the rotation on \(Y=\{0,1\}\), with the unique invariant measure denoted by \(\nu \). Let \(\{J,J^c\}\) be a (measurable) generating partition of X for T (the existence of such a partition follows by Krieger's finite generator theorem [27]) and let \(C:=\{1\}\subset Y\). We consider the following stationary processes:

$$\begin{aligned} \textbf{X}=({\mathbb {1}_J \circ T^i})_{i\in {\mathbb {Z}}} \text { and }\textbf{Y}=({\mathbb {1}_C \circ S^i})_{i\in {\mathbb {Z}}}. \end{aligned}$$

Then \(\textbf{M}:=\textbf{X}\cdot \textbf{Y}\) corresponds to coding of points in the dynamical system \((X\times Y,T\times S)\) with respect to the partition into \(J\times C\) and its complement.

Equivalently, \(\textbf{M}\) corresponds to the dynamical system that is a tower of height two above the factor of \(T^2\) corresponding to the partition \(\{J,J^c\}\).

Assume now additionally that \(h(T)<\frac{1}{2}\log 2\) and the partition \(\{J,J^c\}\) is generating for \(T^2\) (e.g. T can be a Bernoulli automorphism, with entropy less than \(\frac{1}{2}\log 2\)). Then \(\textbf{M}\) corresponds to a tower of height two above \(T^2\), denoted by R, and given by

$$\begin{aligned} R(x,0)=(x,1),\ R(x,1)=(T^2x,0). \end{aligned}$$

Notice that R is isomorphic to \(T\times S\) via the map \(\Phi \) given by

$$\begin{aligned} \Phi (x,0)=(x,0),\ \Phi (x,1)=(Tx,1) \end{aligned}$$

(we easily check that \(\Phi \circ R=(T\times S)\circ \Phi \)). It follows that

$$\begin{aligned} {\textbf{H}}\,({\textbf{M}})=h(\mu \otimes \nu )=h(\mu )={\textbf{H}}\,({\textbf{X}})>0. \end{aligned}$$
(2.15)

In fact, since \(\Phi \) is an isomorphism, one can filter out \(\textbf{X}\) from \(\textbf{M}\).
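The intertwining relation \(\Phi \circ R=(T\times S)\circ \Phi \) is purely algebraic, so it can be checked mechanically; in the Python sketch below, T is modelled by an arbitrary bijection of a finite set (an illustrative stand-in, as the relation uses no further structure of T).

```python
# A finite check of Phi ∘ R = (T × S) ∘ Phi for the tower R of height two.
base = list(range(7))
T = {x: (x + 3) % 7 for x in base}         # some bijection playing T

def R(x, j):                               # R(x,0)=(x,1), R(x,1)=(T^2 x,0)
    return (x, 1) if j == 0 else (T[T[x]], 0)

def Phi(x, j):                             # Phi(x,0)=(x,0), Phi(x,1)=(Tx,1)
    return (x, 0) if j == 0 else (T[x], 1)

def TxS(x, j):                             # product map; S = rotation on {0,1}
    return (T[x], 1 - j)

assert all(Phi(*R(x, j)) == TxS(*Phi(x, j)) for x in base for j in (0, 1))
```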

2.4 \({\mathscr {B}}\)-free systems: proof of Proposition 1.20

Let \({\mathscr {B}}\subset \mathbb {N}\), let \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\) and let \((X_\eta ,S)\) be the corresponding \({\mathscr {B}}\)-free system, with the underlying Mirsky measure \(\nu _\eta \). Recall that:

$$\begin{aligned} h({\widetilde{X}}_\eta ,S)={\overline{d}}(\mathcal {F}_{\mathscr {B}})=\nu _\eta (1), \end{aligned}$$

so \(\nu _\eta \ne \delta _{(\dots ,0,0,0,\dots )}\) is equivalent to \(h({\widetilde{X}}_\eta ,S)>0\). Thus, \(\nu _\eta \ne \delta _{(\dots ,0,0,0,\dots )}\) is necessary and sufficient for the existence of \(\kappa \) with \(h(\nu _\eta *\kappa )>0\).

Proof of Proposition 1.20

It was shown in Theorem 3.7 in [12] that the following are equivalent:

  • \((X_\eta ,S)\) is proximal,

  • \({\mathscr {B}}\) contains an infinite pairwise coprime subset,

  • the support of \(\eta \) does not contain a two-sided infinite arithmetic progression.

Thus, in order to complete the proof of Proposition 1.20, we need to show that, in the proximal case, for infinitely many \(k\geqslant 1\) the block \(10\ldots 01\) (with k zeros between the 1's) has positive Mirsky measure \(\nu _\eta \). An important notion in the theory of \({\mathscr {B}}\)-free systems is that of tautness [23], defined in terms of the logarithmic density of sets of multiples. We say that \({\mathscr {B}}\) is taut if for any \(b\in {\mathscr {B}}\), we have

$$\begin{aligned} \varvec{\delta }(\mathcal {M}_{{\mathscr {B}}}) > \varvec{\delta }(\mathcal {M}_{{\mathscr {B}}\setminus \{b\}}), \end{aligned}$$

where \(\varvec{\delta }(A)=\lim _{N\rightarrow \infty }\frac{1}{\log N}\sum _{n\leqslant N}\frac{1}{n}\textbf{1}_{A}(n)\) for any \(A\subset \mathbb {Z}\). It was proved in [12] (see Theorem C and Lemma 4.11 therein) that given any \({\mathscr {B}}\), there exists a taut set \({\mathscr {B}}'\) such that \(\mathcal {M}_{{\mathscr {B}}'}\subset \mathcal {M}_{\mathscr {B}}\) and \(\nu _{\eta '}=\nu _\eta \). Keller [26] proved that the Mirsky measure of any taut set has full support. Therefore, whenever \(\nu _\eta =\nu _{\eta '}\ne \delta _{(\dots ,0,0,0,\dots )}\), any block of the form \(10\dots 01\) appearing in \(\eta '\) (and there are infinitely many such blocks, as we exclude the Dirac measure at \((\dots ,0,0,0,\dots )\)!) is in fact of positive \(\nu _\eta \)-measure. \(\square \)
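To make the combinatorics in this proof concrete, the following Python sketch computes a window of \(\eta =\mathbb {1}_{\mathcal {F}_{\mathscr {B}}}\) for the classical choice \({\mathscr {B}}=\{p^2: p\text { prime}\}\) (so \(\mathcal {F}_{\mathscr {B}}\) is the set of square-free numbers; this \({\mathscr {B}}\) is infinite and pairwise coprime, i.e. we are in the proximal case) and lists the lengths k of the blocks \(10\ldots 01\) occurring in the window. The truncation of \({\mathscr {B}}\) below is harmless on this window, since \(17^2>200\).

```python
# A window of eta = 1_{F_B} for B = {p^2 : p prime} (square-free numbers),
# together with the gap lengths k of the observed blocks 1 0...0 1.
def eta(N, B):
    """Characteristic function of the B-free numbers in [1, N]."""
    out = [1] * (N + 1)
    for b in B:
        for m in range(b, N + 1, b):
            out[m] = 0
    return out[1:]

B = [p * p for p in (2, 3, 5, 7, 11, 13)]   # truncation, exact up to N = 200
e = eta(200, B)

ones = [i for i, v in enumerate(e) if v == 1]
gaps = sorted({b - a - 1 for a, b in zip(ones, ones[1:])})
print(gaps)   # several values of k >= 0 already occur in this short window
```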