1 Introduction

A (deterministic) substitution replaces each symbol in a finite or infinite string by a concatenation of symbols, according to a fixed rule. If this replacement is instead performed randomly, we speak of a random substitution. The data necessary to determine a random substitution can be given in terms of a tuple \((\vartheta ,{\mathbf {P}})\), where \(\vartheta \) encodes all the possible replacement rules and \({\mathbf {P}}\) the associated probability parameters. To a given random substitution \((\vartheta ,{\mathbf {P}})\), we associate a sequence space \(X_{\vartheta }\), called a random substitution subshift. In non-degenerate cases, this subshift does not depend on the choice of \({\mathbf {P}}\). The bi-infinite sequences \(x \in X_{\vartheta }\) are characterised by the property that every pattern in x can be generated by iterating \(\vartheta \), starting from a single symbol. Fundamental properties of \(\vartheta \) are mirrored by topological, combinatorial and measure theoretic properties of \(X_{\vartheta }\). The influence of \({\mathbf {P}}\) is captured by the choice of a particular probability measure \(\mu _{{\mathbf {P}}}\) on \(X_{\vartheta }\), called the frequency measure of \((\vartheta ,{\mathbf {P}})\). In many cases, these subshifts combine, in a non-trivial manner, properties of classic examples such as subshifts of finite type and (deterministic) substitution subshifts. In fact, these two well-studied classes can be interpreted as special cases of random substitution subshifts [18, 37].

Positive topological entropy for random substitutions was identified in the pioneering work of Godrèche and Luck [16] in 1989, where they introduced and focused on a single example, the random Fibonacci substitution. This was later shown to hold in general for random substitutions [37] and places them in stark contrast to their deterministic counterparts. While they have positive entropy, indicating disorder, random substitutions often admit long-range correlations presenting as a non-trivial pure-point component in the diffraction spectrum of a corresponding quasicrystal [4, 16, 28]. This competition between order and disorder, and between long- and short-range correlations, suggests an intricate combinatorial structure which warrants careful study.

The presence of an inherent hierarchical structure allows for the application of renormalisation methods in the study of random substitutions. Leveraging these techniques, the topological entropy was calculated for several examples of random substitution subshifts, see for instance [16, 30], and a unified approach was later provided in [17]. There, it was shown that for subshifts of primitive and compatible random substitutions, the topological entropy coincides with the notion of inflation word entropy, which is characterised in terms of the substitution branching process as opposed to the subshift. This builds a natural bridge to the point of view adopted in formal language theory, where random substitutions—known as (stochastic) E0L, or L systems—are classified according to the set of accessible inflation words [35, 40]. Similarly, the Martin boundaries of random substitutions, studied by Denker and Koslicki [23], are limiting objects of the stochastic process induced by a random substitution, rather than being defined for the associated subshift.

Topological entropy is almost by definition blind to the generating probabilities assigned to a random substitution. This is not the case for aspects such as word frequencies and diffraction spectra, which are almost-sure properties in the limit of an appropriate substitution Markov process [33]. Alternatively, these properties can be associated with the frequency measure \(\mu _{{\mathbf {P}}}\), which is ergodic with respect to the shift-action [19]. It is therefore reasonable to treat entropy on the same footing, interpreting it as a quantity that is generic with respect to a frequency measure that reflects the underlying Markov process. What’s more, this perspective more closely reflects the original context considered by Godrèche and Luck [16], who were interested in random substitutions as models for generating physical quasicrystals, whose empirical entropy depends on the underlying Markov process.

A seminal paper of Mandelbrot on turbulence in a fluid [26], which inspired the first formal setup of random substitutions in the physics literature [33], initiated research into fractal percolation [7, 20, 32]. Random substitutions have proved a useful tool to model this phenomenon [11, 12] and it was shown by Dekking, Grimmett and Meester [10, 11] that varying the underlying generating probabilities gives rise to several phase transitions. In the one-dimensional setting, we show that the associated entropy depends continuously on the generating probabilities and give a closed form expression in many cases. This enables us to single out those parameters that give maximal entropy. We expect that many of the methods established in this paper can be generalised to higher dimensions, which would provide a way to determine the phase in a random percolation model that gives rise to maximal entropy.

More explicitly, we study the entropy of frequency measures corresponding to primitive random substitutions (isolated examples have been previously studied in [39]). We show that the entropy of these measures coincides with a new notion of entropy characterised in terms of inflation words (Theorem 3.3). For subshifts of primitive and compatible random substitutions, we demonstrate the existence of a measure of maximal entropy that is realised as a weak limit of frequency measures (Theorem 4.2). Further, under mild conditions, we prove that there exists a frequency measure of maximal entropy, and for a large class of random substitution subshifts, we verify that this measure is the unique measure of maximal entropy (Theorem 4.8). Indeed, determining dynamical systems which are intrinsically ergodic (i.e. those which exhibit a unique measure of maximal entropy) is a fundamental problem at the interface of ergodic theory and topological dynamics, and stems from the foundational work of Bowen [5]. There, it was shown that a dynamical system which is expansive and satisfies the specification property is intrinsically ergodic. Bowen’s proof relies on combinatorial arguments to establish a (weak) Gibbs property for a certain measure of maximal entropy, from which uniqueness of the measure follows. Beyond specification, for instance for \(\beta \)-shifts, similar strategies can be employed [8, 9]. However, as with Bowen’s proof, central to these strategies is the use of a Gibbs property. In our case there exists an obstruction to using these methods in that frequency measures of maximal entropy do not satisfy the Gibbs properties given in [5, 8, 9]. Nevertheless, by establishing a weaker Gibbs property on cylinder sets of inflation words (Lemma 4.11), we are able to circumvent this obstruction to obtain Theorem 4.8.

Outline. In Sect. 2 we introduce our key notation and definitions. We summarise the main results on topological entropy from [17] in Sect. 2.3, and give the definition of the frequency measure corresponding to a primitive random substitution in Sect. 2.4.

In Sect. 3 we introduce the notion of measure theoretic inflation word entropy and state our first main result, Theorem 3.3, which shows, for primitive random substitutions, that this new notion of entropy coincides with the entropy of the corresponding frequency measure. We also obtain explicit upper and lower bounds. Under some additional assumptions, closed form expressions for the entropy can be obtained from Theorem 3.5.

We conclude with Sects. 4 and 5. Section 4 is devoted to measures of maximal entropy and intrinsic ergodicity of random substitution subshifts, and Sect. 5 contains a number of examples that illustrate our main results and a collection of open questions.

2 Preliminaries

The symbolic notation introduced in this section is mostly in line with [3, 25], to which we refer the reader for further details. For background on random substitutions as introduced below, we point the reader to [19, 37].

An alphabet \({\mathcal {A}} = \{ a_{1}, \ldots , a_{d} \}\), for some \(d \in {\mathbb {N}}\), is a finite set of symbols \(a_{i}\), which we call letters, equipped with the discrete topology. A word u with letters in \({\mathcal {A}}\) is a finite concatenation of letters, namely \(u = a_{i_{1}} \cdots a_{i_{n}}\) for some \(n \in {\mathbb {N}}\). We write \(|u |= n\) for the length of the word u, and for \(m \in {\mathbb {N}}\), we let \({\mathcal {A}}^{m}\) denote the set of all words of length m with letters in \({\mathcal {A}}\). We set \({\mathcal {A}}^{+} = \bigcup _{m \in {\mathbb {N}}} {\mathcal {A}}^{m}\) and let \({\mathcal {A}}^{{\mathbb {Z}}} = \{ \cdots a_{i_{-1}} a_{i_{0}} a_{i_{1}} \cdots : a_{i_j} \in {\mathcal {A}} \; \text {for all} \; j \in {\mathbb {Z}} \}\) denote the set of all bi-infinite sequences with elements in \({\mathcal {A}}\) and endow \({\mathcal {A}}^{{\mathbb {Z}}}\) with the product topology. With this topology, the space \({\mathcal {A}}^{{\mathbb {Z}}}\) is compact and metrisable.

If i and \(j \in {\mathbb {Z}}\) with \(i \le j\), and \(x = \cdots x_{-1} x_{0} x_{1} \cdots \in {\mathcal {A}}^{{\mathbb {Z}}}\), then we let \(x_{[i,j]} = x_i x_{i+1} \cdots x_{j}\). We use the same notation if \(v \in {\mathcal {A}}^{+}\) and \(1 \le i \le j \le |v|\). For u and \(v \in {\mathcal {A}}^{+}\) (or \(v \in {\mathcal {A}}^{{\mathbb Z}}\)), we write \(u \triangleleft v\) if u is a subword of v, namely if there exist i and \(j \in {\mathbb {Z}}\) with \(i \le j\) so that \(u = v_{[i, j]}\). For u and \(v \in {\mathcal {A}}^{+}\), we set \(|v |_u\) to be the number of (possibly overlapping) occurrences of u as a subword of v.

If \(u = a_{i_1} \cdots a_{i_n}\) and \(v = a_{j_1} \cdots a_{j_m} \in {\mathcal {A}}^{+}\), for some n and \(m \in {\mathbb {N}}\), we write uv for the concatenation of u and v, that is, we set \(uv = a_{i_1} \cdots a_{i_n} a_{j_1} \cdots a_{j_m} \in {\mathcal {A}}^{n+m}\). The abelianisation of a word \(u \in {\mathcal {A}}^{+}\) is the vector \(\Phi (u) \in {\mathbb N}_0^d\), defined by \(\Phi (u)_{i} = |u |_{a_i}\) for all \(i \in \{ 1, \ldots , d \}\).

For a set B, we let \(\# B\) be the cardinality of B and let \({\mathcal {F}}(B)\) be the set of non-empty finite subsets of B.
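
Since these counting conventions are used throughout, the following minimal Python sketch records them; the function names and the encoding of words as strings are ours, for illustration only.

```python
from collections import Counter

def occurrences(v: str, u: str) -> int:
    """Number |v|_u of (possibly overlapping) occurrences of u in v."""
    return sum(1 for i in range(len(v) - len(u) + 1) if v[i:i + len(u)] == u)

def abelianisation(u: str, alphabet: str) -> list[int]:
    """The abelianisation Phi(u): letter counts of u, ordered by the alphabet."""
    counts = Counter(u)
    return [counts[a] for a in alphabet]

assert occurrences("aaa", "aa") == 2                 # overlapping occurrences count
assert abelianisation("ab", "ab") == abelianisation("ba", "ab") == [1, 1]
```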

2.1 Random Substitutions and Their Subshifts

We define a random substitution via the data that is required to determine its action on letters. In the second step we extend it to a random map on words.

Definition 2.1

Let \({\mathcal {A}} = \{ a_{1}, \ldots , a_{d} \}\) be a finite alphabet. A random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) is a finite-set-valued function \(\vartheta :{\mathcal {A}} \rightarrow {\mathcal {F}}({\mathcal {A}}^{+})\) together with a set of non-degenerate probability vectors

$$\begin{aligned} {\mathbf {P}} = \left\{ {\mathbf {p}}_i = ( p_{i, 1}, \ldots , p_{i, r_i} ) : r_i = \# \vartheta (a_i), \, {\mathbf {p}}_i \in (0,1]^{r_i} \text { and } \sum _{j=1}^{r_i} p_{i,j} = 1 \text { for all } 1 \le i \le d \right\} , \end{aligned}$$

such that

$$\begin{aligned} \vartheta _{{\mathbf {P}}} :a_i \mapsto {\left\{ \begin{array}{ll} s^{(i,1)} &{} \text {with probability } p_{i, 1},\\ \quad \vdots &{}\qquad \qquad \quad \vdots \\ s^{(i,r_i)} &{} \text {with probability } p_{i, r_i}, \end{array}\right. } \end{aligned}$$

for every \(1 \le i \le d\), where \(\vartheta (a_i) = \{ s^{(i,j)} \}_{1\le j \le r_i}\). We call each \(s^{(i,j)}\) a realisation of \(\vartheta _{{\mathbf {P}}}(a_i)\). If there exists an integer \(\ell \ge 2\) such that \(|s^{(i,j)} |= \ell \) for all \(i \in \{ 1, \ldots , d \}\) and \(j \in \{ 1, \ldots , r_i \}\), then we call \(\vartheta _{{\mathbf {P}}}\) a constant length random substitution of length \(\ell \). If \(r_i = 1\) for all \(i \in \{ 1, \ldots , d \}\), then we call \(\vartheta _{{\mathbf {P}}}\) deterministic.

Example 2.2

(Random period doubling) Let \({\mathcal {A}} = \{ a, b\}\), and let \(p \in (0,1)\). The random period doubling substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) is the constant length random substitution of length 2 given by

$$\begin{aligned} \vartheta _{{\mathbf {P}}} :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} ab &{} \text {with probability } p,\\ ba &{} \text {with probability } 1-p, \end{array}\right. }\\ b \mapsto aa \quad \text {with probability } 1, \end{array}\right. } \end{aligned}$$

with defining data \(r_{a} = 2\), \(r_{b} = 1\), \(s^{(a, 1)} = ab\), \(s^{(a, 2)} = ba\), \(s^{(b, 1)} = aa\), \({\mathbf {P}} = \{ {\mathbf {p}}_{a} = (p, 1-p), {\mathbf {p}}_{b} = (1) \}\), and corresponding set-valued function \(\vartheta :a \mapsto \{ab,ba\}, b \mapsto \{aa\}\).
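
As a quick illustration of Definition 2.1, the following Python sketch encodes this defining data and samples a realisation of \(\vartheta _{{\mathbf {P}}}^{3}(a)\); the dictionary encoding is ours and purely illustrative.

```python
import random

p = 0.5
theta = {"a": ["ab", "ba"], "b": ["aa"]}    # the set-valued function theta
probs = {"a": [p, 1 - p], "b": [1.0]}       # the probability data P

def substitute(word: str) -> str:
    """One realisation of theta_P(word): each letter is replaced
    independently according to its probability vector."""
    return "".join(random.choices(theta[c], probs[c])[0] for c in word)

random.seed(0)
w = "a"
for _ in range(3):
    w = substitute(w)    # a realisation of theta_P^3(a), of length 2^3 = 8
print(w)
```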

In the following we describe how a random substitution \(\vartheta _{{\mathbf {P}}}\) determines a (countable state) Markov matrix Q, indexed by \({\mathcal {A}}^{+} \times {\mathcal {A}}^{+}\). We interpret the entry \(Q_{u,v}\) as the probability to map a word u to a word v under the random substitution. Formally, \(Q_{a_i, s^{(i,j)}} = p_{i,j}\) for \(j \in \{1,\ldots , r_i\}\) and \(Q_{a_i,v} =0\) if \(v \notin \vartheta (a_i)\). We extend the action of \(\vartheta _{{\mathbf {P}}}\) to finite words by mapping each letter independently to one of its realisations; this independence distinguishes random substitutions from S-adic systems. More precisely, given \(n \in {\mathbb {N}}\), \(u = a_{i_1} \cdots a_{i_n} \in {\mathcal {A}}^{n}\) and \(v \in {\mathcal {A}}^{+}\) with \(|v| \ge n\), we let

$$\begin{aligned} {\mathcal {D}}_n(v) = \{ (v^{(1)},\ldots , v^{(n)}) \in ({\mathcal {A}}^{+})^{n} : v^{(1)} \cdots v^{(n)} = v \} \end{aligned}$$

denote the set of all decompositions of v into n individual words and set

$$\begin{aligned} Q_{u,v} = \sum _{(v^{(1)},\ldots ,v^{(n)}) \in {\mathcal {D}}_n(v)} \prod _{j = 1}^{n} Q_{a_{i_j},v^{(j)}}. \end{aligned}$$

In words, \(\vartheta _{{\mathbf {P}}}(u) = v\) with probability \(Q_{u,v}\).
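
To make this concrete, here is a small Python sketch (our own illustrative code, reusing the dictionary encoding of Example 2.2) that computes \(Q_{u,v}\) recursively: splitting off the block assigned to the first letter of u and summing over all admissible splits enumerates \({\mathcal {D}}_n(v)\).

```python
p = 0.5
theta = {"a": ["ab", "ba"], "b": ["aa"]}
probs = {"a": [p, 1 - p], "b": [1.0]}

def Q_letter(a: str, s: str) -> float:
    """Q_{a,s}: the probability that the letter a is mapped to the word s."""
    return probs[a][theta[a].index(s)] if s in theta[a] else 0.0

def Q(u: str, v: str) -> float:
    """Q_{u,v}: the probability that theta_P(u) = v, summing over all
    decompositions of v into |u| inflation words."""
    if len(u) == 1:
        return Q_letter(u, v)
    return sum(Q_letter(u[0], v[:m]) * Q(u[1:], v[m:])
               for m in range(1, len(v) - len(u) + 2))

assert Q("ab", "abaa") == p        # forces a -> ab and b -> aa
assert Q("ab", "baaa") == 1 - p    # forces a -> ba and b -> aa
```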

For \(u \in {\mathcal {A}}^{+}\), let \((\vartheta _{{\mathbf {P}}}^{n}(u))_{n \in {\mathbb {N}}}\) be a stationary Markov chain on some probability space \((\Omega _u, {\mathcal {F}}_u, {\mathbb {P}}_u)\), with Markov matrix given by Q, that is

$$\begin{aligned} {\mathbb {P}}_u [\vartheta _{{\mathbf {P}}}^{n+1}(u) = w \mid \vartheta _{{\mathbf {P}}}^{n}(u) = v] = {\mathbb {P}}_v [\vartheta _{{\mathbf {P}}}(v) = w] = Q_{v,w}, \end{aligned}$$

for all v and \(w \in {\mathcal {A}}^{+}\), and \(n \in {\mathbb {N}}\). In particular, we have

$$\begin{aligned} {\mathbb {P}}_u [\vartheta _{{\mathbf {P}}}^{n}(u) = v] = (Q^{n})_{u,v} \end{aligned}$$

for all u and \(v \in {\mathcal {A}}^{+}\), and \(n \in {\mathbb {N}}\). We often write \({\mathbb {P}}\) for \({\mathbb {P}}_u\) if the initial word is understood. In this case, we also write \({\mathbb {E}}\) for the expectation with respect to \({\mathbb {P}}\). As before, we call v a realisation of \(\vartheta ^{n}_{{\mathbf {P}}}(u)\) if \((Q^{n})_{u,v} > 0\) and set

$$\begin{aligned} \vartheta ^{n}(u) = \{ v \in {\mathcal {A}}^{+} : (Q^{n})_{u,v} > 0\} \end{aligned}$$

to be the set of all realisations of \(\vartheta _{{\mathbf {P}}}^{n}(u)\). Conversely, we may regard \(\vartheta ^{n}_{{\mathbf {P}}}(u)\) as the set \(\vartheta ^{n}(u)\), endowed with the additional structure of a probability vector. If \(u = a \in {\mathcal {A}}\) is a letter, we call a word \(v \in \vartheta ^{k}(a)\) a (level-k) inflation word. The approach of defining a random substitution in terms of an associated Markov chain goes back to work of Peyrière [33] and was pursued further by Koslicki [22], and Denker and Koslicki [23].
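
Forgetting the probabilities, the sets \(\vartheta ^{n}(a)\) can be computed by iterating the set-valued substitution, as in the following illustrative sketch (dictionary encoding as before).

```python
from itertools import product

theta = {"a": ["ab", "ba"], "b": ["aa"]}

def theta_set(word: str) -> set[str]:
    """The set theta(word) of realisations of theta_P(word)."""
    return {"".join(tup) for tup in product(*(theta[c] for c in word))}

def theta_power(a: str, n: int) -> set[str]:
    """The set theta^n(a) of level-n inflation words."""
    words = {a}
    for _ in range(n):
        words = set().union(*(theta_set(w) for w in words))
    return words

# Random period doubling: theta^2(a) is the union of theta(ab) and theta(ba)
assert theta_power("a", 2) == {"abaa", "baaa", "aaab", "aaba"}
```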

For many structural properties of \(\vartheta _{{\mathbf {P}}}\) the choice of (non-degenerate) probability vectors is immaterial. In these cases, one sometimes refers to \(\vartheta \) instead of \(\vartheta _{{\mathbf {P}}}\) as a random substitution, see for instance [17]. On the other hand, for some applications, one needs additional structure on the probability space. In fact, there is an underlying branching process, similar to a Galton–Watson process, that allows one to construct more refined random variables, see [19] for further details.

Given a random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) over an alphabet \({\mathcal {A}} = \{ a_{1}, \ldots , a_{d} \}\) with cardinality \(d \in {\mathbb {N}}\), we define the substitution matrix \(M = M_{\vartheta _{{\mathbf {P}}}} \in {\mathbb {R}}^{d \times d}\) of \(\vartheta _{{\mathbf {P}}}\) by

$$\begin{aligned} M_{i, j} = {\mathbb {E}}[|\vartheta _{{\mathbf {P}}} (a_{j}) |_{a_{i}}] = \sum _{k = 1}^{r_{j}} p_{j, k} |s^{(j, k)} |_{a_{i}}. \end{aligned}$$

Since M has only non-negative entries, its spectral radius is also a real eigenvalue of maximal modulus, denoted by \(\lambda \). For notational convenience, we denote the maximal length of a (level-1) inflation word by

$$\begin{aligned} |\vartheta | = \max \{ |u| : u \in \vartheta (a), a \in {\mathcal {A}} \}. \end{aligned}$$

By construction, \(1 \le \lambda \le |\vartheta |\), where \(\lambda = 1\) occurs precisely if M is column-stochastic. This corresponds to the trivial case of a non-expanding random substitution, which we discard in the following. If the matrix M is primitive (i.e. if there exists a \(k \in {\mathbb {N}}\) such that all the entries of \(M^{k}\) are positive), Perron–Frobenius theory implies that \(\lambda \) is a simple eigenvalue and that the corresponding left and right eigenvectors \({\mathbf {L}} = (L_{1}, \ldots , L_{d})^{\top }\) and \({\mathbf {R}} = (R_{1}, \ldots , R_{d})^{\top }\) can be chosen to have strictly positive entries. We normalise these eigenvectors according to \(\Vert {\mathbf {R}} \Vert _{1} = 1 = {\mathbf {L}}^{\top } \, {\mathbf {R}}\). In this situation, we call \(\lambda \) the Perron–Frobenius eigenvalue of \(\vartheta _{{\mathbf {P}}}\), and \({\mathbf {L}}\) and \({\mathbf {R}}\) the left and right Perron–Frobenius eigenvectors of \(\vartheta _{{\mathbf {P}}}\), respectively.
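
For the random period doubling substitution of Example 2.2, the Perron–Frobenius data can be computed numerically as follows; this is a sketch using NumPy, with M entered by hand (note that M does not depend on p here).

```python
import numpy as np

# M_{i,j} = E|theta_P(a_j)|_{a_i}; columns correspond to the letters a and b.
M = np.array([[1.0, 2.0],     # ab and ba contain one a; aa contains two
              [1.0, 0.0]])    # ab and ba contain one b; aa contains none

eigvals, eigvecs = np.linalg.eig(M)
k = int(np.argmax(eigvals.real))
lam = eigvals.real[k]                   # Perron-Frobenius eigenvalue
R = np.abs(eigvecs[:, k].real)
R = R / R.sum()                         # normalise ||R||_1 = 1

wl, Vl = np.linalg.eig(M.T)
L = np.abs(Vl[:, int(np.argmax(wl.real))].real)
L = L / (L @ R)                         # normalise L^T R = 1

assert np.isclose(lam, 2.0) and np.allclose(R, [2/3, 1/3])
assert np.allclose(L, [1.0, 1.0])       # constant length: L_i = 1 for all i
```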

Definition 2.3

We say that \(\vartheta _{{\mathbf {P}}}\) is primitive if \(M = M_{\vartheta _{{\mathbf {P}}}}\) is primitive and its Perron–Frobenius eigenvalue satisfies \(\lambda > 1\).

We emphasise that for a random substitution \(\vartheta _{{\mathbf {P}}}\), being primitive is independent of the (non-degenerate) data \({\mathbf {P}}\). In this sense, primitivity is a property of \(\vartheta \) rather than \(\vartheta _{{\mathbf {P}}}\).

Remark 2.1

Primitivity is a standard assumption, both for deterministic and random substitutions. More general (random) substitutions can be treated by bringing M into an upper block-triangular normal form via an appropriate permutation of letters. Throughout most of this paper we stick to the primitive case to avoid technicalities.

For a constant length primitive random substitution of length \(\ell \), an elementary calculation shows that \(\lambda = \ell \); and for a given primitive random substitution \(\vartheta _{{\mathbf {P}}}\) with Perron–Frobenius eigenvalue \(\lambda \) and for \(k \in {\mathbb {N}}\), we have that \(M_{\vartheta _{{\mathbf {P}}}}^{k} = M_{\vartheta _{{\mathbf {P}}}^{k}}\) and hence the Perron–Frobenius eigenvalue of \(\vartheta _{{\mathbf {P}}}^{k}\) is \(\lambda ^{k}\).

Given a random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\), a word \(u \in {\mathcal {A}}^{+}\) is called (\(\vartheta \)-)legal if there exist \(a_i \in {\mathcal {A}}\) and \(k \in {\mathbb {N}}\) such that u appears as a subword of some word in \(\vartheta ^{k} (a_i)\). We define the language of \(\vartheta \) by \({\mathcal {L}}_{\vartheta } = \{ u \in {\mathcal {A}}^{+} : u \text{ is } \vartheta \text{-legal } \}\) and, for \(w \in {\mathcal {A}}^{+} \cup {\mathcal {A}}^{{\mathbb {Z}}}\), we let \({\mathcal {L}} (w) = \{ u \in {\mathcal {A}}^{+} : u \triangleleft w \}\) denote the language of w.

Definition 2.4

The random substitution subshift of a random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) is the system \((X_{\vartheta }, S)\), where \(X_{\vartheta } = \{ w \in {\mathcal {A}}^{{\mathbb {Z}}} : {\mathcal {L}} (w) \subseteq {\mathcal {L}}_{\vartheta } \}\) and S denotes the (left) shift map, defined by \(S (w)_{i} = w_{i+1}\) for each \(w \in X_{\vartheta }\).

If \(\vartheta _{{\mathbf {P}}}\) is primitive, the corresponding sequence space \(X_{\vartheta }\) is always non-empty [19]. The notation \(X_{\vartheta }\) mirrors the fact that the random substitution subshift does not depend on the choice of \({\mathbf {P}}\). We endow \(X_{\vartheta }\) with the subspace topology inherited from \({\mathcal {A}}^{{\mathbb {Z}}}\), and since \(X_{\vartheta }\) is defined in terms of a language, it is a compact S-invariant subspace of \({\mathcal {A}}^{{\mathbb {Z}}}\). Hence, \(X_{\vartheta }\) is a subshift. For \(n \in {\mathbb {N}}\), we write \({\mathcal {L}}_{\vartheta }^{n} = {\mathcal {L}}_\vartheta \cap {\mathcal {A}}^{n}\) and \({\mathcal {L}}^{n} (w) = {\mathcal {L}}(w) \cap {\mathcal {A}}^{n}\) to denote the subsets of \({\mathcal {L}}_{\vartheta }\) and \({\mathcal {L}} (w)\), respectively, consisting of words of length n. We also note that, when \(\vartheta \) is primitive, \(X_{\vartheta ^{k}} = X_{\vartheta }\) for all \(k \in {\mathbb {N}}\).

The set-valued function \(\vartheta \) naturally extends to \(X_{\vartheta }\), where for \(w = \cdots w_{-1} w_{0} w_{1} \cdots \in X_{\vartheta }\) we let \(\vartheta (w)\) denote the (infinite) set of sequences of the form \(v = \cdots v_{-2} v_{-1}.v_0 v_1 \cdots \), with \(v_j \in \vartheta (w_j)\) for all \(j \in {\mathbb {Z}}\). From the definition, it is easily verified that \(\vartheta (X_{\vartheta }) \subset X_{\vartheta }\). Some properties of \(\vartheta \) are reminiscent of continuous functions, although \(\vartheta \) itself is not a function. The following property will be useful in our discussion of intrinsic ergodicity (Sect. 4.2) and is also of independent interest.

Lemma 2.5

If \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) is a random substitution and \(X \subset {\mathcal {A}}^{{\mathbb {Z}}}\) is compact, then \(\vartheta (X)\) is compact.

Proof

It suffices to show that \(\vartheta (X)\) is closed. Let \((y^{(n)})_{n \in {{\mathbb {N}}}}\) denote a sequence in \(\vartheta (X)\) and assume that this sequence converges to some \(y \in {\mathcal {A}}^{{\mathbb {Z}}}\). We need to show that \(y \in \vartheta (X)\). To this end, let \((x^{(n)})_{n \in {\mathbb {N}}}\) be a sequence in X with \(y^{(n)} \in \vartheta (x^{(n)})\) for all \(n \in {\mathbb {N}}\). By compactness of X, this sequence has an accumulation point \(x = \cdots x_{-1} x_{0} x_{1} \cdots \in X\). By restricting to an appropriate subsequence, we may assume that

$$\begin{aligned} x^{(m)}_{[-n,n]} = x_{[-n,n]} \end{aligned}$$

for all m and \(n \in {{\mathbb {N}}}\) with \(m \ge n\). In which case,

$$\begin{aligned} y^{(n)}_{\,[-n, n]} = w^{(n)}_{-n} \cdots w^{(n)}_{-1} . w^{(n)}_{0} \cdots w^{(n)}_{n} \end{aligned}$$

with \(w^{(n)}_j \in \vartheta (x_j)\) for all \(j \in \{-n, \ldots , n\}\). As \(( y^{(m)} )_{m \in {\mathbb {N}}}\) converges to y, we may assume, for \(n \in {\mathbb {N}}\),

$$\begin{aligned} y_{\,[-n, n]} = w^{(n)}_{-n} \cdots w^{(n)}_{-1}. w^{(n)}_{0} \cdots w^{(n)}_n, \end{aligned}$$

again by possibly restricting to an appropriate subsequence. By a standard diagonal argument utilising the pigeonhole principle, we can choose \(w_j \in \vartheta (x_j)\) for all \(j \in {{\mathbb {Z}}}\) such that \(y = \cdots w_{-2} w_{-1}. w_0 w_1 w_2 \cdots \). Namely, we have that \(y \in \vartheta (x)\). \(\square \)

2.2 Special Classes of Random Substitutions

Primitive random substitutions produce a wide variety of subshifts, including for example all topologically transitive shifts of finite type [18] as well as all (primitive) deterministic substitution subshifts. It is therefore reasonable to expect that further assumptions on the random substitution are required in order to obtain more detailed control over its (measure theoretic) entropy. Indeed, there is a useful property that allows us to obtain more precise estimates; this property can be shown to fail in the general primitive setting. Recall that for \(v = v_1 \cdots v_n\) the random word \(\vartheta _{{\mathbf {P}}}(v) = \vartheta _{{\mathbf {P}}}(v_1) \cdots \vartheta _{{\mathbf {P}}}(v_n)\) can be written as a concatenation of the random variables \(\vartheta _{{\mathbf {P}}}(v_1),\ldots ,\vartheta _{{\mathbf {P}}}(v_n)\). In general, there might be several realisations of \((\vartheta _{{\mathbf {P}}}(v_1),\ldots ,\vartheta _{{\mathbf {P}}}(v_n))\) that concatenate to the same realisation of \(\vartheta _{{\mathbf {P}}}(v)\). In some situations this phenomenon can be excluded.

Definition 2.6

We say that \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths if for every \(n, k \in {\mathbb N}\) and \(v \in {\mathcal {L}}_{\vartheta }^n\), the random variable \((\vartheta _{{\mathbf {P}}}^k(v_1), \ldots , \vartheta _{{\mathbf {P}}}^k(v_n))\) is completely determined by \(\vartheta _{{\mathbf {P}}}^k(v)\).

While the definition above is most adequate for our purposes, it is worth pointing out that the property of having unique realisation paths does not depend on the choice of \({\mathbf {P}}\). Indeed, it is straightforward to verify that \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths if and only if for all \(n, k \in {\mathbb N}\) and \(v \in \mathcal L_{\vartheta }^n\) the concatenation map

$$\begin{aligned} \vartheta ^k(v_1) \times \cdots \times \vartheta ^k(v_n) \rightarrow \mathcal L_{\vartheta }, \quad (w_1,\ldots ,w_n) \mapsto w_1 \cdots w_n \end{aligned}$$

is injective.
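
For small examples, this injectivity can be tested directly; the sketch below does so for \(k = 1\) and a finite list of legal words, which of course only yields a necessary check (our own illustrative code).

```python
from itertools import product

def injective_on(theta: dict, legal_words: list) -> bool:
    """Test whether (w_1, ..., w_n) -> w_1 ... w_n is injective on
    theta(v_1) x ... x theta(v_n) for each listed legal word v."""
    for v in legal_words:
        seen = {}
        for tup in product(*(theta[c] for c in v)):
            w = "".join(tup)
            if w in seen and seen[w] != tup:
                return False   # two realisation paths yield the same word
            seen[w] = tup
    return True

# Random period doubling passes the check on its legal 2-letter words:
assert injective_on({"a": ["ab", "ba"], "b": ["aa"]}, ["aa", "ab", "ba"])
# Example 2.10 below fails already on v = ab, via (a, ba) and (ab, a):
assert not injective_on({"a": ["a", "ab"], "b": ["a", "ba"]}, ["ab"])
```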

The property of having unique realisation paths might appear difficult to check in general. However, there is a general class of random substitutions that satisfy this condition and that is of relevance in the context of random tilings. In the following, we denote by a marginal of \(\vartheta _{{\mathbf {P}}}\) a deterministic substitution \(\varrho \) on the same alphabet \(\mathcal A\), such that \(\varrho (a) \in \vartheta (a)\) for all \(a \in \mathcal A\).

Definition 2.7

We say that a primitive random substitution \(\vartheta _{{\mathbf {P}}}\) is geometrically compatible if there is a real number \(\lambda > 1\) and a vector \({\mathbf {L}}\) with strictly positive entries, such that \({\mathbf {L}}\) is a left eigenvector with eigenvalue \(\lambda \) for all marginals of \(\vartheta _{{\mathbf {P}}}\).

In this situation, it is easy to check that \(\lambda \) and \({\mathbf {L}}\) are indeed Perron–Frobenius data for the substitution matrix M of \(\vartheta _{{\mathbf {P}}}\). Geometric compatibility is equivalent to the assumption that \(\lambda \) and the corresponding eigenline spanned by \({\mathbf {L}}\) are independent of the choice of \({\mathbf {P}}\), which is easy to check for a given example. Moreover, it provides a natural setting in which a random substitution can be interpreted as a random inflation rule on an associated tiling dynamical system. In this geometric model, every letter \(a_i\) is identified with a tile of length \({\mathbf {L}}_i\). This motivates the term geometrically compatible.

Remark 2.2

The class of geometrically compatible random substitutions contains all (primitive) constant-length random substitutions. Indeed, if \(\vartheta _{{\mathbf {P}}}\) is of length \(\ell \) we have \(\lambda = \ell \) and \({\mathbf {L}}_i = 1\) for all \(1 \le i \le d\), irrespective of \({\mathbf {P}}\).

Geometric compatibility also generalises compatibility for primitive random substitutions. Compatibility has been a standard assumption in much recent work on random substitutions and is particularly useful in those settings where \(\vartheta \) instead of \(\vartheta _{{\mathbf {P}}}\) is regarded as a random substitution.

Definition 2.8

We say that a random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) is compatible if for all \(a \in {\mathcal {A}}\), and u and \(v \in \vartheta (a)\), we have \(\Phi (u) = \Phi (v)\).

Observe that compatibility is independent of the choice of probabilities, and that a random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) is compatible if and only if for all \(u \in {\mathcal {A}}^{+}\), we have that \(|s |_{a} = |t |_{a}\) for all s and \(t \in \vartheta (u)\), and \(a \in {\mathcal {A}}\). We write \(|\vartheta (u) |_{a}\) to denote this common value, and let \(|\vartheta (u) |\) denote the common length of words in \(\vartheta (u)\). In which case, letting \(M = M_{\vartheta _{{\mathbf {P}}}}\) denote the substitution matrix of \(\vartheta _{{\mathbf {P}}}\), we have that \(M_{i, j} = |\vartheta (a_j) |_{a_i}\) for all \(a_i\) and \(a_j \in {\mathcal {A}}\). Note that the random period doubling substitution defined in Example 2.2 is compatible, since \(\Phi (ab) = \Phi (ba) = (1,1)^{\top }\), and is primitive, since the square of its substitution matrix is positive.

The class of geometrically compatible random substitutions contains all compatible random substitutions and all constant length random substitutions but is not confined to them.

Example 2.9

Let \(\vartheta _{{\mathbf {P}}}\) be the primitive random substitution on the alphabet \({\mathcal {A}} = \{a,b\}\) defined by

$$\begin{aligned} \vartheta _{{\mathbf {P}}} :{\left\{ \begin{array}{ll} a \mapsto &{} abb, \\ b \mapsto &{} {\left\{ \begin{array}{ll} a &{} \text{ with } \text{ probability } p, \\ bb &{} \text{ with } \text{ probability } 1-p. \end{array}\right. } \end{array}\right. } \end{aligned}$$

This random substitution is geometrically compatible with \({\mathbf {L}} = (2,1)^{\top }\) and \(\lambda = 2\). It is neither of constant length nor compatible.

Example 2.10

Let \(\vartheta _{{\mathbf {P}}}\) be the primitive random substitution defined by

$$\begin{aligned} \vartheta _{{\mathbf {P}}} :a \mapsto {\left\{ \begin{array}{ll} a &{} \text{ with } \text{ probability } p, \\ ab &{} \text{ with } \text{ probability } 1-p, \end{array}\right. } \qquad b \mapsto {\left\{ \begin{array}{ll} a &{} \text{ with } \text{ probability } q, \\ ba &{} \text{ with } \text{ probability } 1-q. \end{array}\right. } \end{aligned}$$

This is neither geometrically compatible nor does it have unique realisation paths. The latter can be seen from the fact that \((a, ba)\) and \((ab, a)\) are two different realisations of \((\vartheta _{{\mathbf {P}}}(a), \vartheta _{{\mathbf {P}}}(b))\) that give rise to the same word \(aba \in \vartheta (ab)\).

Remark 2.3

Like primitivity, geometric compatibility is stable under taking powers of the random substitution at hand. That is, if \(\vartheta _{{\mathbf {P}}}\) is geometrically compatible, then so is \(\vartheta _{{\mathbf {P}}}^n\) for all \(n \in {\mathbb N}\). This is because the Perron–Frobenius data \((\lambda ,{\mathbf {L}})\) of \(\vartheta _{{\mathbf {P}}}\) is independent of \({\mathbf {P}}\), which in turn implies that the Perron–Frobenius data \((\lambda ^n,{\mathbf {L}})\) of \(\vartheta _{{\mathbf {P}}}^{n}\) is independent of \({\mathbf {P}}\).

Lemma 2.11

Every primitive, geometrically compatible random substitution has unique realisation paths.

Proof

Let \(\vartheta _{{\mathbf {P}}}\) be primitive and geometrically compatible. Since the same holds for \(\vartheta _{{\mathbf {P}}}^k\), we may restrict to the case \(k = 1\) in the following. Let \(v \in {\mathcal {L}}_{\vartheta }^{n}\) and let u be a realisation of the random word

$$\begin{aligned} \vartheta _{{\mathbf {P}}}(v) = \vartheta _{{\mathbf {P}}}(v_1) \cdots \vartheta _{{\mathbf {P}}}(v_n) \end{aligned}$$

and \((u^1, \ldots , u^n)\) a corresponding realisation of \((\vartheta _{{\mathbf {P}}}(v_1),\ldots , \vartheta _{{\mathbf {P}}}(v_n))\) satisfying

$$\begin{aligned} u = u^1 \cdots u^n. \end{aligned}$$

Let \(M_1\) be the substitution matrix of a marginal of \(\vartheta _{{\mathbf {P}}}\) with \(v_1 \mapsto u^1\). Since \({\mathbf {L}}\) has strictly positive entries, there is a unique \(1\le m \le |u|\) such that

$$\begin{aligned} {\mathbf {L}}^{\top } \Phi (u_{[1,m]}) = {\mathbf {L}}^{\top } \Phi (u^1) = {\mathbf {L}}^{\top } M_1 \Phi (v_1) = \lambda {\mathbf {L}}_{v_1}. \end{aligned}$$

This determines \(u^1 = u_{[1,m]}\) unambiguously. Inductively, we find that \(u^k\) is uniquely determined by u for all \(1\le k \le n\). \(\square \)

For the reader’s convenience, we summarise the relation between the different characterisations of primitive random substitutions in Fig. 1.

Fig. 1  Implication diagram for some conditions on primitive random substitutions

2.3 Topological Entropy

The non-trivial topological entropy of random substitution subshifts distinguishes them from subshifts of deterministic substitutions, which always have zero topological entropy, see [34]. The topological entropy was calculated for several families of random substitutions in [16, 30]. There, the topological entropy was calculated from the growth rate of inflation words. This approach was unified by Gohlke [17], where the notion of inflation word entropy was introduced for compatible primitive random substitutions and shown to equal the topological entropy of the corresponding subshift.

For completeness, let us take a moment to recall the definition of the topological entropy of a subshift, see [6, 38] for further details. Given a subshift \((X, S)\), we define the language of the subshift by \(\mathcal L(X) = \{x_{[j,k]} : x \in X, j\le k \}\) and for each \(n \in {\mathbb {N}}\), we let \({\mathcal {L}}^{n} (X)\) denote the set of all words of length n in \({\mathcal {L}} (X)\). The topological entropy \(h_{\text {top}}(X)\) of the system \((X, S)\) is defined to be the quantity

$$\begin{aligned} h_{\text {top}}(X) = \lim _{n \rightarrow \infty } \frac{1}{n} \log ( \# {\mathcal {L}}^{n} (X)). \end{aligned}$$
(2.1)

Given a primitive and compatible random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) over the alphabet \({\mathcal {A}} = \{ a_1, \ldots , a_d \}\), we have that \({\mathcal {L}} (X_{\vartheta }) = {\mathcal {L}}_{\vartheta }\). For each \(m \in {\mathbb {N}}\), let \({\mathbf {q}}_{m} = (q_{m, 1}, \dots , q_{m, d})\) denote the vector defined by

$$\begin{aligned} q_{m,i} = \log (\# \vartheta ^{m} (a_i)) \end{aligned}$$
(2.2)

for \(i \in \{ 1, \ldots , d \}\). When the limit exists, the inflation word entropy of type i is defined by

$$\begin{aligned} t_{i} (\vartheta _{{\mathbf {P}}}) = t_{i} (\vartheta ) = \lim _{m \rightarrow \infty } \frac{q_{m,i}}{|\vartheta ^{m} (a_i) |}. \end{aligned}$$

Theorem 2.12

[17, Theorem 17] Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive and compatible random substitution over the alphabet \({\mathcal {A}} = \{ a_1, \ldots , a_d \}\) with cardinality \(d \in {\mathbb {N}}\). Let \(\lambda \) denote the Perron–Frobenius eigenvalue of \(\vartheta _{{\mathbf {P}}}\), and let \({\mathbf {R}}\) be the right Perron–Frobenius eigenvector of \(\vartheta _{{\mathbf {P}}}\). For all \(i \in \{ 1, \ldots , d \}\), the inflation word entropy \(t_i (\vartheta )\) exists, is independent of i, and is equal to the topological entropy \(h_{\text {top}}(X_{\vartheta })\) of the system \((X_{\vartheta }, S)\). Moreover, for all \(m \in {\mathbb {N}}\), we have

$$\begin{aligned} \frac{1}{\lambda ^{m}} {\mathbf {q}}_{m}^{\top } {\mathbf {R}} \le t_{i} (\vartheta ) = h_{\text {top}} (X_{\vartheta }) \le \frac{1}{\lambda ^{m}-1} {\mathbf {q}}_{m}^{\top } {\mathbf {R}}, \end{aligned}$$
(2.3)

where the lower bounds are non-decreasing in m. Further, \(h_{\text {top}}(X_{\vartheta })\) can be calculated as

$$\begin{aligned} h_{\text {top}} (X_{\vartheta }) = t_{i} (\vartheta ) = \lim _{m \rightarrow \infty } \frac{1}{\lambda ^{m}} {\mathbf {q}}_{m}^{\top } {\mathbf {R}} = \sup _{m \in {\mathbb {N}}} \frac{1}{\lambda ^{m}} {\mathbf {q}}_{m}^{\top } {\mathbf {R}}. \end{aligned}$$

In general, it is difficult to obtain a closed form formula for the topological entropy using Theorem 2.12. The difficulty lies in quantifying the overlaps of sets of the form \(\vartheta ^{m} (u)\), for \(u \in \vartheta (a_i)\). However, if the random substitution satisfies either of two mild conditions, then it is possible to obtain a closed form expression for the topological entropy using Theorem 2.12.

Definition 2.13

A random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) is said to satisfy the identical set condition if

$$\begin{aligned} u \; \text {and} \; v \in \vartheta (a)&\implies \vartheta ^{k} (u) = \vartheta ^{k} (v) \end{aligned}$$

for all \(a \in {\mathcal {A}}\) and \(k \in {\mathbb {N}}\). It is said to satisfy the disjoint set condition if

$$\begin{aligned} u \; \text {and} \; v \in \vartheta (a) \; \text {with} \; u \ne v&\implies \vartheta ^{k} (u) \cap \vartheta ^{k} (v) = \varnothing \end{aligned}$$

for all \(a \in {\mathcal {A}}\) and \(k \in {\mathbb {N}}\).

Remark 2.4

An easy way to satisfy the identical set condition is to assume that \(\vartheta (a) = \vartheta (b)\) for all \(a, b \in \mathcal A\). In this case, the corresponding random substitution subshift is a coded shift, generated by the set \(\vartheta (a)\). However, this structure is not necessary for the identical set condition as one may see from the example \(\vartheta :a,b \mapsto \{abc,bac\}, c \mapsto \{a\}\). For further discussion of the identical set condition and the disjoint set condition we refer to the examples in Sect. 5 and [17].

Corollary 2.14

[17, Corollary 18] Assume the setting of Theorem 2.12. If \(\vartheta _{{\mathbf {P}}}\) satisfies the identical set condition, then

$$\begin{aligned} h_{\text {top}} (X_{\vartheta }) = \frac{1}{\lambda } {\mathbf {q}}_{1}^{\top } {\mathbf {R}} = \frac{1}{\lambda } \sum _{i=1}^{d} R_{i} \log (\# \vartheta (a_i)). \end{aligned}$$

If \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition, then

$$\begin{aligned} h_{\text {top}} (X_{\vartheta }) = \frac{1}{\lambda -1} {\mathbf {q}}_{1}^{\top } {\mathbf {R}} = \frac{1}{\lambda -1} \sum _{i=1}^{d} R_{i} \log (\# \vartheta (a_i)). \end{aligned}$$

Thus, if \(\vartheta _{{\mathbf {P}}}\) satisfies the identical set condition, then the topological entropy of its subshift achieves the lower bound given in (2.3) with \(m=1\), and if \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition, then it achieves the upper bound given in (2.3) with \(m=1\). In fact, one can show that these bounds are attained precisely when \(\vartheta _{{\mathbf {P}}}\) satisfies the identical/disjoint set condition. The random period doubling substitution defined in Example 2.2 satisfies the disjoint set condition. Hence, it follows by Corollary 2.14 that the corresponding subshift has topological entropy equal to \(\log (2^{2/3})\), noting that \(\lambda = 2\) and \({\mathbf {R}} = (\frac{2}{3},\frac{1}{3})^{\top }\).
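
This last computation is easily checked numerically; the following sketch evaluates the closed form of Corollary 2.14 for the random period doubling substitution.

```python
import numpy as np

lam = 2.0                       # Perron-Frobenius eigenvalue
R = np.array([2/3, 1/3])        # right Perron-Frobenius eigenvector
sizes = np.array([2, 1])        # #theta(a) = 2, #theta(b) = 1

# Disjoint set condition: h_top = (lambda - 1)^{-1} * sum_i R_i log(#theta(a_i))
h_top = (R @ np.log(sizes)) / (lam - 1)
assert np.isclose(h_top, np.log(2 ** (2 / 3)))   # = (2/3) log 2
```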

2.4 Frequency Measures

For \(v \in {\mathcal {L}} (X)\) and \(m \in {\mathbb {Z}}\), we define the cylinder set of v at position m by

$$\begin{aligned}{}[v]_{m} = \{ w \in X : w_{m + i} = v_{i} \text { for all } 0 \le i \le |v |- 1 \} \end{aligned}$$

and set \([v] = [v]^{}_0\) for convenience. The collection of cylinder sets that specify the zeroth position,

$$\begin{aligned} \xi (X) = \{ [v]_{m} : v \in {\mathcal {L}} (X), \, 1 - |v| \le m \le 0 \}, \end{aligned}$$

together with \(\{ \varnothing \}\), forms a semi-ring of sets, which generates the Borel \(\sigma \)-algebra \({\mathcal {B}} (X)\). Hence, any content with mass one defined on \(\xi (X) \cup \{ \varnothing \}\) extends uniquely to a probability measure on \({\mathcal {B}} (X)\) by the Hahn–Kolmogorov extension theorem. As we will see shortly, frequency measures are defined in this manner.

Given a primitive random substitution \(\vartheta _{{\mathbf {P}}}\), the expected frequency of a word \(v \in {\mathcal {L}}_{\vartheta }\) is defined by

$$\begin{aligned} \text {freq} (v) = \lim _{k \rightarrow \infty } \frac{{\mathbb {E}} [ |\vartheta _{{\mathbf {P}}}^{k} (a) |_{v} ] }{ {\mathbb {E}} [|\vartheta _{{\mathbf {P}}}^{k} (a)|]}, \end{aligned}$$

where this limit is independent of the choice of \(a \in {\mathcal {A}}\). In fact, we have the stronger property that the word frequencies exist \({\mathbb {P}}\)-almost surely in the limit of large inflation words and are given by \(\text {freq}(v)\) for all \(v \in {\mathcal {L}}_{\vartheta }\), see [19] for further details. It turns out that these frequencies naturally define an ergodic measure supported on \(X_{\vartheta }\).

Proposition 2.15

[19, Proposition 5.3, Theorem 5.9] Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution with subshift \(X_{\vartheta } \ne \varnothing \). Define \(\mu _{{\mathbf {P}}} :\xi (X_{\vartheta }) \cup \{ \varnothing \} \rightarrow [0,1]\) by \(\mu _{{\mathbf {P}}} (\varnothing ) = 0\), \(\mu _{{\mathbf {P}}} (X_{\vartheta }) = 1\), and \(\mu _{{\mathbf {P}}} ([v]_{m}) = \text {freq} (v)\) for \(v \in {\mathcal {L}}_{\vartheta }\) and \(m \in \{ 1 - |v|, 2 - |v|, \ldots , 0 \}\). The set function \(\mu _{{\mathbf {P}}}\) is a content with mass one which extends uniquely to a shift-invariant ergodic probability measure on \({\mathcal {B}} (X_{\vartheta })\).

We call the measure \(\mu _{{\mathbf {P}}}\) defined in Proposition 2.15 the frequency measure corresponding to the random substitution \(\vartheta _{{\mathbf {P}}}\). Alternatively, frequency measures can be defined in terms of the right Perron–Frobenius eigenvector of a sequence of induced random substitutions (treating words as letters), which encode information on word frequencies; in particular,

$$\begin{aligned} \mu _{{\mathbf {P}}}([a]) = \text {freq} (a) = R_{a} \quad \text {and} \quad \lim _{k \rightarrow \infty } \frac{{\mathbb {E}}[|\vartheta _{{\mathbf {P}}}^{k} (a) |]}{{\mathbb {E}}[|\vartheta _{{\mathbf {P}}}^{k-1} (a) |]} = \lambda , \end{aligned}$$
(2.4)

for \(a \in {\mathcal {A}}\) – see [19] for further details.

Observe that frequency measures are dependent on the probabilities of the substitution. As such, for the subshift of a primitive random substitution that is non-deterministic, there exist uncountably many frequency measures supported on this subshift [19]. In contrast, the subshift of a primitive deterministic substitution has precisely one frequency measure, which is the unique ergodic measure [34].

3 Measure Theoretic Entropy

If T is an invertible measure preserving transformation of a probability space \((X, {\mathcal {B}}, \nu )\) and if \(\xi \) is a finite measurable partition of X with \(\bigvee _{i \in {\mathbb {Z}}} T^{-i} (\xi ) = {\mathcal {B}}\), up to null sets, then we define the entropy \(h(T, \nu )\) of \(\nu \) with respect to T by

$$\begin{aligned} h(T, \nu ) = \lim _{n \rightarrow \infty } \frac{1}{2n} \sum _{A \in \xi _{n}} \!-\nu (A) \log (\nu (A)), \end{aligned}$$

where \(\xi _{k} = \bigvee _{i = -k}^{k-1} T^{-i} (\xi )\) for \(k \in {\mathbb {N}}\). In the case when \(X \subseteq {\mathcal {A}}^{{\mathbb {Z}}}\) is a subshift and \(\nu \) is an S-invariant probability measure supported on X, it is known that, for \(m \in {\mathbb {N}}\),

$$\begin{aligned} h(S^{m}, \nu ) = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{u \in {\mathcal {L}}^{mn}(X)} -\nu ([u]) \log (\nu ([u])) = m h(S, \nu ), \end{aligned}$$

where \({\mathcal {L}}^{k}(X)\) denotes the set of all words of length k in the language \({\mathcal {L}}(X)\) of X, for k a natural number. Since a primitive random substitution \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) satisfies \({\mathcal {L}} (X_{\vartheta }) = {\mathcal {L}}_{\vartheta }\), the entropy of a frequency measure \(\mu _{{\mathbf {P}}}\) supported on \(X_{\vartheta }\) is given by

$$\begin{aligned} h(S, \mu _{{\mathbf {P}}}) = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{u \in {\mathcal {L}}_{\vartheta }^{n}} \!- \mu _{{\mathbf {P}}} ([u]) \log (\mu _{{\mathbf {P}}} ([u])). \end{aligned}$$

In what follows we will predominantly be concerned with computing the invariant \(h(S, \mu _{{\mathbf {P}}})\) and so when it is clear from the context, we write \(h(\mu _{{\mathbf {P}}})\) for \(h(S, \mu _{{\mathbf {P}}})\); this will be the case in all of what follows, except in the proof of Theorem 4.8.

Two additional concepts, which we will utilise in the proof of Theorem 4.8, are the entropy and the conditional entropy of a partition. In order to define these quantities, let \(\eta \) be a second measurable partition of X. The entropy \(H_{\nu }(\eta )\) of \(\eta \) with respect to \(\nu \) is defined to be the quantity

$$\begin{aligned} H_{\nu }(\eta ) = \sum _{A \in \eta } -\nu (A) \log (\nu (A)), \end{aligned}$$

and we note that, by Fekete’s Lemma,

$$\begin{aligned} h(T, \nu ) = \mathop {\mathrm {inf}}_{n \in {\mathbb {N}}} \frac{1}{2n} H_{\nu }(\xi _{n}). \end{aligned}$$
(3.1)

The entropy of \(\xi \) given \(\eta \) with respect to \(\nu \) is defined by

$$\begin{aligned} H_{\nu }(\xi \, \vert \, \eta ) = \sum _{A \in \eta } \nu (A) H_{\nu ^{}_A}(\xi ), \end{aligned}$$

where \(\nu ^{}_A :B \mapsto \nu (A \cap B)/\nu (A)\) denotes the normalised restriction of \(\nu \) to the set A.

We will mostly be concerned with partitions that are generated by some random map \(\mathcal U\), that is, a measurable function on a probability space \((\Omega ,\mathcal F,\nu )\). More precisely, if \(\mathcal U\) has a finite image \(\mathrm {Im}(\mathcal U)\) (i.e. if it takes only finitely many values), it generates the partition

$$\begin{aligned} \xi (\mathcal U) = \{ \mathcal U^{-1}(u) : u \in \mathrm {Im}(\mathcal U) \}. \end{aligned}$$

To avoid heavy notation, we set

$$\begin{aligned} H_{\nu }(\mathcal U) : = H_{\nu }(\xi (\mathcal U)), \end{aligned}$$

in such situations. If we are dealing with two such random maps \(\mathcal U\) and \(\mathcal V\), we set

$$\begin{aligned} H_{\nu }(\mathcal U, \mathcal V) : = H_{\nu }(\xi ((\mathcal U, \mathcal V))) \end{aligned}$$

where

$$\begin{aligned} \xi ((\mathcal U, \mathcal V)) = \xi (\mathcal U) \vee \xi (\mathcal V) := \{ A \cap B : A \in \xi (\mathcal U), B \in \xi (\mathcal V) \}, \end{aligned}$$

is the common refinement of the partitions generated by \(\mathcal U\) and \(\mathcal V\). Conditional entropies are defined accordingly. Namely, \(H_{\nu } (\mathcal U \, \vert \, \mathcal V) = H_{\nu } (\xi (\mathcal U) \, \vert \, \xi (\mathcal V))\), \(H_{\nu } (\mathcal U, \mathcal V \, \vert \, \mathcal W) = H_{\nu } (\xi (\mathcal U) \vee \xi (\mathcal V) \, \vert \, \xi (\mathcal W))\) and \(H_{\nu } (\mathcal U \, \vert \, \mathcal V, \mathcal W) = H_{\nu } (\xi (\mathcal U) \, \vert \, \xi (\mathcal V) \vee \xi (\mathcal W))\), where \(\mathcal U, \mathcal V\) and \(\mathcal W\) are random maps on \((\Omega , {\mathcal {F}}, \nu )\).

In the proof of our main results, we will freely use several properties of (conditional) entropy. For the reader’s convenience we list the most important ones in the following; compare [38, Ch. 4].

Lemma 3.1

Let \(\mathcal U, \mathcal V\) and \(\mathcal W\) be (measurable) random maps with finite image as above. Then,

(1) \(H_{\nu }(\mathcal U) \le \log (\# \mathrm {Im}(\mathcal U))\), with equality precisely if \(\nu \circ \mathcal U^{-1}\) is equi-distributed.

(2) \(H_{\nu }(\mathcal U) \le H_{\nu }(\mathcal U, \mathcal V)\), with equality precisely if \(\mathcal U\) determines \(\mathcal V\) (up to nullsets).

(3) \(H_{\nu } (\mathcal U, \mathcal V ) = H_{\nu }(\mathcal V) + H_{\nu }(\mathcal U \, \vert \, \mathcal V)\).

(4) \(H_{\nu }(\mathcal U \, \vert \, \mathcal V) \le H_{\nu }(\mathcal U)\), with equality if and only if \(\mathcal U\) and \(\mathcal V\) are independent.

(5) \(H_{\nu } (\mathcal U \, \vert \, \mathcal V, \mathcal W) \le H_{\nu } (\mathcal U \, \vert \, \mathcal V)\).

(6) \(H_{\nu } (\mathcal U, \mathcal V \, \vert \, \mathcal W) = H_{\nu } (\mathcal U \, \vert \, \mathcal W) + H_{\nu } (\mathcal V \, \vert \, \mathcal U, \mathcal W)\).

We refer the reader to [6, 38] for further details concerning the entropy of a measure preserving transformation and that of a partition.
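
The following sketch illustrates these notions for two random maps with finite image, encoding their joint distribution as a matrix (the numerical values are illustrative) and checking the conditional entropy formula against the chain rule of Lemma 3.1(3).

```python
import numpy as np

def H(p) -> float:
    """Shannon entropy (natural logarithm) of a probability vector."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Joint distribution P[i, j] = nu(U = u_i, V = v_j) of two random maps U, V.
P = np.array([[0.2, 0.1],
              [0.3, 0.4]])

H_UV = H(P)                  # H_nu(U, V)
H_V = H(P.sum(axis=0))       # H_nu(V)
H_U_given_V = H_UV - H_V     # chain rule: H_nu(U, V) = H_nu(V) + H_nu(U | V)

# Directly: H_nu(U | V) = sum over parts A of xi(V) of nu(A) H_{nu_A}(U)
direct = sum(P[:, j].sum() * H(P[:, j] / P[:, j].sum()) for j in range(2))
assert np.isclose(H_U_given_V, direct)
```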

3.1 Main Results

The aim of this section is to relate the entropy of the frequency measure \(\mu _{{\mathbf {P}}}\) to a sequence of entropy vectors which are related to inflation words \(\vartheta ^n(a)\) with \(n \in {\mathbb N}\) and \(a \in \mathcal A\). This will establish a natural analogue to the results on topological entropy presented in Sect. 2.3. However, we emphasise that our present setting is more general as we do not require the random substitution to be compatible. We make the standing assumption that \(\vartheta _{{\mathbf {P}}}\) is a primitive random substitution throughout.

Definition 3.2

For a primitive random substitution \(\vartheta _{{\mathbf {P}}}\) on \({\mathcal {A}}\) and \(m \in {\mathbb N}\), we let \({\mathbf {H}}_m = (H_{m,a})_{a \in {\mathcal {A}}}\) denote the vector with entries \(H_{m,a} = H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^m(a))\) for all \(a \in \mathcal A\).

As a further notational tool, we write H(p) for the entropy of the vector \((p,1-p)\), that is,

$$\begin{aligned} H(p) = - p \log (p) - (1-p) \log (1-p). \end{aligned}$$

Our most general result on the relation between the entropy of \(\mu _{{\mathbf {P}}}\) and the sequence of entropies assigned to the Markov processes \((\vartheta _{{\mathbf {P}}}^n(a))_{n \in {\mathbb N}}\), with \(a \in \mathcal A\), takes the following form.

Theorem 3.3

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution with Perron–Frobenius eigenvalue \(\lambda \) and right eigenvector \({\mathbf {R}}\). Let \(\mu _{{\mathbf {P}}}\) be its frequency measure on \((X_{\vartheta },S)\). Then, for all \(k \in {\mathbb N}\),

$$\begin{aligned} \frac{1}{\lambda ^k} {\mathbf {H}}_k^{\top } {\mathbf {R}} - H(\lambda ^{-k}) \le h(\mu _{{\mathbf {P}}}) \le \frac{1}{\lambda ^k - 1} {\mathbf {H}}_k^{\top } {\mathbf {R}}. \end{aligned}$$

In particular,

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) = \lim _{k \rightarrow \infty } \frac{1}{\lambda ^k} {\mathbf {H}}_k^{\top } {\mathbf {R}}. \end{aligned}$$

In particularly convenient situations, it is possible to omit the counterterm \(H(\lambda ^{-k})\). This is the case if \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths, which allows us to gain more control over the bounds for the measure theoretic entropy. Moreover, in the case when the random substitution satisfies the disjoint set condition, we obtain a closed form formula. We also obtain a closed form formula when the random substitution satisfies the identical set condition, provided the production probabilities satisfy the following condition.

Definition 3.4

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a random substitution satisfying the identical set condition. We say that \(\vartheta _{{\mathbf {P}}}\) has identical production probabilities if for all \(a \in {\mathcal {A}}\), \(k \in {\mathbb {N}}\) and \(v \in \vartheta ^{k} (a)\), we have

$$\begin{aligned} {\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{k-1} (u_1) = v] = {\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{k-1} (u_2) = v] \end{aligned}$$

for all \(u_1\) and \(u_2 \in \vartheta (a)\).

Theorem 3.5

Assume that \(\vartheta _{{\mathbf {P}}}\) is a primitive random substitution with unique realisation paths, Perron–Frobenius eigenvalue \(\lambda \) and right eigenvector \({\mathbf {R}}\). Then, for all \(k \in {\mathbb N}\),

$$\begin{aligned} \frac{1}{\lambda ^k} {\mathbf {H}}_k^{\top } {\mathbf {R}} \le h(\mu _{{\mathbf {P}}}) \le \frac{1}{\lambda ^k - 1} {\mathbf {H}}_k^{\top } {\mathbf {R}}, \end{aligned}$$

where the upper bound is an equality if and only if \(\vartheta _{{\mathbf {P}}}^k\) satisfies the disjoint set condition. The lower bound is an equality if and only if \(\vartheta _{{\mathbf {P}}}^k\) satisfies the identical set condition with identical production probabilities. Further, the sequence of lower bounds \((\lambda ^{-n} {\mathbf {H}}^{\top }_n {\mathbf {R}})_{n \in {\mathbb N}}\) is non-decreasing in n.

Remark 3.1

The conditions that allow us to obtain closed expressions for the entropy in Theorem 3.5 have been formulated in a manner that parallels our discussion of topological entropy. They can also be rephrased in probabilistic terms. More precisely, \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition if and only if \(\vartheta _{{\mathbf {P}}}(a)\) is determined by \(\vartheta _{{\mathbf {P}}}^n(a)\) for all \(n\in {\mathbb N}\) and \(a \in \mathcal A\). The identical set condition with identical production probabilities holds for \(\vartheta _{{\mathbf {P}}}\) if and only if the random words \(\vartheta _{{\mathbf {P}}}(a)\) and \(\vartheta _{{\mathbf {P}}}^n(a)\) are independent for all \(n \ge 2\) and \(a \in \mathcal A\).
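
To illustrate, the sketch below evaluates the bounds of Theorem 3.3 with \(k = 1\) for the random period doubling substitution, where \({\mathbf {H}}_1 = (H(p), 0)^{\top }\); since the disjoint set condition holds, Theorem 3.5 gives the exact value \(h(\mu _{{\mathbf {P}}}) = \frac{2}{3} H(p)\).

```python
import numpy as np

def H(p: float) -> float:
    """Binary entropy H(p) = -p log(p) - (1-p) log(1-p)."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log(p) - (1 - p) * np.log(1 - p)

lam, R = 2.0, np.array([2/3, 1/3])
for p in (0.1, 0.5, 0.9):
    H1 = np.array([H(p), 0.0])           # H_{1,a} = H(p), H_{1,b} = 0
    lower = (H1 @ R) / lam - H(1 / lam)  # lower bound of Theorem 3.3, k = 1
    upper = (H1 @ R) / (lam - 1)         # upper bound; exact by Theorem 3.5
    h = (2 / 3) * H(p)
    assert lower <= h and np.isclose(h, upper)
```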

Comparing Theorems 3.3 and 3.5, one of the most striking differences is that the term \(H(\lambda ^{-k})\) does not appear in the lower bound under the assumption of unique realisation paths. It is natural to inquire whether this term can also be dropped in the more general case of primitive random substitutions. That this is not the case can be seen from the following example.

Example 3.6

Let \(p \in (0,1)\) and let \(\vartheta _{{\mathbf {P}}}\) be the random substitution defined by

$$\begin{aligned} \vartheta _{{\mathbf {P}}} :{\left\{ \begin{array}{ll} a \mapsto &{} {\left\{ \begin{array}{ll} a &{} \text{ with } \text{ probability } p, \\ aba &{} \text{ with } \text{ probability } 1-p, \end{array}\right. } \\ b \mapsto &{} bab. \end{array}\right. } \end{aligned}$$

This random substitution gives rise to the periodic subshift \(X_{\vartheta } = \{ (ab)^{{\mathbb Z}}, (ba)^{{\mathbb Z}} \}\), which has entropy 0. On the other hand, M is primitive and \(H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}(a)) > 0\).

In general, the measure theoretic entropy \(h(\mu _{{\mathbf {P}}})\) depends on the choice of \({\mathbf {P}}\). As a consequence of Theorem 3.3 we obtain that the dependence on the probability parameters is continuous.

In the following, we regard \({\mathbf {P}}\) as a vector in \({\mathbb {R}}^{r}\) equipped with the Euclidean topology, where \(r = \sum _{i = 1}^{d} r_i = \sum _{i = 1}^{d} \# \vartheta (a_i)\) and d is the cardinality of the alphabet. We emphasise that \({\mathbf {P}}\) is assumed to be non-degenerate in the sense that all probabilities are strictly positive.

Corollary 3.7

Assume the setting of Theorem 3.3. The map \({\mathbf {P}} \mapsto h (\mu _{{\mathbf {P}}})\) is continuous.

Proof

For \(0< \varepsilon < 1\) let \(D_{\varepsilon }\) be the domain of those \({\mathbf {P}}\) such that all entries of \({\mathbf {P}}\) are at least \(\varepsilon \). Since we get the complete domain of \({\mathbf {P}}\) as a (nested) union over all \(D_{\varepsilon }\), it is enough to show that the map \({\mathbf {P}} \mapsto h(\mu _{{\mathbf {P}}})\) is continuous on \(D_{\varepsilon }\) for arbitrary \(\varepsilon \). The general strategy of the proof is to represent \(h(\mu _{{\mathbf {P}}})\) as a uniform limit of continuous functions on \(D_{\varepsilon }\) via Theorem 3.3.

Recall that all of the data \(\lambda , {\mathbf {H}}_m, {\mathbf {R}}\) depend implicitly on \({\mathbf {P}}\). By primitivity \(\lambda > 1\) is a simple eigenvalue for all \({\mathbf {P}}\). Since the substitution matrix depends analytically on the probability parameters, we can resort to fundamental facts in perturbation theory; compare for example [21]. In particular, \(\lambda \) depends analytically on \({\mathbf {P}} \in D_{\varepsilon }\) and since \(\lambda \) is simple, so does \({\mathbf {R}}\). The entries of \({\mathbf {H}}_m\) inherit continuity from the fact that the maps \({\mathbf {P}} \mapsto {\mathbb {P}}[\vartheta ^m_{{\mathbf {P}}}(a) = u]\) are continuous for all \(a \in \mathcal A\) and \(u \in \mathcal A^+\). Hence, the function

$$\begin{aligned} s_m :{\mathbf {P}} \mapsto \frac{1}{\lambda ^m} {\mathbf {H}}_m^{\top } {\mathbf {R}}, \end{aligned}$$

is continuous in \({\mathbf {P}}\) for all \(m \in {\mathbb N}\). With this notation, Theorem 3.3 can be rephrased as

$$\begin{aligned} \frac{\lambda ^m - 1}{\lambda ^m} h(\mu _{{\mathbf {P}}}) \le s_m({\mathbf {P}}) \le h(\mu _{{\mathbf {P}}}) + H(\lambda ^{-m}), \end{aligned}$$
(3.2)

for all \(m \in {\mathbb N}\). Note that \(h(\mu _{{\mathbf {P}}})\) is uniformly bounded from above by the topological entropy of \(X_{\vartheta }\) and \(\lambda \) is bounded from below by its minimal value \(\lambda _{\varepsilon }>1\) on the compact set \(D_{\varepsilon }\). Therefore, the convergence

$$\begin{aligned} \lim _{m \rightarrow \infty } s_m({\mathbf {P}}) = h(\mu _{{\mathbf {P}}}) \end{aligned}$$

is uniform on \(D_{\varepsilon }\) which implies the assertion. \(\square \)
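To illustrate the continuity statement, the following sketch evaluates \(s_m({\mathbf {P}})\) for the random Fibonacci substitution \(a \mapsto ab / ba\) (probabilities p, \(1-p\)), \(b \mapsto a\); the choice of example is ours. Here \(\lambda \) and \({\mathbf {R}}\) do not depend on p, since both realisations of \(\vartheta (a)\) share the same abelianisation, so the dependence on p enters through \({\mathbf {H}}_m\) alone.

```python
import numpy as np
from itertools import product

# A sketch of P -> s_m(P) = lam^{-m} H_m^T R for the random Fibonacci
# substitution a -> ab (prob p) / ba (prob 1-p), b -> a (our illustrative
# choice, not part of the formal development).
def s_m(p, m):
    rules = {'a': {'ab': p, 'ba': 1 - p}, 'b': {'a': 1.0}}

    def step(dist):
        # distribution of theta_P applied letterwise to a random word
        out = {}
        for word, prob in dist.items():
            for combo in product(*(rules[c].items() for c in word)):
                w = ''.join(u for u, _ in combo)
                out[w] = out.get(w, 0.0) + prob * np.prod([q for _, q in combo])
        return out

    dist_a, dist_b = {'a': 1.0}, {'b': 1.0}
    for _ in range(m):
        dist_a, dist_b = step(dist_a), step(dist_b)
    H_m = np.array([-sum(q * np.log(q) for q in d.values())
                    for d in (dist_a, dist_b)])
    lam = (1 + np.sqrt(5)) / 2
    R = np.array([lam, 1.0]) / (lam + 1.0)   # right PF eigenvector of [[1,1],[1,0]]
    return H_m @ R / lam**m

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(s_m(p, 3), 5))   # continuous in p and symmetric about p = 1/2
```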

3.2 Renormalisation

Properties of a (deterministic) substitution subshift can often be expressed more directly in terms of the corresponding substitution. A key observation in this regard is that a substitution subshift exhibits a self-similar structure that relates it directly to the substitution action via a renormalisation step. More precisely, every sequence in the subshift can be decomposed into inflation words of type \(\vartheta (a)\), with \(a \in \mathcal A\), such that replacing each \(\vartheta (a)\) by a gives another sequence in the subshift. This corresponds to an (average) change of the scale by a factor \(\lambda \). In the primitive case, keeping track of letter frequencies during this procedure provides a consistency relation that immediately shows that they must form a right Perron–Frobenius eigenvector of the substitution matrix.

A similar procedure works for word frequencies, if the substitution is replaced by an induced substitution [34]. This can be extended to primitive random substitutions [19, Prop. 5.8], showing that the probability distribution \(\mu ^{(n)}\) on \({\mathcal {L}}_{\vartheta }^n\), given by

$$\begin{aligned} \mu ^{(n)}(w) = \mu _{{\mathbf {P}}}([w]), \end{aligned}$$

is the unique normalised Perron–Frobenius eigenvector of an appropriate induced substitution matrix \(M_n\), for all \(n \in {\mathbb N}\). This gives the following self-consistency relation, which was shown as the first step in the proof of [19, Prop. 5.8].

Lemma 3.8

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution. Then, for all \(w \in {\mathcal {L}}_{\vartheta }^n\),

$$\begin{aligned} \mu _{{\mathbf {P}}}([w]) = \sum _{v \in {\mathcal {L}}_{\vartheta }^n} \mu _{{\mathbf {P}}}([v]) \frac{1}{\lambda } \sum _{m = 1}^{|\vartheta |} \sum _{j = 1}^{m} {\mathbb {P}} [\vartheta _{{\mathbf {P}}}(v)_{[j,j+n-1]} = w \wedge |\vartheta _{{\mathbf {P}}}(v_1)| = m]. \end{aligned}$$

It will be convenient to interpret the expression appearing in Lemma 3.8 via the distribution of an appropriate random variable that mirrors the action of \(\vartheta _{{\mathbf {P}}}\) on the initial distribution \(\mu _{{\mathbf {P}}}\), together with the choice of the origin in the inflation word decomposition.

Lemma 3.9

For \(n \in {\mathbb N}\), \(\mu ^{(n)}\) is the distribution of a random word \({\mathcal {W}}_n\) on a finite probability space \((\Omega _n, P_n)\), defined as follows. The space

$$\begin{aligned} \Omega _n = \{ (v, u_1, \ldots , u_n, j) : v \in \mathcal L_{\vartheta }^n, u_i \in \vartheta (v_i), 1 \le j \le |u_1| \} \end{aligned}$$

is equipped with the probability vector

$$\begin{aligned} P_n :(v, u_1, \ldots , u_n, j) \mapsto \frac{1}{\lambda } \mu _{{\mathbf {P}}}([v]) \prod _{i = 1}^n {\mathbb {P}} [\vartheta _{{\mathbf {P}}}(v_i) = u_i]. \end{aligned}$$

The random word \({\mathcal {W}}_n\) is defined via

$$\begin{aligned} {\mathcal {W}}_n :(v, u_1, \ldots , u_n, j) \mapsto (u_1 \cdots u_n)_{[j,j+n-1]}. \end{aligned}$$

Proof

Let \(w \in {\mathcal {L}}_{\vartheta }^n\). We note that \(\mathcal W_n^{-1}(\{w\})\) comprises all those elements in \(\Omega _n\) such that the property \((u_1 \cdots u_n)_{[j,j+n-1]} = w\) holds. That is,

$$\begin{aligned} P_n(\mathcal W_n = w) = \sum _{v \in {\mathcal {L}}_{\vartheta }^n} \sum _{u_1, \ldots , u_n} \sum _{j = 1}^{|u_1|} \frac{1}{\lambda } \mu _{{\mathbf {P}}} ([v]) \prod _{i = 1}^n {\mathbb {P}} [\vartheta _{{\mathbf {P}}}(v_i) = u_i] \, \delta _{w, (u_1 \ldots u_n)_{[j,j+n-1]}}. \end{aligned}$$

Comparing with the expression in Lemma 3.8, we further note that

$$\begin{aligned}&{\mathbb {P}} [\vartheta _{{\mathbf {P}}}(v)_{[j,j+n-1]}= \ w \wedge |\vartheta _{{\mathbf {P}}}(v_1)| = m]\\&\qquad = \sum _{u_1, \ldots , u_n} \prod _{i = 1}^n {\mathbb {P}} [\vartheta _{{\mathbf {P}}}(v_i) = \ u_i] \, \delta _{m,|u_1|} \, \delta _{w, (u_1 \ldots u_n)_{[j,j+n-1]}}.\end{aligned}$$

From this, we obtain that \(P_n({\mathcal {W}}_n = w) = \mu _{{\mathbf {P}}}([w])\) and the claim follows. \(\square \)

Remark 3.2

We may interpret the factors occurring in the definition of \(P_n\) in terms of the renormalisation step. The term \(\lambda ^{-1}\) corresponds to a change of scale due to the expansion of the length of words, \(\mu _{{\mathbf {P}}}([v])\) reflects the choice of a word before the inflation step, and each factor \({\mathbb {P}}[\vartheta _{{\mathbf {P}}}(v_i) = u_i]\) gives the probability of mapping \(v_i\) to the particular word \(u_i\) as we apply the random substitution. Marginalised to prefixes of v, the distributions induced by \(P_n\) and \(\mu _{{\mathbf {P}}}\) are closely related but differ in general. To be more precise, we will be interested in the random variable

$$\begin{aligned} {\mathcal {V}}_{[1,m]} :(v, u_1, \ldots , u_n, j) \mapsto v_{[1,m]} \end{aligned}$$

for some \(m \leqslant n\). Integrating out the dependencies on \(u_2,\ldots ,u_n\) and j in the first step, we obtain

$$\begin{aligned} P_n({\mathcal {V}}_{[1,m]} = v')= & {} \frac{1}{\lambda } \sum _{v, v_{[1,m]} = v'} \mu _{{\mathbf {P}}} ([v]) \sum _{u_1} |u_1| {\mathbb {P}}[\vartheta _{{\mathbf {P}}}(v_1) = u_1] \\= & {} \frac{1}{\lambda } \mu _{{\mathbf {P}}} ( [v'] ) {\mathbb {E}} [|\vartheta _{{\mathbf {P}}}(v_1)|]. \end{aligned}$$

The additional factor \(\lambda ^{-1}{\mathbb {E}} [|\vartheta _{{\mathbf {P}}}(v_1)|]\) accounts for the fact that starting the inflation word decomposition of a word within some \(u_1 \in \vartheta (v_1)\) is more probable if \({\mathbb {E}} [|\vartheta _{{\mathbf {P}}}(v_1)|]\) is large.
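The case \(n = 1\) of Lemma 3.9 can be checked by direct enumeration, since \(\mu ^{(1)}\) is the letter frequency vector \({\mathbf {R}}\). The sketch below does this for the random Fibonacci substitution (our illustrative assumption); it also confirms the normalising role of the factor \(\lambda ^{-1}\) discussed in Remark 3.2.

```python
import numpy as np

# A minimal check of Lemma 3.9 for n = 1, where mu^(1) is the letter frequency
# vector R. Random Fibonacci rules a -> ab / ba, b -> a are assumed as a test
# case; the law of W_1 should not depend on p.
p = 0.3
rules = {'a': {'ab': p, 'ba': 1 - p}, 'b': {'a': 1.0}}

lam = (1 + np.sqrt(5)) / 2
R = {'a': lam / (lam + 1), 'b': 1 / (lam + 1)}   # normalised PF eigenvector

# Enumerate Omega_1 = {(v, u, j)} and push P_1 forward along W_1(v, u, j) = u_j.
law, total = {'a': 0.0, 'b': 0.0}, 0.0
for v, realisations in rules.items():
    for u, q in realisations.items():
        for letter in u:                          # one summand per position j
            P1 = R[v] * q / lam                   # P_1(v, u, j)
            law[letter] += P1
            total += P1

print(total)   # 1.0: P_1 is a probability vector (cf. the factor 1/lambda)
print(law, R)  # the law of W_1 agrees with mu^(1) = R
```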

Lemma 3.9 provides us with an alternative way to calculate the measure theoretic entropy that will be instrumental for the proof of our main theorems.

Lemma 3.10

The measure theoretic entropy \(h(\mu _{{\mathbf {P}}})\) of \((X_{\vartheta }, S, \mu _{{\mathbf {P}}})\) satisfies

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) = \lim _{n \rightarrow \infty } \frac{1}{n} H_{P_n}({\mathcal {W}}_n). \end{aligned}$$

Proof

Let \(I_n :v \mapsto v\) be the identity map on \(\mathcal L_{\vartheta }^n\). By the definition of measure theoretic entropy,

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) = \lim _{n \rightarrow \infty } \frac{1}{n} H_{\mu ^{(n)}}(I_n). \end{aligned}$$

Since \(\mu ^{(n)} = P_n \circ {\mathcal {W}}_n^{-1}\) by Lemma 3.9, it follows that \( H_{\mu ^{(n)}}(I_n) = H_{P_n}({\mathcal {W}}_n). \)

\(\square \)

3.3 Control Over Large Deviations

A useful property of any primitive random substitution \(\vartheta _{{\mathbf {P}}}\) is that its Perron–Frobenius eigenvalue \(\lambda \) can be regarded as an inflation factor. In the case that \(\vartheta _{{\mathbf {P}}}\) is of constant length \(\ell \), this interpretation is exact in the sense that \(|\vartheta (v)| = \ell |v|\) for all \(v \in \mathcal A^+\) and all realisations of \(\vartheta _{{\mathbf {P}}}(v)\). If \(\vartheta _{{\mathbf {P}}}\) is compatible, \(|\vartheta (v)|\) is still independent of the realisation but might deviate slightly from \(\lambda |v|\). However, we still obtain that \(\lambda \) is arbitrarily close to the actual ratio \(|\vartheta (v)|/|v|\) for large enough values of |v|. This is a consequence of the following result on the length of inflation words which is a mild adaptation of [34, Proposition 5.8] and hence given without proof.

Lemma 3.11

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive random substitution that is compatible. Then, given an \(\varepsilon > 0\), there exists \(n_{0} \in {\mathbb {N}}\) such that for all \(v \in {\mathcal {A}}^{+}\) with \(|v |> n_{0}\),

$$\begin{aligned} |v |(\lambda - \varepsilon )< |\vartheta (v) |< |v |(\lambda + \varepsilon ) \text {.} \end{aligned}$$

Moreover, letting \(\tau \) denote the modulus of the second largest eigenvalue of \(M_{\vartheta }\), there exists a constant \(D > 0\) so that, for all \(i \in \{ 1, 2, \ldots , d \}\) and \(m \in {\mathbb {N}}\),

$$\begin{aligned} \lambda ^{m} L_{i} - D \tau ^{m} \le |\vartheta ^{m}(a_{i}) |\le \lambda ^{m} L_{i} + D \tau ^{m}. \end{aligned}$$
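Both estimates of Lemma 3.11 are easy to observe numerically for a compatible example, where \(|\vartheta ^m(a_i)|\) is the i-th column sum of \(M^m\). The sketch below uses the random Fibonacci matrix (an assumption for illustration) and exhibits the error \(\bigl | |\vartheta ^m(a_i)| - \lambda ^m L_i \bigr |\) decaying at rate \(\tau ^m\).

```python
import numpy as np

# Sketch of Lemma 3.11 for the (compatible) random Fibonacci substitution,
# where |theta^m(a)| is simply the a-th column sum of M^m (our test case).
M = np.array([[1.0, 1.0], [1.0, 0.0]])
lam = (1 + np.sqrt(5)) / 2                  # Perron-Frobenius eigenvalue
tau = 1 / lam                               # modulus of the second eigenvalue

R = np.array([lam, 1.0]) / (lam + 1.0)      # right eigenvector, summing to 1
L = np.array([lam, 1.0])
L = L / (L @ R)                             # left eigenvector, normalised L.R = 1

Mm = np.eye(2)
for m in range(1, 11):
    Mm = Mm @ M
    lengths = Mm.sum(axis=0)                # (|theta^m(a)|, |theta^m(b)|)
    err = np.abs(lengths - lam**m * L)
    print(m, lengths, err, err / tau**m)    # err/tau^m stays bounded (constant D)
```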

In general, such a strong statement does not hold if we drop the assumption of compatibility. However, the probability that \(|\vartheta _{{\mathbf {P}}}(v)|\) deviates by a positive fraction from \(\lambda |v|\) decays quickly with |v| for typical choices of v. We will make this more precise in the following lemma in a form that is useful for our purposes.

Lemma 3.12

Let \(\lambda _-< \lambda < \lambda _+\) and for each \(n \in {\mathbb N}\) fix a positive number \(m = m(n) < n\) such that \(\lim _{n \rightarrow \infty } m(n) = \infty \). Further, let

$$\begin{aligned} A_n = \{ (v, u_1,\ldots , u_n, j) : \lambda _- m \le |u_2 \cdots u_m| \le \lambda _+ m \}, \end{aligned}$$

for all \(n \in {\mathbb N}\). Then, \(\lim _{n \rightarrow \infty } P_n(A_n) = 1\).

Proof

Let \(A_n^u := \{ (u_2,\ldots ,u_m) :\lambda _- m \le |u_2 \cdots u_m| \le \lambda _+ m \}\) be the set of \((u_2,\ldots ,u_m)\)-tuples that extend to elements in \(A_n\). By definition of \(P_n\) and \(A_n\),

$$\begin{aligned} P_n(A_n)&= \frac{1}{\lambda } \sum _{v_{[1,m]}} \mu _{{\mathbf {P}}} ([ v_{[1,m]} ]) \sum _{u_1} |u_1| {\mathbb {P}} [ \vartheta _{{\mathbf {P}}}(v_1) = u_1] \\&\qquad \sum _{ (u_2,\ldots ,u_m) \in A_n^u} \, \prod _{i = 2}^m {\mathbb {P}}[\vartheta _{{\mathbf {P}}}(v_i) = u_i] \\&= \frac{1}{\lambda } \sum _{v_{[1,m]}} \mu _{{\mathbf {P}}} ([v_{[1,m]}]) {\mathbb {E}} [|\vartheta _{{\mathbf {P}}}(v_1)|] \, {\mathbb {P}}[\lambda _- m \le |\vartheta _{{\mathbf {P}}}(v_2 \cdots v_m)| \le \lambda _+ m]. \end{aligned}$$

We claim that, for \(\mu _{{\mathbf {P}}}\)-almost every \(v \in X_{\vartheta }\), we have

$$\begin{aligned} \lim _{m \rightarrow \infty } {\mathbb {P}}[ \lambda _- m \le |\vartheta _{{\mathbf {P}}}(v_2 \cdots v_m)| \le \lambda _+ m ] = 1. \end{aligned}$$
(3.3)

This can be seen as follows. By ergodicity of \(\mu _{{\mathbf {P}}}\), for \(\mu _{{\mathbf {P}}}\)-almost every v and every given \(\delta > 0\) it holds that

$$\begin{aligned} m(R_a - \delta ) \le |v_{[2,m]}|_a \le m(R_a + \delta ), \end{aligned}$$

for each \(a \in {\mathcal {A}}\) and large enough \(m \in {\mathbb N}\). In this case, it follows by standard large deviation arguments (see for example [13]) that for all \(\delta '>0\),

$$\begin{aligned} \sum _{i,v_i = a} |\vartheta _{{\mathbf {P}}}(v_i)| \le (1 + \delta ') m (R_a + \delta ) {\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(a)|], \end{aligned}$$
(3.4)

up to a set \(E = E(m,v,\delta ,\delta ')\) whose probability decays exponentially with m. By the definition of the substitution matrix M, we have

$$\begin{aligned} {\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(a)|] = \sum _{b \in \mathcal A} {\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(a)|_b] = \sum _{b \in \mathcal A} M_{ba}. \end{aligned}$$

Summing over \(a \in {\mathcal {A}}\) in (3.4), we obtain that

$$\begin{aligned} |\vartheta _{{\mathbf {P}}}(v_2 \cdots v_m)| \le m (1 + \delta ') \biggl ( \sum _{a,b \in {\mathcal {A}}} M_{ba} R_a + \delta |\vartheta | \biggr ) = m (1 + \delta ')(\lambda + \delta |\vartheta |), \end{aligned}$$

up to an exponentially decaying probability. Choosing \(\delta , \delta '\) small enough, we get

$$\begin{aligned} |\vartheta _{{\mathbf {P}}}(v_2 \cdots v_m)| \le \lambda _+ m \end{aligned}$$

in these cases. The estimate for the lower bound works by completely analogous arguments. Hence, there exist \(c = c(v) > 0\) and \(m_0 = m_0(v)\) such that

$$\begin{aligned} {\mathbb {P}}[\lambda _- m \le |\vartheta _{{\mathbf {P}}}(v_2 \cdots v_m)| \le \lambda _+ m ] \ge 1 - \mathrm {e}^{- m c}, \end{aligned}$$

for all \(m \ge m_0\). In particular, (3.3) holds \(\mu _{{\mathbf {P}}}\)-almost surely and we get by dominated convergence,

$$\begin{aligned} \lim _{n \rightarrow \infty } P_n(A_n) = \frac{1}{\lambda } \int _{X_{\vartheta }} {\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|] \,\mathrm {d}\mu _{{\mathbf {P}}}(v) = 1. \end{aligned}$$

\(\square \)
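The concentration phenomenon behind Lemma 3.12 can be simulated directly. The sketch below uses a non-compatible toy substitution of our own choosing (\(a \mapsto ab\) or abb with probability 1/2 each, \(b \mapsto a\)), for which inflation word lengths are genuinely random; the ratio \(|\vartheta _{{\mathbf {P}}}(v_{[1,m]})|/m\) is seen to concentrate around \(\lambda \) as m grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# A non-compatible toy substitution (an assumption for illustration):
# a -> ab or abb with probability 1/2 each, b -> a.
def substitute(word):
    return ''.join(('ab' if rng.random() < 0.5 else 'abb') if c == 'a' else 'a'
                   for c in word)

M = np.array([[1.0, 1.0],    # E|theta(a)|_a, E|theta(b)|_a
              [1.5, 0.0]])   # E|theta(a)|_b, E|theta(b)|_b
lam = max(np.linalg.eigvals(M).real)

# Produce a typical legal word by iterating the substitution from one letter.
w = 'a'
while len(w) < 5000:
    w = substitute(w)

for m in (50, 200, 1000):
    ratios = [len(substitute(w[:m])) / m for _ in range(2000)]
    print(m, lam, np.mean(ratios), np.std(ratios))   # mean near lam, spread -> 0
```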

3.4 The Upper Bound

As a first step towards the proof of our main theorems, we establish the sequence of upper bounds for the measure theoretic entropy that we stated in Theorems 3.3 and 3.5. For ease of notation, we let \(\varphi \) denote the function

$$\begin{aligned} \varphi :x \mapsto - x \log (x), \end{aligned}$$

for positive \(x \in {\mathbb {R}}\), and set \(\varphi (0) = 0\). To handle various terms that are of no concern for the main calculations, we also recall some standard notation on error terms. Given a positive function \(f :{\mathbb N}\rightarrow {\mathbb R}\), we denote by O(f) any function \(g :{\mathbb N}\rightarrow {\mathbb R}\) such that g(n)/f(n) is bounded in n. Similarly, we write o(f) for a function \(g :{\mathbb N}\rightarrow {\mathbb R}\) such that g(n)/f(n) converges to 0 as \(n \rightarrow \infty \).

Proposition 3.13

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution. Then,

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) \le \frac{1}{\lambda ^k-1} {\mathbf {H}}_k^{\top } {\mathbf {R}}, \end{aligned}$$

for all \(k \in {\mathbb N}\).

Proof

It suffices to show the relation for \(k = 1\), since \(\mu _{{\mathbf {P}}}\) remains the same measure for all powers of \(\vartheta _{{\mathbf {P}}}\). By Lemma 3.10, it is possible to control \(h(\mu _{{\mathbf {P}}})\) via the entropy of \({\mathcal {W}}_n\). We wish to refer to data in \(\Omega _n\) via a set of appropriate random variables. To this end we introduce (or recall in the case of \({\mathcal {V}}_{[1,m]}\))

  • \({\mathcal {V}}_{[1,m]} :(v, u_1, \ldots , u_n, j) \mapsto v_{[1,m]}\) for all \(1 \le m \le n\),

  • \({\mathcal {J}} :(v, u_1, \ldots , u_n, j) \mapsto j\),

  • \({\mathcal {U}}_k :(v, u_1, \ldots , u_n, j) \mapsto u_k\) for all \(1 \le k \le n\),

  • \({\mathcal {U}}_{[k,\ell ]} = ({\mathcal {U}}_k, \ldots , {\mathcal {U}}_{\ell })\) for \(1 \le k \le \ell \le n\).

Also recall that \({\mathcal {W}}_n\) is given by \((u_1 \cdots u_n)_{[j,j+n-1]}\). On average, the words \(u_k\) have length \(\lambda \), and therefore, in typical situations, \({\mathcal {W}}_n\) in fact only depends on \(u_k\) with \(1 \le k \le m(n) \), with \(m(n) \approx n / \lambda \). This motivates the following notation. Fix a small \(\varepsilon > 0\) and let \(\lambda _- = \lambda - \varepsilon \). Further, let \(n \in {\mathbb N}\) and

$$\begin{aligned} m = m_+(n) = \Bigl \lceil \frac{n}{\lambda _-}\Bigr \rceil . \end{aligned}$$

As a first step, we bound the entropy by

$$\begin{aligned} H_{P_n}({\mathcal {W}}_n) \le H_{P_n} ({\mathcal {U}}_{[1,m]},{\mathcal {J}}) + H_{P_n} ({\mathcal {W}}_n \, | \, {\mathcal {U}}_{[1,m]}, {\mathcal {J}}). \end{aligned}$$

Setting

$$\begin{aligned} A_n = \{ (v,u_1, \ldots , u_n, j) \in \Omega _n \mid |u_2\cdots u_m| \ge n \}, \end{aligned}$$

we note that on \(A_n\), \({\mathcal {W}}_n\) is given by \((u_1 \cdots u_m)_{[j, j+n-1]}\) and hence is completely determined by \(\mathcal U_{[1,m]}\) and \({\mathcal {J}}\). On \(A_n^C\), we can bound the (conditional) entropy of \({\mathcal {W}}_n\) by

$$\begin{aligned} \log (\# {\mathcal {L}}_{\vartheta }^n) \le n \log (\# {\mathcal {A}}). \end{aligned}$$

With these two observations, we get

$$\begin{aligned} H_{P_n} ({\mathcal {W}}_n \, | \, {\mathcal {U}}_{[1,m]}, {\mathcal {J}} ) \le P_n(A_n^C) \, n \log (\# {\mathcal {A}}). \end{aligned}$$

By Lemma 3.12, the term \(P_n(A_n^C)\) converges to 0 as \(n \rightarrow \infty \) and hence

$$\begin{aligned} H_{P_n}({\mathcal {W}}_n) \le H_{P_n}({\mathcal {U}}_{[1,m]}, {\mathcal {J}}) + o(n). \end{aligned}$$

On the other hand, since both \({\mathcal {J}}\) and \({\mathcal {U}}_1\) have a bounded number of realisations,

$$\begin{aligned} H_{P_n}({\mathcal {U}}_{[1,m]}, {\mathcal {J}}) = H_{P_n}(\mathcal U_{[2,m]}) + O(1). \end{aligned}$$

Conditioning on \({\mathcal {V}}_{[1,m]}\), we therefore get

$$\begin{aligned} H_{P_n}({\mathcal {W}}_n) \le H_{P_n}({\mathcal {U}}_{[2,m]}) + o(n) \le H_{P_n}({\mathcal {V}}_{[1,m]}) + H_{P_n}({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {V}}_{[1,m]}) + o(n). \end{aligned}$$
(3.5)

For the calculation of the entropy \(H_{P_n} ({\mathcal {V}}_{[1,m]})\), recall from Remark 3.2 that

$$\begin{aligned} P_n({\mathcal {V}}_{[1,m]} = v_{[1,m]}) = \frac{1}{\lambda } \mu _{{\mathbf {P}}} ( [v_{[1,m]}] ) {\mathbb {E}} [|\vartheta _{{\mathbf {P}}}(v_1)|]. \end{aligned}$$
(3.6)

In the following, we will convince ourselves that the modification by the factor \(\lambda ^{-1} {\mathbb {E}} [|\vartheta _{{\mathbf {P}}}(v_1)|]\) is inessential for our purposes. To this end, we make use of the general observation that \(\varphi (pq) = p \varphi (q) + q \varphi (p)\). For an arbitrary probability vector \((p_i)_{i \in I}\) and a finite sequence of real numbers \(q = (q_i)_{i \in I}\), this implies

$$\begin{aligned} \sum _{i \in I} \varphi (p_i q_i) \leqslant \max _{i \in I} \varphi (q_i) + \sum _{i \in I} q_i \varphi (p_i). \end{aligned}$$

Using this for \(I = \mathcal L_{\vartheta }^m\), and the probability vector with entries \(\mu _{{\mathbf {P}}} ( [v_{[1,m]}] )\), we obtain via (3.6),

$$\begin{aligned} H_{P_n}({\mathcal {V}}_{[1,m]})&= {} \sum _{v_{[1,m]} \in \mathcal L_{\vartheta }^m} \varphi \biggl ( \frac{1}{\lambda } \mu _{{\mathbf {P}}} ( [ v_{[1,m]} ] ) {\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|] \biggr )\\&= O(1) + \sum _{v_{[1,m]}\in \mathcal L_{\vartheta }^m} \frac{{\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|]}{\lambda } \varphi (\mu _{{\mathbf {P}}} ([v_{[1,m]}]) ).\end{aligned}$$

Recall that \(m = m(n)\) implicitly depends on n and note that we can rewrite

$$\begin{aligned}&\frac{1}{n} \sum _{v_{[1,m]}\in \mathcal L_{\vartheta }^m} \frac{{\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|]}{\lambda } \varphi (\mu _{{\mathbf {P}}} ([v_{[1,m]}]) ) \\&\quad = \frac{m}{n} \int _{X_{\vartheta }} \frac{- \log (\mu _{{\mathbf {P}}}([v_{[1,m]}]))}{m} \, \frac{{\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|]}{\lambda } \,\mathrm {d}\mu _{{\mathbf {P}}}(v). \end{aligned}$$

Due to the ergodicity of \(\mu _{{\mathbf {P}}}\) and the Shannon–McMillan–Breiman theorem, we have that \(- \log (\mu _{{\mathbf {P}}}([v_{[1,m]}]) )/m\) converges to \(h(\mu _{{\mathbf {P}}})\) in \(L^1(X_{\vartheta },\mu _{{\mathbf {P}}})\) and hence we also get \(L^1\)-convergence for the product with an arbitrary uniformly bounded function g on \(X_{\vartheta }\). Applying this to \(g:v \mapsto \lambda ^{-1} {\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|]\) yields

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n} H_{P_n}( {\mathcal {V}}_{[1,m]})= & {} \frac{1}{\lambda _-} h(\mu _{{\mathbf {P}}}) \sum _{v_1 \in \mathcal A} \mu _{{\mathbf {P}}}([v_1]) \frac{{\mathbb {E}} [|\vartheta _{{\mathbf {P}}}(v_1)|]}{\lambda } \nonumber \\= & {} \frac{1}{\lambda _-} h(\mu _{{\mathbf {P}}}) \frac{1}{\lambda } \sum _{a,b \in {\mathcal {A}}} M_{ba} R_a = \frac{1}{\lambda _-} h(\mu _{{\mathbf {P}}}). \end{aligned}$$
(3.7)

We next turn to the calculation of the conditional entropy \(H_{P_n}({\mathcal {U}}_{[2,m]} \, | {\mathcal {V}}_{[1,m]})\). Denoting by \(P_{n,v_{[1,m]}}\) the normalized restriction of \(P_n\) to \(\{ {\mathcal {V}}_{[1,m]} = v_{[1,m]} \}\), we get via straightforward calculation

$$\begin{aligned} P_{n, v_{[1,m]}} [{\mathcal {U}}_{[2,m]} = (u_2, \ldots , u_m)] = \prod _{i = 2}^{m} {\mathbb {P}}[\vartheta _{{\mathbf {P}}}(v_i) = u_i], \end{aligned}$$

and thereby

$$\begin{aligned} H_{P_{n, v_{[1,m]}}} ({\mathcal {U}}_{[2,m]}) = \sum _{i = 2}^m H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}(v_i)) = {\mathbf {H}}_1^{\top } \Phi (v_{[2,m]}). \end{aligned}$$

Using (3.6), this yields

$$\begin{aligned} H_{P_n}({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {V}}_{[1,m]}) = \frac{1}{\lambda } \sum _{v_{[1,m]} \in \mathcal L_{\vartheta }^m} \mu _{{\mathbf {P}}} ( [v_{[1,m]}] ) {\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|] \, {\mathbf {H}}_1^{\top } \Phi (v_{[2,m]}). \end{aligned}$$

For the corresponding asymptotic behaviour we note that, again by ergodicity of \(\mu _{{\mathbf {P}}}\), \(\Phi (v_{[2,m]})/m\) converges to \({\mathbf {R}}\) for \(\mu _{{\mathbf {P}}}\)-almost every v. Thus,

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n} H_{P_n}({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {V}}_{[1,m]}) = \frac{1}{\lambda _-} {\mathbf {H}}_1^{\top } {\mathbf {R}} \sum _{v_1 \in \mathcal A} \mu _{{\mathbf {P}}} ([v_1]) \frac{{\mathbb {E}}[|\vartheta _{{\mathbf {P}}}(v_1)|]}{\lambda } = \frac{1}{\lambda _-} {\mathbf {H}}_1^{\top } {\mathbf {R}}. \end{aligned}$$
(3.8)

Hence, combining the contributions from (3.7) and (3.8), we get by (3.5),

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) = \lim _{n \rightarrow \infty } \frac{1}{n} H_{P_n}({\mathcal {W}}_n) \le \frac{1}{\lambda _-} \bigl ( h(\mu _{{\mathbf {P}}}) + {\mathbf {H}}_1^{\top } {\mathbf {R}} \bigr ). \end{aligned}$$

As \(\varepsilon \rightarrow 0\), we obtain \(\lambda _- \rightarrow \lambda \) and hence

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) \le \frac{1}{\lambda - 1} {\mathbf {H}}_1^{\top } {\mathbf {R}}, \end{aligned}$$

completing the proof. \(\square \)

The sequence of vectors \(({\mathbf {H}}_n^{\top } )_{n \in {\mathbb N}}\) can be bounded via a matrix-recursion that involves the substitution matrix.

Proposition 3.14

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution. Then, for every \(n,k \in {\mathbb N}\), we have that

$$\begin{aligned} {\mathbf {H}}_{n+k}^{\top } \le {\mathbf {H}}_n^{\top } M^k + {\mathbf {H}}_k^{\top }, \end{aligned}$$

to be understood elementwise. In particular,

$$\begin{aligned} {\mathbf {H}}_{n+k}^{\top } {\mathbf {R}} \le \lambda ^k {\mathbf {H}}_n^{\top } {\mathbf {R}} + {\mathbf {H}}_k^{\top } {\mathbf {R}}. \end{aligned}$$

If \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths, equality occurs precisely if \(\vartheta _{{\mathbf {P}}}^{k}(a)\) is completely determined by \(\vartheta _{{\mathbf {P}}}^{n+k}(a)\) for all \(a \in {\mathcal {A}}\).

Proof

First, let \(v \in {\mathcal {L}}_{\vartheta }^m\) and note that the random variable \(\vartheta _{{\mathbf {P}}}^n(v)\) can be written as a function of \((\vartheta _{{\mathbf {P}}}^n(v_1), \ldots , \vartheta _{{\mathbf {P}}}^n(v_m))\). Due to the independence of the random variables in the last tuple, we obtain that

$$\begin{aligned} H_{{\mathbb {P}}} (\vartheta _{{\mathbf {P}}}^n(v)) \le H_{{\mathbb {P}}} \bigl (\vartheta _{{\mathbf {P}}}^n(v_1), \ldots , \vartheta _{{\mathbf {P}}}^n(v_m) \bigr ) = \sum _{i = 1}^m H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^n(v_i)) = {\mathbf {H}}_n^{\top } \Phi (v). \end{aligned}$$

If \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths, we even obtain equality. Using the Markov property of the substitution process in the first step, we get for every \(a \in {\mathcal {A}}\),

$$\begin{aligned} H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^{n+k}(a)| \vartheta _{{\mathbf {P}}}^k(a))&= \sum _{v \in \vartheta _{{\mathbf {P}}}^k(a)} {\mathbb {P}}[\vartheta _{{\mathbf {P}}}^k(a) = v] H_{{\mathbb {P}}} (\vartheta _{{\mathbf {P}}}^n(v))\\ {}&\le {\mathbf {H}}_n^{\top } \sum _{v \in \vartheta _{{\mathbf {P}}}^k(a)} {\mathbb {P}}[\vartheta _{{\mathbf {P}}}^k(a) = v] \Phi (v) \\ {}&= {\mathbf {H}}_n^{\top } {\mathbb {E}} [\Phi (\vartheta _{{\mathbf {P}}}^k(a))] = {\mathbf {H}}_n^{\top } M^k e_a,\end{aligned}$$

again with equality in case of unique realisation paths. Therefore, for all \(a \in {\mathcal {A}}\),

$$\begin{aligned} H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^{n+k}(a)) \le H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^{n+k}(a)|\vartheta _{{\mathbf {P}}}^k(a)) + H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^k(a)) \le {\mathbf {H}}_n^{\top } M^k e_a + H_{k,a}. \end{aligned}$$

The first inequality is an equality precisely if \(\vartheta _{{\mathbf {P}}}^k(a)\) is completely determined by \(\vartheta _{{\mathbf {P}}}^{n+k}(a)\) and the second inequality is an equality, provided that \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths. \(\square \)

Corollary 3.15

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution. Then, for all \(n \in {\mathbb N}\),

$$\begin{aligned} \frac{1}{\lambda ^n - 1} {\mathbf {H}}_n^{\top } {\mathbf {R}} \le \frac{1}{\lambda - 1} {\mathbf {H}}_1^{\top } {\mathbf {R}}. \end{aligned}$$

If \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths, we have equality for all \(n \in {\mathbb N}\) if and only if \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition.

Proof

Given \(n \ge 2\), iterating the relation \({\mathbf {H}}_n^{\top } {\mathbf {R}} \le \lambda ^{n-1} {\mathbf {H}}_1^{\top } {\mathbf {R}} + {\mathbf {H}}_{n-1}^{\top } {\mathbf {R}}\) yields

$$\begin{aligned} {\mathbf {H}}_n^{\top } {\mathbf {R}} \le {\mathbf {H}}_1^{\top } {\mathbf {R}} \sum _{k = 0}^{n-1} \lambda ^k = \frac{\lambda ^n -1}{\lambda - 1} {\mathbf {H}}_1^{\top } {\mathbf {R}}, \end{aligned}$$

immediately giving the required inequality. Given the property of unique realisation paths, equality holds if and only if \(\vartheta _{{\mathbf {P}}}^n(a)\) completely determines \(\vartheta _{{\mathbf {P}}}(a)\) for all \(a \in {\mathcal {A}}\) and \(n \in {\mathbb N}\). This is just a reformulation of the disjoint set condition; compare Remark 3.1. \(\square \)
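For a concrete instance where equality fails, recall the random Fibonacci substitution: the two realisations ab and ba of \(\vartheta (a)\) admit a common image under a further application of \(\vartheta \), so the disjoint set condition is violated. The following short check (with the rules assumed as stated) makes this explicit.

```python
from itertools import product

# The disjoint set condition fails for random Fibonacci (a -> ab / ba, b -> a):
# the two realisations ab and ba of theta(a) share an image under theta, so the
# upper bound of Corollary 3.15 is not attained (an illustrative check).
rules = {'a': ['ab', 'ba'], 'b': ['a']}

def images(word):
    """Set of all realisations of theta applied to `word`."""
    return {''.join(parts) for parts in product(*(rules[c] for c in word))}

print(images('ab') & images('ba'))   # {'aba'}: theta(ab) and theta(ba) overlap
```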

3.5 The Lower Bound

In this section, we will establish the lower bounds for the measure theoretic entropy in Theorems 3.3 and 3.5. Again, our proof relies heavily on the self-consistency relation for \(\mu _{{\mathbf {P}}}\) presented in Sect. 3.2.

Proposition 3.16

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution with associated measure \(\mu _{{\mathbf {P}}}\). Then,

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) \ge \frac{1}{\lambda ^k} {\mathbf {H}}_k^{\top } {\mathbf {R}} - H(\lambda ^{-k}), \end{aligned}$$

for all \(k \in {\mathbb N}\). If \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths, then

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) \ge \frac{1}{\lambda ^k} {\mathbf {H}}_k^{\top } {\mathbf {R}}, \end{aligned}$$

for all \(k \in {\mathbb N}\).

Proof

Again, it suffices to consider the case \(k=1\). We adopt the notation from the proof of Proposition 3.13, with one modification. For \(\varepsilon > 0\), we now consider \(\lambda _+ = \lambda + \varepsilon \) and set

$$\begin{aligned} m = m_-(n) = \Bigl \lceil \frac{n}{\lambda _+}\Bigr \rceil . \end{aligned}$$

This is to ensure that \({\mathcal {W}}_n\) and \({\mathcal {J}}\) determine \({\mathcal {U}}_2 \cdots {\mathcal {U}}_m\) on a set of large probability, given by

$$\begin{aligned} B_n = \{ (v, u_1, \ldots , u_n,j) : |u_2 \cdots u_m| \le n - |\vartheta | \}. \end{aligned}$$

Using standard properties of conditional entropy, we get

$$\begin{aligned} H_{P_n}({\mathcal {W}}_n) \ge H_{P_n} ({\mathcal {W}}_n \, | \, {\mathcal {V}}_{[1,m]}) \ge H_{P_n} ( \mathcal U_{[2,m]} \, | \, {\mathcal {V}}_{[1,m]} ) - H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n).\qquad \end{aligned}$$
(3.9)

Just as in the proof of Proposition 3.13, it follows that

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n} H_{P_n} ( {\mathcal {U}}_{[2,m]} \, | \, {\mathcal {V}}_{[1,m]} ) = \frac{1}{\lambda _+} {\mathbf {H}}_1^{\top } {\mathbf {R}}. \end{aligned}$$

It remains to find an adequate upper bound for \(H_{P_n} (\mathcal U_{[2,m]} \, | \, {\mathcal {W}}_n)\). To that end, we introduce an additional random variable on \(\Omega _n\) via

$$\begin{aligned} \ell _m :(v, u_1, \ldots , u_n, j) \mapsto |u_2 \cdots u_m|. \end{aligned}$$

Next, we obtain

$$\begin{aligned} \begin{aligned} H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n)&\le H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n, {\mathcal {J}}, \ell _m) + H_{P_n} ({\mathcal {J}}, \ell _m \, | {\mathcal {W}}_n) \\&= H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n, {\mathcal {J}}, \ell _m) + O(\log (m)). \end{aligned} \end{aligned}$$
(3.10)

The last step follows because the number of distinct realisations of \(({\mathcal {J}}, \ell _m)\) can be bounded from above by \(|\vartheta |^2 m\). Conditioned on \({\mathcal {W}}_n, {\mathcal {J}}, \ell _m\), and provided \(\ell _m \le n- |\vartheta |\), knowledge of \({\mathcal {U}}_{[2,m]}\) is equivalent to knowledge of

$$\begin{aligned} |{\mathcal {U}}|_{[2,m]} :(v, u_1, \ldots , u_n,j) \mapsto (|u_2|, \ldots , |u_m|). \end{aligned}$$

Indeed, on the set \(B_n\) (that is, if \(\ell _m \le n -|\vartheta |\)) we observe that \({\mathcal {W}}_n, {\mathcal {J}}\) and \(\ell _m\) determine the word \(u_2 \cdots u_m\), so that knowing the lengths of the individual words allows us to infer \((u_2, \ldots , u_m)\). By conditioning,

$$\begin{aligned} H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n, {\mathcal {J}}, \ell _m)\le & {} H_{P_n} (|{\mathcal {U}}|_{[2,m]} \, | \, {\mathcal {W}}_n, {\mathcal {J}}, \ell _m) \\&+ H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, |\mathcal U|_{[2,m]}, {\mathcal {W}}_n, {\mathcal {J}}, \ell _m). \end{aligned}$$

Let \(K = \max _{a \in {\mathcal {A}}} \# \vartheta (a)\), implying \(\# \sigma ({\mathcal {U}}_{[2,m]}) \le K^m\). By the observations above, we can bound

$$\begin{aligned} H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, |{\mathcal {U}}|_{[2,m]}, \mathcal W_n, {\mathcal {J}}, \ell _m) \le P_n(B_n^C) \, m \log (K). \end{aligned}$$

Since \(P_n(B_n^C) \rightarrow 0\) as \(n \rightarrow \infty \) by Lemma 3.12, it follows that

$$\begin{aligned} H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n, {\mathcal {J}}, \ell _m) \le H_{P_n} (|{\mathcal {U}}|_{[2,m]} \, | \, \ell _m) + o(n). \end{aligned}$$
(3.11)

If \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths, we even get that \({\mathcal {W}}_n, {\mathcal {J}}, \ell _m\) determine \(\mathcal U_{[2,m]}\) completely on \(B_n\), yielding

$$\begin{aligned} H_{P_n} ({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n, {\mathcal {J}}, \ell _m) = o(n), \end{aligned}$$

by an analogous argument. Given \(\ell _m = \ell \), the number of possible values of \(|{\mathcal {U}}|_{[2,m]}\) is bounded above by the number of ways to decompose a block of length \(\ell \) into \(m-1\) smaller blocks, that is, by the binomial coefficient \(\left( {\begin{array}{c}\ell -1\\ m -2\end{array}}\right) \). Using this bound on \(B_n\) and the fixed bound \(K^m\) on \(B_n^C\), we obtain

$$\begin{aligned} H_{P_n} (|{\mathcal {U}}|_{[2,m]} \, | \, \ell _m)&\le \sum _{\ell = m-1}^{n -|\vartheta |} P_n[\ell _m = \ell ] \log {\left( {\begin{array}{c}\ell -1\\ m -2\end{array}}\right) } + P_n(B_n^C) \, m \log (K) \\&\le \log { \left( {\begin{array}{c}n\\ m-2\end{array}}\right) } + o(n) \le n\, H ((m-2)/n) + o(n). \end{aligned}$$

Since we have seen in (3.10) and (3.11) that \(H_{P_n} (|{\mathcal {U}}|_{[2,m]} \, | \, \ell _m)\) bounds \(H_{P_n}({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n)\) up to a term of order o(n), we get from (3.9) that

$$\begin{aligned} h(\mu _{{\mathbf {P}}})&= \lim _{n \rightarrow \infty } \frac{1}{n} H_{P_n}({\mathcal {W}}_n) \ge \lim _{n \rightarrow \infty } \frac{1}{n} H_{P_n}({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {V}}_{[1,m]}) - \limsup _{n \rightarrow \infty } \frac{1}{n} H_{P_n}({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n) \\ {}&\ge \frac{1}{\lambda _+} {\mathbf {H}}_1^{\top } {\mathbf {R}} - H(\lambda _+^{-1}) \xrightarrow { \varepsilon \rightarrow 0} \frac{1}{\lambda } {\mathbf {H}}_1^{\top } {\mathbf {R}} - H(\lambda ^{-1}). \end{aligned}$$

If \(\vartheta _{{\mathbf {P}}}\) has unique realisation paths, we have \(H_{P_n}({\mathcal {U}}_{[2,m]} \, | \, {\mathcal {W}}_n) = o(n)\), which gives the stronger bound

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) \ge \frac{1}{\lambda } {\mathbf {H}}_1^{\top } {\mathbf {R}}, \end{aligned}$$

in this case. \(\square \)
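The two bound sequences of Theorem 3.3 can be tabulated for small k. The sketch below does so for the random Fibonacci substitution with \(p = 1/2\) (our illustrative choice), taking H to be the entropy function \(H(x) = -x\log (x) - (1-x)\log (1-x)\) appearing in the binomial estimate above.

```python
import numpy as np
from itertools import product

# The sandwich of Theorem 3.3 for random Fibonacci (a -> ab / ba, b -> a):
#   lam^{-k} H_k.R - H(lam^{-k})  <=  h(mu_P)  <=  (lam^k - 1)^{-1} H_k.R.
rules = {'a': {'ab': 0.5, 'ba': 0.5}, 'b': {'a': 1.0}}

def step(dist):
    out = {}
    for word, prob in dist.items():
        for combo in product(*(rules[c].items() for c in word)):
            w = ''.join(u for u, _ in combo)
            out[w] = out.get(w, 0.0) + prob * np.prod([q for _, q in combo])
    return out

ent = lambda d: -sum(q * np.log(q) for q in d.values())
H = lambda x: -x * np.log(x) - (1 - x) * np.log(1 - x)

lam = (1 + np.sqrt(5)) / 2
R = np.array([lam, 1.0]) / (lam + 1.0)

dist_a, dist_b = {'a': 1.0}, {'b': 1.0}
for k in range(1, 6):
    dist_a, dist_b = step(dist_a), step(dist_b)
    HkR = np.array([ent(dist_a), ent(dist_b)]) @ R
    print(k, HkR / lam**k - H(1 / lam**k), HkR / (lam**k - 1))  # lower, upper
```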

For the remainder of this section, we restrict to the case of unique realisation paths.

Proposition 3.17

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive random substitution with unique realisation paths. Then,

$$\begin{aligned} {\mathbf {H}}_{n+k}^{\top } \ge {\mathbf {H}}_n^{\top } M^k \end{aligned}$$

for all \(n,k \in {\mathbb N}\). Equality holds if and only if \(\vartheta _{{\mathbf {P}}}^{n+k}(a)\) is independent of \(\vartheta _{{\mathbf {P}}}^k(a)\) for all \(a \in {\mathcal {A}}\).

Proof

As in the proof of Proposition 3.14, we obtain

$$\begin{aligned} H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^{n+k}(a)) \ge H_{{\mathbb {P}}}(\vartheta _{{\mathbf {P}}}^{n+k}(a) \, | \, \vartheta _{{\mathbf {P}}}^k(a)) = {\mathbf {H}}_n^{\top } M^k e_a, \end{aligned}$$

for all \(a \in {\mathcal {A}}\). Equality holds if and only if \(\vartheta _{{\mathbf {P}}}^{n+k}(a)\) and \(\vartheta _{{\mathbf {P}}}^k(a)\) are independent random variables. \(\square \)

Corollary 3.18

Let \(\vartheta _{{\mathbf {P}}}\) be primitive with unique realisation paths. Then, for all \(m \le n\),

$$\begin{aligned} \frac{1}{\lambda ^m} {\mathbf {H}}_m^{\top } {\mathbf {R}} \le \frac{1}{\lambda ^n} {\mathbf {H}}_n^{\top } {\mathbf {R}}. \end{aligned}$$

Equality holds for all \(m \le n\) if and only if \(\vartheta _{{\mathbf {P}}}\) satisfies the identical set condition with identical production probabilities.

Proof

By Proposition 3.17, we get

$$\begin{aligned} \frac{1}{\lambda ^n} {\mathbf {H}}_n^{\top } {\mathbf {R}} \ge \frac{1}{\lambda ^n} {\mathbf {H}}_m^{\top } M^{n-m} {\mathbf {R}} = \frac{1}{\lambda ^m} {\mathbf {H}}_m^{\top } {\mathbf {R}}. \end{aligned}$$

Equality for all \(m \le n\) holds precisely if

$$\begin{aligned} \frac{1}{\lambda ^n} {\mathbf {H}}_n^{\top } {\mathbf {R}} = \frac{1}{\lambda } {\mathbf {H}}_1^{\top } {\mathbf {R}}, \end{aligned}$$

for all \(n \in {\mathbb N}\). This is the case if and only if, for all \(a \in {\mathcal {A}}\), \(\vartheta _{{\mathbf {P}}}(a)\) is independent of \(\vartheta _{{\mathbf {P}}}^n(a)\) for all \(n \in {\mathbb N}\), which means that \(\vartheta _{{\mathbf {P}}}^{n-1}(v)\) has the same distribution for all possible realisations v of \(\vartheta _{{\mathbf {P}}}(a)\). This is precisely the identical set condition with identical production probabilities. \(\square \)
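A transparent example for the equality case is the toy substitution \(a \mapsto ab / ba\), \(b \mapsto ab / ba\) with uniform probabilities (an assumption made purely for illustration), which satisfies the identical set condition with identical production probabilities. The sketch below confirms that the lower bounds \(\lambda ^{-n} {\mathbf {H}}_n^{\top } {\mathbf {R}}\) are constant in n.

```python
import numpy as np
from itertools import product

# Identical set condition with identical production probabilities, on a toy
# substitution chosen for illustration: here M = [[1,1],[1,1]], lam = 2 and
# R = (1/2, 1/2).
rules = {'a': {'ab': 0.5, 'ba': 0.5}, 'b': {'ab': 0.5, 'ba': 0.5}}

def step(dist):
    out = {}
    for word, prob in dist.items():
        for combo in product(*(rules[c].items() for c in word)):
            w = ''.join(u for u, _ in combo)
            out[w] = out.get(w, 0.0) + prob * np.prod([q for _, q in combo])
    return out

ent = lambda d: -sum(q * np.log(q) for q in d.values())
lam, R = 2.0, np.array([0.5, 0.5])

dist_a, dist_b = {'a': 1.0}, {'b': 1.0}
for n in range(1, 5):
    dist_a, dist_b = step(dist_a), step(dist_b)
    print(n, np.array([ent(dist_a), ent(dist_b)]) @ R / lam**n)
# each line prints log(2)/2 = 0.34657...: the sequence of lower bounds is
# constant in n, as predicted by Corollary 3.18
```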

With the results established thus far, our main results follow in a straightforward manner.

Proof of Theorem 3.3

The fact that

$$\lambda ^{-k} {\mathbf {H}}_k^{\top } {\mathbf {R}} - H(\lambda ^{-k}) \le h(\mu _{{\mathbf {P}}}) \le (\lambda ^k - 1)^{-1} {\mathbf {H}}_k^{\top } {\mathbf {R}}$$

for all \(k \in {\mathbb N}\) follows directly by combining Proposition 3.13 and Proposition 3.16. The convergence of \(\lambda ^{-k} {\mathbf {H}}_k^{\top } {\mathbf {R}}\) as \(k \rightarrow \infty \) can be seen from the reformulation of this relation in (3.2). \(\square \)

Proof of Theorem 3.5

The upper and lower bounds for \(h(\mu _{{\mathbf {P}}})\) were established in Proposition 3.13 and Proposition 3.16. The statements on the equivalent conditions for equality with the lower or upper bound are given in Corollaries 3.15 and 3.18. The fact that the sequence of lower bounds is non-decreasing is also contained in Corollary 3.18. \(\square \)

4 Measures of Maximal Entropy

4.1 Existence of Frequency Measures of Maximal Entropy

By comparing the results for measure theoretic entropy established in Sect. 3 with the results on topological entropy obtained in [17], we ascertain that for random substitution subshifts there often exists a frequency measure of maximal entropy. In particular, as a consequence of Corollary 2.14 and Theorem 3.5, we obtain that every subshift of a primitive and compatible random substitution satisfying the identical set condition or disjoint set condition has a frequency measure of maximal entropy. This measure of maximal entropy is the frequency measure corresponding to uniform probabilities.

Theorem 4.1

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive and compatible random substitution satisfying either the disjoint set condition or the identical set condition, with corresponding frequency measure \(\mu _{{\mathbf {P}}}\). If \({\mathbb {P}} [\vartheta _{{\mathbf {P}}} (a) = s] = 1/(\# \vartheta (a))\) for all \(a \in {\mathcal {A}}\) and \(s \in \vartheta (a)\), then \(\mu _{{\mathbf {P}}}\) is a measure of maximal entropy for the system \((X_{\vartheta }, S)\).

Proof

For \(a \in {\mathcal {A}}\) and \(s \in \vartheta (a)\), we have that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}} (a) = s] = 1/(\#\vartheta (a))\); hence,

$$\begin{aligned} {\mathbf {H}}_1^{\top } {\mathbf {R}} = \sum _{a \in {\mathcal {A}}} R_{a} \log (\# \vartheta (a)). \end{aligned}$$

If \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition, then by Theorem 3.5, we have

$$\begin{aligned} h (\mu _{{\mathbf {P}}}) = \frac{1}{\lambda -1} \sum _{a \in {\mathcal {A}}} R_{a} \log (\# \vartheta (a)). \end{aligned}$$

Thus, it follows by Corollary 2.14 that \(h (\mu _{{\mathbf {P}}}) = h_{\text {top}} (X_{\vartheta })\), and so \(\mu _{{\mathbf {P}}}\) is a measure of maximal entropy.

Assume that \(\vartheta _{{\mathbf {P}}}\) satisfies the identical set condition. Before we can apply Theorem 3.5, we first need to verify that \(\vartheta _{{\mathbf {P}}}\) has identical production probabilities. To this end, let \(a \in {\mathcal {A}}\) and let \(u, v \in \vartheta (a)\). Since \(\vartheta _{{\mathbf {P}}}\) is compatible, \(|u |_{b} = |v |_{b}\) for all \(b \in {\mathcal {A}}\). Hence, if \(t \in \vartheta ^{2} (a)\), it follows that

$$\begin{aligned} {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (u) = t] = \prod _{b \in {\mathcal {A}}} (\# \vartheta (b))^{-|u |_{b}} = \prod _{b \in {\mathcal {A}}} (\# \vartheta (b))^{-|v |_{b}} = {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (v) = t]. \end{aligned}$$

By way of induction, let \(n \in {\mathbb {N}}\) and assume that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{n-1} (u) = w] = {\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{n-1} (v) = w]\) for all \(w \in \vartheta ^{n} (a)\). Since \(\vartheta _{{\mathbf {P}}}\) satisfies the identical set condition, for all \(t \in \vartheta ^{n+1} (a)\) we have \(t \in \vartheta ^{n} (u) \cap \vartheta ^{n} (v)\), so

$$\begin{aligned} {\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{n} (u) = t]&= \sum _{w \in \vartheta ^{n-1} (u)} {\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{n-1} (u) = w] \, {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (w) = t] \\&=\sum _{w \in \vartheta ^{n-1} (v)} {\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{n-1} (v) = w] \, {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (w) = t] = {\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{n} (v) = t]. \end{aligned}$$

Therefore, by induction, \(\vartheta _{{\mathbf {P}}}\) has identical production probabilities, and thus, by Theorem 3.5, we have

$$\begin{aligned} h(\mu _{{\mathbf {P}}}) = \frac{1}{\lambda } \sum _{a \in {\mathcal {A}}} R_{a} \log (\# \vartheta (a)). \end{aligned}$$

This, with Corollary 2.14, yields that \(h (\mu _{{\mathbf {P}}}) = h_{\text {top}} (X_{\vartheta })\); that is, \(\mu _{{\mathbf {P}}}\) is a measure of maximal entropy. \(\square \)
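As a worked instance of Theorem 4.1, consider the random period doubling substitution \(a \mapsto ab / ba\), \(b \mapsto aa\) with uniform probabilities, which is primitive and compatible and, as noted after Lemma 4.5 below, satisfies the disjoint set condition. The following sketch evaluates the entropy formula.

```python
import numpy as np

# Theorem 4.1 applied to the random period doubling substitution
# a -> ab / ba (uniform), b -> aa.
M = np.array([[1.0, 2.0],    # a-counts in theta(a), theta(b)
              [1.0, 0.0]])   # b-counts in theta(a), theta(b)
lam = max(np.linalg.eigvals(M).real)       # lam = 2
R = np.array([2.0, 1.0]) / 3.0             # right PF eigenvector, sum 1

cardinalities = np.array([2, 1])           # #theta(a) = 2, #theta(b) = 1
h = (R @ np.log(cardinalities)) / (lam - 1)
print(h, (2 / 3) * np.log(2))              # h(mu_P) = h_top = (2/3) log 2
```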

In general, a primitive and compatible random substitution with uniform probabilities need not give rise to a frequency measure of maximal entropy: see, for instance, Example 5.4. However, for any subshift of a primitive and compatible random substitution, a measure of maximal entropy can be realised as a weak limit of frequency measures.

Theorem 4.2

Let X be the subshift of a primitive and compatible random substitution. There exists a sequence of frequency measures \((\mu _{n})_{n}\) such that \(\mu _{n}\) converges weakly to a measure of maximal entropy \(\mu \) for the system \((X, S)\).

Proof

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive and compatible random substitution that gives rise to the subshift \(X_{\vartheta }\), and let \(\lambda \) denote the Perron–Frobenius eigenvalue of the substitution matrix \(M_{\vartheta }\). Then, for all \(n \in {\mathbb {N}}\), the substitution \(\vartheta ^{n}\) gives rise to the same subshift as \(\vartheta \), namely \(X_{\vartheta }\). Let \({\mathbf {P}}_{n}\) denote the family of probability vectors corresponding to uniform probabilities on \(\vartheta ^{n}\), and let \(\mu _{n}\) denote the frequency measure corresponding to the random substitution \((\vartheta ^{n},{\mathbf {P}}_{n})\). Since the space of probability measures supported on a compact set and endowed with the weak topology is compact, there exists a probability measure \(\mu \) and a subsequence \((n_k)_k\) of the natural numbers such that \((\mu _{n_k})_{k \in {\mathbb {N}}}\) converges weakly to \(\mu \). By Theorem 3.5, we have

$$\begin{aligned} h (\mu _{n_k}) \ge \frac{1}{\lambda ^{n_k}} \sum _{a \in {\mathcal {A}}} R_{a} \log (\# \vartheta ^{n_k} (a)) \end{aligned}$$

for all \(k \in {\mathbb {N}}\). By Theorem 2.12, the right hand term converges to the topological entropy of the system \((X, S)\) as k tends to infinity. Hence,

$$\begin{aligned} \limsup _{k \rightarrow \infty } h (\mu _{n_k}) \ge h_{\text {top}} (X_{\vartheta }), \end{aligned}$$

and so it follows, by the upper semi-continuity of measure theoretic entropy, that \(h (\mu ) = h_{\text {top}} (X_{\vartheta })\). \(\square \)

4.2 Intrinsic Ergodicity

For a class of primitive random substitutions satisfying the disjoint set condition, the frequency measure of maximal entropy given by Theorem 4.1 is the unique measure of maximal entropy among all shift-invariant Borel probability measures. This is the content of the main result of this section (Theorem 4.8). The random substitutions considered here are all of constant length and recognisable, the definition of which is given below. Recognisability also appears in the work of Miro et al. [27] on topological mixing of random substitutions and in Rust's paper on periodic points [36].

Definition 4.3

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) denote a random substitution over a finite alphabet \({\mathcal {A}}\), and suppose that \(|\vartheta (a) |\) is well-defined for all \(a \in {\mathcal {A}}\). We call \(\vartheta _{{\mathbf {P}}}\) recognisable if for all \(x \in X_{\vartheta }\) there exist a unique \(y = \cdots y_{-1}y_{0}y_{1} \cdots \in X_{\vartheta }\) and a unique integer \(k \in \{ 0, \ldots , | \vartheta (y_{0}) | - 1 \}\) with \(S^{-k}(x) \in \vartheta (y)\).

Observe that if \(\vartheta _{{\mathbf {P}}}\) is recognisable, then so is \(\vartheta ^{m}_{{\mathbf {P}}}\) for all \(m \in {\mathbb {N}}\), and if \(\vartheta _{{\mathbf {P}}}\) is of constant length \(\ell \), then recognisability implies that every \(x \in X_{\vartheta }\) is contained in precisely one of the sets \(S^{k}(\vartheta (X_{\vartheta }))\) for \(k \in \{1, \ldots , \ell \}\). Further, we have the following local version of recognisability. This is similar to the case of deterministic substitutions where an equivalence between global and local recognisability holds. Intuitively, local recognisability means that applying a finite window to a sequence is enough to determine the position and the type of the inflation word in the middle of that window.

Lemma 4.4

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) denote a primitive random substitution over an alphabet \({\mathcal {A}}\), and suppose that \(|\vartheta (a) |\) is well-defined for all \(a \in {\mathcal {A}}\). If \(\vartheta _{{\mathbf {P}}}\) is recognisable, then there exists a smallest natural number \(\kappa (\vartheta )\), called the recognisability radius of \(\vartheta \), with the following property. If \(x \in \vartheta ([a])\) for some \(a \in {\mathcal {A}}\) and \(x_{[-\kappa (\vartheta ),\kappa (\vartheta )]} = y_{[-\kappa (\vartheta ),\kappa (\vartheta )]}\) for some \(y \in X_{\vartheta }\), then \(y \in \vartheta ([a])\).

Proof

By way of contradiction, suppose there is no radius of recognisability. Then there exists a sequence of tuples \(((x^{(k)},y^{(k)}))_{k \in {\mathbb {N}}}\) with \((x^{(k)},y^{(k)}) \in \vartheta ([a]) \times \vartheta ( [a])^{C}\) and \(x^{(k)}_{[-k,k]} = y^{(k)}_{[-k,k]}\) for all \(k \in {\mathbb {N}}\). Let \((x,y) \in X_{\vartheta } \times X_{\vartheta }\) be an accumulation point of this sequence. By recognisability,

$$\begin{aligned} X_{\vartheta } = \bigsqcup _{b \in {\mathcal {A}}} \bigsqcup _{k = 0}^{|\vartheta (b) |- 1} S^{k} (\vartheta ([b])), \end{aligned}$$

and by construction, \(x = y\). Due to Lemma 2.5, and since S is continuous, we have that \(S^{k}(\vartheta ([b]))\) is compact for all \(b \in {\mathcal {A}}\) and \(k \in \mathbb Z\). Hence, both \(\vartheta ([a])\) and \(\vartheta ([a])^{C}\) are compact. It therefore follows that \(x \in \vartheta ([a])\) and \(x= y \in \vartheta ([a])^{C}\), leading to a contradiction. \(\square \)

Lemma 4.5

Assume the setting of Lemma 4.4. If the random substitution \(\vartheta _{{\mathbf {P}}}\) is recognisable, then it satisfies the disjoint set condition.

Proof

By way of contradiction, suppose that \(\vartheta _{{\mathbf {P}}}\) does not satisfy the disjoint set condition. Then there exist \(a \in {\mathcal {A}}\) and \(s, t \in \vartheta (a)\) with \(s \ne t\) and \(\vartheta (s) \cap \vartheta (t) \ne \varnothing \). For \( x \in [a]\), observe that there exist y and \(z \in \vartheta (x)\) such that \(y_{[0, |\vartheta (a) |-1]} = s\), \(z_{[0, |\vartheta (a) |- 1]} = t\), and y coincides with z at all other positions. Hence, there exists a \(w \in \vartheta (y) \cap \vartheta (z)\) that can be constructed by mapping s and t to the same word \(v \in \vartheta (s) \cap \vartheta (t)\). This contradicts recognisability. \(\square \)

The converse of this statement does not hold: a counterexample is given by the random period doubling substitution. When establishing intrinsic ergodicity for certain random substitutions, we will be concerned with recognisability for some power of those random substitutions. It follows from a simple recursive argument that the recognisability radius of \(\vartheta ^{m}_{{\mathbf {P}}}\) grows (asymptotically) at most with the inflation factor as m increases. For constant length substitutions the precise result reads as follows.

Lemma 4.6

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive random substitution of constant length \(\ell \). If \(\vartheta _{{\mathbf {P}}}\) is recognisable, then for all \(m \in {\mathbb {N}}\), we have that

$$\begin{aligned} \kappa (\vartheta ^{m}) \le \frac{\ell ^{m} -1}{\ell -1} \kappa (\vartheta ). \end{aligned}$$

Proof

We proceed by induction. The result is immediate for \(m =1\). Assume it holds for some \(m \in {\mathbb {N}}\), and note, by primitivity, that \(X_{\vartheta } = X_{\vartheta ^{m}}\). Let \(a \in {\mathcal {A}}\), \(x \in \vartheta ^{m+1}([a])\) and \(y \in X_{\vartheta }\) with \(x_{[-k,k]} = y_{[-k,k]}\) for \(k = \ell \kappa (\vartheta ^{m}) + \kappa (\vartheta )\); in particular, \(y \in \vartheta (X_{\vartheta })\). Let \(v \in \vartheta ^{m}([a])\) be such that \(x \in \vartheta (v)\), and let \(w \in X_{\vartheta }\) be such that \(y \in \vartheta (w)\). Applying local recognisability to the pair \((S^{j\ell }x,S^{j\ell }y)\) for each \(j \in \{-\kappa (\vartheta ^{m}), \ldots , \kappa (\vartheta ^{m}) \}\), in combination with Lemma 4.5, we obtain that \(v_{[-\kappa (\vartheta ^{m}), \kappa (\vartheta ^{m})]} = w_{[-\kappa (\vartheta ^{m}), \kappa (\vartheta ^{m})]}\). By the definition of \(\kappa (\vartheta ^{m})\), this implies \(w \in \vartheta ^{m}([a])\) and so \(y \in \vartheta (w) \subseteq \vartheta ^{m+1}([a])\), yielding

$$\begin{aligned} \kappa (\vartheta ^{m+1}) \le \ell \kappa (\vartheta ^{m}) + \kappa (\vartheta ) \le \kappa (\vartheta ) \sum _{j = 0}^{m} \ell ^{j} = \frac{\ell ^{m+1} - 1}{\ell - 1} \kappa (\vartheta ), \end{aligned}$$

where the second inequality follows from the inductive hypothesis. \(\square \)

Since every primitive recognisable random substitution \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition, if \(\vartheta _{{\mathbf {P}}}\) is compatible, then Theorem 4.1 gives that the frequency measure corresponding to uniform probabilities is a measure of maximal entropy. Without compatibility, we cannot utilise Theorem 2.12 to obtain a formula for the topological entropy of the corresponding subshift. However, we can directly compute the topological entropy for a class of random substitution subshifts that includes all the non-compatible random substitution subshifts for which we prove intrinsic ergodicity in Theorem 4.8. This is the content of Lemma 4.7. Combining this with Theorem 3.5 gives that the frequency measure corresponding to uniform probabilities is a measure of maximal entropy.

Lemma 4.7

Let \(\vartheta _{{\mathbf {P}}}\) be a primitive recognisable random substitution of constant length \(\ell \). If there exists an \(N \in {\mathbb {N}}\) such that \(\# \vartheta (a) = N\) for all \(a \in {\mathcal {A}}\), then

$$\begin{aligned} h_{\text {top}} (X_{\vartheta }) = \frac{1}{\ell -1} \log (N). \end{aligned}$$
(4.1)

In particular, the frequency measure \(\mu \) corresponding to uniform probabilities is a measure of maximal entropy for the subshift \(X_{\vartheta }\).

Proof

For \(m \in {\mathbb {N}}\), we have

$$\begin{aligned} {\mathcal {L}}_{\vartheta }^{m\ell } = \bigcup _{v \in {\mathcal {L}}_{\vartheta }^{m+1}} \bigcup _{u \in \vartheta (v)} \bigcup _{j=1}^{\ell } \left\{ u_{[j,j+\ell m-1]} \right\} . \end{aligned}$$

Since by our hypothesis and Lemma 4.5 we have \(\# \vartheta (v) = N^{|v |}\) for all \(v \in {\mathcal {L}}_{\vartheta }^{m+1}\), it follows that \(\# {\mathcal {L}}_{\vartheta }^{m\ell } \le \ell N^{m+1} \# {\mathcal {L}}_{\vartheta }^{m+1}\), and so

$$\begin{aligned} h_{\text {top}} (X_{\vartheta }) = \lim _{m \rightarrow \infty } \frac{\log (\# {\mathcal {L}}_{\vartheta }^{m\ell })}{m\ell } \le \frac{1}{\ell } \log (N) + \frac{1}{\ell } h_{\text {top}} (X_\vartheta ). \end{aligned}$$
(4.2)

On the other hand,

$$\begin{aligned} {\mathcal {L}}_{\vartheta }^{m\ell } \supseteq \vartheta ({\mathcal {L}}_{\vartheta }^{m}) = \bigcup _{v \in {\mathcal {L}}_{\vartheta }^{m}} \vartheta (v). \end{aligned}$$

By recognisability, there is a number \(r \le \kappa (\vartheta )\) such that for \(u,v \in {\mathcal {L}}_{\vartheta }^m\) with \(v_{[r+1, m -r]} \ne u_{[r+1, m -r]}\) we have \(\vartheta (u) \cap \vartheta (v) = \varnothing \). Hence, \(\# {\mathcal {L}}_{\vartheta }^{m \ell } \ge N^m \# {\mathcal {L}}_{\vartheta }^{m - 2r}\), so

$$\begin{aligned} h_{\text {top}} (X_{\vartheta }) = \lim _{m \rightarrow \infty } \frac{1}{m \ell } \log (\# {\mathcal {L}}_\vartheta ^{m\ell }) \ge \frac{1}{\ell } \log (N) + \frac{1}{\ell } h_{\text {top}} (X_{\vartheta }). \end{aligned}$$
(4.3)

Combining (4.2) and (4.3) and rearranging yields (4.1). To see that \(\mu \) is a measure of maximal entropy for the subshift \(X_{\vartheta }\), observe that by Theorem 3.5 and Lemma 4.5, we have

$$\begin{aligned} h(\mu ) = \frac{1}{\ell - 1} \sum _{a \in {\mathcal {A}}} R_{a} \sum _{s \in \vartheta (a)} \frac{1}{N} \log (N) = \frac{1}{\ell -1} \log (N) \text {,} \end{aligned}$$

since \(\# \vartheta (a) = N\) for all \(a \in {\mathcal {A}}\) and \(\sum _{a \in {\mathcal {A}}} R_{a} = 1\). Hence \(h (\mu ) = h_{\text {top}} (X_{\vartheta })\).

\(\square \)

Remark 4.1

In contrast to the compatible case, it is not true in general that for a primitive and constant length random substitution the measure corresponding to uniform probabilities is a measure of maximal entropy. We present an example of such a random substitution in Example 5.3.

We now give the statement of the main result of this section, Theorem 4.8.

Theorem 4.8

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive recognisable random substitution of constant length \(\ell \) and assume that at least one of the following holds:

  (i) \(\vartheta (a)\) has the same cardinality for all \(a \in {\mathcal {A}}\);

  (ii) \(\vartheta _{{\mathbf {P}}}\) is compatible and \(\ell \) is the only non-zero eigenvalue of the substitution matrix.

Under these hypotheses, the system \((X_{\vartheta }, S)\) is intrinsically ergodic. Moreover, the unique measure of maximal entropy is the frequency measure corresponding to uniform probabilities.

The proof of Theorem 4.8 is presented in Sect. 4.4. We note that the subshifts considered in Theorem 4.8 do not satisfy the specification property of Bowen [5] or the weaker specification property of Climenhaga and Thompson [8]. Compare also Remark 4.2 below.

4.3 Gibbs Properties of Frequency Measures

The proof of Theorem 4.8 follows a similar approach to the proof that the Parry measure is the unique measure of maximal entropy for irreducible shifts of finite type, due to Adler and Weiss [2]. An important feature of their proof is a Gibbs property, which states that for the measure of maximal entropy \(\mu \), there exist constants \(A,B > 0\) such that \(A \mathrm {e}^{-|u |h} \le \mu ([u]) \le B \mathrm {e}^{-|u |h}\) for every legal word u, where h denotes the topological entropy of the system. Such a Gibbs property does not hold for the subshifts considered in Theorem 4.8. However, we can obtain a weaker Gibbs property for cylinder sets of exact inflation words. This is the content of Lemma 4.12, which utilises the following auxiliary results.

Lemma 4.9

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive random substitution with corresponding frequency measure \(\mu _{{\mathbf {P}}}\). If for every \(a_i \in {\mathcal {A}}\), the length \(|\vartheta (a_i) |\) is well-defined, then for all \(v \in {\mathcal {L}}_{\vartheta }\) and \(w \in \vartheta (v)\),

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) \ge \frac{1}{\lambda } \mu _{{\mathbf {P}}} ([v]) {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (v) = w]. \end{aligned}$$

If in addition, \(\vartheta _{{\mathbf {P}}}\) is recognisable and constant length, and \(|\vartheta (v) |> 2 \kappa (\vartheta )\), then

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) = \frac{1}{\lambda } \sum _{u \in \mathcal L_{\vartheta }^{|v|}} \mu _{{\mathbf {P}}} ([u]) {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (u) = w]. \end{aligned}$$

Proof

Let \(v \in {\mathcal {L}}_{\vartheta }\) and let \(w \in \vartheta (v)\) be fixed. Let \(n = |w|\) and \({\mathcal {J}}_{n}(v) = \{ u \in {\mathcal {L}}_{\vartheta }^{n} : u_{[1, |v |]} = v \}\). Since we assumed that the lengths of inflation words are well-defined, the relation in Lemma 3.8 simplifies to

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) = \frac{1}{\lambda } \sum _{u \in {\mathcal {L}}_{\vartheta }^{n}} \mu _{{\mathbf {P}}} ([u]) \sum _{j = 1}^{|\vartheta (u_1) |} {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (u)_{[j, j + |w |- 1]} = w]. \end{aligned}$$

Using that [v] is the union of all [u] with \(u \in {\mathcal {J}}_{n}(v)\), we thereby obtain

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w])&\ge \frac{1}{\lambda } \sum _{u \in {\mathcal {J}}_{n}(v)} \mu _{{\mathbf {P}}} ([u]) {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (u)_{[1,|w |]} = w] = \frac{1}{\lambda } \sum _{u \in {\mathcal {J}}_{n}(v)} \mu _{{\mathbf {P}}} ([u]) {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (v) = w] \\&= \frac{1}{\lambda } \mu _{{\mathbf {P}}} ([v]) {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (v) = w]. \end{aligned}$$

If \(\vartheta _{{\mathbf {P}}}\) is recognisable and of constant length, and \(|\vartheta (v) |> 2 \kappa (\vartheta )\), then there is a unique way to decompose w into inflation words. However, there might still be several words \(u \in {\mathcal {L}}_{\vartheta }\) with \(|u| = |v|\) such that \(w \in \vartheta (u)\). Lemma 3.8 yields

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) = \frac{1}{\lambda } \sum _{u \in \mathcal L_{\vartheta }^{|v|}} \mu _{{\mathbf {P}}} ([u]) {\mathbb {P}} [\vartheta _{{\mathbf {P}}} (u) = w]. \end{aligned}$$

\(\square \)

Lemma 4.10

Let \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) be a primitive random substitution satisfying the disjoint set condition. Assume that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}} (a) = s] = 1/\# \vartheta (a)\) for all \(a \in {\mathcal {A}}\) and \(s \in \vartheta (a)\) and that at least one of the following conditions is satisfied:

  (i) \(\vartheta _{{\mathbf {P}}}\) is of constant length \(\ell \) and \(\# \vartheta (a) = \# \vartheta (b)\) for all \(a, b \in {\mathcal {A}}\);

  (ii) \(\vartheta _{{\mathbf {P}}}\) is compatible and the second largest eigenvalue \(\tau \) of the substitution matrix satisfies \(|\tau |< 1\).

Under these hypotheses, there exists a constant \(c > 0\) such that \({\mathbb {P}}[\vartheta _{{\mathbf {P}}}^{m}(a) = w] \ge c \mathrm {e}^{- |w |h_{\text {top}}(X_{\vartheta })}\) for all \(m \in {\mathbb {N}}\), \(a \in {\mathcal {A}}\) and \(w \in \vartheta ^{m}(a)\). In particular, when \(\vartheta _{{\mathbf {P}}}\) is of constant length, we have that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}}^m (a) = w] = \mathrm {e}^{h_{\text {top}}(X_{\vartheta })} \mathrm {e}^{-|w| h_{\text {top}}(X_{\vartheta })}\).

Proof

As \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition and the production probabilities are uniform, it follows by induction that, for \(a \in {\mathcal {A}}\), \(m \in {\mathbb {N}}\) and \(w \in \vartheta ^{m} (a)\),

$$\begin{aligned} {\mathbb {P}}[\vartheta _{{\mathbf {P}}}^{m} (a) = w] = \frac{1}{\# \vartheta ^{m}(a)}. \end{aligned}$$
(4.4)

Let us first consider case (i). Since \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition and, by (i), \(\# \vartheta (a) = \# \vartheta (b)\) for all \(a, b \in {\mathcal {A}}\), we have \(\# \vartheta ^{m}(a) = \# \vartheta ^{m}(b)\) for all \(a, b \in {\mathcal {A}}\) and \(m \in {\mathbb {N}}\). Hence, by Corollary 2.14, and since the right Perron–Frobenius eigenvector \({\mathbf {R}}\) of \(\vartheta _{{\mathbf {P}}}\) is normalised so that \(\Vert {\mathbf {R}} \Vert _{1} = 1\), we have

$$\begin{aligned} \log (\# \vartheta ^{m}(a)) &= \sum _{b \in {\mathcal {A}}} R_b \log (\#\vartheta ^{m}(b)) = (\ell ^{m} - 1)h_{\text {top}}(X_{\vartheta }) \\ &= |\vartheta ^{m}(a) |h_{\text {top}}(X_{\vartheta }) - h_{\text {top}}(X_{\vartheta }). \end{aligned}$$

Taking the exponential of both sides, we conclude from (4.4) that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{m} (a) = w] = \mathrm {e}^{h_{\text {top}}(X_{\vartheta })} \mathrm {e}^{-|w| h_{\text {top}}(X_{\vartheta })}\). Let us now consider case (ii). Since the Perron–Frobenius eigenvalue \(\lambda \) of \(\vartheta _{{\mathbf {P}}}\) is simple, we can split the substitution matrix M as \(M = \lambda {\mathbf {R}} {\mathbf {L}}^{\top } + N\), where \({\mathbf {R}}\) and \({\mathbf {L}}\) are respectively the right and left Perron–Frobenius eigenvectors of \(\vartheta _{{\mathbf {P}}}\) and where \(N {\mathbf {R}} {\mathbf {L}}^{\top } = 0 = {\mathbf {R}} {\mathbf {L}}^{\top } N\). Since \(\vartheta _{{\mathbf {P}}}\) satisfies the disjoint set condition, it follows by [17, Lemma 10] that \({\mathbf {q}}_{m}^{\top } = {\mathbf {q}}_{1}^{\top } \sum _{k=0}^{m-1} M^{k}\) for all \(m \in {\mathbb {N}}\), where \({\mathbf {q}}_{m}\) is as defined in (2.2). Hence,

$$\begin{aligned} {\mathbf {q}}_{m}^{\top } &= {\mathbf {q}}_{1}^{\top } \sum _{k=0}^{m-1} M^{k} = {\mathbf {q}}_1^{\top } \sum _{k = 0}^{m-1} \lambda ^{k} {\mathbf {R}} {\mathbf {L}}^{\top } + {\mathbf {q}}_1^{\top } \sum _{k=0}^{m-1} N^{k}\\ &= \frac{\lambda ^{m} - 1}{\lambda - 1} {\mathbf {q}}_1^{\top } {\mathbf {R}} {\mathbf {L}}^{\top } + {\mathbf {q}}_1^{\top } \sum _{k=0}^{m-1} N^{k} = (\lambda ^{m} - 1) h_{\text {top}}(X_{\vartheta }) {\mathbf {L}}^{\top } + {\mathbf {q}}_1^{\top } \sum _{k=0}^{m-1} N^{k}. \end{aligned}$$

By construction, \(\tau \) is the dominant eigenvalue of N, and so there exist \(c > 0\) and \(n \in {\mathbb N}\) such that \(\Vert N^{k} \Vert _{\infty } < c k^n |\tau |^{k}\) for all \(k \in {\mathbb {N}}\). Hence, for any \(r \in {\mathbb R}\) with \(|\tau |< r < 1\), enlarging c if necessary, we have \(\Vert N^{k} \Vert _{\infty } < c r^k\) for all \(k \in {\mathbb {N}}\). We therefore obtain

$$\begin{aligned} \log (\# \vartheta ^{m}(a))&= q_{m,a} \! \le (\lambda ^{m} - 1)L_{a} h_{\text {top}}(X_{\vartheta }) + \! \Vert {\mathbf {q}}_1 \Vert _{\infty } \! \sum _{k = 0}^{m-1} \Vert N^{k} \Vert _{\infty }\\&\le (\lambda ^{m} - 1)L_{a} h_{\text {top}}(X_{\vartheta }) + \! \frac{c}{1 - { r }} \Vert {\mathbf {q}}_1 \Vert _{\infty }, \end{aligned}$$

where \(q_{m,a}\) is as defined in (2.2). On the other hand, by Lemma 3.11, we have that

$$\begin{aligned} |\vartheta ^{m}(a) |\ge L_{a} \lambda ^{m} - D |\tau |^{m} \ge L_{a} \lambda ^{m} - D, \end{aligned}$$

for some \(D>0\). Hence, writing \(h = h_{\text {top}}(X_{\vartheta })\), there exists a constant \(C >0\) such that \(\log (\# \vartheta ^{m}(a)) \le |\vartheta ^{m}(a) |h + C\). Taking the exponential of both sides, we conclude from (4.4) that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}}^m (a) = w] \ge \mathrm {e}^{-|w| h} \mathrm {e}^{-C}\). Setting \(c = \mathrm {e}^{-C}\) completes the proof. If \(\vartheta _{{\mathbf {P}}}\) is additionally assumed to be of constant length, then \(\tau = 0\). Indeed, \(\lambda = \ell \) is an integer, so the remaining eigenvalues of the substitution matrix are the roots of a monic integer polynomial; as they all have modulus strictly less than one by (ii), Kronecker's theorem forces them to vanish. In this case, the matrix M satisfies \(M = \lambda {\mathbf {R}} {\mathbf {L}}^{\top }\), where \({\mathbf {L}} = (1, \ldots , 1)\) by the constant length property. Thus, it follows by the same arguments as above that \(\log (\# \vartheta ^{m} (a)) = (\lambda ^m - 1) h_{\text {top}}(X_{\vartheta })\). Taking the exponential of both sides, it follows from (4.4) that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}}^m (a) = w] = \mathrm {e}^{h_{\text {top}}(X_{\vartheta })} \mathrm {e}^{-|w| h_{\text {top}}(X_{\vartheta })}\). \(\square \)
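
The exact counting identity in case (i) can be verified directly on a small example. The following sketch (our own illustration, not part of the proof) uses the substitution of Example 5.1 below, which is of constant length \(\ell = 3\) with \(\# \vartheta (a) = \# \vartheta (b) = 2\), satisfies the disjoint set condition and has \(h_{\text {top}}(X_{\vartheta }) = \frac{1}{2} \log (2)\); it enumerates \(\vartheta ^{m}(a)\) and compares \(\# \vartheta ^{m}(a)\) with \(\mathrm {e}^{(\ell ^{m} - 1) h_{\text {top}}(X_{\vartheta })}\).

```python
import math
from itertools import product

# Substitution of Example 5.1 below: constant length 3, two images per letter.
theta = {"a": ["aaa", "abb"], "b": ["bba", "aba"]}
ell, h_top = 3, 0.5 * math.log(2)

def inflate(words):
    """All words obtainable from a set of words by one substitution step."""
    return {"".join(c) for w in words for c in product(*(theta[x] for x in w))}

words = {"a"}
for m in (1, 2, 3):
    words = inflate(words)
    predicted = round(math.exp((ell**m - 1) * h_top))  # e^{(ell^m - 1) h_top}
    print(m, len(words), predicted)  # expect 2, 2 / 16, 16 / 8192, 8192
```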

Lemma 4.11

If \(\vartheta _{{\mathbf {P}}} = (\vartheta , {\mathbf {P}})\) satisfies either of the conditions of Lemma 4.10, and if \(\mu _{{\mathbf {P}}}\) denotes the corresponding frequency measure, then there exists a constant \(c >0\) such that

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) \ge \mu _{{\mathbf {P}}} ([v]) \frac{c^{|v |}}{|w |\mathrm {e}^{|w |h_{\text {top}}(X_{\vartheta })}} \end{aligned}$$

for all \(v \in {\mathcal {L}}_{\vartheta }\), \(m \in {\mathbb {N}}\) and \(w \in \vartheta ^{m} (v)\). If, in addition, \(\vartheta _{{\mathbf {P}}}\) is constant length and recognisable and \(|v |> 2 \kappa (\vartheta )\), then

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) \le \frac{|v |\mathrm {e}^{|v |h_{\text {top}}(X_{\vartheta })}}{|w |\mathrm {e}^{|w |h_{\text {top}}(X_{\vartheta })}} \text {.} \end{aligned}$$

Proof

Let \(v \in {\mathcal {L}}_{\vartheta }\), \(m \in {\mathbb {N}}\) and \(w \in \vartheta ^{m}(v)\) be fixed. Applying Lemma 4.9 to \(\vartheta _{{\mathbf {P}}}^{m}\) yields

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) \ge \frac{1}{\lambda ^{m}} \mu _{{\mathbf {P}}} ([v]) {\mathbb {P}}[\vartheta _{{\mathbf {P}}}^{m} (v) = w]. \end{aligned}$$
(4.5)

Since \(\vartheta _{{\mathbf {P}}}\) is compatible or of constant length, we can decompose w into subwords \(w = w^{(1)} \cdots w^{(|v |)}\) such that \(w^{(j)} \in \vartheta ^{m} (v_{j})\) for all \(j \in \{ 1, \ldots , |v |\}\). Hence, it follows by Lemma 4.10 that there is a constant \(c > 0\) such that

$$\begin{aligned} {\mathbb {P}}[\vartheta _{{\mathbf {P}}}^{m} (v) = w] =&\prod _{j = 1}^{|v |} {\mathbb {P}}[\vartheta _{{\mathbf {P}}}^{m} (v_j) = w^{(j)}]\nonumber \\ \ge&\prod _{j=1}^{|v |} c \, \mathrm {e}^{-|w^{(j)} |h_{\text {top}}(X_{\vartheta })} = c^{|v |} \mathrm {e}^{-|w |h_{\text {top}}(X_{\vartheta })}. \end{aligned}$$
(4.6)

By Lemma 3.11, there is a universal constant \(D > 0\) such that \(\lambda ^{m} \le D |\vartheta ^{m}(a) |\) for all \(m \in {\mathbb {N}}\) and \(a \in {\mathcal {A}}\). Combining this with (4.5) and (4.6) yields the required result.

Now, assume additionally that \(\vartheta _{{\mathbf {P}}}\) is recognisable and of constant length \(\ell \). Then by Lemma 4.10 we have that \({\mathbb {P}} [\vartheta _{{\mathbf {P}}}^{m} (u) = w] = \mathrm {e}^{|u |h_{\text {top}}(X_{\vartheta })} \mathrm {e}^{-|w |h_{\text {top}}(X_{\vartheta })}\) for every \(u \in {\mathcal {L}}_{\vartheta }^{|v|}\) with \(w \in \vartheta ^{m}(u)\). Thus, the lower bound follows by arguments identical to the above, taking \(c = \mathrm {e}^{h_{\text {top}}(X_{\vartheta })}\). For the upper bound, observe that if \(|v |> 2 \kappa (\vartheta )\), then we also have \(|\vartheta ^m(v)| = \ell ^m |v| > 2 \kappa (\vartheta ^m)\) for all \(m \in {\mathbb {N}}\) by Lemma 4.6. Hence, noting that \(|u| = |v|\) and \(\ell ^{-m} = |v |/|w |\), we find by Lemma 4.9 that

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) = \frac{1}{\ell ^m} \sum _{u \in \mathcal L_{\vartheta }^{|v|}} \mu _{{\mathbf {P}}} ([u]) {\mathbb {P}} [\vartheta ^m_{{\mathbf {P}}} (u) = w] \le \frac{|v |\mathrm {e}^{|v |h_{\text {top}}(X_{\vartheta })}}{ |w |\mathrm {e}^{|w |h_{\text {top}}(X_{\vartheta })}}. \end{aligned}$$

\(\square \)

In the proof of Theorem 4.8 we require only the lower bound of Lemma 4.11. However, the upper bound allows us to show that the subshifts we consider in Theorem 4.8 do not satisfy the Gibbs property, and therefore do not satisfy the specification property of [5]. Instead, these subshifts satisfy the following Gibbs-like property.

Lemma 4.12

Let \(\vartheta _{{\mathbf {P}}}\) be a random substitution satisfying the conditions of Theorem 4.8, and let \(\mu _{{\mathbf {P}}}\) denote the corresponding frequency measure. Then there exist constants \(c_1, c_2 > 0\) such that for all \(a \in {\mathcal {A}}\), \(m \in {\mathbb {N}}\) and \(w \in \vartheta ^m (a)\),

$$\begin{aligned} \frac{c_1}{|w|} \mathrm {e}^{-|w| h_{\text {top}}(X_{\vartheta })} \le \mu _{{\mathbf {P}}} ([w]) \le \frac{c_2}{|w|} \mathrm {e}^{-|w| h_{\text {top}}(X_{\vartheta })} \text {.} \end{aligned}$$

Proof

The lower bound follows immediately from Lemma 4.11, taking \(c_1 = \min _{a \in {\mathcal {A}}} c \mu _{{\mathbf {P}}} ([a])\), where c is the constant given by Lemma 4.11. For the upper bound, let M be the least integer such that \(\ell ^{M} > 2 \kappa (\vartheta )\) and set \(c_2 = \max _{u \in {\mathcal {L}}_{\vartheta }, \, |u| \le \ell ^{M}} |u |\mathrm {e}^{|u |h_{\text {top}}(X_{\vartheta })}\). If \(|w |\le \ell ^{M}\), then \(\mu _{{\mathbf {P}}} ([w]) \le c_2 \mathrm {e}^{-|w |h_{\text {top}}(X_{\vartheta })} / |w |\), since \(\mu _{{\mathbf {P}}}\) is a probability measure and \(c_2 \ge |w |\mathrm {e}^{|w |h_{\text {top}}(X_{\vartheta })}\) for all such w. On the other hand, if \(m > M\) and \(w \in \vartheta ^{m} (a)\), then \(w \in \vartheta ^{m - M}(v)\) for some \(v \in \vartheta ^{M} (a)\); since \(|v | = \ell ^{M} > 2 \kappa (\vartheta )\), it follows by Lemma 4.11 that

$$\begin{aligned} \mu _{{\mathbf {P}}} ([w]) \le \frac{|v |\mathrm {e}^{|v |h_{\text {top}}(X_{\vartheta })}}{|w |\mathrm {e}^{|w |h_{\text {top}}(X_{\vartheta })}} \le \frac{c_2}{|w |} \mathrm {e}^{-|w |h_{\text {top}}(X_{\vartheta })} \text {.} \end{aligned}$$

\(\square \)

Remark 4.2

The upper bound on \(\mu _{{\mathbf {P}}}\) in Lemma 4.12 is irreconcilable with the bound for the unique measure of maximal entropy on subshifts with a weak specification property established in [8, Lemma 5.12]. For the subshifts \(X_{\vartheta }\) with random substitutions as in Theorem 4.8, \(\mu _{{\mathbf {P}}}\) (with \({\mathbf {P}}\) the uniform distribution) is the unique measure of maximal entropy. Hence, no such \(X_{\vartheta }\) satisfies the weak specification property of [8]. In particular, Theorem 4.8 establishes intrinsic ergodicity for subshifts beyond the more classical context of subshifts with (weak) specification.

4.4 Proof of Theorem 4.8

We now present the proof of Theorem 4.8. In addition to the Gibbs property proved in the previous section, we also utilise the following result, which is proved in [14].

Lemma 4.13

[14, Lemma 8.8] Let (X, d) be a compact metric space and let \(\varrho \) be a Borel probability measure on X. If \(B \subset X\) is measurable and \((\xi _{n})_{n \in {\mathbb {N}}}\) is a sequence of finite measurable partitions of X for which \(\lim _{n \rightarrow \infty } \max _{P \in \xi _n} {\text {diam}}(P) = 0\), then there exists a sequence of sets \((A_{n})_{n \in {\mathbb {N}}}\) with \(A_{n} \in \sigma (\xi _{n})\) and \(\lim _{n \rightarrow \infty } \varrho (A_{n} \triangle B) = 0\). Here, \(\sigma (\xi _{n})\) denotes the sigma algebra generated by the partition \(\xi _{n}\).

Proof of Theorem 4.8

Let \(\mu \) denote the frequency measure of maximal entropy given by Theorem 4.1 or Lemma 4.7, and let \(m \in {\mathbb {N}}\). For each \(k \in \{ 0, \ldots , \ell ^{m} -1 \}\), let \(X_{m,k}\) denote the subset of \(X_{\vartheta }\) defined by \(X_{m,k} = S^{k}(\vartheta ^{m}(X_{\vartheta }))\). It follows by recognisability that these subsets are pairwise disjoint for different choices of k. Note that, by Lemma 2.5, the subsets \(X_{m, k}\) are closed. Moreover, by the constant length property,

$$\begin{aligned} S^{\ell ^{m}} (X_{m,k}) = S^{\ell ^{m}} (S^{k} (\vartheta ^{m}(X_{\vartheta }))) = S^{k} (\vartheta ^{m}(S X_{\vartheta })) = S^{k}(\vartheta ^{m}(X_{\vartheta })) = X_{m,k}, \end{aligned}$$

so \(X_{m, k}\) is \(S^{\ell ^{m}}\)-invariant; in other words, \(X_{m, k}\) is a subshift under \(S^{\ell ^{m}}\). Since every \(x \in X_\vartheta \) can be split into level m inflation words, we have

$$\begin{aligned} X_\vartheta = \bigsqcup _{k = 0}^{\ell ^{m} - 1} X_{m,k}, \end{aligned}$$

where the union is disjoint due to recognisability. Lemma 4.6 implies that \(r = \left\lceil \kappa (\vartheta )/(\ell -1) \right\rceil + 1\) satisfies

$$\begin{aligned} \ell ^{m} r > \frac{\ell ^{m} -1}{\ell -1} \kappa (\vartheta ) + \ell ^{m} \ge \kappa (\vartheta ^{m}) + \ell ^{m}. \end{aligned}$$

By the constant length property, this ensures that every word of length at least \(2 r \ell ^{m}\) has a unique decomposition into inflation words. This together with Lemma 4.4 implies, for all \(u \in {\mathcal {L}}_{\vartheta }^{2r}\) and \(w \in \vartheta ^{m}(u)\), that \(|w |= 2r \ell ^{m}\) and \(S^{r \ell ^{m}}([w]) \subset \vartheta ^{m}(X_{\vartheta })\). Let us consider the following partition of \(X_{m,k}\):

$$\begin{aligned} \xi _{m,k} = S^{r \ell ^{m}} \left( \left\{ S^{k}([w]) : w \in \vartheta ^{m}(u) \; \text {and} \; u \in {\mathcal {L}}_{\vartheta }^{2r}\right\} \right) . \end{aligned}$$

This in turn yields a partition of \(X_{\vartheta }\), namely

$$\begin{aligned} \xi _{m} = \bigcup _{k = 0}^{\ell ^{m} - 1} \xi _{m,k}. \end{aligned}$$

By way of contradiction, assume that \(\nu \ne \mu \) is another ergodic measure of maximal entropy. Since distinct ergodic measures are mutually singular, there exists an S-invariant set B with \(\mu (B) = 0\) and \(\nu (B) = 1\). Note that the diameter of the atoms of \(\xi _m\) tends uniformly to zero as m tends to infinity, so \((\xi _m)_{m \in {\mathbb {N}}}\) meets the requirements of Lemma 4.13. Applying that lemma to the measure \(\varrho ' = (\mu + \nu )/2\), we obtain that, given \(\varepsilon > 0\), there exist \(m \in {\mathbb {N}}\) and \(A_m \in \sigma (\xi _m)\) such that

$$\begin{aligned} (\mu + \nu ) (A_m \triangle B) < \varepsilon . \end{aligned}$$
(4.7)

For \(k \in \{ 0, \ldots , \ell ^{m} - 1\}\), let \(A_{m,k} = A_m \cap X_{m,k}\) and \(B_{m,k} = B \cap X_{m,k}\), and define the conditional probability measures \(\mu _{m,k}\) and \(\nu _{m,k}\) by

$$\begin{aligned} \mu _{m,k} = \frac{1}{\mu (X_{m,k})} \, \mu \vert _{X_{m,k}} \quad \text {and} \quad \nu _{m,k} = \frac{1}{\nu (X_{m,k})} \, \nu \vert _{X_{m,k}}. \end{aligned}$$

For all \(j \in \{ 0, \ldots , \ell ^{m} - 1 \}\), we have \(S^{k-j} (X_{m,j}) = X_{m,k}\), and since \(\mu \) and \(\nu \) are S-invariant and since the sets \(X_{m,k}\) are disjoint, it follows that

$$\begin{aligned} \mu (X_{m,k}) = \mu (X_{m,j}) = \frac{1}{\ell ^{m}} \quad \text {and} \quad \nu (B \cap X_{m,k}) = \nu (B \cap X_{m,j}) = \frac{1}{\ell ^{m}}. \end{aligned}$$

Consequently, \(\nu _{m,k}(B_{m,k}) = \ell ^{m} \, \nu (B \cap X_{m,k}) = 1\). On the other hand, \(\mu _{m,k} (B_{m,k}) = \ell ^{m} \mu (B \cap X_{m,k}) = 0\). Since \(\{ X_{m,k} : k \in \{ 0, \ldots , \ell ^{m} -1 \} \}\) forms a partition of \(X_{\vartheta }\), we can rewrite (4.7) as

$$\begin{aligned} \sum _{k = 0}^{\ell ^{m} -1} (\mu _{m,k} + \nu _{m,k})(A_{m,k} \triangle B_{m,k})&= \ell ^{m} \sum _{k=0}^{\ell ^{m}-1} (\mu + \nu )((A_m \triangle B) \cap X_{m,k}) \\&= \ell ^{m} (\mu + \nu ) (A_m \triangle B) < \ell ^{m} \varepsilon . \end{aligned}$$

Hence, there exists a \(k'\) such that

$$\begin{aligned} (\mu _{m,k'} + \nu _{m,k'})(A_{m,k'} \triangle B_{m,k'}) < \varepsilon . \end{aligned}$$
(4.8)

Here we observe that \(A_{m,k'} \in \sigma (\xi _{m,k'})\). Recall that if \(|v |\ge 2 \ell ^{m} r\), then the word v has a unique inflation word decomposition under \(\vartheta ^{m}\); therefore, there exists a unique \(j \in \{ 0, \ldots , \ell ^{m} - 1\}\) such that \([v] \subset X_{m,j}\).

Note that the system \((X_{m,j}, S^{\ell ^m})\) equipped with the measure \(\nu _{m,j}\) is an induced subshift obtained from \((X_{\vartheta },S)\) equipped with the measure \(\nu \) by inducing on \(X_{m,j}\). Hence, by Abramov’s formula [1],

$$\begin{aligned} h(S, \nu ) = \frac{1}{\ell ^m} h(S^{\ell ^{m}}\!, \nu _{m,j}). \end{aligned}$$

We now proceed by arguments similar to those in Adler and Weiss's [2] proof that Markov shifts are intrinsically ergodic, applied to the system \((X_{m,k'}, S^{\ell ^{m}})\) and the \(S^{\ell ^{m}}\)-invariant measures \(\mu _{m,k'}\) and \(\nu _{m,k'}\). For ease of notation, in the following we write \(k = k'\) and \(T = S^{\ell ^{m}}\). Note that

$$\begin{aligned} \alpha _{m,k} = \{ S^{k}([w]) : w \in \vartheta ^{m}(a), a \in {\mathcal {A}} \} \end{aligned}$$

forms a generating partition of \(X_{m,k}\), and by the fact that \(\vartheta _{{\mathbf {P}}}\) is of constant length and recognisable,

$$\begin{aligned} \xi _{m,k} = \bigvee _{j = -r}^{r-1} T^{-j}(\alpha _{m, k}). \end{aligned}$$

Let \(\eta _{m} = \{ A_{m,k}, X_{m,k} \setminus A_{m,k} \}\) and for a set \(A \subseteq X_{m,k}\) denote by \(t_{m}(A)\) the number of atoms in \(\xi _{m,k}\) that intersect A. By definition, and using (3.1), we have

$$\begin{aligned} 2 r \ell ^{m} h(S, \nu )&= 2 r h(S^{\ell ^{m}}, \nu _{m,k}) \le H_{\nu _{m,k}}(\xi _{m,k})\\ {}&\le H_{\nu _{m,k}}(\eta _m) + H_{\nu _{m,k}}(\xi _{m,k} \vert \eta _m)\\ {}&\le \log (2) + \nu _{m,k}(A_{m,k}) \log (t_m(A_{m,k}))\\ {}&\qquad + \nu _{m,k}(X_{m,k} \setminus A_{m,k}) \log (t_m(X_{m,k} \setminus A_{m,k})).\end{aligned}$$

Let \(S^{r \ell ^{m} + k}[w] \in \xi _{m,k}\), with \(w \in \vartheta ^{m}(v)\) for some \(v \in {\mathcal {L}}_{\vartheta }^{2r}\). By Lemma 4.11, we have that

$$\begin{aligned} \mu _{m,k} (S^{r \ell ^{m} + k} ([w]))= \ell ^{m} \mu ([w]) \ge \mu ([v]) \frac{c^{2r}}{2 r \mathrm {e}^{ 2 \ell ^{m} r h_{\text {top}}(X_{\vartheta })}} \ge C \mathrm {e}^{- 2 r \ell ^{m} h_{\text {top}}(X_{\vartheta })}, \end{aligned}$$

taking \(C = c^{2r} (\min _{v \in {\mathcal {L}}_{\vartheta }^{2r}} \mu ([v])) / 2r \). We have that \(C > 0\) since \(\mu ([v]) > 0\) for all \(v \in {\mathcal {L}}_{\vartheta }^{2r}\). Hence,

$$\begin{aligned} t_m(A_{m,k})&\le \frac{1}{C} \, \mu _{m,k}(A_{m,k}) \mathrm {e}^{2 \ell ^{m} r h_{\text {top}}(X_{\vartheta })} \qquad \text {and}\\ t_m(X_{m,k} \setminus A_{m,k})&\le \frac{1}{C} \, \mu _{m,k}(X_{m,k} \setminus A_{m,k}) \mathrm {e}^{2 \ell ^{m} r h_{\text {top}}(X_{\vartheta })}. \end{aligned}$$

Substituting these bounds into the entropy estimate above, using \(h(S, \nu ) = h_{\text {top}}(X_{\vartheta })\) and discarding the non-positive term involving \(X_{m,k} \setminus A_{m,k}\), yields \(0 \le \log (2) - \log (C) + \nu _{m,k}(A_{m,k}) \log (\mu _{m,k}(A_{m,k}))\). By (4.8), we have that \(\mu _{m,k}(A_{m,k}) < \varepsilon \) and \(\nu _{m,k}(A_{m,k}) > 1 - \varepsilon \). This implies the following contradiction:

$$\begin{aligned} 0 \le \lim _{\varepsilon \rightarrow 0} \left( \log (2) - \log (C) + (1 - \varepsilon ) \log (\varepsilon ) \right) = - \infty . \end{aligned}$$

\(\square \)

From Lemma 4.11, we used only the lower bound in the proof of Theorem 4.8. Since this inequality holds under less restrictive conditions, it is natural to ask whether Theorem 4.8 can be sharpened accordingly by replacing the constant length assumption with a weaker condition. However, a closer inspection reveals that the last part of the proof relies on the detailed control that the constant length assumption provides. A definitive answer therefore remains an open problem.

5 Examples and Open Questions

In this section we present examples of random substitution subshifts that exhibit various properties. We first present several examples that illustrate the main results of this paper and their applications to two prototypical examples of random substitutions, the random period doubling (Example 5.2) and random Fibonacci (Example 5.4) substitutions. We then consider some familiar examples of subshifts which can be obtained as subshifts of primitive random substitutions, including the golden mean shift (Example 5.5) and the Dyck shift (Example 5.7). A summary of the key properties of each of the examples is presented in the table below.

 

Table: Summary of the key properties of Examples 5.1–5.7 — unique realisation paths, compatibility, constant length, identical (ISC) or disjoint (DSC) set condition, recognisability, existence of a frequency measure of maximal entropy, and intrinsic ergodicity. In particular, Examples 5.1, 5.2, 5.3 and 5.5 satisfy (DSC), Example 5.6 satisfies (ISC), and intrinsic ergodicity is marked as open for Examples 5.2, 5.3 and 5.4.

By the existence of a frequency measure of maximal entropy, we mean that there exists a choice of probabilities on the given set-valued substitution that gives rise to a frequency measure of maximal entropy. In particular, when we say there does not exist such a frequency measure of maximal entropy, we do not rule out the possibility that there exists another random substitution that gives rise to the same subshift for which the corresponding frequency measure is a measure of maximal entropy.

We first give an example of a random substitution that satisfies the conditions of Theorem 4.8 and thus gives rise to an intrinsically ergodic subshift.

Example 5.1

Let \(\vartheta \) be the random substitution defined by

$$\begin{aligned} \vartheta :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} aaa &{} \text {with probability } 1/2,\\ abb &{} \text {with probability } 1/2, \end{array}\right. }\\ b \mapsto {\left\{ \begin{array}{ll} bba &{} \text {with probability } 1/2,\\ aba &{} \text {with probability } 1/2, \end{array}\right. } \end{array}\right. } \end{aligned}$$

with associated subshift \(X_{\vartheta }\) and corresponding frequency measure \(\mu \). One can verify that \(\vartheta \) is recognisable and satisfies the conditions of Theorem 4.8 (specifically (i)). Hence, \(\mu \) is the unique measure of maximal entropy for the system \((X_{\vartheta },S)\). By Theorem 3.5, we have

$$\begin{aligned} h (\mu ) = h_{\text {top}} (X_{\vartheta }) = \frac{1}{2}\log (2) \text {.} \end{aligned}$$

Example 5.2

(Random period doubling) Let \(p \in (0,1)\) and let \(\vartheta _{p}\) be the random substitution defined by

$$\begin{aligned} \vartheta _{p} :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} ab &{} \text {with probability} \; p,\\ ba &{} \text {with probability} \; 1-p, \end{array}\right. }\\ b \mapsto aa, \end{array}\right. } \end{aligned}$$

and let \(\mu _{p}\) denote the corresponding frequency measure. We have that \(\vartheta _{p}\) is compatible and satisfies the disjoint set condition, so it follows by Theorem 3.5 that

$$\begin{aligned} h (\mu _{p}) = -\frac{2}{3} (p \log (p) + (1-p) \log (1-p)). \end{aligned}$$

Moreover, by Theorem 4.1, \(\mu _{1/2}\) is a measure of maximal entropy for the system \((X_{\vartheta },S)\). It is known that \(\vartheta _p\) is not recognisable, so Theorem 4.8 cannot be applied, and it remains open whether \(\mu _{1/2}\) is the unique measure of maximal entropy.
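
As a numerical companion to the entropy formula above, the following sketch (ours) evaluates \(h (\mu _{p})\) on a grid and confirms that the maximum over p is attained at the uniform choice \(p = 1/2\), with value \(\frac{2}{3} \log (2) \approx 0.4621\), consistent with Theorem 4.1.

```python
import math

def h(p):
    # h(mu_p) = -(2/3)(p log p + (1 - p) log(1 - p)), as given above.
    return -(2.0 / 3.0) * (p * math.log(p) + (1 - p) * math.log(1 - p))

grid = [k / 10000 for k in range(1, 10000)]
p_star = max(grid, key=h)
print(p_star)                                 # 0.5
print(h(p_star), (2.0 / 3.0) * math.log(2))   # both ~ 0.4621
```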

For each of the previous two examples, the frequency measure corresponding to uniform probabilities was a measure of maximal entropy. However, this is not the case for all primitive random substitutions satisfying the disjoint set condition, as is demonstrated by the following example. Here, the frequency measure of greatest entropy occurs at a non-uniform choice of probabilities, and this frequency measure is not a measure of maximal entropy.

Example 5.3

Let \(p \in (0,1)\) and let \(\vartheta _{p}\) be the random substitution defined by

$$\begin{aligned} \vartheta _{p} :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} aa &{}\text {with probability }p,\\ ab &{}\text {with probability }1-p, \end{array}\right. }\\ b \mapsto ba, \end{array}\right. } \end{aligned}$$

with corresponding frequency measure \(\mu _p\) and subshift \(X_{\vartheta }\). Since \(\vartheta _{p}\) is of constant length and satisfies the disjoint set condition, it follows by Theorem 3.5 that

$$\begin{aligned} h (\mu _p) = - \frac{1}{2-p} (p \log p + (1-p) \log (1-p)) \text {.} \end{aligned}$$

The value of p that maximises the above expression is \(p = \tau ^{-1}\) (Fig. 2), where \(\tau \) is the golden ratio, for which the corresponding entropy is

$$\begin{aligned} h (\mu _{\tau ^{-1}}) = \log \tau \approx 0.481212 \text {.} \end{aligned}$$

On the other hand, one can compute that the topological entropy of the system \((X_{\vartheta }, S)\) is

$$\begin{aligned} h_{\text {top}} (X_{\vartheta }, S) = \sum _{n=1}^{\infty } \frac{1}{2^n} \log n \approx 0.507834 \text {,} \end{aligned}$$

so \(\mu _{\tau ^{-1}}\) is not a measure of maximal entropy. Thus, the conclusion of Theorem 4.1 does not hold without compatibility, even for constant length random substitutions. We note that the topological entropy equals \(\log \sigma \), where \(\sigma \) is Somos's quadratic recurrence constant [15, p. 446]. It is an open question whether \(\sigma \) is algebraic or transcendental.

Fig. 2. Plot of \(h (\mu _p)\) for \(p \in (0,1)\).
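
Both constants appearing in this example are easy to check numerically. The following sketch (ours) locates the maximiser of \(h (\mu _{p})\) on a fine grid, compares it with \(\tau ^{-1}\), and evaluates a truncation of the series for the topological entropy.

```python
import math

tau = (1 + math.sqrt(5)) / 2   # golden ratio

def h(p):
    # h(mu_p) = -(p log p + (1 - p) log(1 - p)) / (2 - p), as given above.
    return -(p * math.log(p) + (1 - p) * math.log(1 - p)) / (2 - p)

grid = [k / 10**5 for k in range(1, 10**5)]
p_star = max(grid, key=h)
print(p_star, 1 / tau)            # maximiser ~ 0.61803 = 1/tau
print(h(p_star), math.log(tau))   # maximum  ~ 0.481212 = log(tau)

# Topological entropy: log of Somos's quadratic recurrence constant.
print(sum(math.log(n) / 2**n for n in range(2, 60)))   # ~ 0.507834
```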

The previous examples all satisfy the disjoint set condition, so we could obtain a closed-form expression for the entropy via Theorem 3.5. This is not the case for our next example, the random Fibonacci substitution, which is compatible but satisfies neither the disjoint nor the identical set condition.

Example 5.4

(Random Fibonacci) Let \(\vartheta _{\text {RF}}\) denote the random substitution defined by

$$\begin{aligned} \vartheta _{\text {RF}} :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} ab &{} \text {with probability} \; 1/2,\\ ba &{} \text {with probability} \; 1/2, \end{array}\right. }\\ b \mapsto a, \end{array}\right. } \end{aligned}$$

and let \(\mu _{\text {RF}}\) denote the corresponding frequency measure. Since \(\vartheta _{\text {RF}}\) satisfies neither the identical set condition nor the disjoint set condition, Theorem 3.5 does not yield a closed form formula for the measure theoretic entropy of \(\mu _{\text {RF}}\). However, we may use Theorem 3.5 to obtain a sequence of bounds on \(h (\mu _{\text {RF}})\). Indeed, we have \(\lambda ^{-k} {\mathbf {H}}_{k}^{\top } {\mathbf {R}} \le h (\mu _{\text {RF}}) \le (\lambda ^{k} - 1)^{-1} {\mathbf {H}}_{k}^{\top } {\mathbf {R}}\) for all \(k \in {\mathbb {N}}\), and a computer-assisted calculation of \({\mathbf {H}}_{6}^{\top } {\mathbf {R}}\) yields

$$\begin{aligned} 0.3908< \frac{1}{\lambda ^{6}} {\mathbf {H}}_{6}^{\top } {\mathbf {R}} \le h (\mu _{\text {RF}}) \le \frac{1}{\lambda ^{6} - 1} {\mathbf {H}}_{6}^{\top } {\mathbf {R}} < 0.4140, \end{aligned}$$

noting that \(\lambda = \tau \), where \(\tau \) is the golden ratio. It was shown in [16, 29] that

$$\begin{aligned} h_{\text {top}}(X_{\vartheta _{\text {RF}}}) = \sum _{m=2}^{\infty } \frac{\log (m)}{\tau ^{m+2}} \approx 0.444399, \end{aligned}$$

so \(\mu _{\text {RF}}\) is not a measure of maximal entropy. By taking higher powers, we obtain frequency measures of greater entropy. If we consider the square of \(\vartheta _{\text {RF}}\) with uniform probabilities, namely

$$\begin{aligned} \vartheta _{\text {RF}, 2} :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} baa &{} \text {with probability} \; 1/3,\\ aba &{} \text {with probability} \; 1/3,\\ aab &{} \text {with probability} \; 1/3, \end{array}\right. }\\ b \mapsto \,{\left\{ \begin{array}{ll} ab &{} \text {with probability} \; 1/2,\\ ba &{} \text {with probability} \; 1/2, \end{array}\right. } \end{array}\right. } \end{aligned}$$

and let \(\mu _{\text {RF}, 2}\) be the corresponding frequency measure, then by Theorem 3.5 and a computer-aided calculation of \({\mathbf {H}}_{3}^{\top } {\mathbf {R}}\) we obtain

$$\begin{aligned} 0.4177< \frac{1}{\lambda ^{6}} {\mathbf {H}}_{3}^{\top } {\mathbf {R}} \le h (\mu _{\text {RF},2}) \le \frac{1}{\lambda ^{6}-1} {\mathbf {H}}_{3}^{\top } {\mathbf {R}} < 0.4424. \end{aligned}$$

Here, \(\lambda ^{2}\) is the Perron–Frobenius eigenvalue of \(\vartheta _{\text {RF}, 2}\). Hence, \(h (\mu _{\text {RF}})< h (\mu _{\text {RF}, 2}) < h_{\text {top}}(X_{\vartheta _{\text {RF}}})\), so \(\mu _{\text {RF}, 2} \) is still not a measure of maximal entropy, but it has strictly greater entropy than \(\mu _{\text {RF}}\). Theorem 4.2 gives that a measure of maximal entropy can be obtained as a weak limit of frequency measures. In particular, if \((\mu _{\text {RF}, n})_{n \in {\mathbb {N}}}\) is the sequence of frequency measures corresponding to the n-th power of \(\vartheta _{\text {RF}}\) with uniform probabilities, then there exists a subsequence \((\mu _{\text {RF}, n_{k}})_{k \in {\mathbb {N}}}\) that converges weakly to a measure of maximal entropy. Whether the system \((X_{\vartheta _{\text {RF}}}, S)\) is intrinsically ergodic remains open.
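
The computer-assisted bounds quoted in this example can be reproduced with a short script. The sketch below (our own illustration) builds the distribution of \(\vartheta _{\text {RF}}^{k}(a)\) by direct enumeration and evaluates \(\lambda ^{-k} {\mathbf {H}}_{k}^{\top } {\mathbf {R}}\) and \((\lambda ^{k} - 1)^{-1} {\mathbf {H}}_{k}^{\top } {\mathbf {R}}\) for \(k = 6\); it assumes that the entries of \({\mathbf {H}}_{k}\) are the entropies \(-\sum _{w} {\mathbb {P}}[\vartheta ^{k}(a) = w] \log {\mathbb {P}}[\vartheta ^{k}(a) = w]\), in line with the use of Theorem 3.5 above, and that \({\mathbf {R}} = (\tau ^{-1}, \tau ^{-2})\).

```python
import math
from collections import defaultdict

# Random Fibonacci: a -> ab or ba with probability 1/2 each, b -> a.
# dist[x] maps each word in theta^m(x) to P[theta^m(x) = word].
dist = {"a": {"ab": 0.5, "ba": 0.5}, "b": {"a": 1.0}}

def step(dist):
    """Distribution of theta^{m+1}(x): substitute x once, then apply theta^m
    independently to every letter of the image."""
    new = {}
    for x in ("a", "b"):
        d = defaultdict(float)
        images = (("ab", 0.5), ("ba", 0.5)) if x == "a" else (("a", 1.0),)
        for u, q in images:
            parts = {"": 1.0}
            for c in u:   # concatenate independent copies of theta^m(c)
                parts = {w + v: p * r for w, p in parts.items()
                                      for v, r in dist[c].items()}
            for w, p in parts.items():
                d[w] += q * p
        new[x] = dict(d)
    return new

k = 6
for _ in range(k - 1):
    dist = step(dist)

H = {x: -sum(p * math.log(p) for p in dist[x].values()) for x in ("a", "b")}
tau = (1 + math.sqrt(5)) / 2
R = {"a": 1 / tau, "b": 1 / tau**2}    # right PF eigenvector, ||R||_1 = 1
HR = H["a"] * R["a"] + H["b"] * R["b"]
print(HR / tau**k, HR / (tau**k - 1))  # cf. the bounds 0.3908 and 0.4140
```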

We now consider applications of our results to other common subshifts in symbolic dynamics. It was shown in [18] that every topologically transitive shift of finite type can be obtained as the subshift of a primitive random substitution. For the golden mean shift, it is possible to obtain the Parry measure as a weak limit of frequency measures corresponding to primitive random substitutions.

Example 5.5

(The golden mean shift) The golden mean shift is the shift of finite type over the alphabet \(\{ a, b \}\) defined by the forbidden word set \({\mathcal {F}} = \{ bb \}\). The subshift X can be obtained as the subshift of the random substitution

$$\begin{aligned} \vartheta :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} aa &{} \text {with probability }\tau ^{-1},\\ aba &{} \text {with probability }\tau ^{-2},\\ \end{array}\right. }\\ b \mapsto b. \end{array}\right. } \end{aligned}$$

However, this random substitution is not primitive, so we cannot directly apply our results. To circumvent this issue, let \(\varepsilon \in (0,1)\) and let \(\vartheta _{\varepsilon }\) be the random substitution defined by

$$\begin{aligned} \vartheta _{\varepsilon } :{\left\{ \begin{array}{ll} a \mapsto {\left\{ \begin{array}{ll} aa &{} \text {with probability }\tau ^{-1},\\ aba &{} \text {with probability }\tau ^{-2},\\ \end{array}\right. }\\ b \mapsto {\left\{ \begin{array}{ll} b &{} \text {with probability }1-\varepsilon ,\\ abb &{} \text {with probability }\varepsilon , \end{array}\right. } \end{array}\right. } \end{aligned}$$

and let \(\mu _{\varepsilon }\) denote the corresponding frequency measure. For all \(\varepsilon \in (0,1)\), \(\vartheta _{\varepsilon }\) is a primitive random substitution with unique realisation paths satisfying the disjoint set condition. By compactness, there is a sequence \(\varepsilon \rightarrow 0\) along which \(\mu _{\varepsilon }\) converges weakly; let \(\mu \) denote the limit, which is a shift-invariant probability measure. Also note that X is the support of \(\mu \). One can show that \(R_{a, \varepsilon } / (\lambda _{\varepsilon } - 1) \rightarrow \tau ^2 / (\tau ^2 +1)\) as \(\varepsilon \rightarrow 0\), where \(\lambda _{\varepsilon }\) and \(R_{a, \varepsilon }\) are the Perron–Frobenius eigenvalue and the entry of the right Perron–Frobenius eigenvector corresponding to the letter a, respectively. Thus, it follows by the upper semi-continuity of entropy and Theorem 3.5 that

$$\begin{aligned} \begin{aligned} h (\mu )&\ge \limsup _{\varepsilon \rightarrow 0} h (\mu _{\varepsilon }) = \limsup _{\varepsilon \rightarrow 0} \frac{1}{\lambda _{\varepsilon } - 1} {\mathbf {H}}_{1}^{\top } {\mathbf {R}} \\&\ge \limsup _{\varepsilon \rightarrow 0} \frac{-1}{\lambda _{\varepsilon } - 1} R_{a,\varepsilon } (\tau ^{-2} \log \tau ^{-2} + \tau ^{-1} \log \tau ^{-1})\\&= \frac{\tau ^2}{\tau ^2+1} (2 \tau ^{-2} + \tau ^{-1}) \log \tau = \log \tau \text {,} \end{aligned} \end{aligned}$$

where in the last equality we have used the characteristic equation \(\tau ^2 = \tau + 1\). Since \(h_{\text {top}} (X,S) = \log \tau \) and the Parry measure is the unique measure of maximal entropy [2, 31], we conclude that \(\mu \) must be the Parry measure.
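
The limit \(R_{a, \varepsilon } / (\lambda _{\varepsilon } - 1) \rightarrow \tau ^{2} / (\tau ^{2} + 1)\) used above can be checked numerically. The sketch below (ours) assumes that the substitution matrix of \(\vartheta _{\varepsilon }\) records expected letter counts, so that its columns for a and b are \((2, \tau ^{-2})\) and \((\varepsilon , 1 + \varepsilon )\), respectively.

```python
import numpy as np

tau = (1 + np.sqrt(5)) / 2

def ratio(eps):
    # Expected-letter-count substitution matrix of theta_eps (rows a, b):
    # E[#a in theta(a)] = 2/tau + 2/tau^2 = 2, E[#b in theta(a)] = tau^{-2},
    # E[#a in theta(b)] = eps, E[#b in theta(b)] = (1 - eps) + 2 eps = 1 + eps.
    M = np.array([[2.0, eps], [tau**-2, 1.0 + eps]])
    evals, evecs = np.linalg.eig(M)
    i = np.argmax(evals.real)
    lam = evals.real[i]
    R = np.abs(evecs.real[:, i])
    R /= R.sum()                     # normalise so that ||R||_1 = 1
    return R[0] / (lam - 1.0)

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, ratio(eps))
print("limit:", tau**2 / (tau**2 + 1))   # ~ 0.723607
```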

We note that the algorithm in [18] yields a primitive random substitution that gives rise to the golden mean shift. However, a closer inspection reveals that for the corresponding frequency measure to be the Parry measure, two of the realisations would have to occur with probability zero, and the resulting random substitution is then the random substitution \(\vartheta \) defined in Example 5.5, which is not primitive. Whether there exists a primitive random substitution for which the Parry measure is the corresponding frequency measure remains open. Our next example is a sofic shift for which the unique measure of maximal entropy can be obtained as a frequency measure of a primitive random substitution.

Example 5.6

(A sofic shift) Let \(p \in (0,1)\), let \(\vartheta _{p}\) be the random substitution defined by

$$\begin{aligned} \vartheta _{p} :a, b \mapsto {\left\{ \begin{array}{ll} ab &{} \text {with probability} \; p,\\ ba &{} \text {with probability} \; 1-p, \end{array}\right. } \end{aligned}$$

and let \(\mu _{p}\) denote the corresponding frequency measure. In [19, Proposition 6.7], the measure theoretic entropy of \(\mu _p\) was calculated directly and shown to be

$$\begin{aligned} h (\mu _{p}) = - \frac{1}{2} (p \log (p) + (1-p) \log (1-p)). \end{aligned}$$

Since \(\vartheta _{p}\) satisfies the identical set condition and has identical production probabilities, Theorem 3.5 gives an alternative method of obtaining this formula. Moreover, by Theorem 4.1, for \(p = 1/2\), the measure \(\mu _p\) is a measure of maximal entropy. Notice that \(\vartheta _p\) is of constant length, but not recognisable, since it does not satisfy the disjoint set condition. Hence, Theorem 4.8 cannot be applied. However, it was shown in [19, Corollary 6.8] that the subshift associated to \(\vartheta _{p}\) is a topologically transitive sofic shift, and is thus intrinsically ergodic. Hence, \(\mu _{p}\) with \(p = 1/2\) is the unique measure of maximal entropy for the system \((X_{\vartheta },S)\).

We finally present an example of a random substitution subshift which has multiple measures of maximal entropy. This is the Dyck shift, which was shown in [24] to support two distinct ergodic measures of maximal entropy.

Example 5.7

(The Dyck shift) For \(i \in \{ 1,2,3,4 \}\), let \({\mathbf {p}}_i = (p_{i,1},p_{i,2},p_{i,3})\) be a probability vector and let \({\mathbf {P}} = \{ {\mathbf {p}}_1, {\mathbf {p}}_2, {\mathbf {p}}_3, {\mathbf {p}}_4 \}\). Define the random substitution \(\vartheta _{{\mathbf {P}}}\) over the alphabet \({\mathcal {A}} = \{ (, \, ), \, [, \, ] \}\) by

$$\begin{aligned} \vartheta _{{\mathbf {P}}} :{\left\{ \begin{array}{ll} \; \begin{aligned} ( &{}&{}\mapsto &{}&{} {\left\{ \begin{array}{ll} \; ( &{}\text { with probability }p_{1,1},\\ \; ( ( ) &{}\text { with probability }p_{1,2},\\ \; ( [ ] &{}\text { with probability }p_{1,3},\\ \end{array}\right. } &{}\quad &{} ) &{}&{}\mapsto &{}&{} {\left\{ \begin{array}{ll} \; ) &{}\text { with probability }p_{2,1},\\ \; ( ) ) &{}\text { with probability }p_{2,2},\\ \; [ ] ) &{}\text { with probability }p_{2,3},\\ \end{array}\right. }\\ [ &{}&{}\mapsto &{}&{} {\left\{ \begin{array}{ll} \; [ &{}\text { with probability }p_{3,1},\\ \; [ ( ) &{}\text { with probability }p_{3,2},\\ \; [ [ ] &{}\text { with probability }p_{3,3},\\ \end{array}\right. } &{}\quad &{} ] &{}&{}\mapsto &{}&{} {\left\{ \begin{array}{ll} \; ] &{}\text { with probability }p_{4,1},\\ \; ( ) ] &{}\text { with probability }p_{4,2},\\ \; [ ] ] &{}\text { with probability }p_{4,3}.\\ \end{array}\right. } \end{aligned} \end{array}\right. } \end{aligned}$$

The corresponding subshift is the Dyck shift, which supports two distinct ergodic measures of maximal entropy [19]. The random substitution \(\vartheta _{{\mathbf {P}}}\) does not have unique realisation paths since, for example, the word (()) can be obtained as two different realisations of () under \(\vartheta _{{\mathbf {P}}}\). Consequently, it is difficult to verify whether either (or both) of the ergodic measures of maximal entropy can be obtained as frequency measures.
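
The failure of unique realisation paths noted above can be exhibited by direct enumeration. The following sketch (ours, with the probabilities omitted) lists all realisations of the word () and shows that (()) arises from two distinct choices.

```python
from itertools import product

# The set-valued Dyck substitution of Example 5.7, probabilities omitted.
theta = {"(": ["(", "(()", "([]"],
         ")": [")", "())", "[])"],
         "[": ["[", "[()", "[[]"],
         "]": ["]", "()]", "[]]"]}

def realisations(word):
    """All (path, image) pairs from one letter-by-letter substitution step."""
    return [("|".join(c), "".join(c))
            for c in product(*(theta[x] for x in word))]

# Two distinct realisation paths of "()" yield the same image "(())":
for path, image in realisations("()"):
    if image == "(())":
        print(path, "->", image)   # prints "(|())" and "(()|)"
```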

This final example motivates the following open question.

Question 5.8

Under what conditions does a primitive random substitution give rise to an intrinsically ergodic subshift?

We have presented three examples of random substitutions which give rise to intrinsically ergodic subshifts. In general, it appears difficult to determine whether a random substitution subshift is intrinsically ergodic. The absence of a Gibbs property and of specification provides an obstacle to adapting many of the conventional methods for verifying that a subshift is intrinsically ergodic. Further, there does not appear to be an easy way of extending the proof of Theorem 4.8 to the case where the substitution is not recognisable or not of constant length. As such, we leave a definitive answer to future work.