Uniformly de Bruijn sequences and symbolic Diophantine approximation on fractals

Intrinsic Diophantine approximation on fractals, such as the Cantor ternary set, was undoubtedly motivated by questions asked by K. Mahler (1984). One of the main goals of this paper is to develop and utilize the theory of infinite de Bruijn sequences in order to answer closely related questions. In particular, we prove that the set of infinite de Bruijn sequences in $k\geq 2$ letters, thought of as a set of real numbers via a decimal expansion, has positive Hausdorff dimension. For a given $k$, these sequences bear a strong connection to Diophantine approximation on certain fractals. In particular, the optimality of an intrinsic Dirichlet function on these fractals with respect to the height function defined by symbolic representations of rationals follows from these results.


Introduction
In this paper, we give a novel application of combinatorics to the field of Diophantine approximation. Since we do not assume that the reader is familiar with this field, let us first recall some important concepts and ideas. We refer the reader to Section 5 where we rigorously define and discuss these notions.
Classically, the field of Diophantine approximation sought to quantify how well real numbers can be approximated by rationals, weighing the distance to the rational point against some function of its denominator. The inaugural result in the field is Dirichlet's theorem, Theorem 5.1, which states that every irrational real number has infinitely many rational points p/q that lie within distance $1/q^2$ of it. This result raises the question of whether that function, $1/q^2$, can be improved. That it cannot be, in a sense made precise in Section 5, is due to a result of Liouville, who showed that quadratic irrational numbers, like $\sqrt{2}$, admit no better rate of approximation. In modern terminology, we call such points badly approximable.
A more complete description of the set of badly approximable numbers, in this and related contexts, was the subject of much activity in the early-to-mid twentieth century. Via a characterization of badly approximable numbers in terms of continued fraction expansions, one can show that the set of badly approximable numbers is uncountable. It is also relatively easy to show that this set is a Lebesgue null set [6, Theorem 1.9 and Corollary 1.6], so we must turn to other notions of "size". One such notion, particularly well-suited to distinguishing between sets of measure zero, is that of Hausdorff dimension. Jarník showed that despite being a Lebesgue null set, the set of badly approximable real numbers has full Hausdorff dimension, so it is still "large" in some sense.
As discussed further in Section 5, the core questions of Diophantine approximation can be formulated in many diverse contexts, essentially whenever we have a complete metric space X, a countable dense subset Q, and some notion of "height" defined on Q (this would be the size of the denominator in the classical case above). Over the last decade, a plethora of results regarding Diophantine approximation on fractals have emerged [4,5,8,10,11,12,14,15,18]. Many of these results were motivated by the following question(s) posed by K. Mahler in 1984 [17, §2]: "How close can irrational elements of Cantor's set be approximated by rational numbers (1) in Cantor's set, and (2) by rational numbers not in Cantor's set?" In this paper we will restrict our attention to Mahler's first question; see Section 6 for details. We remark that while in [12] the first- and third-named authors were able to exhibit an optimal Dirichlet function (see Definition 5.2) corresponding to Mahler's second question, finding an analogous answer to his first question seems significantly harder; see e.g. [5,7,12] for detailed discussions and conjectures regarding this question.
In [12], a new height function was defined on the rational points of the Cantor set (see Section 6), and a Dirichlet-type theorem was proven [12, Corollary 2.2 and its proof]. The purpose of this paper is to demonstrate the optimality of that Dirichlet theorem, and give an estimate on the Hausdorff dimension of the set of "badly approximable" points. This set, as noted in [12], admits a precise combinatorial description, although at the time we had been unable to exhibit any members belonging to it. In the present paper, we focus on a combinatorially defined subset of the set of badly approximable points, the set of uniformly de Bruijn sequences. The existence of uniformly de Bruijn sequences demonstrates the optimality of the Dirichlet function (Theorem 6.3), and by estimating from below the Hausdorff dimension of the set of uniformly de Bruijn sequences (Theorem 2.1), we are able to get a positive lower bound for the Hausdorff dimension of the set of badly approximable points (Corollary 6.4), a first step towards a Jarník-type result. See Section 6 for a more nuanced discussion of these points.
1.1. Acknowledgements. The first-named author was supported in part by the Simons Foundation grant #245708. The third-named author was supported in part by the EPSRC Programme Grant EP/J018260/1. The authors would like to thank Joseph Kung for valuable comments on an earlier version of the paper, and Jonah Ostroff for introducing us to the notion of de Bruijn sequences. The authors thank the anonymous referee for valuable comments.

Finite and infinite de Bruijn sequences
Let A be a finite alphabet of cardinality k ≥ 2. We recall that a (non-cyclic) de Bruijn sequence of order n in A is a sequence ω of length $k^n + n - 1$ in the alphabet A that has the property that every sequence of length n in A appears as a consecutive substring of ω exactly once. For example, in the alphabet {0, 1}, the sequence 00110 is a de Bruijn sequence of order 2, while in the alphabet {0, 1, 2}, the sequence 00010020110120210221112122200 is a de Bruijn sequence of order 3. We say that an infinite sequence $\omega \in A^{\mathbb{N}}$ is infinitely de Bruijn if the set

(2.1) $B_\omega \overset{\mathrm{def}}{=} \{n \in \mathbb{N} : \text{the initial segment of } \omega \text{ of length } k^n + n - 1 \text{ is a de Bruijn sequence of order } n\}$

is infinite. We say that ω is totally de Bruijn if $B_\omega = \mathbb{N}$, and uniformly de Bruijn if $B_\omega$ has bounded gap sizes. The construction of infinitely de Bruijn sequences goes back to Becher and Heiber [2],¹ who showed that when k ≥ 3, totally de Bruijn sequences could be constructed recursively by extending each de Bruijn sequence of order n to a de Bruijn sequence of order (n + 1). We shall discuss their method in more detail below. When k = 2, it is known that no totally de Bruijn sequence exists, but Becher and Heiber do construct a uniformly de Bruijn sequence such that $B_\omega = 2\mathbb{N}$.
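The definitions above can be checked mechanically on finite words. The following is a minimal sketch, not taken from the paper: the function names `is_de_bruijn` and `de_bruijn_orders` are hypothetical, the alphabet is assumed to be given as a string of single-character letters, and `de_bruijn_orders` only inspects a finite prefix, so it returns the part of $B_\omega$ that the prefix is long enough to decide.

```python
from itertools import product


def is_de_bruijn(word, alphabet, n):
    """Return True if `word` is a (non-cyclic) de Bruijn sequence of order n.

    The word must have length k^n + n - 1 and contain every length-n block
    over the alphabet; since it then has exactly k^n blocks, each block
    occurs exactly once.
    """
    k = len(alphabet)
    if len(word) != k**n + n - 1:
        return False
    blocks = {word[i:i + n] for i in range(len(word) - n + 1)}
    all_blocks = {"".join(p) for p in product(alphabet, repeat=n)}
    return blocks == all_blocks


def de_bruijn_orders(prefix, alphabet):
    """List the orders n for which the initial segment of length k^n + n - 1
    is de Bruijn, restricted to the orders that fit inside `prefix`."""
    k = len(alphabet)
    orders = []
    n = 1
    while k**n + n - 1 <= len(prefix):
        if is_de_bruijn(prefix[:k**n + n - 1], alphabet, n):
            orders.append(n)
        n += 1
    return orders
```

Applied to the two examples in the text, `is_de_bruijn("00110", "01", 2)` and `is_de_bruijn("00010020110120210221112122200", "012", 3)` both hold. The ternary word `0120022110` (an illustrative choice, not from the paper) is de Bruijn at orders 1 and 2 simultaneously, which is the kind of nesting that totally de Bruijn sequences require at every order.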
In order to state our main theorem for this section, let us briefly recall the definition and basic properties of the Hausdorff dimension of a fractal² $F \subseteq \mathbb{R}^d$, see e.g. [9,. Let d denote the standard metric on $\mathbb{R}^d$, and let diam(U) denote the diameter of a set $U \subseteq \mathbb{R}^d$. Fix δ > 0 and let $F \subseteq \mathbb{R}^d$. We say that a countable collection

¹ Note that in [2], the phrase "infinite de Bruijn sequence" has a different meaning; we do not use that meaning in this paper because it makes an ad hoc distinction between the k = 2 case and the k ≥ 3 case.
² The word "fractal" normally has a connotative but not a denotative meaning in mathematics; a set is called a fractal if it is "sufficiently complicated at fine scales".
It is well known that for every $F \subseteq \mathbb{R}^d$ we have $0 \le \dim_H(F) \le d$, and that if $\dim_H(F) > 0$, then F is uncountable, but not vice versa.³ We also recall that if b ≥ 2 is an integer, then the base b expansion of a number x ∈ [0, 1] is the series

$\sum_{i=1}^{\infty} \omega_i b^{-i}$,

where $\omega_1, \omega_2, \ldots \in \{0, 1, \ldots, b - 1\}$ are chosen so that the value of the series is equal to x. This choice is unique unless x is a rational number whose denominator is a power of b, in which case there are exactly two ways in which the infinite word $\omega = \omega_1 \omega_2 \cdots$ can be chosen.
Denote by δ the Hausdorff dimension of the set F consisting of all numbers that can be written in the form i.e. the set of all numbers in F that have at least one base b expansion composed entirely of digits from A. 4 Then the set S consisting of all elements of F that have at least one base b expansion that is uniformly de Bruijn satisfies In particular, S has positive Hausdorff dimension but not full Hausdorff dimension.
Note that for large values of k, Stirling's formula gives $\alpha_k \sim \frac{\log(k!)}{k \log(k)} \sim 1 - \frac{1}{\log(k)}$ (where x ∼ y means (1 − x)/(1 − y) → 1), and in particular $\alpha_k \to 1$ as k → ∞. Thus S gets closer and closer to having full dimension as the number of allowed digits increases.
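The second asymptotic relation can be illustrated numerically. The sketch below only compares the quantity log(k!)/(k log(k)) with 1 − 1/log(k) in the sense x ∼ y defined above; it does not evaluate $\alpha_k$ itself, whose defining formula is not reproduced in this passage. The function names are hypothetical, and `math.lgamma(k + 1)` is used as a numerically stable way to obtain log(k!).

```python
import math


def log_factorial_ratio(k):
    """log(k!) / (k log k), computed via lgamma to avoid huge factorials."""
    return math.lgamma(k + 1) / (k * math.log(k))


def stirling_comparison(k):
    """Return (1 - x) / (1 - y) with x = log(k!)/(k log k), y = 1 - 1/log k.

    By Stirling's formula this quotient should approach 1 as k grows.
    """
    x = log_factorial_ratio(k)
    y = 1.0 - 1.0 / math.log(k)
    return (1.0 - x) / (1.0 - y)
```

Evaluating `stirling_comparison` at k = 10, 10^3, 10^6 shows the quotient approaching 1, while `log_factorial_ratio(k)` itself tends to 1 only at the slow rate 1/log(k).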

Preliminaries
We begin by recalling some key definitions used in Becher and Heiber's paper, as well as the proof of the well-known BEST⁵ theorem.
Note that every vertex has in-degree and out-degree both equal to $k \overset{\mathrm{def}}{=} \#(A)$, for a total of $k^n$ vertices and $k^{n+1}$ edges.
If ω is a sequence of length ℓ ≥ n in A, then the path induced by ω on G is the path⁶ $\gamma = \gamma_1 \cdots \gamma_{\ell - n + 1}$ in G defined by the formula $\gamma_i \overset{\mathrm{def}}{=} \omega_i \cdots \omega_{i+n-1} \in V(G)$.

³ The set of Liouville numbers on the real line is a standard example of a comeager (and thus uncountable) set of Hausdorff dimension 0.
⁴ It is well known that δ = log(k)/log(b), see Subsection 5.2.
⁵ An acronym after the people who discovered it: de Bruijn, van Aardenne-Ehrenfest, Smith, and Tutte.
⁶ In this paper a "path" in a directed graph is a sequence of vertices such that each pair of consecutive vertices is connected by an edge from the first vertex to the second vertex. The length of a path is the number of such edges, or equivalently the number of vertices minus one (counting multiplicity in both cases). A path is simple if all its vertices are distinct except possibly the first and last, and Eulerian if it contains each edge exactly once.

Observation 3.2.
Let ω be a sequence of length $\ell_\omega = k^m + m - 1$, and let γ be the path induced by ω on $G_n(A)$. Note that the length of γ is $\ell_\gamma = \ell_\omega - n = k^m + m - n - 1$; in particular, $\ell_\gamma = k^{n+1}$ if m = n + 1, and $\ell_\gamma < k^n$ if m ≤ n. Moreover,
(I) If m = n + 1, then ω is de Bruijn if and only if γ is Eulerian.
(II) If m ≤ n and ω is de Bruijn, then γ is a simple path.

Remark 3.3. If m = n and ω is de Bruijn, then γ is a simple path that visits each vertex exactly once. However, since γ starts and ends at different vertices, it is not a Hamiltonian cycle, contrary to [2, p.931, first para.]. In particular, the edge set of γ does not form a regular graph on V(Ω), as is claimed in [2, Proof of Lemma 3, last para.]. Consequently, the proof given there is technically incorrect; it can be trivially fixed by adding a step where γ is extended to a Hamiltonian cycle; cf. the first two paragraphs of the proof of Corollary 4.3 below. Similar remarks apply to [2, Proof of Lemma 5, last para.].

Now let X = (V(X), E(X)) be a directed graph such that for each vertex x ∈ V(X), the in-degree and out-degree of x are nonzero and equal to each other (though they may depend on x). Fix a vertex $x_0 \in V(X)$, and let E be the set of Eulerian paths of X that start and end at $x_0$. Note that, unlike standard convention, we consider two Eulerian paths to be different if they are formally different as sequences of vertices even if they are cyclically equivalent. Let T be the set of directed spanning trees of X rooted at $x_0$ with edges pointing towards $x_0$.
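Observation 3.2 and Remark 3.3 can be illustrated on the binary example 00110. The sketch below assumes the standard de Bruijn graph, in which the vertices of $G_n(A)$ are the words of length n and there is an edge from u to v exactly when the last n − 1 letters of u agree with the first n − 1 letters of v; this is consistent with the vertex and edge counts stated above, but the paper's Definition 3.1 is not reproduced in this passage. All function names are hypothetical.

```python
def induced_path(word, n):
    """Vertices gamma_i = word[i : i + n] of the path induced by `word` on G_n(A)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def is_eulerian(path, alphabet, n):
    """True if the path uses each of the k^(n+1) edges of G_n(A) exactly once.

    Consecutive vertices of an induced path always overlap in n - 1 letters,
    so each consecutive pair is automatically an edge of G_n(A).
    """
    edges = list(zip(path, path[1:]))
    return len(edges) == len(alphabet) ** (n + 1) and len(set(edges)) == len(edges)


def is_simple(path):
    """True if all vertices are distinct, except possibly the first and last."""
    body = path[:-1] if len(path) > 1 and path[0] == path[-1] else path
    return len(set(body)) == len(body)
```

For ω = 00110 (so m = 2), the path induced on $G_1$ is Eulerian, matching case (I), while the path induced on $G_2$ is simple, visits all four vertices, and starts and ends at different vertices (00 and 10), matching Remark 3.3.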
Since both the conclusion of the BEST theorem and its proof will be important for our argument, we recall them now. We once again remind the reader that our statement differs slightly from the usual one because of our convention about counting Eulerian paths: we do not consider cyclically equivalent paths to be the same. But the difference is easy to quantify: the number of Eulerian paths in each cyclic equivalence class that start and end at $x_0$ is equal to the degree of $x_0$ (we recall that by assumption the in-degree and out-degree are equal). So our count will be off from the conventional one by a factor of $\deg(x_0)$.

Theorem 3.4 (BEST theorem). We have

(3.1) $\#(E) = \#(T) \cdot \deg(x_0) \cdot \prod_{x \in V(X)} (\deg(x) - 1)!$.

Proof. Let T ∈ T be a directed spanning tree rooted at $x_0$. For each x ∈ V(X), let $E_x$ denote the set of edges in X with initial vertex x, and let $T_x = E(T) \cap E_x$, where E(T) denotes the edge set of T. If $x \ne x_0$, then $T_x$ is a singleton, say $T_x = \{v_x\}$, while $T_{x_0} = \emptyset$. Now let Ord(S) denote the set of total orderings of a set S, and note that the cardinality of the set

be the Eulerian path that starts and ends at $x_0$ defined recursively as follows: Suppose that the points $x_0 = \gamma_0, \gamma_1, \ldots, \gamma_i$ have been defined, and let $x = \gamma_i$. Then the next vertex $\gamma_{i+1}$ must be chosen so that

We will also need the following sufficient condition for the right-hand side of (3.1) to be nonzero:

Lemma 3.5. If X is connected, then there is at least one directed spanning tree rooted at $x_0$, i.e. $T \ne \emptyset$.
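The counting convention in the BEST theorem (closed Eulerian paths from a fixed vertex, counted as vertex sequences) can be checked by brute force on small de Bruijn graphs. The sketch below is an illustration under stated assumptions rather than the paper's construction: it assumes the standard de Bruijn graph, which has no parallel edges, so vertex sequences and edge sequences correspond; the helper names are hypothetical; and both counts are obtained by exhaustive enumeration, which is only feasible for very small graphs.

```python
from itertools import product
from math import factorial


def de_bruijn_graph(alphabet, n):
    """Adjacency lists of G_n(A): edges go from u to u[1:] + a for each letter a."""
    vertices = ["".join(p) for p in product(alphabet, repeat=n)]
    return {v: [v[1:] + a for a in alphabet] for v in vertices}


def count_eulerian_from(graph, x0):
    """Count Eulerian paths that start and end at x0 (no parallel edges assumed)."""
    total_edges = sum(len(out) for out in graph.values())
    used = set()

    def walk(x, steps):
        if steps == total_edges:
            return 1 if x == x0 else 0
        count = 0
        for y in graph[x]:
            if (x, y) not in used:
                used.add((x, y))
                count += walk(y, steps + 1)
                used.remove((x, y))
        return count

    return walk(x0, 0)


def count_rooted_trees(graph, x0):
    """Count spanning trees rooted at x0 with edges pointing towards x0."""
    others = [v for v in graph if v != x0]

    def reaches_root(v, parent):
        seen = set()
        while v != x0:
            if v in seen:
                return False  # ran into a cycle that avoids the root
            seen.add(v)
            v = parent[v]
        return True

    count = 0
    # One outgoing edge is chosen at every non-root vertex.
    for choice in product(*(graph[v] for v in others)):
        parent = dict(zip(others, choice))
        if all(reaches_root(v, parent) for v in others):
            count += 1
    return count


def best_formula(graph, x0):
    """Right-hand side of (3.1): #(T) * deg(x0) * prod_x (deg(x) - 1)!."""
    prod = 1
    for v in graph:
        prod *= factorial(len(graph[v]) - 1)
    return count_rooted_trees(graph, x0) * len(graph[x0]) * prod
```

On the binary graph of order 2 and the ternary graph of order 1, `count_eulerian_from` and `best_formula` agree, including the extra factor $\deg(x_0)$ that comes from not identifying cyclically equivalent paths.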
Proof. Let T be a maximal directed tree rooted at x 0 . By the maximality of T , there is no edge from any vertex not in T to any vertex in T . Since each vertex of X has equal in-degree and out-degree, the number of edges from which is equal to zero. Since X is connected, this means that either  N). In other words, then H s (S) = 0 and thus dim H (U ) ≤ s.
It turns out to be convenient to consider a collection $\{U_j : j \in \mathbb{N}\}$ that naturally splits up into subcollections, say $\{U_j : j \in \mathbb{N}\} = \bigcup_m C_m$ for some sequence of collections $(C_m)_{m=1}^{\infty}$. In this case, the summability condition (4.1) is equivalent to the condition

$\sum_{m=1}^{\infty} \mathrm{cost}_s(C_m) < \infty$, where $\mathrm{cost}_s(C_m) \overset{\mathrm{def}}{=} \sum_{U \in C_m} \operatorname{diam}(U)^s$

is the s-dimensional cost of $C_m$. Note that $\mathrm{cost}_s(C_m)$ should be distinguished from the expression $(\mathrm{cost}_1(C_m))^s$, which denotes instead the 1-dimensional cost of $C_m$ raised to the power of s. The set S can be written in terms of the collections $(C_m)_{m=1}^{\infty}$ as follows:

In what follows we will abuse terminology somewhat by calling $\mathrm{cost}_s(C_m)$ the "cost" of the set $S_m \overset{\mathrm{def}}{=} \bigcup_{U \in C_m} U$, although strictly speaking, it depends not only on $S_m$ but also on how it is decomposed.
Proof of upper bound. For each m, let $S_m$ be the set consisting of all elements of F corresponding to base b expansions whose initial segments of length $k^m + m - 1$ are de Bruijn sequences of order m in A. Then the lim sup of the sequence $(S_m)_{m=1}^{\infty}$ consists of those elements of F with infinitely de Bruijn base b expansions. In particular, the set S consisting of those elements of F with uniformly de Bruijn base b expansions satisfies

$S \subseteq \limsup_{m \to \infty} S_m$.

By the Hausdorff-Cantelli lemma, if we can find an s such that

(4.2) $\sum_{m=1}^{\infty} \mathrm{cost}_s(C_m) < \infty$,

then we can conclude that $\dim_H(S) \le s$. We will show that (4.2) holds for all $s > \delta \frac{\log(k!)}{k \log(k)}$.
For each m, we view $S_m$ as the union of the collection

$C_m = \{S_m^{\omega} : \omega \text{ is a de Bruijn sequence of order } m \text{ in the alphabet } A\}$,

where for each ω, $S_m^{\omega}$ is the set of points x ∈ F corresponding to base b expansions whose initial segments of length $k^m + m - 1$ are equal to ω. Let G be the de Bruijn graph of order (m − 1) on A (see Definition 3.1), so that $\#(V(G)) = k^{m-1}$. By Observation 3.2(I), the collection $C_m$ is in bijection with the set of Eulerian paths on G. Fix a vertex $x_0 \in V(G)$. We can estimate the number of Eulerian paths starting and ending at $x_0$ via the BEST theorem. Specifically, we have

$\#(E) = \#(T) \cdot \deg(x_0) \cdot \prod_{x \in V(G)} (\deg(x) - 1)!$,

where each vertex x ∈ V(G) has degree equal to k. The number of spanning trees rooted at $x_0$ is at most $k^{\#(V(G)) - 1}$, since an edge must be chosen emanating from each vertex $x \ne x_0$, and each vertex has out-degree k. And for the same reason, $\deg(x_0) = k$. Therefore, the number of Eulerian paths starting and ending at $x_0$ is at most

$k^{\#(V(G)) - 1} \cdot k \cdot ((k-1)!)^{\#(V(G))} = (k!)^{k^{m-1}}$.

By the ratio test, this series converges as long as $\lim_{m \to \infty} |a_{m+1}/a_m| < 1$, where $a_m$ denotes the mth term. A straightforward computation yields:

which tends to 0 as m → ∞. Thus by Lemma 4.1, we have

$\dim_H(S) \le \delta \frac{\log(k!)}{k \log(k)}$.

Since for all k ≥ 2 we have $k! < k^k$ and thus $\frac{\log(k!)}{k \log(k)} < 1$, we deduce that the Hausdorff dimension of S is strictly less than δ.
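To make the upper bound concrete, the following sketch evaluates δ = log(k)/log(b) (the value quoted in the paper's footnote on δ) and the threshold δ · log(k!)/(k log(k)) for a given digit set size k and base b. The function names are hypothetical, and the computation is plain arithmetic on the formulas stated above.

```python
import math


def limit_set_dimension(k, b):
    """delta = log(k) / log(b), the Hausdorff dimension of F for k digits in base b."""
    return math.log(k) / math.log(b)


def uniformly_de_bruijn_upper_bound(k, b):
    """delta * log(k!) / (k log k), the upper bound on dim_H(S) from the proof above."""
    delta = limit_set_dimension(k, b)
    return delta * math.lgamma(k + 1) / (k * math.log(k))
```

For the Cantor ternary set (k = 2, b = 3) the bound equals log(2)/(2 log(3)), exactly half the dimension log(2)/log(3) of the Cantor set itself, since log(2!)/(2 log(2)) = 1/2.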

4.2.
The lower bound. The proof of the lower bound is significantly more involved, and will require a few preliminary results. We begin with the following proposition:

Proposition 4.2. Let X be a k-regular connected directed graph, fix $x_0 \in V(X)$, and let E be the set of Eulerian paths of X that start and end at $x_0$. Then there exists E′ ⊆ E such that:
(ii) If δ is a path of length $\ell_\delta$ starting at $x_0$, then the number of paths in E′ that extend δ is at most $k \cdot ((k-1)!)^{\#(V(X)) - \ell_\delta / k}$.

⁷ In fact, the exact count for such sequences is known, but we prefer this estimate because it is simpler and yields the same upper bound on the Hausdorff dimension.
Proof. Since X is connected, by Lemma 3.5 there exists a directed spanning tree T rooted at $x_0$. Let E′ be the set of Eulerian paths δ that start and end at $x_0$ such that for all xy ∈ E(X) and xz ∈ E(T) with y ≠ z, the edge xy appears in δ before xz does. Equivalently,

where the notation is as in the proof of the BEST theorem. Then the proof of the BEST theorem implies that

Now let δ be a path starting at $x_0$ that has at least one extension

is an extension of δ if and only if the algorithm described in the proof of the BEST theorem produces δ on input o. Equivalently, f(T, o) is an extension of δ if for each edge xy of δ, the rank of xy according to $o_x$ is the same as its rank according to its location in δ. The number of elements o ∈ O(T) satisfying this condition is given by the formula

where $E_x$ denotes the set of edges with initial vertex x, and E(δ) denotes the edge set of δ. Here we use the convention (−1)! = 1, since if $E_x \setminus E(\delta) = \emptyset$, then there is exactly one ordering $o_x$ satisfying the appropriate condition, namely the ordering determined by δ, and by hypothesis the element $v_x$ comes last in this ordering. Now since

The next result will furnish the lower bound for k ≥ 4. Although it is valid for k = 3, it provides no useful information in this case since 0 is always a (trivial) lower bound on the dimension.

Corollary 4.3. Let the notation be as in Theorem 2.1, and let S be the set of numbers in F with totally de Bruijn base b expansions. Assume that k ≥ 4. Then the Hausdorff dimension of S is bounded below by $\alpha_k \delta > 0$, where δ is the Hausdorff dimension of F (and equals log(k)/log(b)), and

Before we turn to the proof, we recall the so-called Mass Distribution Principle, an extremely useful tool for bounding the Hausdorff dimension from below.

Lemma 4.4 (Mass Distribution Principle). Let F be a metric space, and let µ be a measure on F such that 0 < µ(F) < ∞. Fix s, ε > 0, and suppose that there exists C > 0 such that $\mu(U) \le C \cdot \operatorname{diam}(U)^s$ for every set

Proof of Corollary 4.3.
Fix n ∈ ℕ, and let $\omega = \omega_1 \cdots \omega_{k^n + n - 1}$ be a de Bruijn sequence of order n in A. Since the path induced by ω on $G_{n-1}(A)$ is an Eulerian path in a directed graph in which each vertex has equal in-degree and out-degree, it must start and end at the same vertex, which means that the first (n − 1) letters of ω are the same as the last (n − 1) letters, i.e. $\omega_{k^n + i} = \omega_i$ for all i = 1, . . . , n − 1.⁸ Now let $\omega_{k^n + n} = \omega_n$ and $\omega' = \omega_1 \cdots \omega_{k^n + n}$. Then the first n letters of ω′ are the same as the last n letters, but no other block of n letters is repeated in ω′.
Let $G = G_n(A)$ be the de Bruijn graph of order n on A, and let $\gamma = \gamma_1 \cdots \gamma_{k^n + 1}$ be the path induced by ω′ on G. Then γ is a Hamiltonian cycle (i.e. a simple path traversing each vertex once). The collection of de Bruijn sequences of order (n + 1) that extend ω′ is isomorphic to the collection of Eulerian paths on G that extend γ.
Let $x_0 \overset{\mathrm{def}}{=} \gamma_1 = \gamma_{k^n + 1}$ be the common initial and terminal vertex of γ. Then the collection of Eulerian paths of G that extend γ is isomorphic to the set of Eulerian paths of $X_\omega \overset{\mathrm{def}}{=} G \setminus E(\gamma)$ that start and end at $x_0$, which we denote by E(ω). Since $X_\omega$ is a (k − 1)-regular connected directed graph whose vertex set has size $k^n$ (see the proof of [2, Lemma 3] for connectedness), we may use Proposition 4.2 to extract a subset E′(ω) ⊆ E(ω). Pulling this subset back via the appropriate correspondences gives us a set S′(ω), contained in the set of all de Bruijn sequences of order (n + 1) extending ω′ (and thus also extending ω), with the following properties:
(ii) If τ is a sequence of length $\ell_\tau$ extending ω, then the number of sequences in S′(ω) that extend τ is at most $(k-1) \cdot ((k-2)!)^{k^n - (\ell_\tau - \ell_\omega - 1)/k}$, where $\ell_\omega = k^n + n - 1$ is the length of ω.
Now we proceed to define a probability measure µ on F ≡ E N via a random algorithm: start with a fixed de Bruijn sequence ω (1) of order 1, and if ω (n) is a de Bruijn sequence of order n, then let ω (n+1) ∈ S ′ (ω (n) ) be chosen randomly with respect to the uniform measure on S ′ (ω (n) ), independent of all previous selections.
Let ω be the unique infinite sequence that extends all of the finite sequences ω (n) (n ∈ N). Then ω is a base b expansion of a unique point π(ω) ∈ F . (The point π(ω) may have a base b expansion other than ω, but there is no other point with base b expansion ω.) We let µ be the probability measure describing the distribution of the random variable π(ω). (The existence of such a µ can be guaranteed e.g. by the Kolmogorov extension theorem.) To demonstrate that µ satisfies the hypotheses of the mass distribution principle, we first estimate the measure of cylinder sets of a certain length, then arbitrary cylinder sets, then balls. Here a cylinder set is a set of the form [τ ] = {π(ω) : ω i = τ i ∀i = 1, . . . , ℓ τ }, where τ = τ 1 · · · τ ℓτ is a finite sequence in the alphabet A. Our first estimate is easy: if ℓ τ = k n+1 + n for some n, then [τ ] is precisely the set of π(ω) in the above construction such that ω (n+1) = τ , so µ([τ ]) is just the probability that ω (n+1) = τ , i.e.
if it is possible that $\omega^{(n+1)} = \tau$, and µ([τ]) = 0 otherwise. Now consider the more general case where the length of τ satisfies $k^n + n - 1 < \ell_\tau \le k^{n+1} + n$ for some n. Then by (ii) above, [τ] contains at most $(k-1) \cdot ((k-2)!)^{k^n - (\ell_\tau - (k^n + n))/k}$ cylinders of length $k^{n+1} + n$. Combining with (4.5) shows that

Here and hereafter we use the notation $\exp_x(y) \overset{\mathrm{def}}{=} x^y$. To apply the mass distribution principle (Lemma 4.4), we now need to relate this measure to the diameter of the cylinder [τ]. Since elements of [τ] have the first $\ell_\tau$ digits of their base b expansions fixed, the diameter of [τ] is approximately $b^{-\ell_\tau}$ (to be precise, it is $c \cdot b^{-\ell_\tau}$ for some constant 0 < c ≤ 1). Thus

where $C = (k-1) \cdot c^{-\alpha_k \delta}$ and $s = \alpha_k \delta$. But any subset of F can be covered by at most two cylinder sets with comparable diameter, so a similar formula holds for arbitrary sets. Thus by Lemma 4.4, we have $\dim_H(S) \ge s = \alpha_k \delta$.
As is evident from Corollary 4.3, we now have to deal with the cases k = 2 and k = 3 separately, since in those cases the formula (4.4) gives $\alpha_2 = \alpha_3 = 0$, which is not a useful bound. Note that the Cantor ternary set falls into the case k = 2, since its set of admissible numerators is A = {0, 2}.
Proposition 4.5. If k = 2 and ω is a de Bruijn sequence of order (n − 2) in A, then the number of de Bruijn sequences of order (n + 1) that extend ω is at least $2^{2^{n-2}}$.
If k = 3 and ω is a de Bruijn sequence of order (n − 1) in A, then the number of de Bruijn sequences of order (n + 1) that extend ω is at least $4^{3^{n-1}}$.
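For very small parameters, the counts in Proposition 4.5 can be compared against an exhaustive search. The sketch below is a brute-force check under stated assumptions, not the method of the paper: `count_extensions` is a hypothetical helper that counts de Bruijn sequences of a given order beginning with a given prefix by backtracking over letters and rejecting any repeated block. Its running time grows very quickly with the order, so it is only meant for toy cases.

```python
def count_extensions(prefix, alphabet, order):
    """Count de Bruijn sequences of the given order that begin with `prefix`.

    A word of length k^order + order - 1 with pairwise distinct blocks of
    length `order` contains every block exactly once, so it suffices to
    forbid repeated blocks while extending letter by letter.
    """
    k = len(alphabet)
    target_len = k**order + order - 1
    seen = set()
    # Record the blocks already present in the prefix; a repeat rules it out.
    for i in range(len(prefix) - order + 1):
        block = prefix[i:i + order]
        if block in seen:
            return 0
        seen.add(block)

    def extend(word):
        if len(word) == target_len:
            return 1
        total = 0
        for a in alphabet:
            candidate = word + a
            if len(candidate) < order:
                # Too short to contain a full block yet.
                total += extend(candidate)
                continue
            block = candidate[-order:]
            if block not in seen:
                seen.add(block)
                total += extend(candidate)
                seen.remove(block)
        return total

    return extend(prefix)
```

In the binary case with n = 3, the word 01 is de Bruijn of order n − 2 = 1, and the search finds far more extensions of order n + 1 = 4 than the guaranteed $2^{2^{1}} = 4$; similarly for n = 4 with the order-2 word 00110 and the bound $2^{2^{2}} = 16$. The ternary case can be explored the same way (for example with prefix 012 and order 3), at noticeably higher cost.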
Proof. For convenience we let ∆ = 2 if k = 3, and ∆ = 3 if k = 2; then ω is a de Bruijn sequence of order (n − ∆ + 1). The first paragraph of the proof of Corollary 4.3 shows that the first (n − ∆) letters of ω are the same as the last (n − ∆) letters. So if we extend ω to a word ω′ of length $k^{n-\Delta+1} + n$ by letting $\omega_{k^{n-\Delta+1} + i} = \omega_i$ for i = n − ∆ + 1, . . . , n, then the first n letters of ω′ are the same as the last n letters, but no other block of n letters is repeated.
Let G be the de Bruijn graph of order n on A, and let γ be the path induced by ω′ on G. The length of γ is $\ell_\gamma = k^{n-\Delta+1}$, and γ is a simple path that starts and ends at the same vertex $x_0$. As in the proof of Corollary 4.3, we let $X = X_\omega = G \setminus E(\gamma)$, where E(γ) is the edge set of γ. The collection of de Bruijn sequences of order (n + 1) that extend ω is isomorphic to the collection of Eulerian paths on G that extend γ, which in turn is isomorphic to the collection of Eulerian paths on $X_\omega$ that start and end at $x_0$. By the BEST theorem, the cardinality of this collection is

$N \overset{\mathrm{def}}{=} \#(T) \cdot \deg(x_0; X_\omega) \cdot \prod_{x \in V(G)} [\deg(x; X_\omega) - 1]!$.

If k = 3, we complete the proof with the following calculation:

In the first inequality, we have used Lemma 3.5 and the proof of [2, Lemma 3] to deduce that #(T) ≥ 1.
For the remainder of the proof, we assume that k = 2. In this case, the strategy of the above calculation cannot work, since we have $[\deg(x; X_\omega) - 1]! = 1$ for all x ∈ V(G) and thus N ≤ 2#(T). Instead we must estimate the number of spanning trees in $X_\omega$.
Let S be the set of sequences of length (n − 1) that do not occur in ω, and note that $\#(S) = 2^{n-1} - 2^{n-2} = 2^{n-2}$. For each τ ∈ S, let $E_\tau = \{a\tau b : a, b \in A\} \subseteq E(X_\omega)$, where aτb is shorthand for (aτ)(τb), the edge from the vertex aτ to the vertex τb. Note that the sets $E_\tau$ (τ ∈ S) are disjoint.

Lemma 4.6. If T is a directed spanning tree and τ ∈ S, then there exists a directed spanning tree T′ ≠ T such that $T' \setminus E_\tau = T \setminus E_\tau$.
Proof. By contradiction, suppose that the conclusion of the lemma is false, i.e. that there exists no such spanning tree T ′ .
Denote the partial order on V(G) induced by the tree T by <, i.e. write x < y if there is a path in T from x to y, and write x ≤ y if either x < y or x = y. We write x <* y if x is a direct descendant of y, i.e. if xy ∈ E(T). For each a ∈ A, let f(a) ∈ A be chosen to satisfy aτf(a) ∈ E(T), and let g(a) = σ(f(a)), where σ : A → A is the permutation that swaps the two elements of A. Consider the graph T′ = T ∪ {aτg(a)} \ {aτf(a)}. Then T′ ≠ T and $T' \setminus E_\tau = T \setminus E_\tau$, so by the contradiction hypothesis, T′ is not a directed spanning tree, which implies that τg(a) ≤ aτ. On the other hand, we have aτ <* τf(a) since aτf(a) ∈ T. Now write A = {a, b}, c = f(a), and d = σ(c) = g(a). Then either f(b) = c or f(b) = d, and thus we have one of the following two diagrams: τd ≤ aτ <* τc >* bτ ≥ τd or τd ≤ aτ < τc ≤ bτ < τd.
Both diagrams are impossible for directed trees: the left-hand diagram is impossible because if aτ and bτ are siblings, then they have no common descendants, while the right-hand diagram is impossible because it is a nontrivial directed loop. This is the desired contradiction. ⊳

It follows from Lemma 4.6 that there exists a function φ : T × S → T such that for all T ∈ T and τ ∈ S, we have φ(T, τ) ≠ T and $\varphi(T, \tau) \setminus E_\tau = T \setminus E_\tau$. Now by Lemma 3.5 and the proof of [2, Lemma 5], X has a directed spanning tree $T_0$ rooted at $x_0$. Let $(\tau_i)_{i=1}^{N}$ be an indexing of S, where $N = 2^{n-2}$. Given $\omega \in \{0, 1\}^N$, we define recursively

Then the map $\{0, 1\}^N \ni \omega \mapsto T_{\omega, N} \in T$ is injective. Thus $N \ge \#(T) \ge \#(\{0, 1\}^N) = 2^{2^{n-2}}$, which completes the proof.
Corollary 4.7. Let the notation be as in Theorem 2.1. Suppose that k ≤ 3, and let Then the Hausdorff dimension of the set {π(ω) ∈ F : B ω contains an arithmetic progression with gap size ∆} is at least α k δ.
Proof. Let B = 2 if k = 2 and B = 4 if k = 3. Then $\alpha_k = ((k^\Delta - 1) \cdot (k^\Delta \log_B(k) - 1))^{-1}$, and Proposition 4.5 can be expressed uniformly as follows: If ω is a de Bruijn sequence of order n in A, then the number of de Bruijn sequences of order n + ∆ that extend ω is at least $\exp_B(k^n)$. We denote the set of all such extensions by S′(ω). As in the proof of Corollary 4.3, we define a probability measure µ by a random algorithm: let $\omega^{(1)}$ be a fixed de Bruijn sequence of order ∆, and if $\omega^{(n)}$ is a de Bruijn sequence of order n∆, then let $\omega^{(n+1)}$ be chosen randomly with respect to the uniform measure on $S'(\omega^{(n)})$, independent of all previous selections. As before we let $\omega \in A^{\mathbb{N}}$ be the unique common extension, we let π(ω) ∈ F be the unique number for which ω is a base b expansion, and we let µ be the probability measure describing the distribution of π(ω).
We need this expression to be bounded as n → ∞. Applying the change of variables $x = 9^n$, we need to show that lim sup

This is true if and only if $4^{-1/8} \cdot (3^9 \cdot 4^{-1})^t < 1$, which in turn is true if and only if $t < \alpha_3$. This proves that the hypothesis of the mass distribution principle holds for cylinder sets. As in the proof of Corollary 4.3, any subset of F can be covered by at most two cylinder sets with comparable diameter, so the hypothesis of the mass distribution principle holds for arbitrary sets as well.
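The constants in this proof can be evaluated directly. The sketch below computes $\alpha_k$ for k = 2 and k = 3 from the closed form stated in the proof, with ∆ = 3, B = 2 for k = 2 and ∆ = 2, B = 4 for k = 3, and checks numerically that the quantity $4^{-1/8} \cdot (3^9 \cdot 4^{-1})^t$ crosses 1 exactly at $t = \alpha_3$. The function names are hypothetical, and the variable `gap` stands for ∆.

```python
import math


def alpha_small(k):
    """alpha_k = ((k^Delta - 1) * (k^Delta * log_B(k) - 1))^(-1) for k in {2, 3}."""
    if k == 2:
        gap, base = 3, 2
    elif k == 3:
        gap, base = 2, 4
    else:
        raise ValueError("this closed form is only stated for k = 2 and k = 3")
    k_pow = k**gap
    return 1.0 / ((k_pow - 1) * (k_pow * math.log(k, base) - 1))


def ternary_threshold_quantity(t):
    """The expression 4^(-1/8) * (3^9 / 4)^t from the k = 3 case."""
    return 4 ** (-1 / 8) * (3**9 / 4) ** t
```

With these formulas $\alpha_2 = 1/49$, so for the Cantor ternary set (k = 2, b = 3) the resulting lower bound $\alpha_2 \delta$ is log(2)/(49 log(3)), a small but positive number.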

Remark 4.8.
Either of the strategies used in this proof, the (simpler) strategy for the k = 3 case or the (more complicated) strategy for the k = 2 case, could have been used (after minor modification) in the case k ≥ 4 as well, but the resulting bound would have been significantly worse, measured by the fact that the analogues of α k would not have tended to 1. Similarly, the strategy for the k = 2 case could have been used for the k = 3 case, again resulting in a worse bound. In general, the principle is that whatever techniques work for one value of k will also work for higher values of k, but may not give very good estimates for higher values of k.

Intrinsic Diophantine approximation
5.1. Diophantine approximation: a brief survey. We first recall some definitions and state some well-known classical theorems:

Definition 5.1. Let $H : \mathbb{Q} \to \mathbb{R}_{>0}$ be a function. We think of H as a "height function", and for all p ∈ ℤ and q ∈ ℕ, we define the height of p/q to be the number H(p/q). We say that a function $\psi : \mathbb{R}_{>0} \to \mathbb{R}_{>0}$ is a Dirichlet function (with respect to the height function H) if for every x ∈ ℝ \ ℚ there exist infinitely many rationals p/q such that $|x - p/q| < \psi(H(p/q))$.
Historically speaking, the only height function considered on the unit interval [0, 1] was the function $H_{\mathrm{std}}(p/q) = q$, where p and q are chosen in reduced form, i.e. gcd(p, q) = 1. We will refer to this as the standard height. It is readily verified that, for example, $\psi_0(q) = 1$ and $\psi_1(q) = 1/q$ are Dirichlet functions with respect to the standard height function. Using the terminology of Definition 5.1, Dirichlet's approximation theorem may be stated as follows:

Theorem (Dirichlet). $\psi_2(q) = 1/q^2$ is a Dirichlet function with respect to the standard height function.⁹

⁹ In fact, Dirichlet's theorem furnishes a similar result for all dimensions d. It was recently pointed out to us by Y. Bugeaud that the one-dimensional version of this result is actually much older, coming directly from the theory of continued fractions (see e.g. [16, displayed equation on p.28]). Nevertheless, we call the theorem "Dirichlet's theorem" so as to conform to usual practice.
For our purposes, although of interest in its own right, an improvement of a Dirichlet function by a multiplicative constant is not significant. More precisely:

Definition 5.2. We say that a Dirichlet function ψ is optimal if there does not exist a Dirichlet function φ for which $\lim_{q \to \infty} \varphi(q)/\psi(q) = 0$.

It is clear that Dirichlet's theorem implies that the Dirichlet functions $\psi_0$ and $\psi_1$ defined above are not optimal. The optimality of the function $\psi_2(q) = 1/q^2$ was demonstrated by Liouville, who proved that quadratic irrationals are badly approximable. A real number x is called badly approximable if there exists c(x) > 0 such that

$|x - p/q| \ge c(x)/q^2$ for all p/q ∈ ℚ.

Liouville's result was later significantly improved by Jarník, who proved that the Hausdorff dimension of the set of badly approximable numbers is 1.
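Both phenomena can be seen concretely for $x = \sqrt{2}$. The sketch below is an illustration rather than part of the paper: it generates continued fraction convergents of $\sqrt{2} = [1; 2, 2, 2, \ldots]$ and tests the inequality $|\sqrt{2} - p/q| < c/q^2$ exactly, using only rational arithmetic (by squaring the endpoints of the interval around p/q). The function names and the choice of the constant 1/4 are assumptions made for the example.

```python
from fractions import Fraction


def sqrt2_convergents(count):
    """Yield the first `count` convergents (p, q) of sqrt(2) = [1; 2, 2, 2, ...]."""
    p_prev, q_prev = 1, 0
    p, q = 1, 1
    for _ in range(count):
        yield p, q
        p, p_prev = 2 * p + p_prev, p
        q, q_prev = 2 * q + q_prev, q


def within(p, q, c=1):
    """Exact test of |sqrt(2) - p/q| < c / q^2 using rational arithmetic only."""
    center = Fraction(p, q)
    radius = Fraction(c) / q**2
    lo, hi = center - radius, center + radius
    # sqrt(2) lies strictly between lo and hi iff lo < sqrt(2) and sqrt(2) < hi.
    return (lo < 0 or lo * lo < 2) and hi > 0 and hi * hi > 2
```

Every convergent satisfies the Dirichlet inequality with c = 1, while a search over small denominators finds no rational at all satisfying it with c = 1/4, in line with $\sqrt{2}$ being badly approximable.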
5.2. Iterated function systems, limit sets, and Hausdorff dimension. Let k ≥ 2 be an integer.
In what follows, we shall consider a finite family $(S_i)_{i=1}^{k}$ of contracting similarities on the unit interval I = [0, 1]. This means that for every 1 ≤ i ≤ k, the map $S_i : I \to I$ satisfies

$|S_i(x) - S_i(y)| = c_i |x - y|$ for all x, y ∈ I

for some $0 < c_i < 1$. We shall call such a family of similarities an Iterated Function System or IFS. A nonempty compact set F ⊆ I is said to be the attractor or the limit set of the IFS if

$F = \bigcup_{i=1}^{k} S_i(F)$.

It is well known (see e.g., [9, Chapter 9]) that the attractor F exists and is unique. Furthermore, if there exists a bounded nonempty open set U such that

$\bigcup_{i=1}^{k} S_i(U) \subseteq U$

with the union disjoint, then the IFS is said to satisfy the open set condition. In this case, the Hausdorff dimension of the attractor is equal to the unique solution s > 0 of the equation

$\sum_{i=1}^{k} c_i^s = 1$.

We say that the IFS $(S_i)_{i=1}^{k}$ satisfies the strong separation condition if $S_i(F) \cap S_j(F) = \emptyset$ for all i ≠ j, where F is the attractor.¹⁰ A particularly important example of an iterated function system is the system

(5.2) $S_i(x) = \frac{x + i}{b}$ (i = 0, 1, . . . , b − 1),

where b ≥ 2 is fixed. This system satisfies the open set condition (with U = (0, 1)) but not the strong separation condition, and its attractor is the entire interval I. In some sense this IFS encodes the base b expansion(s) of any number in the interval [0, 1], since the number can be written as $x = \lim_{n \to \infty} S_{\omega_1} \circ \cdots \circ S_{\omega_n}(0)$.

¹⁰ Note that the strong separation condition implies (but is not implied by) the open set condition.
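The link between the base b IFS and digit expansions can be made explicit in a few lines. The sketch below assumes the maps $S_i(x) = (x + i)/b$ (the standard choice consistent with the limit formula above) and uses hypothetical function names. It evaluates a finite composition at 0 with exact rational arithmetic, and it solves the dimension equation $\sum_i c_i^s = 1$ by bisection.

```python
from fractions import Fraction


def compose_at_zero(digits, b):
    """Evaluate S_{w1} o ... o S_{wn}(0) for S_i(x) = (x + i) / b.

    The innermost map S_{wn} is applied first, so the digits are read in
    reverse; the result equals the partial sum of w_i * b^(-i).
    """
    x = Fraction(0)
    for d in reversed(digits):
        x = (x + d) / b
    return x


def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(c^s for c in ratios) = 1 for s > 0 by bisection."""
    def excess(s):
        return sum(c**s for c in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:  # enlarge the bracket if the root lies beyond hi
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, the digits 0, 2, 2 in base 3 give 2/9 + 2/27 = 8/27, a point of the Cantor ternary set, and two maps of ratio 1/3 give dimension log(2)/log(3), in agreement with the formula δ = log(k)/log(b).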
By looking at subsystems of the system (5.2), we can find IFSes whose limit sets can be described in terms of base $b$ expansions. Fix $A \subseteq C(b)$, and consider the subsystem of (5.2) consisting of the similarities $(S_i)_{i \in A}$. We call such a subsystem a base $b$ IFS. Its limit set is precisely the set of all numbers in $[0, 1]$ that have at least one base $b$ expansion whose digits all lie in $A$, i.e.
$$F = \Big\{ \sum_{n=1}^{\infty} \omega_n b^{-n} : \omega_n \in A \text{ for all } n \geq 1 \Big\}.$$
We remark that it is easy to check whether a base $b$ IFS satisfies the strong separation condition:

Observation 5.3. The base $b$ IFS defined by the alphabet $A \subseteq C(b)$ satisfies the strong separation condition if and only if at least one of the following is true:
(1) $0 \notin A$;
(2) $b - 1 \notin A$;
(3) $A$ does not contain any pair of consecutive integers.
If a base $b$ IFS satisfies the strong separation condition, then every element of its limit set $F$ has exactly one base $b$ expansion whose digits come from $A$. In this case, we may speak unambiguously of "the base $b$ expansion" of a number in $F$: if the number has more than one base $b$ expansion, we mean the one whose digits come from $A$.
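The strong separation condition for a base $b$ IFS can also be tested directly from the geometry, as in the following Python sketch. It relies on the facts that the smallest and largest points of $F$ are $\min(A)/(b-1)$ and $\max(A)/(b-1)$ (the constant-digit expansions), and that each piece $S_i(F)$ contains both endpoints of its convex hull, so two pieces meet exactly when their hulls meet. The function name `satisfies_ssc` is hypothetical, and we assume $C(b) = \{0, \ldots, b-1\}$.

```python
from fractions import Fraction


def satisfies_ssc(alphabet, b):
    """Return True if the base-b IFS with digit set `alphabet` has pairwise
    disjoint pieces S_i(F), where S_i(x) = (x + i) / b.

    Assumes digits are integers in {0, ..., b - 1}.
    """
    digits = sorted(set(alphabet))
    if b < 2 or not digits or digits[0] < 0 or digits[-1] > b - 1:
        raise ValueError("alphabet must be a nonempty subset of {0, ..., b-1}")
    lo = Fraction(digits[0], b - 1)    # min F = 0.aaa... with a = min(A)
    hi = Fraction(digits[-1], b - 1)   # max F = 0.aaa... with a = max(A)
    # S_i(F) has convex hull [(i + lo)/b, (i + hi)/b]; it suffices to compare
    # neighbouring digits, since the hulls are ordered by i.
    return all((i + hi) / b < (j + lo) / b for i, j in zip(digits, digits[1:]))


cantor_ok = satisfies_ssc({0, 2}, 3)        # Cantor ternary set
full_ok = satisfies_ssc({0, 1, 2}, 3)       # full base-3 system (5.2)
```

Exact rational arithmetic is used so that the borderline case, in which two pieces share a single endpoint, is decided without rounding.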

5.3. Intrinsic approximation on limit sets. Let $F \subseteq \mathbb{R}$ be a closed set, which we will think of as a fractal. The field of intrinsic Diophantine approximation is concerned with approximating an irrational number $x \in F$ by rational numbers that lie on the fractal $F$. Thus Mahler's first question is about intrinsic approximation on the Cantor set. More generally, one may ask about intrinsic approximation on the attractor of any similarity IFS. This leads to the following definition:

Definition 5.4. Let $F \subseteq \mathbb{R}$ be a closed set, and let $H : F \cap \mathbb{Q} \to \mathbb{R}_{>0}$ be a height function. We say that a function $\psi : \mathbb{R}_{>0} \to \mathbb{R}_{>0}$ is an intrinsic Dirichlet function on $F$ (with respect to the height function $H$) if for every $x \in F \setminus \mathbb{Q}$ there exist infinitely many rationals $p/q \in F \cap \mathbb{Q}$ such that $|x - p/q| < \psi(H(p/q))$.
Optimality of intrinsic Dirichlet functions can be defined in the same way as in Definition 5.2.
We have the following result:

Proposition 5.5 ([5, Corollary 2.2]). Let $F$ be the limit set of a base $b$ IFS, and let $\delta$ be the Hausdorff dimension of $F$. Then for all $x \in F$, there exist infinitely many rational numbers $p/q \in F$ ($p \in \mathbb{Z}$, $q \in \mathbb{N}$) such that
$$\left| x - \frac{p}{q} \right| < \frac{1}{q (\log_b q)^{1/\delta}}.$$
In other words, the function $\psi_*(q) = (q \cdot (\log_b q)^{1/\delta})^{-1}$ is an intrinsic Dirichlet function on $F$ for the standard height function.
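To get a feeling for the size of $\psi_*$, the following Python sketch evaluates it for a base $b$ limit set. It assumes the digit alphabet is a subset of $\{0, \ldots, b-1\}$ with at least two elements, and it uses the value $\delta = \log|A| / \log b$, which is what the dimension formula of Section 5.2 gives when every contraction ratio equals $1/b$. The function names are illustrative only.

```python
import math


def base_b_dimension(alphabet, b):
    """Dimension of the base-b limit set with digit set `alphabet`:
    the solution of |A| * b**(-s) = 1, i.e. log|A| / log b."""
    digits = set(alphabet)
    if b < 2 or not digits or not all(0 <= d < b for d in digits):
        raise ValueError("alphabet must be a nonempty subset of {0, ..., b-1}")
    return math.log(len(digits)) / math.log(b)


def psi_star(q, b, delta):
    """Evaluate psi_*(q) = 1 / (q * (log_b q) ** (1 / delta)) for q > 1."""
    if q <= 1 or delta <= 0:
        raise ValueError("need q > 1 and delta > 0")
    return 1.0 / (q * math.log(q, b) ** (1.0 / delta))


# Cantor ternary set: delta = log 2 / log 3, and log_3(81) = 4, so
# psi_*(81) = 1 / (81 * 4 ** (log 3 / log 2)) = 1 / (81 * 9).
delta = base_b_dimension({0, 2}, 3)
value = psi_star(81, 3, delta)
```

The example shows the gain over the trivial rate $1/q$: at $q = 81$ the extra logarithmic factor already contributes a factor of $9$.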

The symbolic height function
Let $F$ be the limit set of a base $b$ IFS satisfying the strong separation condition, and fix a rational number $r \in F \cap \mathbb{Q}$. It is well known that the base $b$ expansion of $r$ is preperiodic, i.e.
$$r = 0.\omega_1 \cdots \omega_i \overline{\omega_{i+1} \cdots \omega_{i+j}} \tag{6.1}$$
for some $i \geq 0$, $j \geq 1$, and $\omega_1, \ldots, \omega_{i+j} \in A$. Here the bar indicates that the string $\omega_{i+1} \cdots \omega_{i+j}$ is infinitely repeated. Rewriting the right-hand side as a sum of fractions yields
$$r = \frac{(\omega_1 \cdots \omega_i)_b}{b^i} + \frac{(\omega_{i+1} \cdots \omega_{i+j})_b}{b^i} \cdot \frac{1}{b^j - 1},$$
where $(\omega_1 \cdots \omega_i)_b$ and $(\omega_{i+1} \cdots \omega_{i+j})_b$ are integers that have been written in base $b$. Adding the two resulting fractions together, we end up with a (complicated) expression whose denominator is $b^i (b^j - 1)$. Further cancellations may or may not be possible, but we can always write the rational number as a fraction of two integers, the denominator of which is $b^i (b^j - 1)$.
This fact leads to a natural height function on $F \cap \mathbb{Q}$ related to the base $b$ structure of the fractal $F$:
$$H_{\mathrm{sym}}(r) = b^i (b^j - 1), \tag{6.2}$$
where the indices $i$ and $j$ are the smallest integers such that $r$ can be written in the form (6.1). The function $H_{\mathrm{sym}}$ is called the symbolic height function. It was studied in a more general context in [12]. Notice that the symbolic height of a rational number may not be the same as its standard height (i.e. its denominator in reduced form). For example, the rational number $0.\overline{20}_3$ in the Cantor ternary set is equal to $3/4$, so its standard height is 4. Nonetheless, the symbolic height of $0.\overline{20}_3$ is $3^0 \cdot (3^2 - 1) = 8$. It should be thought of as the denominator resulting from the following calculation:
$$0.\overline{20}_3 = 20_3 \cdot \frac{1}{3^2 - 1} = \frac{2 \cdot 3}{8} = \frac{6}{8}.$$
Although more cancellation is possible at the end of this calculation, this will not always be the case, so in a principled way we have stopped reducing the fraction here. (For example, the fraction at the end of the calculation
$$0.2\overline{70}_9 = \frac{2_9}{9} + \frac{70_9}{9} \cdot \frac{1}{9^2 - 1} = \frac{2 \cdot 80 + 7 \cdot 9}{9 \cdot 80} = \frac{223}{720}$$
is already in reduced form.) The calculation illustrates the fact that the symbolic height of a rational number $r$ can be thought of as a "symbolic denominator", i.e. the denominator of a certain representation of $r$ as the quotient of two integers. The numerator of this representation can be thought of as a "symbolic numerator" (in the above example the symbolic numerator would be 6), but as usual, for purposes of Diophantine approximation it is simpler to just work with the denominator. Note that the standard height is by definition no larger than the symbolic one, since we have $p_{\mathrm{std}}/q_{\mathrm{std}} = p_{\mathrm{sym}}/q_{\mathrm{sym}}$, and the left-hand side is in reduced form.

We remark that heuristically, if we are given two rational numbers $r_1$ and $r_2$, and we are told that $r_1$ lies in the limit set of a base $b$ IFS, but we are not told anything about $r_2$, then we should expect the (multiplicative) discrepancy between the standard height and the symbolic height to be smaller for $r_1$ than for $r_2$. This is because if we choose the numerator and denominator of a rational randomly, then the numbers $i$ and $j$ satisfying (6.1) may be comparable to the standard height of the rational (meaning that the symbolic height is an exponential function of the standard height), but the number would be exceedingly unlikely to lie in any base $b$ limit set, since its digits would essentially be random. By contrast, if we choose the digits of a rational randomly out of a fixed alphabet $A$ (with a fixed period and preperiod), then the amount of cancellation we expect to see in the symbolic representation of the rational will be much smaller, so the standard height and symbolic height will be relatively close. More heuristics regarding the relation between the symbolic height function and the standard one were discussed in [12].
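The computation of the symbolic numerator and denominator from a preperiodic digit string can be sketched in Python as follows. The helper names (`minimal_form`, `symbolic_fraction`) are our own, and digits are passed as lists of integers. The sketch first reduces the given representation to the smallest preperiod and period, then applies the sum-of-fractions formula, and finally compares with the reduced fraction to read off the standard height.

```python
from fractions import Fraction


def from_digits(digits, b):
    """Integer whose base-b digit string is `digits` (empty string -> 0)."""
    value = 0
    for d in digits:
        value = value * b + d
    return value


def minimal_form(preperiod, period):
    """Shrink (preperiod, period) to the shortest representation of the
    same infinite digit sequence."""
    pre, per = list(preperiod), list(period)
    if not per:
        raise ValueError("the period must be nonempty")
    n = len(per)
    for d in range(1, n + 1):          # replace the period by its primitive root
        if n % d == 0 and per[:d] * (n // d) == per:
            per = per[:d]
            break
    while pre and pre[-1] == per[-1]:  # absorb trailing preperiod digits
        pre.pop()
        per = per[-1:] + per[:-1]
    return pre, per


def symbolic_fraction(preperiod, period, b):
    """Return (p_sym, q_sym) with q_sym = b**i * (b**j - 1)."""
    pre, per = minimal_form(preperiod, period)
    i, j = len(pre), len(per)
    q_sym = b ** i * (b ** j - 1)
    p_sym = from_digits(pre, b) * (b ** j - 1) + from_digits(per, b)
    return p_sym, q_sym


p, q = symbolic_fraction([], [2, 0], 3)       # 0.(20) in base 3
standard_height = Fraction(p, q).denominator  # denominator in lowest terms
```

With these inputs the pair `(p, q)` is `(6, 8)` and `standard_height` is `4`, matching the Cantor set example above; the input `([2], [7, 0], 9)` gives `(223, 720)`, where no cancellation occurs.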
One reason the symbolic height function is interesting is that it naturally shows up in the proofs of results regarding the standard height function. For example, the proof of Proposition 5.5 can easily be modified to bound $|x - p/q|$ in terms of the symbolic height of $p/q$ rather than the standard height:

Proposition 6.1 ([5, Proof of Corollary 2.2]). Let $F$ be the limit set of a base $b$ IFS, and let $\delta$ be the Hausdorff dimension of $F$. Then for all $x \in F$, there exist infinitely many rational numbers $r = p_{\mathrm{sym}}/q_{\mathrm{sym}} \in F$ such that
$$|x - r| < \frac{1}{q_{\mathrm{sym}} (\log_b q_{\mathrm{sym}})^{1/\delta}}.$$
In other words, the function $\psi_*(q) = (q \cdot (\log_b q)^{1/\delta})^{-1}$ is an intrinsic Dirichlet function on $F$ for the symbolic height function.
In fact, the proof of [5, Corollary 2.2] essentially proceeds by first proving Proposition 6.1 and then using the inequality $H_{\mathrm{std}} \leq H_{\mathrm{sym}}$ to deduce Proposition 5.5. It appears extremely difficult to prove any improvement (either for all points or only for some) of Proposition 5.5 for the standard height without just proving the same bound for the symbolic height. So in some way, the symbolic height is measuring the "strength of our techniques".
Although the symbolic height function is motivated in terms of the standard height function, it can also be analyzed on its own terms. For example, we can ask whether the intrinsic Dirichlet function $\psi_*$ appearing in Proposition 6.1 is optimal for the symbolic height function. This is the same (cf. [13, §2.1]) as asking whether there exist any points in $F$ that are badly symbolically approximable with respect to $\psi_*$:

Definition 6.2 (Special case of [12, Definition 4.7]). Let $F$ be a base $b$ limit set, and let $\delta$ denote the Hausdorff dimension of $F$. A number $x \in F$ is called badly symbolically approximable (with respect to $\psi_*$) if there exists $\kappa > 0$ such that for every $r = p_{\mathrm{sym}}/q_{\mathrm{sym}} \in F \cap \mathbb{Q}$, we have
$$|x - r| \geq \frac{\kappa}{q_{\mathrm{sym}} (\log_b q_{\mathrm{sym}})^{1/\delta}}. \tag{6.3}$$

Theorem 6.3 (Corollary of [12, Lemma 4.9]; or see below). Let $F$ be the limit set of a base $b$ IFS satisfying the strong separation condition. Then any $x \in F$ whose base $b$ expansion is uniformly de Bruijn is badly symbolically approximable.
Combining with Theorem 2.1 gives:

Corollary 6.4. With $F$ as above, the set of badly symbolically approximable points has dimension at least $\alpha_k \delta > 0$, where In particular, the intrinsic Dirichlet function $\psi_*$ appearing in Proposition 6.1 is optimal.
We remark that the optimality assertion follows directly from combining Theorem 6.3 with [2, Corollary 7]; Theorem 2.1 is not needed.
In contrast to Proposition 6.1, Theorem 6.3 and Corollary 6.4 are weaker than their (unproven) analogues for the standard height function. This is because while Proposition 6.1 is about finding good approximations to points, in Theorem 6.3 and Corollary 6.4 we show that for certain points, good approximations cannot exist. But the inequality $H_{\mathrm{std}} \leq H_{\mathrm{sym}}$ means that the quality of an approximation is better according to the standard height than according to the symbolic height, which yields the appropriate implications.
We remark that Theorem 6.3 is only a one-way implication: there may be (and almost certainly are) badly symbolically approximable numbers whose base b expansions are not uniformly de Bruijn. A combinatorial characterization of the base b expansions of badly symbolically approximable numbers was given in [12, Lemma 4.9]. As a consequence of the one-sidedness of the implication, Theorem 6.3 yields a lower bound on the dimension of the set of badly symbolically approximable points but not an upper bound.
In fact, we believe that there is no nontrivial upper bound: we conjecture that the Hausdorff dimension of the set of badly symbolically approximable points of any base b limit set F is equal to the Hausdorff dimension of F . This conjecture is motivated by other situations in Diophantine approximation where the dimension of the set of badly approximable points has always turned out to be full. However, Theorem 2.1 shows that this conjecture cannot be proven using uniformly de Bruijn sequences.
Although Theorem 6.3 is a consequence of the much more general result [12, Lemma 4.9], we prove it here for completeness and ease of exposition.
Proof of Theorem 6.3. Let $x \in F$ be a number whose base $b$ expansion, which we denote by $\omega$, is uniformly de Bruijn. Let $\ell$ denote the size of the largest gap in the set $B_\omega$ defined by (2.1). Fix $r \in F \cap \mathbb{Q}$, and let the representation $r = 0.\tau_1 \cdots \tau_i \overline{\tau_{i+1} \cdots \tau_j}$ be chosen so as to minimize $i$ and $j$. Then the symbolic height of $r$, as defined in (6.2), is $q_{\mathrm{sym}} = b^i (b^{j-i} - 1) \leq b^j$. Since the IFS defining $F$ is assumed to satisfy the strong separation condition, the distance between $x$ and $r$ is comparable to $b^{-m}$, where $m$ is the largest index for which $\omega_t = \tau_t$ for all $t \leq m$. In fact, a careful analysis shows that $|x - r| \geq b^{-(m+2)}$, though the precise constant factor is not relevant. We claim that if $j \geq \ell$, then which demonstrates that (6.3) holds with $\kappa = b^{-(\ell+2)}$. We now separate into two cases:

Case 1: $m \leq j + \ell$. In this case, we have

Case 2: $m > j + \ell$. In this case, by the $m$th letter, the sequence $\tau$ will have already begun to repeat. The longest repeated string in the sequence $\tau_1 \cdots \tau_m$ is $\tau_{i+1} \cdots \tau_{m-(j-i)} = \tau_{j+1} \cdots \tau_m$. Note that although the two sides of this equation represent distinct instances of the same string as a substring of $\tau_1 \cdots \tau_m$, the two instances may overlap with each other; this happens if and only if $m > 2j - i$. For the purposes of our calculations, it does not matter whether these two instances overlap or not.
By the definition of $m$, we have $\omega_1 \cdots \omega_m = \tau_1 \cdots \tau_m$, so $\omega$ also has a repeated string $\omega_{i+1} \cdots \omega_{m-(j-i)} = \omega_{j+1} \cdots \omega_m$ of length $m - j$ occurring in the first $m$ letters. On the other hand, by the definition of $\ell$, there exists $m - j - \ell < n \leq m - j$ such that $n \in B_\omega$, which implies that $\omega$ has no repeated string of length $n$ occurring in the first $k^n + n - 1$ letters of $\omega$. Since $n \leq m - j$, it follows that $m > k^n + n - 1$, and thus $k^n \leq m - n < j + \ell \leq 2j$.
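The combinatorial property used in this step can be checked mechanically. The Python sketch below tests whether the first $k^n + n - 1$ letters of a word contain no repeated string of length $n$, which is the consequence of $n \in B_\omega$ invoked above. The precise definition (2.1) of $B_\omega$ is not reproduced here, so the function (whose name is our own) should be read as a test of that consequence only, under the assumption that the word is given as a string or list over an alphabet of size $k$.

```python
def prefix_is_repeat_free(word, n, k):
    """Return True if the first k**n + n - 1 letters of `word` contain
    no string of length n more than once.

    A prefix of that length has exactly k**n windows of length n, so a
    repeat-free prefix contains every length-n string exactly once.
    """
    length = k ** n + n - 1
    if len(word) < length:
        raise ValueError("word is shorter than k**n + n - 1 letters")
    windows = [tuple(word[t:t + n]) for t in range(k ** n)]
    return len(set(windows)) == len(windows)


# Binary examples: 00110 has windows 00, 01, 11, 10 (all distinct),
# whereas 00100 repeats the window 00.
good = prefix_is_repeat_free("00110", 2, 2)
bad = prefix_is_repeat_free("00100", 2, 2)
```

If a string of length $n$ is repeated within the first $m$ letters while this check succeeds for $n$, then necessarily $m > k^n + n - 1$, which is the inequality used in the proof.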