Uniformly de Bruijn Sequences and Symbolic Diophantine Approximation on Fractals

Intrinsic Diophantine approximation on fractals, such as the Cantor ternary set, was undoubtedly motivated by questions asked by K. Mahler (1984). One of the main goals of this paper is to develop and utilize the theory of infinite de Bruijn sequences in order to answer closely related questions. In particular, we prove that the set of infinite de Bruijn sequences in k ≥ 2 letters, thought of as a set of real numbers via a decimal expansion, has positive Hausdorff dimension. For a given k, these sequences bear a strong connection to Diophantine approximation on certain fractals. In particular, the optimality of an intrinsic Dirichlet function on these fractals with respect to the height function defined by symbolic representations of rationals follows from these results.


Introduction
In this paper, we give a novel application of combinatorics to the field of Diophantine approximation. Since we do not assume that the reader is familiar with this field, let us first recall some important concepts and ideas. We refer the reader to Section 5 where we rigorously define and discuss these notions.
Classically, the field of Diophantine approximation sought to quantify how well real numbers can be approximated by rationals, weighing the distance to the rational point against some function of its denominator. The inaugural result in the field is Dirichlet's theorem, Theorem 5.2, which states that every irrational real number x has infinitely many rational approximations p/q satisfying $|x - p/q| < 1/q^2$. In 1984, Mahler asked: "How close can irrational elements of Cantor's set be approximated by rational numbers (1) in Cantor's set, and (2) by rational numbers not in Cantor's set?" In this paper, we will restrict our attention to Mahler's first question; see Section 6 for details. We remark that while in [11] the first- and third-named authors were able to exhibit an optimal Dirichlet function (see Definition 5.3) corresponding to Mahler's second question, it seems that finding an analogous answer to his first question is significantly harder; see, e.g., [4,6,11] for detailed discussions and conjectures regarding this question.
In [11], a new height function was defined on the rational points of the Cantor set (see Section 6), and a Dirichlet-type theorem was proven [11, Corollary 2.2 and its proof]. The purpose of this paper is to demonstrate the optimality of that Dirichlet theorem, and give an estimate on the Hausdorff dimension of the set of "badly approximable" points. This set, as noted in [11], admits a precise combinatorial description, although at the time we had been unable to exhibit any members belonging to it. In the present paper, we focus on a combinatorially defined subset of the set of badly approximable points, the set of uniformly de Bruijn sequences. The existence of uniformly de Bruijn sequences demonstrates the optimality of the Dirichlet function (Theorem 6.3), and by estimating from below the Hausdorff dimension of the set of uniformly de Bruijn sequences (Theorem 2.1), we are able to get a positive lower bound for the Hausdorff dimension of the set of badly approximable points (Corollary 6.4), a first step towards a Jarník-type result. See Section 6 for a more nuanced discussion of these points.

Finite and Infinite de Bruijn Sequences
Let A be a finite alphabet of cardinality k ≥ 2. We recall that a (non-cyclic) de Bruijn sequence of order n in A is a sequence ω of length $k^n + n - 1$ in the alphabet A that has the property that every sequence of length n in A appears as a consecutive substring of ω exactly once. For example, in the alphabet {0, 1}, the sequence 00110 is a de Bruijn sequence of order 2, while in the alphabet {0, 1, 2}, the sequence 00010020110120210221112122200 is a de Bruijn sequence of order 3. We say that an infinite sequence $\omega \in A^{\mathbb{N}}$ is infinitely de Bruijn if the set
$$B_\omega \overset{\mathrm{def}}{=} \{n \in \mathbb{N} : \text{the initial segment of } \omega \text{ of length } k^n + n - 1 \text{ is a de Bruijn sequence of order } n\} \tag{2.1}$$
is infinite. We say that ω is totally de Bruijn if $B_\omega = \mathbb{N}$, and uniformly de Bruijn if $B_\omega$ has bounded gap sizes.

The construction of infinitely de Bruijn sequences goes back to Becher and Heiber [1], who showed that when k ≥ 3, totally de Bruijn sequences could be constructed recursively by extending each de Bruijn sequence of order n to a de Bruijn sequence of order (n + 1). We shall discuss their method in more detail below. When k = 2, it is known that no totally de Bruijn sequence exists, but Becher and Heiber do construct a uniformly de Bruijn sequence such that $B_\omega = 2\mathbb{N}$.

In order to state our main theorem for this section, let us briefly recall the definition and basic properties of the Hausdorff dimension of a fractal $F \subseteq \mathbb{R}^d$, see, e.g., [8]. Let d denote the standard metric on $\mathbb{R}^d$, and let diam(U) denote the diameter of a set $U \subseteq \mathbb{R}^d$. Fix δ > 0 and let $F \subseteq \mathbb{R}^d$. We say that a countable collection $\{U_j : j \in \mathbb{N}\}$ of subsets of $\mathbb{R}^d$ is a δ-cover of F if $F \subseteq \bigcup_{j=1}^\infty U_j$ and $\operatorname{diam}(U_j) \le \delta$ for every j. For each s ≥ 0, let
$$\mathcal{H}^s_\delta(F) = \inf\Big\{\sum_{j=1}^\infty \operatorname{diam}(U_j)^s : \{U_j : j \in \mathbb{N}\} \text{ is a } \delta\text{-cover of } F\Big\}.$$
Then the s-dimensional Hausdorff measure of F is the number
$$\mathcal{H}^s(F) = \lim_{\delta \to 0} \mathcal{H}^s_\delta(F),$$
and the Hausdorff dimension of F is the number
$$\dim_H(F) = \inf\{s \ge 0 : \mathcal{H}^s(F) = 0\}.$$

We also recall that if b ≥ 2 is an integer, then the base b expansion of a number x ∈ [0, 1] is the series
$$x = \sum_{i=1}^\infty \omega_i b^{-i},$$
where $\omega_1, \omega_2, \ldots \in \{0, 1, \ldots, b-1\}$ are chosen so that the value of the series is equal to x. This choice is unique unless x is a rational number whose denominator is a power of b, in which case there are exactly two ways in which the infinite word $\omega = \omega_1\omega_2\cdots$ can be chosen.
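The definition of a non-cyclic de Bruijn sequence, and of the set of orders at which an initial segment is de Bruijn, can be checked mechanically on finite words. The following is a minimal Python sketch; the function names `is_de_bruijn` and `de_bruijn_orders` are illustrative and do not come from the paper, and letters are assumed to be single characters.

```python
from itertools import product

def is_de_bruijn(word, alphabet, n):
    """True if `word` is a non-cyclic de Bruijn sequence of order n over `alphabet`."""
    k = len(alphabet)
    if len(word) != k**n + n - 1:
        return False
    windows = [word[i:i + n] for i in range(len(word) - n + 1)]
    # There are exactly k**n windows, so "every word of length n appears exactly
    # once" is equivalent to "the windows are precisely all k**n words".
    return set(windows) == {"".join(w) for w in product(alphabet, repeat=n)}

def de_bruijn_orders(prefix, alphabet):
    """Orders n for which the initial segment of length k**n + n - 1 is de Bruijn.

    This is the part of the set B_omega that is visible from a finite prefix.
    """
    k = len(alphabet)
    orders, n = [], 1
    while k**n + n - 1 <= len(prefix):
        if is_de_bruijn(prefix[:k**n + n - 1], alphabet, n):
            orders.append(n)
        n += 1
    return orders
```

For instance, `is_de_bruijn("00110", "01", 2)` confirms the binary example from the text, and `de_bruijn_orders("01100", "01")` reports that both the order 1 and the order 2 initial segments of 01100 are de Bruijn.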
In particular, S has positive Hausdorff dimension but not full Hausdorff dimension.
Note that for large values of k, Stirling's formula gives $\alpha_k \sim \frac{\log(k!)}{k\log(k)} \sim 1 - \frac{1}{\log(k)}$ (where $x \sim y$ means $(1-x)/(1-y) \to 1$), and in particular $\alpha_k \to 1$ as $k \to \infty$. Thus S gets closer and closer to having full dimension as the number of allowed digits increases.
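The second asymptotic relation can be checked numerically. The sketch below evaluates only the quantity $\log(k!)/(k\log(k))$ and compares it with $1 - 1/\log(k)$ in the sense of the relation ∼ defined above; it does not compute $\alpha_k$ itself. The function names are illustrative.

```python
from math import lgamma, log

def upper_exponent(k):
    """log(k!) / (k log k), computed via lgamma(k + 1) = log(k!)."""
    return lgamma(k + 1) / (k * log(k))

def sim_ratio(k):
    """(1 - x) / (1 - y) for x = log(k!)/(k log k) and y = 1 - 1/log(k).

    Since 1 - y = 1/log(k), the quotient is (1 - x) * log(k); the relation
    x ~ y means that this tends to 1 as k grows.
    """
    return (1 - upper_exponent(k)) * log(k)
```

Evaluating `sim_ratio` at increasing values of k shows the quotient approaching 1 from below, while `upper_exponent(k)` itself increases towards 1 quite slowly.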

Preliminaries
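We begin by recalling some key definitions used in Becher and Heiber's paper, as well as the proof of the well-known BEST¶ theorem.

Definition 3.1. The de Bruijn graph of order n on A, denoted $G_n(A)$, is the directed graph whose vertices are the sequences of length n in A, with an edge from $a\tau$ to $\tau b$ for every sequence τ of length n − 1 in A and all a, b ∈ A.

Note that every vertex has in-degree and out-degree both equal to $k \overset{\mathrm{def}}{=} \#(A)$, for a total of $k^n$ vertices and $k^{n+1}$ edges.

§ It is well known that δ = log(k)/log(b), see Subsection 5.2.
¶ An acronym after the people who discovered it: de Bruijn, van Aardenne-Ehrenfest, Smith, and Tutte.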

If ω is a sequence of length $\ell \ge n$ in A, then the path induced by ω on $G_n(A)$ is the path $\gamma = \gamma_1 \cdots \gamma_{\ell-n+1}$ defined by the formula
$$\gamma_i = \omega_i \cdots \omega_{i+n-1}.$$

Observation 3.2. Let ω be a sequence of length $|\omega| = k^m + m - 1$, and let γ be the path induced by ω on $G_n(A)$. Note that the length of γ is $|\gamma| = |\omega| - n = k^m + m - n - 1$; in particular, $|\gamma| = k^{n+1}$ if $m = n + 1$, and $|\gamma| < k^n$ if $m \le n$. Moreover, [...]

Now let X = (V(X), E(X)) be a directed graph such that for each vertex x ∈ V(X), the in-degree and out-degree of x are nonzero and equal to each other (though they may depend on x). Fix a vertex $x_0 \in V(X)$, and let $\mathcal{E}$ be the set of Eulerian paths of X that start and end at $x_0$. Note that, unlike standard convention, we consider two Eulerian paths to be different if they are formally different as sequences of vertices even if they are cyclically equivalent. Let $\mathcal{T}$ be the set of directed spanning trees of X rooted at $x_0$ with edges pointing towards $x_0$.
Since both the conclusion of the BEST theorem and its proof will be important for our argument, we recall them now. We once again remind the reader that our statement differs slightly from the usual one because of our convention about counting Eulerian paths: we do not consider cyclically equivalent paths to be the same. But the difference is easy to quantify: the number of Eulerian paths in each cyclic equivalence class that start and end at $x_0$ is equal to the degree of $x_0$ (we recall that by assumption the in-degree and out-degree are equal). So our count will be off from the conventional one by a factor of $\deg(x_0)$.

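In this paper a "path" in a directed graph is a sequence of vertices such that each pair of consecutive vertices is connected by an edge from the first vertex to the second vertex. The length of a path is the number of such edges, or equivalently, the number of vertices minus one (counting multiplicity in both cases). A path is simple if all its vertices are distinct except possibly the first and last, and Eulerian if it contains each edge exactly once.

Theorem 3.4. (BEST theorem) We have
$$\#(\mathcal{E}) = \#(\mathcal{T}) \cdot \deg(x_0) \cdot \prod_{x \in V(X)} (\deg(x) - 1)!. \tag{3.1}$$

Proof. Let $T \in \mathcal{T}$ be a directed spanning tree rooted at $x_0$. For each x ∈ V(X), let $E_x$ denote the set of edges in X with initial vertex x, and let $T_x = E_x \cap E(T)$; thus for $x \ne x_0$ the set $T_x$ consists of the single edge from x to its parent $v_x$ in T, while $T_{x_0} = \emptyset$. Let O(T) be the set of families $o = (o_x)_{x \in V(X)}$ such that each $o_x$ is a linear ordering of $E_x$ in which the edge of $T_x$, if there is one, comes last. Given $o \in O(T)$, let $f(T, o) = \gamma_0 \gamma_1 \cdots$ be the Eulerian path that starts and ends at $x_0$ defined recursively as follows: suppose that the points $x_0 = \gamma_0, \gamma_1, \ldots, \gamma_i$ have been defined, and let $x = \gamma_i$. Then the next vertex $\gamma_{i+1}$ must be chosen so that $\gamma_i\gamma_{i+1} \in E_x$, but $\gamma_i\gamma_{i+1} \ne \gamma_j\gamma_{j+1}$ for all j < i. We make this choice so as to minimize $\gamma_i\gamma_{i+1}$ according to the ordering $o_x$ subject to these restrictions. If the edges of $E_x \setminus T_x$ have been exhausted, then if $x \ne x_0$ we choose the vertex $v_x$, and if $x = x_0$, then we terminate the path.

There is some work to do to show that f(T, o) is indeed an Eulerian path, and that every Eulerian path that starts and ends at $x_0$ can be represented uniquely as f(T, o) for some $T \in \mathcal{T}$ and $o \in O(T)$, see, e.g., [17, pp. 445-446]. This implies that f is a bijection between $\coprod_{T \in \mathcal{T}} O(T) = \{(T, o) : T \in \mathcal{T},\ o \in O(T)\}$ and $\mathcal{E}$. Since $\#(O(T)) = \deg(x_0)! \cdot \prod_{x \ne x_0} (\deg(x) - 1)! = \deg(x_0) \cdot \prod_{x \in V(X)} (\deg(x) - 1)!$ for every $T \in \mathcal{T}$, this completes the proof.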
We will also need the following sufficient condition for the right-hand side of (3.1) to be nonzero:

Lemma 3.5. If X is connected, then there is at least one directed spanning tree rooted at $x_0$, i.e., $\mathcal{T} \ne \emptyset$.

Proof. Let T be a maximal directed tree rooted at $x_0$. By the maximality of T, there is no edge from any vertex not in T to any vertex in T. Since each vertex of X has equal in-degree and out-degree, the number of edges from V(T) to V(X) \ V(T) is equal to the number of edges from V(X) \ V(T) to V(T), which is equal to zero. Since X is connected, this means that either $V(T) = \emptyset$ or $V(X) \setminus V(T) = \emptyset$. But $x_0 \in V(T)$ by construction, so $V(X) \setminus V(T) = \emptyset$ and thus T is a spanning tree, i.e., $T \in \mathcal{T}$.
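On small de Bruijn graphs the BEST count, in the convention of this section (Eulerian paths counted as vertex sequences that start and end at a fixed $x_0$), can be compared against a brute-force enumeration. The following Python sketch is only an illustration: all function names are hypothetical, vertices are encoded as strings, and the graph is assumed to have no parallel edges (true for de Bruijn graphs of order n ≥ 1).

```python
from itertools import product
from math import factorial

def de_bruijn_graph(alphabet, n):
    """Vertices and edges of the de Bruijn graph of order n >= 1: a+tau -> tau+b."""
    vertices = ["".join(w) for w in product(alphabet, repeat=n)]
    edges = [(v, v[1:] + b) for v in vertices for b in alphabet]
    return vertices, edges

def count_closed_eulerian_paths(edges, x0):
    """Brute-force count of Eulerian paths that start and end at x0."""
    out_edges = {}
    for e in edges:
        out_edges.setdefault(e[0], []).append(e)
    used = set()

    def extend(x, remaining):
        if remaining == 0:
            return 1 if x == x0 else 0
        total = 0
        for e in out_edges.get(x, []):
            if e not in used:
                used.add(e)
                total += extend(e[1], remaining - 1)
                used.remove(e)
        return total

    return extend(x0, len(edges))

def reaches_root(v, parent, x0):
    """Follow parent pointers from v; True if x0 is reached without cycling."""
    seen = set()
    while v != x0:
        if v in seen:
            return False
        seen.add(v)
        v = parent[v]
    return True

def count_spanning_in_trees(vertices, edges, x0):
    """Brute-force count of spanning trees whose edges point towards x0."""
    others = [v for v in vertices if v != x0]
    choices = [[e for e in edges if e[0] == v] for v in others]
    count = 0
    for pick in product(*choices):  # one outgoing edge per non-root vertex
        parent = {e[0]: e[1] for e in pick}
        if all(reaches_root(v, parent, x0) for v in others):
            count += 1
    return count

def best_count(vertices, edges, x0):
    """#(trees) * deg(x0) * prod over x of (deg(x) - 1)!, with deg the out-degree."""
    deg = {v: sum(1 for e in edges if e[0] == v) for v in vertices}
    prod_term = 1
    for v in vertices:
        prod_term *= factorial(deg[v] - 1)
    return count_spanning_in_trees(vertices, edges, x0) * deg[x0] * prod_term
```

For the de Bruijn graph of order 1 on {0, 1} with $x_0 = 0$, both counts equal 2, matching the two de Bruijn sequences of order 2 that begin with 0, namely 00110 and 01100. The enumeration is exponential and is meant only for very small graphs.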

The Upper Bound
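We begin by establishing the upper bound of Theorem 2.1. To do this we will use the Hausdorff-Cantelli lemma, a very useful tool for establishing upper bounds on the Hausdorff dimensions of certain sets, see, e.g., [2, Lemma 3.10].

Lemma 4.1 (Hausdorff-Cantelli lemma). Let $(U_j)_{j \in \mathbb{N}}$ be a countable collection of sets in $\mathbb{R}^d$, and let S be the set consisting of those elements of $\mathbb{R}^d$ that belong to infinitely many of the sets $U_j$ ($j \in \mathbb{N}$). In other words,
$$S = \limsup_{j \to \infty} U_j = \bigcap_{N \in \mathbb{N}} \bigcup_{j \ge N} U_j.$$
If s ≥ 0 satisfies
$$\sum_{j \in \mathbb{N}} \operatorname{diam}(U_j)^s < \infty, \tag{4.1}$$
then $\mathcal{H}^s(S) = 0$ and thus $\dim_H(S) \le s$.

It turns out to be convenient to consider a collection $(U_j)_{j \in \mathbb{N}}$ that naturally splits up into subcollections, say, $\{U_j : j \in \mathbb{N}\} = \bigcup_m \mathcal{C}_m$ for some sequence of collections $(\mathcal{C}_m)_{m=1}^\infty$. In this case, the summability condition (4.1) is equivalent to the condition
$$\sum_{m=1}^\infty \operatorname{cost}_s(\mathcal{C}_m) < \infty,$$
where $\operatorname{cost}_s(\mathcal{C}_m) = \sum_{U \in \mathcal{C}_m} \operatorname{diam}(U)^s$ is the sum of the diameters of the elements of $\mathcal{C}_m$, each raised to the power of s. When each $\mathcal{C}_m$ is finite, as will be the case below, the set S can be written in terms of the collections $(\mathcal{C}_m)_{m=1}^\infty$ as follows:
$$S = \limsup_{m \to \infty} S_m = \bigcap_{M \in \mathbb{N}} \bigcup_{m \ge M} \bigcup_{U \in \mathcal{C}_m} U.$$
In what follows we will abuse terminology somewhat by calling $\operatorname{cost}_s(\mathcal{C}_m)$ the "cost" of the set $S_m \overset{\mathrm{def}}{=} \bigcup_{U \in \mathcal{C}_m} U$, although strictly speaking, it depends not only on $S_m$ but also on how it is decomposed.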
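Proof of upper bound. For each m, let $S_m$ be the set consisting of all elements of F corresponding to base b expansions whose initial segments of length $k^m + m - 1$ are de Bruijn sequences of order m in A. Then the lim sup of the sequence $(S_m)_{m=1}^\infty$ consists of those elements of F with infinitely de Bruijn base b expansions. In particular, the set S consisting of those elements of F with uniformly de Bruijn base b expansions satisfies
$$S \subseteq \limsup_{m \to \infty} S_m.$$
By the Hausdorff-Cantelli lemma, if we can find an s such that
$$\sum_{m=1}^\infty \operatorname{cost}_s(\mathcal{C}_m) < \infty, \tag{4.2}$$
then we can conclude that $\dim_H(S) \le s$. We will show that (4.2) holds for all $s > \delta \frac{\log(k!)}{k \log(k)}$.

For each m, we view $S_m$ as the union of the collection
$$\mathcal{C}_m = \{S_m^\omega : \omega \text{ is a de Bruijn sequence of order } m \text{ in } A\},$$
where for each ω, $S_m^\omega$ is the set of points x ∈ F corresponding to base b expansions whose initial segments of length $k^m + m - 1$ are equal to ω. Let G be the de Bruijn graph of order (m − 1) on A (see Definition 3.1), so that $\#(V(G)) = k^{m-1}$. By Observation 3.2(I), the collection $\mathcal{C}_m$ is in bijection with the set of Eulerian paths on G. Fix a vertex $x_0 \in V(G)$. We can estimate the number of Eulerian paths starting and ending at $x_0$ using the BEST theorem (Theorem 3.4). We have $\prod_{x \in V(G)} (\deg(x) - 1)! = ((k-1)!)^{k^{m-1}}$, since every vertex x ∈ V(G) has degree equal to k. The number of spanning trees rooted at $x_0$ is at most $k^{\#(V(G)) - 1}$, since an edge must be chosen emanating from each vertex $x \ne x_0$, and each vertex has out-degree k. And for the same reason, $\deg(x_0) = k$. Therefore, the number of Eulerian paths starting and ending at $x_0$ is at most
$$k \cdot k^{k^{m-1} - 1} \cdot ((k-1)!)^{k^{m-1}} = (k!)^{k^{m-1}},$$
and summing over the $k^{m-1}$ possible choices of $x_0$ gives $\#(\mathcal{C}_m) \le k^{m-1} (k!)^{k^{m-1}}$.* Moreover, each set $S_m^\omega$ has diameter at most $b^{-(k^m + m - 1)}$.

Now fix ε > 0 and set $s = \delta \frac{\log(k!)}{k \log(k)} + \varepsilon$, so that $b^s = (k!)^{1/k} b^\varepsilon$ (recall that δ = log(k)/log(b)). Then
$$\sum_{m=1}^\infty \operatorname{cost}_s(\mathcal{C}_m) \le \sum_{m=1}^\infty k^{m-1} (k!)^{k^{m-1}} b^{-s(k^m + m - 1)}.$$
By the ratio test, this series converges as long as $\lim_{m \to \infty} |a_{m+1}/a_m| < 1$, where $a_m$ denotes the mth term. A straightforward computation yields:
$$\frac{a_{m+1}}{a_m} = k\, b^{-s} \Big((k!)^{1/k} b^{-s}\Big)^{(k-1)k^m} = k\, b^{-s}\, b^{-\varepsilon (k-1) k^m},$$
which tends to 0 as m → ∞. Thus by Lemma 4.1, we have
$$\dim_H(S) \le \delta \frac{\log(k!)}{k \log(k)} + \varepsilon,$$
and since ε > 0 was arbitrary, $\dim_H(S) \le \delta \frac{\log(k!)}{k \log(k)}$. Since for all k ≥ 2 we have $k! < k^k$ and thus $\frac{\log(k!)}{k \log(k)} < 1$, we deduce that the Hausdorff dimension of S is strictly less than δ.

* In fact, the exact count for such sequences is known, but we prefer this estimate because it is simpler and yields the same upper bound on the Hausdorff dimension.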
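For very small parameters, the counting estimate used in this proof can be compared with an exhaustive enumeration. The Python sketch below counts all non-cyclic de Bruijn sequences of order m on k letters by brute force and evaluates the bound $k^{m-1}(k!)^{k^{m-1}}$; the function names are illustrative, and the enumeration is exponential in the word length, so it is only usable for tiny k and m.

```python
from itertools import product
from math import factorial

def count_de_bruijn(k, m):
    """Number of non-cyclic de Bruijn sequences of order m on k <= 10 letters."""
    alphabet = "0123456789"[:k]
    length = k**m + m - 1
    count = 0
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        # A word of this length has k**m windows; it is de Bruijn exactly
        # when these windows are pairwise distinct.
        windows = {word[i:i + m] for i in range(k**m)}
        if len(windows) == k**m:
            count += 1
    return count

def upper_bound(k, m):
    """The estimate k**(m-1) * (k!)**(k**(m-1)) on the number of such sequences."""
    return k**(m - 1) * factorial(k)**(k**(m - 1))
```

For example, there are 4 binary de Bruijn sequences of order 2 (00110, 01100, 11001, 10011), against a bound of 8.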

The Lower Bound
The proof of the lower bound is significantly more involved, and will require a few preliminary results. We begin with the following proposition:

Proposition 4.2. [...], and let $\mathcal{E}$ be the set of Eulerian paths of X that start and end at $x_0$. Then there exists $\mathcal{E}' \subseteq \mathcal{E}$ such that: [...]

Proof. Since X is connected, by Lemma 3.5 there exists a directed spanning tree T rooted at $x_0$. Let $\mathcal{E}'$ be the set of Eulerian paths δ that start and end at $x_0$ such that for all xy ∈ E(X) and xz ∈ E(T) with $y \ne z$, the edge xy appears in δ before xz does. Equivalently,
$$\mathcal{E}' = \{f(T, o) : o \in O(T)\},$$
where the notation is as in the proof of the BEST theorem. Then the proof of the BEST theorem implies that [...], where $E_x$ denotes the set of edges with initial vertex x, and E(δ) denotes the edge set of δ. Here we use the convention [...] then there is exactly one ordering $o_x$ satisfying the appropriate condition, namely, the ordering determined by δ, and by hypothesis the element $v_x$ comes last in this ordering. Now since [...]

The next result will furnish the lower bound for k ≥ 4. Although it is valid for k = 3, it provides no useful information in this case since 0 is always a (trivial) lower bound on the dimension.
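Before we turn to the proof, we recall the so-called Mass Distribution Principle, an extremely useful tool for bounding the Hausdorff dimension from below.

Lemma 4.4 (Mass Distribution Principle). Let µ be a Borel probability measure on $\mathbb{R}^d$ with µ(S) = 1 for some set $S \subseteq \mathbb{R}^d$, and suppose that for some s ≥ 0 there exist constants C, ε > 0 such that $\mu(U) \le C \operatorname{diam}(U)^s$ for every set U with diam(U) ≤ ε. Then $\mathcal{H}^s(S) \ge 1/C > 0$, and thus $\dim_H(S) \ge s$.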
Proof of Corollary 4.3. Fix n ∈ N, and let $\omega = \omega_1 \cdots \omega_{k^n + n - 1}$ be a de Bruijn sequence of order n in A. Since the path induced by ω on $G_{n-1}(A)$ is an Eulerian path in a directed graph in which each vertex has equal in-degree and out-degree, it must start and end at the same vertex, which means that the first (n − 1) letters of ω are the same as the last (n − 1) letters, i.e., $\omega_{k^n + i} = \omega_i$ for all i = 1, ..., n − 1.* Now let $\omega_{k^n + n} = \omega_n$ and $\omega' = \omega_1 \cdots \omega_{k^n + n}$. Then the first n letters of ω′ are the same as the last n letters, but no other block of n letters is repeated in ω′.

Let $G = G_n(A)$ be the de Bruijn graph of order n on A, and let $\gamma = \gamma_1 \cdots \gamma_{k^n + 1}$ be the path induced by ω′ on G. Then γ is a Hamiltonian cycle (i.e., a simple path traversing each vertex once). The collection of de Bruijn sequences of order (n + 1) that extend ω′ is isomorphic to the collection of Eulerian paths on G that extend γ.

Let $x_0 \overset{\mathrm{def}}{=} \gamma_1 = \gamma_{k^n + 1}$ be the common initial and terminal vertex of γ. Then the collection of Eulerian paths of G that extend γ is isomorphic to the set of Eulerian paths of $X_\omega \overset{\mathrm{def}}{=} G \setminus E(\gamma)$ that start and end at $x_0$, which we denote by $\mathcal{E}(\omega)$. Since $X_\omega$ is a (k − 1)-regular connected directed graph whose vertex set has size $k^n$ (see the proof of [1, Lemma 3] for connectedness), we may use Proposition 4.2 to extract a subset $\mathcal{E}'(\omega) \subseteq \mathcal{E}(\omega)$. Pulling this subset back via the appropriate correspondences gives us a set S′(ω), contained in the set of all de Bruijn sequences of order (n + 1) extending ω′ (and thus also extending ω), with the following properties:

(i) [...]

(ii) If τ is a sequence of length |τ| extending ω′, then the number of sequences in S′(ω) that extend τ is at most $(k-1) \cdot \big((k-2)!\big)^{k^n - (|\tau| - |\omega| - 1)/k}$, where $|\omega| = k^n + n - 1$ is the length of ω.

* This phenomenon is related to the fact that we consider non-cyclic de Bruijn sequences instead of cyclic ones: each cyclic de Bruijn sequence $\omega = \omega_1 \cdots \omega_{k^n}$ corresponds to a non-cyclic de Bruijn sequence $\omega_1 \cdots \omega_{k^n} \omega_1 \cdots \omega_{n-1}$ that is longer but has the same number of consecutive substrings. This correspondence makes it obvious that the first (n − 1) letters of a non-cyclic de Bruijn word are expected to be the same as the last (n − 1) letters. However, by itself this is not a proof, because our definition of non-cyclic de Bruijn sequences did not assume that they were constructed from cyclic ones.

Now we proceed to define a probability measure µ on $F \equiv A^{\mathbb{N}}$ via a random algorithm: start with a fixed de Bruijn sequence $\omega^{(1)}$ of order 1, and if $\omega^{(n)}$ is a de Bruijn sequence of order n, then let $\omega^{(n+1)} \in S'(\omega^{(n)})$ be chosen randomly with respect to the uniform measure on $S'(\omega^{(n)})$, independent of all previous selections. Let ω be the unique infinite sequence that extends all of the finite sequences $\omega^{(n)}$ (n ∈ N). Then ω is a base b expansion of a unique point π(ω) ∈ F. (The point π(ω) may have a base b expansion other than ω, but there is no other point with base b expansion ω.) We let µ be the probability measure describing the distribution of the random variable π(ω). (The existence of such a µ can be guaranteed, e.g., by the Kolmogorov extension theorem.)

To demonstrate that µ satisfies the hypotheses of the mass distribution principle, we first estimate the measure of cylinder sets of a certain length, then arbitrary cylinder sets, then balls. Here a cylinder set is a set of the form $[\tau] = \{\pi(\omega) : \omega_i = \tau_i \ \forall\, i = 1, \ldots, |\tau|\}$, where $\tau = \tau_1 \cdots \tau_{|\tau|}$ is a finite sequence in the alphabet A. Our first estimate is easy: if $|\tau| = k^{n+1} + n$ for some n, then [τ] is precisely the set of π(ω) in the above construction such that $\omega^{(n+1)} = \tau$, so µ([τ]) is just the probability that $\omega^{(n+1)} = \tau$, i.e., [...] (4.5) if it is possible that $\omega^{(n+1)} = \tau$, and µ([τ]) = 0 otherwise.

Now consider the more general case where the length of τ satisfies $k^n + n - 1 < |\tau| \le k^{n+1} + n$ for some n. Then by (ii) above, [τ] contains at most $(k-1) \cdot \big((k-2)!\big)^{k^n - (|\tau| - (k^n + n))/k}$ cylinders of length $k^{n+1} + n$. Combining with (4.5) shows that [...] Here and hereafter we use the notation $\exp_x(y) \overset{\mathrm{def}}{=} x^y$. To apply the mass distribution principle (Lemma 4.4), we now need to relate this measure to the diameter of the cylinder [τ]. Since elements of [τ] have the first |τ| digits of their base b expansions fixed, the diameter of [τ] [...] where $C = (k-1) \cdot c^{-\alpha_k \delta}$ and $s = \alpha_k \delta$. But any subset of F can be covered by at most two cylinder sets with comparable diameter, so a similar formula holds for arbitrary sets. Thus by Lemma 4.4, we have $\dim_H(S) \ge s = \alpha_k \delta$.
As is evident from Corollary 4.3, we now have to deal with the cases k = 2 and k = 3 separately, since in those cases the formula (4.4) gives $\alpha_2 = \alpha_3 = 0$, which is not a useful bound. Note that the Cantor ternary set falls into the case k = 2, since its set of admissible numerators is A = {0, 2}.

Proposition 4.5. If k = 2 and ω is a de Bruijn sequence of order (n − 2) in A, then the number of de Bruijn sequences of order (n + 1) that extend ω is at least $2^{2^{n-2}}$. If k = 3 and ω is a de Bruijn sequence of order (n − 1) in A, then the number of de Bruijn sequences of order (n + 1) that extend ω is at least $4^{3^{n-1}}$.

Proof. For convenience, we let ∆ = 2 if k = 3, and ∆ = 3 if k = 2; then ω is a de Bruijn sequence of order (n − ∆ + 1). The first paragraph of the proof of Corollary 4.3 shows that the first (n − ∆) letters of ω are the same as the last (n − ∆) letters. So if we extend ω to a word ω′ of length $k^{n-\Delta+1} + n$ by letting $\omega_{k^{n-\Delta+1} + i} = \omega_i$ for i = n − ∆ + 1, ..., n, then the first n letters of ω′ are the same as the last n letters, but no other block of n letters is repeated.

Let G be the de Bruijn graph of order n on A, and let γ be the path induced by ω′ on G. The length of γ is $|\gamma| = k^{n-\Delta+1}$, and γ is a simple path that starts and ends at the same vertex $x_0$. As in the proof of Corollary 4.3, we let $X = X_\omega = G \setminus E(\gamma)$, where E(γ) is the edge set of γ. The collection of de Bruijn sequences of order (n + 1) that extend ω′ is isomorphic to the collection of Eulerian paths on G that extend γ, which in turn is isomorphic to the collection of Eulerian paths on $X_\omega$ that start and end at $x_0$. By the BEST theorem, the cardinality of this collection is
$$N = \#(\mathcal{T}) \cdot \deg(x_0; X_\omega) \cdot \prod_{x \in V(G)} [\deg(x; X_\omega) - 1]!.$$
If k = 3, we complete the proof with the following calculation:
$$N \ge \prod_{x \in V(G)} [\deg(x; X_\omega) - 1]! = (2!)^{3^n - 3^{n-1}} = 4^{3^{n-1}},$$
where the product is computed by noting that the $3^{n-1}$ vertices on γ have degree 2 in $X_\omega$ and the remaining $3^n - 3^{n-1}$ vertices have degree 3. In the first inequality, we have used Lemma 3.5 and the proof of [1, Lemma 3] to deduce that $\#(\mathcal{T}) \ge 1$.

For the remainder of the proof, we assume that k = 2. In this case, the strategy of the above calculation cannot work, since we have $[\deg(x; X_\omega) - 1]! = 1$ for all x ∈ V(G) and thus $N \le 2\#(\mathcal{T})$. Instead we must estimate the number of spanning trees in $X_\omega$.
Let S be the set of sequences of length (n − 1) that do not occur in ω, and note that $\#(S) = 2^{n-1} - 2^{n-2} = 2^{n-2}$. For each τ ∈ S, let $E_\tau = \{a\tau b : a, b \in A\} \subseteq E(X_\omega)$, where aτb is shorthand for (aτ)(τb), the edge from the vertex aτ to the vertex τb. Note that the sets $E_\tau$ (τ ∈ S) are disjoint.

Lemma 4.6. If T is a directed spanning tree and τ ∈ S, then there exists a directed spanning tree $T' \ne T$ such that $T' \setminus E_\tau = T \setminus E_\tau$.

Proof. By contradiction, suppose that the conclusion of the lemma is false, i.e., that there exists no such spanning tree T′. Denote the partial order on V(G) induced by the tree T by <, i.e., write x < y if there is a path in T from x to y, and write x ≤ y if either x < y or x = y. We write $x <^* y$ if x is a direct descendant of y, i.e., if xy ∈ E(T). For each a ∈ A, let f(a) [...]
$$\tau d \le a\tau <^* \tau c >^* b\tau \ge \tau d \quad \text{or} \quad \tau d \le a\tau < \tau c \le b\tau < \tau d.$$
Both diagrams are impossible for directed trees: the left-hand diagram is impossible because if aτ and bτ are siblings, then they have no common descendants, while the right-hand diagram is impossible because it is a nontrivial directed loop. This is the desired contradiction.

It follows from Lemma 4.6 that there exists a function $\varphi : \mathcal{T} \times S \to \mathcal{T}$ such that for all $T \in \mathcal{T}$ and τ ∈ S, we have $\varphi(T, \tau) \ne T$ and $\varphi(T, \tau) \setminus E_\tau = T \setminus E_\tau$. Now by Lemma 3.5 and the proof of [1, Lemma 5], X has a directed spanning tree $T_0$ rooted at $x_0$. Let $(\tau_i)_{i=1}^N$ be an indexing of S, where $N = 2^{n-2}$. Given $\omega \in \{0, 1\}^N$, we define recursively
$$T_{\omega, 0} = T_0, \qquad T_{\omega, i} = \begin{cases} T_{\omega, i-1} & \text{if } \omega_i = 0, \\ \varphi(T_{\omega, i-1}, \tau_i) & \text{if } \omega_i = 1. \end{cases}$$
Then the map $\{0, 1\}^N \ni \omega \mapsto T_{\omega, N} \in \mathcal{T}$ is injective, because the sets $E_{\tau_i}$ are disjoint and so $\omega_i$ can be recovered by comparing $T_{\omega, N} \cap E_{\tau_i}$ with $T_0 \cap E_{\tau_i}$. Thus the number of de Bruijn sequences of order (n + 1) extending ω is at least $\#(\mathcal{T}) \ge \#(\{0, 1\}^N) = 2^{2^{n-2}}$, which completes the proof.

If we let B = 2 when k = 2 and B = 4 when k = 3, then the two cases of Proposition 4.5 can be expressed uniformly as follows: if ω is a de Bruijn sequence of order n in A, then the number of de Bruijn sequences of order n + ∆ that extend ω is at least $\exp_B(k^n)$. We denote the set of all such extensions by S(ω).
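The extension counts discussed in this section can be explored by brute force for very small orders. The following Python sketch counts the de Bruijn sequences of a given order that begin with a given prefix, using a depth-first search that rejects any repeated window. The function name `count_extensions` is illustrative, and the search is exponential, so it is only a sanity check and not the method of the proof.

```python
def count_extensions(prefix, alphabet, order):
    """Number of de Bruijn sequences of `order` over `alphabet` starting with `prefix`."""
    k = len(alphabet)
    target = k**order + order - 1
    if len(prefix) > target:
        return 0
    seen = set()
    # Register the windows already present in the prefix; a repeat is fatal.
    for i in range(len(prefix) - order + 1):
        window = prefix[i:i + order]
        if window in seen:
            return 0
        seen.add(window)

    def extend(word):
        # A word of the target length with pairwise distinct windows contains
        # all k**order windows, hence is a de Bruijn sequence.
        if len(word) == target:
            return 1
        total = 0
        for a in alphabet:
            candidate = word + a
            if len(candidate) < order:
                total += extend(candidate)
                continue
            window = candidate[-order:]
            if window not in seen:
                seen.add(window)
                total += extend(candidate)
                seen.remove(window)
        return total

    return extend(prefix)
```

With k = 2, the order 2 sequence 00110 has no extension of order 3 but does have extensions of order 4, in line with the Becher and Heiber results quoted earlier. The order 1 sequence 01 has 64 extensions of order 4, comfortably above the lower bound $2^{2^{n-2}} = 4$ that Proposition 4.5 gives for n = 3.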

Corollary 4.7. Let the notation be as in Theorem 2.1. Suppose that k ≤ 3, and let
As in the proof of Corollary 4.3, we define a probability measure µ by a random algorithm: let $\omega^{(1)}$ be a fixed de Bruijn sequence of order ∆, and if $\omega^{(n)}$ is a de Bruijn sequence of order n∆, then let $\omega^{(n+1)}$ be chosen randomly with respect to the uniform measure on $S(\omega^{(n)})$, independent of all previous selections. As before we let $\omega \in A^{\mathbb{N}}$ be the unique common extension, we let π(ω) ∈ F be the unique number for which ω is a base b expansion, and we let µ be the probability measure describing the distribution of π(ω).
There are two ways that we could bound µ([τ]): [...]

Since [τ] can be written as the union of at most $\exp_3\big(9^{n+1} + 2(n+1) - 1 - |\tau|\big)$ cylinder sets corresponding to de Bruijn sequences of order 2(n + 1), we have [...] Which of these bounds is better depends on the value of |τ|. Now, as in the proof of Corollary 4.3, we have diam([τ]) [...] To apply the mass distribution principle, we need to show that
$$\mu([\tau]) \le C \operatorname{diam}([\tau])^s$$
for some constant C. It is enough to show that
$$\min\Big(4^{-9^n/8},\ 3^{9^{n+1}} \cdot 9^n \cdot 4^{-9^{n+1}/8} \cdot 3^{-|\tau|}\Big) \le C \cdot 3^{-t|\tau|},$$
possibly with a different value of C, where $t = s/\delta < \alpha_3 < 1$. Equivalently, we need to show that
$$\min\Big(4^{-9^n/8} \cdot 3^{t|\tau|},\ 3^{9^{n+1}} \cdot 9^n \cdot 4^{-9^{n+1}/8} \cdot 3^{(t-1)|\tau|}\Big) \le C.$$
Now the first input to the binary operator min is an increasing function of |τ|, while the second input is a decreasing function of |τ|. It follows that the largest value the left-hand side can attain is the value attained when the two inputs to min are equal, i.e., when
$$4^{-9^n/8} = 3^{9^{n+1}} \cdot 9^n \cdot 4^{-9^{n+1}/8} \cdot 3^{-|\tau|},$$
at which point the left-hand side is
$$4^{-9^n/8} \cdot \Big(3^{9^{n+1}} \cdot 9^n \cdot 4^{9^n/8 - 9^{n+1}/8}\Big)^t.$$
We need this expression to be bounded as n → ∞. Applying the change of variables $x = 9^n$, we need to show that
$$\limsup_{x \to \infty} 4^{-x/8} \big(3^{9x} \cdot x \cdot 4^{-x}\big)^t < \infty.$$
This is true if and only if
$$t\,(9 \log 3 - \log 4) < \tfrac{1}{8} \log 4,$$
which in turn is true if and only if $t < \alpha_3$. This proves that the hypothesis of the mass distribution principle holds for cylinder sets. As in the proof of Corollary 4.3, any subset of F can be covered by at most two cylinder sets with comparable diameter, so the hypothesis of the mass distribution principle holds for arbitrary sets as well.

Remark 4.8.
Either of the strategies used in this proof, the (simpler) strategy for the k = 3 case or the (more complicated) strategy for the k = 2 case, could have been used (after minor modification) in the case k ≥ 4 as well, but the resulting bound would have been significantly worse, measured by the fact that the analogues of α k would not have tended to 1. Similarly, the strategy for the k = 2 case could have been used for the k = 3 case, again resulting in a worse bound. In general, the principle is that whatever techniques work for one value of k will also work for higher values of k, but may not give very good estimates for higher values of k.

Diophantine Approximation -a Brief Survey
We first recall some definitions and state some well-known classical theorems:

Definition 5.1. Let $H : \mathbb{Q} \to \mathbb{R}_{>0}$ be a function. We think of H as a "height function", and for all p ∈ Z and q ∈ N, we define the height of p/q to be the number H(p/q). We say that a function $\psi : \mathbb{R}_{>0} \to \mathbb{R}_{>0}$ is a Dirichlet function (with respect to the height function H) if for every $x \in \mathbb{R} \setminus \mathbb{Q}$ there exist infinitely many rationals p/q such that $|x - p/q| < \psi(H(p/q))$.

Historically speaking, the only height function considered on the unit interval [0, 1] was the function $H_{\mathrm{std}}(p/q) = q$, where p and q are chosen in reduced form, i.e., gcd(p, q) = 1. We will refer to this as the standard height. It is readily verified that, for example, $\psi_0(q) = 1$ and $\psi_1(q) = 1/q$ are Dirichlet functions with respect to the standard height function. Using the terminology of Definition 5.1, Dirichlet's approximation theorem may be stated as follows:

Theorem 5.2 (Dirichlet). The function $\psi_2(q) = 1/q^2$ is a Dirichlet function with respect to the standard height function.

For our purposes, although of interest in its own right, an improvement of a Dirichlet function by a multiplicative constant is not significant. More precisely:

Definition 5.3. We say that a Dirichlet function ψ is optimal if there does not exist a Dirichlet function φ for which $\varphi(q)/\psi(q) \to 0$ as $q \to \infty$.

It is clear that Dirichlet's theorem implies that the Dirichlet functions $\psi_0$ and $\psi_1$ defined above are not optimal. The optimality of the function $\psi_2(q) = 1/q^2$ was demonstrated by Liouville, who proved that quadratic irrationals are badly approximable. A real number x is called badly approximable if there exists c(x) > 0 such that
$$\Big|x - \frac{p}{q}\Big| \ge \frac{c(x)}{q^2} \quad \text{for all } p/q \in \mathbb{Q}.$$
Liouville's result was later significantly improved by Jarník, who proved that the Hausdorff dimension of the set of badly approximable numbers is 1.
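Both phenomena recalled above can be observed numerically for the quadratic irrational √2. The sketch below, which is an illustration and not part of the paper's argument, generates continued fraction convergents p/q of √2 and evaluates the scaled error $q^2 |\sqrt{2} - p/q|$: Dirichlet's theorem predicts values below 1, and bad approximability predicts that they stay bounded away from 0. The function names are illustrative, and floating point is used, so only the first few convergents should be trusted.

```python
from fractions import Fraction
from math import sqrt

def sqrt2_convergents(count):
    """First `count` continued-fraction convergents of sqrt(2) = [1; 2, 2, 2, ...]."""
    p_prev, q_prev = 1, 0
    p, q = 1, 1
    result = [Fraction(p, q)]
    for _ in range(count - 1):
        # Standard recurrence with all partial quotients equal to 2.
        p, p_prev = 2 * p + p_prev, p
        q, q_prev = 2 * q + q_prev, q
        result.append(Fraction(p, q))
    return result

def scaled_errors(count):
    """Values q^2 * |sqrt(2) - p/q| for the first `count` convergents."""
    x = sqrt(2)
    return [r.denominator**2 * abs(x - r.numerator / r.denominator)
            for r in sqrt2_convergents(count)]
```

The first convergents are 1, 3/2, 7/5, 17/12, and the scaled errors oscillate around $1/(2\sqrt{2}) \approx 0.354$, consistent with $\psi_2$ being a Dirichlet function that cannot be improved by more than a constant factor at this point.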

Iterated Function Systems, Limit Sets, and Hausdorff Dimension
Let k ≥ 2 be an integer. In what follows, we shall consider a finite family of contracting similarities on the unit interval I = [0, 1]. This means that for every 1 ≤ i ≤ k, the map $S_i : I \to I$ satisfies
$$|S_i(x) - S_i(y)| = c_i |x - y| \quad (x, y \in I)$$
for some $0 < c_i < 1$. We shall call such a family of similarities an Iterated Function System or IFS. A nonempty compact set F ⊆ I is said to be the attractor or the limit set of the IFS if
$$F = \bigcup_{i=1}^k S_i(F).$$
It is well known (see, e.g., [8, Chapter 9]) that the attractor F exists and is unique. If there exists a nonempty open set U ⊆ I such that
$$\bigcup_{i=1}^k S_i(U) \subseteq U$$
with the union disjoint, then the IFS is said to satisfy the open set condition. In this case, the Hausdorff dimension of the attractor is equal to the unique solution s > 0 of the equation
$$\sum_{i=1}^k c_i^s = 1.$$
We say that the IFS $(S_i)_{i=1}^k$ satisfies the strong separation condition if $S_i(F) \cap S_j(F) = \emptyset$ for all $i \ne j$, where F is the attractor.

A particularly important example of an iterated function system is the system
$$S_i(x) = \frac{x + i}{b} \qquad (i \in C(b) \overset{\mathrm{def}}{=} \{0, 1, \ldots, b - 1\}), \tag{5.2}$$
where b ≥ 2 is fixed. This system satisfies the open set condition with U = (0, 1) but not the strong separation condition, and its attractor is the entire interval I. In some sense this IFS encodes the base b expansion(s) of any number in the interval [0, 1], since the number $x = \sum_{i=1}^\infty \omega_i b^{-i}$ can be written as $x = \lim_{n \to \infty} S_{\omega_1} \circ \cdots \circ S_{\omega_n}(0)$.

By looking at subsystems of the system (5.2), we can find IFSes whose limit sets can be described in terms of base b expansions. Fix A ⊆ C(b), and consider the subsystem of (5.2) consisting of the similarities $(S_i)_{i \in A}$. We call such a subsystem a base b IFS. Its limit set is precisely the set of all numbers in [0, 1] that have at least one base b expansion whose digits all lie in A, i.e.,
$$F = \Big\{\sum_{i=1}^\infty \omega_i b^{-i} : \omega \in A^{\mathbb{N}}\Big\}.$$
We remark that it is easy to check whether a base b IFS satisfies the strong separation condition: [...]

If a base b IFS satisfies the strong separation condition, then every element of its limit set F has exactly one base b expansion whose digits come from A. In this case, there is no ambiguity about talking about "the base b expansion" of a number in F, since we understand that if there is more than one base b expansion, then we are talking about the one whose digits come from A.
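Under the open set condition, the dimension of the attractor is the root of the equation $\sum_i c_i^s = 1$, which is easy to approximate numerically. The following Python sketch solves it by bisection; the function name `similarity_dimension` is illustrative. For a base b IFS with k maps, every contraction ratio equals 1/b, so the root is log(k)/log(b), the value of δ used throughout the paper.

```python
from math import log

def similarity_dimension(ratios, tol=1e-12):
    """Approximate the solution s of sum(c**s for c in ratios) == 1 by bisection.

    Assumes every ratio lies strictly between 0 and 1, so that the left-hand
    side is strictly decreasing in s.
    """
    lo, hi = 0.0, 1.0
    while sum(c**hi for c in ratios) > 1.0:  # enlarge the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(c**mid for c in ratios) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For the Cantor ternary set, `similarity_dimension([1/3, 1/3])` agrees with `log(2) / log(3)` up to the tolerance.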

Intrinsic Approximation on Limit Sets
Let F ⊆ R be a closed set, which we will think of as a fractal. The field of intrinsic Diophantine approximation is concerned with finding rational approximations to an irrational number x ∈ F by rational numbers that lie on the fractal F. Thus Mahler's first question is about intrinsic approximation on the Cantor set. More generally, one may ask about intrinsic approximation on the attractor of any similarity IFS. This leads to the following definition:

Definition 5.5. Let F ⊆ R be a closed set, and let $H : F \cap \mathbb{Q} \to \mathbb{R}_{>0}$ be a height function. We say that a function $\psi : \mathbb{R}_{>0} \to \mathbb{R}_{>0}$ is an intrinsic Dirichlet function on F (with respect to the height function H) if for every $x \in F \setminus \mathbb{Q}$ there exist infinitely many rationals $p/q \in F \cap \mathbb{Q}$ such that $|x - p/q| < \psi(H(p/q))$.

Optimality of intrinsic Dirichlet functions can be defined in the same way as in Definition 5.3.
We have the following result:

Proposition 5.6. ([4, Corollary 2.2]) Let F be the limit set of a base b IFS, and let δ be the Hausdorff dimension of F. Then for all x ∈ F, there exist infinitely many rational numbers p/q ∈ F (p ∈ Z, q ∈ N) such that
$$\Big|x - \frac{p}{q}\Big| < \frac{1}{q\,(\log_b q)^{1/\delta}}.$$
In other words, the function $\psi^*(q) = \big(q \cdot (\log_b q)^{1/\delta}\big)^{-1}$ is an intrinsic Dirichlet function on F for the standard height function.

The Symbolic Height Function
Let F be the limit set of a base b IFS satisfying the strong separation condition, and fix a rational number r ∈ F ∩ Q. It is well known that the base b expansion of r is preperiodic, i.e.,
$$r = 0.\omega_1 \cdots \omega_i \overline{\omega_{i+1} \cdots \omega_{i+j}} \tag{6.1}$$
for some i ≥ 0, j ≥ 1, and $\omega_1, \ldots, \omega_{i+j} \in A$. Here the bar indicates that the string $\omega_{i+1} \cdots \omega_{i+j}$ is infinitely repeated. Rewriting the right-hand side as a sum of fractions yields
$$r = \frac{\omega_1 \cdots \omega_i}{b^i} + \frac{\omega_{i+1} \cdots \omega_{i+j}}{b^i} \cdot \frac{1}{b^j - 1},$$
where $\omega_1 \cdots \omega_i$ and $\omega_{i+1} \cdots \omega_{i+j}$ are integers that have been written in base b. Adding the two resulting fractions together, we end up with a (complicated) expression whose denominator is $b^i (b^j - 1)$. Further cancellations may or may not be possible, but we can always write the rational number as a fraction of two integers, the denominator of which is $b^i (b^j - 1)$. This fact leads to a natural height function on F ∩ Q related to the base b structure of the fractal F:
$$H_{\mathrm{sym}}(r) = b^i (b^j - 1), \tag{6.2}$$
where the indices i and j are the smallest integers such that r can be written in the form (6.1). The function $H_{\mathrm{sym}}$ is called the symbolic height function. It was studied in a more general context in [11]. Notice that the symbolic height of a rational number may not be the same as its standard height (i.e., its denominator in reduced form). For example, the rational number $0.\overline{20}_3$ in the Cantor ternary set is equal to 3/4, so its standard height is 4. Nonetheless, the symbolic height of $0.\overline{20}_3$ is $3^0 (3^2 - 1) = 8$. It should be thought of as the denominator resulting from the following calculation:
$$0.\overline{20}_3 = \frac{20_3}{3^0} \cdot \frac{1}{3^2 - 1} = \frac{6}{8}.$$
Although more cancellation is possible at the end of this calculation, this will not always be the case,‡ so in a principled way we have stopped reducing the fraction here. The calculation illustrates the fact that the symbolic height of a rational number r can be thought of as a "symbolic denominator", i.e., the denominator of a certain representation of r as the quotient of two integers. The numerator of this representation can be thought of as a "symbolic numerator" (in the above example the symbolic numerator would be 6), but as usual, for purposes of Diophantine approximation it is simpler to just work with the denominator. Note that the standard height is by definition no larger than the symbolic one, since we have $p_{\mathrm{std}}/q_{\mathrm{std}} = p_{\mathrm{sym}}/q_{\mathrm{sym}}$, but the left-hand side is in reduced form.
‡ For example, the fraction at the end of the calculation $0.2\overline{70}_9 = \frac{2_9}{9} + \frac{70_9}{9} \cdot \frac{1}{9^2 - 1} = \frac{2 \cdot 80 + 7 \cdot 9}{9 \cdot 80} = \frac{223}{720}$ is already in reduced form.
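The arithmetic behind the symbolic height can be checked mechanically. The following Python sketch is illustrative only: the function names `digits_to_int` and `symbolic_fraction` are hypothetical, and the digit lists passed in are assumed to already be the minimal preperiod and period of the base b expansion.

```python
from fractions import Fraction


def digits_to_int(digits, b):
    """Read a list of base-b digits as an integer (the empty list gives 0)."""
    value = 0
    for d in digits:
        value = value * b + d
    return value


def symbolic_fraction(pre, per, b):
    """Return (p_sym, q_sym) for r = 0.pre(per repeated forever) in base b.

    Assumption: (pre, per) is the minimal preperiod/period pair, so that
    q_sym = b^i (b^j - 1) with i = len(pre) and j = len(per).
    """
    i, j = len(pre), len(per)
    q_sym = b**i * (b**j - 1)
    # r = N / b^i + P / (b^i (b^j - 1)), put over the common denominator q_sym.
    p_sym = digits_to_int(pre, b) * (b**j - 1) + digits_to_int(per, b)
    return p_sym, q_sym


# The two examples from the text.
p, q = symbolic_fraction([], [2, 0], 3)    # (6, 8)
print(p, q, Fraction(p, q))                # 6 8 3/4
print(symbolic_fraction([2], [7, 0], 9))   # (223, 720)
```

The standard height is recovered as `Fraction(p, q).denominator`, which reduces the fraction; in the first example it is 4, while the symbolic height stays 8.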
We remark that heuristically, if we are given two rational numbers r 1 and r 2 , and we are told that r 1 lies in the limit set of a base b IFS, but we are not told anything about r 2 , then we should expect the (multiplicative) discrepancy between the standard height and the symbolic height to be smaller for r 1 than for r 2 . This is because if we choose the numerator and denominator of a rational randomly, then the numbers i and j satisfying (6.1) may be comparable to the standard height of the rational (meaning that the symbolic height is an exponential function of the standard height), but the number would be exceedingly unlikely to lie in any base b limit set, since its digits would essentially be random. By contrast, if we choose the digits of a rational randomly out of a fixed alphabet A (with a fixed period and preperiod), then the amount of cancellation we expect to see in the symbolic representation of the rational will be much smaller, so the standard height and symbolic height will be relatively close. More heuristics regarding the relation between the symbolic height function and the standard one were discussed in [11].
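To see how large the discrepancy can be for a rational chosen without regard to its digits, one can compute the symbolic height directly from a reduced fraction. The sketch below relies on standard facts about base b expansions that are not stated in the text: for a reduced fraction in [0, 1) with denominator q = q1 · q2, where q2 is coprime to b and q1 divides a power of b, the minimal preperiod length i is the least i with q1 | b^i, and the minimal period length j is the multiplicative order of b modulo q2. The function name `symbolic_height` is hypothetical.

```python
from fractions import Fraction
from math import gcd


def symbolic_height(r, b):
    """Symbolic height b^i (b^j - 1) of a rational r in [0, 1) in base b."""
    q = r.denominator
    # Strip from q every factor shared with b, leaving q2 coprime to b.
    q2 = q
    g = gcd(q2, b)
    while g > 1:
        q2 //= g
        g = gcd(q2, b)
    q1 = q // q2
    # Preperiod length: least i with q1 dividing b^i.
    i, power = 0, 1
    while power % q1 != 0:
        power *= b
        i += 1
    # Period length: multiplicative order of b modulo q2 (1 when q2 == 1).
    j, acc = 1, b % q2
    while acc != 1 % q2:
        acc = (acc * b) % q2
        j += 1
    return b**i * (b**j - 1)


print(symbolic_height(Fraction(3, 4), 3))      # 8
print(symbolic_height(Fraction(223, 720), 9))  # 720
print(symbolic_height(Fraction(1, 7), 3))      # 728
```

The last call illustrates the heuristic: 1/7 has standard height 7, but its ternary period has length 6, so its symbolic height is 3^6 − 1 = 728. Its ternary digits 010212 include the digit 1, so it does not lie in the Cantor ternary set.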
One reason the symbolic height function is interesting is that it naturally shows up in the proofs of results regarding the standard height function. For example, the proof of Proposition 5.6 can easily be modified to bound |x − p/q| in terms of the symbolic height of p/q rather than the standard height:
Proposition 6.1. ([4, Proof of Corollary 2.2]) Let F be the limit set of a base b IFS, and let δ be the Hausdorff dimension of F. Then for all x ∈ F, there exist infinitely many rational numbers $r = p_{\mathrm{sym}}/q_{\mathrm{sym}} \in F$ such that In other words, the function $\psi_*(q) = \left(q \cdot (\log_b q)^{1/\delta}\right)^{-1}$ is an intrinsic Dirichlet function on F for the symbolic height function.
In fact, the proof of [4, Corollary 2.2] essentially proceeds by first proving Proposition 6.1 and then using the inequality $H_{\mathrm{std}} \leq H_{\mathrm{sym}}$ to deduce Proposition 5.6. It appears extremely difficult to prove any improvement (either for all points or only for some) of Proposition 5.6 for the standard height without just proving the same bound for the symbolic height. So in some way, the symbolic height is measuring the "strength of our techniques".
Although the symbolic height function is motivated in terms of the standard height function, it can also be analyzed on its own terms. For example, we can ask whether the intrinsic Dirichlet function $\psi_*$ appearing in Proposition 6.1 is optimal for the symbolic height function. This is the same (cf. [12, §2.1]) as asking whether there exist any points in F that are badly symbolically approximable with respect to $\psi_*$:
Definition 6.2. (Special case of [11, Definition 4.7]) Let F be a base b limit set, and let δ denote the Hausdorff dimension of F. A number x ∈ F is called badly symbolically approximable (with respect to $\psi_*$) if there exists κ > 0 such that for every $r = p_{\mathrm{sym}}/q_{\mathrm{sym}} \in F \cap \mathbb{Q}$, we have
$$|x - r| \geq \kappa \, \psi_*(q_{\mathrm{sym}}). \tag{6.3}$$
Theorem 6.3. (Corollary of [11, Lemma 4.9]; or see below) Let F be the limit set of a base b IFS satisfying the strong separation condition. Then any x ∈ F whose base b expansion is uniformly de Bruijn is badly symbolically approximable.
Combining with Theorem 2.1 gives:
Corollary 6.4. With F as above, the set of badly symbolically approximable points has dimension at least $\alpha_k \delta > 0$, where In particular, the intrinsic Dirichlet function $\psi_*$ appearing in Proposition 6.1 is optimal.
We remark that the optimality assertion follows directly from combining Theorem 6.3 with [1, Corollary 7]; Theorem 2.1 is not needed.
In contrast to Proposition 6.1, Theorem 6.3 and Corollary 6.4 are weaker than their (unproven) analogues for the standard height function. This is because while Proposition 6.1 is about finding good approximations to points, in Theorem 6.3 and Corollary 6.4 we show that for certain points, good approximations cannot exist. But the inequality $H_{\mathrm{std}} \leq H_{\mathrm{sym}}$ means that the quality of an approximation is better according to the standard height than according to the symbolic height, which yields the appropriate implications.
We remark that Theorem 6.3 is only a one-way implication: there may be (and almost certainly are) badly symbolically approximable numbers whose base b expansions are not uniformly de Bruijn. A combinatorial characterization of the base b expansions of badly symbolically approximable numbers was given in [11,Lemma 4.9]. As a consequence of the one-sidedness of the implication, Theorem 6.3 yields a lower bound on the dimension of the set of badly symbolically approximable points but not an upper bound. In fact, we believe that there is no nontrivial upper bound: we conjecture that the Hausdorff dimension of the set of badly symbolically approximable points of any base b limit set F is equal to the Hausdorff dimension of F. This conjecture is motivated by other situations in Diophantine approximation where the dimension of the set of badly approximable points has always turned out to be full. However, Theorem 2.1 shows that this conjecture cannot be proven using uniformly de Bruijn sequences.
Although Theorem 6.3 is a consequence of the much more general result [11,Lemma 4.9], we prove it here for completeness and ease of exposition.
Proof of Theorem 6.3. Let x ∈ F be a number whose base b expansion, which we denote by ω, is uniformly de Bruijn. Let ℓ denote the size of the largest gap in the set $B_\omega$ defined by (2.1). Fix r ∈ F ∩ Q, and let the representation $r = 0.\tau_1 \cdots \tau_i \overline{\tau_{i+1} \cdots \tau_j}$ be chosen so as to minimize i and j. Then the symbolic height of r, as defined in (6.2), is $q_{\mathrm{sym}} = b^i (b^{j-i} - 1) \leq b^j$. Since the IFS defining F is assumed to satisfy the strong separation condition, the distance between x and r is comparable to $b^{-m}$, where m is the largest index for which $\omega_t = \tau_t$ for all t ≤ m. In fact, a careful analysis shows that $|x - r| \geq b^{-(m+2)}$, though the precise constant factor is not relevant. We claim that if j ≥ ℓ, then which demonstrates that (6.3) holds with $\kappa = b^{-(\ell+2)}$. We now separate into two cases:
Case 1: m ≤ j + ℓ. In this case, we have
Case 2: m > j + ℓ. In this case, by the mth letter, the sequence τ will have already begun to repeat. The longest repeated string in the sequence $\tau_1 \cdots \tau_m$ is $\tau_{i+1} \cdots \tau_{m-(j-i)} = \tau_{j+1} \cdots \tau_m$. Note that although the two sides of this equation represent distinct instances of the same string as a substring of $\tau_1 \cdots \tau_m$, the two instances may overlap with each other; this happens if and only if m > 2j − i. For the purposes of our calculations, it does not matter whether these two instances overlap or not.
By the definition of m, we have $\omega_1 \cdots \omega_m = \tau_1 \cdots \tau_m$, so ω also has a repeated string $\omega_{i+1} \cdots \omega_{m-(j-i)} = \omega_{j+1} \cdots \omega_m$ of length m − j occurring in the first m letters. On the other hand, by the definition of ℓ, there exists m − j − ℓ < n ≤ m − j such that $n \in B_\omega$, which implies that ω has no repeated string of length n occurring in the first $k^n + n - 1$ letters of ω. Since n ≤ m − j, the repeated string of length m − j contains a repeated string of length n within the first m letters, so it follows that $m > k^n + n - 1$, and thus $k^n \leq m - n < j + \ell \leq 2j$.
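The combinatorial condition used in this step can be tested on finite words. The following Python sketch assumes only what the paragraph above states, namely that n ∈ B_ω forces the first k^n + n − 1 letters of ω to contain no repeated string of length n; the definition (2.1) itself is not reproduced here, and the function names `has_repeated_string` and `prefix_has_no_repeat` are hypothetical.

```python
def has_repeated_string(word, n):
    """True if some length-n string occurs at two different positions in word."""
    seen = set()
    for start in range(len(word) - n + 1):
        block = tuple(word[start:start + n])
        if block in seen:
            return True
        seen.add(block)
    return False


def prefix_has_no_repeat(word, k, n):
    """True if the first k^n + n - 1 letters of word contain no repeated
    string of length n (k is the alphabet size)."""
    length = k**n + n - 1
    if len(word) < length:
        raise ValueError("word is shorter than k^n + n - 1 letters")
    return not has_repeated_string(word[:length], n)


# Binary alphabet (k = 2), n = 2: the prefix has 2^2 + 2 - 1 = 5 letters.
print(prefix_has_no_repeat([0, 0, 1, 1, 0], 2, 2))   # True
print(has_repeated_string([0, 0, 1, 1, 0, 1], 2))    # True
```

In the first call the four windows 00, 01, 11, 10 are distinct. Appending one more letter forces a repeat, which mirrors the step in the proof: a repeated string of length n can only appear once the word extends beyond the first k^n + n − 1 letters.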