New bounds on the dimensions of planar distance sets

We prove new bounds on the dimensions of distance sets and pinned distance sets of planar sets. Among other results, we show that if $A\subset\mathbb{R}^2$ is a Borel set of Hausdorff dimension $s>1$, then its distance set has Hausdorff dimension at least $37/54\approx 0.685$. Moreover, if $s\in (1,3/2]$, then outside of a set of exceptional $y$ of Hausdorff dimension at most $1$, the pinned distance set $\{ |x-y|:x\in A\}$ has Hausdorff dimension $\ge \tfrac{2}{3}s$ and packing dimension at least $\tfrac{1}{4}(1+s+\sqrt{3s(2-s)}) \ge 0.933$. These estimates improve upon the existing ones by Bourgain, Wolff, Peres-Schlag and Iosevich-Liu for sets of Hausdorff dimension $>1$. Our proof uses a multi-scale decomposition of measures in which, unlike previous works, we are able to choose the scales subject to certain constraints. This leads to a combinatorial problem, which is a key new ingredient of our approach, and which we solve completely by optimizing a certain variation of Lipschitz functions.


Introduction. Given a set $A\subset\mathbb{R}^d$, its distance set is $\Delta(A) = \{|x-y| : x, y \in A\}$. K. Falconer [7] pioneered the study of the relationship between the Hausdorff dimensions of $A$ and $\Delta(A)$. He proved that if $d \ge 2$ and $A \subset \mathbb{R}^d$ is a Borel (or even analytic) set, then $\dim_H(\Delta(A)) \ge \min(\dim_H(A) - \tfrac{1}{2}(d-1), 1)$, where $\dim_H$ stands for Hausdorff dimension. Falconer also constructed compact sets $A \subset \mathbb{R}^d$ (based on lattices) of any Hausdorff dimension such that $\dim_H(\Delta(A)) \le \min(2\dim_H(A)/d, 1)$. Although it is not explicitly stated in [7], the conjecture that these lattice constructions are extremal, in the sense that one should have $\dim_H(\Delta(A)) = 1$ if $\dim_H(A) \ge d/2$, has become known as the Falconer distance set problem.
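For a reader who wishes to experiment numerically, the two bounds above can be tabulated; the following sketch (the helper names are ours, not the paper's) contrasts Falconer's lower bound with the lattice upper bound in the plane:

```python
def falconer_lower(s, d=2):
    # Falconer: dim_H(Delta(A)) >= min(s - (d - 1)/2, 1) for Borel A with dim_H(A) = s
    return min(s - (d - 1) / 2, 1.0)

def lattice_upper(s, d=2):
    # Falconer's lattice constructions witness: dim_H(Delta(A)) <= min(2*s/d, 1)
    return min(2 * s / d, 1.0)

# In the plane (d = 2) the lattice bound already reaches 1 at s = 1,
# while Falconer's lower bound reaches 1 only at s = 3/2; the gap in
# between is the content of the Falconer distance set problem.
for s in (1.0, 1.2, 1.5):
    print(s, falconer_lower(s), lattice_upper(s))
```

The gap between the two curves on $s \in [1, 3/2)$ is exactly the range of exponents that the results of this paper address.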
Falconer's problem is a continuous version of the celebrated distinct distances problem of P. Erdős [5], asserting (in the plane) that if $A \subset \mathbb{R}^2$ with $|A| = N$, then $|\Delta(A)| \ge cN/\sqrt{\log N}$. L. Guth and N. Katz [8] (building on work of Gy. Elekes and M. Sharir [4]) famously solved this problem, up to logarithmic factors, by showing that $|\Delta(A)| \ge cN/\log N$. However, the approach of Guth and Katz and, indeed, all previous methods developed to tackle Erdős' problem, do not appear to be able to yield progress on Falconer's problem.
From now on, we focus on the case $d = 2$, which is the first non-trivial case, the best understood, and the focus of this article. T. Wolff [27], based on a method of P. Mattila [15] and extending ideas of J. Bourgain [1], proved that if $A \subset \mathbb{R}^2$ is a Borel set with $\dim_H(A) \ge 4/3$, then $\dim_H(\Delta(A)) = 1$. In fact, he proved that $\dim_H(A) > 4/3$ ensures that $\Delta(A)$ has positive length, and established the more general dimension formula (1.1) of the form $\dim_H(\Delta(A)) \ge \min(\cdots)$. In particular, if $s > 1$, then one can find many $y \in A$ such that $\dim_H(\Delta_y(A)) \ge \min(\tfrac{2}{3}s, 1)$.
We remark that we get better bounds for the dimension of the full distance set, see Theorem 1.4 below.
The last claim in Theorem 1.1 improves the previously known bounds for the dimensions of pinned distance sets $\Delta_y(A)$ with $y \in A$ for all $s \in (1, 3/2]$. The bound (1.4) also improves upon (1.3) (and the variant of Iosevich and Liu) in large regions of parameter space, and in particular for $t = \min(\tfrac{2}{3}s, 1)$ and all $s \in (3/5, 5/3)$. Theorem 1.1 is a special case of a more general result that takes into account the Hausdorff and also the packing dimension of $A$. We refer to [6, §3.5] for the definition and main properties of packing dimension $\dim_P$, and simply note that it satisfies $\dim_H(A) \le \dim_P(A) \le \overline{\dim}_B(A)$, where $\overline{\dim}_B$ denotes the upper box-counting (or Minkowski) dimension. For our method, the worst case is that in which $A$ has maximal packing dimension $2$, and we get better bounds for the distance set under the assumption that the packing dimension is smaller: Theorem 1.2. Given $0 < s \le u \le 2$, the following holds: if $A$ is a Borel subset of $\mathbb{R}^2$ with $\dim_H A \ge s$ and $\dim_P A \le u$, then $\dim_H\{\, y \in \mathbb{R}^2 : \dim_H(\Delta_y(A)) < \chi(s, u) \,\} \le \max(1, 2 - s)$.
In particular, if s > 1 then there are many y ∈ A such that dim H (∆ y (A)) ≥ χ(s, u), and hence if dim H (A) > 1 and dim P A ≤ 2 dim H A − 1, then dim H (∆ y (A)) = 1 for many y ∈ A.
We remark that, taking u = s, this theorem recovers the main result of [26] mentioned above, namely that if dim H (A) = dim P (A) > 1, then dim H (∆ y A) = 1 for many y ∈ A.
On the other hand, it was known from (1.3) that if dim H (A) > 3/2 then there is y ∈ A such that dim H (∆ y A) = 1. The last claim in Theorem 1.2 can be seen as interpolating between these two situations, and hence provides a new, more general, geometric condition under which Falconer's conjecture is known to hold.
When $\dim_H(A) > 1$, we are able to get much better lower bounds for the packing dimension of the pinned distance sets: Theorem 1.3. Let $A$ be a Borel subset of $\mathbb{R}^2$ with $s = \dim_H(A) \in (1, 3/2)$. Then $\dim_H\bigl\{\, y \in \mathbb{R}^2 : \dim_P(\Delta_y(A)) < \tfrac{1}{4}\bigl(1 + s + \sqrt{3s(2-s)}\bigr) \,\bigr\} \le 1$.
In particular, there is $y \in A$ such that $\dim_P(\Delta_y(A)) \ge \tfrac{1}{4}\bigl(1 + s + \sqrt{3s(2-s)}\bigr)$. We recall that since upper box-counting dimension is at least as large as packing dimension, the above theorem also holds for upper box-counting dimension. Even though Falconer's conjecture is about the Hausdorff dimension of the distance set, this result presents further evidence towards its validity.
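The behaviour of the bound $s \mapsto \tfrac{1}{4}(1+s+\sqrt{3s(2-s)})$ on $(1, 3/2]$ is easy to check numerically; a quick sanity check (our own script, not part of the proof) confirms the values $0.933$ and $1$ quoted in the abstract:

```python
import math

def packing_bound(s):
    # Theorem 1.3's lower bound for the packing dimension of pinned distance sets
    return (1 + s + math.sqrt(3 * s * (2 - s))) / 4

# the bound increases from (2 + sqrt(3))/4 ~ 0.933 at s = 1 to exactly 1 at s = 3/2
values = [packing_bound(1 + k * 0.5 / 1000) for k in range(1, 1001)]
print(min(values), packing_bound(1.5))
```

In particular the infimum over $s \in (1, 3/2]$ equals $(2+\sqrt{3})/4 \approx 0.933$, attained in the limit $s \to 1^+$.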
Finally, as anticipated above, we get a better bound for the dimension of the full distance set when $\dim_H(A)$ is slightly larger than $1$; this is Theorem 1.4. A calculation shows that it indeed improves upon Wolff's bound (1.1) for the dimension of the full distance set for $s \in (1, 1.21931\ldots)$ (and upon Bourgain's bound (1.2) for all $s > 1$). We remark that this theorem is obtained by combining the idea of the proof of Theorem 1.2 with a known effective variant of Wolff's bound (1.1). Although achieving this combination takes quite a bit of work, Theorem 1.2 should perhaps be considered the most basic result, since its proof is shorter and already contains most of the main ideas, and the improvement given by Theorem 1.4 is relatively modest. Note also that already applying Theorem 1.1 to the full distance set improves upon (1.1) for $s \in (1, 6/5)$. See Figure 1 for a comparison of the lower bounds from Theorems 1.1, 1.3 and 1.4 and Wolff's lower bound.
After this paper was made public, B. Liu [14] posted a preprint extending Wolff's result to pinned distance sets. In particular, he shows that if A ⊂ R 2 is a Borel set with dim H (A) > 4/3, then ∆ x (A) has positive Lebesgue measure for some x ∈ A (with bounds on the dimension of the exceptional set). This is stronger than our Theorem 1.1 for s > 4/3 (other than the exceptional set being larger).

Strategy of proof.
Our approach is completely different to those of Wolff, Bourgain, Peres and Schlag and Iosevich and Liu. Rather, it can be seen as a continuation of the ideas successively developed in [22,25,26] to attack the distance set problems for sets with certain regularity. Thus, one of the main points of this paper is extending the strategy of these papers so that it can be applied to general sets. At the core of our method is a lower box-counting estimate for pinned distance sets ∆ y A in terms of a multi-scale decomposition of A or, rather, a Frostman measure µ supported on A. See Section 4 for precise statements. A key aspect of these estimates is that they recover a global lower box-counting estimate for ∆ y A from bounds on local, discretized and linearized estimates for the pinned distance measures ∆ y µ.
The general philosophy of obtaining lower bounds for the dimension of projected sets and measures in terms of multi-scale averages of local projections is behind a large number of results in fractal geometry in the last few years; see e.g. [10,9] and references therein. The insight that this approach can be used also to study distance sets is due to Orponen [20,22].
Up until the paper [26], the scales in the multi-scale decomposition behind all the variants of the method described above were of the form $2^{-Nj}$ for some fixed $N$. One of the innovations of [26] was to modify the method so that it could handle also scales of the form $2^{-(1+\varepsilon)^j}$ (the point being that $(1+\varepsilon)^j$ is exponential in $j$, rather than linear). Although this was flexible enough to handle sets of equal Hausdorff and packing dimension (as opposed to Ahlfors-regular sets as in [22,25]), it was still too restrictive for dealing with general sets.
One of the main innovations of this paper is that we are able to work with scales $2^{-M_j}$ where the $M_j$ only need to satisfy $\tau M_j \le M_{j+1} - M_j \le M_j + T$ (where $\tau > 0$, $T \in \mathbb{N}$ are fixed parameters). This provides a major degree of flexibility. In particular, a crucial point is that we are able to pick the sequence $(M_j)$ depending on the set $A$ (or the Frostman measure $\mu$), while in all previous works the scales in the multiscale decomposition were basically fixed. See Proposition 4.4. This leads us to the combinatorial problem of optimizing the choice of $(M_j)$ for each measure $\mu$. We solve this problem completely, up to negligible error terms, in Section 5.
In fact, we deduce the combinatorial statements we need from several statements about the variation of Lipschitz functions, which might be of independent interest. More precisely, given a 1-Lipschitz function $f : [0, a] \to \mathbb{R}$ satisfying certain additional assumptions, we seek to minimize $\sum_{n=1}^{\infty}\bigl(f(a_n) - \min_{[a_n, a_{n-1}]} f\bigr)$, where $(a_n)_{n=0}^{\infty}$ is a strictly decreasing sequence tending to $0$ with $a_0 = a$ and $a_n \le 2a_{n+1}$. Conversely, we also study the structure of functions $f$ for which these sums are (for some sequence $(a_i)$) close to the minimum possible value. We underline that this part of the method is completely new, as the combinatorial problem does not arise for fixed multi-scale decompositions.
Another obstacle to dealing with arbitrary sets and measures is that energies of measures (which play a key role throughout) do not have a nice multi-scale decomposition in general. We deal with this by decomposing a general measure supported on [0, 1) 2 as a superposition of measures with a regular Cantor structure, plus a small error term: see Corollary 3.5. This step is an adaptation of some ideas of Bourgain we learned from [3]. After some technical difficulties, this reduces our study to those regular measures for which a suitable multi-scale expression of the energy does exist, see Lemma 3.3.
The strategy just discussed is behind the proofs of Theorems 1.2, 1.3 and 1.4. However (as briefly indicated above), the proof of Theorem 1.4 is based on merging these ideas with a more quantitative version of Wolff's result that if dim H (A) ≥ 4/3 then dim H (∆(A)) = 1, see Theorem 6.4 below. The fact that one can improve upon Theorem 1.1 (for the full distance set) is based on the observation that for some sets A ⊂ R 2 of Hausdorff dimension s > 1 for which the method of the proof of Theorem 1.1 cannot give anything better than dim H (∆(A)) ≥ 2s/3, the quantitative version of Wolff's Theorem can give a much better bound. The fact that these two methods are based on totally different techniques and also have different "enemies" that one must overcome, suggests that neither of them (or even in combination as we do here) provides a definitive line of attack on Falconer's problem.
1.4. Sets of directions, and the case of dimension 1. Although Theorem 1.1 does provide new information on the pinned distance sets $\Delta_y A$ when $\dim_H A = 1$, it gives no information whatsoever on $\dim_H(\Delta(A))$ in this case. There are some well-known "enemies" that one must handle in order to improve upon the easy bound $\dim_H(\Delta(A)) \ge 1/2$ when $\dim_H A = 1$. One is that the corresponding fact is false over the complex numbers: $\mathbb{R}^2$ is a subset of $\mathbb{C}^2$ of half the dimension of the ambient space for which the (squared) distance set also has half the dimension of the ambient space. Hence any improvement over $1/2$ in the real case must take into account the order structure of $\mathbb{R}$. The other obstacle is a well-known counterexample to a naive discretization of the problem: see [13, Eq. (2) and Figure 1]. These enemies do not arise when $\dim_H(A) > 1$. Despite these conceptual differences, we underline that, with the exception of the work of Katz and Tao [13] underpinning Bourgain's bound (1.2), none of the other methods developed so far make any distinction between the cases $\dim_H(A) = 1$ and $\dim_H(A) = 1 + \delta$.
From the point of view of our strategy, the key significance of the assumption $\dim_H(A) > 1$ is that in this case the set of directions determined by pairs of points in $A$ has positive Lebesgue measure. In fact, we need a far more quantitative "pinned" version of this fact, which is due to Orponen [21], improving upon a related result by Mattila and Orponen [19] (see Proposition 3.11 below). However, even the fact that the direction set has positive measure clearly fails if $\dim_H A = 1$ and $A$ is contained in a line. Since $\dim_H(\Delta_y A) = \dim_H(A)$ trivially when $A$ is contained in a line, this does not rule out an extension of our approach to the case $\dim_H(A) = 1$. However, this would require some variant of Proposition 3.11 when both $s, u$ are slightly less than $1$, under a suitable hypothesis of non-concentration on lines, and this appears to be very hard. In [21, Corollary 1.8], Orponen also proved that the direction set of a planar set of Hausdorff dimension $1$ which is not contained in a line has Hausdorff dimension $\ge 1/2$, but this is very far from positive measure, let alone from anything resembling Proposition 3.11.
To understand why directions arise naturally, we recall that our whole approach is based on bounding the size of pinned distance sets in terms of a multi-scale average of local linearized pinned distance measures. The derivative of the distance function x → |x − y| is precisely the direction spanned by x and y. Thus we are led to study orthogonal projections of certain measures localized around x, where the angle is given by the direction determined by x and y. The fact that these directions are "well distributed" in a suitable sense can then be used in conjunction with a finitary version of Marstrand's projection theorem (see Lemma 3.6) and several applications of Fubini to conclude that one can choose y such that for "many" x the direction determined by x and y is good in the sense that the L 2 norm of the projection is controlled by the 1-energy of the measure being projected.
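The linearization just described rests on the elementary fact that the gradient of the pinned distance map $x \mapsto |x-y|$ is the unit vector pointing from $y$ to $x$; a finite-difference sanity check (the specific points below are our own choices):

```python
import math

def dist(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

def grad_dist(x, y, h=1e-6):
    # central finite differences of the map x -> |x - y|
    g = []
    for i in range(2):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((dist(xp, y) - dist(xm, y)) / (2 * h))
    return g

x, y = (0.8, 0.3), (0.1, 0.9)
r = dist(x, y)
direction = [(x[0] - y[0]) / r, (x[1] - y[1]) / r]  # unit vector from y to x
g = grad_dist(x, y)
print(g, direction)
```

The numerical gradient agrees with the exact direction $(x-y)/|x-y|$ to high precision, which is the computation behind replacing the distance map locally by the orthogonal projection $\Pi_{\theta}$ in the direction $\theta$ determined by $x$ and $y$.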
1.5. Structure of the paper. In Section 2 we introduce notation to be used in the rest of the paper. Section 3 contains some preliminary definitions and results that will be repeatedly used in the later proofs. In Section 4 we establish a lower bound for the box-counting numbers of pinned distance sets that will be at the heart of the proofs of all main theorems. Section 5 contains a number of optimization results about Lipschitz functions on the line, as well as corollaries of these results for discrete [−1, 1]-sequences; these corollaries play a key role in the proofs of the main theorems. Theorems 1.2, 1.3 and 1.4 are proved in Section 6. We conclude with some remarks on the sharpness of our results in Section 7.
We remark that §5.2 and §5.3 are not needed for the proof of Theorem 1.2 (the results from §5.2 are required only in the proof of Theorem 1.3, and §5.3 is needed only for the proof of Theorem 1.4).
1.6. Acknowledgments. This project was started while the authors were staying at Institut Mittag-Leffler as part of the program Fractal Geometry and Dynamics. We are grateful to the organizers for the opportunity to take part, and to the organizers, staff, and fellow participants for the pleasant stay.
We also wish to thank T. Orponen for many useful discussions at the early stage of this project, and an anonymous referee for several suggestions that improved the paper, in particular for suggesting a simplification of the statement and proof of Proposition 3.12.

NOTATION
We use Landau's $O(\cdot)$ notation: given $X > 0$, $O(X)$ denotes a positive quantity bounded above by $CX$ for some constant $C > 0$. If $C$ is allowed to depend on some other parameters, these are denoted by subscripts. We sometimes write $X \lesssim Y$ in place of $X = O(Y)$, and likewise with subscripts. We write $X \gtrsim Y$ and $X \approx Y$ to denote $Y \lesssim X$ and $X \lesssim Y \lesssim X$, respectively. Throughout the rest of the paper, we work with three parameters that we assume fixed: a large integer $T$ and small positive numbers $\varepsilon, \tau$. We briefly indicate their meaning: (1) We will decompose sets and measures in the base $2^T$. In particular, we will work with sets and measures that have a regular tree (or Cantor) structure when represented in this base: see Definition 3.2. (2) The parameter $\tau$ will be used to define sets of bad projections: see Definition 3.8. The fact that $\tau > 0$ is required to ensure that these sets have small measure. It also keeps some error terms negligible, see Proposition 4.4. (3) Finally, $\varepsilon$ will denote a generic small parameter; it can play different roles at different places. We will use the notation $o_{T,\varepsilon,\tau}(1) = o_{T\to\infty,\varepsilon\to 0^+,\tau\to 0^+}(1)$ to denote any function $f(T, \varepsilon, \tau)$ that tends to $0$ as $T \to \infty$, $\varepsilon \to 0^+$ and $\tau \to 0^+$. If a particular instance of $o(1)$ is independent of some of the variables, we drop these variables from the notation. Different instances of the $o(1)$ notation may refer to different functions of $T, \varepsilon, \tau$, and they may depend on each other, so long as they can always be made arbitrarily small.
Note that e.g. $O_\varepsilon(1)$ denotes any (finite) function of $\varepsilon$, while $o_\varepsilon(1)$ denotes a function of $\varepsilon$ that tends to $0$ as $\varepsilon \to 0^+$. We will often work at a scale $2^{-\ell T}$; it is useful to think that $\ell \to \infty$ while $T, \varepsilon, \tau$ remain fixed.
The family of Borel probability measures on a metric space X is denoted by P(X).
We let $D_j$ be the half-open $2^{-jT}$-dyadic cubes in $\mathbb{R}^d$ (where $d$ is understood from context), and let $D_j(x)$ be the only cube in $D_j$ containing $x \in \mathbb{R}^d$. Given a measure $\mu \in \mathcal{P}(\mathbb{R}^d)$, we also let $D_j(\mu)$ be the cubes in $D_j$ with positive $\mu$-measure. Note that these families depend on $T$. Given $A \subset \mathbb{R}^d$, we also denote by $N(A, j)$ the number of cubes in $D_j$ that intersect $A$.
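For a finite set, the box-counting numbers $N(A, j)$ can be computed by bucketing points into half-open dyadic cubes of side $2^{-jT}$; a minimal sketch (our own helper, shown with $T = 1$):

```python
def box_count(points, j, T=1):
    # N(A, j): number of half-open dyadic cubes of side 2^(-jT) meeting the finite set A
    side = 2.0 ** (-j * T)
    return len({tuple(int(c // side) for c in p) for p in points})

# 16 equally spaced points on a horizontal segment inside [0, 1)^2
segment = [(i / 16, 0.0) for i in range(16)]
print(box_count(segment, 2), box_count(segment, 4))
```

As expected for a (discretized) one-dimensional set, the counts grow like $2^{jT}$ until the spacing of the points is resolved.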
A $2^{-m}$-measure is a measure in $\mathcal{P}([0,1)^d)$ such that the restriction to any $2^{-m}$-dyadic cube $Q$ is a multiple of Lebesgue measure on $Q$, i.e. a measure defined down to resolution $2^{-m}$. Likewise, a $2^{-m}$-set is a union of $2^{-m}$-dyadic cubes. If $\mu \in \mathcal{P}(\mathbb{R}^d)$ is an arbitrary measure, then we denote by $R_\ell(\mu)$ the $2^{-\ell T}$-measure that agrees with $\mu$ on all dyadic cubes of side length $2^{-\ell T}$. We also define the corresponding analog for sets: given $A \subset \mathbb{R}^d$, $R_\ell(A)$ denotes the union of all cubes in $D_\ell$ that intersect $A$.
Due to our use of dyadic cubes, it will often be convenient to deal with supports in the dyadic metric: given $\mu \in \mathcal{P}([0,1)^d)$, we let $\operatorname{supp}_d(\mu)$ be the set of points $x$ such that $\mu(D_j(x)) > 0$ for all $j$. Note that $\mu(\operatorname{supp}_d(\mu)) = 1$ and that $\operatorname{supp}_d(\mu) \subset \operatorname{supp}(\mu)$.
If a measure $\mu \in \mathcal{P}(\mathbb{R}^d)$ has a density in $L^p$, then its density is sometimes also denoted by $\mu$, and in particular $\|\mu\|_p$ stands for the $L^p$ norm of its density.
We make some further definitions. Let $\mu \in \mathcal{P}([0,1)^d)$. If $Q$ is a dyadic cube and $\mu(Q) > 0$, then we denote $\mu^Q = \operatorname{Hom}_Q(\mu|_Q)$, where $\operatorname{Hom}_Q$ is the homothety renormalizing $Q$ to $[0,1)^d$. If $M < N$ are integers, then for $x \in \operatorname{supp}_d(\mu)$ we define the corresponding measures localized at $D_M(x)$ and renormalized down to resolution $2^{-NT}$. Logarithms are always to base $2$.

Regular measures and energy.
In this section we define some important notions and prove some preliminary results.
Recall that the $s$-energy of $\mu \in \mathcal{P}(\mathbb{R}^d)$ is $E_s(\mu) = \iint |x-y|^{-s}\, d\mu(x)\, d\mu(y)$. If $\mu$ is a $2^{-\ell T}$-measure and $0 < s < d$, then the sum in the corresponding multi-scale expression runs only up to $\ell$ (in particular, the $s$-energy is finite).
Proof. First of all, by [23, Theorem 3.1], we can replace $E_s(\mu)$ by the $s$-energy on the $2^T$-ary tree, i.e. by the comparable expression obtained by replacing $|x-y|$ with $2^{-T|x\wedge y|}$, where $|x \wedge y| = \max\{j : y \in D_j(x)\}$ (both energies are comparable up to a $O_{T,d}(1)$ factor). The formula for $E_s(\mu)$ now follows from a standard calculation, see e.g. [26, Lemma 3.1] for the case $T = 1$ (the proof of the general case is identical). Finally, the case in which $\mu$ is a $2^{-\ell T}$-measure follows from another simple calculation, see e.g. [26, Lemma 3.2] for the case $T = 1$.
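The $s$-energy $E_s(\mu)=\iint|x-y|^{-s}\,d\mu(x)\,d\mu(y)$ can be approximated for a finitely supported measure by an off-diagonal double sum; the sketch below is our own crude proxy (not the paper's tree formula), illustrating that the energy increases with $s$ when all pairwise distances are below $1$:

```python
def energy_proxy(points, s):
    # off-diagonal Riesz s-energy of the uniform measure on `points`
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                total += (dx * dx + dy * dy) ** (-s / 2)
    return total / (n * n)

# a 4x4 grid inside [0, 1/2)^2, so every pairwise distance is below 1
grid = [(i / 8, j / 8) for i in range(4) for j in range(4)]
print(energy_proxy(grid, 0.5), energy_proxy(grid, 1.5))
```

Since every distance is below $1$, each term $r^{-s}$ is increasing in $s$, so the proxy energy is monotone in $s$ for this configuration; for genuinely atomic measures the true energy is infinite, which is why the paper works with $2^{-\ell T}$-measures.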
One of the key steps in the proof of the main theorems is to decompose an arbitrary 2 −T -measure in terms of measures which have a uniform tree structure when represented in base 2 T . This notion (which is inspired by some constructions of Bourgain [3]) is made precise in the next definition.
Definition 3.2. Given a sequence $\sigma = (\sigma_1, \ldots, \sigma_\ell)$, we say that a measure is $\sigma$-regular if $2^{-T(\sigma_j+1)} \le \mu(Q)/\mu(\widehat{Q}) \le 2^{-T\sigma_j}$ for every $j$ and every $Q \in D_j(\mu)$, where $\widehat{Q}$ is the only cube in $D_{j-1}$ containing $Q$.
The expression $2^{-T(\sigma_j+1)}$ in the definition may appear strange, but it turns out to be a convenient normalization. The key point in this definition is that a measure is $\sigma$-regular if all cubes of positive mass have roughly the same mass, and the sequence $(\sigma_j)$ helps quantify this common mass.
Proof. We use crude bounds which are enough for our purposes. From the definition it is clear that if $Q \in D_j(\nu)$ then the mass of $Q$ is controlled in terms of $(\sigma_1,\ldots,\sigma_j)$. This implies, in particular, a corresponding bound on the number of cubes of positive mass. From the two displayed equations and Lemma 3.1, and bounding $\sum_{j=1}^{\ell}$ by $\ell$ times the maximal term in the right-hand side, we deduce the claim.
Heuristically, the previous lemma says that for $\log E_s(\nu)$ to be small, the averages of the $\sigma_i$ must be sufficiently large. Recalling the connection of $\sigma_i$ to branching numbers, this means that the average branching number over any initial set of scales has to be sufficiently large, in a manner depending on $s$.
The following is a variant of Bourgain's regularization argument (see e.g. [3, Section 2] for a clean example). Recall that supp d (µ) denotes the dyadic support of µ.
and that $\operatorname{supp}_d(\mu)$ is the union of the $X^{(k)}$ together with $X^{(>Td)}$. Pick the smallest such $k$.
Set $X := X^{(k)}$ and $\mu' = \mu_X$. Now continue inductively, replacing $\ell$ by $\ell - 1$ and $\mu$ by $\mu'$, until we eventually get a set $X_1$ and a sequence $(\sigma_1, \ldots, \sigma_\ell) \in [-1, d-1]^\ell$. Note that for $Q \in D_j(\mu_i)$ the value of $\mu_i(Q)/\mu_i(\widehat{Q})$ remains constant for $i \le j$ and, in particular, for $i = 1$. Hence $X = X_1$ has the desired properties.
The set $X$ given by the lemma will have far too little measure for our purposes: later we will need $\mu_X(A)$ to be large (in particular nonzero) for certain sets $A$ of mass roughly $\ell^{-2}$. By iterating the construction, we are able to get a moderately long sequence of sets $X_i$ such that $\mu(\mathbb{R}^d \setminus \bigcup_i X_i)$ is much smaller than $\ell^{-2}$; by pigeonholing we will then be able to select some $X_i$ with $\mu_{X_i}(A)$ suitably large. Corollary 3.5. Fix $\ell \ge 1$, write $m = \ell T$, and let $\mu$ be a $2^{-m}$-measure on $[0,1)^d$. There exists a family of pairwise disjoint $2^{-m}$-sets $X_1, \ldots, X_N$ with $X_i \subset \operatorname{supp}_d(\mu)$, satisfying properties (i)-(iii). Moreover, the family $(X_i)_{i=1}^N$ may be constructed so that it is determined by $d, T, \varepsilon, \ell$ and $\mu$ (even though there may be other families satisfying the above properties).
Proof. Let $X_1$ be the set given by Lemma 3.4, and put $B_1 = [0,1]^d \setminus X_1$. Continue inductively: once $X_j, B_j$ are defined, let $X_{j+1}$ be the set given by Lemma 3.4 applied to $\mu_{B_j}$, and set $B_{j+1} = B_j \setminus X_{j+1}$. Let $N$ be the smallest integer such that $\mu(B_N) \le 2^{-\varepsilon m}$; such $N$ exists thanks to (3.2). It is clear that in this construction the family $X_1, \ldots, X_N$ is determined by $d, T, \varepsilon, \ell, \mu$ since the set $X$ constructed in the proof of Lemma 3.4 is determined by $d, T, \ell, \mu$.
The first part of claim (i) is immediate. Next, since the $X_i$ are pairwise disjoint and $\mu(B_N) \le 2^{-\varepsilon m}$, we have $\sum_i \mu(X_i)\mu_{X_i}(A) = \mu(A \cap \bigcup_i X_i) \ge \mu(A) - 2^{-\varepsilon m}$, so there must be $i$ such that $\mu_{X_i}(A) \ge \mu(A) - 2^{-\varepsilon m}$, as claimed. Finally, (ii) is immediate from (3.2) and the definition of $N$, and (iii) is clear since the sets $X_i$ were provided by Lemma 3.4.

Sets of bad projections.
In this subsection, we introduce sets of "bad" multiscale projections for a measure µ around a point x. The simple fact that these sets can be taken to have small measure (independently of µ and x) will play a crucial role later. Although a similar notion was introduced in [26], the sets of bad projections we use here are far more flexible and also more involved, depending on the decomposition into regular measures provided by Corollary 3.5.
Given $\theta \in S^1$, we denote the orthogonal projection $x \mapsto x \cdot \theta$ by $\Pi_\theta$. Normalized Lebesgue measure on $S^1$ will be denoted by $|\cdot|$. We recall the following consequence of the energy version of Marstrand's projection theorem. Lemma 3.6. Let $\mu \in \mathcal{P}([0,1)^2)$ have finite $1$-energy. Then, for any $R > 0$, $|\{\theta \in S^1 : \|\Pi_\theta \mu\|_2^2 > R\, E_1(\mu)\}| \lesssim 1/R$. Proof. This is just a consequence of Markov's inequality and the fact that $\int_{S^1} \|\Pi_\theta \mu\|_2^2 \, d\theta \approx E_1(\mu)$. We restate [26, Lemma 3.7] using our notation, for later reference.
Finally, if $\mu \in \mathcal{P}([0,1)^2)$ and $x \in \operatorname{supp}_d(\mu)$, we define the corresponding set of bad projections around $x$. We record the following immediate consequence of Lemma 3.9 for later use.
3.3. Radial projections. The following result was recently established by T. Orponen [21]. We state it only in the plane. We denote the radial projection with center $y$ by $P_y$, i.e. $P_y(x) = (y - x)/|y - x| \in S^1$ is the (oriented) direction determined by $x$ and $y$. Proposition 3.11. Let $\mu, \nu \in \mathcal{P}([0,1)^2)$ be measures with disjoint supports, such that $E_s(\mu) < \infty$, $E_u(\nu) < \infty$ for some $u > 1$, $2 - u < s < 1$. Then there is $p = p(s, u) > 1$ such that $P_x\nu$ is absolutely continuous with a density in $L^p(S^1)$ for $\mu$-almost all $x$, with a corresponding quantitative bound. Proof. This is stated in [21, Equation (3.5)], except that Orponen deals with the weighted measures $d\mu_y = |x-y|^{-1}\, d\mu(x)$ instead of $\mu$ (note that the roles of $\mu$ and $\nu$ are interchanged in [21]). Since the weight $|x-y|^{-1}$ is bounded away from $0$ and $\infty$ by the assumption that the supports of $\mu$ and $\nu$ are bounded and disjoint, the claim also holds for $\mu$.
We point out that Proposition 3.11 uses the Fourier transform, and is the only point in the proofs of Theorems 1.2 and 1.3 that does (on the other hand, the proof of Theorem 1.4 relies heavily on the strongly Fourier-analytic approach of Mattila-Wolff).

BOX-COUNTING ESTIMATES FOR PINNED DISTANCE SETS
In this section we derive a lower bound on box-counting numbers of pinned distance sets that will be crucial in the proofs of Theorems 1.2, 1.3 and 1.4. Our estimate will be in terms of a multiscale decomposition where, unlike previous works in the literature, we are allowed to choose the sequence of scales (depending on the set or measure for which we are seeking estimates). This additional flexibility will ultimately allow us to improve upon the easy bounds on the dimensions of distance sets.
To begin, we recall some basic facts about entropy. If $\nu \in \mathcal{P}(\mathbb{R}^d)$ and $\mathcal{A}$ is a finite partition of $\mathbb{R}^d$ (or of a set of full $\nu$-measure), then the entropy of $\nu$ with respect to $\mathcal{A}$ is given by $H(\nu, \mathcal{A}) = -\sum_{A \in \mathcal{A}} \nu(A)\log\nu(A)$, with the usual convention $0 \cdot \log 0 = 0$. It follows from the concavity of the logarithm that one always has $H(\nu, \mathcal{A}) \le \log|\mathcal{A}|$. Hence, a lower bound for $H(\nu, D_j)$ provides a lower bound for $N(A, j)$ if $A$ is a Borel set of full measure (recall that $N(A, j)$ denotes the number of elements in $D_j$ that intersect $A$). We will apply this when $\nu$ is supported on a pinned distance set. Although box-counting numbers in principle give bounds only for box dimension, together with standard mass pigeonholing arguments we will be able to get bounds also for Hausdorff and packing dimension.
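The entropy facts just recalled are elementary; a minimal sketch (base-2 entropy, our own helper) illustrating the bound $H(\nu,\mathcal{A}) \le \log|\mathcal{A}|$, with equality exactly for the uniform distribution:

```python
import math

def entropy(probs):
    # H(nu, A) = -sum nu(A) log2 nu(A), with the convention 0 * log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

# concavity of the logarithm gives H <= log2(number of cells)
uniform = [0.25] * 4
skewed = [0.7, 0.2, 0.1, 0.0]
print(entropy(uniform), entropy(skewed))
```

A measure spread over few cells (or concentrated on one) has small entropy, which is why a lower entropy bound for $\nu$ with respect to $D_j$ forces many cubes of $D_j$ to carry mass, and hence bounds $N(A, j)$ from below.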
The following proposition is the key device that will allow us to bound from below the entropy of pinned distance measures (and hence also the box-counting numbers of pinned distance sets). Roughly speaking, we bound the entropy of the projection of a measure µ under the pinned distance map by an average over both scales and space (the latter, weighted by µ) of a quantity involving the L 2 norms of projected local pinned distance measures. We emphasize that this method to bound the dimension of (linear or nonlinear) projections from below goes back in various forms to [10,9,22], although the use of projected L 2 norms (rather than projected entropies) was first used in [26].
Before stating the proposition we introduce some definitions. Given $L \in \mathbb{N}$, a good partition of $(0, L]$ is an integer sequence $0 = N_0 < N_1 < \cdots < N_q = L$ with $N_{i+1} - N_i \le N_i + 1$ for all $i$. Proposition 4.1. Let $\mu \in \mathcal{P}([0,1)^d)$, let $y \in \mathbb{R}^d$ be at distance $\ge \varepsilon$ from $\operatorname{supp}(\mu)$, and fix a good partition $(N_i)_{i=0}^q$ of $(0, \ell]$. Then (4.1) holds, where $x_Q$ is an arbitrary point in $Q$.
Proof. Write $D_i = N_{i+1} - N_i$. Note that our $D_i$ correspond to $T D_i$ and our $T N_i$ to $m_i$ in [26]. Recall also that $\mu^Q$ denotes the magnification of $\mu|_Q$ to the unit cube. It is shown in [26, Proposition 3.8 and Remark 3.10] that (4.2) holds. Applying Lemma 3.7 to $\nu = \mu^Q$ for some $Q \in D_{N_i}$ and $k = D_i$, we get (4.3). On the other hand, a simple convexity argument (see [26, Lemma 3.6]) yields a corresponding estimate for any $\nu \in \mathcal{P}(\mathbb{R})$ and $k \in \mathbb{N}$. Applying this with $k = D_i$ and $\nu = \Pi_{\theta(y, x_Q)}\mu^Q$, and recalling (4.3), we deduce the required bound. Using this bound in each term in the right-hand side of (4.2), and absorbing the sum of the $q \cdot O(1)$ terms into $O_{T,\varepsilon}(q)$, we get the claim.
We remark that the assumption that $N_{j+1} - N_j \le N_j + 1$ in the definition of good partition (which will play a crucial role later) arises from the linearization of the distance function, and cannot be substantially weakened. The key advantage of having $L^2$ norms instead of entropies in this proposition is that the estimate one gets is robust under passing to subsets of moderately large measure: Proposition 4.2. With the assumptions and notation from Proposition 4.1, let us write $\mathcal{F}(\mu)$ for the right-hand side of (4.1) (we assume $y$ and the partition $(N_i)_{i=0}^q$ are fixed). Proof. We start with the trivial observation that if $\rho, \rho' \in \mathcal{P}(\mathbb{R}^d)$ have an $L^2$ density and $\rho'(S) \le K\rho(S)$ for all Borel sets $S$, then the same bound transfers over to the densities for a.e. point, and so $\|\rho'\|_2^2 \le K^2 \|\rho\|_2^2$. Let $\zeta = 1/(\ell T) \in (0,1)$. Fix $i \in \{0, \ldots, q-1\}$, and note that a corresponding domination holds for any Borel set $S \subset [0,1)^2$. This domination is preserved under push-forwards and the action of $R_{D_i}$ (where as before $D_i = N_{i+1} - N_i$), so in light of our initial observation we get (4.4). On the other hand, for any $2^{-TD}$-measure $\rho$ on $\mathbb{R}$ one has $\|\rho\|_2^2 \le 2^{TD}$. In light of Lemma 3.7, this implies a corresponding bound. Splitting (for each $i$) the sum over $Q \in D_{N_i}$ in Proposition 4.1 into the cubes with $\nu(Q) \ge \zeta\mu(Q)$ and $\nu(Q) < \zeta\mu(Q)$, and recalling (4.4), we arrive at the desired estimate, where we merged the sum of the (log of the) implicit constants in (4.6) into $O_{T,\varepsilon}(q)$.
Our next goal is to get a simpler lower bound in the context of Proposition 4.2 when µ is σ-regular (recall Definition 3.2), and ν is the restriction of µ to the set of points which are not bad in the sense of §3.2. Combining the results of §3.2 and §3.3, we will later be able to deal with general measures via a reduction to this special case.
We require some additional definitions. Given a finite sequence $(\sigma_1, \ldots, \sigma_L) \in \mathbb{R}^L$, we introduce the associated averages. For any good partition $\mathcal{P} = (N_j)_{j=0}^q$ of $(0, L]$ and any $\sigma \in \mathbb{R}^L$, we denote the corresponding quantity $M(\sigma, \mathcal{P})$, where $\sigma|I$ denotes the restriction of the sequence $\sigma$ to the interval $I$. Finally, given $\sigma \in \mathbb{R}^L$ and $\tau \in (0,1)$, we let $M_\tau(\sigma)$ be the infimum over $\tau$-good partitions. Recall that $o_{T,\varepsilon}(1)$ denotes a function of $T$ and $\varepsilon$ which tends to $0$ as $T \to \infty$, $\varepsilon \to 0^+$. Fix $i_0$ as the smallest value of $i$ such that $N_i \ge \beta\ell$, and note that $N_{i_0} < 2\beta\ell + 1$.
Let us rewrite the inequality from Proposition 4.2 applied to $\rho$ and $\rho_A$, where $x_Q$ are arbitrary points in $Q$. By assumption, we may choose these points to be good in the sense of Definition 3.8. Using that $(1+\tau)^q \le \ell$, we bound the term $\Sigma_I$. Now, to estimate the main term $\Sigma_{II}$, we need to go back to Definition 3.8. By (4.8), and using that $\mathcal{P}$ is a $\tau$-good partition of $(0, \ell]$, we control the directions $\theta(y, x_Q)$. On the other hand, by the assumption that $\rho$ is $(\sigma_1, \ldots, \sigma_\ell)$-regular, and since $\mathcal{P}$ is a good partition of $(0, \ell]$, the measure $\rho(Q; N_{i+1})$ is $(\sigma_{N_i+1}, \ldots, \sigma_{N_{i+1}})$-regular. Hence, using Lemma 3.3, we obtain a bound for each term. Combining the last two displayed formulas, adding up from $i = i_0$ to $q - 1$ and again using $q = O_\tau(\log\ell)$, we get (4.11). Combining (4.9), (4.10) and (4.11), we conclude the desired bound, where Error is as in the statement. Recall that $\mathcal{F}(\mu)$ denotes the right-hand side of (4.1) in Proposition 4.1. Now Proposition 4.1 guarantees a lower bound on the entropy. Since $H(\mu, \mathcal{A}) \le \log|\mathcal{A}|$ for any finite Borel partition $\mathcal{A}$ of a set of full $\mu$-measure, this finishes the proof.
Note that in this proposition, the sequence $\sigma$ depends on the measure $\rho$ and the bound is in terms of $M_\tau(\sigma)$ (we will be able to make the error term arbitrarily small). Thus we are led to the combinatorial problem of minimizing $M(\sigma, (N_i))$ over all $\tau$-good partitions for a given $\sigma \in [-1, 1]^\ell$. This problem will be tackled in the next section: see Proposition 5.23, and also Proposition 5.24 for the case in which we are allowed to restrict $\sigma$ to $(0, L]$ for some large $L$.
Definition 5.1. A sequence (a_n)_{n=0}^∞ is a partition of [0, a] if a = a_0 > a_1 > . . . > 0 and a_n → 0; it is a good partition if we also have a_{k−1}/a_k ≤ 2 for every k ≥ 1.
A sequence (a_n)_{n=0}^∞ is a τ-good partition for a given 0 < τ < 1 if it is a good partition and we also have a_{k−1}/a_k ≥ 1 + τ for every k ≥ 1.
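As a quick concrete check of these definitions, the following Python sketch (the helper names are ours, not from the paper) verifies the good and τ-good conditions on a finite initial segment of a sequence:

```python
def is_good_partition(a, tol=1e-12):
    """Check a_0 > a_1 > ... > 0 with a_{k-1}/a_k <= 2 (finite prefix)."""
    return all(x > 0 for x in a) and all(
        a[k - 1] > a[k] and a[k - 1] / a[k] <= 2 + tol for k in range(1, len(a)))

def is_tau_good_partition(a, tau, tol=1e-12):
    """Additionally require a_{k-1}/a_k >= 1 + tau."""
    return is_good_partition(a, tol) and all(
        a[k - 1] / a[k] >= 1 + tau - tol for k in range(1, len(a)))

# the dyadic sequence a_n = a / 2^n is the prototypical example
dyadic = [1.0 / 2 ** n for n in range(20)]
print(is_good_partition(dyadic), is_tau_good_partition(dyadic, 0.5))  # True True
```

The dyadic sequence has consecutive ratio exactly 2, so it is good and τ-good for every τ < 1.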
Let f : [0, a] → R be continuous and (a_n) be a partition of [0, a]. By the total drop of f according to (a_n) we mean T(f, (a_n)) = Σ_{n=1}^∞ ( f(a_n) − min_{[a_n, a_{n−1}]} f ), and we also introduce the notation T(f) = inf{ T(f, (a_n)) : (a_n) is a good partition of [0, a] } and T_τ(f) = inf{ T(f, (a_n)) : (a_n) is a τ-good partition of [0, a] }. We call the interval [a_n, a_{n−1}] increasing if min_{[a_n, a_{n−1}]} f = f(a_n) and decreasing if min_{[a_n, a_{n−1}]} f = f(a_{n−1}). (Note that f need not be increasing or decreasing on [a_n, a_{n−1}].) In this section we investigate the following question: given a 1-Lipschitz function f : [0, a] → R satisfying certain bounds, how large can T(f) and T_τ(f) be?
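The total drop along a given (truncated) partition is straightforward to compute numerically; the sketch below (our own code, with interval minima approximated by sampling) illustrates the definition on two simple 1-Lipschitz functions:

```python
def total_drop(f, a, n_samples=2000):
    """T(f, (a_n)) for a finite decreasing sequence a = [a_0, a_1, ...]:
    the sum over n >= 1 of f(a_n) - min over [a_n, a_{n-1}] of f."""
    drop = 0.0
    for n in range(1, len(a)):
        lo, hi = a[n], a[n - 1]
        m = min(f(lo + (hi - lo) * i / n_samples) for i in range(n_samples + 1))
        drop += f(a[n]) - m
    return drop

dyadic = [1.0 / 2 ** n for n in range(30)]
# increasing f: every interval is increasing, so the total drop vanishes
print(total_drop(lambda x: x, dyadic))                   # 0.0
# decreasing f: the drops telescope to f(0+) - f(a)
print(round(total_drop(lambda x: 1.0 - x, dyadic), 6))   # 1.0
```

These two cases bracket the general behavior: increasing intervals contribute nothing, while a run of decreasing intervals contributes a telescoping sum.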
First we study T(f ). Later we show (see Corollary 5.20) that for small τ the quantities T(f ) and T τ (f ) are close. Finally, from the bounds on T τ (f ) we deduce corresponding bounds on M τ (σ): see for example Proposition 5.23. Hence this problem is closely related to that of minimizing the dimension loss when estimating the dimension of the pinned distance set via Proposition 4.4. Dealing first with Lipschitz functions rather than [−1, 1]-sequences allows us to avoid certain technicalities and make the arguments more transparent.
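To build intuition for T(f), it can be approximated by a shortest-path recursion: restrict the partition points to a fine geometric grid on [ε, a], allow a step down from one grid point to another only when their ratio is at most 2, and minimize the accumulated drop (the truncated tail contributes O(ε) since f is 1-Lipschitz). This is purely our illustrative sketch, not part of the paper's argument:

```python
import math

def approx_T(f, a, eps=1e-3, pts_per_octave=16, n_samples=200):
    """Approximate T(f): dynamic program over good partitions restricted to a
    geometric grid g_0 = a > g_1 > ... >= eps (consecutive ratio <= 2)."""
    r = 2.0 ** (1.0 / pts_per_octave)
    M = int(math.ceil(math.log(a / eps) / math.log(r)))
    g = [a / r ** i for i in range(M + 1)]

    def interval_min(lo, hi):
        return min(f(lo + (hi - lo) * i / n_samples) for i in range(n_samples + 1))

    best = [float("inf")] * (M + 1)  # best[i]: cheapest total drop on [~0, g[i]]
    best[M] = 0.0                    # ignore the tail below eps (costs O(eps))
    for i in range(M - 1, -1, -1):
        for j in range(i + 1, min(i + pts_per_octave, M) + 1):  # g[i]/g[j] <= 2
            best[i] = min(best[i], f(g[j]) - interval_min(g[j], g[i]) + best[j])
    return best[0]

tent = lambda x: min(x, 1.0 - x)      # 1-Lipschitz tent function on [0, 1]
print(round(approx_T(tent, 1.0), 3))  # close to 1/3
print(approx_T(lambda x: x, 1.0))     # 0.0: an increasing f has zero drop
```

For the tent function the optimal strategy descends from 1 to 2/3 (paying the drop 1/3 across the decreasing part), then jumps to 1/3 and continues for free on the increasing part; this matches the extremal value (1 − 2D)/3 with D = 0 discussed later in this section.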
The basic result is the following. Proof. Since f(a) ≥ Da and a > 0, the second inequality of (5.1) is clear, so it is enough to prove the first inequality.
Note that (5.2) holds. We will construct a good partition (a_n) with the following two extra properties: (*) every interval [a_n, a_{n−1}] (n = 1, 2, . . .) is either increasing or decreasing (recall Definition 5.1), and (**). First we show that this is enough to prove our claim. Let a = a'_0 > a'_1 > . . . be the endpoints of the union of each maximal block of consecutive intervals of the same type (increasing or decreasing). It easily follows from the definitions and telescoping that T(f, (a_n)) = T(f, (a'_k)). Hence to obtain (5.1) it is enough to prove the corresponding bound for (a'_k). We claim that (5.4) holds. Indeed, by construction, the interval [a'_k, a'_{k−1}] is either increasing or decreasing. If it is increasing then the corresponding term vanishes. If [a'_{k+1}, a'_k] is decreasing then, using first (**) and the fact that ρ < 1, and then (5.2), we get the required estimate, which completes the proof of (5.4).
Therefore it is enough to construct a good partition (a n ) with properties (*) and (**). Let a 0 = a and suppose that a 0 > . . . > a n > 0 are already constructed with properties (*) and (**) (up to n).
Case 1. min_{[a_n/2, a_n]} f < f(a_n). In this case let a_{n+1} ∈ [a_n/2, a_n] be the smallest number such that f(a_{n+1}) = min_{[a_n/2, a_n]} f. Then [a_{n+1}, a_n] is an increasing interval, and so (*) and (**) still hold and we can continue the procedure.
Case 2. min_{[a_n/2, a_n]} f = f(a_n) and f(a_n/2) − f(a_n) ≤ h · (a_n − a_n/2). In this case let a_{n+1} = a_n/2; again (*), (**) hold for the extended sequence and we can continue the procedure.
Case 3. min_{[a_n/2, a_n]} f = f(a_n) and f(a_n/2) − f(a_n) > h · (a_n − a_n/2).
and this implies h ≥ −D.
Since h ≥ −D and f(x) ≥ Dx, we have f(a_n) ≥ Da_n ≥ −ha_n. This and the assumption f(a_n/2) − f(a_n) > h · (a_n − a_n/2) imply that there exists a largest b ∈ [0, a_n/2) such that (5.5) holds. Suppose that b_0 = b < b_1 < . . . < b_m ≤ a_n are already constructed and (5.6) holds for M = m. If b_m ≥ a_n/2 then we can take b_{m+1} = a_n and M = m + 1; the construction is then completed and (5.6) holds. Now consider the case b_m < a_n/2.
Using that b is the largest number in [0, a_n/2] for which (5.5) holds, b_m ≥ b and f(a_n/2) − f(a_n) > h · (a_n − a_n/2), we get (5.7).
Using (5.7), the last inequality and D < C imply a further bound. Hence, using also that f is 1-Lipschitz and b_m < a_n/2, we obtain the desired inequality. This completes the proof of (5.8) and so also the proof of b_{m+1} > b_m. It is easy to see that (5.6) holds for M = m + 1. Note also that the property b_i/b_{i−2} ≥ 2 implies that the construction of the sequence (b_i) is completed after finitely many steps. Now, to finish Case 3 we take a_{n+j} = b_{M−j} for j = 1, . . . , M. Then (*) and (**) hold (up to n + M) and so the procedure can be continued.
This way we obtain a sequence a = a_0 > a_1 > . . . > 0 that forms a good partition with (*) and (**), provided a_n → 0. Therefore it remains to prove that a_n → 0.
Since a_{n+1} = a_n/2 when Case 2 is applied and a_{n+M} = b_0 = b ≤ a_n/2 in Case 3, we are done if Case 2 or Case 3 is applied infinitely many times. It is easy to see that if both a_{n+1} and a_{n+2} were obtained from Case 1, then we have a_n/a_{n+2} ≥ 2. Thus a_n → 0, which completes the proof.

Small drop on initial segments.
The results in this subsection are required in the proof of Theorem 1.3. We aim to minimize T(f|[0, u])/u, where u > 0 is a new parameter that we are allowed to choose, subject to not being too small. The analysis will be strongly based on the study of hard points, which we now define: a point p ∈ (0, a] is a hard point of f if f(x) ≥ f(p) for every x ∈ [p/2, p]. We will say that a function f defined on an interval I is piecewise linear if I can be decomposed into finitely many intervals such that f is linear on each of them.
(i) The set of hard points of f can be written as a union of intervals [u_j, v_j], and every closed subinterval of (0, a] intersects only finitely many [u_j, v_j]. (ii) We have (5.9), where the empty sum is meant to be zero.
Proof. The first statement is easy, using that f is piecewise linear. First we prove ≥ in (5.9). Let (a_n) be a good partition of [0, a] and let a = a'_0 > a'_1 > . . . be an ordered enumeration of the set {a_n} ∪ {u_j} ∪ {v_j}. It is easy to check that (a'_n) is also a good partition of [0, a], and that by inserting a hard point of f into a good partition (a_n), the value of T(f, (a_n)) is not changed. Thus T(f, (a'_n)) = T(f, (a_n)). Now every [u_j, v_j] is of the form [u_j, v_j] = ∪_{n=n_j}^{m_j} [a'_n, a'_{n−1}]. Since f must be nonincreasing on any interval [u_j, v_j], we obtain f(u_j) − f(v_j) = Σ_{n=n_j}^{m_j} ( f(a'_n) − min_{[a'_n, a'_{n−1}]} f ) for every j. Adding up, and using that f(a'_n) − min_{[a'_n, a'_{n−1}]} f ≥ 0 and T(f, (a'_n)) = T(f, (a_n)), we get the claim.
To prove the other inequality we construct by induction a good partition of [0, a] such that T(f, (a_n)) ≤ Σ_j ( f(u_j) − f(v_j) ). Let a_0 = a. Suppose that a_0, . . . , a_n are already defined.
Case 1. If a_n ∈ (u_j, v_j] for some j then choose k ≥ 1 and a_n > a_{n+1} > . . . > a_{n+k} = u_j so that a_{n+i−1}/a_{n+i} ≤ 2 for i = 1, . . . , k.

Case 2.
Otherwise let a_{n+1} ∈ [a_n/2, a_n] be the smallest number for which f(a_{n+1}) = min_{[a_n/2, a_n]} f. We claim that a_{n+1} < a_n. If a_n ∉ H then this is clear from the definition. Since the only points of H that are not handled in the previous case are the left endpoints of the intervals [u_j, v_j], we can suppose that a_n = u_j for some j. By the piecewise linearity of f, there exists w ∈ (u_j/2, u_j) such that f is linear on [w, u_j] and w > v_{j+1}. Since u_j is a hard point, f cannot be increasing on [w, u_j]. If f is constant on [w, u_j] then a_{n+1} ≤ w < u_j = a_n, so we are done. So we can suppose that f is decreasing on [w, u_j]. Thus indeed a_{n+1} < u_j = a_n.
Note that if Case 2 was applied to obtain both a_{n+1} and a_{n+2} then a_n/a_{n+2} ≥ 2. This implies that a_n → 0, so (a_n) is a good partition of [0, a]. It remains to show that T(f, (a_n)) ≤ Σ_j ( f(u_j) − f(v_j) ). If a_n was obtained in Case 1 then [a_n, a_{n−1}] is a subinterval of some [u_j, v_j] and f(a_n) − min_{[a_n, a_{n−1}]} f = f(a_n) − f(a_{n−1}). If a_n was obtained in Case 2 then f(a_n) − min_{[a_n, a_{n−1}]} f = 0. Note also that f is nonincreasing on each [u_j, v_j], which completes the proof. The next proposition (or rather, the discrete corollary given in Proposition 5.24 below) will be crucial to get estimates on the packing dimension of the pinned distance sets.
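The description of T(f) through hard points can also be explored numerically. In the sketch below (ours), we take "p is hard" to mean that f attains its minimum over [p/2, p] at p, detect maximal runs of hard points on a grid, and sum the drops f(u_j) − f(v_j); for the tent function f(x) = min(x, 1 − x) this recovers the value 1/3 = (1 − 2·0)/3 appearing below.

```python
def is_hard(f, p, n=200, tol=1e-9):
    """A point p is 'hard' if f attains its minimum over [p/2, p] at p."""
    m = min(f(p / 2 + (p / 2) * i / n) for i in range(n + 1))
    return f(p) <= m + tol

def hard_interval_drop(f, a, grid=2000):
    """Sum of f(u_j) - f(v_j) over maximal runs [u_j, v_j] of hard points,
    detected on a uniform grid of (0, a]."""
    pts = [a * (i + 1) / grid for i in range(grid)]
    total, run_start = 0.0, None
    for i, p in enumerate(pts):
        if is_hard(f, p):
            if run_start is None:
                run_start = p
        elif run_start is not None:
            total += f(run_start) - f(pts[i - 1])
            run_start = None
    if run_start is not None:
        total += f(run_start) - f(pts[-1])
    return total

tent = lambda x: min(x, 1.0 - x)
print(round(hard_interval_drop(tent, 1.0), 2))  # 0.33: matches T(tent) = 1/3
```

For the tent function the hard points form the single interval [2/3, 1] (below 2/3 the minimum over [p/2, p] is attained at p/2), so the sum reduces to f(2/3) − f(1) = 1/3.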
Then for every δ ∈ (0, 1/2) there exists u ∈ [3aΦ(D)2^{−1/δ}, a] such that (5.10) holds. Proof. Let H ⊂ [0, a] be the set of hard points of f. If H = ∅ then by Lemma 5.4, T(f) = 0, so u = a is clearly a good choice in this case. So suppose that H is nonempty. First we briefly explain the idea of the proof in this nontrivial case. For simplicity, suppose that a = 1 and D = 0, which is the most interesting case anyway. Assume that the maximum of f(x)/x on H ∩ (0, 1] exists and is attained at u, and let B be this maximum. Since u is a hard point, f(x) ≥ f(u) = Bu on [u/2, u], and a calculation using that f is 1-Lipschitz shows that the claim holds. Unfortunately, f(x)/x may not have a maximum on H ∩ (0, 1] and, even if it does, we might get a u which is too small. To avoid these problems we replace f(x)/x by f(x)/x + δ log x. Then we can show that u exists, is not too small, and it still satisfies the claim of the proposition.
We now continue with the actual proof. Note that H is a closed set, and let h = max H. By Lemma 5.4, T(f) is controlled by f|[0, h]. If h < 3aΦ(D) then, applying Proposition 5.2 on [0, h] with C = 1, we get the required bound, so u = a is a good choice in this case. Therefore in the rest of the proof we can suppose that h ≥ 3aΦ(D). Let φ(x) = f(x)/x + δ log x. (Recall that in this paper log denotes log_2.) Since f is nonnegative and 1-Lipschitz, 0 ≤ f(x)/x ≤ 1 on (0, a], so for any x ∈ (0, 2^{−1/δ}h) we have φ(x) < φ(h). Now we claim the following. To prove this we define a sequence u_0 > u_1 > . . . ∈ H inductively. Let u_0 = h. Suppose that u_n ∈ H is already defined. Let v ∈ H ∩ [δu_n, u_n] be the largest number such that φ(v) = max_{H∩[δu_n, u_n]} φ. If v = u_n then let N = n and the procedure is terminated.
Let u be chosen according to (5.13). Then, using that h ≥ 3aΦ(D), we have u ≥ 2^{−1/δ}h ≥ 2^{−1/δ} · 3aΦ(D), so the requirement u ∈ [3aΦ(D)2^{−1/δ}, a] is satisfied. Thus it remains to prove (5.10). Let F(x) = min(f(x), 2Bx). Since u is chosen according to (5.13), we have (5.14). Now we claim that every p ∈ H ∩ [δu, u] is also a hard point of F. Suppose, on the contrary, that p ∈ H ∩ [δu, u] is not a hard point of F. Then there exists a q ∈ [p/2, p] such that F(q) < F(p). By (5.14) we have f(p) ≤ Bp ≤ 2Bp, so by definition F(p) = f(p), and consequently we have 2Bq ≥ Bp ≥ f(p) > F(q), which implies that f(q) = F(q). Thus f(q) < f(p), so p cannot be a hard point of f, which is a contradiction.
So in the rest of the proof we may assume that (5.17) f (u)/u ≥ −δ log δ.
Since δ < 1/2 this also implies that f (u)/u > δ. Putting this together with the fact that u was chosen according to (5.13), and with the inequality log y ≤ y − 1, we get that if x ∈ H ∩ [δu, u), then Since u is a hard point, f (x) ≥ f (u) on [u/2, u], and so (5.18) implies that H∩[u/2, u) = ∅.
Again because u is a hard point, f(u/2) ≥ f(u). Using this, δ < 1/2 and the fact that f is 1-Lipschitz, we get a lower bound for f near u/2. Using again that f is 1-Lipschitz, let C = min(2B, 1). Note also that D ≤ f(u)/u = B + δ log δ < B, and so D ≤ C/2 by our assumptions, so we can apply Proposition 5.2 to get (5.20). Note that (C/2 − D)/(1 + 2C − 3D) < 1. Using calculus, we can bound (1 − C)(C/2 − D)/(1 + 2C − 3D). Combining the above inequality with (5.15) and (5.20), we get (5.10).
then T(f ) = (1 − 2D)/3. In this section we prove a quantitative stability result (Proposition 5.15) for D ∈ [0, 1/3], stating that if T(f ) is close to (1 − 2D)/3 then f (x) must be close to the above function when x is not too far from 0 or from 1.
The general plan to get this result is the following. Let b = min_{[1/2,1]} f and choose a ∈ [1/2, 1] such that f(a) = b. It is easy to see that T(f) = T(f|[0, a]), so it is enough to study f|[0, a] instead of f. We need to get an upper estimate on T(f) when f is not close enough to the function defined in the previous paragraph. This upper estimate will be obtained by finding a point p ∈ [0, a] such that in the good partition in the definition of T(f), the points a_n in [p, a] can be chosen such that min_{[a_n, a_{n−1}]} f = f(a_{n−1}), and so for these indices the sum of the terms f(a_n) − min_{[a_n, a_{n−1}]} f is f(p) − f(a) or, in other words, the smallest possible. Combining this with a near optimal good partition for f|[0, p] guaranteed by Proposition 5.2, we get a near optimal upper estimate for T(f) for all f with such a special point p and value f(p). These points p will be called simple points, and after proving the above described near optimal upper estimate, most of the proof will be about hunting a simple point such that the estimate we obtain for T(f) is the upper estimate we claim.
First we collect some assumptions and define precisely the above mentioned notion of simple points. A point p ∈ [0, a] is called simple if there exists a finite sequence p = p_0 < p_1 < . . . < p_k = a such that (5.22) holds. Lemma 5.7. If (5.21) holds and p ∈ [0, a] is a simple point then the stated bound holds. Hence for any δ > 0 there exists a good partition (a_n) of [0, p] with nearly optimal total drop. Since p is simple there exists a finite sequence p = p_0 < p_1 < . . . < p_k = a such that (5.22) holds. For n ≤ k let a'_n = p_{k−n} and for n > k let a'_n = a_{n−k}. Then (a'_n) is a good partition of [0, a] satisfying the required bound, which completes the proof. Lemma 5.8. If (5.23) holds, i.e. for every z ∈ [p, a/2) there exists y ∈ (z, 2z] such that f(y) ≤ f(z), then p is simple.
Proof. Let p_0 = p. Suppose that n ≥ 0 and p_0 < . . . < p_n are defined such that (5.22) holds for k = n. If p_n ≥ a/2 then let p_{n+1} = a and we are done. Otherwise, let p_{n+1} ∈ [p_n, 2p_n] be the largest number such that f(p_{n+1}) = min_{[p_n, p_{n+1}]} f. By (5.23) we also have p_{n+1} > p_n. It remains to check that the procedure terminates, which follows from the simple observation that p_{n+2} ≥ min(2p_n, a) by definition. Suppose now that T(f) > (1 − 2D)/3 − δ for some δ ∈ (0, a/3). Proof. First note that δ < a/3 implies that t_0 < a. Since f(0) < −2 · 0 + a + b and f(a) ≥ −2 · a + a + b, there exists a t ∈ (0, a] such that f(t) = −2t + a + b. By Lemma 5.9, t is a simple point, so writing α = (1 − 2D)/(3(1 − D)) and using Lemma 5.7, we get an upper bound. Combining this with the assumption T(f) > (1 − 2D)/3 − δ and multiplying through, we obtain an inequality which can be rewritten appropriately. By Lemma 5.10, this implies t < (a + b)/3 + δ(1 − D) = t_0. Using this and the 1-Lipschitz property of f, we obtain the stated estimate. Using again that f is 1-Lipschitz, this gives the claim.
It is useful to note that by the 1-Lipschitz property of f, the assumptions u ≤ a, f(u) = v and f(a) = b imply that u + v ≤ a + b, and so u/2 ≤ (a + b − v)/2. By Lemma 5.8 it is enough to check (5.23). So let z ∈ [p, a/2). We distinguish three cases.
First suppose that z ≥ (a + b − v)/2. Then, using that f((a + b − v)/2) ≥ v, f is 1-Lipschitz, 2z < a and f(a) = b, we get the required y. Therefore (5.23) holds in this case.
Finally, suppose that z ∈ [p, u/2). Then the required y exists, which completes the proof.
If v < 0 then the claim is clear, so we can suppose that v ≥ 0. By Lemma 5.11, f(t_0) > v. Thus if the claim is false then there exists a u ∈ (t_0, 2t_0 − 6δ(1 − D)] such that f(u) = v.
By (5.21), we have b ≤ a/2, which together with f(u) = v implies that all the assumptions of Lemma 5.12 hold, so we get that u is simple.
Then by Lemma 5.7 we obtain an upper bound for T(f). Note that Lemma 5.10 implies that b + (1 − 2D)/3 ≥ (a + b)/3, so we obtain v > (a + b)/3 − 2δ(1 − D), which is a contradiction.
From the last lemma and the Lipschitz property of f one can easily derive a good lower estimate also on [2t_0 − 6δ(1 − D), a]. However, the next lemma will lead to an even better (and, as we will see later, sharp) estimate on the right part of [0, a]. Lemma 5.14. Suppose that (5.21) holds. Proof. Since f(0) < −2 · 0 + u + v and f(a) ≥ −2 · a + u + v, there exists a p ∈ (0, a] such that f(p) = −2p + u + v. First we prove that p is a simple point. To get this, by Lemma 5.12, it is enough to check that p ≤ (a + b − v)/2 and that f(x) ≥ v on [p, (a + b − v)/2]. Since f is 1-Lipschitz, this implies that p < t_0. Using this, u + v ≤ a + b and finally the assumption u ≥ 2v + 6δ(1 − D), we get two inequalities. Taking the linear combination of these inequalities with weights 2/3 and 1/3, we get a further bound. By the assumption u ≥ 2v + 6δ(1 − D), this gives f(x) ≥ v on [p, t_0]. Using that f is 1-Lipschitz and then (5.24), we get that on [t_0, a] we have 2x + f(x) ≥ 2t_0 + f(t_0) > a + b, which implies that f(x) ≥ v also on [t_0, (a + b − v)/2]. Therefore, by Lemma 5.12, p is indeed a simple point. Now Lemma 5.7 gives an upper bound for T(f). Combining this with the assumption on T(f), we conclude an inequality which implies (using also that D < 1/2) the claim.
The following proposition provides a global quantitative estimate for functions f : [0, 1] → R for which T(f) is close to the maximum possible value.
It remains to prove the lower estimate of (5.28). Lemma 5.14 (for f|[0, a]) gives that every point of the graph of f|(a/2, a] must lie above one of two explicit lines. On the other hand, a calculation using δ ≤ 1/21, D ≤ 1/3, a ≤ 1 and (5.30) completes the comparison. Using the 1-Lipschitz property of f, this gives the lower estimate of (5.28).
Moreover, f(x) = 1 + D − x on the final interval. The following corollary can be seen as a version of Proposition 5.15 that is closer to the kind of estimates we will need in the proof of Theorem 1.4.

Total drop for τ -good partitions.
In this subsection we show that for small τ, allowing only τ-good partitions (recall Definition 5.1) does not change too much the smallest possible total drop; see Corollary 5.20 below. We begin with a lemma that will allow us to obtain τ-good partitions from partitions that satisfy a weaker property, with a controlled change in the total drop. Lemma 5.18. Let f : [0, a] → R be a 1-Lipschitz function and (a_n) be a good partition of [0, a]. Suppose that τ > 0, K > 1 is an integer, (1 + τ)^K < 2 and a_{n−K}/a_n ≥ 2 for every n ≥ K. Then T_τ(f) < T(f, (a_n)) + 6K(K − 1)τa.
Proof. Fix i ∈ N_0 and consider the numbers β_j = a_{iK+j−1}/a_{iK+j} (j = 1, . . . , K). Then β_1 · · · β_K = a_{iK}/a_{iK+K} ≥ 2 and for every j we have 1 < β_j ≤ 2. The goal is to make every β_j at least 1 + τ so that each of them remains at most 2, the product β_1 · · · β_K stays fixed, and the numbers a_{iK+j} are changed by only a small amount.
So let β'_j = 1 + τ if β_j ≤ 1 + τ, and to get the remaining β'_j's decrease some of the corresponding β_j's (and choose β'_j = β_j for the rest), so that still β'_j ≥ 1 + τ and β'_1 · · · β'_K = β_1 · · · β_K; this is possible since (1 + τ)^K < 2. Then let a'_{iK} = a_{iK} and for each j = 1, . . . , K let a'_{iK+j} = a'_{iK}/(β'_1 · · · β'_j). Note that a'_{iK+K} = a_{iK+K}, that 1 + τ ≤ a'_{iK+j−1}/a'_{iK+j} ≤ 2 for every j = 1, . . . , K, and that each a_{iK+j} was multiplied by a factor between (1 + τ)^{−K} and (1 + τ)^K to get a'_{iK+j}. This implies a bound on |a_{iK+j} − a'_{iK+j}| for every j = 1, . . . , K − 1. Note that (a'_n)_{n=0}^∞ obtained by applying this procedure for every i ∈ N_0 is a τ-good partition of [0, a]. Since x ↦ (1 + x)^K − 1 is increasing on (0, ∞) (being a polynomial with positive coefficients) and (1 + τ)^K < 2, we can estimate the multiplicative change, where we used the inequality e^t − 1 ≥ t. Combining this with (5.33) and a'_{iK} = a_{iK}, then adding up, we get Σ_{n=0}^∞ |a_n − a'_n| < 3K(K − 1)τa. Since f is 1-Lipschitz, changing one a_n by η can change T(f, (a_n)) by at most 2η, so the above inequality implies T(f, (a'_n)) < T(f, (a_n)) + 6K(K − 1)τa, which completes the proof of the lemma.
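The redistribution step used in this proof is concrete enough to implement. The sketch below (our code; `redistribute` is not a name from the paper) raises the deficient ratios to 1 + τ and takes the excess away from the larger ones, keeping the product fixed, as in the argument:

```python
import math

def redistribute(betas, tau):
    """Given ratios beta_j in (1, 2] with product >= 2 and (1+tau)^K < 2,
    return ratios beta'_j in [1+tau, 2] with the same product."""
    K = len(betas)
    assert all(1 < b <= 2 for b in betas)
    assert (1 + tau) ** K < 2 <= math.prod(betas)
    lo = math.log(1 + tau)
    logs = [math.log(b) for b in betas]
    debt = sum(lo - l for l in logs if l < lo)          # total raise needed
    new = [max(l, lo) for l in logs]
    for i in sorted(range(K), key=lambda i: -new[i]):   # pay from the largest
        pay = min(debt, new[i] - lo)
        new[i] -= pay
        debt -= pay
    return [math.exp(l) for l in new]

out = redistribute([1.05, 1.9, 1.9], tau=0.1)
print([round(b, 3) for b in out], round(math.prod(out), 4))
```

Working in logarithms makes the bookkeeping transparent: the deficit created by raising small ratios to 1 + τ is paid for by shrinking the large ones, which can never push them below 1 + τ because the total log-product exceeds K log(1 + τ).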
The next lemma shows that we can replace an arbitrary good partition by one satisfying the assumptions of Lemma 5.18, without increasing the total drop. Lemma 5.19. For any δ > 0 and 1-Lipschitz function f : [0, a] → R there exists a good partition (a_n) such that a_{n−3}/a_n > 2 for every n ≥ 3 and T(f, (a_n)) ≤ T(f) + δ.
Proof. It is enough to show that for any good partition (a_n) there exists a good partition (a'_n) such that a'_{n−3}/a'_n > 2 for every n ≥ 3 and T(f, (a'_n)) ≤ T(f, (a_n)).
First we claim that we can suppose that every interval [a_n, a_{n−1}] is increasing or decreasing (recall Definition 5.1). Indeed, for each n ≥ 1, if on the interval [a_n, a_{n−1}] the minimum of f is attained at p ∈ (a_n, a_{n−1}), then inserting p into the partition (in between a_{n−1} and a_n) we get a new good partition such that [a_n, p] is decreasing and [p, a_{n−1}] is increasing, and it is easy to see that T(f, (a_n)) is not changed.
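The first claim (that inserting the interior minimum point splits the interval into a decreasing and an increasing one without changing the total drop) can be checked numerically; the helper below is our own sketch:

```python
def drop(f, lo, hi, n=2000):
    """Contribution f(lo) - min over [lo, hi] of a single partition interval."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return f(lo) - min(f(x) for x in xs)

def argmin(f, lo, hi, n=2000):
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return min(xs, key=f)

f = lambda x: abs(x - 0.7)        # 1-Lipschitz, interior minimum inside [0.5, 1]
before = drop(f, 0.5, 1.0)
p = argmin(f, 0.5, 1.0)           # p = 0.7
after = drop(f, 0.5, p) + drop(f, p, 1.0)   # decreasing piece + increasing piece
print(round(before, 6), round(p, 6), round(after, 6))  # 0.2 0.7 0.2
```

The piece [0.5, p] has its minimum at the right endpoint (decreasing) and [p, 1] at the left endpoint (increasing, contributing zero), so the sum of the two drops equals the original one.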
Suppose that a_{n−2}/a_n < 2 (n ≥ 2). If [a_n, a_{n−1}] and [a_{n−1}, a_{n−2}] are both increasing or both decreasing, then by merging these intervals we get an interval of the same type, and T(f, (a_n)) remains unchanged. If [a_n, a_{n−1}] is increasing and [a_{n−1}, a_{n−2}] is decreasing, then after merging the two intervals the minimum of f on [a_n, a_{n−2}] is still attained at one of the endpoints of the interval, and T(f, (a_n)) does not increase.
Applying the above merging procedure inductively (starting with n = 2) whenever possible, we get a good partition (a'_n) such that whenever a'_{n−2}/a'_n < 2 (n ≥ 2) then [a'_n, a'_{n−1}] is decreasing and [a'_{n−1}, a'_{n−2}] is increasing. Since this cannot happen for both n and n − 1, we get that a'_{n−2}/a'_n ≥ 2 or a'_{n−3}/a'_{n−1} ≥ 2 for any n ≥ 3, which clearly implies that a'_{n−3}/a'_n > 2.

Discretizing the estimates.
Recall from Definition 4.3 the notion of a τ-good partition of an integer interval (0, ℓ], and the notation M(σ, (N_i)). Sometimes we refer to these as integer partitions for emphasis. Note that the requirement (4.7) for a τ-good integer partition slightly differs from the requirement 1 + τ ≤ a_{k−1}/a_k ≤ 2 for a τ-good partition (see Definition 5.1), which is equivalent to τa_k ≤ a_{k−1} − a_k ≤ a_k. These two notions are connected by the following lemma.
Proof. Let N_0 < . . . < N_q be the values taken by the sequence ⌊a_n⌋. Since a_n → 0 we get N_0 = 0. Thus 0 = N_0 < . . . < N_q = L is an integer partition of (0, L]. Using that (a_n) is a good partition we get ⌊a_n⌋ ≤ a_n ≤ 2a_{n+1} < 2⌊a_{n+1}⌋ + 2, hence ⌊a_n⌋ − ⌊a_{n+1}⌋ ≤ ⌊a_{n+1}⌋ + 1. Thus to prove that (N_j) is a τ-good integer partition of (0, L] it is enough to show that ⌊a_n⌋ − ⌊a_{n+1}⌋ ≥ τ⌊a_{n+1}⌋ if ⌊a_n⌋ > ⌊a_{n+1}⌋. This is clear if τ⌊a_{n+1}⌋ ≤ 1. Otherwise, using also that (a_n) is a (2τ)-good partition, we get ⌊a_n⌋ − ⌊a_{n+1}⌋ > a_n − a_{n+1} − 1 ≥ 2τa_{n+1} − 1 ≥ 2τ⌊a_{n+1}⌋ − 1 > τ⌊a_{n+1}⌋.
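Since the displays stating this lemma and condition (4.7) are garbled in this copy, the following sketch is our reconstruction (to be read with that caveat) of the conversion: the distinct values of ⌊a_n⌋ form an integer partition, and the gap condition N_j − N_{j−1} ≥ τN_{j−1} is checked wherever τN_{j−1} > 1:

```python
import math

# a (2*tau)-good real partition of [0, L]: consecutive ratio exactly 1 + 2*tau
tau, L = 0.25, 1024
a = [L / (1 + 2 * tau) ** n for n in range(80)]

# the distinct floors, in increasing order, form the integer partition
N = sorted({math.floor(x) for x in a} | {0})

# reconstructed tau-good gap condition (only binding once tau*N[j-1] > 1)
ok = all(N[j] - N[j - 1] >= tau * N[j - 1]
         for j in range(1, len(N)) if tau * N[j - 1] > 1)
print(N[:6], N[-1], ok)
```

The inequality chain in the proof is exactly what makes `ok` come out true: a real gap of at least 2τ·a_{n+1} survives the loss of at most one unit from taking floors, as long as τ⌊a_{n+1}⌋ > 1.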
The following lemma will help us translate the results for Lipschitz functions to results for [−1, 1]-sequences.
Let N be the largest index such that a_N ≥ √ζ. Then a_N ≤ 2a_{N+1} < 2√ζ. By applying Corollary 5.20 and Proposition 5.2 to f_1|[0, a_N] with D = −1 and C = 1, we get the corresponding bound. Hence for any δ > 0 there exists a (2τ)-good partition (b_n) of [0, a_N] realizing it up to δ. Let a'_n = a_n if n ≤ N and a'_n = b_{n−N} otherwise. Then (a'_n) is a (2τ)-good partition of [0, L/ℓ] such that T(f|[0, L/ℓ], (a'_n)) ≤ T(f|[0, L/ℓ], (a_n)) + 2√ζ + 72τ + δ.
Applying Lemma 5.21 to f_1|[0, L/ℓ] and the (2τ)-good partition (a'_n) of [0, L/ℓ], we get a τ-good integer partition 0 = N_0 < . . . < N_q = L of (0, L] such that the corresponding bound holds. Noting that the left-hand side of that bound is exactly (1/ℓ)M(σ, (N_j)), and the right-hand side is at most the right-hand side of (5.34) by (5.35), the proof is complete.
The next proposition is a version of Proposition 5.2 for sequences, and will play a central role in the proof of Theorem 1.2.
Using this and applying Proposition 5.5, we obtain a u ∈ [η, 1] such that the required estimate holds. Let L be the smallest integer such that u ≤ L/ℓ. Then clearly L ∈ [ηℓ, ℓ].
and recall that χ(s, u) = min(ψ(s, u), 1) for 0 ≤ s ≤ u ≤ 2 and s < 2, and moreover χ(s, u) = 1 if and only if u ≤ 2s − 1 (which forces s ≥ 1). The next proposition encapsulates some preliminary reductions towards the proof of Theorem 1.2. We first explain how to deduce the theorem from the proposition; the rest of the section is then devoted to the proof of the proposition. Proposition 6.1. For every 0 < s ≤ u ≤ 2 with u > 2s − 1, the following holds. Proof of Theorem 1.2 (assuming Proposition 6.1). We proceed by contradiction. Assume, then, that there exists a Borel set as in (6.1). Since dim_H(∆_yA) does not increase if we replace A by any subset, every Borel set of dimension s > 0 contains compact subsets of positive s'-dimensional Hausdorff measure for all 0 < s' < s, and χ(s, u) is continuous, at the price of replacing η by η/2 we may assume that in (6.1) the set A is compact and of positive s-dimensional Hausdorff measure. In turn, a routine verification shows that if A is compact, then the relevant set is Borel. Hence in (6.1) we may also assume that B is Borel.
Let µ ∈ P(R^2) be an s-Frostman measure on A, i.e. µ is a Radon measure supported on A such that µ(B(x, r)) ≤ Cr^s for all x ∈ R^2, r > 0, where C is independent of x (recall that we assumed that A has positive s-dimensional Hausdorff measure). By assumption, dim_P(supp(µ)) ≤ u. Using that packing dimension is equal to the modified upper box counting dimension (see e.g. [6, Proposition 3.8]), and that the upper box dimension of a set equals that of its closure, we see that for every δ > 0 there is a compact set A_0 ⊂ A of positive µ-measure such that dim_B(A_0) ≤ min(u + δ, 2).
We can then find disjoint compact subsets B' ⊂ B, A' ⊂ A_0 such that still µ(A') > 0 and dim_H(B') > max(1, 2 − s). Then (provided δ was taken small enough in terms of s, u, η) the analogous inequality holds. This inequality is preserved under (joint) scaling and translation of µ, A', B', so it holds in particular for some compact A', B' ⊂ [0, 1)^2. Since µ_{A'}(B(x, r)) ≤ C'r^s for some constant C' > 0, we can check that E_{s−δ}(µ_{A'}) < ∞. Since supp(µ_{A'}) ⊂ A', this contradicts Proposition 6.1 applied to µ_{A'} and B', with s − δ, min(u + δ, 2) in place of s, u (provided δ was taken small enough in terms of s, u, dim_H(B')).
In order to bound the Hausdorff dimension of ∆_yA from below, we will use the following standard criterion; although it is well known, we include the short proof for completeness. We now begin the proof of Proposition 6.1. Since µ, s, u are fixed, any (possibly implicit) constants appearing in the proof may depend on them. Let ν be a measure supported on B with finite u'-energy, where u' > max(1, 2 − s). Let κ = κ(µ, ν) > 0 be the number given by Proposition 3.12. We will show that (under the assumptions of the proposition) there exists y ∈ B (possibly depending on T, ε, τ) such that the required bound holds. Recall that o_{T,ε,τ}(1) stands for a function of T, ε, τ which tends to 0 as T → ∞ and ε, τ → 0^+. We will henceforth assume that T, ε, τ are given, and that the integer ℓ_0 is chosen large enough in terms of T, ε, τ so that all the claimed inequalities hold. As a first instance of this, apply Lemma 3.10 to get the corresponding estimate, provided ℓ_0 was taken large enough (in terms of T, ε, τ).
Hence, in order to complete the proof of Proposition 6.1, it is enough to establish the following.
Claim. If the Borel set A_2 satisfies µ_{A_1}(A_2) ≥ ℓ^{−2}, where ℓ_0 is taken sufficiently large in terms of T, ε, τ, then (6.6) holds. Fix, then, A_2 as above. Since the set ∆_y(R_ℓA_2) is contained in the (√2 · 2^{−Tℓ})-neighborhood of ∆_yA_2, the numbers log N(∆_yA_2, 2^{−Tℓ}) and log N(∆_yR_ℓA_2, 2^{−Tℓ}) differ by at most a constant. Hence we can, and do, assume that A_2 = R_ℓA_2 from now on. Moreover, we may assume that A_2 ⊂ R_ℓ(A_1), since whenever A_2 = R_ℓA_2 and µ_{A_1}(A_2) ≥ ℓ^{−2}, the same holds for A_2 ∩ R_ℓ(A_1).
(iv) X is contained in R_ℓ supp_d(µ). By Lemma 3.3 and (ii), (iii) above, and assuming that ℓ_0 was taken large enough in terms of T, we have (6.7). On the other hand, we have assumed that dim_B(supp(µ)) ≤ u, so that N(supp(µ), j) ≤ O_ε(1)2^{(u+ε)Tj} for all j ∈ N. By (iv) above, this also holds for X in place of supp(µ) if j ≤ ℓ. On the other hand, using that ρ is σ-regular as in (3.1), we get a lower bound for the covering numbers of X. Combining these estimates, we deduce that 2^{T(σ_1+···+σ_j+j)} ≤ O_ε(1)2^{(u+ε)Tj}, and hence (6.8) holds, provided ℓ_0 was taken large enough in terms of ε. Combining (6.7) and (6.8), we see that the assumptions of Proposition 5.23 are satisfied with γ = s − 1, Γ = u − 1, and ζ = o_{T,ε}(1). After another short calculation, and starting with ℓ_0 large enough in terms of τ, we deduce (6.9). Recall from (6.4) that if x ∈ A_1, then θ(x, y) ∉ Bad_{ℓ_0}(µ, x). Hence, according to the definition of the sets Bad_{ℓ_0}(R_ℓµ, x) and Bad_{ℓ_0}(µ, x) in (3.3) and (3.4) respectively, we have θ(x, y) ∉ Bad_ε(R_ℓµ, x) = Bad_ε(ρ, x) for all x ∈ A_1 ∩ X. Since we have assumed that A_2 ⊂ R_ℓ(A_1), the hypotheses of Proposition 4.4 are met by ρ and A_2, with β = ε (the separation assumption follows from (6.5)). Recalling (i), we see that if ℓ_0 was taken even larger in terms of T, ε, τ, we can make the error term in Proposition 4.4 equal to o_{T,ε,τ}(1). In light of (6.9), Proposition 4.4 gives exactly (6.6).
This completes the proof of the claim and, with it, of Proposition 6.1 and Theorem 1.2.
6.2. Proof of Theorem 1.3. In this section we prove Theorem 1.3. The proof goes along the same lines as the proof of Theorem 1.2, except that we rely on Proposition 5.24 instead of Proposition 5.23 to choose the scales in the multi-scale decomposition. The need to deal with two different scales 2^{−TL} and 2^{−Tℓ} also creates some additional challenges. Write ψ(s) for the exponent appearing in Theorem 1.3. (This should not be confused with the function ψ(s, u) from §6.1.) The next proposition contains the core of Theorem 1.3.
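For orientation on the numerology: the abstract states the packing bound (1 + s + √(3s(2 − s)))/4 ≥ 0.933 for s ∈ (1, 3/2]. Assuming this is the formula for ψ(s) (the display defining ψ is garbled in this copy), a quick numerical check confirms the claimed lower bound, with the infimum attained in the limit s → 1+:

```python
import math

def psi(s):
    # assumed form of psi(s), taken from the abstract
    return (1 + s + math.sqrt(3 * s * (2 - s))) / 4

inf_psi = min(psi(1 + 0.5 * k / 10000) for k in range(1, 10001))
print(round(psi(1.0), 4), round(inf_psi, 4), round(psi(1.5), 4))  # 0.933 0.933 1.0
```

Under this assumed formula, ψ is increasing on (1, 3/2] and ψ(3/2) = 1, so the worst case is the endpoint value (2 + √3)/4 ≈ 0.933.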
The reason we need to go via box dimension is that the map y → dim_B ∆_y(U) is Borel if U is compact, while it is unclear whether the map y → dim_P ∆_y(U) is Borel, since it was proved in [18] that packing dimension is not a Borel function of the set if one considers the Hausdorff metric on the compact subsets of R. Now suppose the claim of Theorem 1.3 does not hold. Then we can find a Borel set A ⊂ R^2 with dim_H(A) = s ∈ (1, 3/2) and η > 0 such that (6.10) holds. Let ν be a Frostman measure on A of exponent t ∈ (1, s), sufficiently close to s that ψ(t) ≥ ψ(s) − η, and note that (6.11) dim_H{y ∈ R^2 : dim_P(∆_y(supp(ν))) < ψ(t)} > 1.
Fix a countable basis (U_i) of open sets of supp(ν) (in the relative topology). Note that dim_H(U_i) ≥ t for all i since ν is a Frostman measure. Hence, from (6.10) we get that the exceptional set E satisfies dim_H(E) ≤ 1. Fix y ∈ R^2 \ E. Let (F_j) be a countable cover of ∆_y(supp(ν)). By Baire's Theorem, some ∆_y^{−1}(F_j) has nonempty interior in supp(ν), and hence contains some U_i. By the definition of E, dim_B(∆_y(U_i)) ≥ ψ(t). By the characterization of packing dimension as modified upper box counting dimension ([6, Proposition 3.8]), we conclude that dim_P(∆_y(supp(ν))) ≥ ψ(t) whenever y ∈ R^2 \ E. Since dim_H(E) ≤ 1, this contradicts (6.11), finishing the proof.
We now start the proof of Proposition 6.3. Let ν be a measure supported on B with finite u-energy for some u > 1, and let κ = κ(µ, ν) > 0 be the number given by Proposition 3.12. Apply Lemma 3.10 to obtain the corresponding bound, provided ℓ_0 was taken large enough in terms of T, ε, τ. Recall that the set Θ in Equation (6.3) is Borel. Applying Proposition 3.12 to Θ and Fubini, we obtain a compact set A ⊂ supp_d(µ) with µ(A) > 2/3 and a point y ∈ B such that (6.12) holds. Fix a number δ > 0. We will show the required estimate, where Error_{T,ε,τ}(δ) can be made arbitrarily small by first taking δ small enough, and then taking T large enough and ε, τ small enough, all in terms of δ. This error term may also depend on s. Fix a large integer ℓ_0. We claim that it is enough to find a scale L ∈ [ℓ_0, ℓ], tending to infinity with ℓ, such that (6.13) holds, where the error term has the property detailed above. Indeed, since ∆_y(R_LA) is contained in the O(2^{−TL})-neighborhood of ∆_y(A), this implies the corresponding lower bound for dim_B(∆_y(A)). Apply Corollary 3.5 to R_ℓµ. Taking ℓ large enough that 2^{−εTℓ} + 2/3 < µ(A) ≤ µ(R_ℓA), there is a 2^{−Tℓ}-set X such that, setting ρ = (R_ℓµ)_X, the conclusions of that corollary hold, whence, as we saw in the proof of Proposition 6.1, the analogue of (6.8) holds. Note that, provided ℓ_0 was taken large enough, (6.7) still holds, since it only depends on (ii) and (iii). We are then in the setting of Proposition 5.24 with γ = s − 1 and ζ = o_{T,ε}(1). Let η = η(δ, s − 1) > 0 be the number given by the proposition; we underline that, since δ is chosen before T, ε, τ, the number η is also independent of T, ε, τ (it is useful to keep in mind that η does depend on δ). A short calculation shows that 1 − ψ(s) = Φ(s − 1). From now on we assume that ℓ is taken large enough (in terms of δ and s) that ηℓ ≥ ℓ_0. Then, applying Proposition 5.24 and making ℓ even larger, we get an integer L ∈ [ηℓ, ℓ] ⊂ [ℓ_0, ℓ] such that (6.14) holds. Note that R_Lρ is (σ_1, . . . , σ_L)-regular.
Also, if x ∈ A ∩ X, then θ(x, y) ∉ Bad_ε(R_ℓµ, x) (by (6.12) and (3.4)). Note that x ↦ Bad^{(ε/η)L}_L(R_Lρ, x) is constant on each square of D_L(R_Lρ). Hence, for each x ∈ R_L(A ∩ X) there is x̃ ∈ A ∩ X ⊂ R_L(A ∩ X) such that θ(x̃, y) ∉ Bad^{(ε/η)L}_L(R_Lρ, x). Assume ε < η. If ε was taken small enough and ℓ large enough that dist(B, supp(µ)) ≥ ε + √2 · 2^{−ℓ_0}, then all the hypotheses of Proposition 4.4 are satisfied for R_Lρ, R_L(A ∩ X) and L in place of ρ, A and ℓ, with β = ε/η. Using (i) above (which implies R_Lρ(R_LA) ≥ 1/2) and the bound L ≥ ηℓ, the error term in Proposition 4.4 can be bounded accordingly. Making ℓ large enough in terms of T, ε, τ, δ and s, this error term can be made o_{T,ε}(1) + 2εη^{−1}. Hence Proposition 4.4 together with (6.14) ensure that (6.13) holds, with the error behaving as claimed.
Since $L \ge \eta\ell$ and $\ell$ is arbitrarily large, $L$ is also arbitrarily large. Hence we have shown that (6.13) holds for arbitrarily large $L$ and, as explained above, this completes the proof of Proposition 6.3 and, with it, of Theorem 1.3.

Proof of Theorem 1.4.
In this section we prove Theorem 1.4. Throughout this section, we let $\Delta : \mathbb{R}^4 \to \mathbb{R}$, $(x,y) \mapsto |x-y|$. We start by recalling a more quantitative version of the Mattila-Wolff bound (1.1).
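Before starting, it may help to sanity-check numerically the pinned packing-dimension bound $\tfrac{1}{4}(1+s+\sqrt{3s(2-s)}) \ge 0.933$ stated in the abstract for $s \in (1, 3/2]$. The following sketch is purely illustrative and not part of the proof; it samples the expression over the range and confirms that the infimum $(2+\sqrt{3})/4 \approx 0.933$ is approached as $s \to 1$:

```python
import math

def packing_bound(s: float) -> float:
    """Evaluate (1/4)(1 + s + sqrt(3*s*(2 - s))) from the abstract."""
    return 0.25 * (1 + s + math.sqrt(3 * s * (2 - s)))

# Sample s over (1, 3/2] and record the smallest value of the bound.
samples = [1 + 0.5 * k / 1000 for k in range(1, 1001)]
worst = min(packing_bound(s) for s in samples)

# As s -> 1 the bound tends to (2 + sqrt(3))/4 ≈ 0.933; at s = 3/2 it equals 1.
print(round(worst, 3))  # → 0.933
```

The expression is nondecreasing on $(1, 3/2]$ (its derivative is $\tfrac{1}{4}\bigl(1 + 3(1-s)/\sqrt{3s(2-s)}\bigr) \ge 0$ there), so the worst case sits at the left endpoint.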
It is enough to consider the case in which $A$ is bounded. After translating and rescaling $A$, we may further assume that $A \subset [0,1)^2$. Let $\mu_1, \mu_2 \in \mathcal{P}([0,1)^2)$ be measures supported on $A$ such that $\mathcal{E}_s(\mu_1), \mathcal{E}_s(\mu_2) < \infty$, and their supports are $(2\varepsilon)$-separated (making $\varepsilon$ smaller if needed). Any implicit constants arising in the proof may depend on $\mu_1, \mu_2$ and $s$.
Since $\Delta(G) \subset \Delta(A \times A)$, this will establish the theorem. In turn, since a Borel set $F \subset \mathbb{R}$ satisfies $(\Delta\mu_G)(F) > \ell^{-2}$ if and only if $B = \Delta^{-1}(F)$ satisfies $\mu_G(B) > \ell^{-2}$, according to Lemma 6.2, in order to complete the proof it is enough to prove the following claim.
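The equivalence just used is the definition of the image (pushforward) measure: $(\Delta\mu_G)(F) = \mu_G(\Delta^{-1}(F))$. A discrete toy version (the atoms, weights, and the set $F$ of distances are hypothetical) makes the identity concrete:

```python
import math

# A finitely supported toy measure mu_G on R^2 x R^2, as {(x, y): weight}.
atoms = {((0.0, 0.0), (3.0, 4.0)): 0.5,   # distance 5.0
         ((1.0, 1.0), (1.0, 2.0)): 0.3,   # distance 1.0
         ((0.0, 0.0), (0.0, 7.0)): 0.2}   # distance 7.0

def dist(p, q):
    """Delta(x, y) = |x - y| for points of R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Build the pushforward Delta(mu_G): a measure on distance values.
push = {}
for (p, q), w in atoms.items():
    d = dist(p, q)
    push[d] = push.get(d, 0.0) + w

F = {5.0, 7.0}  # a toy Borel set of distances

pushforward_mass = sum(w for d, w in push.items() if d in F)      # (Delta mu_G)(F)
preimage_mass = sum(w for (p, q), w in atoms.items()
                    if dist(p, q) in F)                           # mu_G(Delta^{-1}(F))
assert pushforward_mass == preimage_mass
```

In particular, $(\Delta\mu_G)(F) > \ell^{-2}$ and $\mu_G(\Delta^{-1}(F)) > \ell^{-2}$ are literally the same statement, which is what allows the problem about distance sets to be restated as a claim about subsets $B$ of $G$.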
We start the proof of the claim. Firstly, replacing $B$ by a compact subset of almost the same measure, we may assume that $B$ is compact. We may also assume that $\mu(B) \ge \ell^{-2}/3$. Let $X^{(i)}_k$ be the $2^{-T}$-sets given by Corollary 3.5 applied to $R_\ell(\mu_i)$. Note that we have a disjoint union; in particular, the resulting measure is a $2^{-T}$-measure. Also, by Lemma 3.1 and Corollary 3.5(ii), and using our assumption that $\mathcal{E}_s(\mu_i) < \infty$, the corresponding energy bound holds. Hence, using Lemma 3.3 and increasing the value of $\ell_0$ again, any $\sigma = \sigma^{(i)}_k$ satisfies (6.18), where $\zeta = o_{T,\varepsilon}(1)$. By starting with appropriate $T, \varepsilon$, we may assume that $\zeta < (s-1)/2$.
If $\rho$ is $\sigma$-regular, we define the sets $F_i$ accordingly. Note that, since the $X^{(i)}_k$ are $2^{-T}$-sets, so are the $F_i$. Consider two (non mutually exclusive) cases. Roughly speaking, in case (a) we will argue as in the proof of Theorem 1.2, while in case (b) we will appeal to Proposition 5.25.
Assume then that (a) holds. Without loss of generality, suppose $\mu_B(F_1 \times \mathbb{R}^2) \ge 1/3$. Instead of showing that (6.16) holds directly for $B$, we will show that it holds for the set $B' = \bigcup_{y \in [0,1)^2} R_\ell(B^y) \times \{y\}$, where, for the rest of this section, given $A \subset \mathbb{R}^2 \times \mathbb{R}^2$ we denote its "horizontal" sections by $A^y = \{x : (x,y) \in A\}$ (for $y \in \mathbb{R}^2$). In other words, to form $B'$ we make each horizontal fiber of $B$ into a union of squares in $\mathcal{D}_\ell$. One can check that $B'$ is Borel (in fact, $\sigma$-compact). Since $B \subset B' \subset R_\ell B$, the numbers $N(\Delta(B'), \ell)$ and $N(\Delta(B), \ell)$ differ by at most a multiplicative constant, so that proving (6.16) for $B'$ implies it also for $B$. Since we are assuming that $B \subset G$, we have that $B' \subset G'$, where $G'$ is defined analogously to $B'$. Using Fubini, that $F_1 = R_\ell F_1$, our definition of $B'$, the assumption $\mu_B(F_1 \times \mathbb{R}^2) \ge 1/3$, and the fact that $\mu(B) \ge \ell^{-2}/3$, we get the corresponding lower bound. Applying (6.17) with $i = 1$, we can decompose accordingly. Hence, using that $R_\ell \mu_1(\widetilde{X}^{(1)}) \le 2^{-\varepsilon T \ell}$ and taking $\ell$ large enough, there exists $k$ such that $(\rho^{(1)}_k \times \mu_2)((F_1 \times \mathbb{R}^2) \cap B') \ge \ell^{-2}/9 - 2^{-\varepsilon T \ell} \ge \ell^{-2}/10$. By the definition of $F_1$, we must have $D(\rho^{(1)}_k) \ge s_0 - \delta$, and $\operatorname{supp}_d(\rho^{(1)}_k) \subset F_1$. By Fubini, we can find $y \in \operatorname{supp}_d(\mu_2)$ such that (6.20) $\rho^{(1)}_k((B')^y) \ge \ell^{-2}/10$. Since $B' \subset G' \subset R_\ell G$, we know that if $x \in (B')^y$ then there exists $\tilde{x} \in \mathcal{D}_\ell(x)$ such that $(\tilde{x}, y) \in G$. By (6.15), this implies that $\theta(\tilde{x}, y) \notin \operatorname{Bad}_{\ell_0}(\mu_1, x)$. Recalling the definitions (3.3), (3.4), we have shown that the hypotheses of Proposition 4.4 hold for $\rho^{(1)}_k$ and $(B')^y$, with $\beta = \varepsilon$ (the separation between $y$ and $\operatorname{supp}(\rho^{(1)}_k)$ follows from the fact that the supports of $\mu_1$ and $\mu_2$ are $(2\varepsilon)$-separated, making $\ell$ larger again). Recalling (6.20), we see that the error term in Proposition 4.4 can be made $\le o_{T,\varepsilon,\tau}(1)$ by making $\ell$ even larger.
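The "by Fubini, we can find $y$" step above is an averaging/pigeonhole argument: the fiber masses $\rho^{(1)}_k((B')^y)$ average, against $\mu_2(dy)$, to the product mass, so some fiber must carry at least the average. A discrete toy version (the weights and the set $E$ are hypothetical) is:

```python
# Toy Fubini/pigeonhole: if (rho x mu)(E) >= c, then some fiber y satisfies
# rho(E^y) >= c, because the fiber masses rho(E^y) average (under mu) to
# exactly (rho x mu)(E).
rho = {0: 0.5, 1: 0.3, 2: 0.2}          # hypothetical probability weights
mu = {"a": 0.6, "b": 0.4}
E = {(0, "a"), (1, "a"), (2, "b")}       # a toy product-space event

product_mass = sum(rho[x] * mu[y] for (x, y) in E)
fiber_mass = {y: sum(rho[x] for (x, yy) in E if yy == y) for y in mu}

best = max(fiber_mass.values())
assert best >= product_mass  # the maximal fiber dominates the average
```

Here the average of `fiber_mass` under `mu` equals `product_mass` exactly, so the maximal fiber can only be larger; this is the same mechanism that produces the point $y \in \operatorname{supp}_d(\mu_2)$ with (6.20).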
Applying the proposition, and recalling that $D(\rho^{(1)}_k) \ge s_0 - \delta$, where $\delta = o_{T,\varepsilon,\tau}(1)$ was defined in (6.19), we conclude that (for this fixed value of $y$) $\log N(\Delta^y((B')^y), \ell) \ge T\ell\,(s_0 - o_{T,\varepsilon,\tau}(1))$, and hence the same lower bound holds for $\log N(\Delta(B'), \ell)$. This concludes the proof of the claim in case (a).
We now consider case (b). Since $N(\Delta(B), \ell)$ and $N(\Delta(R_\ell B), \ell)$ differ by at most a multiplicative constant, it is enough to prove (6.16) for $R_\ell B$ in place of $B$. It follows from the assumption of case (b), the decomposition (6.17) for both $\mu_1, \mu_2$, and the definitions of the sets $F_i$ that the complementary bound holds. Hence, using that $\mu(B) \ge \ell^{-2}/3$, we can find $k_1, k_2$ such that $D(\rho^{(i)}_{k_i}) < s_0 - \delta$ for $i = 1, 2$, and (6.21) holds if $\ell$ is large enough in terms of $\varepsilon$.

Sharpness of the results.
It is natural to ask which parts of our approach are sharp and which are not. In this section we show that the results of Section 5 are sharp, up to error terms. Hence, if the main results are not sharp (which seems likely), this is not due to the estimates for $M_\tau(\sigma)$, but rather to the fact that Proposition 4.4 (which connects the value of $M_\tau(\sigma)$ to the size of distance sets) is itself not sharp.
We begin by showing that Proposition 5.2 is sharp for all parameter values (and even the value of $f(a)$ can be chosen to be an arbitrary $b \in [Da, Ca]$). This is illustrated by the following functions.