Dyadic representation theorem using smooth wavelets with compact support

The representation of a general Calderón-Zygmund operator in terms of dyadic Haar shift operators first appeared as a tool to prove the $A_2$ theorem, and it has found a number of other applications. In this paper we prove a new dyadic representation theorem by using smooth compactly supported wavelets in place of Haar functions. A key advantage of this is that we achieve a faster decay of the expansion when the kernel of the general Calderón-Zygmund operator has additional smoothness.


Introduction
It was long conjectured that classical inequalities for singular integrals $T$ on weighted spaces $L^2(w)$ with a Muckenhoupt $A_2$ weight $w$ should take the sharp form
$$\|Tf\|_{L^2(w)} \le c_T\,[w]_{A_2}\,\|f\|_{L^2(w)}.$$
This $A_2$ conjecture was first verified by one of us [7] by introducing a dyadic representation of $T$, an expansion in terms of simpler discrete model operators (called dyadic/Haar shifts). Earlier versions of the $A_2$ conjecture for special operators such as the martingale transform, the Beurling-Ahlfors transform, the Hilbert transform and the Riesz transforms were due to J. Wittwer [29], S. Petermichl and A. Volberg [25], and S. Petermichl [23,24], respectively. Since then, simpler proofs of the $A_2$ theorem as in Lerner [12], Lacey [11], and Lerner-Ombrosi [13] have replaced the dyadic representation by sparse domination, but the original dyadic representation theorem continues to have independent interest and other applications.
One such application is the extension of the linear dyadic representation to biparameter (also known as product-space, or Journé-type, after [10]) singular integrals in [17], which has defined the new standard framework for the study of these operators. The multi-parameter extension of this is due to Y. Ou [21] and a bi-linear version is due to Li-Martikainen-Ou-Vuorinen [14]. In the bi-parameter context the representation theorem has proven to be extremely useful e.g., in connection with bi-parameter commutators and weighted analysis, see Holmes-Petermichl-Wick [6], Ou-Petermichl-Strouse [22] and Li-Martikainen-Vuorinen [15,16]. On the other hand, there are some fundamental obstacles to sparse domination of bi-parameter objects, see [1], which makes the dyadic representation particularly useful in this setting.
In another direction, an open problem in vector-valued harmonic analysis is to describe the linear dependence of the norm of a vector-valued Calderón-Zygmund operator on the UMD constant of the underlying Banach space. In abstract UMD spaces, the linear bound has only been shown for the Beurling-Ahlfors transform and for some other special operators with even kernel such as certain Fourier multiplier operators (see [19]). It is also interesting to mention that, as was the case with the $A_2$ theorem, the linear bound for the Beurling-Ahlfors transform has been known for some time, yet the possible linear dependence between the vector-valued Hilbert transform and the UMD constant is still a famous open problem (see [9, Problem O.6]). More recently, Pott and Stoica established in [26] the linear dependence of sufficiently smooth Banach space-valued even singular integrals on the UMD constant by showing such a linear estimate for symmetric dyadic shifts. Their estimate for dyadic shifts grows like $2^{\max(i,j)/2}$ in terms of the parameters $(i, j)$ of the shifts. As explained in their work, to have convergence, one needs a decay factor $2^{-s\max(i,j)}$, which is guaranteed by kernel smoothness $s > \frac12$ and only in dimension $d = 1$. It is interesting to notice that in most other applications of the dyadic representation theorem, notably to the weighted inequalities, the rate of convergence of the representation is irrelevant as long as it is exponential. Formally, the same argument should work in any dimension $d$ assuming smoothness of order $s > \frac12 d$, but the existing Haar dyadic representation can only "see" smoothness up to order $s \le 1$; thus $\frac12 d < s \le 1$ forces $d = 1$. This motivated us to find a new version of the dyadic representation theorem with faster decay using smooth wavelets with compact support. Our main result is the following (see Section 2 for a precise definition of the wavelet shifts $S^{ij}_\omega$ and the required regularity of the wavelets):

1.1. Theorem. Let $s \in \mathbb{Z}_+$, and let $T$ be a bounded Calderón-Zygmund operator in $L^2(\mathbb{R}^d)$ with a kernel satisfying $|\partial^\alpha K(x, y)| \le \|K\|_{CZ_s} |x-y|^{-d-|\alpha|}$ for every $|\alpha| \le s$. In addition, suppose that $T, T^* : P_s \to P_s$, where $P_s$ is the space of polynomials of degree less than $s$. Then for any given $\epsilon > 0$, $T$ has an expansion, say for $f, g \in C^1_c(\mathbb{R}^d)$,
$$\langle g, Tf\rangle = c\,\big(\|T\|_{L^2\to L^2} + \|K\|_{CZ_s}\big)\, E_\omega \sum_{i,j=0}^{\infty} 2^{-(s-\epsilon)\max(i,j)} \langle g, S^{ij}_\omega f\rangle,$$
where $c$ depends only on $d$, $s$ and $\epsilon$, $E_\omega$ is the expectation with respect to the random parameter $\omega$, and $S^{ij}_\omega$ is a version of a dyadic shift with parameters $(i, j)$ but using sufficiently regular wavelets in place of the Haar functions.
Having this result at our disposal, we can hope to extend the result of [26] to dimensions d > 1. We plan to address this question in future work. Another possible area of applications is numerical algorithms for singular integrals, as in [2], where an ancestor of the dyadic representation is used for this purpose. It is clear that, in such applications, a high rate of convergence would be preferred.
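The role of the threshold $s > \frac12$ in the one-dimensional discussion above can be checked numerically. The following is only a sketch in Python: it sums the model series $\sum_{i,j \ge 0} 2^{\max(i,j)/2}\, 2^{-s\max(i,j)}$, in which the growth rate is the one quoted from [26] and the decay rate is the one supplied by a representation theorem. The function names are hypothetical and chosen for illustration; they do not come from [26] or from this paper.

```python
def shift_series_partial_sum(s, n_max):
    """Sum of 2**((1/2 - s) * max(i, j)) over all 0 <= i, j <= n_max."""
    total = 0.0
    for n in range(n_max + 1):
        pairs = 2 * n + 1  # number of pairs (i, j) with max(i, j) == n
        total += pairs * 2.0 ** ((0.5 - s) * n)
    return total


def closed_form_limit(s):
    """Value (1 + q) / (1 - q)**2 of the full series, q = 2**(1/2 - s); needs s > 1/2."""
    q = 2.0 ** (0.5 - s)
    if q >= 1.0:
        raise ValueError("the model series diverges for s <= 1/2")
    return (1 + q) / (1 - q) ** 2
```

For $s > \frac12$ the partial sums stabilise at the closed-form value, while at the borderline $s = \frac12$ they grow like $(n_{\max}+1)^2$, which is the divergence that the extra smoothness is meant to avoid.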
The interpretation of the assumption that $T$ and $T^*$ map the space of polynomials $P_s$ into itself is made rigorous in Section 3, where we restate Theorem 1.1 as Theorem 3.2. These are "special cancellation" or "vanishing paraproduct" assumptions that one might like to remove in future work.
Since the circulation of this work, Di Plinio et al. [5] have presented an alternative "representation theorem using smooth wavelets", where they also deal with the paraproduct terms arising from more general cancellation assumptions; see also [4] for an extension of their representation to bi-linear operators. Their version is a closer relative of the continuous wavelet transform, in contrast to the semi-discrete representation in our Theorem 1.1.
The paper is organized as follows: in Section 2 we recall the necessary definitions and results that we are using. Section 3 is dedicated to a detailed statement of our main result (see Theorem 3.2). As in the case of the dyadic representations using Haar functions, our proof of Theorem 1.1/3.2 relies on an expansion (see Proposition 3.4) of the Calderón-Zygmund operator in terms of the (previously Haar, now smooth) wavelet basis, but the subsequent analysis of the expansion presents some significant departures from the Haar case. We split the series that appears in this expansion into five parts which are treated in Section 4.
Notation. Throughout the paper, we denote by $c$, $C$ constants that depend at most on some fixed parameters that should be clear from the context. The notation $A \lesssim B$ means that $A \le CB$ holds for such a constant $C$. Moreover, when $Q$ is a cube and $t > 0$, then $tQ$ represents the cube with the same centre and $t$ times the sidelength of $Q$. Also, we make the convention that $|\cdot|$ stands for the $\ell^\infty$ norm on $\mathbb{R}^d$, i.e., $|x| := \max_{1\le i\le d} |x_i|$. While the choice of the norm is not particularly important, this choice is slightly more convenient than the usual Euclidean norm when dealing with cubes as we will: e.g., the diameter of a cube in the $\ell^\infty$ norm is equal to its sidelength $\ell(Q)$.
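The remark about diameters can be illustrated by a small Python sketch. It compares the $\ell^\infty$ and Euclidean diameters of a cube by maximising over its corners (for a norm, the diameter of a closed cube is attained at corners). The helper names are hypothetical and not part of the paper.

```python
import math
from itertools import product


def linf_norm(x):
    """The l-infinity norm max_i |x_i| used throughout the paper."""
    return max(abs(t) for t in x)


def euclidean_norm(x):
    """The usual Euclidean norm, for comparison only."""
    return math.sqrt(sum(t * t for t in x))


def cube_corners(corner, side):
    """All 2^d corners of the cube with the given lower corner and sidelength."""
    d = len(corner)
    return [tuple(c + e * side for c, e in zip(corner, eps))
            for eps in product((0, 1), repeat=d)]


def diameter(points, norm):
    """Largest distance between two of the given points in the given norm."""
    return max(norm(tuple(a - b for a, b in zip(p, q)))
               for p in points for q in points)
```

For a cube of sidelength 2 in $\mathbb{R}^3$, the $\ell^\infty$ diameter is exactly 2, whereas the Euclidean diameter is $2\sqrt{3}$ and so depends on the dimension.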
Acknowledgements. Both authors were supported by the Academy of Finland through project No. 314829 and through the Finnish Centre of Excellence in Randomness and Structure "FiRST" (project No. 346314). The second author is very grateful to his doctoral supervisor Prof. Tuomas Hytönen for many discussions, giving him a lot of motivation on the subject and plenty of remarks helpful to improve the content and the exposition of this paper. Also, the second author would like to thank the Foundation for Education and European Culture (Founders Nicos and Lydia Tricha) for their financial support during the academic years 2017-2018, 2018-2019 and 2019-2020.
The authors are grateful to the anonymous referees for their constructive comments that improved our presentation.

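2.A. Random dyadic systems; good and bad cubes. Let $D^0$ denote the standard dyadic system on $\mathbb{R}^d$, consisting of the cubes $2^{-k}([0,1)^d + m)$ with $k \in \mathbb{Z}$ and $m \in \mathbb{Z}^d$. For $I \in D^0$ and a binary sequence $\omega = (\omega_j)_{j\in\mathbb{Z}} \in (\{0,1\}^d)^{\mathbb{Z}}$, let
$$I+\omega := I + \sum_{j:\,2^{-j}<\ell(I)} 2^{-j}\omega_j.$$
Then $D^\omega := \{I+\omega : I \in D^0\}$, and it is straightforward to check that $D^\omega$ inherits the important nestedness property of $D^0$: if $I, J \in D^\omega$, then $I \cap J \in \{I, J, \emptyset\}$. When the particular $\omega$ is unimportant, the notation $D$ is sometimes used for a generic dyadic system.

We obtain a notion of random dyadic systems by equipping the parameter set $\Omega := (\{0, 1\}^d)^{\mathbb{Z}}$ with the natural probability measure: each component $\omega_j$ has an equal probability $2^{-d}$ of taking any of the $2^d$ values in $\{0, 1\}^d$, and all components are independent of each other. We denote by $E_\omega$ the expectation over the random variables $\omega_j$, $j \in \mathbb{Z}$.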
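The nestedness of a shifted dyadic system can be checked experimentally. The Python sketch below works under simplifying assumptions that are not in the paper: dimension $d = 1$, only finitely many scales $2^{-k}$ with $k \ge 0$, and $\omega_j = 0$ outside a finite range of $j \ge 1$. Intervals are shifted by $\sum_{j > k} 2^{-j}\omega_j$, that is, by the components with $2^{-j} < \ell(I)$. All function names are hypothetical.

```python
import random
from fractions import Fraction


def sample_omega(levels, rng):
    """Sample omega_j uniformly from {0, 1} for each j in levels (d = 1)."""
    return {j: rng.randint(0, 1) for j in levels}


def shifted_interval(k, m, omega):
    """Return I + omega for I = 2^{-k}[m, m+1), as a pair (left, right)."""
    length = Fraction(1, 2 ** k)
    # Only the components with 2^{-j} < l(I), i.e. j > k, move the interval.
    shift = sum(Fraction(w, 2 ** j) for j, w in omega.items() if j > k)
    left = m * length + shift
    return (left, left + length)


def is_nested(a, b):
    """True if the intersection of half-open intervals a, b is a, b or empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo >= hi:
        return True
    return (lo, hi) == a or (lo, hi) == b


def dyadic_system(omega, k_range, window=2):
    """Shifted intervals of the scales in k_range lying near [-window, window)."""
    return [shifted_interval(k, m, omega)
            for k in k_range
            for m in range(-window * 2 ** k, window * 2 ** k)]
```

Exact rational arithmetic is used so that the comparison of endpoints is not affected by rounding. Two overlapping intervals that do not come from a common dyadic system, such as $[0,2)$ and $[1,3)$, fail the same check.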
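Consider the modulus of continuity $\Theta(t) = t^\theta$, $\theta \in (0, 1)$, for which we will formulate the notion of good and bad cubes. We also fix a (large) parameter $r \in \mathbb{Z}_+$.
A cube $I \in D$ is called bad if there exists $J \in D$ such that $\ell(J) \ge 2^r \ell(I)$ and $\mathrm{dist}(I, \partial J) \le \Theta(\ell(I)/\ell(J))\,\ell(J)$: roughly, $I$ is relatively close to the boundary of a much bigger cube. A cube is called good if it is not bad.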
We repeat from [8, Section 2.3] some basic probabilistic observations related to badness. Let $I \in D^0$ be a reference interval. The position of the translated interval
$$I+\omega = I + \sum_{j:\,2^{-j}<\ell(I)} 2^{-j}\omega_j,$$
by definition, depends only on $\omega_j$ for $2^{-j} < \ell(I)$. On the other hand, the badness of $I+\omega$ depends on its relative position with respect to the bigger intervals
$$J+\omega = J + \sum_{j:\,2^{-j}<\ell(I)} 2^{-j}\omega_j + \sum_{j:\,\ell(I)\le 2^{-j}<\ell(J)} 2^{-j}\omega_j.$$
The same translation component $\sum_{j:\,2^{-j}<\ell(I)} 2^{-j}\omega_j$ appears in both $I+\omega$ and $J+\omega$, and so does not affect the relative position of these intervals. Thus this relative position, and hence the badness of $I$, depends only on $\omega_j$ for $2^{-j} \ge \ell(I)$. In particular:

2.3. Lemma. For $I \in D^0$, the position and badness of $I+\omega$ are independent random variables.
Another observation is the following: by symmetry and the fact that the condition of badness only involves relative position and size of different cubes, it readily follows that the probability $\pi_{\mathrm{bad}} = P_\omega(I+\omega \text{ bad})$ of a particular cube $I+\omega$ being bad is equal for all cubes $I \in D^0$. The final observation concerns the value of this probability; the proof of the corresponding lemma can be found in [8, Lemma 2.3].
2.B. Wavelet functions. We introduce the notion of the smooth wavelet functions with compact support associated to any given dyadic system D. Such wavelets were originally constructed by I. Daubechies [3] but in this paper we will follow [18].
In [18, Chapter 3] one can find the construction of the smooth wavelets with compact support for $d = 1$. Moreover, once the 1-dimensional wavelets $\psi^1 = \psi$ and the related father wavelets $\psi^0 = \phi$ are available, the $d$-dimensional wavelets can be constructed by
$$\psi^\eta(x) := \prod_{i=1}^{d} \psi^{\eta_i}(x_i), \qquad \eta = (\eta_1, \ldots, \eta_d) \in \{0,1\}^d \setminus \{0\}.$$
, and this collection has the following fundamental properties of a wavelet basis: Here $u, v \in \mathbb{N}$ are two parameters that may or may not be equal. Note that Haar functions correspond to $m = 1$, $u = v = 0$, but in general $m > 1$.
For a fixed $D$, all the wavelet functions $\psi^\eta_I$, Since the different $\eta$'s seldom play any major role, this will be often abbreviated (with slight abuse of language) simply as and the finite summation over $\eta$ is understood implicitly.
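The tensor-product construction of $d$-dimensional wavelets from a father and a mother wavelet can be tried out in the simplest case. The Python sketch below uses the Haar pair as a stand-in for the smooth compactly supported wavelets of the paper (this substitution is an assumption made for illustration; Haar functions are the case $m = 1$, $u = v = 0$ mentioned above). It checks that the $2^d - 1$ resulting functions on the unit cube are orthonormal and have mean zero. The names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Values of the 1D Haar functions on [0, 1/2) and [1/2, 1).
ONE_DIM = {
    0: (1, 1),   # psi^0 = phi, the father wavelet
    1: (1, -1),  # psi^1 = psi, the mother wavelet
}


def tensor_wavelet(eta):
    """Values of prod_i psi^{eta_i}(x_i) on the 2^d dyadic children of [0,1)^d."""
    values = {}
    for child in product((0, 1), repeat=len(eta)):
        v = 1
        for e, c in zip(eta, child):
            v *= ONE_DIM[e][c]
        values[child] = v
    return values


def inner(f, g):
    """L^2([0,1)^d) inner product of functions constant on the dyadic children."""
    d = len(next(iter(f)))
    return Fraction(sum(f[c] * g[c] for c in f), 2 ** d)


def wavelet_family(d):
    """The 2^d - 1 tensor wavelets indexed by eta in {0,1}^d with eta != 0."""
    return {eta: tensor_wavelet(eta)
            for eta in product((0, 1), repeat=d) if any(eta)}
```

The index $\eta = 0$ is excluded because the corresponding product of father wavelets has non-zero mean; every other $\eta$ contains at least one mother-wavelet factor, which provides the cancellation.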
where $\psi_I$ is a wavelet function on $I$ (similarly $\psi_J$), and the $a_{IJK}$ are coefficients with The dyadic shifts considered in many other papers correspond to the special case of Haar wavelets.
The wavelet shift is called good if all dyadic cubes $I, J, K$ such that $a_{IJK} \ne 0$ satisfy $mI, mJ \subset K$; otherwise, it is called bad. We note that this condition is automatic when $m = 1$, but not in general. Nevertheless, a closely related notion of good shifts already appeared in [7], where it played a certain role. This notion was not needed in the many works that appeared on this topic since [7]. The $L^2$ boundedness of the good wavelet shift $S$ is a consequence of the following facts:

2.8. Lemma. If $S$ is a good wavelet shift then $A_K$ indicates an "averaging operator" on $K$ which satisfies:

Proof. Since $S$ is a good wavelet shift, if $a_{IJK} \ne 0$ then $mJ \subset K$ and $mI \subset K$, for fixed $m \ge 1$, i.e., $mJ$ and $mI$ are good cubes inside $K$.
Using the bound (2.6) for the coefficients $a_{IJK}$, the regularity of $\psi_I$, and the previous fact that $mJ \subset K$, $mI \subset K$, for fixed $m \ge 1$, we have where the (easy to check) bounded overlap of the cubes $mJ$ (respectively $mI$) was used in the last step.
2.9. Corollary. Let $S$ be a good wavelet shift. The following estimate for the "averaging operator" $A_K$ holds:

Proof. Applying the pointwise bound of Lemma 2.8 to each $A_K$ we have

2.10. Lemma. Let $S$ be a good wavelet shift. Then

Proof. We use the orthonormality of the wavelet functions. Let $H^i_K := \mathrm{span}\{\psi_I : I \subseteq K,\ \ell(I) = 2^{-i}\ell(K)\}$, and let $P^i_K$ be the orthogonal projection of $L^2$ onto this subspace. For a fixed $i$, these spaces are orthogonal, as $K$ ranges over $D$.
We have $\langle f, \psi_I\rangle = \langle P^i_K f, \psi_I\rangle$ for all $I$ appearing in $A_K$, and hence We can apply these identities and Pythagoras' theorem to conclude that: where we used the $L^2$ boundedness of $A_K$ from Corollary 2.9 in the second-to-last step.
3. The dyadic representation theorem for smooth compactly supported wavelets
Let $T$ be a Calderón-Zygmund operator on $\mathbb{R}^d$. That is, it acts on a suitable dense subspace of functions in $L^2(\mathbb{R}^d)$ (for the present purposes, this class should at least contain the indicators of cubes in $\mathbb{R}^d$) and has the kernel representation
$$Tf(x) = \int_{\mathbb{R}^d} K(x, y) f(y)\,dy, \qquad x \notin \operatorname{supp} f.$$
Moreover, we assume that the kernel is $s$-times differentiable and satisfies the higher order standard estimate
$$|\partial^\alpha K(x, y)| \le C_1 |x-y|^{-d-|\alpha|} \tag{3.1}$$
for all $x, y \in \mathbb{R}^d$, $x \ne y$, and all multi-indices $\alpha$ with $|\alpha| \le s$. Let us denote the smallest admissible constant $C_1$ by $\|K\|_{CZ_s}$. We say that $T$ is a bounded Calderón-Zygmund operator, if in addition $T : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$, and we denote its operator norm by $\|T\|_{L^2\to L^2}$.
Under such assumptions, we can also define the action of $T$ on the space $P_s$ of polynomials of degree less than $s$. This is well known, and the reader can consult [28] (see also [27]) for a comprehensive discussion. The necessary set-up for our needs is as follows: If $\psi \in C^s_c(B(0, R))$ satisfies $\int_{\mathbb{R}^d} P\psi = 0$ for all $P \in P_s$, then we have, for $|x| > 2R$, This expression is integrable against any $P \in P_s$ over the region $B(0, 2R)^c$. On the other hand, it is clear that is well defined for every $P \in P_s$ and every $\psi \in C^s_c(\mathbb{R}^d)$ that is orthogonal to $P_s$. This defines $T^*P$ as a functional on the said subspace of $C^s_c(\mathbb{R}^d)$, and the definition of $TP$ can be given in a similar way, since the adjoint $T^*$ satisfies the same assumptions.
We say that $T$ maps $P_s$ into itself, if $\langle TP, \psi\rangle = 0$ for all $P \in P_s$ and all $\psi \in C^s_c(\mathbb{R}^d)$ that are orthogonal to $P_s$. Here is our main result:

3.2. Theorem. Let $T$ be a bounded Calderón-Zygmund operator with a kernel satisfying (3.1) and suppose that both $T$ and $T^*$ map $P_s$ into itself, in the sense defined above. Moreover, let the wavelet $\psi_I$ satisfy the regularity and cancellation property for $u = s$ and $v = s-1$, respectively. Then for any given $\epsilon > 0$, $T$ has an expansion, say for $f, g \in C^1_c(\mathbb{R}^d)$,
$$\langle g, Tf\rangle = c\,\big(\|T\|_{L^2\to L^2} + \|K\|_{CZ_s}\big)\, E_\omega \sum_{i,j=0}^{\infty} 2^{-(s-\epsilon)\max(i,j)} \langle g, S^{ij}_\omega f\rangle,$$
where $c$ depends only on $d$, $s$ and $\epsilon$, and $S^{ij}_\omega$ is a good wavelet shift of parameters $(i, j)$ on the dyadic system $D^\omega$.
The following remark shows that the assumption that $T$ and $T^*$ map $P_s$ into itself follows from the other assumptions of Theorem 3.2, if in addition the operator $T$ is translation invariant.

3.3. Remark. Let $T$ be a bounded Calderón-Zygmund operator with a kernel satisfying (3.1) and suppose in addition that $T$ is translation invariant. Then both $T$ and $T^*$ map $P_s$ into itself.
Proof. It is enough to consider just $T$, since all the assumptions, and hence the conclusions, pass to the adjoint $T^*$. For the result concerning $T$, we refer the reader to [

A key to the proof of the dyadic representation is a random expansion of $T$ in terms of wavelet functions $\psi_I$, where the bad cubes are avoided: Then the following representation is valid: We make use of the above random wavelet expansion of $f$, multiply and divide by $\pi_{\mathrm{good}} = P_\omega(I+\omega \text{ good}) = E_\omega 1_{\mathrm{good}}(I+\omega)$, and use the independence from Lemma 2.3 to get: On the other hand, using independence again in half of this double sum, we have $\langle g, \psi_{J+\omega}\rangle \langle \psi_{J+\omega}, T\psi_{I+\omega}\rangle \langle \psi_{I+\omega}, f\rangle$.

Comparison with the basic identity shows that Symmetrically, we also have We focus on the summation inside $E_\omega$, for a fixed value of $\omega \in \Omega$, and manipulate it into the required form. Moreover, we will focus on the half of the sum with $\ell(J) \ge \ell(I)$, the other half being handled symmetrically. We further divide this sum into the following parts: We observe that the main difference between this division and the one in [8, after Proposition 3.5] is that the sum $\sigma_{\mathrm{out}}$ in [8] has been split into $\sigma_{\mathrm{far}}$ and $\sigma_{\mathrm{between}}$, which are handled differently. Regarding the sum $\sigma_{\mathrm{in}}$, we will not use the same method as in [8, Section 3.2]. The sums $\sigma_{=}$ and $\sigma_{\mathrm{near}}$ will be treated in a similar but not exactly the same way as in [8, Section 3.3].
In order to recognize these series as sums of good wavelet shifts, we need to find, for each pair $(I, J)$ appearing here, a common dyadic ancestor which contains $mI$ and $mJ$. The following lemma provides the existence of such containing cubes, with control on their size:

3.6. Lemma. If $I \in D$ is good and $J \in D$ is a cube with $\ell(J) \ge \ell(I)$, then there exists $K \supseteq mI \cup mJ$ which satisfies

Proof. Let us start with the following initial observation: if $I \in D$ is good and $K \in D$ satisfies $I \subseteq K$, and $\ell(K) \ge 2^r \ell(I)$, then when $r$ is large enough. Hence $mI \subseteq K$, and we can proceed with the proof of $mJ \subseteq K$. Using an elementary triangle inequality we estimate $\mathrm{dist}(I, K^c)$ in the following way: Thus, In order to conclude that $mJ \subseteq K$ we want the right hand side of (3.7) to be non-negative. This is achieved by taking the smallest $\ell(K)$ such that Then, in fact .

Note that we can start the summation from 1 instead of 0, since the disjointness of $I$ and $J$ implies that $K = I \vee J$ must be strictly larger than either of $I$ and $J$. The goal is to identify the quantity in parentheses as a decaying factor times an averaging operator with parameters $(i, j)$. The proof of the following lemma is similar to [8, Lemma 3.8] but, to make use of the smoothness, we subtract a higher order Taylor expansion of the kernel, namely that of $y \mapsto K(x, y)$ at $y = c_I$.

4.1. Lemma. For $I$ and $J$ appearing in $\sigma_{\mathrm{far}}$, we have where $K = I \vee J$ and $\theta \in (0, 1)$.
Proof. Using the properties of $\psi_I$, the Taylor series of order $s$ of $y \mapsto K(x, y)$ at the centre point $y = c_I$ of $I$, the higher order standard estimate of the kernel (3.1), and Lemma 3.6 |J| |I|.
K f , where $\theta \in (0, 1)$ and $A^{ij}_K$ is an averaging operator with parameters $(i, j)$.

Proof. By Lemma 4.1, substituting $\ell(I)/\ell(K) = 2^{-i}$, and the first factor is precisely the required size of the coefficients of $A^{ij}_K$.
Summarizing, we have where we choose $\theta = \frac{\epsilon}{d+s}$ for any given $\epsilon > 0$ and $S^{ij}$ is a good wavelet shift with parameters $(i, j)$.
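The gain obtained by subtracting a Taylor polynomial of order $s$ of the kernel at the centre $c_I$ can be seen exactly in a one-dimensional model. The Python sketch below assumes the model kernel $K(x, y) = 1/(x-y)$, which is chosen only for illustration and is not the general kernel of the paper. For this kernel the Taylor polynomial of degree less than $s$ in $y$ at $c$ is $\sum_{k<s} (y-c)^k/(x-c)^{k+1}$, and the remainder equals $\big(\frac{y-c}{x-c}\big)^s \frac{1}{x-y}$. The function names are hypothetical.

```python
from fractions import Fraction


def kernel(x, y):
    """Model kernel K(x, y) = 1 / (x - y) for x != y (an illustrative choice)."""
    return Fraction(1) / (x - y)


def taylor_polynomial(x, y, c, s):
    """Taylor polynomial of degree < s of y -> K(x, y) at the point y = c."""
    # The k-th y-derivative of 1/(x - y) is k! / (x - y)^(k + 1),
    # so the k-th Taylor term is (y - c)^k / (x - c)^(k + 1).
    return sum((y - c) ** k / (x - c) ** (k + 1) for k in range(s))


def remainder(x, y, c, s):
    """K(x, y) minus its Taylor polynomial of order s at y = c."""
    return kernel(x, y) - taylor_polynomial(x, y, c, s)
```

When $|y - c| \le \frac12 |x - c|$, the remainder is therefore at most $2\,|y-c|^s / |x-c|^{1+s}$ in absolute value, which is the one-dimensional shape of the bound $\ell(I)^s / \mathrm{dist}^{d+s}$ that the higher order standard estimate (3.1) provides for separated cubes.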

4.B. Intermediate cubes, $\sigma_{\mathrm{between}}$. Let $M > m$. In this part, we make use of the fact that $\psi_J$ has a Taylor series of order $s$ at the centre point $c_I$ of $I$ and we denote We drop $c_I$, when it is clear from the context. Thus, we have Observe that due to the hypothesis of Theorem 3.2 that the operators $T$, $T^*$ map $P_s$ to itself, and by the cancellation of $\psi_I$, the last term of (4.3) vanishes. The first term of (4.3) we can further split as In the following we estimate the remaining non-vanishing terms of (4.4). For these terms, we obtain estimates that do not depend on the fact that we are dealing with the intermediate cubes, and in fact we will use these same estimates again to deal with $\sigma_{\mathrm{in}}$.

Proof. Let us denote $\mathrm{Tayl}_s(K) := \sum_{0\le|\alpha|<s} \frac{(y-c_I)^\alpha}{\alpha!}\, \partial_2^\alpha K(x, c_I)$. Using the cancellation of $\psi_I$, the Taylor series of order $s$ of $y \mapsto K(x, y)$ at the centre point $y = c_I$ of $I$ and the higher order standard estimate of the kernel (3.1) Now, using the regularity property and the Taylor series of order $s$ of $\psi_J$ at the centre point $c_I$ of $I$ we derive the following two estimates: where $a = |\alpha| \in \mathbb{N}$.
Thus, the right hand side of (4.6) is dominated by

4.7. Lemma. For all $I, J \in D$ such that $\ell(I) \le \ell(J)$, we have

Proof. By the Taylor series of order $s$ of $\psi_J$ at the centre point $c_I$ of $I$ and the regularity properties of $\psi_I$, $\psi_J$, we can compute the left hand side of (4.8) as follows: By combining equation (4.4), Lemmata 4.5 and 4.7 we have (4.9) Now, for the completion of the analysis of the sum $\sigma_{\mathrm{between}}$ we will need the following lemma: By combining Lemmata 3.6 and 4.10 we can estimate $\frac{|K|}{|J|}\big(\frac{\ell(I)}{\ell(J)}\big)^{s-\epsilon}$ of (4.13) as follows: where we choose $\theta = \frac{\epsilon}{d+s}$ for any given $\epsilon > 0$. Summarizing, from (4.14) and (4.15) we have where $A^{ij}_K$ is an averaging operator and $S^{ij}$ is a good wavelet shift with parameters $(i, j)$. where $a(I, J)$ is defined in (4.13) and satisfies the estimate (4.14).
We observe that for the contained cubes $\sigma_{\mathrm{in}}$, we have from Lemma 3.6 the bound . Also, from the definition of the contained cubes we have $D(I, J) \lesssim \ell(J)$, which is the same as the conclusion of Lemma 4.10 in the case of $\sigma_{\mathrm{between}}$. Thus, we have all the same auxiliary estimates as in $\sigma_{\mathrm{between}}$ and the same conclusion where $A^{ij}_K$ is an averaging operator and $S^{ij}$ is a good wavelet shift with parameters $(i, j)$.

4.D. Near-by cubes, $\sigma_{=}$ and $\sigma_{\mathrm{near}}$. We are left to deal with the sums $\sigma_{=}$ of equal cubes $I = J$, as well as $\sigma_{\mathrm{near}}$ of disjoint near-by cubes with $\mathrm{dist}(I, J) \le \ell(J)(\ell(I)/\ell(J))^\theta$. Since $I$ is good, this necessarily implies that $\ell(I) > 2^{-r}\ell(J)$. Then, for a given $J$, there are only boundedly many related $I$ in this sum. Note that in contrast to [8, Section 3.3] we compute both sums using good wavelet shifts of type $(i, i)$ and $(i, j)$.
Using this lemma and applying Lemma 3.6 for the good $I = J \in D$ and a cube $J' \in D$ adjacent to $I$ (i.e., $\ell(J') = \ell(I)$ and $\mathrm{dist}(I, J') = 0$), we have that $K := I \vee J'$ satisfies $\ell(K) \lesssim \ell(I)$ and $mI \subset K$. Moreover, from Lemma 4.16, we have Thus, we can organize the sum $\sigma_{=}$ as follows where $A^{ii}_K$ is an averaging operator and $S^{ii}$ is a good wavelet shift with parameters $(i, i)$.
For $I$ and $J$ participating in $\sigma_{\mathrm{near}}$, we conclude from Lemma 3.6 that $K := I \vee J$ satisfies $\ell(K) \lesssim \ell(I)$. Also, from Lemma 4.16, we have Hence, we may organize $\langle g, S^{ij} f\rangle$, where $A^{ij}_K$ is an averaging operator and $S^{ij}$ is a good wavelet shift with parameters $(i, j)$.
Summarizing, we have where $S^{ij}$ is a good wavelet shift of type $(i, j)$.