1 Introduction

Working on the Euclidean product space \(\mathbb {R}^2 = \mathbb {R}\times \mathbb {R}\), we define for \(x = (x^1, x^2)\) and \(y = (y^1, y^2)\) the decay factor

$$\begin{aligned} D_{\theta }(x, y):= \Bigg (\frac{|x^1-y^1|}{|x^2-y^2|} + \frac{|x^2-y^2|}{|x^1-y^1|}\Bigg )^{-\theta } < 1, \qquad \theta \in (0, 1], \end{aligned}$$
(1.1)

whenever \(x^1 \ne y^1\) and \(x^2 \ne y^2\). Notice that this decay factor increases as \(\theta \) shrinks. The point is that it is at its smallest when \(\theta = 1\), and then

$$\begin{aligned} \frac{1}{|x^1-y^1|}\frac{1}{|x^2-y^2|} D_{1}(x,y) = \frac{1}{|x^1-y^1|^{2}+|x^2-y^2|^{2}} = \frac{1}{|x-y|^2}. \end{aligned}$$

That is, in this case, the bi-parameter size estimate multiplied by this decay factor yields the usual one-parameter size estimate. When \(\theta < 1\), the decay factor is larger and the corresponding product lies between the bi-parameter and the one-parameter size estimates.
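
As a quick numerical sanity check of the \(\theta = 1\) identity above (illustrative sample values only; the function names below are ad hoc, not from the paper):

```python
# Numerical check of the identity
# (1/|x1-y1|) * (1/|x2-y2|) * D_1(x, y) = 1 / |x - y|^2,
# writing a = |x1-y1|, b = |x2-y2|, so that D_1 = (a/b + b/a)^(-1) = ab/(a^2+b^2).
def decay_factor(a, b, theta=1.0):
    return (a / b + b / a) ** (-theta)

def lhs(a, b):
    # bi-parameter size estimate times the decay factor
    return decay_factor(a, b) / (a * b)

def rhs(a, b):
    # one-parameter size estimate 1/|x-y|^2
    return 1.0 / (a ** 2 + b ** 2)

for a, b in [(1.0, 2.0), (0.5, 3.0), (7.0, 0.25)]:
    assert abs(lhs(a, b) - rhs(a, b)) < 1e-12
```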

We say that kernels that decay like

$$\begin{aligned} \frac{1}{|x^1-y^1|}\frac{1}{|x^2-y^2|} D_{\theta }(x,y) \end{aligned}$$

for some \(\theta \) and satisfy some similar continuity estimates are CZX kernels—one can pronounce the “X” in “CZX” as “exotic”. Such kernels are more singular than the standard Calderón–Zygmund kernels, but less singular than the product Calderón–Zygmund(–Journé) kernels [5, 13, 18]. Even with \(\theta = 1\), they are different from the standard Calderón–Zygmund kernels—in this case, the difference is only in the Hölder estimates (see Sect. 2). The CZX kernels can, for example, be motivated by looking at Zygmund dilations [4, 19–21]. Zygmund dilations are a group of dilations lying in between the standard product theory and the one-parameter setting—in \(\mathbb {R}^3 = \mathbb {R}\times \mathbb {R}^2\) they are the dilations \((x_1, x_2, x_3) \mapsto (\delta _1 x_1, \delta _2 x_2, \delta _1 \delta _2 x_3)\). Recently, general convolution-form singular integrals invariant under Zygmund dilations were studied in [8] and subsequently in [3, 9]. In these papers the decay factor

$$\begin{aligned} t \mapsto \Big (t+\frac{1}{t}\Big )^{-\theta } \end{aligned}$$

controls the additional, compared to the product setting, decay with respect to the Zygmund ratio

$$\begin{aligned} \frac{|x_1x_2|}{|x_3|}. \end{aligned}$$

See also our recent paper [12], which attacks the Zygmund setting from the point of view of new multiresolution methods. Essentially, in the current paper, we isolate the conditions on the lower-dimensional kernels obtained by fixing the variables \(x^1,y^1\) in the Zygmund setting [8, 12] and ignoring the dependence on these variables. A class of CZX operators is also induced by the Fefferman–Pipher multipliers [4]—importantly, they satisfy \(\theta = 1\) but with an additional logarithmic growth factor. This subtle detail is of key relevance to the weighted estimates, as we explain below.

There is a useful operator-valued viewpoint to multi-parameter analysis—Journé [13] views, e.g. bi-parameter operators as “operator-valued one-parameter operators”. For recent work using this viewpoint, see e.g. [11]. Developing such an approach to Zygmund SIOs is interesting: the operator-valued viewpoint is useful, for example, when proving the necessity of T1 type assumptions in the product setting, see e.g. [6], and the full product BMO type T1 theory of Zygmund SIOs is still to be developed. The operator-valued approach will necessarily be complicated in the Zygmund setting, since the parameters are tied and it is not as simple as fixing a single variable. Our new exotic operators are pertinent to the operator-valued viewpoint, in which Zygmund SIOs could partly be seen as operator-valued one-parameter operators whose values are exotic operators.

It has been known for a long time that Calderón–Zygmund operators act boundedly in the weighted spaces \(L^p(w)\) whenever w belongs to the Muckenhoupt class \(A_p\), defined by the finiteness of the weight constant

$$\begin{aligned}{}[w]_{A_p}:=\sup _J \langle w\rangle _J\langle w^{-1/(p-1)}\rangle _J^{p-1}, \end{aligned}$$

where the supremum is over all cubes J. On the other hand, the more singular multi-parameter Calderón–Zygmund(–Journé) operators in general satisfy such bounds only for the smaller class of strong \(A_p\) weights, defined via \([w]_{A_p^*}\), where the supremum is over all axis-parallel rectangles. While, on a general level, the CZX operators behave quite well for any \(\theta \), even for \(\theta < 1\), for one-parameter weighted estimates it is critical that \(\theta = 1\), the aforementioned logarithmic extra growth being allowed.
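
A standard illustration (a classical fact, not specific to the present paper): the power weights on \(\mathbb {R}^n\) satisfy

$$\begin{aligned} w(x) = |x|^{\alpha } \in A_p(\mathbb {R}^n) \quad \text {if and only if}\quad -n< \alpha < n(p-1), \end{aligned}$$

so, for instance, \(|x|^{\alpha } \in A_2(\mathbb {R}^2)\) exactly when \(\alpha \in (-2, 2)\).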

1.2 Theorem

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator with a CZX kernel.

  (1)

    If \(\theta < 1\) in (1.1), one-parameter weighted estimates may fail.

  (2)

    If \(\theta = 1\) in (1.1), possibly with a logarithmic growth factor, then for every \(p\in (1,\infty )\) and every \(w\in A_p(\mathbb {R}^2)\) the operator T extends boundedly to \(L^p(w)\).

In the paper [12], we also develop the corresponding counterexamples in the full Zygmund case. There the interest is whether Zygmund singular integrals are bounded on weighted spaces with respect to Zygmund weights—a class larger than the strong \(A_p\) weights, with the supremum running only over the so-called Zygmund rectangles satisfying the natural scaling. For \(\theta < 1\), the situation parallels the one from the CZX world—the operators need not be bounded with respect to the Zygmund weights.

Apart from the weighted estimates, we want to make the case that, in many ways, the CZX kernels with an arbitrary \(\theta \) can be seen as part of the extended realm of standard kernels, rather than the more complicated product theory. In particular, the T1 theorem for CZX kernels takes the following form reminiscent of the standard T1 theorem [1].

1.3 Theorem

Let \(B(f,g)\) be a bilinear form defined on finite linear combinations of indicators of cubes of \(\mathbb {R}^2\), and such that

$$\begin{aligned} B(f,g)=\iint K(x,y)f(y)g(x)\,\textrm{d}x\,\textrm{d}y \end{aligned}$$

when \(\{f\ne 0\}\cap \{g\ne 0\}=\varnothing \), where \(K\in CZX(\mathbb {R}^2)\). Then the following are equivalent:

  (1)

    There is a bounded linear \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) such that \(\langle Tf,g \rangle =B(f,g)\).

  (2)

    B satisfies

    • the weak boundedness property \(|B(1_I,1_I)|\lesssim |I|\) for all cubes \(I\subset \mathbb {R}^2\), and

    • the T(1) conditions

      $$\begin{aligned} B(1,g)=\int b_1 g,\qquad B(f,1)=\int b_2 f \end{aligned}$$

      for some \(b_1,b_2\in {\text {BMO}}(\mathbb {R}^2)\) and all \(f,g\) with \(\int f=0=\int g\).

    Moreover, under these conditions,

  (3)

    T defines a bounded operator from \(L^\infty (\mathbb {R}^2)\) to \({\text {BMO}}(\mathbb {R}^2)\), from \(L^1(\mathbb {R}^2)\) to \(L^{1,\infty }(\mathbb {R}^2)\), and on \(L^p(\mathbb {R}^2)\) for every \(p\in (1,\infty )\).

In fact, our proof also gives a representation of \(B(f,g)\), Theorem 4.9, which includes both one-parameter [10] and bi-parameter [18] elements. The following commutator bounds follow from the representation; however, the argument is not entirely standard due to the hybrid nature of the model operators.

1.4 Theorem

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator associated with a CZX kernel K. Then

$$\begin{aligned} \Vert [b,T]f \Vert _{L^p} \lesssim \Vert b\Vert _{{\text {BMO}}} \Vert f \Vert _{L^p} \end{aligned}$$

whenever \(p \in (1, \infty )\). Here \([b,T]f:=bTf-T(bf)\).

Thus, the commutator estimate holds with the one-parameter \({\text {BMO}}\) space. This is another purely one-parameter feature of these exotic operators. As the weighted estimates do not, in general, hold, the commutator estimate cannot be derived from the well-known Cauchy integral trick.

Over the past several years, a standard approach to weighted norm inequalities has been via the methods of sparse domination pioneered by Lerner. For \(\theta = 1\), we can derive our weighted estimates directly from our representation theorem. However, we also provide some additional sparse estimates that give a solid quantitative dependence on the \(A_p\) constant and yield two-weight commutator estimates for free.

1.5 Theorem

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator with a CZX kernel with \(\theta = 1\). Then for every \(p\in (1,\infty )\) and every \(w\in A_p(\mathbb {R}^2)\) the operator T extends boundedly to \(L^p(w)\) with norm

$$\begin{aligned} \Vert T\Vert _{\mathcal {L}(L^p(w))}\lesssim _p [w]_{A_p}^{p'}. \end{aligned}$$

Moreover, if \(\nu =w^{\frac{1}{p}}\lambda ^{-\frac{1}{p}}\) with \(w,\lambda \in A_p\) and

$$\begin{aligned} \Vert b\Vert _{{\text {BMO}}_\nu }:= \sup _I \frac{1}{\nu (I)}\int _I |b-\langle b\rangle _I| < \infty , \end{aligned}$$

where the supremum is over cubes \(I \subset \mathbb {R}^2\), then

$$\begin{aligned} \Vert [b,T]\Vert _{L^p(w)\rightarrow L^p(\lambda )}\lesssim \Vert b\Vert _{{\text {BMO}}_\nu }. \end{aligned}$$

The quantitative bound (in particular quadratic in \([w]_{A_2}\) when \(p=2\)) is worse than the linear \(A_2\) theorem valid for classical Calderón–Zygmund operators [10].

We conclude the introduction with an outline of how the paper is organized. In Sect. 2, we define the CZX kernels and prove part of Theorem 1.3 in Proposition 2.4. Section 3 begins with the definition of CZX forms. Lemma 3.3 proves estimates for CZX forms acting on Haar functions, which will be used in the representation theorem, Theorem 4.9. In Proposition 4.4, we prove certain weighted maximal function estimates, which are at the heart of proving that CZX forms with decay parameter \(\theta _2=1\) satisfy weighted estimates. The dyadic operators used to represent CZX forms are defined in Definition 4.5, and estimates for them are proved in Lemma 4.6. The representation identity and the T1 theorem, and the weighted estimates when \(\theta _2=1\), for CZX forms are recorded in Theorem 4.9. Theorem 1.4 is proved in Sect. 5. In the beginning of Sect. 6, we construct the counterexamples required to prove (1) of Theorem 1.2. The sparse domination of CZX operators with \(\theta _2=1\) is recorded in Corollary 6.7. Theorem 1.5 is proved in Corollary 6.8 and in the discussion after Proposition 6.10.

2 CZX Kernels

We work in \(\mathbb {R}^2 = \mathbb {R}\times \mathbb {R}\). Let \(\theta _1, \theta _2 \in (0, 1]\). For \(x^1 \ne y^1\) and \(x^2 \ne y^2\) define

$$\begin{aligned} D_{\theta _2}(x, y):= \Bigg (\frac{|x^1-y^1|}{|x^2-y^2|} + \frac{|x^2-y^2|}{|x^1-y^1|}\Bigg )^{-\theta _2} < 1. \end{aligned}$$

We assume that the kernel \(K :(\mathbb {R}^2 \times \mathbb {R}^2) {\setminus } \{x^1 = y^1 \text { or } x^2 = y^2\} \rightarrow \mathbb {C}\) satisfies the size estimate

$$\begin{aligned} |K(x,y)| \lesssim \frac{1}{|x^1-y^1|}\frac{1}{|x^2-y^2|} D_{\theta _2}(x,y) \end{aligned}$$

and the mixed Hölder and size estimate

$$\begin{aligned} |K(x,y) - K((w^1, x^2), y)| \lesssim \frac{|x^1-w^1|^{\theta _1}}{|x^1-y^1|^{1+\theta _1}} \frac{1}{|x^2-y^2|} D_{\theta _2}(x,y) \end{aligned}$$

whenever \(|x^1-w^1| \le |x^1-y^1|/2\), together with the other three symmetric mixed Hölder and size estimates. If this is the case, we say that \(K\in CZX(\mathbb {R}^2)\). Again, such kernels are more singular than standard Calderón–Zygmund kernels, but less singular than the product Calderón–Zygmund(–Journé) kernels. See Remark 4.10 for some additional logarithmic factors when \(\theta _2=1\) and why they are relevant from the point of view of Fefferman–Pipher multipliers [4].

2.1 Lemma

Let \(K\in CZX(\mathbb {R}^2)\) and \(x^1, x^2, y^2 \in \mathbb {R}\). Then

$$\begin{aligned} \int _{\mathbb {R}} |K(x,y)|\,\textrm{d}y^1 \lesssim \frac{1}{|x^2-y^2|}. \end{aligned}$$

Also, for \(L > 0\) there holds that

$$\begin{aligned} \int _{\{y^1:|x^1-y^1| \lesssim L\}} |K(x,y)|\,\textrm{d}y^1 \lesssim \frac{L^{\theta _2}}{|x^2-y^2|^{1+\theta _2}}, \end{aligned}$$

which is a useful estimate if \(L \lesssim |x^2-y^2|\).

Proof

By elementary calculus

$$\begin{aligned} \begin{aligned}&\int _{\mathbb {R}} |K(x,y)|\,\textrm{d}y^1 \lesssim \frac{1}{|x^2-y^2|} \int _0^\infty \frac{1}{u}\Big (\frac{u}{|x^2-y^2|}+\frac{|x^2-y^2|}{u}\Big )^{-\theta _2}\,\textrm{d}u \\&\quad \lesssim \frac{1}{|x^2-y^2|} \Big ( \int _0^{|x^2-y^2|}\frac{\,\textrm{d}u}{u^{1-\theta _2}|x^2-y^2|^{\theta _2}} +\int _{|x^2-y^2|}^\infty \frac{\,\textrm{d}u}{u^{1+\theta _2}|x^2-y^2|^{-\theta _2}}\Big )\\&\quad \lesssim \frac{1}{|x^2-y^2|}, \end{aligned} \end{aligned}$$

and the logic for the second estimate is also clear from this. \(\square \)
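
For completeness, a sketch of the second estimate via the same substitution \(u = |x^1-y^1|\): writing \(s:=|x^2-y^2|\) (our shorthand) and using \(\big (\tfrac{u}{s}+\tfrac{s}{u}\big )^{-\theta _2}\le \big (\tfrac{u}{s}\big )^{\theta _2}\), we get

$$\begin{aligned} \int _{\{y^1:|x^1-y^1| \lesssim L\}} |K(x,y)|\,\textrm{d}y^1 \lesssim \frac{1}{s}\int _0^{CL}\frac{1}{u}\Big (\frac{u}{s}\Big )^{\theta _2}\,\textrm{d}u \sim \frac{L^{\theta _2}}{s^{1+\theta _2}}. \end{aligned}$$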

The first sharper estimate in the next lemma is only needed to derive the weighted estimates in the case \(\theta _2=1\).

2.2 Lemma

Let \(K\in CZX(\mathbb {R}^2)\) and \(J=J^1\times J^2\subset \mathbb {R}^2\) be a square with centre \(c_J=(c_{J^1},c_{J^2})\). If \(x\in J\) and \(y\in (3J^1)^c\times (3J^2)^c\), then

$$\begin{aligned} \begin{aligned} |K(x,y)-K(c_J,y)|&\lesssim \prod _{i=1}^2\frac{1}{{\text {dist}}(y^i,J^i)}\times \frac{\ell (J)^{\theta _1}(\min _{i=1,2}{\text {dist}}(y^i,J^i))^{\theta _2-\theta _1}}{(\max _{i=1,2}{\text {dist}}(y^i,J^i))^{\theta _2}} \\&\lesssim \prod _{i=1}^2\frac{\ell (J)^\theta }{{\text {dist}}(y^i,J^i)^{1+\theta }},\qquad \theta :=\frac{1}{2}\min (\theta _1,\theta _2). \end{aligned} \end{aligned}$$

Proof

There holds that

$$\begin{aligned} |K(x,y)-K(c_J,y)|&\le |K(x^1,x^2,y)-K(c_{J^1},x^2,y)|\\&\quad +|K(c_{J^1},x^2,y)-K(c_{J^1},c_{J^2},y)|. \end{aligned}$$

Since \(2|x^i-c_{J^i}|\le \ell (J)\le {\text {dist}}(y^i,J^i)\le \min (|y^i-x^i|,|y^i-c_{J^i}|)\), we conclude

$$\begin{aligned} \begin{aligned} |K(x^1,x^2,y)-K(c_{J^1},x^2,y)|&\lesssim \frac{|x^1-c_{J^1}|^{\theta _1}}{|x^1-y^1|^{1+\theta _1}}\frac{1}{|x^2-y^2|}D_{\theta _2}(x, y), \\ |K(c_{J^1},x^2,y)-K(c_{J^1},c_{J^2},y)|&\lesssim \frac{1}{|c_{J^1}-y^1|}\frac{|x^2-c_{J^2}|^{\theta _1}}{|c_{J^2}-y^2|^{1+\theta _1}}D_{\theta _2}(c_J, y). \end{aligned} \end{aligned}$$

Suppose for instance that \({\text {dist}}(y^1,J^1)\ge {\text {dist}}(y^2,J^2)\). Then the sum simplifies to

$$\begin{aligned} |K(x,y)-K(c_J,y)| \lesssim \frac{1}{{\text {dist}}(y^1,J^1)}\frac{\ell (J)^{\theta _1}}{{\text {dist}}(y^2,J^2)^{1+\theta _1}} \Big (\frac{{\text {dist}}(y^1,J^1)}{{\text {dist}}(y^2,J^2)}\Big )^{-\theta _2}, \end{aligned}$$

where further

$$\begin{aligned} \begin{aligned}&\frac{\ell (J)^{\theta _1}}{{\text {dist}}(y^2,J^2)^{\theta _1}} \Big (\frac{{\text {dist}}(y^1,J^1)}{{\text {dist}}(y^2,J^2)}\Big )^{-\theta _2} =\frac{\ell (J)^{\theta _1}{\text {dist}}(y^2,J^2)^{\theta _2-\theta _1}}{{\text {dist}}(y^1,J^1)^{\theta _2}} \\&\quad \le \Big (\frac{\ell (J)}{{\text {dist}}(y^1,J^1)}\Big )^{\min (\theta _1,\theta _2)} \le \prod _{i=1}^2\Big (\frac{\ell (J)}{{\text {dist}}(y^i,J^i)}\Big )^{\theta } \end{aligned} \end{aligned}$$

with \(\theta :=\frac{1}{2}\min (\theta _1,\theta _2)\). \(\square \)
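
The second-to-last inequality above can be checked by a short case analysis (our notation: \(d_i:={\text {dist}}(y^i,J^i)\), so that \(\ell (J)\le d_2\le d_1\) in the considered case). If \(\theta _2\ge \theta _1\), then \(d_2^{\theta _2-\theta _1}\le d_1^{\theta _2-\theta _1}\) and

$$\begin{aligned} \frac{\ell (J)^{\theta _1}d_2^{\theta _2-\theta _1}}{d_1^{\theta _2}} \le \frac{\ell (J)^{\theta _1}}{d_1^{\theta _1}}; \end{aligned}$$

if \(\theta _1>\theta _2\), then \(\ell (J)^{\theta _1}d_2^{\theta _2-\theta _1}=\ell (J)^{\theta _2}(\ell (J)/d_2)^{\theta _1-\theta _2}\le \ell (J)^{\theta _2}\), so the left-hand side is at most \((\ell (J)/d_1)^{\theta _2}\). In both cases we obtain the bound \((\ell (J)/d_1)^{\min (\theta _1,\theta _2)}\).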

A combination of the previous two lemmas shows that CZX-kernels satisfy the Hörmander integral condition:

2.3 Lemma

Let \(K\in CZX(\mathbb {R}^2)\), and \(x\in J\) for some cube \(J=J^1\times J^2\subset \mathbb {R}^2\) with centre \(c_J\). Then

$$\begin{aligned} \int _{(3J)^c}|K(x,y)-K(c_J,y)|\,\textrm{d}y\lesssim 1. \end{aligned}$$

Proof

Notice that

$$\begin{aligned} (3J)^c=((3J^1)^c\times 3J^2)\cup (3J^1\times (3J^2)^c)\cup ((3J^1)^c\times (3J^2)^c), \end{aligned}$$

where the first two components on the right-hand side are symmetric. For these, we simply estimate

$$\begin{aligned} \begin{aligned} \int _{(3J^1)^c \times 3J^2}|K(x,y)|\,\textrm{d}y&=\int _{(3J^1)^c}\Big (\int _{3J^2}|K(x,y)|\,\textrm{d}y^2\Big )\,\textrm{d}y^1 \\&\quad \lesssim \int _{(3J^1)^c}\frac{\ell (J)^{\theta _2}}{|x^1-y^1|^{1+\theta _2}}\,\textrm{d}y^1\lesssim 1, \end{aligned} \end{aligned}$$

where the first \(\lesssim \) was an application of Lemma 2.1. The estimate for \(K(c_J,y)\) is of course a special case of this with \(x=c_J\).

For the remaining component of the integration domain, there holds that

$$\begin{aligned} \begin{aligned} \int _{(3J^1)^c \times (3J^2)^c}|K(x,y)-K(c_J,y)|\,\textrm{d}y&\lesssim \int _{(3J^1)^c \times (3J^2)^c}\prod _{i=1}^2\frac{\ell (J)^\theta }{{\text {dist}}(y^i,J^i)^{1+\theta }}\,\textrm{d}y \\&\quad =\prod _{i=1}^2\int _{(3J^i)^c}\frac{\ell (J)^\theta }{{\text {dist}}(y^i,J^i)^{1+\theta }}\,\textrm{d}y^i \lesssim 1, \end{aligned} \end{aligned}$$

where the first \(\lesssim \) was an application of Lemma 2.2. \(\square \)

At this point, we can already provide a proof of part (3) of Theorem 1.3, which we restate as

2.4 Proposition

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator associated with a CZX kernel K. Then T extends boundedly from \(L^\infty (\mathbb {R}^2)\) into \({\text {BMO}}(\mathbb {R}^2)\), from \(L^1(\mathbb {R}^2)\) into \(L^{1,\infty }(\mathbb {R}^2)\), and from \(L^p(\mathbb {R}^2)\) into itself for all \(p\in (1,\infty )\).

Proof

By Lemma 2.3, the kernel K satisfies the Hörmander integral condition; the symmetry of the assumption on K ensures that it also satisfies the version with the roles of the first and second variable interchanged. It is well known that any \(L^2(\mathbb {R}^2)\)-bounded operator with a Hörmander kernel satisfies the mapping properties stated in the proposition. (See e.g. [22, §I.5] for the boundedness from \(L^1(\mathbb {R}^2)\) into \(L^{1,\infty }(\mathbb {R}^2)\), and from \(L^p(\mathbb {R}^2)\) into itself for \(p\in (1,2)\), and [22, §IV.4.1] for the boundedness from \(L^\infty (\mathbb {R}^2)\) into \({\text {BMO}}(\mathbb {R}^2)\). The latter is formulated for convolution kernels \(K(x,y)=K(x-y)\), but an inspection of the proof shows that it extends to the general case with trivial modifications. The case of \(p\in (2,\infty )\) can be inferred either by duality (observing that the adjoint \(T^*\) satisfies the same assumption) or by interpolation between the \(L^2(\mathbb {R}^2)\) and the \(L^\infty (\mathbb {R}^2)\)-to-\({\text {BMO}}(\mathbb {R}^2)\) estimates.) \(\square \)

3 Haar Coefficients of CZX Forms

We recall the weak boundedness property and the T1 assumptions, which are just the same as in the classical theory for usual Calderón–Zygmund forms.

3.1 Definition

Let \(B(f,g)\) be a bilinear form defined on finite linear combinations of indicators of cubes of \(\mathbb {R}^2\), and such that

$$\begin{aligned} B(f,g)=\iint K(x,y)f(y)g(x)\,\textrm{d}x\,\textrm{d}y \end{aligned}$$

when \(\{f\ne 0\}\cap \{g\ne 0\}=\varnothing \), where \(K\in CZX(\mathbb {R}^2)\). We say that B is a \(CZX(\mathbb {R}^2)\)-form.

3.2 Definition

A \(CZX(\mathbb {R}^2)\)-form satisfies the weak boundedness property if \(|B(1_I,1_I)|\lesssim |I|\) for all cubes \(I\subset \mathbb {R}^2\). It satisfies the T1 conditions if

$$\begin{aligned} B(1,g)=\int b_1 g,\qquad B(f,1)=\int b_2 f \end{aligned}$$

for some \(b_1,b_2\in {\text {BMO}}(\mathbb {R}^2)\) and all \(f,g\) with \(\int f=0=\int g\). Here

$$\begin{aligned} \Vert b\Vert _{{\text {BMO}}} = \Vert b\Vert _{{\text {BMO}}(\mathbb {R}^2)}:=\sup _I \frac{1}{|I|} \int _I |b-\langle b\rangle _I|, \end{aligned}$$

where the supremum is over all cubes \(I \subset \mathbb {R}^2\) and \(\langle b\rangle _I = \frac{1}{|I|} \int _I b\).

For an interval \(I \subset \mathbb {R}\), we denote by \(I_{l}\) and \(I_{r}\) the left and right halves of the interval I, respectively. We define \(h_{I}^0 = |I|^{-1/2}1_{I}\) and \(h_{I}^1 = |I|^{-1/2}(1_{I_{l}} - 1_{I_{r}})\). Let now \(I = I^1 \times I^2\) be a cube, and define the Haar function \(h_I^{\eta }\), \(\eta = (\eta ^1, \eta ^2) \in \{0,1\}^2\), via

$$\begin{aligned} h_I^{\eta } = h_{I^1}^{\eta ^1} \otimes h_{I^2}^{\eta ^2}. \end{aligned}$$
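
As a minimal numerical sketch (an ad hoc discretization, not part of the proofs), the normalization and cancellation of the one-dimensional Haar functions can be checked directly:

```python
# Discretized one-dimensional Haar functions on I = [0, 1):
# h^0_I = |I|^{-1/2} 1_I and h^1_I = |I|^{-1/2} (1_{I_l} - 1_{I_r}).
n = 1024                       # grid points on I, so dx = 1/n
dx = 1.0 / n
h0 = [1.0] * n                 # |I| = 1, hence |I|^{-1/2} = 1
h1 = [1.0] * (n // 2) + [-1.0] * (n // 2)

def inner(f, g):
    # discrete L^2 inner product on I
    return sum(a * b for a, b in zip(f, g)) * dx

assert abs(inner(h0, h0) - 1.0) < 1e-12   # ||h^0||_{L^2} = 1
assert abs(inner(h1, h1) - 1.0) < 1e-12   # ||h^1||_{L^2} = 1
assert abs(inner(h0, h1)) < 1e-12         # orthogonality
assert abs(sum(h1) * dx) < 1e-12          # cancellation: integral of h^1 is 0
```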

3.3 Lemma

Let B be a \(CZX(\mathbb {R}^2)\)-form satisfying the weak boundedness property. There holds that

$$\begin{aligned} \begin{aligned} |B(h_I^\beta ,h_{J}^\gamma )|&\lesssim \prod _{i=1}^2\Big (\frac{\ell (I)}{\ell (I)+{\text {dist}}(I^i,J^i)}\Big )\times \frac{\ell (I)^{\theta _1}(\ell (I)+\min _{i=1,2}{\text {dist}}(I^i,J^i))^{\theta _2-\theta _1}}{(\ell (I)+\max _{i=1,2}{\text {dist}}(I^i,J^i))^{\theta _2}} \\&\lesssim \prod _{i=1}^2\Big (\frac{\ell (I)}{\ell (I)+{\text {dist}}(I^i,J^i)}\Big )^{1+\theta },\qquad \theta :=\frac{1}{2}\min (\theta _1,\theta _2), \end{aligned} \end{aligned}$$

whenever \(I, J\) are dyadic cubes with equal side lengths \(\ell (I)=\ell (J)\) and at least one of \(\beta \ne 0\) and \(\gamma \ne 0\) holds.

Proof

We consider several cases.

Adjacent cubes: By this, we mean that \({\text {dist}}(I,J)=0\), but \(I\ne J\). Here, we simply put absolute values inside. We are thus led to estimate

$$\begin{aligned} \int _I\int _J|K(x,y)h_I^\beta (x)h_J^\gamma (y)|\,\textrm{d}y \,\textrm{d}x \le \frac{1}{|I|}\int _I\int _J|K(x,y)|\,\textrm{d}y \,\textrm{d}x. \end{aligned}$$
(3.4)

By symmetry, we may assume for instance that \(I^2\ne J^2\). Lemma 2.1 gives that

$$\begin{aligned} \int _{J^1}|K(x,y)|\,\textrm{d}y^1\lesssim \frac{1}{|x^2-y^2|}. \end{aligned}$$

The assumption \(I^2\ne J^2\) implies that

$$\begin{aligned} \int _{I^2}\int _{J^2}\frac{\,\textrm{d}x^2\,\textrm{d}y^2}{|x^2-y^2|} \le \int _{3J^2 \setminus J^2}\int _{J^2} \frac{\,\textrm{d}x^2\,\textrm{d}y^2}{|x^2-y^2|} \lesssim \ell (I). \end{aligned}$$

The dependence on \(x^1\) has already disappeared, and integration with respect to \(x^1\in I^1\) results in another \(\ell (I)\). Then we are only left with observing that \(\ell (I)^2/|I|=1\).
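
For instance, if \(I^2 = (0, \ell )\) and \(J^2 = (\ell , 2\ell )\) with \(\ell = \ell (I)\) (an illustrative configuration), the middle integral can be evaluated exactly:

$$\begin{aligned} \int _0^{\ell }\int _{\ell }^{2\ell }\frac{\,\textrm{d}y^2\,\textrm{d}x^2}{y^2-x^2} = \int _0^{\ell }\big [\log (2\ell -x^2)-\log (\ell -x^2)\big ]\,\textrm{d}x^2 = 2\ell \log 2 \lesssim \ell . \end{aligned}$$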

Equal cubes: Now

$$\begin{aligned} B(h_I^\beta ,h_I^\gamma ) =\sum _{I',J'\in {\text {ch}}(I) } \langle h_I^\beta \rangle _{I'}\langle h_I^\gamma \rangle _{J'} B(1_{I'}, 1_{J'}), \end{aligned}$$

where \(|\langle h_I^\beta \rangle _{I'}\langle h_I^\gamma \rangle _{J'}|=|I|^{-1}\). For \(J'=I'\), the WBP implies that \(|B(1_{I'},1_{I'})|\lesssim |I'|\le |I|\). For \(J'\ne I'\), we can estimate the term as in the case of adjacent \(I\ne J\), recalling that only the size and no cancellation of the Haar functions was used there.

Cubes separated in one direction:

By this, we mean that, say, \({\text {dist}}(I^1,J^1)=0<{\text {dist}}(I^2,J^2)\), or the same with 1 and 2 interchanged. We still apply only the non-cancellative estimate (3.4) (in contrast to what one would do with standard Calderón–Zygmund operators). From Lemma 2.1, we deduce that

$$\begin{aligned} \int _{J^1}|K(x,y)|\,\textrm{d}y^1\lesssim \frac{\ell (I)^{\theta _2}}{|x^2-y^2|^{1+\theta _2}}\lesssim \frac{\ell (I)^{\theta _2}}{(\ell (I)+{\text {dist}}(I^2,J^2))^{1+\theta _2}}. \end{aligned}$$

There is no more dependence on the remaining variables \(x^1,x^2,y^2\), so integrating over these gives the factor \(\ell (I)^3\). After dividing by \(|I|=\ell (I)^2\) in (3.4), we arrive at the bound

$$\begin{aligned} \Big (\frac{\ell (I)}{\ell (I)+{\text {dist}}(I^2,J^2)}\Big )^{1+\theta _2}. \end{aligned}$$

Cubes separated in both directions:

By this, we mean that \({\text {dist}}(I^i,J^i)>0\) for both \(i=1,2\). It is only here that we make use of the assumed cancellation of at least one of the Haar functions, say \(h_I^\beta \). Thus,

$$\begin{aligned} B(h_I^\beta ,h_J^\gamma ) =\int _I\int _J [K(x,y)-K(c_I,y)]h_I^\beta (x) h_J^\gamma (y)\,\textrm{d}y\,\textrm{d}x, \end{aligned}$$

where \(c_I=(c_{I^1},c_{I^2})\) is the centre of I. Now \(x\in I\) and \(y^i\in J^i\subset (3I^i)^c\) for \(i=1,2\), so Lemma 2.2 applies to give

$$\begin{aligned} \begin{aligned} |B(h_I^\beta ,h_J^\gamma )|&\lesssim \int _I\int _J\prod _{i=1}^2\frac{1}{(\ell (I)+{\text {dist}}(I^i,J^i))}\\&\quad \times \frac{\ell (I)^{\theta _1}(\ell (I)+\min _{i=1,2}{\text {dist}}(I^i,J^i))^{\theta _2-\theta _1}}{ (\ell (I)+\max _{i=1,2}{\text {dist}}(I^i,J^i))^{\theta _2}}\times \frac{1}{|I|}\,\textrm{d}y\,\textrm{d}x, \end{aligned} \end{aligned}$$

which readily simplifies to the claimed bound after \(|I|^2/|I|=\ell (I)^2\). \(\square \)

4 Dyadic Representation and T1 Theorem

Let \(\mathcal {D}_0\) be the standard dyadic grid in \(\mathbb {R}\). For \(\omega \in \{0,1\}^{\mathbb {Z}}\), \(\omega = (\omega _i)_{i \in \mathbb {Z}}\), we define the shifted lattice

$$\begin{aligned} \mathcal {D}(\omega ):= \Big \{L + \omega := L + \sum _{i:2^{-i} < \ell (L)} 2^{-i}\omega _i :L \in \mathcal {D}_0\Big \}. \end{aligned}$$

Let \(\mathbb {P}_{\omega }\) be the product probability measure on \(\{0,1\}^{\mathbb {Z}}\). We recall the notion of k-good cubes from [7]. We say that \(G \in \mathcal {D}(\omega , k)\), \(k \ge 2\), if \(G \in \mathcal {D}(\omega )\) and

$$\begin{aligned} d(G, \partial G^{(k)}) \ge \frac{\ell (G^{(k)})}{4} = 2^{k-2} \ell (G). \end{aligned}$$
(4.1)

Notice that

$$\begin{aligned} \mathbb {P}_{\omega }( \{ \omega :L + \omega \in \mathcal {D}(\omega , k) \}) = \frac{1}{2} \end{aligned}$$
(4.2)

for all \(L \in \mathcal {D}_0\) and \(k \ge 2\).
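
One way to see (4.2) (a sketch of the standard argument): the random shift places \(G = L + \omega \) uniformly among the \(2^k\) possible positions inside \(G^{(k)}\). If the position is \(j \in \{0, \dots , 2^k-1\}\), then \(d(G, \partial G^{(k)}) = \min (j, 2^k-1-j)\,\ell (G)\), and (4.1) holds precisely for the \(2^{k-1}\) middle positions:

$$\begin{aligned} \mathbb {P}_{\omega }\big (\min (j, 2^k-1-j) \ge 2^{k-2}\big ) = \frac{2^k - 2\cdot 2^{k-2}}{2^k} = \frac{1}{2}. \end{aligned}$$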

For \(\sigma = (\sigma ^1, \sigma ^2) \in \{0,1\}^{\mathbb {Z}} \times \{0,1\}^{\mathbb {Z}}\) and dyadic \(\lambda > 0\) define

$$\begin{aligned} \mathcal {D}(\sigma )&:= \mathcal {D}(\sigma ^1) \times \mathcal {D}(\sigma ^2), \\ \mathcal {D}_{\lambda }(\sigma )&:= \{I = I^1 \times I^2 \in \mathcal {D}(\sigma ):\ell (I^1) = \lambda \ell (I^2) \}, \\ \mathcal {D}_{{\square }}(\sigma )&:= \mathcal {D}_{1}(\sigma ). \end{aligned}$$

Let \(\mathbb {P}_{\sigma }:= \mathbb {P}_{\sigma ^1} \times \mathbb {P}_{\sigma ^2}\). For \(k = (k^1, k^2)\), \(k^1, k^2 \ge 2\), we define \(\mathcal {D}(\sigma , k) = \mathcal {D}(\sigma ^1, k^1) \times \mathcal {D}(\sigma ^2, k^2)\).

We will need an estimate for the maximal operator

$$\begin{aligned} M_{\mathcal {D}_\lambda (\sigma )}f(x):=\sup _{I\in \mathcal {D}_\lambda (\sigma )}1_I(x)\langle |f|\rangle _I. \end{aligned}$$

Before bounding it, we recall the following interpolation result due to Stein and Weiss, see [23, Theorem 2.11].

4.3 Proposition

Suppose that \(1 \le p_0,p_1 \le \infty \) and let \(w_0\) and \(w_1\) be positive weights. Suppose that T is a sublinear operator that satisfies the estimates

$$\begin{aligned} \Vert T f \Vert _{L^{p_i}(w_i)} \le M_i \Vert f \Vert _{L^{p_i}(w_i)}, \quad i=0,1. \end{aligned}$$

Let \(t \in (0,1)\) and define \( 1/p=(1-t)/p_0+t/p_1 \) and \(w=w_0^{p(1-t)/p_0}w_1^{pt/p_1}\). Then T satisfies the estimate

$$\begin{aligned} \Vert T f \Vert _{L^{p}(w)} \le M_0^{1-t}M_1^t \Vert f \Vert _{L^{p}(w)}. \end{aligned}$$

4.4 Proposition

For all \(p\in (1,\infty )\) and all \(w\in A_p\), there are constants \(C=C(p,w),\eta =\eta (p,w)>0\) such that

$$\begin{aligned} \Vert M_{\mathcal {D}_\lambda (\sigma )}f\Vert _{L^p(w)}\le C\cdot D(\lambda )^{1-\eta }\Vert f\Vert _{L^p(w)}, \end{aligned}$$

where \(D(\lambda ):=\max (\lambda ,\lambda ^{-1})\).

Proof

The parameter \(\sigma \) plays no role in this argument, so we drop it from the notation. Since \(\mathcal {D}_\lambda \) has the same nestedness structure as the usual \(\mathcal {D}_{{\square }}\), the unweighted bound

$$\begin{aligned} \Vert M_{\mathcal {D}_\lambda }f\Vert _{L^s}\le s'\Vert f\Vert _{L^s},\quad \forall s\in (1,\infty ), \end{aligned}$$

holds. On the other hand, for any \(I \in \mathcal {D}_\lambda \), there is some \(J\in \mathcal {D}_{{\square }}\) such that \(I\subset J\) and \(|J|\le D(\lambda )|I|\). Therefore, we conclude that

$$\begin{aligned} M_{\mathcal {D}_\lambda }f(x)= \sup _{I\in \mathcal {D}_\lambda } \langle |f|\rangle _I 1_I (x)\le D(\lambda ) \sup _{J\in \mathcal {D}_{{\square }}} \langle |f|\rangle _J 1_J (x) = D(\lambda ) M_{\mathcal {D}_{{\square }}}f(x), \end{aligned}$$

and so

$$\begin{aligned} \Vert M_{\mathcal {D}_\lambda }f\Vert _{L^s(w)}\le C(s,w)D(\lambda )\Vert f\Vert _{L^s(w)},\quad \forall s\in (1,\infty ),\quad \forall w\in A_s. \end{aligned}$$

Let us now consider \(s\in (1,\infty )\) and \(w\in A_s\) fixed. It is well known that we can find a \(\delta =\delta (s,w)>0\) such that \(w^{1+\delta }\in A_s\), and thus

$$\begin{aligned} \Vert M_{\mathcal {D}_\lambda }f\Vert _{L^s(w^{1+\delta })}\le C(s,w^{1+\delta })D(\lambda ) \Vert f\Vert _{L^s(w^{1+\delta })}. \end{aligned}$$

Now \(w=(w^{1+\delta })^{1/(1+\delta )}\cdot 1^{\delta /(1+\delta )}\) and Proposition 4.3 shows that

$$\begin{aligned} \Vert M_{\mathcal {D}_\lambda }f\Vert _{L^s(w)}\le \big (C(s,w^{1+\delta })D(\lambda )\big )^{1/(1+\delta )}(s')^{\delta /(1+\delta )}\Vert f\Vert _{L^s(w)}. \end{aligned}$$

Set \(\eta :=\delta /(1+\delta )\). We have found \(\eta =\eta (\delta )=\eta (s,w)>0\) such that

$$\begin{aligned} \Vert M_{\mathcal {D}_\lambda }f\Vert _{L^s(w)}\le C(s,w)D(\lambda )^{1-\eta (s,w)}\Vert f\Vert _{L^s(w)}. \end{aligned}$$

\(\square \)

In addition to the usual Haar functions, we will need the functions \(H_{I,J}\), where I and J are cubes with equal side length. The functions \(H_{I,J}\) satisfy

  (1)

    \(H_{I,J}\) is supported on \(I \cup J\) and constant on the children of I and J,

  (2)

    \(|H_{I,J}| \le |I|^{-1/2}\) and

  (3)

    \(\int H_{I,J} = 0\).

We denote (by slightly abusing notation) a general cancellative Haar function \(h_{I}^{\eta }\), \(\eta \ne (0, 0)\), simply by \(h_I\).

4.5 Definition

For \(k = (k^1, k^2)\), \(k^i \ge 0\), we say that the operator \(Q_{k, \sigma }\) has either the form

$$\begin{aligned} \langle Q_{k, \sigma }f, g\rangle = \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}(\sigma )} \sum _{\begin{array}{c} I, J \in \mathcal {D}_{{\square }}(\sigma ) \\ I^{(k)} = J^{(k)} = K \end{array}} a_{IJK} \langle f, H_{I,J} \rangle \langle g, h_J \rangle \end{aligned}$$

or the symmetric form, where \(I^{(k)} = (I^1)^{(k^1)} \times (I^2)^{(k^2)}\) and the constants \(a_{IJK}\) satisfy

$$\begin{aligned} |a_{IJK}| \le \frac{|I|}{|K|}. \end{aligned}$$

4.6 Lemma

For \(p \in (1, \infty )\) there holds that

$$\begin{aligned} \Vert Q_{k, \sigma }f \Vert _{L^p} \lesssim (1+\max (k^1, k^2))^{1/2} \Vert f\Vert _{L^p}. \end{aligned}$$

Moreover, for \(w\in A_p\), there is \(\eta >0\) such that

$$\begin{aligned} \Vert Q_{k,\sigma }f\Vert _{L^p(w)}\lesssim (1+\max (k^1, k^2))^{1/2} 2^{|k^1-k^2|(1-\eta )} \Vert f\Vert _{L^p(w)}. \end{aligned}$$

Proof

We consider \(\sigma \) fixed here and drop it from the notation. Suppose, e.g. \(k^1 \ge k^2\). We write

$$\begin{aligned} \langle f, H_{I,J} \rangle = \big \langle E_{\frac{\ell (I)}{2}}f - E_{\ell (K^1)}f, H_{I,J} \big \rangle , \qquad E_{\lambda } f:= \sum _{\begin{array}{c} L \in \mathcal {D}_{{\square }} \\ \ell (L) = \lambda \end{array}} E_L f, \, E_L f = \langle f \rangle _L 1_L. \end{aligned}$$

Therefore, \(\langle f, H_{I,J} \rangle = \langle \gamma _{K, k^1} f, H_{I,J} \rangle \), where

$$\begin{aligned} \gamma _{K, k^1} f:= 1_K \sum _{\begin{array}{c} L \in \mathcal {D}_{{\square }} \\ 2^{-k^1}\ell (K^1) \le \ell (L) \le \ell (K^1) \end{array}} \Delta _L f, \qquad \Delta _L f= \sum _{\begin{array}{c} L' \in \mathcal {D}_{{\square }} \\ L' \subset L,\, \ell (L') = \frac{\ell (L)}{2} \end{array}} E_{L'} f - E_{L} f. \end{aligned}$$
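
The sum defining \(\gamma _{K, k^1}\) telescopes: since \(\sum _{\ell (L) = \lambda } \Delta _L f = E_{\lambda /2}f - E_{\lambda }f\) for each dyadic \(\lambda \), summing over the scales \(2^{-k^1}\ell (K^1) \le \lambda \le \ell (K^1)\) gives

$$\begin{aligned} \sum _{\begin{array}{c} L \in \mathcal {D}_{{\square }} \\ 2^{-k^1}\ell (K^1) \le \ell (L) \le \ell (K^1) \end{array}} \Delta _L f = E_{\frac{\ell (I)}{2}}f - E_{\ell (K^1)}f, \end{aligned}$$

which, after multiplying by \(1_K\), is exactly the difference appearing in the identity \(\langle f, H_{I,J} \rangle = \langle \gamma _{K, k^1} f, H_{I,J} \rangle \) above (recall \(\ell (I) = 2^{-k^1}\ell (K^1)\)).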

Notice now that for \(w \in A_2\), there holds that

$$\begin{aligned} \Big \Vert \Big ( \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} |\gamma _{K, k^1} f|^2 \Big )^{\frac{1}{2}}\Big \Vert _{L^2(w)}^2&= \sum _{G \in \mathcal {D}_{{\square }}} \Big \Vert \sum _{\begin{array}{c} K \in \mathcal {D}_{2^{k^1-k^2}} \\ K^{(0, k^1-k^2)} = G \end{array}} \gamma _{K, k^1} f \Big \Vert _{L^2(w)}^2 \\&= \sum _{G \in \mathcal {D}_{{\square }}} \Big \Vert \sum _{\begin{array}{c} L \in \mathcal {D}_{{\square }}, \, L \subset G \\ \ell (L) \ge 2^{-k^1}\ell (G) \end{array}} \Delta _L f \Big \Vert _{L^2(w)}^2 \\&\sim \sum _{G \in \mathcal {D}_{{\square }}} \sum _{\begin{array}{c} L \in \mathcal {D}_{{\square }}, \, L \subset G \\ \ell (L) \ge 2^{-k^1}\ell (G) \end{array}} \Vert \Delta _L f\Vert _{L^2(w)}^2 \lesssim (1+k^1) \Vert f\Vert _{L^2(w)}^2, \end{aligned}$$

where we used the standard weighted square function estimate

$$\begin{aligned} \sum _{L \in \mathcal {D}_{{\square }}} \Vert \Delta _L f\Vert _{L^2(w)}^2 \sim \Vert f\Vert _{L^2(w)}^2 \end{aligned}$$

twice at the end.
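In the unweighted case \(w = 1\), the square function estimate above reduces to the orthogonality of martingale differences. The following Python sketch (a discrete one-dimensional dyadic model on \([0,1)\), purely illustrative and not part of the proof) verifies the resulting Parseval identity \(\Vert f\Vert _{L^2}^2 = |\langle f\rangle _{[0,1)}|^2 + \sum _L \Vert \Delta _L f\Vert _{L^2}^2\) exactly:

```python
import random

def averages(f, block):
    # replace f by its blockwise averages (the conditional expectation E)
    out = []
    for i in range(0, len(f), block):
        a = sum(f[i:i + block]) / block
        out.extend([a] * block)
    return out

def norm_sq(f):
    return sum(x * x for x in f) / len(f)  # L^2([0,1)) norm squared

random.seed(0)
N = 2 ** 8
f = [random.uniform(-1, 1) for _ in range(N)]

# martingale differences Delta_k f = E_{k+1} f - E_k f between dyadic scales
total = norm_sq(averages(f, N))  # |<f>_{[0,1)}|^2, the top-level average term
block = N
while block > 1:
    coarse = averages(f, block)
    fine = averages(f, block // 2)
    diff = [a - b for a, b in zip(fine, coarse)]
    total += norm_sq(diff)
    block //= 2

print(abs(total - norm_sq(f)) < 1e-12)
```

With a weight \(w \in A_2\), only the two-sided comparability of the proof survives, not this exact equality.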

To bound \(Q_k f\) we need to estimate

$$\begin{aligned}{} & {} \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \frac{1}{|K|} \sum _{\begin{array}{c} I, J \in \mathcal {D}_{{\square }} \\ I^{(k)} = J^{(k)} = K \end{array}} \langle |\gamma _{K, k^1}f|, 1_{I} + 1_{J}\rangle \langle |\Delta _{K, k}g |, 1_J \rangle ,\\{} & {} \Delta _{K, k}g:= \sum _{\begin{array}{c} J \in \mathcal {D}_{{\square }} \\ J^{(k^1, k^2)} = K \end{array}} \Delta _J g. \end{aligned}$$

We split this into two pieces according to \(1_{I} + 1_{J}\). The first piece is bounded by

$$\begin{aligned} \begin{aligned}&\sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \int \langle |\gamma _{K, k^1}f| \rangle _K |\Delta _{K, k}g | \le \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}}\int M_{\mathcal {D}_{2^{k^1-k^2}}} \gamma _{K, k^1}f \cdot |\Delta _{K, k}g | \\&\quad \le \Big \Vert \Big (\sum _{K\in \mathcal {D}_{2^{k^1-k^2}}}[M_{\mathcal {D}_{2^{k^1-k^2}}} \gamma _{K, k^1}f]^2\Big )^{1/2}\Big \Vert _{L^2(w)} \Big \Vert \Big (\sum _{K\in \mathcal {D}_{2^{k^1-k^2}}} |\Delta _{K, k}g |^2\Big )^{1/2}\Big \Vert _{L^2(w^{-1})} \\&\quad \lesssim 2^{(k^1-k^2)(1-\eta )}\Big \Vert \Big (\sum _{K\in \mathcal {D}_{2^{k^1-k^2}}}|\gamma _{K, k^1}f|^2\Big )^{1/2}\Big \Vert _{L^2(w)}\Vert g\Vert _{L^2(w^{-1})} \\&\quad \lesssim 2^{(k^1-k^2)(1-\eta )}(1+k^1)^{1/2}\Vert f\Vert _{L^2(w)}\Vert g\Vert _{L^2(w^{-1})}, \end{aligned} \end{aligned}$$

while the second piece is bounded by

$$\begin{aligned} \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \Big \langle \sum _{\begin{array}{c} J \in \mathcal {D}_{{\square }} \\ J^{(k^1, k^2)} = K \end{array}} \langle |\gamma _{K, k^1}f| \rangle _J 1_J, |\Delta _{K, k}g | \Big \rangle \le \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \int M_{\mathcal {D}_{{\square }}} \gamma _{K, k^1}f \cdot |\Delta _{K, k}g |, \end{aligned}$$

which is estimated similarly except that the bound for \(M_{\mathcal {D}_{2^{k^1-k^2}}}\) is replaced by the standard result for \(M_{\mathcal {D}_{{\square }}}\). This proves the claimed bounds in \(L^2(w)\), and the results for \(L^p(w)\) follow by Rubio de Francia’s extrapolation theorem (the correct \(1-\eta \) dependence is maintained by the extrapolation, see Remark 4.7).

For the unweighted estimate in \(L^p\) (with better complexity dependence), simply run the above argument using the Fefferman–Stein \(L^p(\ell ^2)\) estimate for the strong maximal function instead of the \(L^2(w)\) estimate of \(M_{\mathcal {D}_{2^{k^1-k^2}}}\). In addition, use the analogous \(L^p(\ell ^2)\) estimate for the square function involving \(\gamma _{K, k^1}\), which follows via Rubio de Francia extrapolation from the proved \(L^2(w)\) estimate of the same square function. \(\square \)

4.7 Remark

It is clear that when \(p=2\), \(\eta \) depends only on the \(A_2\) constant of w. In fact, in the proof of Proposition 4.4 we get \(\eta \sim 1/{[w]_{A_2}}\). Thus,

$$\begin{aligned} \Vert Q_{k,\sigma }f\Vert _{L^2(w)}\le (1+\max (k^1, k^2))^{1/2} 2^{|k^1-k^2|} N([w]_{A_2}) \Vert f\Vert _{L^2(w)} \end{aligned}$$

with

$$\begin{aligned} N([w]_{A_2})=K([w]_{A_2}) 2^{- c|k^1-k^2|/{[w]_{A_2}}}, \end{aligned}$$

where K is an increasing function. Hence N is also increasing, since \(t\mapsto 2^{-c|k^1-k^2|/t}\) is increasing. Then standard extrapolation (see e.g. [2, Theorem 3.1]) gives that the \(L^p(w)\) bound of \(Q_{k,\sigma }\) is

$$\begin{aligned} (1+\max (k^1, k^2))^{1/2} 2^{|k^1-k^2|} N(c_p[w]_{A_p}^{\alpha (p)}). \end{aligned}$$

Then we get the desired estimate with \(\eta =c(c_p[w]_{A_p}^{\alpha (p)})^{-1}\).

4.8 Definition

We say that \(\pi _b\) is a (one-parameter) paraproduct if it has the form

$$\begin{aligned} \langle \pi _b f, g \rangle = \sum _{I \in \mathcal {D}_{{\square }}} \langle b, h_I \rangle \langle f \rangle _I \langle g, h_I \rangle \end{aligned}$$

or the symmetric form.

It is well known (and follows readily from the \(H^1\)–\({\text {BMO}}\) duality) that paraproducts are \(L^p\) bounded for \(p\in (1,\infty )\) (and \(L^p(w)\) bounded for \(w\in A_p\)) precisely when \(b \in {\text {BMO}}\).

4.9 Theorem

Let B be a \(CZX(\mathbb {R}^2)\)-form satisfying the weak boundedness property and the T1 conditions. Then

$$\begin{aligned} B(f, g) = \mathbb {E}_{\sigma }\Big [C_T \sum _{k^1, k^2 \ge 0} 2^{-\theta _2(k_{\max }-k_{\min })}2^{-\theta _1 k_{\min }} \langle Q_{k,\sigma }f, g \rangle + \langle \pi _{b_1, \sigma } f, g\rangle + \langle \pi _{b_2, \sigma } f, g\rangle \Big ], \end{aligned}$$

where \(k_{\max }=\max _{i=1,2}k^i\), \(k_{\min }=\min _{i=1,2}k^i\). In particular, for \(p \in (1, \infty )\) there holds that

$$\begin{aligned} |B(f, g)| \lesssim \Vert f\Vert _{L^p} \Vert g\Vert _{L^{p'}}. \end{aligned}$$

If \(\theta _2=1\), there also holds for all \(w\in A_p\) that

$$\begin{aligned} |B(f,g)|\lesssim \Vert f\Vert _{L^p(w)}\Vert g\Vert _{L^{p'}(w^{1-p'})}. \end{aligned}$$

Proof

Write (by expanding \(f = \sum _I \Delta _I f\), \(g = \sum _J \Delta _J g\) and collapsing the off-diagonal)

$$\begin{aligned} B(f,g) = \mathbb {E}_{\sigma }\sum _{\ell (I) = \ell (J)} [B(E_I f, \Delta _J g) + B(\Delta _I f, E_J g) + B(\Delta _I f, \Delta _J g)], \end{aligned}$$

where \(I, J \in \mathcal {D}_{{\square }}(\sigma )\). We begin by writing

$$\begin{aligned} \Sigma _1&:= \mathbb {E}_{\sigma }\sum _{\ell (I) = \ell (J)} B(E_I f, \Delta _J g) \\&= \mathbb {E}_{\sigma } \sum _{\ell (I) = \ell (J)} B(h_I^0, h_J) \langle f, H_{I,J} \rangle \langle g, h_J \rangle \\&\quad +\quad \mathbb {E}_{\sigma } \sum _{J \in \mathcal {D}_{{\square }}(\sigma )} B(1, h_J) \langle f \rangle _J \langle g, h_J \rangle =: \Sigma _{1, 1} + \Sigma _{1,2}, \end{aligned}$$

where \(H_{I,J}:= h_I^0 - h_J^0\). As the term \(\Sigma _{1,2}\) is readily a paraproduct, we only continue with \(\Sigma _{1, 1}\). This was a standard one-parameter start. Write

$$\begin{aligned}{} & {} \Sigma _{1, 1} = \mathbb {E}_{\sigma }\sum _{m = (m^1, m^2) \in \mathbb {Z}^2 \setminus \{0\}} \sum _{I \in \mathcal {D}_{{\square }}(\sigma )} \varphi _{I, I \dotplus m},\\{} & {} \quad \varphi _{I, I \dotplus m} = B(h_I^0, h_{I \dotplus m}) \langle f, H_{I,I \dotplus m} \rangle \langle g, h_{I \dotplus m} \rangle , \end{aligned}$$

where \(I \dotplus m:= I + m\ell (I) \in \mathcal {D}_{{\square }}(\sigma )\). Next, write

$$\begin{aligned} \sum _{m = (m^1, m^2) \in \mathbb {Z}^2 \setminus \{0\}} = \sum _{m^1 \in \mathbb {Z}\setminus \{0\}}\sum _{m^2 \in \mathbb {Z}\setminus \{0\}} + \sum _{\begin{array}{c} m^2 \in \mathbb {Z}\setminus \{0\} \\ m = (0, m^2) \end{array} } + \sum _{\begin{array}{c} m^1 \in \mathbb {Z}\setminus \{0\} \\ m = (m^1, 0) \end{array} }. \end{aligned}$$

Focusing, for now, on the part \(m^1 \ne 0\) and \(m^2 \ne 0\), write

$$\begin{aligned} \sum _{m^1 \in \mathbb {Z}\setminus \{0\}}\sum _{m^2 \in \mathbb {Z}\setminus \{0\}} = \sum _{k^1 = 2}^{\infty } \sum _{k^2 = 2}^{\infty } \sum _{|m^1| \in (2^{k^1-3}, 2^{k^1-2}]} \sum _{|m^2| \in (2^{k^2-3}, 2^{k^2-2}]}. \end{aligned}$$

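The annuli used here tile the nonzero integers: for \(k \ge 2\) the conditions \(|m| \in (2^{k-3}, 2^{k-2}]\) cover each \(m \ne 0\) exactly once (for \(k = 2\) this means \(|m| = 1\)). A small Python sketch (illustrative only) confirms this:

```python
# Check that the annuli |m| in (2^{k-3}, 2^{k-2}], k = 2, 3, ..., tile Z \ {0}.
covered = {}
for k in range(2, 15):
    lo, hi = 2 ** (k - 3), 2 ** (k - 2)  # lo is 0.5 for k = 2, an integer after
    for m in range(-2 * int(hi), 2 * int(hi) + 1):
        if m != 0 and lo < abs(m) <= hi:
            covered[m] = covered.get(m, 0) + 1

# every nonzero integer up to 2^12 is covered exactly once
assert all(covered.get(m, 0) == 1 for m in range(-2 ** 12, 2 ** 12 + 1) if m != 0)
print("partition verified")
```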
Independence and (4.2) imply that

$$\begin{aligned}&\mathbb {E}_{\sigma } \sum _{k^1 = 2}^{\infty } \sum _{k^2 = 2}^{\infty } \sum _{|m^1| \in (2^{k^1-3}, 2^{k^1-2}]} \sum _{|m^2| \in (2^{k^2-3}, 2^{k^2-2}]} \sum _{I \in \mathcal {D}_{{\square }}(\sigma )} \varphi _{I, I \dotplus m} \\&\quad = 4 \mathbb {E}_{\sigma } \sum _{k^1 = 2}^{\infty } \sum _{k^2 = 2}^{\infty } \sum _{|m^1| \in (2^{k^1-3}, 2^{k^1-2}]} \sum _{|m^2| \in (2^{k^2-3}, 2^{k^2-2}]} \sum _{I \in \mathcal {D}_{{\square }}(\sigma , k)} \varphi _{I, I \dotplus m}, \end{aligned}$$

where \(k = (k^1, k^2)\), and the gist is that for \(|m^1| \in (2^{k^1-3}, 2^{k^1-2}]\), \(|m^2| \in (2^{k^2-3}, 2^{k^2-2}]\) and \(I \in \mathcal {D}_{{\square }}(\sigma , k)\) there holds that

$$\begin{aligned} (I^1 \dotplus m^1)^{(k^1)} = (I^1)^{(k^1)} =: K^1 \qquad \text {and} \qquad (I^2 \dotplus m^2)^{(k^2)} = (I^2)^{(k^2)} =: K^2. \end{aligned}$$

Notice that \(K = I^{(k)} = (I\dotplus m)^{(k)} = K^1 \times K^2 \in \mathcal {D}_{2^{k^1 - k^2}}(\sigma )\), since

$$\begin{aligned} \ell (K^1) = 2^{k^1} \ell (I^1) = 2^{k^1} \ell (I^2) = 2^{k^1-k^2} 2^{k^2} \ell (I^2) = 2^{k^1-k^2} \ell (K^2). \end{aligned}$$

Finally, notice that Lemma 3.3 implies that

$$\begin{aligned} |B(h_I^0, h_{I \dotplus m})| \lesssim 2^{-k^1}2^{-k^2}\frac{2^{k_{\min }(\theta _2-\theta _1)}}{2^{k_{\max }\theta _2}} =2^{-\theta _2(k_{\max }-k_{\min })}2^{-\theta _1 k_{\min }}\frac{|I|}{|K|}. \end{aligned}$$

The sums where \(m^1 = 0\) or \(m^2 = 0\) are completely similar (just run the above argument in one of the parameters only). We are done with \(\Sigma _1\).
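The second equality in the bound from Lemma 3.3 is elementary exponent arithmetic: since \(|I|/|K| = 2^{-k^1-k^2}\), both sides carry the exponent \(-k^1-k^2+k_{\min }(\theta _2-\theta _1)-k_{\max }\theta _2\). The following Python sketch (illustrative only) checks the identity on random data:

```python
import random

random.seed(1)
for _ in range(1000):
    k1, k2 = random.randint(0, 30), random.randint(0, 30)
    th1, th2 = random.uniform(0.01, 1.0), random.uniform(0.01, 1.0)
    kmin, kmax = min(k1, k2), max(k1, k2)
    # left-hand side of the displayed identity
    lhs = 2 ** (-k1) * 2 ** (-k2) * 2 ** (kmin * (th2 - th1)) / 2 ** (kmax * th2)
    # right-hand side, with |I|/|K| = 2^{-k1-k2} for I^{(k)} = K
    rhs = 2 ** (-th2 * (kmax - kmin)) * 2 ** (-th1 * kmin) * 2 ** (-k1 - k2)
    assert abs(lhs - rhs) <= 1e-9 * rhs
print("exponent identity verified")
```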

Of course, \(\Sigma _2:= \mathbb {E}_{\sigma }\sum _{\ell (I) = \ell (J)} B(\Delta _I f, E_J g)\) is completely symmetric. The term \(\Sigma _3:= \mathbb {E}_{\sigma }\sum _{\ell (I) = \ell (J)} B(\Delta _I f, \Delta _J g)\) produces no paraproduct, and it produces shifts of the simpler form \(H_{I, J} = h_I\).

The unweighted boundedness follows immediately from the \(L^p\) bounds of the paraproducts and the bound \(\Vert Q_{k,\sigma }f\Vert _{L^p}\lesssim (1+k_{\max })^{1/2}\Vert f\Vert _{L^p}\), since the exponentially decaying factor \(2^{-\theta _2(k_{\max }-k_{\min })}2^{-\theta _1 k_{\min }}\) clearly makes the series summable for any \(\theta _1,\theta _2>0\).

Let us finally consider the weighted case with \(\theta _2=1\). Then for some \(\eta =\eta (p,w)>0\), there holds that

$$\begin{aligned} \begin{aligned}&2^{-\theta _2(k_{\max }-k_{\min })}2^{-\theta _1 k_{\min }}|\langle Q_{k,\sigma }f,g \rangle | \\&\quad \lesssim 2^{-(k_{\max }-k_{\min })}2^{-\theta _1 k_{\min }} 2^{(k_{\max }-k_{\min })(1-\eta )}(1+k_{\max })^{1/2}\Vert f\Vert _{L^p(w)}\Vert g\Vert _{L^{p'}(w^{1-p'})} \\&\quad = 2^{-\eta (k_{\max }-k_{\min })}2^{-\theta _1 k_{\min }} (1+k_{\max })^{1/2}\Vert f\Vert _{L^p(w)}\Vert g\Vert _{L^{p'}(w^{1-p'})}, \end{aligned} \end{aligned}$$

and again we have exponential decay that makes the series over \(k^1,k^2\) summable. \(\square \)

4.10 Remark

If \(\theta _2 = 1\), we may redefine \(D_1(x, y)\) to be the slightly larger quantity

$$\begin{aligned} D_1(x, y):= \Bigg (\frac{|x^1-y^1|}{|x^2-y^2|} + \frac{|x^2-y^2|}{|x^1-y^1|}\Bigg )^{-1} \log \Bigg ( \frac{|x^1-y^1|}{|x^2-y^2|} + \frac{|x^2-y^2|}{|x^1-y^1|} \Bigg ), \end{aligned}$$

and still prove the weighted estimates essentially as above. This is pertinent in the sense that if we take a Fefferman–Pipher multiplier [4]—a singular integral of Zygmund type—and use it to induce a CZX operator, a logarithmic term appears. At this threshold a weighted estimate still holds. See also [12].

5 Commutator Estimates

We show that our exotic Calderón–Zygmund operators also satisfy the usual one-parameter commutator estimates. Since weighted estimates with one-parameter weights do not in general hold (see Sect. 6), this does not follow from the well-known Cauchy integral trick.

5.1 Theorem

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator associated with a CZX kernel K. Then

$$\begin{aligned} \Vert [b,T]f \Vert _{L^p} \lesssim \Vert b\Vert _{{\text {BMO}}} \Vert f \Vert _{L^p} \end{aligned}$$

whenever \(p \in (1, \infty )\). Here \([b,T]f:=bTf-T(bf)\).

Proof

By Theorem 4.9 and the well-known commutator estimates for the paraproducts \(\pi \), we only need to prove that

$$\begin{aligned} \big |\langle Q_{k,\sigma }(bf), g \rangle -\langle Q_{k,\sigma }f, bg \rangle \big |\lesssim \varphi (k)\Vert b\Vert _{{\text {BMO}}}\Vert f\Vert _{L^p}\Vert g\Vert _{L^{p'}}, \end{aligned}$$

where \(\varphi \) is some polynomial. We consider \(\sigma \) fixed and drop it from the notation. Recall the usual paraproduct decomposition of bf:

$$\begin{aligned} bf=a_1(b,f)+a_2(b,f)+a_3(b,f), \end{aligned}$$

where

$$\begin{aligned} a_1(b,f)=\sum _{I \in \mathcal {D}_{{\square }}} \Delta _I b \Delta _I f,\quad a_2(b,f)=\sum _{I \in \mathcal {D}_{{\square }}} \Delta _I b \langle f\rangle _I, \quad a_3(b,f)= \sum _{I \in \mathcal {D}_{{\square }}} \langle b\rangle _I \Delta _I f. \end{aligned}$$
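In a finite discrete model this decomposition is an exact telescoping identity, up to the top-level term \(\langle b\rangle \langle f\rangle \), which is absent from the all-scales decomposition used above. The following Python sketch (a one-dimensional dyadic model on \([0,1)\), illustrative only and not part of the proof) verifies \(bf = a_1 + a_2 + a_3\) plus that correction, pointwise:

```python
import random

def E(f, block):
    # conditional expectation onto dyadic blocks of the given length
    out = []
    for i in range(0, len(f), block):
        a = sum(f[i:i + block]) / block
        out.extend([a] * block)
    return out

random.seed(2)
N = 2 ** 6
b = [random.uniform(-1, 1) for _ in range(N)]
f = [random.uniform(-1, 1) for _ in range(N)]

# accumulate a1 + a2 + a3 scale by scale; the top-level product <b><f> 1
# replaces the vanishing coarse limit of the continuous decomposition
acc = [x * y for x, y in zip(E(b, N), E(f, N))]
block = N
while block > 1:
    Eb, Ef = E(b, block), E(f, block)
    db = [x - y for x, y in zip(E(b, block // 2), Eb)]  # Delta b at this scale
    df = [x - y for x, y in zip(E(f, block // 2), Ef)]  # Delta f at this scale
    for i in range(N):
        acc[i] += db[i] * df[i] + db[i] * Ef[i] + Eb[i] * df[i]
    block //= 2

assert max(abs(acc[i] - b[i] * f[i]) for i in range(N)) < 1e-12
print("paraproduct decomposition verified")
```

The scale-by-scale identity used in the loop is \(E_{\text {fine}}b\,E_{\text {fine}}f - E_{\text {coarse}}b\,E_{\text {coarse}}f = \Delta b\,\Delta f + \Delta b\,E f + E b\,\Delta f\), which is exactly the source of \(a_1, a_2, a_3\).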

Invoking the above decomposition, the well-known boundedness of paraproducts

$$\begin{aligned} \Vert a_i(b, f) \Vert _{L^p} \lesssim \Vert b\Vert _{{\text {BMO}}} \Vert f \Vert _{L^p}, \qquad i = 1,2, \end{aligned}$$

and Lemma 4.6, it suffices to control

$$\begin{aligned} \begin{aligned}&\big |\langle Q_{k}(a_3(b,f)), g \rangle -\langle Q_{k}f, a_3(b,g) \rangle \big |\\&\quad = \Big | \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \sum _{\begin{array}{c} I, J \in \mathcal {D}_{{\square }} \\ I^{(k)} = J^{(k)} = K \end{array}} a_{IJK}\Big [ \langle a_3(b,f), H_{I,J} \rangle \langle g, h_J \rangle - \langle b\rangle _J\langle f, H_{I,J} \rangle \langle g, h_J \rangle \Big ] \Big |. \end{aligned}\nonumber \\ \end{aligned}$$
(5.2)

We may assume \(k^1\ge k^2\). There holds that

$$\begin{aligned}&\langle a_3(b,f), H_{I,J} \rangle \langle g, h_J \rangle - \langle b\rangle _J\langle f, H_{I,J} \rangle \langle g, h_J \rangle \\&\quad = \sum _{\begin{array}{c} Q\in \mathcal {D}_{{\square }}, Q\subset K^{(0, k^1-k^2)}\\ \ell (I)\le \ell (Q)\le 2^{k^1}\ell (I) \end{array}} (\langle b\rangle _Q-\langle b\rangle _J) \langle \Delta _Q f, H_{I,J}\rangle \langle g, h_J \rangle . \end{aligned}$$

Observe that \(|\langle b\rangle _Q-\langle b\rangle _J|\lesssim k^1\Vert b\Vert _{{\text {BMO}}}\). On the other hand, since we only need to consider those Q such that \(\langle \Delta _Q f, H_{I,J}\rangle \ne 0\), i.e. either \(I\subset Q\) or \(J\subset Q\), there holds that

$$\begin{aligned}&\sum _{\begin{array}{c} Q\in \mathcal {D}_{{\square }}, Q\subset K^{(0, k^1-k^2)}\\ \ell (I)\le \ell (Q)\le 2^{k^1}\ell (I) \end{array}} \big | (\langle b\rangle _Q-\langle b\rangle _J) \langle \Delta _Q f, H_{I,J}\rangle \langle g, h_J \rangle \big |\\&\quad \lesssim k^1\Vert b\Vert _{{\text {BMO}}} \sum _{\ell ^1=0}^{k^1}\Big (\big |\langle \Delta _{I^{(\ell ^1)}} f, H_{I,J}\rangle \langle g, h_J \rangle \big |+ \big | \langle \Delta _{J^{(\ell ^1)}} f, H_{I,J}\rangle \langle g, h_J \rangle \big |\Big )\\&\quad \le 2k^1\Vert b\Vert _{{\text {BMO}}} \sum _{\ell ^1=0}^{k^1}\Big (\langle | \Delta _{I^{(\ell ^1)}} f |, h_{I}^0\rangle |\langle g, h_J \rangle |+ \langle | \Delta _{J^{(\ell ^1)}} f|, h_J^0\rangle |\langle g, h_J \rangle |\Big ), \end{aligned}$$

where we have used the simple observation

$$\begin{aligned} \langle |\Delta _{I^{(\ell ^1)}} f|, h_{J}^0\rangle \le \langle |\Delta _{J^{(\ell ^1)}} f|, h_{J}^0\rangle , \quad \langle |\Delta _{J^{(\ell ^1)}} f|, h_{I}^0\rangle \le \langle |\Delta _{I^{(\ell ^1)}} f|, h_{I}^0\rangle . \end{aligned}$$

Now, returning to (5.2), for a fixed \(\ell ^1\), we get

$$\begin{aligned}&\sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \sum _{\begin{array}{c} I, J \in \mathcal {D}_{{\square }} \\ I^{(k)} = J^{(k)} = K \end{array}} \frac{|I|}{|K|}\langle |\Delta _{I^{(\ell ^1)}} f|, h_{I}^0\rangle |\langle g, h_J \rangle |\\&\quad \le \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}}\Bigg \langle \sum _{\begin{array}{c} I \in \mathcal {D}_{{\square }} \\ I^{(k)} = K \end{array}}| \Delta _{I^{(\ell ^1)}} f|1_I, M_{\mathcal {D}_{2^{k^1-k^2}}}|\Delta _{K,k} g|\Bigg \rangle . \end{aligned}$$

Using extrapolation we only need to show that

$$\begin{aligned} \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \Big \Vert \sum _{\begin{array}{c} I \in \mathcal {D}_{{\square }} \\ I^{(k)} = K \end{array}}| \Delta _{I^{(\ell ^1)}} f|1_I\Big \Vert _{L^2(w)}^2\lesssim \Vert f\Vert _{L^2(w)}^2. \end{aligned}$$

However, this is clear because

$$\begin{aligned} \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \Big \Vert \sum _{\begin{array}{c} I \in \mathcal {D}_{{\square }} \\ I^{(k)} = K \end{array}}| \Delta _{I^{(\ell ^1)}} f|1_I\Big \Vert _{L^2(w)}^2&= \sum _{G \in \mathcal {D}_{{\square }}} \sum _{\begin{array}{c} K \in \mathcal {D}_{2^{k^1-k^2}}\\ K^{(0, k^1-k^2)}=G \end{array}} \sum _{\begin{array}{c} I \in \mathcal {D}_{{\square }} \\ I^{(k)} = K \end{array}} \big \Vert 1_I \Delta _{I^{(\ell ^1)}} f \big \Vert _{L^2(w)}^2\\&= \sum _{G \in \mathcal {D}_{{\square }}}\sum _{\begin{array}{c} I \in \mathcal {D}_{{\square }} \\ I^{(k^1, k^1)} = G \end{array}}\big \Vert 1_I\Delta _{I^{(\ell ^1)}} f\big \Vert _{L^2(w)}^2\\&= \sum _{I \in \mathcal {D}_{{\square }}} \big \Vert 1_I\Delta _{I^{(\ell ^1)}} f\big \Vert _{L^2(w)}^2 \\&=\sum _{Q\in \mathcal {D}_{{\square }}}\big \Vert \Delta _{Q} f\big \Vert _{L^2(w)}^2. \end{aligned}$$

To conclude the proof of the theorem, we are left to deal with

$$\begin{aligned}&\sum _{K \in \mathcal {D}_{2^{k^1-k^2}}} \sum _{\begin{array}{c} I, J \in \mathcal {D}_{{\square }} \\ I^{(k)} = J^{(k)} = K \end{array}} \frac{|I|}{|K|} \langle | \Delta _{J^{(\ell ^1)}} f|, h_{J}^0\rangle |\langle g, h_J \rangle |\\&\quad \le \sum _{K \in \mathcal {D}_{2^{k^1-k^2}}}\Bigg \langle \sum _{\begin{array}{c} J\in \mathcal {D}_{{\square }} \\ J^{(k)} = K \end{array}}| \Delta _{J^{(\ell ^1)}} f|1_J, M_{\mathcal {D}_{{\square }}}|\Delta _{K,k} g|\Bigg \rangle . \end{aligned}$$

After the estimate above, this is clearly similar to the other term. We are done. \(\square \)

6 Counterexample to Weighted Estimates and Sparse Bounds

We begin by showing that bounded \(CZX(\mathbb {R}^2)\) operators need not be bounded with respect to one-parameter \(A_p\) weights if \(\theta _2 < 1\).

6.1 Lemma

For scalars \(\theta _2 \in (0, 1]\), \(t_1, t_2 > 0\) and a bump function \(\phi \) we define

$$\begin{aligned} K(x) = K_{t_1, t_2, \theta _2}(x) = \Big (\frac{t_1}{t_2}+\frac{t_2}{t_1}\Big )^{-\theta _2}\prod _{i=1}^2\frac{1}{t_i}\phi \left( \frac{x_i}{t_i}\right) . \end{aligned}$$

Then, uniformly in \(t_1, t_2\), we have \(K\in CZX(\mathbb {R}^2)\) with \(\theta _1=1\) and with the \(\theta _2\) appearing in the very definition of K, and

$$\begin{aligned} \Vert K * f\Vert _{L^2} \lesssim \Vert f\Vert _{L^2}. \end{aligned}$$

Proof

Suppose by symmetry that \(t_1\le t_2\). Then, using the rapid decay of all the derivatives of \(\phi \), we conclude for all N that

$$\begin{aligned} \begin{aligned}&|x_1^{1+\alpha _1}x_2^{1+\alpha _2}||\partial ^\alpha K(x)| \lesssim \Big (\frac{t_1}{t_2}\Big )^{\theta _2}\prod _{i=1}^2\Big (\frac{|x_i|}{t_i}\Big )^{1+\alpha _i}\Big (1+\frac{|x_i|}{t_i}\Big )^{-N} \\&\quad = \Big (\frac{t_1}{t_2}\Big )^{\theta _2}\prod _{i=1}^2\frac{|x_i|^{1+\alpha _i}t_i^{N-1-\alpha _i}}{(|x_i|+t_i)^N} \\&\quad =\frac{|x_1|^{1+\alpha _1-\theta _2}t_1^{N-\alpha _1-1+\theta _2}}{(|x_1|+t_1)^N}\frac{|x_2|^{1+\alpha _2+\theta _2}t_2^{N-\alpha _2-1-\theta _2}}{(|x_2|+t_2)^N}\Big (\frac{|x_1|}{|x_2|}\Big )^{\theta _2} \\&\quad =\Big (\frac{t_1}{t_2}\Big )^{2\theta _2}\frac{|x_1|^{1+\alpha _1+\theta _2}t_1^{N-\alpha _1-1-\theta _2}}{(|x_1|+t_1)^N}\frac{|x_2|^{1+\alpha _2-\theta _2}t_2^{N-\alpha _2-1+\theta _2}}{(|x_2|+t_2)^N}\Big (\frac{|x_2|}{|x_1|}\Big )^{\theta _2}. \end{aligned} \end{aligned}$$
(6.2)

In the last two lines of (6.2), if N is large enough, each factor in front of the last one can be bounded by one. Thus, we get that

$$\begin{aligned} |x_1^{1+\alpha _1}x_2^{1+\alpha _2}||\partial ^\alpha K(x)| \lesssim \min \Big \{ \Big (\frac{|x_1|}{|x_2|}\Big )^{\theta _2}, \Big (\frac{|x_2|}{|x_1|}\Big )^{\theta _2}\Big \} \sim \Big (\frac{|x_1|}{|x_2|}+\frac{|x_2|}{|x_1|}\Big )^{-\theta _2}. \end{aligned}$$

Taking \(\alpha \in \{(0,0),(0,1),(1,0)\}\), we get the desired kernel estimates.
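The comparability in the last display, \(\min \{a^{\theta }, a^{-\theta }\} \sim (a + a^{-1})^{-\theta }\) with \(a = |x_1|/|x_2|\), follows from \(\max \{a, a^{-1}\} \le a + a^{-1} \le 2\max \{a, a^{-1}\}\), with constant \(2^{\theta } \le 2\). A quick Python sketch (illustrative only) confirms the two-sided bound:

```python
import random

random.seed(3)
for _ in range(1000):
    a = 10 ** random.uniform(-6, 6)   # the ratio |x_1| / |x_2|
    th = random.uniform(0.01, 1.0)
    m = min(a ** th, (1 / a) ** th)
    s = (a + 1 / a) ** (-th)
    # (a + 1/a)^{-th} <= min{a, 1/a}^{th} <= 2^{th} (a + 1/a)^{-th}
    assert s <= m * (1 + 1e-12)
    assert m <= 2 ** th * s * (1 + 1e-12)
print("comparability verified")
```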

For the boundedness, notice that

$$\begin{aligned} |{\widehat{K}}(\xi )| = \Big (\frac{t_1}{t_2}+\frac{t_2}{t_1}\Big )^{-\theta _2}\prod _{i=1}^2 |\widehat{\phi }(t_i \xi _i)| \le \prod _{i=1}^2 |\widehat{\phi }(t_i \xi _i)| \lesssim 1. \end{aligned}$$

\(\square \)

We fix \(t_1, t_2, \theta _2\) momentarily and denote \(K = K_{t_1, t_2, \theta _2}\). For any rectangle R of sidelengths \(t_1,t_2\), it is clear that

$$\begin{aligned} K*f\gtrsim {\text {ecc}}(R)^{-\theta _2}1_R\langle f\rangle _R \end{aligned}$$

whenever \(f\ge 0\) and

$$\begin{aligned} {\text {ecc}}(R):=\max \Big \{\frac{t_1}{t_2},\frac{t_2}{t_1}\Big \} \end{aligned}$$

is the eccentricity of R. Suppose now that for \(p \in (1, \infty )\) there holds that \(\Vert K*f\Vert _{L^p(w)}\le C([w]_{A_p}) N\Vert f\Vert _{L^p(w)}\) for all \(w \in A_p\) and \(f \in L^p(w)\). Then

$$\begin{aligned} {\text {ecc}}(R)^{-\theta _2} w(R)^{1/p}\langle f\rangle _R = \Vert {\text {ecc}}(R)^{-\theta _2} 1_R\langle f\rangle _R\Vert _{L^p(w)}\lesssim C([w]_{A_p}) N\Vert f\Vert _{L^p(w)}. \end{aligned}$$

If \(f=1_R\sigma \), where \(\sigma =w^{-1/(p-1)}\), then \(f^p w=1_R\sigma \), and hence

$$\begin{aligned} {\text {ecc}}(R)^{-\theta _2} w(R)^{1/p}\langle \sigma \rangle _R \lesssim C([w]_{A_p}) N\sigma (R)^{1/p}= C([w]_{A_p}) N\langle \sigma \rangle _R^{1/p}|R|^{1/p}, \end{aligned}$$

or

$$\begin{aligned} \langle w\rangle _R^{1/p}\langle \sigma \rangle _R^{1/p'}\lesssim C([w]_{A_p})N{\text {ecc}}(R)^{\theta _2}. \end{aligned}$$

If all \(L^2\)-bounded CZX operators satisfied the \(L^p(w)\) bound for all \(w \in A_p\) with a constant \(C([w]_{A_p}) N\), where N depends only on the kernel constants and the boundedness constants of the operator, then for all rectangles \(R\subset \mathbb {R}^2\), the estimate

$$\begin{aligned} \langle w\rangle _R^{p'/p}\langle \sigma \rangle _R\lesssim C([w]_{A_p}) {\text {ecc}}(R)^{\theta _2p'} \end{aligned}$$
(6.3)

holds. This is because for the kernels \(K = K_{t_1, t_2, \theta _2}\) the constant N is uniformly bounded in \(t_1, t_2\).

Now consider \(w(x)=|x|^\alpha \), which belongs to \(A_p(\mathbb {R}^2)\) if \(-2<\alpha <2(p-1)\). Fix some \(\alpha \in (p-1,2(p-1))\); we let our implicit constants depend on \(\alpha \), as its exact value is not important for the argument. We consider a rectangle R of the form \((0,\epsilon )\times (\epsilon ,1)\), with eccentricity \(\sim 1/\epsilon \). On R there holds that \(|x|\sim x_2\), and then

$$\begin{aligned} \langle w\rangle _R\sim \epsilon ^{-1}\int _0^\epsilon \Big (\int _\epsilon ^1 x_2^{ \alpha }\,\textrm{d}x_2\Big )\,\textrm{d}x_1 \sim \epsilon ^{-1}\int _0^\epsilon 1\,\textrm{d}x_1 = 1 \end{aligned}$$

and

$$\begin{aligned} \langle \sigma \rangle _R \sim \epsilon ^{-1}\int _0^\epsilon \Big (\int _\epsilon ^1 x_2^{-\frac{\alpha }{p-1}}\,\textrm{d}x_2\Big )\,\textrm{d}x_1 \sim \epsilon ^{-1}\int _0^\epsilon \epsilon ^{1-\frac{\alpha }{p-1}}\,\textrm{d}x_1 = \epsilon ^{1-\frac{\alpha }{p-1}}\sim {\text {ecc}}(R)^{\frac{\alpha }{p-1}-1}. \end{aligned}$$

If (6.3) holds, then

$$\begin{aligned} {\text {ecc}}(R)^{\frac{\alpha }{p-1}-1}\lesssim {\text {ecc}}(R)^{p'\theta _2}. \end{aligned}$$

Since we can let \({\text {ecc}}(R)\rightarrow \infty \), this means that we must have \(\alpha -(p-1) - p\theta _2\le 0\). Letting \(\alpha \rightarrow 2(p-1)\) in this inequality gives \(\theta _2 \ge 1/{p'}\). Then \(\theta _2\ge 1\) follows by letting \(p\rightarrow \infty \).
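The scaling \(\langle \sigma \rangle _R \sim \epsilon ^{1-\beta }\), \(\beta = \alpha /(p-1)\), can be sanity-checked numerically. The following Python sketch (illustrative only) uses the model weight \(x_2^{-\beta }\) in place of \(|x|^{-\beta }\)—legitimate since \(|x| \sim x_2\) on R—and verifies that halving \(\epsilon \) scales the average by the predicted factor \(2^{\beta -1}\):

```python
def avg_power(beta, eps):
    # average of x2^{-beta} over R = (0, eps) x (eps, 1), in closed form;
    # the x1 integral is trivial since the integrand does not depend on x1
    integral = (eps ** (1 - beta) - 1) / (beta - 1)  # int_eps^1 x^{-beta} dx
    return integral / (1 - eps)

p, alpha = 2.0, 1.5          # alpha in (p-1, 2(p-1)) = (1, 2)
beta = alpha / (p - 1)       # exponent of sigma = w^{-1/(p-1)} on the model
for eps in [1e-3, 1e-4, 1e-5]:
    ratio = avg_power(beta, eps) / avg_power(beta, eps / 2)
    # <sigma>_R ~ eps^{1-beta}: shrinking eps to eps/2 divides by 2^{1-beta}
    assert abs(ratio - 2 ** (1 - beta)) < 1e-2
print("scaling verified")
```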

Thus, weighted boundedness cannot hold in general for CZX operators if \(\theta _2 < 1\). Next, we prove some complementary sparse estimates, which refine the weighted estimates in the case \(\theta _2 = 1\).

6.4 Definition

We say that an operator T satisfies pointwise \(L^p\)-sparse domination with constants C and \(\epsilon \), if for every compactly supported \(f\in L^p(\mathbb {R}^d)\), there exists an \(\epsilon \)-sparse collection \(\mathscr {S}\) of cubes such that

$$\begin{aligned} |Tf|\le C\sum _{S\in \mathscr {S}}\langle f\rangle _{S,p}1_S, \end{aligned}$$

where \(\langle f\rangle _{S,p}:=\langle |f|^p\rangle _S^{1/p}\), and a collection \(\mathscr {S}\) of cubes is called \(\epsilon \)-sparse, if there are pairwise disjoint sets \(E(S)\subset S\) for every \(S\in \mathscr {S}\) with \(|E(S)|\ge \epsilon |S|\).
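As a concrete example of the definition, the nested dyadic family \(S_k = [0, 2^{-k})\) with \(E(S_k) = [2^{-k-1}, 2^{-k})\) is \(\frac{1}{2}\)-sparse even though the intervals themselves are far from disjoint. The following Python sketch (illustrative only) verifies the two defining properties with exact arithmetic:

```python
from fractions import Fraction

n = 20
S = [(Fraction(0), Fraction(1, 2 ** k)) for k in range(n)]            # S_k
E = [(Fraction(1, 2 ** (k + 1)), Fraction(1, 2 ** k)) for k in range(n)]  # E(S_k)

# pairwise disjoint: E(S_k) is the right half of S_k, and the sets are
# ordered left to right, so consecutive ones (hence all) cannot overlap
for (a1, b1), (a2, b2) in zip(E, E[1:]):
    assert b2 <= a1

# |E(S_k)| >= (1/2)|S_k|
for (a, b), (c, d) in zip(S, E):
    assert (d - c) >= Fraction(1, 2) * (b - a)

print("1/2-sparse family verified")
```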

There are by now several approaches to proving sparse domination. We will use one due to Lerner and Ombrosi [14], which depends on bounds for the following maximal function related to the operator T under investigation:

$$\begin{aligned} \mathcal M_{T,3}^{\#}f(x):=\sup _{J\ni x}\mathop {{\mathrm{ess\,sup}}}\limits _{y,z\in J}|T(1_{(3J)^c}f)(y)-T(1_{(3J)^c}f)(z)|, \end{aligned}$$

where the supremum, once again, is over all cubes J.

6.5 Lemma

Let T be an operator with a CZX kernel satisfying \(\theta _2 = 1\). Then

$$\begin{aligned} \mathcal M_{T,3}^{\#}f(x)\lesssim M_*f(x):=\sup _{R\ni x}\langle |f|\rangle _R, \end{aligned}$$

where the right-hand side is the strong maximal function, with supremum over all (axes-parallel) rectangles \(R\subset \mathbb {R}^2\) containing x.

Proof

Let us fix a cube \(J\subset \mathbb {R}^2\) with centre \(c_J\), and some \(x,y,z\in J\). Note that

$$\begin{aligned} \begin{aligned} T(1_{(3J)^c}f)(y)&-T(1_{(3J)^c}f)(z) \\&=[T(1_{(3J)^c}f)(y)-T(1_{(3J)^c}f)(c_J)]-[T(1_{(3J)^c}f)(z)-T(1_{(3J)^c}f)(c_J)] \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} T(1_{(3J)^c}f)(y)-T(1_{(3J)^c}f)(c_J) =\int _{(3J)^c}[K(y,u)-K(c_J,u)]f(u)\,\textrm{d}u. \end{aligned} \end{aligned}$$

As usual, we split

$$\begin{aligned} (3J)^c=[(3J^1)^c\times 3J^2]\cup [3J^1\times (3J^2)^c]\cup [(3J^1)^c\times (3J^2)^c]. \end{aligned}$$

For the integral over the last component, Lemma 2.2 implies that its contribution is dominated by a constant times \(M_*f(x)\). For the other components, using that \(\theta _2 = 1\), we get the same bound \(\lesssim M_*f(x)\) directly from the size estimate.

Altogether, we have checked that

$$\begin{aligned} |T(1_{(3J)^c}f)(y) -T(1_{(3J)^c}f)(z)|\lesssim M_*f(x), \end{aligned}$$

and taking the supremum over \(y,z\in J\) and then over \(J\ni x\) we see that

$$\begin{aligned} \mathcal M_{T,3}^{\#}f(x)\lesssim M_* f(x). \end{aligned}$$

\(\square \)

We now quote a slight variant of a result of Lerner and Ombrosi [14, Theorem 1.1]:

6.6 Theorem

(Lerner and Ombrosi [14]) Let T be a sublinear operator that is bounded from \(L^q(\mathbb {R}^d)\) to \(L^{q,\infty }(\mathbb {R}^d)\), and such that \(\mathcal M_{T,3}^{\#}\) is bounded from \(L^r(\mathbb {R}^d)\) to \(L^{r,\infty }(\mathbb {R}^d)\) for some \(1\le q,r<\infty \). Let \(s=\max (q,r)\). Then T satisfies pointwise \(L^s\)-sparse domination with constants

$$\begin{aligned} C=c_{d}(\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}+\Vert \mathcal M_{T,3}^{\#}\Vert _{L^r\rightarrow L^{r,\infty }}),\qquad \epsilon =\epsilon _d. \end{aligned}$$

Proof

This is essentially [14, Theorem 1.1], except for some details, mainly related to the constant C. Since this constant will be relevant to us below, we explain the necessary changes. On the more trivial side, the statement in [14] involves an additional parameter \(\alpha \) in the maximal operator \(\mathcal M_{T,\alpha }^{\#}\); we simply take \(\alpha =3\). Also, in [14] the \(L^q(\mathbb {R}^d)\)-to-\(L^{q,\infty }(\mathbb {R}^d)\) boundedness is replaced by a certain “\(W_q\) condition”; however, this follows from the \(L^q(\mathbb {R}^d)\)-to-\(L^{q,\infty }(\mathbb {R}^d)\) boundedness, as pointed out shortly before [14, Theorem 1.1].

More seriously, the bound for C given in [14] has dependencies on additional parameters that we wish to avoid. For this, it is necessary to inspect the proof of [14, Theorem 1.1]. The said proof provides the expression

$$\begin{aligned} C=(3+c)A, \end{aligned}$$

where c and A need to be chosen so that each of the sets

$$\begin{aligned}{} & {} \{M(|f|^s1_{3Q})^{1/s}>c\langle |f|\rangle _{3Q,s}\},\quad \{|T(f1_{3Q})|>A\langle |f|\rangle _{3Q,s}\},\\{} & {} \{\mathcal M_{T,3}^{\#}(f1_{3Q})>A\langle |f|\rangle _{3Q,s}\} \end{aligned}$$

have measure at most \(\epsilon _d|Q|\) for some small dimensional constant \(\epsilon _d\). However,

$$\begin{aligned} |\{M(|f|^s1_{3Q})>(c\langle |f|\rangle _{3Q,s})^s\}| \le \frac{C_d}{(c\langle |f|\rangle _{3Q,s})^s}\Vert |f|^s 1_{3Q}\Vert _{1} =\frac{C_d}{c^s}|3Q|=\frac{C_d 3^d}{c^s}|Q|, \end{aligned}$$

where \(C_d=\Vert M\Vert _{L^1(\mathbb {R}^d)\rightarrow L^{1,\infty }(\mathbb {R}^d)}\ge 1\), so that we can take

$$\begin{aligned} c=\Big (\frac{C_d 3^d}{\epsilon _d}\Big )\ge \Big (\frac{C_d 3^d}{\epsilon _d}\Big )^{1/s}, \end{aligned}$$

since \(s\ge 1\). Thus \(c=c_d\). On the other hand, using among other things that \(q\le s\) and Hölder’s inequality, there holds that

$$\begin{aligned} \begin{aligned} |\{|T(f1_{3Q})|>A\langle |f|\rangle _{3Q,s}\}|&\le \frac{\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}^q}{(A\langle |f|\rangle _{3Q,s})^q}\Vert f 1_{3Q}\Vert _{L^q}^q \le \frac{\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}^q}{(A\langle |f|\rangle _{3Q,q})^q}\Vert f 1_{3Q}\Vert _{L^q}^q \\&=\frac{\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}^q}{A^q}|3Q|=\frac{3^d\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}^q}{A^q}|Q|, \end{aligned} \end{aligned}$$

so to make this at most \(\epsilon _d\), it is enough to take

$$\begin{aligned} A\ge \frac{3^d}{\epsilon _d}\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}\ge \Big (\frac{3^d}{\epsilon _d}\Big )^{1/q}\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}, \end{aligned}$$

since \(q\ge 1\). Similarly, with \(\mathcal M_{T,3}^{\#}\) in place of T and r in place of q in order that \(|\{\mathcal M_{T,3}^{\#}>A\langle |f|\rangle _{3Q,s}\}|\le \epsilon _d|Q|\), it is enough to take

$$\begin{aligned} A\ge \frac{3^d}{\epsilon _d}\Vert \mathcal M_{T,3}^{\#}\Vert _{L^r\rightarrow L^{r,\infty }}. \end{aligned}$$

So an admissible choice is \(c=c_d\) and

$$\begin{aligned} A=\frac{3^d}{\epsilon _d}(\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}+\Vert \mathcal M_{T,3}^{\#}\Vert _{L^r\rightarrow L^{r,\infty }}), \end{aligned}$$

and hence

$$\begin{aligned} (3+c)A=C_d(\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}+\Vert \mathcal M_{T,3}^{\#}\Vert _{L^r\rightarrow L^{r,\infty }}). \end{aligned}$$

\(\square \)

An immediate consequence of the previous results is the following:

6.7 Corollary

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator with a CZX kernel satisfying \(\theta _2 = 1\). Then for every \(p>1\), the operator T satisfies pointwise \(L^p\)-sparse domination with constants \(C\lesssim p'\) and an absolute \(\epsilon >0\).

Proof

We know from Theorem 1.3 that T is bounded from \(L^1(\mathbb {R}^2)\) to \(L^{1,\infty }(\mathbb {R}^2)\). From Lemma 6.5 we know that \(\mathcal M_{T,3}^{\#}f\lesssim M_* f\). Since the strong maximal operator is bounded from \(L^p(\mathbb {R}^2)\) to itself, and hence to \(L^{p,\infty }(\mathbb {R}^2)\), the assumptions of Theorem 6.6 are satisfied. Since \(d=2\), it is immediate that the \(\epsilon =\epsilon _2\) provided by that theorem is absolute. In order to obtain the claim \(C\lesssim p'\), we record for completeness the following (probably well-known) estimate, based on the pointwise bound \(M_*f\le M^2 M^1 f\), where \(M^i\) is the one-dimensional maximal operator with respect to the ith variable:

$$\begin{aligned} \begin{aligned} |\{M_*f>\lambda \}|&=\int _{\mathbb {R}}|\{y:M_*f(x,y)>\lambda \}|\,\textrm{d}x \le \int _{\mathbb {R}}|\{y:M^2 M^1 f(x,y)>\lambda \}|\,\textrm{d}x \\&\le \int _{\mathbb {R}} \lambda ^{-p}(\Vert M^2\Vert _{L^p\rightarrow L^{p,\infty }}\Vert M^1 f(x,\cdot )\Vert _{L^p(\mathbb {R})})^p\,\textrm{d}x \\&=\lambda ^{-p}\Vert M^2\Vert _{L^p\rightarrow L^{p,\infty }}^p\Vert M^1 f\Vert _{L^p(\mathbb {R}^2)}^p \\&\le \lambda ^{-p}\Vert M^2\Vert _{L^p\rightarrow L^{p,\infty }}^p(\Vert M^1\Vert _{L^p\rightarrow L^p} \Vert f\Vert _{L^p(\mathbb {R}^2)})^p. \end{aligned} \end{aligned}$$

It follows that

$$\begin{aligned} \Vert M_*\Vert _{L^p(\mathbb {R}^2)\rightarrow L^{p,\infty }(\mathbb {R}^2)}\le \Vert M^2\Vert _{L^p\rightarrow L^{p,\infty }} \Vert M^1\Vert _{L^p\rightarrow L^p} \lesssim 1\cdot p', \end{aligned}$$

using the fact that the \(L^p\) norm of the usual maximal operator is \(O(p')\), while its \(L^p\)-to-\(L^{p,\infty }\) norm can be estimated independently of p. In fact,

$$\begin{aligned} \begin{aligned} |\{Mf>\lambda \}|&=|\{(Mf)^p>\lambda ^p\}| \le |\{M(|f|^p)>\lambda ^p\}| \\&\le \lambda ^{-p}\Vert M\Vert _{L^1\rightarrow L^{1,\infty }}\Vert |f|^p\Vert _{L^1} =\lambda ^{-p}\Vert M\Vert _{L^1\rightarrow L^{1,\infty }}\Vert f\Vert _{L^p}^p, \end{aligned} \end{aligned}$$

and hence

$$\begin{aligned} \Vert M\Vert _{L^p\rightarrow L^{p,\infty }}\le \Vert M\Vert _{L^1\rightarrow L^{1,\infty }}^{1/p}\le \Vert M\Vert _{L^1\rightarrow L^{1,\infty }}. \end{aligned}$$

\(\square \)
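The power trick \((Mf)^p \le M(|f|^p)\) used in the last display is just Jensen's inequality applied on each window defining the maximal function. The following Python sketch (a discrete one-dimensional uncentered maximal operator, illustrative only) checks it pointwise:

```python
import random

def disc_max(f):
    # uncentered maximal function: max of window averages of |f| over all
    # discrete windows [i, j] containing the point x
    n = len(f)
    out = []
    for x in range(n):
        best = 0.0
        for i in range(x + 1):
            for j in range(x, n):
                avg = sum(abs(v) for v in f[i:j + 1]) / (j - i + 1)
                best = max(best, avg)
        out.append(best)
    return out

random.seed(4)
f = [random.uniform(-1, 1) for _ in range(16)]
p = 3.0
Mf = disc_max(f)
Mfp = disc_max([abs(v) ** p for v in f])
# Jensen: (avg |f|)^p <= avg |f|^p on every window, hence (Mf)^p <= M(|f|^p)
assert all(Mf[x] ** p <= Mfp[x] + 1e-12 for x in range(len(f)))
print("power trick verified")
```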

6.8 Corollary

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator with a CZX kernel satisfying \(\theta _2 = 1\). Then for every \(p\in (1,\infty )\) and every \(w\in A_p(\mathbb {R}^2)\), the operator T extends boundedly to \(L^p(w)\) with norm

$$\begin{aligned} \Vert T\Vert _{\mathcal {L}(L^p(w))}\lesssim _p [w]_{A_p}^{p'}. \end{aligned}$$

Proof

This follows by the same reasoning as [17, Theorem 1.6]. That theorem is stated for different operators, but its proof only uses a certain sparse domination estimate for them, namely a slightly weaker variant (so-called sparse form domination) of the pointwise sparse domination we proved for operators with a CZX kernel in Corollary 6.7; hence the same reasoning applies to the case at hand. \(\square \)

A curious feature of the above proof of the weighted estimates is that it passes through estimates involving the strong maximal operator, which in principle should be forbidden in the theory of standard \(A_p\) weights; indeed, the strong maximal operator is bounded on \(L^p(w)\) only for strong \(A_p\) weights. The resolution of this paradox is that we use the strong maximal operator only as an intermediate step, in a part of the argument where no weights are yet present, to establish sparse bounds, which in turn imply the weighted estimates.

To conclude this section, we discuss commutator estimates. In fact, combining ideas from [15] and [14], we can establish the following sparse domination principle.

6.9 Proposition

Let T be a linear operator that is bounded from \(L^q(\mathbb {R}^d)\) to \(L^{q,\infty }(\mathbb {R}^d)\), and such that \(\mathcal M_{T,3}^{\#}\) is bounded from \(L^r(\mathbb {R}^d)\) to \(L^{r,\infty }(\mathbb {R}^d)\) for some \(1\le q,r<\infty \). Let \(s=\max (q,r)\). Then there exists an \(\epsilon \)-sparse family \(\mathscr {S}\) such that the commutator \([b,T]f:=bTf-T(bf)\) satisfies

$$\begin{aligned} |[b,T]f|\le C \left( \sum _{S\in \mathscr {S}} \langle |(b-\langle b\rangle _S)f|\rangle _{S,s}1_S+ \sum _{S\in \mathscr {S}} |b-\langle b\rangle _S| \langle |f|\rangle _{S,s}1_S\right) , \end{aligned}$$

with constants

$$\begin{aligned} C=c_{d}(\Vert T\Vert _{L^q\rightarrow L^{q,\infty }}+\Vert \mathcal M_{T,3}^{\#}\Vert _{L^r\rightarrow L^{r,\infty }}),\qquad \epsilon =\epsilon _d. \end{aligned}$$

Proof

The proof is similar to that of [14, Theorem 1.1], adapted with ideas from [15]. Let \(c, A,\epsilon _d\) be the same as in the proof of Theorem 6.6. Apart from the sets

$$\begin{aligned}{} & {} \{M(|f|^s1_{3Q})^{1/s}>c\langle |f|\rangle _{3Q,s}\},\\{} & {} \quad \{|T(f1_{3Q})|>A\langle |f|\rangle _{3Q,s}\},\quad \{\mathcal M_{T,3}^{\#}(f1_{3Q})>A\langle |f|\rangle _{3Q,s}\}, \end{aligned}$$

for the same reason, we can also let each of the sets

$$\begin{aligned} \{M(|(b-\langle b\rangle _{3Q})f|^s1_{3Q})^{1/s}&>c\langle |(b-\langle b\rangle _{3Q})f|\rangle _{3Q,s}\},\\ \{|T((b-\langle b\rangle _{3Q})f1_{3Q})|&>A\langle |(b-\langle b\rangle _{3Q})f|\rangle _{3Q,s}\},\\ \{\mathcal M_{T,3}^{\#}((b-\langle b\rangle _{3Q})f1_{3Q})&>A\langle |(b-\langle b\rangle _{3Q})f|\rangle _{3Q,s}\} \end{aligned}$$

have measure at most \(\epsilon _d|Q|\). Denote the union of the above six sets by E and set \(\Omega =E\cap Q\). Arguing in the same way as in [14, Theorem 1.1], we obtain a family of pairwise disjoint cubes \(\{P_j\}\subset Q\) such that \( \sum _{j}|P_j|\le \frac{1}{2} |Q| \) and \(|\Omega {\setminus } \cup _j P_j |=0\). The latter implies that for a.e. \(x\in Q\setminus \cup _j P_j\) we have

$$\begin{aligned} |T(f1_{3Q})(x)|\le A\langle |f|\rangle _{3Q,s}, \quad |T((b-\langle b\rangle _{3Q})f1_{3Q})(x)|\le A\langle |(b-\langle b\rangle _{3Q})f|\rangle _{3Q,s}. \end{aligned}$$

On the other hand, similarly as in [14, Theorem 1.1] we also have for a.e. \(x\in P_j\) that

$$\begin{aligned} |T(f1_{3Q\setminus 3P_j})(x)|&\le (2+c) A \langle |f|\rangle _{3Q,s},\\ |T((b-\langle b\rangle _{3Q})f1_{3Q\setminus 3P_j})(x)|&\le (2+c)A\langle |(b-\langle b\rangle _{3Q})f|\rangle _{3Q,s}. \end{aligned}$$

Thus

$$\begin{aligned} |[b,T](f1_{3Q})|1_Q&= |[b,T](f1_{3Q})|1_{Q\setminus \cup _j P_j} +\sum _j |[b,T](f1_{3Q\setminus 3P_j})|1_{P_j} +\sum _j |[b,T](f1_{3P_j})|1_{P_j}\\&=|[b-\langle b\rangle _{3Q},T](f1_{3Q})|1_{Q\setminus \cup _j P_j} +\sum _j |[b-\langle b\rangle _{3Q},T](f1_{3Q\setminus 3P_j})|1_{P_j}\\&\quad +\sum _j |[b,T](f1_{3P_j})|1_{P_j}\\&\le (3+c)A|b-\langle b\rangle _{3Q}| \langle |f|\rangle _{3Q,s}1_Q+(3+c)A \langle |(b-\langle b\rangle _{3Q})f|\rangle _{3Q,s}1_Q\\&\quad +\sum _j |[b,T](f1_{3P_j})|1_{P_j}. \end{aligned}$$

Note that the linearity of T is used in the second step. With this recursive inequality at hand, the rest is standard (see, e.g., [14, Lemma 2.1]). Since the constants c and A are the same as in Theorem 6.6, we obtain the desired constant in the sparse domination. \(\square \)
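The stopping-time recursion in the proof above (select pairwise disjoint cubes \(P_j\) capturing at most half the measure of Q, then iterate inside each \(P_j\)) can be sketched in one dimension with dyadic intervals. This is only an illustration of the sparse-family mechanism, not the actual proof; the threshold \(4\langle |f|\rangle _Q\) and all names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.abs(rng.standard_normal(64))  # |f| sampled on a dyadic grid

def cz_select(a, lo, hi, thresh):
    # Maximal dyadic subintervals of [lo, hi) on which the average of a
    # exceeds thresh (Calderon-Zygmund stopping time).
    if a[lo:hi].mean() > thresh:
        return [(lo, hi)]
    if hi - lo <= 1:
        return []
    mid = (lo + hi) // 2
    return cz_select(a, lo, mid, thresh) + cz_select(a, mid, hi, thresh)

def sparse_family(a, lo, hi, depth=0, max_depth=8):
    # Collect [lo, hi) and recurse inside the stopping intervals.  With
    # threshold 4 * avg, Chebyshev's inequality forces the stopping
    # intervals to cover at most a quarter of [lo, hi), so the family
    # produced is sparse.
    fam = [(lo, hi)]
    if depth >= max_depth or hi - lo <= 1:
        return fam
    avg = a[lo:hi].mean()
    if avg == 0:
        return fam
    children = cz_select(a, lo, hi, 4 * avg)
    # selected children capture at most half the measure of [lo, hi)
    assert sum(h - l for (l, h) in children) <= (hi - lo) / 2
    for (l, h) in children:
        fam += sparse_family(a, l, h, depth + 1, max_depth)
    return fam

fam = sparse_family(a, 0, len(a))
```

Each interval in `fam` keeps at least half of its measure outside its selected children, which is the discrete analogue of the \(\epsilon \)-sparseness used in the proposition.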

Analogously to Corollary 6.7, we have the following result.

6.10 Proposition

Let \(T\in \mathcal {L}(L^2(\mathbb {R}^2))\) be an operator with a CZX kernel satisfying \(\theta _2 = 1\). Then for every \(p>1\), the commutator \([b,T]\) satisfies

$$\begin{aligned} |[b,T]f|\le C p'\left( \sum _{S\in \mathscr {S}} \langle |(b-\langle b\rangle _S)f|\rangle _{S,p}1_S+ \sum _{S\in \mathscr {S}} |b-\langle b\rangle _S| \langle |f|\rangle _{S,p}1_S\right) , \end{aligned}$$

where \(\mathscr {S}\) is an \(\epsilon \)-sparse family with \(\epsilon > 0\) absolute, and C is a constant depending only on the operator T.
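For a concrete sense of the right-hand side of this bound, the following minimal numerical sketch evaluates the two sparse sums over a toy 1/2-sparse dyadic family in one dimension. The family, the exponent, and all function names are illustrative placeholders for the objects in the proposition.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64
f = rng.standard_normal(n)
b = rng.standard_normal(n)
family = [(0, 64), (0, 16), (40, 48)]  # toy 1/2-sparse dyadic family
s = 2.0  # stands in for the exponent p in the proposition

def avg_s(g, l, h, s):
    # normalized L^s average <|g|>_{S,s} over the index interval S = [l, h)
    return (np.abs(g[l:h]) ** s).mean() ** (1 / s)

def sparse_commutator_bound(f, b, family, s):
    # sum over S of  <|(b - <b>_S) f|>_{S,s} 1_S  +  |b - <b>_S| <|f|>_{S,s} 1_S
    out = np.zeros_like(f)
    for (l, h) in family:
        bS = b[l:h].mean()                                    # <b>_S
        out[l:h] += avg_s((b - bS) * f, l, h, s)              # first sum
        out[l:h] += np.abs(b[l:h] - bS) * avg_s(f, l, h, s)   # second sum
    return out

bound = sparse_commutator_bound(f, b, family, s)
```

Note how the two sums treat the oscillation \(b-\langle b\rangle _S\) symmetrically: once inside the average against f, once as a pointwise factor, which is the structure that later yields the two-weight estimates.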

In [16, Theorem 5.2] a two-weight commutator estimate for rough homogeneous singular integrals was formulated. As our sparse forms are the same as there, the two-weight commutator estimate of Theorem 1.5 follows as a direct consequence of Proposition 6.10.