1 Introduction

The usual definition of a singular integral operator (SIO)

$$\begin{aligned} Tf(x)=\int _{{\mathbb {R}}^d}K(x,y)f(y)\,\mathrm {d}y \end{aligned}$$

involves a Hölder-continuous kernel K with a power-type continuity-modulus \(t \mapsto t^{\gamma }\). However, many results continue to hold under significantly more general assumptions. Such kernel regularity considerations become non-trivial especially in connection with results that go beyond the classical Calderón–Zygmund theory—an example is the \(A_2\) theorem of Hytönen [21], extended to Dini-continuous kernels by Lacey [24]. Estimates for SIOs with mild kernel regularity are, for instance, linked to the theory of rough singular integrals, see e.g. [22].

The fundamental question concerning the \(L^2\) (or \(L^p\)) boundedness of an SIO T is usually best answered by so-called T1 theorems, where the action of the operator T on the constant function 1 is key. We study kernel regularity questions specifically in situations that are closely tied to T1 type arguments and the corresponding structural theory—a big part of the modern product space theory of SIOs relies on such analysis. The proofs of T1 theorems display a fundamental structural decomposition of SIOs into their cancellative parts and so-called paraproducts. It is this structure that is extremely important for obtaining further estimates beyond the initial scalar-valued \(L^p\) boundedness. Refined versions of T1 theorems provide exact identities in terms of model operators and are called representation theorems, see [20, 21, 32].

A concrete definition of kernel regularity is as follows: it concerns the required regularity of the continuity-moduli \(\omega \) appearing in the various kernel estimates, such as

$$\begin{aligned} |K(x, y) - K(x', y)| \le \omega \left( \frac{|x-x'|}{|x-y|}\right) \frac{1}{|x-y|^d}, \,\, |x-x'| \le |x-y|/2. \end{aligned}$$

Recently, Grau de la Herrán and Hytönen [17] proved that the modified Dini condition

$$\begin{aligned} \Vert \omega \Vert _{{\text {Dini}}_{\alpha }} := \int _0^1 \omega (t) \Big ( 1 + \log \frac{1}{t} \Big )^{\alpha } \frac{dt}{t} \end{aligned}$$

with \(\alpha = \frac{1}{2}\), i.e., \(\Vert \omega \Vert _{{\text {Dini}}_{1/2}} < \infty \), is sufficient to prove a T1 theorem even with an underlying measure \(\mu \) that can be non-doubling. This matches the best known sufficient condition for the classical homogeneous T1 theorem [10]—such results are implicit in Figiel [16] and explicit in Deng et al. [11]. The exponent \(\alpha = \frac{1}{2}\) appears fundamental, and possibly even sharp, in all of the existing arguments.
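For example, the power-type modulus \(\omega (t) = t^{\gamma }\), \(\gamma \in (0,1]\), of a Hölder-continuous kernel satisfies

$$\begin{aligned} \Vert \omega \Vert _{{\text {Dini}}_{\alpha }} = \int _0^1 t^{\gamma } \Big ( 1 + \log \frac{1}{t} \Big )^{\alpha } \frac{dt}{t} < \infty \end{aligned}$$

for every \(\alpha \ge 0\), so each class \({\text {Dini}}_{\alpha }\) contains all the classical Hölder-continuous kernels.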

In [17] a new type of representation theorem appears; the key difference from the original representation theorems [20, 21] is that the cancellative part is decomposed in terms of different operators that package multiple dyadic shifts into one and offer more efficient bounds when it comes to kernel regularity. Some of the ideas behind the decomposition in [17] are rooted in the work of Figiel [15, 16]. We simultaneously extend [17] both to the multilinear [12,13,14, 27, 33] and multi-parameter [23, 30, 32, 35] settings. The proofs of the representation theorems now appear to be converging to their final and most elegant form, and the arguments are simultaneously efficient and sharp.

Linear bi-parameter SIOs, for example, have kernels with singularities on \(x_1=y_1\) or \(x_2 = y_2\), where \(x,y\in {\mathbb {R}}^d\) are written as \(x= (x_1, x_2), y = (y_1, y_2) \in {\mathbb {R}}^{d_1}\times {\mathbb {R}}^{d_2}\) for a fixed partition \(d=d_1+d_2\). For \(x,y \in {\mathbb {C}}= {\mathbb {R}}\times {\mathbb {R}}\), compare e.g. the one-parameter Beurling kernel \(1/(x-y)^2\) with the bi-parameter kernel \(1/[(x_1-y_1)(x_2-y_2)]\)—the product of Hilbert kernels in both coordinate directions. In general, product space analysis is quite different from one-parameter analysis and seems to resist many techniques. In part because bi-parameter sparse domination methods fail, see [3] (see, however, also [4]), representation theorems are even more important in the bi-parameter setting than in the one-parameter setting. For example, the dyadic representation methods have proved very fruitful in connection with bi-parameter commutators and weighted analysis, see Holmes–Petermichl–Wick [19], Ou–Petermichl–Strouse [36] and [28]. See also [1, 2].

We discuss various applications throughout. For example, we prove the following two-weight estimates for commutators. Part (1) extends [29] and part (2) extends [19] and [28].

Theorem 1.1

Suppose that \({\mathbb {R}}^d = {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) is the underlying bi-parameter space, \(p \in (1, \infty )\), \(\mu , \lambda \in A_p({\mathbb {R}}^d)\) are bi-parameter weights and \(\nu = \mu ^{1/p} \lambda ^{-1/p} \in A_2({\mathbb {R}}^d)\) is the Bloom weight.

(1) If \(T_i\), \(i = 1,2\), is a one-parameter \(\omega _i\)-CZO on \({\mathbb {R}}^{d_i}\), where \(\omega _i \in {\text {Dini}}_{3/2}\), then

$$\begin{aligned} \Vert [T_1, [T_2, b]] \Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim \Vert b\Vert _{{\text {BMO}}_{\text {prod}}(\nu )}. \end{aligned}$$
(2) Suppose that T is a bi-parameter \((\omega _1, \omega _2)\)-CZO. Then we have

$$\begin{aligned} \Vert [b_m,\cdots [b_2, [b_1, T]]\cdots ]\Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim \prod _{j=1}^m\Vert b_j\Vert _{{\text {bmo}}(\nu ^{1/m})} \end{aligned}$$

if one of the following conditions holds:

(a) T is paraproduct free and \(\omega _i \in {\text {Dini}}_{m/2+1}\);

(b) \(m=1\) and \(\omega _i \in {\text {Dini}}_{3/2}\);

(c) \(\omega _i \in {\text {Dini}}_{m+1}\).

See the main text for all of the definitions and for additional results. These Bloom-style two-weight estimates have recently been one of the main lines of development concerning commutators, see e.g. [1, 2, 18, 19, 25, 26, 28, 29] for a non-exhaustive list.

2 Basic Notation and Fundamental Estimates

Throughout this paper \(A\lesssim B\) means that \(A\le CB\) with some constant C that we deem unimportant to track at that point. We write \(A\sim B\) if \(A\lesssim B\lesssim A\).

Dyadic Notation. Given a dyadic grid \({\mathcal {D}}\), \(I \in {\mathcal {D}}\) and \(k \in {\mathbb {Z}}\), \(k \ge 0\), we use the following notation:

(1) \(\ell (I)\) is the side length of I.

(2) \(I^{(k)} \in {\mathcal {D}}\) is the kth parent of I, i.e., \(I \subset I^{(k)}\) and \(\ell (I^{(k)}) = 2^k \ell (I)\).

(3) \({\text {ch}}(I)\) is the collection of the children of I, i.e., \({\text {ch}}(I) = \{J \in {\mathcal {D}}:J^{(1)} = I\}\).

(4) \(E_I f=\langle f \rangle _I 1_I\) is the averaging operator, where \(\langle f \rangle _I = \fint _{I} f = \frac{1}{|I|} \int _I f\).

(5) \(E_{I, k}f\) is defined via

$$\begin{aligned} E_{I,k}f = \sum _{\begin{array}{c} J \in {\mathcal {D}}\\ J^{(k)}=I \end{array}}E_J f. \end{aligned}$$

(6) \(\Delta _If\) is the martingale difference \(\Delta _I f= \sum _{J \in {\text {ch}}(I)} E_{J} f - E_{I} f\).

(7) \(\Delta _{I,k} f\) is the martingale difference block

$$\begin{aligned} \Delta _{I,k} f=\sum _{\begin{array}{c} J \in {\mathcal {D}}\\ J^{(k)}=I \end{array}} \Delta _{J} f. \end{aligned}$$

(8) \(P_{I,k}f\) is the following sum of martingale difference blocks

$$\begin{aligned} P_{I,k}f = \sum _{j=0}^{k} \Delta _{I,j} f =\sum _{\begin{array}{c} J \in {\mathcal {D}}\\ J \subset I \\ \ell (J) \ge 2^{-k}\ell (I) \end{array}} \Delta _{J} f. \end{aligned}$$
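Note that the martingale difference blocks telescope: since \(\Delta _{I,j} f = E_{I,j+1}f - E_{I,j}f\) (with \(E_{I,0} := E_I\)), we have the identity

$$\begin{aligned} P_{I,k}f = \sum _{j=0}^{k} \big ( E_{I,j+1}f - E_{I,j}f \big ) = E_{I,k+1}f - E_{I}f. \end{aligned}$$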

A fundamental fact is that we have the square function estimate

$$\begin{aligned} \Vert S_{{\mathcal {D}}} f\Vert _{L^p} \sim \Vert f\Vert _{L^p}, \qquad p \in (1,\infty ), \,\, S_{{\mathcal {D}}}f := \left( \sum _{I \in {\mathcal {D}}} |\Delta _I f|^2 \right) ^{1/2}. \end{aligned}$$
(2.1)

See e.g. [7, 8] for the weighted square function estimates \(\Vert S_{{\mathcal {D}}} f\Vert _{L^p(w)} \sim \Vert f\Vert _{L^p(w)}\), \(w \in A_p\), and their history. A weight w (i.e. a locally integrable a.e. positive function) belongs to the weight class \(A_p({\mathbb {R}}^d)\), \(1< p < \infty \), if

$$\begin{aligned} [w]_{A_p({\mathbb {R}}^d)} := \sup _{Q} \frac{1}{|Q|}\int _Q w \Bigg ( \frac{1}{|Q|}\int _Q w^{1-p'}\Bigg )^{p-1} < \infty , \end{aligned}$$

where the supremum is taken over all cubes \(Q \subset {\mathbb {R}}^d\).
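For instance, the power weights give a standard family of examples: for \(a \in {\mathbb {R}}\),

$$\begin{aligned} w(x) = |x|^{a} \in A_p({\mathbb {R}}^d) \quad \text {if and only if} \quad -d< a < d(p-1). \end{aligned}$$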

Lemma 2.2

Let \(p \in (1, \infty )\). There holds that

$$\begin{aligned} \left\| \left( \sum _{K \in {\mathcal {D}}} |P_{K,k}f|^2 \right) ^{1/2} \right\| _{L^p} \sim \sqrt{k+1} \Vert f \Vert _{L^p}, \qquad k \in \{0,1,2, \ldots \}. \end{aligned}$$

Proof

If \(f_i \in L^p\) then

$$\begin{aligned} \left\| \left( \sum _{i=0}^\infty \sum _{I \in {\mathcal {D}}} |\Delta _I f_i |^2 \right) ^{1/2} \right\| _{L^p} \sim \left\| \left( \sum _{i=0}^\infty | f_i |^2 \right) ^{1/2} \right\| _{L^p}. \end{aligned}$$
(2.3)

This follows by extrapolating the corresponding weighted \(L^2\) version of (2.3), which, in turn, simply follows from \(\Vert S_{{\mathcal {D}}} f\Vert _{L^2(w)} \sim \Vert f\Vert _{L^2(w)}\), \(w \in A_2\). Recall that the classical extrapolation theorem of Rubio de Francia says that if \(\Vert h\Vert _{L^{p_0}(w)} \lesssim \Vert g\Vert _{L^{p_0}(w)}\) for some \(p_0 \in (1,\infty )\) and all \(w \in A_{p_0}\), then \(\Vert h\Vert _{L^{p}(w)} \lesssim \Vert g\Vert _{L^{p}(w)}\) for all \(p \in (1,\infty )\) and all \(w \in A_{p}\).
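Indeed, the weighted \(L^2(w)\) version of (2.3) is immediate from the square function estimate:

$$\begin{aligned} \left\| \left( \sum _{i=0}^\infty \sum _{I \in {\mathcal {D}}} |\Delta _I f_i |^2 \right) ^{1/2} \right\| _{L^2(w)}^2 = \sum _{i=0}^\infty \Vert S_{{\mathcal {D}}} f_i \Vert _{L^2(w)}^2 \sim \sum _{i=0}^\infty \Vert f_i \Vert _{L^2(w)}^2 = \left\| \left( \sum _{i=0}^\infty | f_i |^2 \right) ^{1/2} \right\| _{L^2(w)}^2. \end{aligned}$$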

Let \(K \in {\mathcal {D}}\). Since the martingale differences are orthogonal projections, \(\Delta _I P_{K,k} f = \Delta _I f\) if \(I \subset K\) and \(\ell (I) \ge 2^{-k} \ell (K)\), while \(\Delta _I P_{K,k} f = 0\) for all other \(I \in {\mathcal {D}}\). Hence we have that

$$\begin{aligned} \sum _{I \in {\mathcal {D}}} | \Delta _I P_{K,k} f |^2 = \sum _{j=0}^k |\Delta _{K,j} f |^2. \end{aligned}$$

Thus, (2.3) gives that

$$\begin{aligned} \begin{aligned} \left\| \left( \sum _{K \in {\mathcal {D}}} |P_{K,k}f|^2 \right) ^{1/2} \right\| _{L^p}&\sim \left\| \left( \sum _{K \in {\mathcal {D}}} \sum _{j=0}^k |\Delta _{K,j} f |^2 \right) ^{1/2} \right\| _{L^p} \\&=\left\| \left( \sum _{j=0}^k \sum _{I \in {\mathcal {D}}} |\Delta _{I} f |^2 \right) ^{1/2} \right\| _{L^p} \sim \sqrt{k+1} \Vert f \Vert _{L^p}. \end{aligned} \end{aligned}$$

\(\square \)

We will also have use for the Fefferman–Stein inequality

$$\begin{aligned} \left\| \left( \sum _{k} |Mf_k|^2 \right) ^{1/2} \right\| _{L^p(w)} \lesssim \left\| \left( \sum _{k} |f_k|^2 \right) ^{1/2} \right\| _{L^p(w)}, \qquad p \in (1,\infty ), w \in A_p, \end{aligned}$$

where M is the Hardy–Littlewood maximal function. Often, the lighter Stein’s inequality

$$\begin{aligned} \left\| \left( \sum _{I \in {\mathcal {D}}} |E_I f_I|^2 \right) ^{1/2} \right\| _{L^p(w)} \lesssim \left\| \left( \sum _{I \in {\mathcal {D}}} |f_I|^2 \right) ^{1/2} \right\| _{L^p(w)}, \qquad p \in (1,\infty ), w \in A_p, \end{aligned}$$

is sufficient.
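Indeed, Stein's inequality is a pointwise consequence of the Fefferman–Stein inequality, since for every \(I \in {\mathcal {D}}\) we have

$$\begin{aligned} |E_I f_I| = |\langle f_I \rangle _I| 1_I \le 1_I \, M f_I. \end{aligned}$$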

For an interval \(J \subset {\mathbb {R}}\) we denote by \(J_{l}\) and \(J_{r}\) the left and right halves of J, respectively. We define \(h_{J}^0 = |J|^{-1/2}1_{J}\) and \(h_{J}^1 = |J|^{-1/2}(1_{J_{l}} - 1_{J_{r}})\). Let now \(I = I_1 \times \cdots \times I_d \subset {\mathbb {R}}^d\) be a cube, and define the Haar function \(h_I^{\eta }\), \(\eta = (\eta _1, \ldots , \eta _d) \in \{0,1\}^d\), by setting

$$\begin{aligned} h_I^{\eta } = h_{I_1}^{\eta _1} \otimes \cdots \otimes h_{I_d}^{\eta _d}. \end{aligned}$$

If \(\eta \ne 0\) the Haar function is cancellative: \(\int h_I^{\eta } = 0\). We abuse notation by suppressing the presence of \(\eta \), and write \(h_I\) for some \(h_I^{\eta }\), \(\eta \ne 0\). Notice that for \(I \in {\mathcal {D}}\) we have \(\Delta _I f = \langle f, h_I \rangle h_I\) (where the finite \(\eta \) summation is suppressed), \(\langle f, h_I\rangle := \int fh_I\).

Bi-parameter Variants. A weight \(w(x_1, x_2)\) (i.e. a locally integrable a.e. positive function) belongs to the bi-parameter weight class \(A_p({\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2})\), \(1< p < \infty \), if

$$\begin{aligned} [w]_{A_p({\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2})} := \sup _{R} \frac{1}{|R|}\int _R w \Bigg ( \frac{1}{|R|}\int _R w^{1-p'}\Bigg )^{p-1} < \infty , \end{aligned}$$

where the supremum is taken over \(R = I^1 \times I^2\) and each \(I^i \subset {\mathbb {R}}^{d_i}\) is a cube. Thus, this is the one-parameter definition but cubes are replaced by rectangles.
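For example, tensor products \(w = w_1 \otimes w_2\) of one-parameter weights \(w_i \in A_p({\mathbb {R}}^{d_i})\) are bi-parameter weights, since the averages and the supremum over rectangles factorize:

$$\begin{aligned} [w_1 \otimes w_2]_{A_p({\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2})} = [w_1]_{A_p({\mathbb {R}}^{d_1})} [w_2]_{A_p({\mathbb {R}}^{d_2})}. \end{aligned}$$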

We have

$$\begin{aligned} [w]_{A_p({\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2})}< \infty \text { iff } \max \left( \mathop {\mathrm{ess\,sup}}\limits _{x_1 \in {\mathbb {R}}^{d_1}} \,[w(x_1, \cdot )]_{A_p({\mathbb {R}}^{d_2})}, \mathop {\mathrm{ess\,sup}}\limits _{x_2 \in {\mathbb {R}}^{d_2}}\, [w(\cdot , x_2)]_{A_p({\mathbb {R}}^{d_1})} \right) < \infty , \end{aligned}$$

and

$$\begin{aligned} \max \left( \mathop {\mathrm{ess\,sup}}\limits _{x_1 \in {\mathbb {R}}^{d_1}} \,[w(x_1, \cdot )]_{A_p({\mathbb {R}}^{d_2})}, \mathop {\mathrm{ess\,sup}}\limits _{x_2 \in {\mathbb {R}}^{d_2}}\, [w(\cdot , x_2)]_{A_p({\mathbb {R}}^{d_1})} \right) \le [w]_{A_p({\mathbb {R}}^{d_1}\times {\mathbb {R}}^{d_2})}, \end{aligned}$$

while the constant \([w]_{A_p}\) is dominated by the maximum to some power. For basic bi-parameter weighted theory see e.g. [19]. We say \(w\in A_\infty ({\mathbb {R}}^{d_1}\times {\mathbb {R}}^{d_2})\) if

$$\begin{aligned} [w]_{A_\infty ({\mathbb {R}}^{d_1}\times {\mathbb {R}}^{d_2})}:=\sup _R \frac{1}{|R|}\int _R w \exp \Bigg ( \frac{1}{|R|}\int _R \log (w^{-1}) \Bigg )<\infty . \end{aligned}$$

It is well-known that

$$\begin{aligned} A_\infty ({\mathbb {R}}^{d_1}\times {\mathbb {R}}^{d_2})=\bigcup _{1<p<\infty }A_p({\mathbb {R}}^{d_1}\times {\mathbb {R}}^{d_2}). \end{aligned}$$

We do not have any important use for the \(A_{\infty }\) constant. The \(w \in A_{\infty }\) assumption can always be replaced with the explicit assumption \(w \in A_s\) for some \(s \in (1,\infty )\), with all the estimates then depending on \([w]_{A_s}\).

We denote a general dyadic grid in \({\mathbb {R}}^{d_i}\) by \({\mathcal {D}}^i\). We denote cubes in \({\mathcal {D}}^i\) by \(I^i, J^i, K^i\), etc. Thus, our dyadic rectangles take the forms \(I^1 \times I^2\), \(J^1 \times J^2\), \(K^1 \times K^2\) etc.

If A is an operator acting on \({\mathbb {R}}^{d_1}\), we can always let it act on the product space \({\mathbb {R}}^d = {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) by setting \(A^1f(x) = A(f(\cdot , x_2))(x_1)\). Similarly, we use the notation \(A^2 f(x) = A(f(x_1, \cdot ))(x_2)\) if A is originally an operator acting on \({\mathbb {R}}^{d_2}\). Our basic bi-parameter dyadic operators (martingale differences and averaging operators) are obtained by simply chaining together the relevant one-parameter operators. For instance, a bi-parameter martingale difference is \(\Delta _R f = \Delta _{I^1}^1 \Delta _{I^2}^2 f\), \(R = I^1 \times I^2\). Bi-parameter estimates, such as the square function bound

$$\begin{aligned} \left\| \left( \sum _{R \in {\mathcal {D}}^1 \times {\mathcal {D}}^2} |\Delta _R f|^2 \right) ^{1/2} \right\| _{L^p(w)} = \left\| \left( \sum _{I^i \in {\mathcal {D}}^i} |\Delta _{I^1}^1 \Delta _{I^2}^2 f|^2 \right) ^{1/2} \right\| _{L^p(w)} \sim \Vert f\Vert _{L^p(w)}, \end{aligned}$$

where \(p \in (1,\infty )\) and w is a bi-parameter \(A_p\) weight, are easily obtained using vector-valued versions of the corresponding one-parameter estimates. The required vector-valued estimates, on the other hand, follow simply by extrapolating the obvious weighted \(L^2(w)\) estimates.

We now systematically collect the maximal function and square function bounds that we need. First, some notation. When we integrate with respect to only one of the parameters we may e.g. write

$$\begin{aligned} \langle f, h_{I^1} \rangle _1(x_2):=\int _{{\mathbb {R}}^{d_1}} f(x_1, x_2)h_{I^1}(x_1) \,\mathrm {d}x_1. \end{aligned}$$

If \({\mathcal {D}}= {\mathcal {D}}^1 \times {\mathcal {D}}^2\) we define the dyadic bi-parameter maximal function

$$\begin{aligned} M_{{\mathcal {D}}} f:= \sup _{R \in {\mathcal {D}}} 1_R \big \langle |f|\big \rangle _R. \end{aligned}$$
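Note that \(M_{{\mathcal {D}}}\) is pointwise dominated by the iterated one-parameter maximal functions: if \(x = (x_1, x_2) \in R = I^1 \times I^2\), then

$$\begin{aligned} \big \langle |f|\big \rangle _R = \frac{1}{|I^1|} \int _{I^1} \frac{1}{|I^2|} \int _{I^2} |f(y_1, y_2)| \,\mathrm {d}y_2 \,\mathrm {d}y_1 \le M^1_{{\mathcal {D}}^1} \big ( M^2_{{\mathcal {D}}^2} f \big )(x), \end{aligned}$$

so that \(M_{{\mathcal {D}}} f \le M^1_{{\mathcal {D}}^1} M^2_{{\mathcal {D}}^2} f\). This is why many bounds for \(M_{{\mathcal {D}}}\), such as the Fefferman–Stein inequality below, follow by iterating the one-parameter results.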

Now define the square functions

$$\begin{aligned} S_{{\mathcal {D}}} f = \left( \sum _{R \in {\mathcal {D}}} |\Delta _R f|^2 \right) ^{1/2}, \,\, S_{{\mathcal {D}}^1}^1 f = \left( \sum _{I^1 \in {\mathcal {D}}^1} |\Delta _{I^1}^1 f|^2 \right) ^{1/2} \end{aligned}$$

and define \(S_{{\mathcal {D}}^2}^2 f\) analogously. Define also

$$\begin{aligned} S_{{\mathcal {D}}, M}^1 f&= \left( \sum _{I^1 \in {\mathcal {D}}^1} \frac{1_{I^1}}{|I^1|} \otimes \big [M_{{\mathcal {D}}^2} \big \langle f, h_{I^1} \big \rangle _1\big ]^2 \right) ^{1/2}, \\ S_{{\mathcal {D}}, M}^2 f&= \left( \sum _{I^2 \in {\mathcal {D}}^2} \big [M_{{\mathcal {D}}^1} \big \langle f, h_{I^2} \big \rangle _2\big ]^2 \otimes \frac{1_{I^2}}{|I^2|}\right) ^{1/2}. \end{aligned}$$

Let \(k=(k_1,k_2)\), where \(k_i \in \{0,1,2, \dots \}\), and \(K=K^1 \times K^2 \in {\mathcal {D}}\). We set

$$\begin{aligned} P^1_{K^1,k_1}f=\sum _{\begin{array}{c} I^1 \in {\mathcal {D}}^1 \\ I^1 \subset K^1 \\ \ell (I^1) \ge 2^{-k_1}\ell (K^1) \end{array}} \Delta ^1_{I^1} f \end{aligned}$$

and define similarly \(P^2_{K^2,k_2}\). Then, we define \(P_{K,k}:= P^1_{K^1,k_1}P^2_{K^2,k_2}\).

Lemma 2.4

For \(p \in (1,\infty )\) and a bi-parameter weight \(w \in A_p\) we have

$$\begin{aligned} \Vert f \Vert _{L^p(w)} \sim \Vert S_{{\mathcal {D}}} f\Vert _{L^p(w)} \sim \Vert S_{{\mathcal {D}}^1}^1 f \Vert _{L^p(w)} \sim \Vert S_{{\mathcal {D}}^2}^2 f \Vert _{L^p(w)}. \end{aligned}$$

For \(k=(k_1,k_2)\), \(k_i \in \{0,1, \dots \}\), we have the estimates

$$\begin{aligned} \left\| \left( \sum _{K \in {\mathcal {D}}} | P_{K,k}f|^2 \right) ^{1/2} \right\| _{L^p(w)}&\lesssim \sqrt{k_1+1} \sqrt{k_2+1} \Vert f \Vert _{L^p(w)},\\ \left\| \left( \sum _{K^1 \in {\mathcal {D}}^1} | P^1_{K^1,k_1}f|^2 \right) ^{1/2} \right\| _{L^p(w)}&\lesssim \sqrt{k_1+1} \Vert f \Vert _{L^p(w)} \end{aligned}$$

and the analogous estimate with \(P^2_{K^2,k_2}\).

Moreover, for \(p, s \in (1,\infty )\) we have the Fefferman–Stein inequality

$$\begin{aligned} \left\| \left( \sum _j |M f_j |^s \right) ^{1/s} \right\| _{L^p(w)} \lesssim \left\| \left( \sum _{j} | f_j |^s \right) ^{1/s} \right\| _{L^p(w)}. \end{aligned}$$

Here M can e.g. be \(M_{{\mathcal {D}}^1}^1\) or \(M_{{\mathcal {D}}}\). Finally, we have

$$\begin{aligned} \Vert S_{{\mathcal {D}}, M}^1 f\Vert _{L^p(w)} + \Vert S_{{\mathcal {D}}, M}^2 f\Vert _{L^p(w)} \lesssim \Vert f\Vert _{L^p(w)}. \end{aligned}$$

3 Bi-parameter Singular Integrals

Bi-parameter SIOs. We say that \(\omega \) is a modulus of continuity if it is an increasing and subadditive function with \(\omega (0) = 0\). A relevant quantity is the modified Dini condition

$$\begin{aligned} \Vert \omega \Vert _{{\text {Dini}}_{\alpha }} := \int _0^1 \omega (t) \Big ( 1 + \log \frac{1}{t} \Big )^{\alpha } \frac{dt}{t}, \qquad \alpha \ge 0. \end{aligned}$$
(3.1)

In practice, the quantity (3.1) arises as follows:

$$\begin{aligned} \sum _{k=1}^{\infty } \omega (2^{-k}) k^{\alpha } = \sum _{k=1}^{\infty } \frac{1}{\log 2} \int _{2^{-k}}^{2^{-k+1}} \omega (2^{-k}) k^{\alpha } \frac{dt}{t} \lesssim \int _0^1 \omega (t) \Big ( 1 + \log \frac{1}{t} \Big )^{\alpha } \frac{dt}{t}. \end{aligned}$$
(3.2)

For many standard arguments \(\alpha = 0\) is enough. For the T1 type arguments we will always need \(\alpha = 1/2\). Some further applications can require a higher \(\alpha \).

Let \({\mathbb {R}}^d = {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) and consider an n-linear operator T on \({\mathbb {R}}^d\). We define what it means for T to be an n-linear bi-parameter SIO. Let \(\omega _i\) be a modulus of continuity on \({\mathbb {R}}^{d_i}\). Let \(f_j = f_j^1 \otimes f_j^2\), \(j = 1, \ldots , n+1\).

First, we set up notation for the adjoints of T. We let \(T^{j*}\), \(j \in \{0, \ldots , n\}\), denote the full adjoints, i.e., \(T^{0*} = T\) and otherwise

$$\begin{aligned} \langle T(f_1, \dots , f_n), f_{n+1} \rangle = \langle T^{j*}(f_1, \dots , f_{j-1}, f_{n+1}, f_{j+1}, \dots , f_n), f_j \rangle . \end{aligned}$$

A subscript 1 or 2 denotes a partial adjoint in the given parameter—for example, we define

$$\begin{aligned} \langle T(f_1, \dots , f_n), f_{n+1} \rangle = \langle T^{j*}_1(f_1, \dots , f_{j-1}, f_{n+1}^1 \otimes f_j^2, f_{j+1}, \dots , f_n), f_j^1 \otimes f_{n+1}^2 \rangle . \end{aligned}$$

Finally, we can take partial adjoints with respect to different parameters in different slots also—in that case we denote the adjoint by \(T^{j_1*, j_2*}_{1,2}\). It simply interchanges the functions \(f_{j_1}^1\) and \(f_{n+1}^1\) and the functions \(f_{j_2}^2\) and \(f_{n+1}^2\). Of course, we e.g. have \(T^{j*, j*}_{1,2} = T^{j*}\) and \(T^{0*, j*}_{1,2} = T^{j*}_{2}\), so everything can be obtained, if desired, with the most general notation \(T^{j_1*, j_2*}_{1,2}\). In any case, there are \((n+1)^2\) adjoints (including T itself). Similarly, the dyadic model operators that we later define always have \((n+1)^2\) different forms.
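For example, in the linear case \(n = 1\) the \((1+1)^2 = 4\) adjoints are

$$\begin{aligned} T = T^{0*, 0*}_{1,2}, \quad T^{1*} = T^{1*, 1*}_{1,2}, \quad T^{1*}_1 = T^{1*, 0*}_{1,2} \quad \text {and} \quad T^{1*}_2 = T^{0*, 1*}_{1,2}, \end{aligned}$$

i.e., the operator itself, the full adjoint and the two partial adjoints.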

Full Kernel Representation. Here we assume that for each \(m \in \{1,2\}\) there exist \(j_1, j_2 \in \{1, \ldots , n+1\}\), \(j_1 \ne j_2\), so that \({\text {spt}} \,f_{j_1}^m \cap {\text {spt}}\, f_{j_2}^m = \emptyset \). In this case we demand that

$$\begin{aligned} \langle T(f_1, \ldots , f_n), f_{n+1}\rangle = \int _{{\mathbb {R}}^{(n+1)d}} K(x_{n+1},x_1, \dots , x_n)\prod _{j=1}^{n+1} f_j(x_j) \,\mathrm {d}x, \end{aligned}$$

where

$$\begin{aligned} K :{\mathbb {R}}^{(n+1)d} \setminus \{ (x_{n+1},x_1, \ldots , x_{n}) \in {\mathbb {R}}^{(n+1)d}:x_1^1 = \cdots = x_{n+1}^1 \text { or } x_1^2 = \cdots = x_{n+1}^2\} \rightarrow {\mathbb {C}}\end{aligned}$$

is a kernel satisfying a set of estimates which we specify next.

The kernel K is assumed to satisfy the size estimate

$$\begin{aligned} |K(x_{n+1},x_1, \dots , x_n)| \lesssim \prod _{m=1}^2 \frac{1}{\Big (\sum _{j=1}^{n} |x_{n+1}^m-x_j^m|\Big )^{d_mn}}. \end{aligned}$$
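For example, in the linear case \(n = 1\) the size estimate reads

$$\begin{aligned} |K(x_2, x_1)| \lesssim \frac{1}{|x_2^1-x_1^1|^{d_1}} \cdot \frac{1}{|x_2^2-x_1^2|^{d_2}}, \end{aligned}$$

which is consistent with the product of Hilbert kernels \(1/[(x_1-y_1)(x_2-y_2)]\) from the introduction.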

We also require the following continuity estimates—which we continue to refer to as Hölder estimates despite the general continuity moduli. For example, we require that we have

$$\begin{aligned}&|K(x_{n+1}, x_1, \ldots , x_n)-K(x_{n+1},x_1, \dots , x_{n-1}, (c^1,x^2_n))\\&\qquad -K((x_{n+1}^1,c^2),x_1, \dots , x_n)+K((x_{n+1}^1,c^2),x_1, \dots , x_{n-1}, (c^1,x^2_n))| \\&\quad \lesssim \omega _1 \left( \frac{|x_{n}^1-c^1| }{ \sum _{j=1}^{n} |x_{n+1}^1-x_j^1|} \right) \frac{1}{\left( \sum _{j=1}^{n} |x_{n+1}^1-x_j^1|\right) ^{d_1n}} \\&\qquad \times \omega _2 \left( \frac{|x_{n+1}^2-c^2| }{ \sum _{j=1}^{n} |x_{n+1}^2-x_j^2|} \right) \frac{1}{\left( \sum _{j=1}^{n} |x_{n+1}^2-x_j^2|\right) ^{d_2n}} \end{aligned}$$

whenever \(|x_n^1-c^1| \le 2^{-1} \max _{1 \le i \le n} |x_{n+1}^1-x_i^1|\) and \(|x_{n+1}^2-c^2| \le 2^{-1} \max _{1 \le i \le n} |x_{n+1}^2-x_i^2|\). Of course, we also require all the other natural symmetric estimates, where \(c^1\) can be in any of the given \(n+1\) slots and similarly for \(c^2\). In total, there are \((n+1)^2\) different estimates.

Finally, we require the following mixed Hölder and size estimates. For example, we ask that

$$\begin{aligned}&|K(x_{n+1}, x_1, \ldots , x_n)-K(x_{n+1},x_1, \dots , x_{n-1}, (c^1,x^2_n))| \\&\quad \lesssim \omega _1 \left( \frac{|x_{n}^1-c^1| }{ \sum _{j=1}^{n} |x_{n+1}^1-x_j^1|} \right) \frac{1}{\left( \sum _{j=1}^{n} |x_{n+1}^1-x_j^1|\right) ^{d_1n}} \cdot \frac{1}{\left( \sum _{j=1}^{n} |x_{n+1}^2-x_j^2|\right) ^{d_2n}} \end{aligned}$$

whenever \(|x_n^1-c^1| \le 2^{-1} \max _{1 \le i \le n} |x_{n+1}^1-x_i^1|\). Again, we also require all the other natural symmetric estimates.

Partial Kernel Representations. Suppose now only that there exist \(j_1, j_2 \in \{1, \ldots , n+1\}\), \(j_1 \ne j_2\), so that \({\text {spt}}\,f_{j_1}^1 \cap {\text {spt}}\, f_{j_2}^1 = \emptyset \). Then we assume that

$$\begin{aligned} \langle T(f_1, \ldots , f_n), f_{n+1}\rangle = \int _{{\mathbb {R}}^{(n+1)d_1}} K_{(f_j^2)}(x_{n+1}^1, x_1^1, \ldots , x_n^1) \prod _{j=1}^{n+1} f_j^1(x^1_j) \,\mathrm {d}x^1, \end{aligned}$$

where \(K_{(f_j^2)}\) is a one-parameter \(\omega _1\)-Calderón–Zygmund kernel as e.g. in [17] but with a constant depending on the fixed functions \(f_1^2, \ldots , f_{n+1}^2\). For example, this means that the size estimate takes the form

$$\begin{aligned} |K_{(f_j^2)}(x_{n+1}^1, x_1^1, \ldots , x_n^1)| \le C(f_1^2, \ldots , f_{n+1}^2) \frac{1}{\Big (\sum _{j=1}^{n} |x_{n+1}^1-x_j^1|\Big )^{d_1n}}. \end{aligned}$$

The continuity estimates are analogous.

We assume the following T1 type control on the constant \(C(f_1^2, \ldots , f_{n+1}^2)\). We have

$$\begin{aligned} C(1_{I^2}, \ldots , 1_{I^2}) \lesssim |I^2| \end{aligned}$$
(3.3)

and

$$\begin{aligned} C(a_{I^2}, 1_{I^2}, \ldots , 1_{I^2}) + C(1_{I^2}, a_{I^2}, 1_{I^2}, \ldots , 1_{I^2}) + \cdots + C(1_{I^2}, \ldots , 1_{I^2}, a_{I^2}) \lesssim |I^2| \end{aligned}$$

for all cubes \(I^2 \subset {\mathbb {R}}^{d_2}\) and all functions \(a_{I^2}\) satisfying \(a_{I^2} = 1_{I^2}a_{I^2}\), \(|a_{I^2}| \le 1\) and \(\int a_{I^2} = 0\).

An analogous partial kernel representation in the second parameter is assumed when \({\text {spt}}\, f_{j_1}^2 \cap {\text {spt}}\, f_{j_2}^2 = \emptyset \) for some \(j_1 \ne j_2\).

Definition 3.4

If T is an n-linear operator with full and partial kernel representations as defined above, we call T an n-linear bi-parameter \((\omega _1, \omega _2)\)-SIO.

Bi-parameter CZOs. We say that T satisfies the weak boundedness property if

$$\begin{aligned} |\langle T(1_R, \ldots , 1_R), 1_R \rangle | \lesssim |R| \end{aligned}$$
(3.5)

for all rectangles \(R = I^1 \times I^2 \subset {\mathbb {R}}^{d} = {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\).

An SIO T satisfies the diagonal BMO assumption if the following holds. For all rectangles \(R = I^1 \times I^2 \subset {\mathbb {R}}^{d} = {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) and functions \(a_{I^i}\) with \(a_{I^i} = 1_{I^i}a_{I^i}\), \(|a_{I^i}| \le 1\) and \(\int a_{I^i} = 0\) we have

$$\begin{aligned} |\langle T(a_{I^1} \otimes 1_{I^2}, 1_R, \ldots , 1_R), 1_R \rangle | + \cdots + |\langle T(1_R, \ldots , 1_R), a_{I^1} \otimes 1_{I^2} \rangle | \lesssim |R| \end{aligned}$$
(3.6)

and

$$\begin{aligned} |\langle T(1_{I^1} \otimes a_{I^2}, 1_R, \ldots , 1_R), 1_R \rangle | + \cdots + |\langle T(1_R, \ldots , 1_R), 1_{I^1} \otimes a_{I^2} \rangle | \lesssim |R|. \end{aligned}$$

The product \({\text {BMO}}\) space originates in the work of Chang and Fefferman [5, 6], and it is the right bi-parameter \({\text {BMO}}\) space for many considerations. An SIO T satisfies the product BMO assumption if

$$\begin{aligned} S1 \in {\text {BMO}}_{\text {prod}} \end{aligned}$$

for all the \((n+1)^2\) adjoints \(S = T^{j_1*, j_2*}_{1,2}\). Here \(S1:= S(1, \dots , 1)\). This can be interpreted in the sense that

$$\begin{aligned} \Vert S1 \Vert _{{\text {BMO}}_{{\text {prod}}}} = \sup _{{\mathcal {D}}= {\mathcal {D}}^1 \times {\mathcal {D}}^2} \sup _{\Omega } \left( \frac{1}{|\Omega |} \sum _{ \begin{array}{c} R = I^1 \times I^2 \in {\mathcal {D}}\\ R \subset \Omega \end{array}} |\langle S1, h_R \rangle |^2 \right) ^{1/2}< \infty , \end{aligned}$$

where \(h_R = h_{I^1} \otimes h_{I^2}\), the supremum is over all dyadic grids \({\mathcal {D}}^i\) on \({\mathbb {R}}^{d_i}\) and open sets \(\Omega \subset {\mathbb {R}}^d = {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) with \(0< |\Omega | < \infty \), and the pairings \(\langle S1, h_R\rangle \) can be defined, in a natural way, using the kernel representations.

Definition 3.7

An n-linear bi-parameter \((\omega _1, \omega _2)\)-SIO T satisfying the weak boundedness property, the diagonal BMO assumption and the product BMO assumption is called an n-linear bi-parameter \((\omega _1, \omega _2)\)-Calderón–Zygmund operator (\((\omega _1, \omega _2)\)-CZO).

Bi-parameter Model Operators. For hybrid operators we will use suggestive notation such as \((S\pi )_i\) to denote a bi-parameter operator that behaves like an ordinary n-linear shift \(S_i\) in the first parameter and like an n-linear paraproduct \(\pi \) in the second—but this is just notation and our operators are not of tensor product form.

Shifts. Let \(i=(i_1, \dots , i_{n+1})\), where \(i_j = (i_j^1, i_j^2) \in \{0,1,\ldots \}^2\). An n-linear bi-parameter shift \(S_i\) takes the form

$$\begin{aligned} \langle S_i(f_1, \ldots , f_n), f_{n+1}\rangle = \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(i_j)} = K \end{array}} a_{K, (R_j)} \prod _{j=1}^{n+1} \langle f_j, {\widetilde{h}}_{R_j} \rangle . \end{aligned}$$

Here \(K, R_1, \ldots , R_{n+1} \in {\mathcal {D}}= {\mathcal {D}}^1 \times {\mathcal {D}}^2\), \(R_j = I_j^1 \times I_j^2\), \(R_j^{(i_j)} := (I_j^1)^{(i_j^1)} \times (I_j^2)^{(i_j^2)}\) and \({\widetilde{h}}_{R_j} = {\widetilde{h}}_{I_j^1} \otimes {\widetilde{h}}_{I_j^2}\). Here we assume that for \(m \in \{1,2\}\) there exist two indices \(j_0,j_1 \in \{1, \ldots , n+1\}\), \(j_0 \not =j_1\), so that \({\widetilde{h}}_{I_{j_0}^m}=h_{I_{j_0}^m}\), \({\widetilde{h}}_{I_{j_1}^m}=h_{I_{j_1}^m}\) and for the remaining indices \(j \not \in \{j_0, j_1\}\) we have \({\widetilde{h}}_{I_j^m} \in \{h_{I_j^m}^0, h_{I_j^m}\}\). Moreover, \(a_{K,(R_j)} = a_{K, R_1, \ldots ,R_{n+1}}\) is a scalar satisfying the normalization

$$\begin{aligned} |a_{K,(R_j)}| \le \frac{\prod _{j=1}^{n+1} |R_j|^{1/2}}{|K|^{n}}. \end{aligned}$$
(3.8)
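For example, in the linear case \(n = 1\) the normalization (3.8) reads

$$\begin{aligned} |a_{K, R_1, R_2}| \le \frac{|R_1|^{1/2} |R_2|^{1/2}}{|K|}, \end{aligned}$$

which is the familiar normalization of the linear bi-parameter shifts.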

We continue to define modified shifts—they are important for the weak kernel regularity. Let

$$\begin{aligned} A_{R_1, \ldots , R_{n+1}}^{j_1, j_2}(f_1, \ldots , f_{n+1}) = A_{R_1, \ldots , R_{n+1}}^{j_1, j_2} := \prod _{j=1}^{n+1} \langle f_j, {\widetilde{h}}_{R_j} \rangle , \end{aligned}$$
(3.9)

where \({\widetilde{h}}_{R_j} = {\widetilde{h}}_{I_j^1} \otimes {\widetilde{h}}_{I_j^2}\), \({\widetilde{h}}_{I_{j_1}^1} = h_{I_{j_1}^1}\), \({\widetilde{h}}_{I_{j}^1} = h_{I_{j}^1}^0\), \(j \ne j_1\), \({\widetilde{h}}_{I_{j_2}^2} = h_{I_{j_2}^2}\), \({\widetilde{h}}_{I_{j}^2} = h_{I_{j}^2}^0\), \(j \ne j_2\). A modified n-linear bi-parameter shift \(Q_k\), \(k = (k_1, k_2)\), takes the form

$$\begin{aligned} \begin{aligned} \langle Q_{k}(f_1, \ldots , f_n), f_{n+1}\rangle&= \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} a_{K, (R_j)} \big [A_{R_1, \ldots , R_{n+1}}^{j_1, j_2} - A_{I_{j_1}^1 \times I_1^2, \ldots , I_{j_1}^1 \times I_{n+1}^2}^{j_1, j_2} \\&\quad -A_{I_1^1 \times I_{j_2}^2, \ldots , I_{n+1}^1 \times I_{j_2}^2}^{j_1, j_2} + A_{I_{j_1}^1 \times I_{j_2}^2, \ldots , I_{j_1}^1 \times I_{j_2}^2}^{j_1, j_2}\big ] \end{aligned} \end{aligned}$$

for some \(j_1, j_2\). Moreover, \(a_{K,(R_j)} = a_{K, R_1, \ldots ,R_{n+1}}\) is a scalar satisfying the usual normalization (3.8).

We now define the hybrid operators that behave like a modified shift in one of the parameters and like a standard shift in the other. A modified/standard n-linear bi-parameter shift \((QS)_{k, i}\), \(i = (i_1, \ldots , i_{n+1})\), \(k, i_j \in \{0, 1, \ldots \}\), takes the form

$$\begin{aligned} \begin{aligned}&\langle (QS)_{k,i}(f_1, \ldots , f_n), f_{n+1}\rangle \\&\quad = \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k, i_j)} = K \end{array}} a_{K, (R_j)} \left[ \prod _{j=1}^{n+1} \langle f_j, {\widetilde{h}}_{R_j} \rangle - \prod _{j=1}^{n+1} \langle f_j, {\widetilde{h}}_{I_{j_0}^1 \times I_j^2} \rangle \right] \end{aligned} \end{aligned}$$

for some \(j_0\). Here we assume that \({\widetilde{h}}_{I_{j_0}^1} = h_{I_{j_0}^1}\), \({\widetilde{h}}_{I_{j}^1} = h_{I_{j}^1}^0\) for \(j \ne j_0\), and that there exist two indices \(j_1,j_2 \in \{1, \ldots , n+1\}\), \(j_1 \not =j_2\), so that \({\widetilde{h}}_{I_{j_1}^2}=h_{I_{j_1}^2}\), \({\widetilde{h}}_{I_{j_2}^2}=h_{I_{j_2}^2}\) and for the remaining indices \(j \not \in \{j_1, j_2\}\) we have \({\widetilde{h}}_{I_j^2} \in \{h_{I_j^2}^0, h_{I_j^2}\}\). Moreover, \(a_{K,(R_j)} = a_{K, R_1, \ldots ,R_{n+1}}\) is a scalar satisfying the usual normalization (3.8). Of course, \((SQ)_{i,k}\) is defined symmetrically.

Partial Paraproducts. Partial paraproducts are hybrids of \(\pi \) and S or \(\pi \) and Q.

Let \(i=(i_1, \dots , i_{n+1})\), where \(i_j \in \{0,1,\ldots \}\). An n-linear bi-parameter partial paraproduct \((S\pi )_i\) with the paraproduct component on \({\mathbb {R}}^{d_2}\) takes the form

$$\begin{aligned} \langle (S\pi )_i(f_1, \ldots , f_n), f_{n+1} \rangle = \sum _{K = K^1 \times K^2} \sum _{\begin{array}{c} I^1_1, \ldots , I_{n+1}^1 \\ (I_j^1)^{(i_j)} = K^1 \end{array}} a_{K, (I_j^1)} \prod _{j=1}^{n+1} \langle f_j, {\widetilde{h}}_{I_j^1} \otimes u_{j, K^2} \rangle , \end{aligned}$$
(3.10)

where the functions \({\widetilde{h}}_{I_j^1}\) and \(u_{j, K^2}\) satisfy the following. There are \(j_0,j_1 \in \{1, \ldots , n+1\}\), \(j_0 \not =j_1\), so that \({\widetilde{h}}_{I_{j_0}^1}=h_{I_{j_0}^1}\), \({\widetilde{h}}_{I_{j_1}^1}=h_{I_{j_1}^1}\) and for the remaining indices \(j \not \in \{j_0, j_1\}\) we have \({\widetilde{h}}_{I_j^1} \in \{h_{I_j^1}^0, h_{I_j^1}\}\). There is \(j_2 \in \{1, \ldots , n+1\}\) so that \(u_{j_2, K^2} = h_{K^2}\) and for the remaining indices \(j \ne j_2\) we have \(u_{j, K^2} = \frac{1_{K^2}}{|K^2|}\). Moreover, the coefficients are assumed to satisfy

$$\begin{aligned} \Vert (a_{K, (I_j^1)})_{K^2} \Vert _{{\text {BMO}}} \le \frac{\prod _{j=1}^{n+1} |I_j^1|^{1/2}}{|K^1|^{n}}. \end{aligned}$$

Of course, \((\pi S)_i\) is defined symmetrically.

A modified n-linear partial paraproduct \((Q\pi )_{k}\) with the paraproduct component on \({\mathbb {R}}^{d_2}\) takes the form

$$\begin{aligned} \begin{aligned}&\langle (Q\pi )_k(f_1, \ldots , f_n), f_{n+1} \rangle \\&\quad = \sum _{K = K^1 \times K^2} \sum _{\begin{array}{c} I^1_1, \ldots , I_{n+1}^1 \\ (I_j^1)^{(k)} = K^1 \end{array}} a_{K, (I_j^1)} \left[ \prod _{j=1}^{n+1} \langle f_j, {\widetilde{h}}_{I_j^1} \otimes u_{j, K^2} \rangle - \prod _{j=1}^{n+1} \langle f_j, {\widetilde{h}}_{I_{j_0}^1} \otimes u_{j, K^2} \rangle \right] \end{aligned} \end{aligned}$$

for some \(j_0\)—here \({\widetilde{h}}_{I_{j_0}^1} = h_{I_{j_0}^1}\), \({\widetilde{h}}_{I_{j}^1} = h_{I_{j}^1}^0\) for \(j \ne j_0\) and \(u_{j, K^2}\) are like in (3.10). The constants satisfy the same normalization.

Full Paraproducts. An n-linear bi-parameter full paraproduct \(\Pi \) takes the form

$$\begin{aligned} \langle \Pi (f_1, \ldots , f_n) , f_{n+1} \rangle = \sum _{K = K^1 \times K^2} a_{K} \prod _{j=1}^{n+1} \langle f_j, u_{j, K^1} \otimes u_{j, K^2} \rangle , \end{aligned}$$

where the functions \(u_{j, K^1}\) and \(u_{j, K^2}\) are like in (3.10). The coefficients are assumed to satisfy

$$\begin{aligned} \Vert (a_{K} ) \Vert _{{\text {BMO}}_{{\text {prod}}}} = \sup _{\Omega } \left( \frac{1}{|\Omega |} \sum _{K\subset \Omega } |a_{K}|^2 \right) ^{1/2} \le 1, \end{aligned}$$

where the supremum is over open sets \(\Omega \subset {\mathbb {R}}^d = {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) with \(0< |\Omega | < \infty \).

Comparison to the Usual Model Operators. The modified model operators can be written as suitable sums of the standard operators. This is practical when one is willing to lose \(\frac{1}{2}\) of kernel regularity or when some estimates are too difficult to carry out for the more complicated modified operators. However, some regularity is always lost if this decomposition is used, so it is preferable to make do without it. To communicate the gist we only give the following formulation.

Lemma 3.11

Let \(Q_k\), \(k = (k_1, k_2)\), be a modified n-linear bi-parameter shift. Then

$$\begin{aligned} Q_{k} =C\sum _{u=1}^{c} \sum _{i_1=0}^{k_1-1} \sum _{i_2=0}^{k_2-1} S^{u,i_1,i_2}, \end{aligned}$$

where each \(S = S^{u,i_1,i_2}\) is a standard n-linear bi-parameter shift of complexity \(i^m_{S, j}\), \(j \in \{1, \ldots , n+1\}\), \(m \in \{1,2\}\), satisfying

$$\begin{aligned} i^{m}_{S, j} \le k_m. \end{aligned}$$

Similarly, a modified/standard shift can be represented using standard shifts and a modified partial paraproduct can be represented using standard partial paraproducts.

Proof

For notational convenience we consider a shift \(Q_k\) of the particular form

$$\begin{aligned} \begin{aligned} \langle Q_{k}(f_1, \ldots , f_n), f_{n+1}\rangle&= \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} a_{K, (R_j)} \left[ A_{R_1, \ldots , R_{n+1}}^{n+1, n+1} - A_{I_{n+1}^1 \times I_1^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1} \right. \\&\left. \quad -A_{I_1^1 \times I_{n+1}^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1} + A_{I_{n+1}^1 \times I_{n+1}^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1}\right] . \end{aligned} \end{aligned}$$
(3.12)

There is no essential difference in the general case.

We define

$$\begin{aligned} b_{K, (R_j)} =|R_1|^{n/2}a_{K, (R_j)} \end{aligned}$$

and

$$\begin{aligned} B_{R_1, \ldots , R_{n+1}}^{n+1, n+1} =\prod _{j=1}^n \langle f_j \rangle _{R_j} \langle f_{n+1}, h_{R_{n+1}} \rangle . \end{aligned}$$

Using these, we can write the shift as in (3.12) just by replacing a with b and A with B.

For the moment we define the following shorthand. For a cube I and integers \(l,j_0 \in \{1,2, \dots \}\) we define

$$\begin{aligned} D_{I,l}(j,j_0)= {\left\{ \begin{array}{ll} E_I, \quad &{}\text {if } j \in \{1, \dots , j_0-1\}, \\ P_{I,l-1}, \quad &{}\text {if } j=j_0, \\ {\text {id}}, \quad &{}\text {if } j \in \{j_0+1,j_0+2, \dots \}, \end{array}\right. } \end{aligned}$$
(3.13)

where \({\text {id}}\) denotes the identity operator.

Let \(R_1, \dots , R_{n+1}\) be as in the summation of \(Q_k\). We use the above notation in both parameters, and we denote this, as usual, with superscripts \(D^1_{I,l}(j,j_0)\) and \(D^2_{I,l}(j,j_0)\). With some work (we omit the details) it can be shown that

$$\begin{aligned} \begin{aligned} B_{R_1, \ldots , R_{n+1}}^{n+1, n+1}&=\sum _{m_1,m_2=1}^{n+1} \prod _{j=1}^{n} \langle D^1_{K^1,k_1}(j,m_1)D^2_{K^2,k_2}(j,m_2)f_j \rangle _{R_j} \langle f_{n+1}, h_{R_{n+1}} \rangle , \end{aligned} \end{aligned}$$

which gives that

$$\begin{aligned} \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} B_{R_1, \ldots , R_{n+1}}^{n+1, n+1} =:\sum _{m_1,m_2=1}^{n+1}\Sigma _{m_1,m_2}^1. \end{aligned}$$
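As a quick consistency check of the omitted identity, consider the simplest case \(n=1\): by the telescoping identity \(P_{K,l-1} = E_{K,l} - E_K\) applied in both parameters,

$$\begin{aligned} \sum _{m_1,m_2=1}^{2} \big \langle D^1_{K^1,k_1}(1,m_1) D^2_{K^2,k_2}(1,m_2) f_1 \big \rangle _{R_1} = \big \langle E^1_{K^1, k_1} E^2_{K^2, k_2} f_1 \big \rangle _{R_1} = \langle f_1 \rangle _{R_1}, \end{aligned}$$

since \(R_1^{(k)} = K\), and multiplying by \(\langle f_2, h_{R_2} \rangle \) recovers \(B^{2,2}_{R_1, R_2}\).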

Also, we have that

$$\begin{aligned} B_{I_{n+1}^1 \times I_1^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1} =\sum _{m_2=1}^{n+1} \prod _{j=1}^n \langle D^2_{K^2,k_2}(j,m_2)f_j \rangle _{I^1_{n+1} \times I^2_j} \langle f_{n+1}, h_{R_{n+1}} \rangle \end{aligned}$$

and

$$\begin{aligned} B_{I_1^1 \times I_{n+1}^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1} =\sum _{m_1=1}^{n+1} \prod _{j=1}^n \langle D^1_{K^1,k_1}(j,m_1)f_j \rangle _{I^1_{j} \times I^2_{n+1}} \langle f_{n+1}, h_{R_{n+1}} \rangle , \end{aligned}$$

which gives that

$$\begin{aligned} \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} B_{I_{n+1}^1 \times I_1^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1} =: \sum _{m_2=1}^{n+1} \Sigma _{m_2}^2 \end{aligned}$$

and

$$\begin{aligned} \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} B_{I_1^1 \times I_{n+1}^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1} =: \sum _{m_1=1}^{n+1} \Sigma ^3_{m_1}. \end{aligned}$$

Finally, we write that

$$\begin{aligned} \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} B_{I_{n+1}^1 \times I_{n+1}^2, \ldots , I_{n+1}^1 \times I_{n+1}^2}^{n+1,n+1}=: \Sigma ^4. \end{aligned}$$

Using the above decompositions we have the identity

$$\begin{aligned} \begin{aligned} \langle Q_{k}(f_1, \ldots , f_n), f_{n+1}\rangle&= \sum _{m_1,m_2=1}^n \Sigma ^1_{m_1,m_2} + \sum _{m_2=1}^n (\Sigma ^1_{n+1,m_2}-\Sigma ^2_{m_2})\\&\quad +\sum _{m_1=1}^n (\Sigma ^1_{m_1,n+1}-\Sigma ^3_{m_1}) + (\Sigma ^1_{n+1,n+1}-\Sigma ^2_{n+1}-\Sigma ^3_{n+1}+\Sigma ^4). \end{aligned} \end{aligned}$$

The terms \(\Sigma ^1_{m_1,m_2}\) with \(m_1,m_2 \in \{1, \dots , n\}\) and the terms inside the parentheses will be written as sums of standard shifts.

First, we take one \(\Sigma ^1_{m_1,m_2}\) with \(m_1,m_2 \in \{1, \dots , n\}\). For convenience of notation we choose the case \(m_1=m_2=:m\). Recall that

$$\begin{aligned} \Sigma ^1_{m,m} =\sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} \prod _{j=1}^{m-1} \langle f_j \rangle _K \langle P_{K,(k_1-1,k_2-1)} f_m \rangle _{R_m} \prod _{j=m+1}^n \langle f_j \rangle _{R_j} \langle f_{n+1}, h_{R_{n+1}} \rangle . \end{aligned}$$

Expanding

$$\begin{aligned} \langle P_{K,(k_1-1,k_2-1)} f_m \rangle _{R_m} = \sum _{i_1=0}^{k_1-1} \sum _{i_2=0}^{k_2-1}\sum _{L^{(i_1,i_2)}=K} \langle f_m , h_L \rangle \langle h_L \rangle _{R_m} \end{aligned}$$

there holds that

$$\begin{aligned} \begin{aligned} \Sigma ^1_{m,m}&=\sum _{i_1=0}^{k_1-1} \sum _{i_2=0}^{k_2-1} \sum _{K} \sum _{L^{(i_1,i_2)}=K} \sum _{\begin{array}{c} R_{m+1}, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} \left( \sum _{\begin{array}{c} R_1, \ldots , R_{m-1} \\ R_j^{(k)} = K \end{array}}\sum _{\begin{array}{c} R_m \subset L \\ R_m^{(k)}=K \end{array}} \frac{b_{K,(R_j)} \langle h_{L} \rangle _{R_m} }{|K|^{(m-1)/2} |R_{n+1}|^{(n-m)/2} }\right) \\&\quad \prod _{j=1}^{m-1} \langle f_j, h^0_K \rangle \langle f_m , h_{L} \rangle \prod _{j=m+1}^n \langle f_j, h_{R_j}^0 \rangle \langle f_{n+1}, h_{R_{n+1}} \rangle . \end{aligned} \end{aligned}$$

Since

$$\begin{aligned} \left| \sum _{\begin{array}{c} R_1, \ldots , R_{m-1} \\ R_j^{(k)} = K \end{array}}\sum _{\begin{array}{c} R_m \subset L \\ R_m^{(k)}=K \end{array}} \frac{b_{K,(R_j)} \langle h_{L} \rangle _{R_m} }{|K|^{(m-1)/2} |R_{n+1}|^{(n-m)/2} }\right| \le \frac{|K|^{(m-1)/2} |L|^{1/2} |R_{n+1}|^{(n-m+1)/2}}{|K|^n}, \end{aligned}$$

we see that

$$\begin{aligned} \Sigma ^1_{m,m} =\sum _{i_1=0}^{k_1-1} \sum _{i_2=0}^{k_2-1} \langle S_{(0, \dots , 0,(i_1,i_2), k, \dots , k)}(f_1, \dots , f_n),f_{n+1} \rangle , \end{aligned}$$

where \(S_{(0, \dots , 0,(i_1,i_2), k, \dots , k)}\) is a standard n-linear bi-parameter shift. The case of general \(m_1, m_2\) is analogous.

We turn to the terms \(\Sigma ^1_{n+1,m_2}-\Sigma ^2_{m_2}\). The terms \(\Sigma ^1_{m_1,n+1}-\Sigma ^3_{m_1}\) are symmetrical. Let \(m_2 \in \{1, \dots , n\}\). After expanding \(P^2_{K^2,k_2-1}\) in the slot \(m_2\) we have that \(\Sigma ^1_{n+1,m_2}-\Sigma ^2_{m_2}\) can be written as

$$\begin{aligned} \begin{aligned}&\sum _{i_2=0}^{k_2-1} \sum _{K} \sum _{(L^2)^{(i_2)}=K^2} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} \langle h_{L^2} \rangle _{I^2_{m_2}}\\&\qquad \left[ \prod _{j=1}^{m_2-1} \langle f_j \rangle _{K} \Big \langle f_{m_2}, \frac{1_{K^1}}{|K^1|} \otimes h_{L^2} \Big \rangle \prod _{j=m_2+1}^n \langle f_j \rangle _{K^1 \times I^2_j} \right. \\&\left. \quad -\prod _{j=1}^{m_2-1} \langle f_j \rangle _{I^1_{n+1} \times K^2} \Big \langle f_{m_2}, \frac{1_{I^1_{n+1}}}{|I^1_{n+1}|} \otimes h_{L^2} \Big \rangle \prod _{j=m_2+1}^n \langle f_j \rangle _{I^1_{n+1} \times I^2_j} \right] \langle f_{n+1}, h_{R_{n+1}} \rangle . \end{aligned} \end{aligned}$$

This splits the difference \(\Sigma ^1_{n+1,m_2}-\Sigma ^2_{m_2}\) as

$$\begin{aligned} \Sigma ^1_{n+1,m_2}-\Sigma ^2_{m_2} =:\sum _{i_2=0}^{k_2-1} \Sigma ^{1,2}_{m_2,i_2}. \end{aligned}$$

We fix one \(i_2\) at this point.

Let \(g_j^{m_2}:=g_j= \langle f_j \rangle ^2_{K^2}\) for \(j \in \{1, \dots , m_2-1\}\), \(g_{m_2}^{m_2}:=g_{m_2}= \langle f_{m_2}, h_{L^2} \rangle _2 \) and \(g_j^{m_2}:=g_j= \langle f_j \rangle ^2_{I^2_j}\) for \(j \in \{m_2+1, \dots , n\}\). Using this notation we have that the term inside the brackets is \( \prod _{j=1}^n \langle g_j \rangle _{K^1}-\prod _{j=1}^n \langle g_j \rangle _{I^1_{n+1}}. \) We write that

$$\begin{aligned} \prod _{j=1}^n \langle g_j \rangle _{K^1}-\prod _{j=1}^n \langle g_j \rangle _{I^1_{n+1}} =-\sum _{i_1=0}^{k_1-1}\left( \prod _{j=1}^n \langle g_j \rangle _{(I^1_{n+1})^{(i_1)}}-\prod _{j=1}^n \langle g_j \rangle _{(I^1_{n+1})^{(i_1+1)}}\right) . \end{aligned}$$

Then, we write \(\prod _{j=1}^n \langle g_j \rangle _{(I^1_{n+1})^{(i_1)}}-\prod _{j=1}^n \langle g_j \rangle _{(I^1_{n+1})^{(i_1+1)}}\) as the sum

$$\begin{aligned} \sum _{m_1=1}^n \prod _{j=1}^{m_1-1} \langle g_j \rangle _{(I^1_{n+1})^{(i_1+1)}} \langle \Delta _{(I^1_{n+1})^{(i_1+1)}} g_{m_1}\rangle _{I^1_{n+1}} \prod _{j=m_1+1}^n \langle g_j \rangle _{(I^1_{n+1})^{(i_1)}}. \end{aligned}$$

Expanding

$$\begin{aligned} \langle \Delta _{(I^1_{n+1})^{(i_1+1)}} g_{m_1}\rangle _{I^1_{n+1}} =\langle g_{m_1}, h_{(I^1_{n+1})^{(i_1+1)}} \rangle \langle h_{(I^1_{n+1})^{(i_1+1)}} \rangle _{I^1_{n+1}} \end{aligned}$$

we get that \(\prod _{j=1}^n \langle g_j \rangle _{K^1}-\prod _{j=1}^n \langle g_j \rangle _{I^1_{n+1}}\) equals

$$\begin{aligned} -\sum _{i_1=0}^{k_1-1} \sum _{m_1=1}^n \prod _{j=1}^{m_1-1} \langle g_j \rangle _{(I^1_{n+1})^{(i_1+1)}} \langle g_{m_1}, h_{(I^1_{n+1})^{(i_1+1)}} \rangle \langle h_{(I^1_{n+1})^{(i_1+1)}} \rangle _{I^1_{n+1}} \prod _{j=m_1+1}^n \langle g_j \rangle _{(I^1_{n+1})^{(i_1)}}. \end{aligned}$$

This identity splits \(\Sigma ^{1,2}_{m_2,i_2}\) further as \(\Sigma ^{1,2}_{m_2,i_2} =: -\sum _{i_1=0}^{k_1-1} \sum _{m_1=1}^n \Sigma ^{1,2}_{m_1,m_2,i_1,i_2}\).

We fix some \(m_1\) and \(i_1\) and consider the corresponding term. For convenience of notation we look at the case \(m_1=m_2=:m\). There holds that

$$\begin{aligned} \begin{aligned} \Sigma ^{1,2}_{m,m,i_1,i_2}&=\sum _{K} \sum _{(L^2)^{(i_2)}=K^2} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} \langle h_{(I^1_{n+1})^{(i_1+1)} \times L^2} \rangle _{I^1_{n+1}\times I^2_{m}} \\&\quad \prod _{j=1}^{m-1} \langle f_j \rangle _{(I^1_{n+1})^{(i_1+1)} \times K^2} \Big \langle f_{m}, h_{(I^1_{n+1})^{(i_1+1)} \times L^2} \Big \rangle \\&\quad \prod _{j=m+1}^n \langle f_j \rangle _{(I^1_{n+1})^{(i_1)} \times I^2_j} \langle f_{n+1}, h_{R_{n+1}} \rangle . \end{aligned} \end{aligned}$$

This is seen as a standard shift once we reorganize the summation and verify the normalization. We take \((I^1_{n+1})^{(i_1+1)}\) as the new “top cube” in the first parameter (\((I^1_{n+1})^{(i_1+1)}\) corresponds to \((L^1)^{(1)}\) in the summation below). There holds that \( \Sigma ^{1,2}_{m,m,i_1,i_2} \) equals

$$\begin{aligned} \begin{aligned}&\sum _{K^1}\sum _{(L^1)^{(k_1-i_1)}=K^1} \sum _{(I_{n+1}^1)^{(i_1)}=L^1} \sum _{K^2} \sum _{(L^2)^{(i_2)}=K^2} \sum _{\begin{array}{c} I^2_{m+1}, \dots , I^2_{n+1} \\ (I^2_j)^{(k_2)}=K^2 \end{array}} c_{K^1,L^1,I^1_{n+1}, K^2, L^2, I^2_{m+1}, \dots , I^2_{n+1}} \\&\quad \prod _{j=1}^{m-1} \langle f_j \rangle _{(L^1)^{(1)} \times K^2} \Big \langle f_{m}, h_{(L^1)^{(1)} \times L^2} \Big \rangle \prod _{j=m+1}^n \langle f_j \rangle _{L^1 \times I^2_j} \langle f_{n+1}, h_{R_{n+1}} \rangle , \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned}&c_{K^1,L^1,I^1_{n+1}, K^2, L^2, I^2_{m+1}, \dots , I^2_{n+1}} \\&\quad = \sum _{\begin{array}{c} I^1_1, \dots , I^1_{n} \\ (I^1_j)^{(k_1)}=K^1 \end{array}} \sum _{\begin{array}{c} I^2_{1}, \dots , I^2_{m-1} \\ (I^2_j)^{(k_2)}=K^2 \end{array}} \sum _{\begin{array}{c} I^2_{m} \subset L^2 \\ (I^2_{m})^{(k_2)}=K^2 \end{array}} b_{K, (R_j)}\langle h_{(L^1)^{(1)} \times L^2} \rangle _{I^1_{n+1}\times I^2_{m}}. \end{aligned} \end{aligned}$$

We have the estimate

$$\begin{aligned} \begin{aligned}&|c_{K^1,L^1,I^1_{n+1}, K^2, L^2, I^2_{m+1}, \dots , I^2_{n+1}}|\\&\quad \le \frac{|(L^1)^{(1)}|^{n/2}|I^1_{n+1}|^{1/2}}{|(L^1)^{(1)}|^n} \frac{|K^2|^{(m-1)/2} |L^2|^{1/2} |I^2_{n+1}|^{(n-m+1)/2}}{|K^2|^n} \\&\qquad \times |(L^1)^{(1)}|^{(n-1)/2} |K^2|^{(m-1)/2} |I^2|^{(n-m)/2}. \end{aligned} \end{aligned}$$

Notice that the term on the first line of the right-hand side is \(2^{d_1(n-m)/2}\) times the right normalization of the shift, since in \(\Sigma ^{1,2}_{m,m,i_1,i_2}\) we have the cubes \(L^1\) related to \(f_j\) with \(j \in \{m+1, \dots , n\}\). Also, the term on the second line is almost cancelled out when one changes the averages in \(\Sigma ^{1,2}_{m,m,i_1,i_2}\) into pairings against non-cancellative Haar functions.

We conclude that for some \(C \ge 1\) we have

$$\begin{aligned} C^{-1}\Sigma ^{1,2}_{m,m,i_1,i_2} =\langle S_{(0, \dots , 0,(0,i_2), (1,k_2), \dots , (1,k_2), (i_1+1,k_2))}(f_1, \dots , f_n),f_{n+1} \rangle , \end{aligned}$$

where S is a standard n-linear bi-parameter shift of the given complexity. The case of general \(m_1, m_2\) is analogous.

Finally, we look at the term \(\Sigma ^1_{n+1,n+1}-\Sigma ^2_{n+1}-\Sigma ^3_{n+1}+\Sigma ^4\) which by definition is

$$\begin{aligned} \begin{aligned}&\sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} b_{K,(R_j)} \\&\qquad \left[ \prod _{j=1}^n \langle f_j \rangle _K -\prod _{j=1}^n \langle f_j \rangle _{I^1_{n+1} \times K^2} -\prod _{j=1}^n \langle f_j \rangle _{K^1 \times I^2_{n+1}}\right. \\&\left. \quad \qquad + \prod _{j=1}^n \langle f_j \rangle _{R_{n+1}} \right] \langle f_{n+1}, h_{R_{n+1}} \rangle . \end{aligned} \end{aligned}$$
(3.14)

Consider the rectangles \(K, R_1, \dots , R_{n+1}\) as fixed for the moment. There holds that \(\prod _{j=1}^n \langle f_j \rangle _K-\prod _{j=1}^n \langle f_j \rangle _{I^1_{n+1} \times K^2}\) equals

$$\begin{aligned} \begin{aligned} -\sum _{i_1=0}^{k_1-1}&\sum _{m_1=1}^n \langle h_{(I^1_{n+1})^{(i_1+1)}}\rangle _{I^1_{n+1}} \\&\quad \prod _{j=1}^{m_1-1} \langle f_j \rangle _{(I^1_{n+1})^{(i_1+1)}\times K^2} \Big \langle f_{m_1}, h_{(I^1_{n+1})^{(i_1+1)}} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle \prod _{j=m_1+1}^n \langle f_j \rangle _{(I^1_{n+1})^{(i_1)} \times K^2}. \end{aligned} \end{aligned}$$
(3.15)

Similarly, we have that \(-\prod _{j=1}^n \langle f_j \rangle _{K^1 \times I^2_{n+1}} + \prod _{j=1}^n \langle f_j \rangle _{R_{n+1}}\) equals

$$\begin{aligned} \begin{aligned} \sum _{i_1=0}^{k_1-1}&\sum _{m_1=1}^n \langle h_{(I^1_{n+1})^{(i_1+1)}}\rangle _{I^1_{n+1}} \\&\quad \prod _{j=1}^{m_1-1} \langle f_j \rangle _{(I^1_{n+1})^{(i_1+1)}\times I^2_{n+1}} \Big \langle f_{m_1}, h_{(I^1_{n+1})^{(i_1+1)}} \otimes \frac{1_{I^2_{n+1}}}{|I^2_{n+1}|} \Big \rangle \prod _{j=m_1+1}^n \langle f_j \rangle _{(I^1_{n+1})^{(i_1)} \times I^2_{n+1}}. \end{aligned} \end{aligned}$$
(3.16)

Let \(g^{m_1,i_1}_j= \langle f_j \rangle ^1_{(I^1_{n+1})^{(i_1+1)}}\) for \(j \in \{1, \dots , m_1-1\}\), \(g^{m_1,i_1}_{m_1}= \langle f_{m_1}, h_{(I^1_{n+1})^{(i_1+1)}} \rangle _1\) and \(g^{m_1,i_1}_j= \langle f_j \rangle ^1_{(I^1_{n+1})^{(i_1)}}\) for \(j \in \{m_1+1, \dots ,n\}\). The sum of (3.15) and (3.16) can similarly be split as

$$\begin{aligned} \begin{aligned} \sum _{i_1=0}^{k_1-1} \sum _{i_2=0}^{k_2-1}&\sum _{m_1,m_2=1}^n \langle h_{R_{n+1}^{(i_1+1,i_2+1)} }\rangle _{R_{n+1}} \\&\quad \prod _{j=1}^{m_2-1} \langle g^{m_1,i_1}_j \rangle _{(I^2_{n+1})^{(i_2+1)}} \langle g^{m_1,i_1}_{m_2}, h_{(I^2_{n+1})^{(i_2+1)}} \rangle \prod _{j=m_2+1}^n \langle g^{m_1,i_1}_j \rangle _{(I^2_{n+1})^{(i_2)}}. \end{aligned} \end{aligned}$$
(3.17)

When one recalls the definition of the functions \(g_j^{m_1,i_1}\) and writes this in terms of the functions \(f_j\), one has that in the first parameter \(f_j\) is paired with \(1_{(I_{n+1}^1)^{(i_1+1)}}/|(I_{n+1}^1)^{(i_1+1)}|\) for \(j=1, \dots , m_1-1\), \(f_{m_1}\) with \(h_{(I^1_{n+1})^{(i_1+1)}}\) and \(f_j\) with \(1_{(I_{n+1}^1)^{(i_1)}}/|(I_{n+1}^1)^{(i_1)}|\) for \(j=m_1+1, \dots , n\). Each \(f_j\) is paired similarly in the second parameter. In the case \(m_1=m_2=:m\) the summand in (3.17) can be written as

$$\begin{aligned} \langle h_{R_{n+1}^{(i_1+1,i_2+1)} }\rangle _{R_{n+1}}\prod _{j=1}^{m-1} \langle f_j \rangle _{R_{n+1}^{(i_1+1,i_2+1)} } \langle f_{m}, h_{R_{n+1}^{(i_1+1,i_2+1)}} \rangle \prod _{j=m+1}^n \langle f_j \rangle _{R_{n+1}^{(i_1,i_2)}}. \end{aligned}$$
(3.18)

The splitting in (3.17) gives us the identity

$$\begin{aligned} \Sigma ^1_{n+1,n+1}-\Sigma ^2_{n+1}-\Sigma ^3_{n+1}+\Sigma ^4 =: \sum _{i_1=0}^{k_1-1} \sum _{i_2=0}^{k_2-1} \sum _{m_1,m_2=1}^n \Sigma ^{1,2,3,4}_{m_1,m_2,i_1,i_2}. \end{aligned}$$

We fix some \(i_1\) and \(i_2\) and consider the case \(m_1=m_2=:m\). From (3.18) we see that

$$\begin{aligned} \begin{aligned} \Sigma ^{1,2,3,4}_{m,m,i_1,i_2}&= \sum _{K} \sum _{L^{(k_1-i_1, k_2-i_2)}=K} \sum _{R_{n+1}^{(i_1,i_2)}=L} c_{K, L,R_{n+1}}\\&\quad \prod _{j=1}^{m-1} \langle f_j \rangle _{L^{(1,1)}} \langle f_{m}, h_{L^{(1,1)}} \rangle \prod _{j=m+1}^n \langle f_j \rangle _{L} \langle f_{n+1}, h_{R_{n+1}} \rangle , \end{aligned} \end{aligned}$$

where

$$\begin{aligned} c_{K,L,R_{n+1}} = \sum _{\begin{array}{c} R_1, \dots , R_n \\ R_j^{(k)}=K \end{array}} b_{K,(R_j)} \langle h_{L^{(1,1)}}\rangle _{R_{n+1}}. \end{aligned}$$

The coefficient satisfies the estimate

$$\begin{aligned} |c_{K,L,R_{n+1}}| \le \frac{|R_{n+1}|^{1/2}}{|L^{(1,1)}|^{1/2}}=\frac{|L^{(1,1)}|^{n/2}|R_{n+1}|^{1/2}}{|L^{(1,1)}|^{n}} |L^{(1,1)}|^{(n-1)/2}. \end{aligned}$$

Thus, we see that \(C^{-1}\Sigma ^{1,2,3,4}_{m,m,i_1,i_2}\) is a standard n-linear bi-parameter shift. The complexity of the shift is \(((0,0), \dots , (0,0),(1,1),\dots ,(1,1), (i_1+1,i_2+1))\) with m zeros. The case of general \(m_1\) and \(m_2\) is analogous. \(\square \)

Bi-parameter Representation Theorem. We set

$$\begin{aligned} \sigma = (\sigma _1, \sigma _2) \in (\{0,1\}^{d_1})^{{\mathbb {Z}}} \times (\{0,1\}^{d_2})^{{\mathbb {Z}}}, \qquad \sigma _i = (\sigma ^k_i)_{k \in {\mathbb {Z}}}, \end{aligned}$$

and denote the expectation over the product probability space by

$$\begin{aligned} {\mathbb {E}}_{\sigma } = {\mathbb {E}}_{\sigma _1} {\mathbb {E}}_{\sigma _2} = {\mathbb {E}}_{\sigma _2} {\mathbb {E}}_{\sigma _1} = \iint \,\mathrm {d}{\mathbb {P}}_{\sigma _1} \,\mathrm {d}{\mathbb {P}}_{\sigma _2}. \end{aligned}$$

We also set \({\mathcal {D}}_0 = {\mathcal {D}}^1_0 \times {\mathcal {D}}^2_0\), where \({\mathcal {D}}_0^i\) is the standard dyadic grid of \({\mathbb {R}}^{d_i}\). We use the notation

$$\begin{aligned} I_i + \sigma _i := I_i + \sum _{k:\, 2^{-k} < \ell (I_i)} 2^{-k}\sigma _i^k, \qquad I_i \in {\mathcal {D}}_0^i. \end{aligned}$$

Given \(\sigma = (\sigma _1, \sigma _2)\) and \(R = I_1 \times I_2 \in {\mathcal {D}}_0\) we set

$$\begin{aligned} R + \sigma = (I_1+\sigma _1) \times (I_2+\sigma _2) \qquad \text {and} \qquad {\mathcal {D}}_{\sigma } = \{R + \sigma :\, R \in {\mathcal {D}}_0\} = {\mathcal {D}}_{\sigma _1} \times {\mathcal {D}}_{\sigma _2}. \end{aligned}$$

Theorem 3.19

Suppose that T is an n-linear bi-parameter \((\omega _1, \omega _2)\)-CZO, where \(\omega _i \in {\text {Dini}}_{1/2}\). Then we have

$$\begin{aligned}&\langle T(f_1,\ldots ,f_n), f_{n+1} \rangle \\&\quad = C {\mathbb {E}}_{\sigma } \sum _{k = (k_1, k_2) \in {\mathbb {N}}^2} \sum _{u=0}^{c_{d,n}} \omega _1(2^{-k_1})\omega _2(2^{-k_2}) \langle V_{k,u,\sigma }(f_1,\ldots ,f_n), f_{n+1} \rangle , \end{aligned}$$

where

$$\begin{aligned}&V_{k,u, \sigma } \in \{Q_k, S_{((k_1, k_2), \ldots , (k_1, k_2))}, (QS)_{k_1, (k_2, \ldots , k_2)}, (SQ)_{(k_1, \ldots , k_1), k_2}, \\&(Q\pi )_{k_1}, (\pi Q)_{k_2}, (S\pi )_{(k_1, \ldots , k_1)}, (\pi S)_{(k_2, \ldots , k_2)}, \Pi \} \end{aligned}$$

defined in \({\mathcal {D}}_{\sigma }\), and if the operator does not depend on \(k_1\) or \(k_2\) then that particular \(k_i = 0\).

Proof

We decompose

$$\begin{aligned} \begin{aligned}&\langle T(f_1, \ldots , f_n),f_{n+1} \rangle \\&\quad = {\mathbb {E}}_{\sigma } \sum _{R_1, \ldots , R_{n+1} } \langle T(\Delta _{R_1}f_1, \ldots , \Delta _{R_n}f_n),\Delta _{R_{n+1}}f_{n+1} \rangle \\&\quad = \sum _{j_1, j_2 =1}^{n+1} {\mathbb {E}}_{\sigma } \sum _{ \begin{array}{c} R_1, \ldots , R_{n+1} \\ \ell (I_{i_1}^1)> \ell (I_{j_1}^1) \text { for } i_1 \ne j_1 \\ \ell (I_{i_2}^2) > \ell (I_{j_2}^2) \text { for } i_2 \ne j_2 \end{array}} \langle T(\Delta _{R_1}f_1, \ldots , \Delta _{R_n}f_n),\Delta _{R_{n+1}}f_{n+1} \rangle \\&\qquad + {\mathbb {E}}_{\sigma } {\text {Rem}}_{\sigma }, \end{aligned} \end{aligned}$$

where, for a given \(\sigma = (\sigma _1, \sigma _2)\), the rectangles satisfy \(R_1, \ldots , R_{n+1} \in {\mathcal {D}}_\sigma = {\mathcal {D}}_{\sigma _1} \times {\mathcal {D}}_{\sigma _2}\) and \(R_j = I_j^1 \times I_j^2\).

The Main Terms. For fixed \(j_1, j_2\) we let

$$\begin{aligned} \Sigma _{j_1, j_2, \sigma } = \sum _{ \begin{array}{c} R_1, \ldots , R_{n+1} \\ \ell (I_{i_1}^1)> \ell (I_{j_1}^1) \text { for } i_1 \ne j_1 \\ \ell (I_{i_2}^2) > \ell (I_{j_2}^2) \text { for } i_2 \ne j_2 \end{array}} \langle T(\Delta _{R_1}f_1, \ldots , \Delta _{R_n}f_n),\Delta _{R_{n+1}}f_{n+1} \rangle . \end{aligned}$$

These are symmetric and we choose to deal with \(\Sigma _{\sigma } := \Sigma _{n, n+1, \sigma }\). After collapsing the relevant sums we have

$$\begin{aligned} \Sigma _{\sigma } = \sum _{ \begin{array}{c} R_1, \ldots , R_{n+1} \\ \ell (R_1) = \cdots = \ell (R_{n+1}) \end{array}} \langle T(E_{R_1} f_1, \ldots , E_{R_{n-1}} f_{n-1}, \Delta _{I_n^1}^1 E_{I_n^2}^2 f_n), E_{I_{n+1}^1}^1 \Delta _{I_{n+1}^2}^2 f_{n+1} \rangle , \end{aligned}$$

where \(\ell (R_j) := ( \ell (I_j^1), \ell (I_j^2))\) for \(R_j = I_j^1 \times I_j^2\).

For \(R = I^1 \times I^2\) we define

$$\begin{aligned} h_R = h_{I^1} \otimes h_{I^2}, \,\, h_R^0 = h_{I^1}^0 \otimes h_{I^2}^0,\,\, h_R^{1,0} = h_{I^1} \otimes h_{I^2}^0\, \text { and } \, h_R^{0,1} = h_{I^1}^0 \otimes h_{I^2}. \end{aligned}$$

Using this notation we write

$$\begin{aligned} \begin{aligned}&\langle T(E_{R_1} f_1, \ldots , E_{R_{n-1}} f_{n-1}, \Delta _{I_n^1}^1 E_{I_n^2}^2 f_n), E_{I_{n+1}^1}^1 \Delta _{I_{n+1}^2}^2 f_{n+1} \rangle \\&\quad = \langle T( h_{R_1}^0, \ldots , h_{R_{n-1}}^0, h_{R_n}^{1,0}), h_{R_{n+1}}^{0,1} \rangle A_{R_1, \dots , R_{n+1}}^{n,n+1}(f_1, \dots , f_{n+1}), \end{aligned} \end{aligned}$$

where \(A_{R_1, \dots , R_{n+1}}^{n,n+1}(f_1, \dots , f_{n+1})=A_{R_1, \dots , R_{n+1}}^{n,n+1}\) is defined in (3.9).

We have

$$\begin{aligned} A_{R_1, \dots , R_{n+1}}^{n,n+1} =A_{R_1, \dots , R_{n+1}}^{n,n+1} -A_{I_n^1 \times I^2_1, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}+A_{I_n^1 \times I^2_1, \dots , I^1_n \times I^2_{n+1}}^{n,n+1} \end{aligned}$$
(3.20)

and

$$\begin{aligned} A_{I_n^1 \times I^2_1, \dots , I^1_n \times I^2_{n+1}}^{n,n+1} =(A_{I_n^1 \times I^2_1, \dots , I^1_n \times I^2_{n+1}}^{n,n+1} -A_{I_n^1 \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}) +A_{I_n^1 \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}.\nonumber \\ \end{aligned}$$
(3.21)

Then, the difference of the first two terms on the right-hand side of (3.20) equals

$$\begin{aligned} \begin{aligned}&\left[ A_{R_1, \dots , R_{n+1}}^{n,n+1} -A_{I_n^1 \times I^2_1, \dots , I^1_n \times I^2_{n+1}}^{n,n+1} -A_{I^1_1 \times I^2_{n+1}, \dots , I^1_{n+1} \times I^2_{n+1}}^{n,n+1} +A_{I^1_n \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}\right] \\&\quad +\left\{ A_{I^1_1 \times I^2_{n+1}, \dots , I^1_{n+1} \times I^2_{n+1}}^{n,n+1} -A_{I^1_n \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}\right\} .\nonumber \end{aligned}\\ \end{aligned}$$
(3.22)

This gives us the decomposition

$$\begin{aligned} A_{R_1, \dots , R_{n+1}}^{n,n+1} = [\, \cdot \,] + \{\, \cdot \, \} + (\, \cdot \,) + A_{I^1_n \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}, \end{aligned}$$
(3.23)

where the brackets contain the corresponding terms from (3.21) and (3.22).

The identity (3.23) splits \(\Sigma _\sigma \) into four terms \(\Sigma _\sigma =\Sigma _\sigma ^1+\Sigma _\sigma ^2+\Sigma _\sigma ^3+\Sigma _\sigma ^4\).

The Shift Case \(\Sigma _\sigma ^1\). We begin by looking at \(\Sigma _\sigma ^1\), that is, the term coming from \([\, \cdot \,] \) in (3.23). Let us further define the abbreviation

$$\begin{aligned} \begin{aligned}&\varphi _{R_1, \dots , R_{n+1}} :=\langle T( h_{R_1}^0, \ldots , h_{R_{n-1}}^0, h_{R_n}^{1,0}), h_{R_{n+1}}^{0,1} \rangle \\&\quad \times \left[ A_{R_1, \dots , R_{n+1}}^{n,n+1} -A_{I_n^1 \times I^2_1, \dots , I^1_n \times I^2_{n+1}}^{n,n+1} -A_{I^1_1 \times I^2_{n+1}, \dots , I^1_{n+1} \times I^2_{n+1}}^{n,n+1} +A_{I^1_n \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}\right] \nonumber \end{aligned}\\ \end{aligned}$$
(3.24)

so that

$$\begin{aligned} \Sigma _{\sigma }^1 = \sum _{ \begin{array}{c} R_1, \ldots , R_{n+1} \\ \ell (R_1) = \cdots = \ell (R_{n+1}) \end{array}}\varphi _{R_1, \dots , R_{n+1}}. \end{aligned}$$

If \(R=I^1\times I^2\) is a rectangle and \(m=(m^1,m^2) \in {\mathbb {Z}}^{d_1} \times {\mathbb {Z}}^{d_2}\), then we define \(I^i \dot{+} m^i:=I^i+m^i\ell (I^i)\) and \(R \dot{+} m:= (I^1\dot{+}m^1)\times (I^2\dot{+}m^2)\). Notice that if \(I^1_i=I^1_j\) for all \(i,j\) or \(I^2_i=I^2_j\) for all \(i,j\), then \(\varphi _{R_1, \dots , R_{n+1}}=0\). Thus, there holds that

$$\begin{aligned} \begin{aligned} \Sigma _\sigma ^1&= \sum _{\begin{array}{c} m_1, \dots , m_{n+1} \in {\mathbb {Z}}^{d_1}\times {\mathbb {Z}}^{d_2} \\ (m_1^1, \dots , m_{n+1}^1) \not =0, \ m^1_n=0 \\ (m_1^2, \dots , m_{n+1}^2) \not =0, \ m^2_{n+1}=0 \end{array}} \sum _{R} \varphi _{R \dot{+} m_1, \dots ,R \dot{+} m_{n+1} } \\&=\sum _{k_1,k_2=2}^\infty \sum _{\begin{array}{c} m_1, \dots , m_{n+1} \in {\mathbb {Z}}^{d_1}\times {\mathbb {Z}}^{d_2} \\ \max |m^1_j| \in (2^{k_1-3}, 2^{k_1-2}], \ m^1_n=0 \\ \max |m^2_j| \in (2^{k_2-3}, 2^{k_2-2}], \ m^2_{n+1}=0 \end{array}} \sum _{R} \varphi _{R \dot{+} m_1, \dots ,R \dot{+} m_{n+1} }. \end{aligned} \end{aligned}$$

As in [17] we say that \(I \in {\mathcal {D}}_{\sigma _i}\) is k-good for \(k \ge 2\)—and denote this by \(I \in {\mathcal {D}}_{\sigma _i, {\text {good}}}(k)\)—if it satisfies

$$\begin{aligned} d(I, \partial I^{(k)}) \ge \frac{\ell (I^{(k)})}{4} = 2^{k-2} \ell (I). \end{aligned}$$
(3.25)

Notice that for all \(I \in {\mathcal {D}}_0^i\) we have

$$\begin{aligned} {\mathbb {P}}( \{ \sigma _i:I + \sigma _i \in {\mathcal {D}}_{\sigma _i, {\text {good}}}(k) \}) = 2^{-d_i}. \end{aligned}$$
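
Indeed, the relative position of \(I + \sigma _i\) inside its k-th dyadic ancestor is uniformly distributed over the \(2^{kd_i}\) possible locations, and in each coordinate exactly half of these positions satisfy (3.25). As an illustrative sanity check (ours, not part of the argument), this can be verified by direct enumeration:

```python
def goodness_probability(d, k):
    """Fraction of the 2^{kd} positions of a cube inside its k-th dyadic ancestor
    whose distance to the ancestor's boundary is at least 2^{k-2} side lengths,
    i.e. which satisfy (3.25)."""
    n = 1 << k
    good_1d = sum(1 for j in range(n) if min(j, n - 1 - j) >= n // 4)
    return (good_1d / n) ** d   # the positions factor over the d coordinates

for d in (1, 2, 3):
    assert all(abs(goodness_probability(d, k) - 2.0 ** (-d)) < 1e-12 for k in (2, 4, 6))
```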

Next, we consider \({\mathbb {E}}_\sigma \Sigma ^1_\sigma \) and add goodness to the rectangles R. Recall that \({\mathbb {E}}_\sigma ={\mathbb {E}}_{\sigma _1}{\mathbb {E}}_{\sigma _2}\). We write \({\mathcal {D}}_{\sigma , {\text {good}}}(k_1,k_2):= {\mathcal {D}}_{\sigma _1, {\text {good}}}(k_1) \times {\mathcal {D}}_{\sigma _2, {\text {good}}}(k_2)\). There holds that

$$\begin{aligned} {\mathbb {E}}_\sigma \sum _{R \in {\mathcal {D}}_\sigma } \varphi _{R \dot{+} m_1, \dots ,R \dot{+} m_{n+1} } =2^{d} {\mathbb {E}}_\sigma \sum _{ R \in {\mathcal {D}}_{\sigma ,{\text {good}}}(k_1,k_2)} \varphi _{R \dot{+} m_1, \dots ,R \dot{+} m_{n+1} }. \end{aligned}$$

Therefore, we have shown that

$$\begin{aligned} {\mathbb {E}}_\sigma \Sigma ^1_\sigma =2^{d}C\sum _{k_1,k_2=2}^\infty \omega _1(2^{-k_1})\omega _2(2^{-k_2}) \langle Q_{k_1,k_2}(f_1, \dots , f_n), f_{n+1} \rangle , \end{aligned}$$
(3.26)

where

$$\begin{aligned} \begin{aligned}&\langle Q_{k_1,k_2}(f_1, \dots , f_n), f_{n+1} \rangle \\&\quad := \frac{1}{C\omega _1(2^{-k_1})\omega _2(2^{-k_2})} \sum _{\begin{array}{c} m_1, \dots , m_{n+1} \in {\mathbb {Z}}^{d_1}\times {\mathbb {Z}}^{d_2} \\ \max |m^1_j| \in (2^{k_1-3}, 2^{k_1-2}], \ m^1_n=0 \\ \max |m^2_j| \in (2^{k_2-3}, 2^{k_2-2}], \ m^2_{n+1}=0 \end{array}} \sum _{R \in {\mathcal {D}}_{\sigma ,{\text {good}}}(k_1,k_2)} \varphi _{R \dot{+} m_1, \dots ,R \dot{+} m_{n+1} } \end{aligned} \end{aligned}$$

and C is a large enough constant.

Let \(m_1, \dots , m_{n+1}\) and \(R=I^1\times I^2\) be as in the definition of \( Q_{k_1, k_2}\). The goodness of the rectangle R easily implies (we omit the details, see [17]) that \((R \dot{+} m_j)^{(k_1, k_2)} = R^{(k_1, k_2)} =: K\) for all \(j \in \{1, \ldots , n+1\}\). Recall the definition of \(\varphi _{R \dot{+} m_1, \dots ,R \dot{+} m_{n+1} }\) from (3.24). Therefore, to conclude that \( Q_{k_1,k_2}\) is a modified bi-parameter n-linear shift it remains to prove the normalization

$$\begin{aligned} |\langle T( h_{R \dot{+}m_1}^0, \ldots , h_{R\dot{+}m_{n-1}}^0, h_{R\dot{+}m_n}^{1,0}), h_{R\dot{+}m_{n+1}}^{0,1} \rangle | \lesssim \omega _1(2^{-k_1})\omega _2(2^{-k_2}) \frac{ |R|^{(n+1)/2}}{|K|^n}. \end{aligned}$$
(3.27)

Let us first assume that \(k_1 \sim 1 \sim k_2\). Since \(m^1_i \not =0\) and \(m^2_j \not =0\) for some \(i\) and \(j\), we may use the full kernel representation of T to see that the left-hand side of (3.27) is dominated by

$$\begin{aligned} \int \nolimits _{{\mathbb {R}}^{(n+1)d}} | K(x_{n+1},x_1, \dots , x_n)| \prod _{j=1}^{n+1} h_{R \dot{+}m_j}^0(x_j) \,\mathrm {d}x. \end{aligned}$$

Applying the size of the kernel K this is further dominated by

$$\begin{aligned} \begin{aligned}&\int \nolimits _{{\mathbb {R}}^{(n+1)d_1}} \frac{1}{\Big (\sum _{j=1}^n |x_{n+1}^1-x_j^1|\Big )^{nd_1}} \prod _{j=1}^{n+1} h_{I^1 \dot{+} m_j^1}^0(x_j^1) \,\mathrm {d}x^1 \\&\quad \times \int \nolimits _{{\mathbb {R}}^{(n+1)d_2}} \frac{1}{\Big (\sum _{j=1}^n |x_{n+1}^2-x_j^2|\Big )^{nd_2}} \prod _{j=1}^{n+1} h_{I^2 \dot{+} m_j^2}^0(x_j^2) \,\mathrm {d}x^2 \lesssim \frac{1}{|I^1|^{(n-1)/2}|I^2|^{(n-1)/2}}. \end{aligned} \end{aligned}$$

Notice that this is the right estimate, since \(\omega _i(2^{-k_i}) \sim 1\) and \(|K|= |R^{(k_1,k_2)}| \sim |R|= |I^1| |I^2|\).

Suppose then that \(k_1\) and \(k_2\) are large enough so that we can use the continuity assumption of the full kernel K. Using the zero integrals of \(h_{I^1}\) and \(h_{I^2}\), the left-hand side of (3.27) equals

$$\begin{aligned}&\Big | \int \nolimits _{{\mathbb {R}}^{(n+1)d}}\Big ( K(x_{n+1},x_1, \dots , x_n)-K(x_{n+1},x_1, \dots , x_{n-1}, (c_{I^1},x^2_n))\nonumber \\&\quad -K((x_{n+1}^1,c_{I^2}),x_1, \dots , x_n)+K((x_{n+1}^1,c_{I^2}),x_1, \dots , x_{n-1}, (c_{I^1},x^2_n))\Big ) \nonumber \\&\quad \times \prod _{j=1}^{n-1} h_{R \dot{+}m_j}^0(x_j)h_{R \dot{+}m_n}^{1,0}(x_n) h_{R\dot{+}m_{n+1}}^{0,1}(x_{n+1}) \,\mathrm {d}x \Big |, \end{aligned}$$
(3.28)

where \(c_{I^i}\) denotes the center of the corresponding cube. Here one can use the continuity assumption of K, which leads to a product of two one-parameter integrals that are easily estimated.

What remains is the case where, for example, \(k_1 \sim 1\) and \(k_2\) is large. This is handled similarly to the above two cases using the mixed size and continuity assumption of K. This concludes the proof of (3.27) and we are done dealing with \({\mathbb {E}}_\sigma \Sigma _\sigma ^1\).

The Partial Paraproduct Cases \(\Sigma _\sigma ^2\) and \(\Sigma _\sigma ^3\). Next, we look at the symmetric terms \({\mathbb {E}}_\sigma \Sigma _\sigma ^2\) and \({\mathbb {E}}_\sigma \Sigma _\sigma ^3\). We explicitly consider \({\mathbb {E}}_\sigma \Sigma _\sigma ^2\) here. Recall that \(\Sigma _\sigma ^2\) equals

$$\begin{aligned}&\sum _{ \begin{array}{c} R_1, \ldots , R_{n+1} \\ \ell (R_1) = \cdots = \ell (R_{n+1}) \end{array}} \langle T( h_{R_1}^0, \ldots , h_{R_{n-1}}^0, h_{R_n}^{1,0}), h_{R_{n+1}}^{0,1} \rangle \\&\qquad \qquad \left\{ A_{I^1_1 \times I^2_{n+1}, \dots , I^1_{n+1} \times I^2_{n+1}}^{n,n+1} -A_{I^1_n \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}\right\} . \end{aligned}$$

Since the difference \(A_{I^1_1 \times I^2_{n+1}, \dots , I^1_{n+1} \times I^2_{n+1}}^{n,n+1} -A_{I^1_n \times I^2_{n+1}, \dots , I^1_n \times I^2_{n+1}}^{n,n+1}\) depends only on the cube \(I_{n+1}^2\) in the second parameter we can further rewrite this as

$$\begin{aligned} \begin{aligned} \Sigma _\sigma ^2&=\sum _{ \begin{array}{c} I_1^1, \ldots , I_{n+1}^1, I^2 \\ \ell (I_1^1) = \cdots = \ell (I_{n+1}^1) \end{array}} \langle T( h_{I_1^1}^0\otimes 1, \ldots , h_{I_{n-1}^1}^0\otimes 1, h_{I_n^1}\otimes 1), h_{I^1_{n+1}\times I^2}^{0,1} \rangle \\&\quad \times \Big \{\prod _{j=1}^{n-1} \Big \langle f_j, h^0_{I_j^1} \otimes \frac{1 _{I^2}}{|I^2|} \Big \rangle \cdot \Big \langle f_n, h_{I_n^1} \otimes \frac{1 _{I^2}}{|I^2|} \Big \rangle \langle f_{n+1}, h_{I^1_{n+1}\times I^2}^{0,1}\rangle \\&\quad -\prod _{j=1}^{n-1} \Big \langle f_j, h^0_{I_n^1} \otimes \frac{1 _{I^2}}{|I^2|} \Big \rangle \cdot \Big \langle f_n, h_{I_n^1} \otimes \frac{1 _{I^2}}{|I^2|} \Big \rangle \langle f_{n+1}, h_{I^1_{n}\times I^2}^{0,1}\rangle \Big \}. \end{aligned} \end{aligned}$$
(3.29)

Let us write the summand in (3.29) as \(\varphi _{I_1^1, \dots , I_{n+1}^1,I^2}\). By proceeding in the same way as above with \({\mathbb {E}}_\sigma \Sigma _\sigma ^1\) we have that

$$\begin{aligned} {\mathbb {E}}_\sigma \Sigma _\sigma ^2 =2^{d_1}C{\mathbb {E}}_\sigma \sum _{k=2}^\infty \omega _1(2^{-k}) \langle (Q\pi )_k (f_1, \dots , f_n), f_{n+1} \rangle , \end{aligned}$$
(3.30)

where

$$\begin{aligned} \begin{aligned}&\langle (Q\pi )_k (f_1, \dots , f_n), f_{n+1} \rangle \\&\quad := \frac{1}{C\omega _1(2^{-k})}\sum _{\begin{array}{c} m \in {\mathbb {Z}}^{(n+1)d_1} \\ \max | m_j | \in (2^{k-3}, 2^{k-2} ] \\ m_n=0 \end{array}} \sum _{\begin{array}{c} I^1\in {\mathcal {D}}_{\sigma _1,{\text {good}}}(k) \\ I^2 \in {\mathcal {D}}_{\sigma _2} \end{array}} \varphi _{I^1\dot{+}m_1, \dots , I^1\dot{+}m_{n+1},I^2}. \end{aligned} \end{aligned}$$

The k-goodness of \(I^1\) implies that here \((I^1\dot{+}m_j)^{(k)}=(I^1)^{(k)}=:K^1\) for all j. Therefore, to conclude that \((Q\pi )_k\) is a modified partial paraproduct with the paraproduct component in \({\mathbb {R}}^{d_2}\) it remains to show that if we fix \(m_1, \dots , m_{n+1}\) and \(I^1\) as in the above sum then

$$\begin{aligned} \begin{aligned}&\Vert (\langle T( h_{I^1\dot{+}m_1}^0\otimes 1, \ldots , h_{I^1 \dot{+}m_{n-1}}^0\otimes 1, h_{I^1}\otimes 1), h_{(I^1\dot{+}m_{n+1})\times I^2}^{0,1} \rangle )_{I^2 \in {\mathcal {D}}_{\sigma _2}} \Vert _{{\text {BMO}}} \\&\quad \lesssim \omega _1(2^{-k}) \frac{|I^1|^{(n+1)/2}}{|K^1|^n}. \end{aligned} \end{aligned}$$
(3.31)

We verify the above \({\text {BMO}}\) condition by taking a cube \(I^2\) and a function \(a_{I^2}\) such that \(a_{I^2} = a_{I^2}1_{I^2}\), \(|a_{I^2}| \le 1\) and \(\int a_{I^2}=0\), and showing that

$$\begin{aligned} \begin{aligned}&|\langle T( h_{I^1\dot{+}m_1}^0\otimes 1, \ldots , h_{I^1 \dot{+}m_{n-1}}^0\otimes 1, h_{I^1}\otimes 1), h_{(I^1\dot{+}m_{n+1})}^0 \otimes a_{I^2} \rangle | \\&\quad \lesssim \omega _1(2^{-k}) \frac{|I^1|^{(n+1)/2}}{|K^1|^n} |I^2|. \end{aligned} \end{aligned}$$
(3.32)

For a suitably large constant C (so that we can use the continuity assumption of the kernel below) we split the pairing as

$$\begin{aligned} \begin{aligned}&\langle T( h_{I^1\dot{+}m_1}^0\otimes 1_{(CI^2)^c}, h_{I^1\dot{+}m_2}^0\otimes 1, \ldots , h_{I^1 \dot{+}m_{n-1}}^0\otimes 1, h_{I^1}\otimes 1), h_{(I^1\dot{+}m_{n+1})}^0 \otimes a_{I^2} \rangle \\&\quad +\langle T( h_{I^1\dot{+}m_1}^0\otimes 1_{CI^2}, h_{I^1\dot{+}m_2}^0\otimes 1, \ldots , h_{I^1 \dot{+}m_{n-1}}^0\otimes 1, h_{I^1}\otimes 1), h_{(I^1\dot{+}m_{n+1})}^0 \otimes a_{I^2} \rangle .\nonumber \end{aligned}\\ \end{aligned}$$
(3.33)

Let us show that the first term in (3.33) is dominated by \(\omega _1(2^{-k}) |I^1|^{(n+1)/2}|I^2|/|K^1|^n\). We have two cases. The case that \(k \sim 1\) is handled with the mixed size and continuity assumption of K. The case that k is large is handled with the continuity assumption of K. We show the details for the case \(k \sim 1\). The other case is done similarly (see also the paragraph containing (3.28)).

We assume that \(k \sim 1\). Since \(a_{I^2}\) has zero integral the pairing that we are estimating equals (by definition)

$$\begin{aligned} \begin{aligned}&\int \nolimits _ {{\mathbb {R}}^{(n+1)d}} \Big (K(x_{n+1},x_1, \dots , x_n)-K((x^1_{n+1},c_{I^2}),x_1, \dots , x_n)\Big ) \\&\quad \times \prod _{j=1}^{n-1} h_{I^1\dot{+}m_j}^0(x_j^1)h_{I^1}(x_n^1) h_{(I^1\dot{+}m_{n+1})}^0(x_{n+1}^1)1_{(CI^2)^c}(x_1^2) a_{I^2}(x_{n+1}^2) \,\mathrm {d}x. \end{aligned} \end{aligned}$$

The mixed size and continuity property of K implies that the absolute value of the last integral is dominated by

$$\begin{aligned} \begin{aligned}&\int \nolimits _{{\mathbb {R}}^{(n+1)d_1}} \frac{1}{\left( \sum _{j=1}^n |x_{n+1}^1-x_j^1|\right) ^{nd_1}} \prod _{j=1}^{n+1} h_{I^1 \dot{+} m_j^1}^0(x_j^1) \,\mathrm {d}x^1 \\&\quad \times \int \nolimits _{{\mathbb {R}}^{(n+1)d_2}} \omega _2\left( \frac{|x_{n+1}^2-c_{I^2}|}{\sum _{j=1}^n |c_{I^2}-x_j^2|} \right) \frac{1}{\left( \sum _{j=1}^n |c_{I^2}-x_j^2|\right) ^{nd_2}} 1_{(CI^2)^c}(x_1^2)1_{I^2}(x_{n+1}^2) \,\mathrm {d}x^2. \end{aligned} \end{aligned}$$

The integral related to \({\mathbb {R}}^{d_1}\) is dominated by \(|I^1|^{-(n-1)/2}\).

Consider the integral related to \({\mathbb {R}}^{d_2}\). By first estimating that

$$\begin{aligned} \omega _2\left( \frac{|x_{n+1}^2-c_{I^2}|}{\sum _{j=1}^n |c_{I^2}-x_j^2|} \right) \le \omega _2\left( \frac{|x_{n+1}^2-c_{I^2}|}{|c_{I^2}-x_1^2|} \right) \end{aligned}$$

and then integrating out the variables \(x_2^2, \dots , x_n^2\) (using \(\int _{{\mathbb {R}}^{(n-1)d_2}} \big (\sum _{j=1}^n |c_{I^2}-x_j^2|\big )^{-nd_2} \,\mathrm {d}x_2^2 \cdots \mathrm {d}x_n^2 \lesssim |c_{I^2}-x_1^2|^{-d_2}\)), we see that the integral over \({\mathbb {R}}^{(n+1)d_2}\) is dominated by

$$\begin{aligned} \begin{aligned}&\int _{I^2} \int _{(CI^2)^c} \omega _2\left( \frac{|x_{n+1}^2-c_{I^2}|}{|c_{I^2}-x_1^2|} \right) \frac{1}{|c_{I^2}-x_1^2|^{d_2}} \,\mathrm {d}x_1^2 \,\mathrm {d}x_{n+1}^2 \\&\quad \lesssim |I^2| \int _{(CI^2)^c} \omega _2\left( \frac{\ell (I^2)}{|c_{I^2}-x_1^2|} \right) \frac{1}{|c_{I^2}-x_1^2|^{d_2}} \,\mathrm {d}x_1^2 \lesssim |I^2| \sum _{k=0}^\infty \omega _2(2^{-k}) \lesssim |I^2|. \end{aligned} \end{aligned}$$

Here the second-to-last estimate follows by decomposing \((CI^2)^c\) into the annuli \(\{x_1^2:\, 2^{k}C\ell (I^2) \le |c_{I^2}-x_1^2| < 2^{k+1}C\ell (I^2)\}\), \(k \ge 0\), and using the monotonicity of \(\omega _2\), and the last one by comparing the sum \(\sum _{k=0}^\infty \omega _2(2^{-k})\) with the Dini integral \(\int _0^1 \omega _2(t) \frac{\mathrm {d}t}{t} < \infty \). In conclusion, we showed that the first term in (3.33) is dominated by \(|I^1|^{-(n-1)/2}|I^2|\), which is the right estimate in the case \(k \sim 1\).

We turn to consider the second term in (3.33). We again split it into two by writing \(1=1_{(CI^2)^c}+1_{CI^2}\) in the second slot. The part with \(1_{(CI^2)^c}\) is estimated in the same way as above and then one continues with the part related to \(1_{CI^2}\). This is repeated until we are only left with the term

$$\begin{aligned} \langle T( h_{I^1\dot{+}m_1}^0\otimes 1_{CI^2}, \ldots , h_{I^1 \dot{+}m_{n-1}}^0\otimes 1_{CI^2}, h_{I^1}\otimes 1_{CI^2}), h_{(I^1\dot{+}m_{n+1})}^0 \otimes a_{I^2} \rangle .\nonumber \\ \end{aligned}$$
(3.34)

The estimate for this uses the partial kernel representations of T. Again, we have the two cases that either \(k \sim 1\) or k is large. These are handled in the same way using either the size or the continuity of the partial kernels. We consider explicitly the case that k is large. Using the zero integral of \(h_{I^1}\) we have that the above pairing equals

$$\begin{aligned} \begin{aligned}&\int _{{\mathbb {R}}^{(n+1)d_1}} \Big (K_{1_{CI^2}, \dots , 1_{CI^2}, a_{I^2}}(x_{n+1}^1,x_1^1, \dots , x_n^1) -K_{1_{CI^2}, \dots , 1_{CI^2}, a_{I^2}}(x_{n+1}^1,x_1^1, \dots , c_{I^1})\Big ) \\&\quad \times \prod _{j=1}^{n-1} h^0_{I^1\dot{+}m_j}(x^1_j) h_{I^1}(x_n^1) h^0_{I^1\dot{+}m_{n+1}}(x^1_{n+1}) \,\mathrm {d}x^1. \end{aligned} \end{aligned}$$

Taking absolute values and using the continuity of the partial kernel leads to

$$\begin{aligned} \begin{aligned}&C(1_{CI^2},\dots ,1_{CI^2},a_{I^2}) \int _{{\mathbb {R}}^{(n+1)d_1}} \omega _1\left( \frac{|x_n^1-c_{I^1}|}{\sum _{j=1}^n|x_{n+1}^1-x_j^1|}\right) \\&\quad \times \frac{1}{\left( \sum _{j=1}^n|x_{n+1}^1-x_j^1|\right) ^{nd_1}} \prod _{j=1}^{n+1} h^0_{I^1\dot{+}m_j}(x^1_j) \,\mathrm {d}x^1. \end{aligned} \end{aligned}$$

By assumption there holds that \(C(1_{CI^2},\dots ,1_{CI^2},a_{I^2}) \lesssim |I^2|\) and the integral is dominated by \(\omega _1(2^{-k}) |I^1|^{(n+1)/2}/|K^1|^n\). This concludes the proof of (3.32) and also finishes our treatment of \({\mathbb {E}}_\sigma \Sigma _\sigma ^2\).

The Full Paraproduct \(\Sigma _\sigma ^4\). Recall that

$$\begin{aligned} \Sigma _\sigma ^4&= \sum _{ \begin{array}{c} R_1, \ldots , R_{n+1} \\ \ell (R_1) = \cdots = \ell (R_{n+1}) \end{array}} \langle T(1_{R_1}, \ldots , 1_{R_{n-1}}, h_{I_n^1} \otimes 1_{I_n^2}), 1_{I_{n+1}^1} \otimes h_{I_{n+1}^2} \rangle \prod _{j=1}^{n-1} \langle f_j \rangle _{I_n^1 \times I_{n+1}^2} \\&\quad \times \Big \langle f_n, h_{I_n^1} \otimes \frac{1_{I_{n+1}^2}}{|I_{n+1}^2|} \Big \rangle \Big \langle f_{n+1}, \frac{1_{I_{n}^1}}{|I_{n}^1|} \otimes h_{I_{n+1}^2} \Big \rangle , \end{aligned}$$

which equals

$$\begin{aligned}&\sum _{R = K^1 \times K^2} \langle T(1, \ldots , 1, h_{K^1} \otimes 1), 1 \otimes h_{K^2} \rangle \\&\qquad \prod _{j=1}^{n-1} \langle f_j \rangle _{R}\Big \langle f_n, h_{K^1} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle \Big \langle f_{n+1}, \frac{1_{K^1}}{|K^1|} \otimes h_{K^2} \Big \rangle . \end{aligned}$$

This is directly a full paraproduct as

$$\begin{aligned} \langle T(1, \ldots , 1, h_{K^1} \otimes 1), 1 \otimes h_{K^2} \rangle = \langle T^{n*}_1(1, \ldots , 1), h_R \rangle , \end{aligned}$$

and so we are done with this term. This completes the treatment of the main terms, and no more full paraproducts will appear.

The Remainder \({\text {Rem}}_{\sigma }\). To finish the proof of the bi-parameter representation theorem it remains to discuss the remainder term \({\text {Rem}}_{\sigma }\). Some of the weak boundedness type assumptions are used here—but there is nothing surprising in how they are used and we do not focus on that. We only explain the structural idea.

An \((n+1)\)-tuple \((I^i_1, \dots , I_{n+1}^i)\) of cubes \(I^i_j \in {\mathcal {D}}_{\sigma _i}\) belongs to \({\mathcal {I}}_{\sigma _i}\) if the following holds: if j is an index such that \(\ell (I^i_j) \le \ell (I^i_k)\) for all k, then there exists at least one index \(k_0 \not = j\) so that \(\ell (I^i_j) = \ell (I^i_{k_0})\). The remainder term can be written as

$$\begin{aligned} {\text {Rem}}_\sigma= & {} \sum _{j_1=1}^{n+1} \sum _{\begin{array}{c} I^1_1, \dots , I^1_{n+1} \\ \ell (I^1_i)>\ell (I^1_{j_1}) \text { for } i \not = j_1 \end{array}} \sum _{(I^2_1, \dots , I^2_{n+1}) \in {\mathcal {I}}_{\sigma _2}} \langle T(\Delta _{R_1}f_1, \ldots , \Delta _{R_n}f_n),\Delta _{R_{n+1}}f_{n+1} \rangle \\&+\sum _{j_2=1}^{n+1}\sum _{\begin{array}{c} I^2_1, \dots , I^2_{n+1} \\ \ell (I^2_i) >\ell (I^2_{j_2}) \text { for } i \not = j_2 \end{array}} \sum _{(I^1_1, \dots , I^1_{n+1}) \in {\mathcal {I}}_{\sigma _1}} \langle T(\Delta _{R_1}f_1, \ldots , \Delta _{R_n}f_n),\Delta _{R_{n+1}}f_{n+1} \rangle \\&+\sum _{\begin{array}{c} (I^1_1, \dots , I^1_{n+1}) \in {\mathcal {I}}_{\sigma _1} \\ (I^2_1, \dots , I^2_{n+1}) \in {\mathcal {I}}_{\sigma _2} \end{array}} \langle T(\Delta _{R_1}f_1, \ldots , \Delta _{R_n}f_n),\Delta _{R_{n+1}}f_{n+1} \rangle , \end{aligned}$$

where as usual \(R_i=I^1_i \times I^2_i\). Let us write this as

$$\begin{aligned} {\text {Rem}}_\sigma =\sum _{j_1=1}^{n+1}{\text {Rem}}_{\sigma ,j_1}^1 +\sum _{j_2=1}^{n+1}{\text {Rem}}_{\sigma ,j_2}^2+{\text {Rem}}_{\sigma }^3. \end{aligned}$$

First, we look at the terms \({\text {Rem}}_{\sigma ,j_1}^1\) and \({\text {Rem}}_{\sigma ,j_2}^2\) which are analogous. Consider for example \({\text {Rem}}_{\sigma ,n+1}^1\). We further divide \({\mathcal {I}}_{\sigma _2}\) into subcollections by specifying the slots where the smallest cubes are. For example, we consider here the part of the sum with the tuples \((I^2_1, \dots , I^2_{n+1})\) such that \(\ell (I^2_i)>\ell (I^2_n)=\ell (I^2_{n+1})\) for all \(i=1, \dots ,n-1\). By collapsing the relevant sums of martingale differences the term we are dealing with can be written as

$$\begin{aligned} \sum _{\begin{array}{c} R_1, \dots , R_{n+1} \\ \ell (R_i) =\ell (R_j) \end{array}} \langle T(E_{R_1}f_1, \dots , E_{R_{n-1}}f_{n-1}, E^1_{I^1_n} \Delta ^2_{I^2_n}f_n), \Delta _{R_{n+1}}f_{n+1} \rangle . \end{aligned}$$
(3.35)

In the first parameter there is only one martingale difference and in the second parameter there are two (in the general case at least two). Thus, the strategy is that we will write this in terms of model operators that have a modified shift or a paraproduct structure in the first parameter and a standard shift structure in the second parameter. We omit the details.

Finally, we consider \({\text {Rem}}_{\sigma }^3\). This is also divided into several cases by specifying the places of the smallest cubes in both parameters. For example, for notational convenience we take the part where \(\ell (I^1_1)=\ell (I^1_{n+1}) < \ell (I^1_i)\) and \(\ell (I^2_1)=\ell (I^2_{n+1}) < \ell (I^2_i)\) for all \(i=2, \dots , n\). Notice that in general the places and the number of the smallest cubes do not need to be the same in both parameters. After collapsing the relevant sums of martingale differences the term we are looking at is

$$\begin{aligned} {\mathbb {E}}_\sigma \sum _{\begin{array}{c} R_1, \dots , R_{n+1} \\ \ell (R_i) =\ell (R_j) \end{array}} \langle T(\Delta _{R_1}f_1, E_{R_2} f_2, \dots , E_{R_{n}}f_{n}), \Delta _{R_{n+1}}f_{n+1} \rangle . \end{aligned}$$
(3.36)

Here we have two (in the general case at least two) martingale differences in each parameter so this will be written in terms of standard bi-parameter n-linear shifts. We omit the details. This completes the proof. \(\square \)

Corollaries. We indicate some corollaries—we start with the most basic unweighted boundedness on the Banach range of exponents.

Proposition 3.37

Let \(p_j \in (1, \infty )\), \(j=1, \dots ,n+1\), be such that \(\sum _{j=1}^{n+1} 1/p_j=1\). Suppose that \(Q_k\) is a modified n-linear bi-parameter shift. Then the estimate

$$\begin{aligned} |\langle Q_k(f_1, \ldots , f_n), f_{n+1} \rangle | \lesssim \sqrt{k_1} \sqrt{k_2} \prod _{j=1}^{n+1} \Vert f_j\Vert _{L^{p_j}} \end{aligned}$$

holds.

Suppose that \((QS)_{k,i}\) is a modified/standard shift (here \(k \in \{1,2, \dots \}\) and \(i=(i_1, \dots , i_{n+1})\)). Then the estimate

$$\begin{aligned} |\langle (QS)_{k,i}(f_1, \ldots , f_n), f_{n+1} \rangle | \lesssim \sqrt{k} \prod _{j=1}^{n+1} \Vert f_j\Vert _{L^{p_j}} \end{aligned}$$

holds.

Proof

We only prove the statement for the operator \(Q_k\). This essentially contains the proof for \((QS)_{k,i}\).

We assume \(Q_k\) has the explicit form

$$\begin{aligned} \langle Q_k(f_1, \ldots , f_n), f_{n+1}\rangle&= \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} a_{K, (R_j)} \left[ \prod _{j=1}^{n} \langle f_j, h_{R_j}^0 \rangle - \prod _{j=1}^{n} \langle f_j, h_{I_{n+1}^1 \times I_j^2}^0 \rangle \right. \\&\left. \quad - \prod _{j=1}^{n} \langle f_j, h_{I_j^1 \times I_{n+1}^2}^0 \rangle + \prod _{j=1}^{n} \langle f_j, h_{R_{n+1}}^0 \rangle \right] \langle f_{n+1}, h_{R_{n+1}} \rangle . \end{aligned}$$

Using the notation (3.13) there holds that

$$\begin{aligned} \prod _{j=1}^{n} \langle f_j, h_{R_j}^0 \rangle =\sum _{m_1,m_2=1}^{n+1} \prod _{j=1}^{n} \langle D^1_{K^1,k_1}(j,m_1)D^2_{K^2,k_2}(j,m_2)f_j, h_{R_j}^0 \rangle . \end{aligned}$$

We do the same decomposition with the other three terms inside the bracket \([ \, \cdot \,]\). This splits \([\, \cdot \, ]\) into a sum over \(m_1,m_2 \in \{1, \dots , n+1\}\). Then, we notice that all the terms in the sum with \(m_1=n+1\) or \(m_2=n+1\) cancel out. Thus, we get a splitting of \(\langle Q_k(f_1, \ldots , f_n), f_{n+1} \rangle \) into a sum over \(m_1,m_2 \in \{1, \dots , n\}\). The terms corresponding to the different pairs \((m_1, m_2)\) are estimated separately.

In what follows—for notational convenience—we will focus on the case \(m_1 = m_2 =: m \in \{1, \ldots , n\}\), and we define \(D^1_{K^1,k_1}(j,m)D^2_{K^2,k_2}(j,m) =: D_{K, k}(j,m)\). The term in the splitting of \(\langle Q_k(f_1, \ldots , f_n), f_{n+1} \rangle \) corresponding to \(m=m_1=m_2\) can be written as the sum

$$\begin{aligned} \sum _{i=1}^4 \langle U_i(f_1, \ldots , f_n), f_{n+1} \rangle , \end{aligned}$$

where

$$\begin{aligned} \langle U_1(f_1, \ldots , f_n), f_{n+1}\rangle = \sum _{K} \sum _{\begin{array}{c} R_1, \ldots , R_{n+1} \\ R_j^{(k)} = K \end{array}} a_{K, (R_j)} \prod _{j=1}^{n} \langle D_{K, k}(j,m)f_j, h_{R_j}^0 \rangle \langle f_{n+1}, h_{R_{n+1}} \rangle , \end{aligned}$$

and \(U_2\), \(U_3\) and \(U_4\) are defined similarly just by replacing \(h^0_{R_j}\), \(j \in \{1, \dots , n\}\), by \(h_{I_{n+1}^1 \times I_j^2}^0\), \(h_{I_{j}^1 \times I_{n+1}^2}^0\) and \(h_{R_{n+1}}^0\), respectively.

With some direct calculations it can be shown that for all \(i \in \{1, \ldots , 4\}\) we have

$$\begin{aligned} |\langle U_i(f_1, \ldots , f_n), f_{n+1}\rangle | \le \int \prod _{\begin{array}{c} j=1 \\ j \ne m \end{array}}^n Mf_j \left( \sum _K |MP_{K, (k_1-1, k_2-1)} f_m|^2 \right) ^{1/2} S_{{\mathcal {D}}} f_{n+1}. \end{aligned}$$
(3.38)

From here the estimate can be finished by Hölder’s inequality, the Fefferman–Stein inequality and square function estimates, see Lemma 2.4. \(\square \)

Next, we look at the modified partial paraproducts. We will use the well-known one-parameter \(H^1\)-\({\text {BMO}}\) duality estimate

$$\begin{aligned} \sum _I |a_I b_I| \lesssim \Vert (a_I) \Vert _{{\text {BMO}}} \left\| \left( \sum _{I} |b_I|^2 \frac{1_I}{|I|} \right) ^{1/2} \right\| _{L^1}, \end{aligned}$$
(3.39)

where the cubes I are in some dyadic grid.

Proposition 3.40

Let \(p_j \in (1, \infty )\), \(j=1, \dots ,n+1\), be such that \(\sum _{j=1}^{n+1} 1/p_j=1\). Suppose \((Q\pi )_k\) is a modified n-linear partial paraproduct. Then the estimate

$$\begin{aligned} |\langle (Q\pi )_k(f_1, \ldots , f_n), f_{n+1} \rangle | \lesssim \sqrt{k} \prod _{j=1}^{n+1} \Vert f_j\Vert _{L^{p_j}} \end{aligned}$$

holds.

Proof

We assume that \(\langle (Q\pi )_k(f_1, \ldots , f_n), f_{n+1}\rangle \) has the form

$$\begin{aligned} \begin{aligned} \sum _{K} \sum _{\begin{array}{c} I^1_1, \ldots , I^1_{n+1} \\ (I^1_j)^{(k)} = K^1 \end{array}} a_{K, (I^1_j)} \left[ \prod _{j=1}^{n} \Big \langle f_j, h^0_{I^1_j} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle - \prod _{j=1}^{n} \Big \langle f_j, h^0_{I_{n+1}^1} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle \right] \langle f_{n+1}, h_{I^1_{n+1} \times K^2} \rangle . \end{aligned} \end{aligned}$$

We decompose

$$\begin{aligned} \prod _{j=1}^{n} \Big \langle f_j, h^0_{I^1_j} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle =\sum _{m=1}^{n+1} \prod _{j=1}^{n} \Big \langle D^1_{K^1, k_1}(j,m) f_j, h^0_{I^1_j} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle \end{aligned}$$

and similarly with the other term inside the bracket \([ \, \cdot \, ]\). Notice that the terms with \(m=n+1\) cancel out. Thus, we get a decomposition of \(\langle (Q\pi )_k(f_1, \ldots , f_n), f_{n+1}\rangle \) into a sum over \(m \in \{1, \dots , n\}\). The terms with different m are estimated separately.

Fix one m. The term from the decomposition of \(\langle (Q\pi )_k(f_1, \ldots , f_n), f_{n+1}\rangle \) related to m is

$$\begin{aligned} \sum _{i=1}^2\langle U_i(f_1, \ldots , f_n), f_{n+1}\rangle , \end{aligned}$$

where \(\langle U_1(f_1, \ldots , f_n), f_{n+1}\rangle \) equals

$$\begin{aligned} \sum _{K} \sum _{\begin{array}{c} I^1_1, \ldots , I^1_{n+1} \\ (I^1_j)^{(k)} = K^1 \end{array}} a_{K, (I^1_j)} \prod _{j=1}^{n} \Big \langle D^1_{K^1, k_1}(j,m) f_j, h^0_{I^1_j} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle \langle f_{n+1}, h_{I^1_{n+1} \times K^2} \rangle \qquad \end{aligned}$$
(3.41)

and \(\langle U_2(f_1, \ldots , f_n), f_{n+1}\rangle \) is defined similarly just by replacing \(h^0_{I^1_j}\), \(j=1, \dots , n\), with \(h^0_{I^1_{n+1}}\).

We consider \(U_1\) first. From the one-parameter \(H^1\)-\({\text {BMO}}\) duality estimate (3.39) we have that, with fixed \(K^1\) and \(I^1_1, \dots , I^1_{n+1}\), the sum over \(K^2\) of the absolute value of the summand in (3.41) is dominated by

$$\begin{aligned} \begin{aligned}&\frac{|I^1_{n+1}|^{(n+1)/2}}{|K^1|^n}\int _{{\mathbb {R}}^{d_2}} \left( \sum _{K^2} \Big |\prod _{j=1}^{n} \Big \langle D^1_{K^1, k_1}(j,m) f_j, h^0_{I^1_j} \otimes \frac{1_{K^2}}{|K^2|} \Big \rangle \langle f_{n+1}, h_{I^1_{n+1}\times K^2} \rangle \Big |^2 \frac{1_{K^2}}{|K^2|} \right) ^{1/2} \\&\quad \le \frac{|I^1_{n+1}|^{(n+1)/2}}{|K^1|^n} \int _{{\mathbb {R}}^{d_2}} \prod _{j=1}^{n} \langle M^2 D^1_{K^1, k_1}(j,m) f_j, h^0_{I^1_j} \rangle _1 \langle S^2_{{\mathcal {D}}_2} \Delta _{K^1,k_1} f_{n+1} , h^0_{I^1_{n+1}} \rangle _1. \end{aligned} \end{aligned}$$

The sum of this over \(K^1\) and \(I^1_1, \dots , I^1_{n+1}\) such that \((I^1_j)^{(k)}=K^1\) is less than

$$\begin{aligned} \int _{{\mathbb {R}}^d} \prod _{\begin{array}{c} j=1 \\ j \not =m \end{array}}^n M^1 M^2 f_j \left( \sum _{K^1} (M^1 M^2 P^1_{K^1, k_1-1}f_m)^2 \right) ^{1/2} \left( \sum _{K^1} (S^2_{{\mathcal {D}}^2} \Delta _{K^1,k_1} f_{n+1})^2\right) ^{1/2}.\nonumber \\ \end{aligned}$$
(3.42)

Notice that the square function related to \(f_{n+1}\) is just the bi-parameter square function \(S_{\mathcal {D}}\). To finish the estimate it remains to use the Fefferman–Stein inequality and square function estimates, see Lemma 2.4.

The second term \(|\langle U_2(f_1, \ldots , f_n), f_{n+1}\rangle |\) satisfies the same upper bound (3.42), and can therefore be estimated in the same way. The proof is concluded. \(\square \)

The above, together with known estimates for standard operators, directly leads to Banach range boundedness of n-linear bi-parameter \((\omega _1, \omega _2)\)-CZOs with \(\omega _i \in {\text {Dini}}_{1/2}\). We do not push this further in this paper. For state-of-the-art estimates with genuinely multilinear weights (in the full multilinear range) see [31]. In [31] we recorded some of these estimates with \({\text {Dini}}_{1}\) regularity using the above representation theorem and the decomposition of the modified operators in terms of standard operators.

We are unable to perform the estimates of [31] with the regularity \({\text {Dini}}_{\frac{1}{2}}\). However, the linear case is special: the weighted estimates for linear modified model operators with a bound depending on the square root of the complexity are easy. Notice that in principle we have already done all the necessary work. For example, if we want to estimate \(\Vert Q_k f \Vert _{L^p(w)}\), we study the unweighted pairings \(\langle Q_k f,g \rangle \). Then, we proceed as in the linear case of Proposition 3.37. Depending on the form of the shift this leads us to terms corresponding to (3.38) such as

$$\begin{aligned} \int \left( \sum _K |MP_{K, (k_1-1, k_2-1)} f|^2 \right) ^{1/2} S_{{\mathcal {D}}} g. \end{aligned}$$

By Hölder’s inequality this is less than

$$\begin{aligned}&\left\| \left( \sum _K |MP_{K, (k_1-1, k_2-1)} f|^2 \right) ^{1/2} \right\| _{L^p(w)} \Vert S_{{\mathcal {D}}} g \Vert _{L^{p'}(w^{1-p'})}\\&\quad \lesssim \sqrt{k_1} \sqrt{k_2} \Vert f\Vert _{L^p(w)} \Vert g \Vert _{L^{p'}(w^{1-p'})}. \end{aligned}$$

Proposition 3.43

For every \(p \in (1, \infty )\) and bi-parameter \(A_p\) weight w we have

$$\begin{aligned} \Vert Q_{k} f\Vert _{L^p(w)} \lesssim \sqrt{k_1}\sqrt{k_2} \Vert f\Vert _{L^p(w)}. \end{aligned}$$

For completeness, we record the corresponding result for CZOs. Again, for multilinear weighted estimates with the optimal weight classes see [31].

Corollary 3.44

Let \(p_j \in (1, \infty )\), \(j=1, \dots ,n+1\), be such that \(\sum _{j=1}^{n+1} 1/p_j=1\). Suppose that T is an n-linear bi-parameter \((\omega _1, \omega _2)\)-CZO, where \(\omega _i \in {\text {Dini}}_{1/2}\). Then we have the Banach range estimate

$$\begin{aligned} |\langle T(f_1, \ldots , f_n), f_{n+1} \rangle | \lesssim \prod _{j=1}^{n+1} \Vert f_j\Vert _{L^{p_j}}. \end{aligned}$$
(3.45)

In the linear case \(n=1\) we have the weighted estimate

$$\begin{aligned} \Vert Tf\Vert _{L^p(w)} \lesssim \Vert f\Vert _{L^p(w)} \end{aligned}$$
(3.46)

whenever \(p \in (1,\infty )\) and \(w \in A_p\) is a bi-parameter weight.

4 Commutator Estimates

The basic form of a commutator is \([b,T]:f \mapsto bTf - T(bf)\). We are interested in various iterated versions in the multi-parameter setting and with mild kernel regularity.

For a bi-parameter weight \(w \in A_2({\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2})\) and a locally integrable function b we define the weighted product \({\text {BMO}}\) norm

$$\begin{aligned} \Vert b\Vert _{{\text {BMO}}_{\text {prod}}(w)} = \sup _{{\mathcal {D}}} \sup _{\Omega }\left( \frac{1}{w(\Omega )}\sum _{\begin{array}{c} R\in {\mathcal {D}}\\ R\subset \Omega \end{array}} \frac{|\langle b, h_R\rangle |^2}{\big \langle w \big \rangle _R}\right) ^{\frac{1}{2}}, \end{aligned}$$
(4.1)

where the supremum is over all dyadic grids \({\mathcal {D}}^i\) on \({\mathbb {R}}^{d_i}\) and \({\mathcal {D}}= {\mathcal {D}}^1 \times {\mathcal {D}}^2\), and over all open sets \(\Omega \subset {\mathbb {R}}^d := {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) for which \(0< w(\Omega ) < \infty \). The following theorem, which is the two-weight Bloom version of [9], was proved in [29] with \(\omega _i(t) = t^{\gamma _i}\).

Theorem 4.2

Suppose that \(T_i\) is a one-parameter \(\omega _i\)-CZO, where \(\omega _i \in {\text {Dini}}_{3/2}\). Let \(b :{\mathbb {R}}^d \rightarrow {\mathbb {C}}\), \(p \in (1, \infty )\), \(\mu , \lambda \in A_p({\mathbb {R}}^d)\) be bi-parameter weights and \(\nu = \mu ^{1/p} \lambda ^{-1/p} \in A_2({\mathbb {R}}^d)\) be the associated bi-parameter Bloom weight. Then we have

$$\begin{aligned} \Vert [T_1, [T_2, b]] \Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim \Vert b\Vert _{{\text {BMO}}_{\text {prod}}(\nu )}. \end{aligned}$$

Proof

We may normalize \(\Vert b\Vert _{{\text {BMO}}_{\text {prod}}(\nu )} = 1\). We need to bound, for example, \(\Vert [Q_{k_1}, [Q_{k_2}, b]]f\Vert _{L^p(\lambda )}\) for one-parameter modified shifts (whose definition is analogous to the bi-parameter one). It seems non-trivial to fully exploit the operators \(Q_{k}\) here, and we content ourselves with splitting the operators into standard shifts and bounding

$$\begin{aligned} \sum _{j_1 = 0}^{k_1} \sum _{j_2 = 0}^{k_2} \Vert [S_{k_1, j_1}, [S_{k_2, j_2}, b]]f\Vert _{L^p(\lambda )} \end{aligned}$$

and other similar terms, where \(S_{k_i, j_i}\) is a linear one-parameter shift on \({\mathbb {R}}^{d_i}\) of complexity \((k_i, j_i)\). Reaching \({\text {Dini}}_{1}\) would require replacing this step with a sharper estimate.

On page 11 of [29] it is recorded that

$$\begin{aligned} \Vert [S_{u_1, v_1}, [S_{u_2, v_2}, b]]f\Vert _{L^p(\lambda )} \lesssim (1+\max (u_1, v_1))(1+\max (u_2, v_2)) \Vert f\Vert _{L^p(\mu )}. \end{aligned}$$

Interestingly, this part of the argument can be improved: there actually holds that

$$\begin{aligned} \Vert [S_{u_1, v_1}, [S_{u_2, v_2}, b]]f\Vert _{L^p(\lambda )} \lesssim (1+\max (u_1, v_1))^{1/2}(1+\max (u_2, v_2))^{1/2} \Vert f\Vert _{L^p(\mu )}. \end{aligned}$$
(4.3)

We will get back to this after completing the proof. Since \(\max (k_i, j_i) = k_i\) when \(j_i \le k_i\), summing the estimate (4.3) over \(j_1\) and \(j_2\) gives

$$\begin{aligned} \sum _{j_1 = 0}^{k_1} \sum _{j_2 = 0}^{k_2} \Vert [S_{k_1, j_1}, [S_{k_2, j_2}, b]]f\Vert _{L^p(\lambda )} \lesssim (1+k_1)^{3/2} (1+k_2)^{3/2} \Vert f\Vert _{L^p(\mu )}. \end{aligned}$$

Handling the other terms of the shift expansion of \([Q_{k_1}, [Q_{k_2}, b]]\) similarly, we get

$$\begin{aligned} \Vert [Q_{k_1}, [Q_{k_2}, b]]f\Vert _{L^p(\lambda )} \lesssim (1+k_1)^{3/2} (1+k_2)^{3/2} \Vert f\Vert _{L^p(\mu )}. \end{aligned}$$

Controlling commutators like \([Q_{k_1}, [\pi , b]]\) similarly we get the claim.

We return to (4.3) now. Decompositions are very involved in the bi-commutator case, and we prefer to give the idea of the improvement (4.3) by studying the simpler one-parameter situation \([b, S_{i,j}]\), where

$$\begin{aligned} S_{i,j} = \sum _{K} \sum _{I^{(i)} = J^{(j)} = K} a_{IJK} \langle f, h_I \rangle h_J \end{aligned}$$

is a one-parameter shift on \({\mathbb {R}}^d\) and \(b \in {\text {BMO}}(\nu )\);

$$\begin{aligned} \Vert b\Vert _{{\text {BMO}}(\nu )}&:= \sup _{I \subset {\mathbb {R}}^d \text { cube}} \frac{1}{\nu (I)} \int _I |b - \langle b \rangle _I| \sim \sup _{{\mathcal {D}}} \sup _{I_0 \in {\mathcal {D}}}\left( \frac{1}{\nu (I_0)}\sum _{\begin{array}{c} I \in {\mathcal {D}}\\ I \subset I_0 \end{array}} \frac{|\langle b, h_I\rangle |^2}{\big \langle \nu \big \rangle _I}\right) ^{\frac{1}{2}} < \infty . \end{aligned}$$

Here we only need the expression on the right-hand side, which is the analogue of the bi-parameter definition (4.1). However, it is customary to define things as on the left-hand side in this one-parameter situation. The equivalence follows from the weighted John–Nirenberg inequality [34]

$$\begin{aligned} \sup _{I \subset {\mathbb {R}}^d \text { cube}} \frac{1}{\nu (I)} \int _I |b - \langle b \rangle _I| \sim \sup _{I \subset {\mathbb {R}}^d \text { cube}} \left( \frac{1}{\nu (I)} \int _I |b - \langle b \rangle _I|^2 \nu ^{-1} \right) ^{1/2}, \qquad \nu \in A_2. \end{aligned}$$

Of course, one-parameter commutators \([b, T]\) can be handled even with \({\text {Dini}}_{0}\), but e.g. sparse domination proofs [25, 26] are restricted to the one-parameter setting, unlike these decompositions. To get started, we define the one-parameter paraproducts (with some implicit dyadic grid)

$$\begin{aligned} A_1(b,f) = \sum _{I} \Delta _{I} b \Delta _{I} f, \, \, A_2(b,f) = \sum _{I} \Delta _{I} b E_{I} f \,\, \text { and } \,\, A_3(b, f) = \sum _{I} E_{I} b \Delta _{I} f. \end{aligned}$$

By writing \(b = \sum _{I} \Delta _{I} b\) and \(f = \sum _{J} \Delta _{J} f\), and collapsing sums such as \(1_I \sum _{J :I \subsetneq J} \Delta _{J} f = E_{I} f\), we formally have

$$\begin{aligned} bf = \sum _{I} \Delta _{I} b \Delta _{I} f + \sum _{I \subsetneq J} \Delta _{I} b \Delta _{J} f + \sum _{J \subsetneq I } \Delta _{I} b \Delta _{J} f = \sum _{k=1}^3 A_k(b,f). \end{aligned}$$
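
This collapsing is easy to verify concretely. The following minimal numerical sketch (our illustration; all helper names are ours) checks the identity on a finite dyadic grid on \([0,1)\), where the telescoping leaves behind the harmless global average term \(\langle b \rangle \langle f \rangle \) that is absent in the formal computation on \({\mathbb {R}}^d\):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8                # dyadic depth; b and f live on a grid of 2^m points in [0, 1)
n = 1 << m
b, f = rng.standard_normal(n), rng.standard_normal(n)

def E(v, level):
    """Conditional expectation of v onto the dyadic intervals of length 2^{-level}."""
    return np.repeat(v.reshape(1 << level, -1).mean(axis=1), n >> level)

def D(v, level):
    """Sum of the martingale differences Delta_I v over the I with ell(I) = 2^{-level}."""
    return E(v, level + 1) - E(v, level)

A1 = sum(D(b, l) * D(f, l) for l in range(m))   # A_1(b, f)
A2 = sum(D(b, l) * E(f, l) for l in range(m))   # A_2(b, f), with E_I f = <f>_I 1_I
A3 = sum(E(b, l) * D(f, l) for l in range(m))   # A_3(b, f)

# Levelwise, D_l(b)D_l(f) + D_l(b)E_l(f) + E_l(b)D_l(f)
#   = E_{l+1}(b)E_{l+1}(f) - E_l(b)E_l(f),
# so the sum telescopes to b f - <b><f> on [0, 1).
assert np.allclose(b * f, A1 + A2 + A3 + b.mean() * f.mean())
```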

We now decompose the commutator as follows

$$\begin{aligned} {[}b, S_{i,j}]f&= b S_{i,j} f - S_{i,j}(bf) \\&= \sum _{k=1}^2 A_k(b,S_{i,j} f) - \sum _{k=1}^2 S_{i,j}(A_k(b,f)) + [A_3(b,S_{i,j} f) - S_{i,j}(A_3(b,f))]. \end{aligned}$$

We have the well-known fact that \(\Vert A_k(b, f)\Vert _{L^p(\lambda )} \lesssim \Vert b\Vert _{{\text {BMO}}(\nu )} \Vert f\Vert _{L^p(\mu )}\) for \(k=1,2\)—this can be seen by using the weighted \(H^1\)-\({\text {BMO}}\) duality [37] (with \(a_I = \langle b, h_I\rangle \))

$$\begin{aligned} \sum _I |a_I| |b_I| \lesssim \Vert (a_I)\Vert _{{\text {BMO}}(\nu )} \Bigg \Vert \Bigg ( \sum _I |b_I|^2 \frac{1_I}{|I|} \Bigg )^{1/2} \Bigg \Vert _{L^1(\nu )}, \end{aligned}$$
(4.4)

where

$$\begin{aligned} \Vert (a_I)\Vert _{{\text {BMO}}(\nu )} = \sup _{I_0 \in {\mathcal {D}}}\left( \frac{1}{\nu (I_0)}\sum _{\begin{array}{c} I \in {\mathcal {D}}\\ I \subset I_0 \end{array}} \frac{|a_I|^2}{\big \langle \nu \big \rangle _I}\right) ^{\frac{1}{2}}. \end{aligned}$$

Combining this with the well-known estimate \(\Vert S_{i,j} f\Vert _{L^p(w)} \lesssim \Vert f\Vert _{L^p(w)}\) for all \(w \in A_p\) it follows that

$$\begin{aligned} \left\| \sum _{k=1}^2 A_k(b,S_{i,j} f) - \sum _{k=1}^2 S_{i,j}(A_k(b,f)) \right\| _{L^p(\lambda )} \lesssim \Vert b\Vert _{{\text {BMO}}(\nu )} \Vert f\Vert _{L^p(\mu )}. \end{aligned}$$

The complexity dependence comes from the remaining term

$$\begin{aligned} A_3(b,S_{i,j} f) - S_{i,j}(A_3(b,f)) = \sum _{K} \sum _{I^{(i)} = J^{(j)} = K} [\langle b \rangle _J - \langle b \rangle _I] a_{IJK} \langle f, h_I \rangle h_J. \end{aligned}$$
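
This identity can also be checked numerically. The following self-contained sketch (our illustration only: it discretizes everything on \([0,1)\), uses random coefficients with the normalization \(|a_{IJK}| \le |I|^{1/2}|J|^{1/2}/|K|\), and all helper names are ours) verifies it for a random shift:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 7                                 # resolution: 2^m grid points on [0, 1)
n = 1 << m
dx = 1.0 / n

def E(v, level):
    """Conditional expectation onto dyadic intervals of length 2^{-level}."""
    return np.repeat(v.reshape(1 << level, -1).mean(axis=1), n >> level)

def haar(level, pos):
    """L^2-normalized Haar function of I = [pos 2^{-level}, (pos + 1) 2^{-level})."""
    h = np.zeros(n)
    half, start = n >> (level + 1), pos * (n >> level)
    h[start:start + half] = 2.0 ** (level / 2)
    h[start + half:start + 2 * half] = -(2.0 ** (level / 2))
    return h

def ip(u, v):
    return (u * v).sum() * dx         # the pairing <u, v> on the grid

def avg(v, level, pos):
    return v[pos * (n >> level):(pos + 1) * (n >> level)].mean()

i, j = 2, 1                           # the complexity (i, j) of the shift
coeffs = []                           # tuples (level_I, pos_I, level_J, pos_J, a_IJK)
for lK in range(m - max(i, j)):       # keep the children of every J on the grid
    norm = 2.0 ** (-(lK + i) / 2 - (lK + j) / 2 + lK)   # |I|^{1/2} |J|^{1/2} / |K|
    for pK in range(1 << lK):
        for pI in range(pK << i, (pK + 1) << i):        # the I with I^{(i)} = K
            for pJ in range(pK << j, (pK + 1) << j):    # the J with J^{(j)} = K
                coeffs.append((lK + i, pI, lK + j, pJ, norm * rng.uniform(-1, 1)))

def S(f):                             # the dyadic shift S_{i,j}
    out = np.zeros(n)
    for lI, pI, lJ, pJ, a in coeffs:
        out += a * ip(f, haar(lI, pI)) * haar(lJ, pJ)
    return out

def A3(b, f):                         # A_3(b, f) = sum_I E_I b Delta_I f
    return sum(E(b, l) * (E(f, l + 1) - E(f, l)) for l in range(m))

b, f = rng.standard_normal(n), rng.standard_normal(n)
lhs = A3(b, S(f)) - S(A3(b, f))
rhs = sum((avg(b, lJ, pJ) - avg(b, lI, pI)) * a * ip(f, haar(lI, pI)) * haar(lJ, pJ)
          for lI, pI, lJ, pJ, a in coeffs)
print(np.allclose(lhs, rhs))          # True
```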

There are many ways to bound this, but the following way based on the \(H^1\)-\({\text {BMO}}\) duality—and executed in the particular way that we do below—gives the best dependence that we are aware of:

$$\begin{aligned} \Vert A_3(b,S_{i,j} f) - S_{i,j}(A_3(b,f))\Vert _{L^p(\lambda )} \lesssim (1+\max (i,j))^{1/2} \Vert b\Vert _{{\text {BMO}}(\nu )} \Vert f\Vert _{L^p(\mu )}. \end{aligned}$$

We write

$$\begin{aligned} \langle b \rangle _J - \langle b \rangle _I = [\langle b \rangle _J - \langle b \rangle _K] - [\langle b \rangle _I - \langle b \rangle _K], \end{aligned}$$

where we further write

$$\begin{aligned} \langle b \rangle _J - \langle b \rangle _K = \sum _{J \subsetneq L \subset K} \langle \Delta _L b \rangle _J = \sum _{J \subsetneq L \subset K} \langle b, h_L \rangle \langle h_L \rangle _J, \end{aligned}$$

and similarly for \(\langle b \rangle _I - \langle b \rangle _K\). We dualize and e.g. look at

$$\begin{aligned}&\sum _{K} \sum _{I^{(i)} = J^{(j)} = K} \sum _{J \subsetneq L \subset K} |\langle b, h_L \rangle | \langle |h_L| \rangle _J |a_{IJK}| |\langle f, h_I \rangle | | \langle g, h_J\rangle | \\&\quad = \sum _{K} \sum _{\begin{array}{c} L \subset K \\ \ell (L)> 2^{-j}\ell (K) \end{array}} |\langle b, h_L \rangle | |L|^{-1/2} \sum _{\begin{array}{c} I^{(i)} = J^{(j)} = K \\ J \subset L \end{array}} |a_{IJK}| |\langle f, h_I \rangle | | \langle g, h_J\rangle | \\&\quad \lesssim \Vert b\Vert _{{\text {BMO}}(\nu )} \sum _{K} \int \left( \sum _{\begin{array}{c} L \subset K \\ \ell (L) > 2^{-j}\ell (K) \end{array}} \frac{1_L}{|L|^2} \left[ \sum _{\begin{array}{c} I^{(i)} = J^{(j)} = K \\ J \subset L \end{array}} |a_{IJK}| |\langle f, h_I \rangle | | \langle g, h_J\rangle | \right] ^2 \right) ^{1/2} \nu , \end{aligned}$$

where we used the weighted \(H^1\)-\({\text {BMO}}\) duality. Here

$$\begin{aligned} \sum _{\begin{array}{c} I^{(i)} = J^{(j)} = K \\ J \subset L \end{array}} |a_{IJK}| |\langle f, h_I \rangle | | \langle g, h_J\rangle | \le \frac{1}{|K|} \int _K |\Delta _{K, i} f| \int _L |\Delta _{K,j}g|, \end{aligned}$$

and we can bound

$$\begin{aligned}&\sum _{K} \int \left( \sum _{\begin{array}{c} L \subset K \\ \ell (L) > 2^{-j}\ell (K) \end{array}} 1_L \langle |\Delta _{K, i} f| \rangle _K^2 \langle |\Delta _{K, j} g| \rangle _L^2 \right) ^{1/2} \nu \\&\quad \le j^{1/2} \sum _{K} \int (M \Delta _{K, i} f) (M \Delta _{K, j} g) \nu \\&\quad \le j^{1/2} \left\| \left( \sum _K |M \Delta _{K, i} f|^2 \right) ^{1/2} \right\| _{L^p(\mu )} \left\| \left( \sum _K |M \Delta _{K, j} g|^2 \right) ^{1/2} \right\| _{L^{p'}(\lambda ^{1-p'})} \\&\quad \lesssim j^{1/2} \Vert f\Vert _{L^p(\mu )} \Vert g\Vert _{L^{p'}(\lambda ^{1-p'})}. \end{aligned}$$

We are done with the one-parameter case—the desired bi-parameter case can now be done completely similarly by tweaking the proof in [29] using the above idea. \(\square \)

Remark 4.5

The previous way to use the \(H^1\)-\({\text {BMO}}\) duality was to look at

$$\begin{aligned} \sum _{K} \sum _{\begin{array}{c} L \subset K \\ \ell (L) = 2^{-l}\ell (K) \end{array}} |\langle b, h_L \rangle | |L|^{-1/2} \sum _{\begin{array}{c} I^{(i)} = J^{(j)} = K \\ J \subset L \end{array}} |a_{IJK}| |\langle f, h_I \rangle | | \langle g, h_J\rangle |, \end{aligned}$$

where \(l = 0, \ldots , j-1\) is fixed, and to apply the \(H^1\)-\({\text {BMO}}\) duality to the whole \(K, L\) summation. With \(l\) fixed this yields a uniform estimate, and there is also a curious 'extra' cancellation present—we can even bound

$$\begin{aligned} \sum _{\begin{array}{c} I^{(i)} = J^{(j)} = K \\ J \subset L \end{array}} |a_{IJK}| |\langle f, h_I \rangle | | \langle g, h_J\rangle | \le \frac{1}{|K|} \int _K |\Delta _{K, i} f| \int _L |g|, \end{aligned}$$

that is, we may forget the \(\Delta _{K,j}\) acting on g. Then it remains to sum over \(l\), which yields the dependence j instead of \(j^{1/2}\). The approach in our proof above is more efficient, and it also utilizes all of the available cancellation.

Remark 4.6

An interesting question is whether we can have \(\alpha = 1\) instead of \(\alpha = 3/2\) by somehow exploiting the operators \(Q_k\) more carefully—this would appear to be the optimal result theoretically obtainable by the current methods.

We also note that it is certainly possible to handle higher order commutators, such as \([T_1, [T_2, [b, T_3]]]\).

We will continue with more multi-parameter commutator estimates—the difference from the above is that now even the singular integrals are allowed to be multi-parameter.

For a weight w on \({\mathbb {R}}^d := {\mathbb {R}}^{d_1} \times {\mathbb {R}}^{d_2}\) we say that a locally integrable function \(b :{\mathbb {R}}^d \rightarrow {\mathbb {C}}\) belongs to the weighted little \({\text {BMO}}\) space \({\text {bmo}}(w)\) if

$$\begin{aligned} \Vert b\Vert _{{\text {bmo}}(w)} := \sup _{R} \frac{1}{w(R)} \int _R |b - \langle b \rangle _R| < \infty , \end{aligned}$$

where the supremum is over rectangles \(R=I^1 \times I^2 \subset {\mathbb {R}}^d\). If \(w=1\) we denote the unweighted little \({\text {BMO}}\) space by \({\text {bmo}}\). There holds that

$$\begin{aligned} \Vert b \Vert _{{\text {bmo}}(w)} \sim \max \left( \mathop {\mathrm{ess\,sup}}\limits _{x_1 \in {\mathbb {R}}^{d_1}} \Vert b(x_1, \cdot ) \Vert _{{\text {BMO}}(w(x_1, \cdot ))}, \mathop {\mathrm{ess\,sup}}\limits _{x_2 \in {\mathbb {R}}^{d_2}} \Vert b(\cdot , x_2) \Vert _{{\text {BMO}}(w(\cdot , x_2))} \right) , \end{aligned}$$
(4.7)

see [19]. Here \({\text {BMO}}(w(x_1, \cdot ))\) and \({\text {BMO}}(w(\cdot , x_2))\) are the one-parameter weighted \({\text {BMO}}\) spaces. For example,

$$\begin{aligned} \Vert b(x_1, \cdot ) \Vert _{{\text {BMO}}(w(x_1, \cdot ))} :=\sup _{I^2} \frac{1}{ w(x_1, \cdot ) (I^2)} \int _{I^2} | b(x_1, y_2)-\langle b(x_1, \cdot )\rangle _{I^2}| \,\mathrm {d}y_2, \end{aligned}$$

where the supremum is over cubes \(I^2 \subset {\mathbb {R}}^{d_2}\).

The following theorem was proved in [28] with \(\omega _i(t) = t^{\gamma _i}\). The first order case \([b, T]\) appeared before in [19]. See also [29] for the optimality of the space \({\text {bmo}}(\nu ^{1/m})\) in the case \(b_1 = \cdots = b_m = b\).

Theorem 4.8

Let \(p \in (1,\infty )\), \(\mu , \lambda \in A_p\) be bi-parameter weights and \(\nu := \mu ^{1/p}\lambda ^{-1/p}\). Suppose that T is a bi-parameter \((\omega _1, \omega _2)\)-CZO and \(m \in {\mathbb {N}}\). Then we have

$$\begin{aligned} \Vert [b_m,\cdots [b_2, [b_1, T]]\cdots ]\Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim \prod _{i=1}^m\Vert b_i\Vert _{{\text {bmo}}(\nu ^{1/m})} \end{aligned}$$

if one of the following conditions holds:

  (1) T is paraproduct free and \(\omega _i \in {\text {Dini}}_{m/2+1}\);

  (2) \(m=1\) and \(\omega _i \in {\text {Dini}}_{3/2}\);

  (3) \(\omega _i \in {\text {Dini}}_{m+1}\).

Proof

The proof is similar in spirit to that of Theorem 4.2. We use Lemma 3.11 and estimates for the commutators of the usual bi-parameter model operators. If we use the bounds from [28] directly, we e.g. immediately get

$$\begin{aligned} \begin{aligned}&\Vert [b_m,\cdots [b_2, [b_1, Q_{k_1, k_2}]]\cdots ]\Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \\&\quad \lesssim (1+k_1)(1+k_2)(1+\max (k_1, k_2))^{m} \prod _{i=1}^m\Vert b_i\Vert _{{\text {bmo}}(\nu ^{1/m})}. \end{aligned} \end{aligned}$$
(4.9)

Similarly, we can read off estimates for all the other model operators from [28]. This gives us the result under the higher regularity assumption (3). Indeed, when using the estimate (4.9) in connection with the representation theorem one ends up with the series

$$\begin{aligned} \sum _{k_1=0}^\infty \sum _{k_2=0}^\infty \omega _1(2^{-k_1})\omega _2(2^{-k_2}) (1+k_1)(1+k_2)(1+\max (k_1, k_2))^{m}. \end{aligned}$$

We split this into two according to whether \(k_1 \le k_2\) or \(k_1>k_2\) and, for example, there holds that

$$\begin{aligned} \begin{aligned} \sum _{k_1=0}^\infty \omega _1(2^{-k_1})(1+k_1)\sum _{k_2=k_1}^\infty \omega _2(2^{-k_2})(1+k_2)^{m+1}&\lesssim \sum _{k_1=0}^\infty \omega _1(2^{-k_1})(1+k_1)\Vert \omega _2 \Vert _{{\text {Dini}}_{m+1}} \\&\lesssim \Vert \omega _1 \Vert _{{\text {Dini}}_{1}}\Vert \omega _2 \Vert _{{\text {Dini}}_{m+1}}. \end{aligned} \end{aligned}$$
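
Here we used the elementary estimate (valid since the modulus \(\omega \) is increasing) that for \(k \ge 1\)

$$\begin{aligned} \omega (2^{-k})(1+k)^{\alpha } \lesssim \int _{2^{-k}}^{2^{-k+1}} \omega (t) \Big ( 1 + \log \frac{1}{t} \Big )^{\alpha } \frac{\mathrm {d}t}{t}, \end{aligned}$$

so that \(\sum _{k=1}^\infty \omega (2^{-k})(1+k)^{\alpha } \lesssim \Vert \omega \Vert _{{\text {Dini}}_{\alpha }}\). The case \(k_1 > k_2\) is handled symmetrically, with the roles of \(\omega _1\) and \(\omega _2\) interchanged.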

The first order case \(m=1\) with the desired regularity (assumption (2)) follows since the papers [1, 2, 19], which deal with commutators of the form \([T_1, [T_2, \ldots [b, T_k]]]\) where each \(T_k\) can be multi-parameter, include a proof of the first order case based on the \(H^1\)-\({\text {BMO}}\) duality strategy. This strategy can be improved to give the additional square root save as in Theorem 4.2.

For \(m \ge 2\) the new square root save becomes tricky. The paper [28] is not at all based on the \(H^1\)-\({\text {BMO}}\) duality strategy on which this save relies (see the proof of Theorem 4.2). We can improve the strategy of [28] for shifts. Thus, we are able to make the square root save for paraproduct free T (assumption (1)). By this we mean that the (both partial and full) paraproducts in the dyadic representation of T vanish, which could also be stated in terms of (both partial and full) “\(T1=0\)” type conditions. The reader can think of convolution form SIOs.

We start by considering \([b_2, [b_1, S_i]]\), where \(i=(i_1,i_2)\), \(i_j=(i_j^1,i_j^2)\) and \(S_i\) is a standard bi-parameter shift of complexity i. The reductions on pages 23 and 24 of [28] (Sect. 5.1) show that we only need to bound the key term

$$\begin{aligned} \langle U^{b_1, b_2}f, g \rangle := \sum _{K} \sum _{\begin{array}{c} R_1,R_2 \\ R_j^{(i_j)}=K \end{array}}&a_{K,R_1,R_2} [\langle b_1 \rangle _{R_2} - \langle b_1 \rangle _{R_1}] [\langle b_2 \rangle _{R_2} - \langle b_2 \rangle _{R_1}] \langle f, h_{R_1} \rangle \langle g, h_{R_2} \rangle , \end{aligned}$$

where as usual \(K=K^1 \times K^2\) and \(R_j=I^1_j \times I^2_j\).

We write

$$\begin{aligned} \begin{aligned} \langle b_i \rangle _{R_2} - \langle b_i \rangle _{R_1}&= [\langle b_i \rangle _{R_2} - \langle b_i \rangle _{K^1 \times I^2_2}] + [\langle b_i \rangle _{K^1 \times I^2_2} - \langle b_i \rangle _{K}] \\&\quad + [\langle b_i \rangle _{K} - \langle b_i \rangle _{K^1 \times I_1^2}] + [\langle b_i \rangle _{K^1 \times I_1^2} - \langle b_i \rangle _{R_1}]. \end{aligned} \end{aligned}$$

This splits \(U^{b_1, b_2}\) into 16 different terms \(U^{b_1, b_2}_{m_1, m_2}\), where \(m_i \in \{1, \ldots , 4\}\) indicates which one of the above terms we take for \(b_i\). These can be handled quite similarly, but there are some variations in the arguments. We will handle two representative ones.

We begin by looking at the term

$$\begin{aligned} \langle U^{b_1, b_2}_{3,4} f, g \rangle&:= \sum _{K} \sum _{\begin{array}{c} R_1,R_2 \\ R_j^{(i_j)}=K \end{array}} a_{K,R_1,R_2} [\langle b_1 \rangle _{K^1 \times I_1^2}-\langle b_1 \rangle _{K}] [\langle b_2 \rangle _{R_1}\\&\quad -\langle b_2 \rangle _{K^1 \times I_1^2}] \langle f, h_{R_1} \rangle \langle g, h_{R_2} \rangle . \end{aligned}$$

Write

$$\begin{aligned} \begin{aligned} \langle b_1 \rangle _{K^1 \times I_1^2}-\langle b_1 \rangle _{K}&= \sum _{I_1^2 \subsetneq L^2 \subset K^2} \langle \Delta _{L^2} \langle b_1 \rangle _{K^1, 1} \rangle _{I_1^2} \\&= \sum _{I_1^2 \subsetneq L^2 \subset K^2} \Big \langle b_1, \frac{1_{K^1}}{|K^1|} \otimes h_{L^2} \Big \rangle \langle h_{L^2} \rangle _{I_1^2} \end{aligned} \end{aligned}$$
(4.10)

and

$$\begin{aligned} \langle b_2 \rangle _{R_1}-\langle b_2 \rangle _{K^1 \times I_1^2} = \sum _{I_1^1 \subsetneq L^1 \subset K^1} \langle \Delta _{L^1} \langle b_2 \rangle _{I_1^2, 2} \rangle _{I_1^1} = \sum _{I_1^1 \subsetneq L^1 \subset K^1} \Big \langle b_2, h_{L^1} \otimes \frac{1_{I_1^2}}{|I_1^2|} \Big \rangle \langle h_{L^1} \rangle _{I_1^1}. \end{aligned}$$

Writing \(\big \langle b_1, \frac{1_{K^1}}{|K^1|} \otimes h_{L^2} \big \rangle = \int _{{\mathbb {R}}^{d_1}} \langle b_1, h_{L^2} \rangle _2 \frac{1_{K^1}}{|K^1|}\) and similarly for \(\big \langle b_2, h_{L^1} \otimes \frac{1_{I_1^2}}{|I_1^2|} \big \rangle \) we arrive at

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \sum _{K} \sum _{\begin{array}{c} L=L^1 \times L^2 \subset K \\ \ell (L^j) > 2^{-i_1^j}\ell (K^j) \end{array}} |\langle b_1, h_{L^2} \rangle _2| |L^2|^{-1/2} |\langle b_2, h_{L^1} \rangle _1| |L^1|^{-1/2} \\&\quad \sum _{\begin{array}{c} R_1^{(i_1)}=R_2^{(i_2)}=K \\ R_1 \subset L \end{array}} | a_{K,R_1,R_2} \langle f, h_{R_1} \rangle \langle g, h_{R_2} \rangle | \frac{1_{K^1}}{|K^1|} \frac{1_{I_1^2}}{|I_1^2|}. \end{aligned}$$

The last line can be dominated by

$$\begin{aligned} |L^1| \langle M^2 \Delta _{K,i_1} f \rangle _{L^1,1} \langle |\Delta _{K, i_2} g| \rangle _{K} \frac{1_{K^1}}{|K^1|} 1_{L^2}. \end{aligned}$$

We have now reached the term

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \sum _{K} \langle |\Delta _{K, i_2} g| \rangle _{K} \frac{1_{K^1}}{|K^1|} \sum _{\begin{array}{c} L^2 \subset K^2 \\ \ell (L^2)> 2^{-i_1^2}\ell (K^2) \end{array}} |\langle b_1, h_{L^2} \rangle _2| |L^2|^{-1/2} 1_{L^2} \\&\quad \sum _{\begin{array}{c} L^1 \subset K^1 \\ \ell (L^1) > 2^{-i_1^1}\ell (K^1) \end{array}} |\langle b_2, h_{L^1} \rangle _1| |L^1|^{1/2} \langle M^2 \Delta _{K, i_1} f \rangle _{L^1,1}. \end{aligned}$$

Recall that with fixed \(x_2\) we have \(b(\cdot , x_2) \in {\text {BMO}}(\nu ^{1/2}(\cdot ,x_2))\), see (4.7). By weighted \(H^1\)-\({\text {BMO}}\) duality we now have that

$$\begin{aligned}&\sum _{\begin{array}{c} L^1 \subset K^1 \\ \ell (L^1)> 2^{-i_1^1}\ell (K^1) \end{array}} |\langle b_2, h_{L^1} \rangle _1(x_2) | |L^1|^{1/2} \langle M^2 \Delta _{K, i_1} f \rangle _{L^1,1}(x_2) \\&\qquad \lesssim \Vert b_2\Vert _{{\text {bmo}}(\nu ^{1/2})} \int _{{\mathbb {R}}^{d_1}}\\&\qquad \qquad \left( \sum _{\begin{array}{c} L^1 \subset K^1 \\ \ell (L^1) > 2^{-i_1^1}\ell (K^1) \end{array}} 1_{L^1}(y_1) (\langle M^2 \Delta _{K, i_1} f \rangle _{L^1,1}(x_2))^2 \right) ^{1/2} \nu ^{1/2}(y_1, x_2) \,\mathrm {d}y_1 \\&\qquad \le (i_1^1)^{1/2} \Vert b_2\Vert _{{\text {bmo}}(\nu ^{1/2})} |K^1| \langle M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2} \rangle _{K^1,1}(x_2). \end{aligned}$$

The term \((i_1^1)^{1/2} \Vert b_2\Vert _{{\text {bmo}}(\nu ^{1/2})}\) is fine and we do not drag it along in the following estimates. We are left with the task of bounding

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \sum _{K} \langle |\Delta _{K, i_2} g| \rangle _{K}1_{K^1} \sum _{\begin{array}{c} L^2 \subset K^2 \\ \ell (L^2) > 2^{-i_1^2}\ell (K^2) \end{array}} |\langle b_1, h_{L^2} \rangle _2| |L^2|^{-1/2} 1_{L^2} \\&\quad M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}). \end{aligned}$$

We now put the \(\int _{{\mathbb {R}}^{d_2}}\) inside and get the term

$$\begin{aligned} \int _{{\mathbb {R}}^{d_2}} 1_{L^2} M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}) = |L^2| \langle M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}) \rangle _{L^2, 2}. \end{aligned}$$

Then, we are left with

$$\begin{aligned}&\int _{{\mathbb {R}}^{d_1}} \sum _{K} \langle |\Delta _{K, i_2} g| \rangle _{K}1_{K^1} \sum _{\begin{array}{c} L^2 \subset K^2 \\ \ell (L^2) > 2^{-i_1^2}\ell (K^2) \end{array}} |\langle b_1, h_{L^2} \rangle _2| |L^2|^{1/2} \\&\quad \langle M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}) \rangle _{L^2, 2}. \end{aligned}$$

By weighted \(H^1\)-\({\text {BMO}}\) duality we have, analogously to the above, that

$$\begin{aligned}&\sum _{\begin{array}{c} L^2 \subset K^2 \\ \ell (L^2) > 2^{-i_1^2}\ell (K^2) \end{array}} |\langle b_1, h_{L^2} \rangle _2| |L^2|^{1/2} \langle M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}) \rangle _{L^2, 2} \\&\qquad \qquad \lesssim (i_1^2)^{1/2} \Vert b_1\Vert _{{\text {bmo}}(\nu ^{1/2})} \int _{{\mathbb {R}}^{d_2}} 1_{K^2} M^2 M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}) \nu ^{1/2}. \end{aligned}$$

Forgetting the factor \((i_1^2)^{1/2} \Vert b_1\Vert _{{\text {bmo}}(\nu ^{1/2})}\), which is of the desired form, we are then left with

$$\begin{aligned}&\int _{{\mathbb {R}}^{d}} \sum _{K} \langle |\Delta _{K, i_2} g| \rangle _{K}1_{K} M^2 M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}) \nu ^{1/2} \\&\quad \le \int _{{\mathbb {R}}^{d}} \sum _{K} M^2 M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2}) \cdot M^1 M^2 \Delta _{K, i_2} g \cdot \nu ^{1/2}. \end{aligned}$$

Here we used that \(\langle |\Delta _{K, i_2} g| \rangle _{K} 1_{K} \le M^1 M^2 \Delta _{K, i_2} g\) pointwise. Writing \(\nu ^{\frac{1}{2}} = \mu ^{\frac{1}{2p}} \lambda ^{\frac{1}{2p}} \cdot \lambda ^{-\frac{1}{p}}\) and applying Hölder's inequality, we bound this by

$$\begin{aligned} \left\| \left( \sum _{K} [M^2 M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2})]^2 \right) ^{1/2} \right\| _{L^p(\mu ^{1/2}\lambda ^{1/2})} \end{aligned}$$

multiplied by

$$\begin{aligned} \left\| \left( \sum _{K} [M^1 M^2 \Delta _{K, i_2} g]^2 \right) ^{1/2} \right\| _{L^{p'}(\lambda ^{1-p'})}. \end{aligned}$$

It remains to use square function bounds together with the Fefferman–Stein inequality. For the more complicated term involving the function f, the key observations are that \(\mu ^{1/2}\lambda ^{1/2} \in A_p\) and that \(\nu ^{p/2} \mu ^{1/2}\lambda ^{1/2} = \mu \).
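For the reader's convenience, we sketch this final step; the estimates below are the standard weighted square function and Fefferman–Stein bounds, written out schematically under our conventions. First, \(\mu ^{1/2}\lambda ^{1/2} \in A_p\) follows by applying the Cauchy–Schwarz inequality to the defining condition of \(A_p\), which gives \([\mu ^{1/2}\lambda ^{1/2}]_{A_p} \le [\mu ]_{A_p}^{1/2} [\lambda ]_{A_p}^{1/2}\). Second, since \(\nu = \mu ^{1/p}\lambda ^{-1/p}\), we have \(\nu ^{p/2} \mu ^{1/2}\lambda ^{1/2} = \mu ^{1/2}\lambda ^{-1/2} \cdot \mu ^{1/2}\lambda ^{1/2} = \mu \). Hence, schematically,

$$\begin{aligned} \left\| \left( \sum _{K} [M^2 M^1( M^1 M^2 \Delta _{K, i_1} f \cdot \nu ^{1/2})]^2 \right) ^{1/2} \right\| _{L^p(\mu ^{1/2}\lambda ^{1/2})}&\lesssim \left\| \left( \sum _{K} |\Delta _{K, i_1} f|^2 \, \nu \right) ^{1/2} \right\| _{L^p(\mu ^{1/2}\lambda ^{1/2})} \\&= \left\| \left( \sum _{K} |\Delta _{K, i_1} f|^2 \right) ^{1/2} \right\| _{L^p(\mu )} \lesssim \Vert f\Vert _{L^p(\mu )}, \end{aligned}$$

where the first step consists of four applications of the Fefferman–Stein inequality with the weight \(\mu ^{1/2}\lambda ^{1/2} \in A_p\), the identity uses \(\nu ^{p/2} \mu ^{1/2}\lambda ^{1/2} = \mu \), and the last step is the weighted square function estimate with \(\mu \in A_p\). The term with g is handled in the same way using \(\lambda ^{1-p'} \in A_{p'}\). We have controlled \(\langle U^{b_1, b_2}_{3,4} f, g \rangle \).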

The bound for \(\langle U^{b_1, b_2}f, g \rangle \) follows by handling the remaining, similar terms \(U^{b_1, b_2}_{m_1, m_2}\). A slight variation of the argument is needed, for example, for the following term

$$\begin{aligned} \langle U^{b_1, b_2}_{1,1} f, g \rangle&:= \sum _{K} \sum _{\begin{array}{c} R_1,R_2 \\ R_j^{(i_j)}=K \end{array}} a_{K,R_1,R_2} [\langle b_1 \rangle _{R_2}-\langle b_1 \rangle _{K^1 \times I^2_2}] [\langle b_2 \rangle _{R_2}\\&\quad -\langle b_2 \rangle _{K^1 \times I^2_2}] \langle f, h_{R_1} \rangle \langle g, h_{R_2} \rangle . \end{aligned}$$

We expand the differences of averages as

$$\begin{aligned} \begin{aligned}&{[}\langle b_1 \rangle _{R_2}-\langle b_1 \rangle _{K^1 \times I^2_2}] [\langle b_2 \rangle _{R_2}-\langle b_2 \rangle _{K^1 \times I^2_2}] \\&\quad = \sum _{I^1_2 \subsetneq U^1 \subset K^1}\sum _{I^1_2 \subsetneq V^1 \subset K^1} \Big \langle b_1, h_{U^1} \otimes \frac{1_{I^2_2}}{|I^2_2|} \Big \rangle \langle h_{U^1} \rangle _{I^1_2} \Big \langle b_2, h_{V^1} \otimes \frac{1_{I^2_2}}{|I^2_2|} \Big \rangle \langle h_{V^1} \rangle _{I^1_2}. \end{aligned} \end{aligned}$$

The key difference compared with the term \(U^{b_1, b_2}_{3,4}\) above is that we need to further split this into two parts according to whether \(V^1 \subset U^1\) or \(U^1 \subsetneq V^1\). The two resulting terms are handled symmetrically. The absolute value of the one coming from “\(V^1 \subset U^1\)” can be written as

$$\begin{aligned} \begin{aligned}&\int _{{\mathbb {R}}^{d_2}} \int _{{\mathbb {R}}^{d_2}} \sum _{K} \sum _{\begin{array}{c} U^1 \subset K^1 \\ \ell (U^1)> 2^{-i_2^1}\ell (K^1) \end{array}}\\&\quad \sum _{\begin{array}{c} V^1 \subset U^1 \\ \ell (V^1) > 2^{-i_2^1}\ell (K^1) \end{array}} |\langle b_1, h_{U^1} \rangle _1(x_2)| |{U^1}|^{-1/2} |\langle b_2, h_{V^1} \rangle _1(y_2)| |{V^1}|^{-1/2} \\&\quad \sum _{\begin{array}{c} (I_1^1)^{(i_1^1)} = (I^1_2)^{(i_2^1)} = K^1 \\ I^1_2 \subset V^1 \end{array}} \sum _{(I_1^2)^{(i_1^2)} = (I^2_2)^{(i_2^2)} = K^2} | a_{K,R_1,R_2} \langle f, h_{R_1} \rangle \langle g, h_{R_2} \rangle | \frac{1_{I^2_2}(x_2)}{|I^2_2|} \frac{1_{I^2_2}(y_2)}{|I^2_2|}. \end{aligned} \end{aligned}$$

The last line can be dominated by

$$\begin{aligned} \langle | \Delta _{K,i_1}f | \rangle _{K} |V^1| \sum _{(I^2_2)^{(i_2^2)} = K^2} \langle |\Delta _{K,i_2} g| \rangle _{V^1 \times I^2_2} \frac{1_{I^2_2}(x_2)}{|I^2_2|} 1_{I^2_2}(y_2). \end{aligned}$$

Using the weighted \(H^1\)-\({\text {BMO}}\) duality as above we have

$$\begin{aligned}&\int _{{\mathbb {R}}^{d_2}} \sum _{\begin{array}{c} V^1 \subset U^1 \\ \ell (V^1) > 2^{-i_2^1}\ell (K^1) \end{array}} |\langle b_2, h_{V^1} \rangle _1(y_2)| |{V^1}|^{1/2} \langle |\Delta _{K,i_2} g| \rangle _{V^1 \times I^2_2} 1_{I^2_2}(y_2) \,\mathrm {d}y_2 \\&\quad \le (i_2^1)^{1/2} \Vert b_2 \Vert _{{\text {bmo}}(\nu ^{1/2})} |U^1| |I^2_2| \langle M^1 M^2 \Delta _{K,i_2} g \cdot \nu ^{1/2} \rangle _{U^1 \times I^2_2}. \end{aligned}$$

Forgetting the factor \((i_2^1)^{1/2} \Vert b_2 \Vert _{{\text {bmo}}(\nu ^{1/2})}\), we have reached the term

$$\begin{aligned}&\int _{{\mathbb {R}}^{d_2}} \sum _{K} \langle | \Delta _{K,i_1}f | \rangle _{K} \sum _{(I^2_2)^{(i_2^2)} = K^2} 1_{I^2_2} \sum _{\begin{array}{c} U^1 \subset K^1 \\ \ell (U^1) > 2^{-i_2^1}\ell (K^1) \end{array}} |\langle b_1, h_{U^1} \rangle _1| |{U^1}|^{1/2} \\&\quad \langle M^1 M^2 \Delta _{K,i_2} g \cdot \nu ^{1/2} \rangle _{U^1 \times I^2_2}, \end{aligned}$$

which—after using the \(H^1\)-\({\text {BMO}}\) duality—produces \((i_2^1)^{1/2} \Vert b_1 \Vert _{{\text {bmo}}(\nu ^{1/2})}\) multiplied by

$$\begin{aligned} \int _{{\mathbb {R}}^d} \sum _{K} \langle | \Delta _{K,i_1} f | \rangle _{K} M^1M^2 (M^1 M^2 \Delta _{K,i_2} g \cdot \nu ^{1/2}) \nu ^{1/2} 1_{K}. \end{aligned}$$

Similarly as with \(U^{b_1,b_2}_{3,4}\), this term is under control. The term with \(U^1 \subsetneq V^1\) is symmetric, and so we are also done with \(U^{b_1,b_2}_{1,1}\).

This ends our treatment of \(U^{b_1, b_2}\), since the above arguments showcased the only major difference between the various terms \(U^{b_1, b_2}_{m_1, m_2}\). Thus, we are done with \([b_2, [b_1, S_{i}]]\). By Lemma 3.11 we conclude that

$$\begin{aligned} \Vert [b_2, [b_1, Q_{k_1, k_2}]]\Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim (1+k_1)(1+k_2) (1+\max (k_1, k_2)) \prod _{i=1}^2 \Vert b_i\Vert _{{\text {bmo}}(\nu ^{1/2})}. \end{aligned}$$

By handling the higher order commutators similarly, we get the claim related to assumption (1). We omit these details. \(\square \)

Remark 4.11

The new square root save coming from the \(H^1\)-\({\text {BMO}}\) arguments reduces the required regularity from \(m+1\) to \(m/2+1\). For these higher order commutators this is more significant than the save that could, in principle, be obtained by avoiding Lemma 3.11, which would only improve the \(+1\) to \(+1/2\).

Theorem 4.2 involves only one-parameter CZOs in its estimate

$$\begin{aligned} \Vert [T_1, [T_2, b]] \Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim \Vert b\Vert _{{\text {BMO}}_{\text {prod}}(\nu )}, \end{aligned}$$

while the basic estimate

$$\begin{aligned} \Vert [b, T]\Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim \Vert b\Vert _{{\text {bmo}}(\nu )} \end{aligned}$$

of Theorem 4.8 involves a bi-parameter CZO T. A joint generalization—considered in the unweighted case in [36]—is an estimate for

$$\begin{aligned} \Vert [T_1, [T_2, \ldots [b, T_k]]] \Vert _{L^p(\mu ) \rightarrow L^p(\lambda )}, \end{aligned}$$

where each \(T_i\) can be a completely general \(m\)-parameter CZO. The \({\text {BMO}}\) norm that then appears is a suitable combination of little \({\text {BMO}}\) and product \({\text {BMO}}\). See [1, 2] for a fully satisfactory Bloom type upper estimate in this generality; however, those results cover only CZOs with the standard kernel regularity. The general case of [1, 2] is hard to digest, but let us formulate a model theorem of this type with mild kernel regularity.

Theorem 4.12

Let \({\mathbb {R}}^d = \prod _{i=1}^4 {\mathbb {R}}^{d_i}\) be a product space of four parameters and let \({\mathcal {I}}= \{{\mathcal {I}}_1, {\mathcal {I}}_2\}\), where \({\mathcal {I}}_1 = \{1,2\}\) and \({\mathcal {I}}_2 = \{3,4\}\), be a partition of the parameter space \(\{1, 2, 3, 4\}\). Suppose that \(T_i\), \(i = 1, 2\), is a bi-parameter \((\omega _{1,i}, \omega _{2,i})\)-CZO on \(\prod _{j \in {\mathcal {I}}_i} {\mathbb {R}}^{d_j}\), where \(\omega _{j, i} \in {\text {Dini}}_{3/2}\). Let \(b :{\mathbb {R}}^d \rightarrow {\mathbb {C}}\) and \(p \in (1, \infty )\), let \(\mu , \lambda \in A_p({\mathbb {R}}^d)\) be four-parameter weights, and let \(\nu = \mu ^{1/p} \lambda ^{-1/p}\) be the associated Bloom weight. Then we have

$$\begin{aligned} \Vert [T_1, [T_2, b]] \Vert _{L^p(\mu ) \rightarrow L^p(\lambda )} \lesssim \Vert b\Vert _{{\text {bmo}}^{{\mathcal {I}}}(\nu )}. \end{aligned}$$

Here \({\text {bmo}}^{{\mathcal {I}}}(\nu )\) is the following weighted little product \({\text {BMO}}\) space:

$$\begin{aligned} \Vert b\Vert _{{\text {bmo}}^{{\mathcal {I}}}(\nu )} = \sup _{{\bar{u}}} \Vert b\Vert _{{\text {BMO}}_{{\text {prod}}}^{{\bar{u}}}(\nu )}, \end{aligned}$$

where \({\bar{u}} = (u_i)_{i=1}^2\) is such that \(u_i \in {\mathcal {I}}_i\) and \({\text {BMO}}_{{\text {prod}}}^{{\bar{u}}}(\nu )\) is the natural weighted bi-parameter product \({\text {BMO}}\) space on the parameters \({\bar{u}}\). For example,

$$\begin{aligned} \Vert b\Vert _{{\text {BMO}}_{{\text {prod}}}^{(1,3)}(\nu )} := \sup _{x_2 \in {\mathbb {R}}^{d_2}, x_4 \in {\mathbb {R}}^{d_4}} \Vert b(\cdot , x_2, \cdot , x_4) \Vert _{{\text {BMO}}_{{\text {prod}}}(\nu (\cdot , x_2, \cdot , x_4))}, \end{aligned}$$

where the last weighted product \({\text {BMO}}\) norm is defined in (4.1).
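In the present four-parameter situation the supremum thus runs over the four choices \({\bar{u}} \in \{(1,3), (1,4), (2,3), (2,4)\}\), each taking a bi-parameter product \({\text {BMO}}\) norm in the variables \((x_{u_1}, x_{u_2})\) uniformly over the remaining two variables.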

The proof is again a combination of Lemma 3.11 with the known estimates for the commutators of the standard model operators [1, 2], together with the additional square root save. Unlike in Theorem 4.8 above, incorporating this save poses no significant new challenges, since these references are based entirely on the \(H^1\)-\({\text {BMO}}\) strategy. In this regard the situation is closer to that of Theorem 4.2.