Product Space Singular Integrals with Mild Kernel Regularity

We develop the product space theory of singular integrals with mild kernel regularity. We study these kernel regularity questions specifically in situations that are closely tied to $T1$ type arguments and the corresponding structural theory. In addition, our results are multilinear.


Introduction
The usual definition of a singular integral operator (SIO) involves a Hölder-continuous kernel $K$ with a power-type continuity modulus $t \mapsto t^\gamma$. However, many results continue to hold under significantly more general assumptions. Such kernel regularity considerations become non-trivial especially in connection with results that go beyond the classical Calderón-Zygmund theory; an example is the $A_2$ theorem of Hytönen [21], proved for Dini-continuous kernels by Lacey [24]. Estimates for SIOs with mild kernel regularity are, for instance, linked to the theory of rough singular integrals, see e.g. [22].
The fundamental question concerning the $L^2$ (or $L^p$) boundedness of an SIO $T$ is usually best answered by so-called $T1$ theorems, where the action of the operator $T$ on the constant function $1$ is key. We study kernel regularity questions specifically in situations that are closely tied to $T1$ type arguments and the corresponding structural theory; a big part of the modern product space theory of SIOs relies on such analysis. The proofs of $T1$ theorems reveal a fundamental structural decomposition of SIOs into their cancellative parts and so-called paraproducts. This structure is extremely important for obtaining further estimates beyond the initial scalar-valued $L^p$ boundedness. Refined versions of $T1$ theorems provide exact identities in terms of model operators and are called representation theorems, see [20,21,32].
A concrete definition of kernel regularity is as follows. It concerns the required regularity of the continuity moduli $\omega$ appearing in the various kernel estimates, such as the continuity estimates of the kernel $K$. Recently, Grau de la Herrán and Hytönen [17] proved that the modified Dini condition
$$\|\omega\|_{\mathrm{Dini}_\alpha} := \int_0^1 \omega(t)\Big(1 + \log\frac{1}{t}\Big)^{\alpha}\frac{dt}{t} < \infty$$
with $\alpha = 1/2$ is sufficient to prove a $T1$ theorem even with an underlying measure $\mu$ that can be non-doubling. This matches the best known sufficient condition for the classical homogeneous $T1$ theorem [10]; such results are implicit in Figiel [16] and explicit in Deng et al. [11]. The exponent $\alpha = 1/2$ has a fundamental, even sharp, feel in all of the existing arguments.
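To make the modified Dini condition concrete, here is a small numerical sketch (not from the source) that evaluates $\|\omega\|_{\mathrm{Dini}_\alpha}$ by substituting $t = e^{-s}$, which turns it into $\int_0^\infty \omega(e^{-s})(1+s)^\alpha\,ds$. The helper names and the two test moduli $t^\gamma$ and $(1+\log(1/t))^{-\beta}$ are illustrative assumptions; for the latter the integral is finite exactly when $\beta > \alpha + 1$, with value $1/(\beta-\alpha-1)$.

```python
import math


def dini_norm(omega_of_s, alpha, n=100_000):
    """Approximate ||omega||_{Dini_alpha} = int_0^1 omega(t) (1 + log(1/t))^alpha dt/t.

    The modulus is passed in terms of s = log(1/t), i.e. omega_of_s(s) = omega(exp(-s)),
    which avoids underflow for tiny t. After t = exp(-s) the integral becomes
    int_0^inf omega_of_s(s) (1 + s)^alpha ds; the map s = u / (1 - u) sends it to
    [0, 1), and a midpoint rule never evaluates the endpoint u = 1.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        s = u / (1.0 - u)
        jacobian = 1.0 / (1.0 - u) ** 2  # ds/du
        total += omega_of_s(s) * (1.0 + s) ** alpha * jacobian
    return total * h


def power_modulus(gamma):
    """Hypothetical helper: omega(t) = t^gamma, written as a function of s = log(1/t)."""
    return lambda s: math.exp(-gamma * s)


def log_modulus(beta):
    """Hypothetical helper: omega(t) = (1 + log(1/t))^(-beta) as a function of s."""
    return lambda s: (1.0 + s) ** (-beta)


if __name__ == "__main__":
    print(dini_norm(power_modulus(0.5), 0.0))  # ||t^(1/2)||_{Dini_0} = 2
    print(dini_norm(log_modulus(3.0), 0.5))    # 1 / (3 - 0.5 - 1) = 2/3
```

Power-type moduli satisfy every $\mathrm{Dini}_\alpha$ condition, while the logarithmic family shows how raising $\alpha$ genuinely excludes moduli: $(1+\log(1/t))^{-\beta}$ with $1/2 < \beta - 1 \le 3/2$ is in $\mathrm{Dini}_{1/2}$ but not in $\mathrm{Dini}_{3/2}$.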
In [17] a new type of representation theorem appears. The key difference to the original representation theorems [20,21] is that the cancellative part is decomposed in terms of different operators that package multiple dyadic shifts into one and offer more efficient bounds when it comes to kernel regularity. Some of the ideas of the decomposition in [17] are rooted in the work of Figiel [15,16]. We simultaneously extend [17] both to the multilinear [12,13,14,27,33] and to the multi-parameter [23,30,32,35] settings. The proofs of the representation theorems now appear to be converging to their final and most elegant form, and the arguments are simultaneously efficient and sharp.
Linear bi-parameter SIOs, for example, have kernels with singularities on $x_1 = y_1$ or $x_2 = y_2$, where $x, y \in \mathbb R^d$ are written as $x = (x_1, x_2)$, $y = (y_1, y_2) \in \mathbb R^{d_1}\times\mathbb R^{d_2}$ for a fixed partition $d = d_1 + d_2$. For $x, y \in \mathbb C = \mathbb R\times\mathbb R$, compare e.g. the one-parameter Beurling kernel $1/(x-y)^2$ with the bi-parameter kernel $1/[(x_1-y_1)(x_2-y_2)]$, the product of Hilbert kernels in both coordinate directions. In general, product space analysis is quite different from one-parameter analysis and seems to resist many techniques. In part due to the failure of bi-parameter sparse domination methods (see [3], but see also [4]), representation theorems are even more important in the bi-parameter setting than in the one-parameter setting. For example, dyadic representation methods have proved very fruitful in connection with bi-parameter commutators and weighted analysis, see Holmes-Petermichl-Wick [19], Ou-Petermichl-Strouse [36] and [28]. See also [1,2].
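Let $p \in (1,\infty)$, let $\mu, \lambda \in A_p$ be bi-parameter weights, and let $\nu := \mu^{1/p}\lambda^{-1/p}$ be the associated Bloom weight.
(1) If $T_i$, $i = 1, 2$, is a one-parameter $\omega_i$-CZO on $\mathbb R^{d_i}$, where $\omega_i \in \mathrm{Dini}_{3/2}$, then
$$\|[T_1,[T_2,b]]\|_{L^p(\mu)\to L^p(\lambda)} \lesssim \|b\|_{\mathrm{BMO}_{\mathrm{prod}}(\nu)}.$$
(2) Suppose that $T$ is a bi-parameter $(\omega_1, \omega_2)$-CZO with $\omega_i \in \mathrm{Dini}_{3/2}$. Then we have
$$\|[b,T]\|_{L^p(\mu)\to L^p(\lambda)} \lesssim \|b\|_{\mathrm{bmo}(\nu)}.$$
See the main text for all of the definitions and for additional results. These Bloom-style two-weight estimates have recently been one of the main lines of development concerning commutators, see e.g. [1,2,18,19,25,26,28,29] for a non-exhaustive list.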

Basic Notation and Fundamental Estimates
Throughout this paper $A \lesssim B$ means that $A \le CB$ with some constant $C$ that we deem unimportant to track at that point. We write $A \sim B$ if $A \lesssim B \lesssim A$.
Dyadic Notation. Given a dyadic grid $\mathcal D$, $I \in \mathcal D$ and $k \in \mathbb Z$, $k \ge 0$, we use the following notation:
(1) $\ell(I)$ is the side length of $I$.
(2) $I^{(k)} \in \mathcal D$ is the $k$th parent of $I$, i.e., $I \subset I^{(k)}$ and $\ell(I^{(k)}) = 2^k\ell(I)$.
See e.g. [7,8] for square function estimates, even weighted ones such as $\|S_{\mathcal D} f\|_{L^p(w)} \sim \|f\|_{L^p(w)}$, $w \in A_p$, and their history. A weight $w$ (i.e. a locally integrable a.e. positive function) belongs to the weight class $A_p(\mathbb R^d)$ if
$$[w]_{A_p} := \sup_Q \langle w\rangle_Q \langle w^{1-p'}\rangle_Q^{p-1} < \infty, \qquad \langle w\rangle_Q := \frac{1}{|Q|}\int_Q w,$$
where the supremum is taken over all cubes $Q \subset \mathbb R^d$. This follows by extrapolating the corresponding weighted $L^2$ version of (2.3), which, in turn, simply follows from $\|S_{\mathcal D} f\|_{L^2(w)} \sim \|f\|_{L^2(w)}$, $w \in A_2$. Recall that the classical extrapolation theorem of Rubio de Francia says that if $\|h\|_{L^{p_0}(w)} \lesssim \|g\|_{L^{p_0}(w)}$ for some $p_0 \in (1,\infty)$ and all $w \in A_{p_0}$, then $\|h\|_{L^p(w)} \lesssim \|g\|_{L^p(w)}$ for all $p \in (1,\infty)$ and all $w \in A_p$.
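As an illustration of the $A_p$ characteristic, the following sketch (not from the source) computes $\langle w\rangle_I\langle w^{1-p'}\rangle_I^{p-1}$ for the classical power weight $w(x) = x^a$, $-1 < a < p-1$, over dyadic subintervals $I \subset [0,1)$. Restricting to dyadic subintervals of $[0,1)$ is an assumption, so the result is only a lower bound for $[w]_{A_p}$; the function names are hypothetical. For intervals $[0, 2^{-j})$ the value is the scale-invariant number $1/((a+1)(b+1)^{p-1})$ with $b = -a/(p-1)$.

```python
def average_of_power(a, u, v):
    """Average of x**a over [u, v] with 0 <= u < v, assuming a > -1."""
    return (v ** (a + 1) - u ** (a + 1)) / ((a + 1) * (v - u))


def dyadic_ap_characteristic(a, p, depth=12):
    """max over dyadic I in [0, 1) (down to `depth`) of <w>_I <w^(1-p')>_I^(p-1), w = x**a.

    Requires -1 < a < p - 1 so that both averages are finite.
    """
    b = -a / (p - 1)  # w^(1 - p') = x^b because 1 - p' = -1 / (p - 1)
    best = 0.0
    for j in range(depth + 1):
        for k in range(2 ** j):
            u, v = k / 2 ** j, (k + 1) / 2 ** j
            value = average_of_power(a, u, v) * average_of_power(b, u, v) ** (p - 1)
            best = max(best, value)
    return best


if __name__ == "__main__":
    print(dyadic_ap_characteristic(0.5, 2.0))  # 1 / (1.5 * 0.5) = 4/3
    print(dyadic_ap_characteristic(1.0, 3.0))  # 1 / (2 * 0.25) = 2
```

The maximum is attained on intervals touching the origin, where the weight degenerates; intervals away from $0$ see an almost constant weight and give values close to $1$.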
Let $K \in \mathcal D$. We have that
Thus, (2.3) gives that
We will also have use for the Fefferman-Stein inequality,
where $M$ is the Hardy-Littlewood maximal function. Often, the lighter Stein's inequality is sufficient. For an interval $J \subset \mathbb R$ we denote by $J_l$ and $J_r$ the left and right halves of $J$, respectively. We define $h^0_J = |J|^{-1/2}1_J$ and $h^1_J = |J|^{-1/2}(1_{J_l} - 1_{J_r})$. Let now $I = I_1\times\cdots\times I_d \subset \mathbb R^d$ be a cube, and define the Haar function $h^\eta_I := h^{\eta_1}_{I_1}\otimes\cdots\otimes h^{\eta_d}_{I_d}$, $\eta = (\eta_1,\dots,\eta_d) \in \{0,1\}^d$. If $\eta \ne 0$ the Haar function is cancellative: $\int h^\eta_I = 0$. We abuse notation by suppressing the presence of $\eta$, and write $h_I$ for some $h^\eta_I$, $\eta \ne 0$. Notice that for $I \in \mathcal D$ we have $\Delta_I f = \langle f, h_I\rangle h_I$ (where the finite $\eta$ summation is suppressed), $\langle f, h_I\rangle := \int f h_I$.
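The dyadic notation and the one-dimensional Haar system can be sanity-checked numerically. The sketch below (an illustration, not the paper's construction) works on $[0,1)$ with functions that are constant on $2^N$ cells, an assumed discretization; a dyadic interval $[k2^{-j},(k+1)2^{-j})$ is stored as the pair `(j, k)`. It checks $\ell(I^{(k)}) = 2^k\ell(I)$, orthonormality and cancellation of $h^1_I$, and that $f = \langle f\rangle_{[0,1)} + \sum_I \Delta_I f$ with $\Delta_I f = \langle f, h_I\rangle h_I$.

```python
N = 5                 # resolution: functions are constant on the 2**N cells of [0, 1)
CELLS = 2 ** N
CELL_LENGTH = 1.0 / CELLS

# A dyadic interval I = [k 2^-j, (k + 1) 2^-j) is stored as the pair (j, k).


def side_length(I):
    return 2.0 ** (-I[0])


def parent(I, k=1):
    """The k-th dyadic parent I^(k): it contains I and has side length 2^k l(I)."""
    j, m = I
    return (j - k, m >> k)


def indicator(I):
    j, m = I
    width = CELLS >> j  # number of cells in I (requires 0 <= j <= N)
    return [1.0 if m * width <= c < (m + 1) * width else 0.0 for c in range(CELLS)]


def haar(I, eta):
    """h^0_I = |I|^(-1/2) 1_I and h^1_I = |I|^(-1/2) (1_{I_l} - 1_{I_r})."""
    j, m = I
    scale = side_length(I) ** -0.5
    if eta == 0:
        return [scale * x for x in indicator(I)]
    left, right = indicator((j + 1, 2 * m)), indicator((j + 1, 2 * m + 1))
    return [scale * (l - r) for l, r in zip(left, right)]


def pair(f, g):
    """<f, g> = int f g for functions that are constant on the cells."""
    return sum(x * y for x, y in zip(f, g)) * CELL_LENGTH


def martingale_difference(f, I):
    """Delta_I f = <f, h_I> h_I (in d = 1 there is one cancellative Haar function)."""
    h = haar(I, 1)
    c = pair(f, h)
    return [c * x for x in h]


def reconstruct(f):
    """<f>_[0,1) plus the sum of Delta_I f over all dyadic I of generation 0..N-1."""
    mean = pair(f, indicator((0, 0)))
    out = [mean] * CELLS
    for j in range(N):
        for m in range(2 ** j):
            out = [o + d for o, d in zip(out, martingale_difference(f, (j, m)))]
    return out
```

In $d$ dimensions one would take tensor products over $\eta \in \{0,1\}^d \setminus \{0\}$, giving $2^d - 1$ cancellative Haar functions per cube; the one-dimensional case keeps the example short.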

Bi-parameter Variants
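A weight $w(x_1, x_2)$ (i.e. a locally integrable a.e. positive function) belongs to the bi-parameter weight class $A_p(\mathbb R^{d_1}\times\mathbb R^{d_2})$ if
$$[w]_{A_p(\mathbb R^{d_1}\times\mathbb R^{d_2})} := \sup_R \langle w\rangle_R\langle w^{1-p'}\rangle_R^{p-1} < \infty,$$
where the supremum is taken over $R = I^1\times I^2$ and each $I^i \subset \mathbb R^{d_i}$ is a cube. Thus, this is the one-parameter definition with cubes replaced by rectangles.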
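We have that $w \in A_p(\mathbb R^{d_1}\times\mathbb R^{d_2})$ if and only if $w(\cdot, x_2) \in A_p(\mathbb R^{d_1})$ uniformly for a.e. $x_2$ and $w(x_1,\cdot) \in A_p(\mathbb R^{d_2})$ uniformly for a.e. $x_1$, and that
$$\max\Big(\operatorname*{ess\,sup}_{x_1}[w(x_1,\cdot)]_{A_p(\mathbb R^{d_2})},\ \operatorname*{ess\,sup}_{x_2}[w(\cdot,x_2)]_{A_p(\mathbb R^{d_1})}\Big) \le [w]_{A_p(\mathbb R^{d_1}\times\mathbb R^{d_2})},$$
while the constant $[w]_{A_p}$ is dominated by the maximum to some power. For basic bi-parameter weighted theory see e.g. [19]. We also use the bi-parameter class $A_\infty$; it is well-known that $A_\infty = \bigcup_{1<p<\infty} A_p$. We do not have any important use for the $A_\infty$ constant. The $w \in A_\infty$ assumption can always be replaced with the explicit assumption $w \in A_s$ for some $s \in (1,\infty)$, and then estimating everything with a dependence on $[w]_{A_s}$. We denote a general dyadic grid in $\mathbb R^{d_i}$ by $\mathcal D^i$. We denote cubes in $\mathcal D^i$ by $I^i, J^i, K^i$, etc. Thus, our dyadic rectangles take the forms $I^1\times I^2$, $J^1\times J^2$, $K^1\times K^2$, etc. If $A$ is an operator acting on $\mathbb R^{d_1}$, we can always let it act on the product space $\mathbb R^d = \mathbb R^{d_1}\times\mathbb R^{d_2}$ by setting $A^1 f(x) := A(f(\cdot, x_2))(x_1)$. Similarly, we use the notation $A^2 f(x) := A(f(x_1,\cdot))(x_2)$ if $A$ is originally an operator acting on $\mathbb R^{d_2}$. Our basic bi-parameter dyadic operators (martingale differences and averaging operators) are obtained by simply chaining together relevant one-parameter operators. For instance, a bi-parameter martingale difference is $\Delta_R f = \Delta^1_{I^1}\Delta^2_{I^2} f$, $R = I^1\times I^2$. Bi-parameter estimates, such as the square function bound
$$\|S_{\mathcal D} f\|_{L^p(w)} \sim \|f\|_{L^p(w)},$$
where $p \in (1,\infty)$ and $w$ is a bi-parameter $A_p$ weight, are easily obtained using vector-valued versions of the corresponding one-parameter estimates. The required vector-valued estimates, on the other hand, follow simply by extrapolating the obvious weighted $L^2(w)$ estimates.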
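The chaining principle $\Delta_R f = \Delta^1_{I^1}\Delta^2_{I^2} f$ can be checked on a small grid. The sketch below assumes $d_1 = d_2 = 1$, works on $[0,1)^2$ with functions constant on $2^N \times 2^N$ cells, and uses hypothetical helper names; it compares the tensor-product formula $\langle f, h_{I^1}\otimes h_{I^2}\rangle\, h_{I^1}\otimes h_{I^2}$ with the two chained one-parameter operators applied in either order.

```python
import math

N = 3
CELLS = 2 ** N
H = 1.0 / CELLS


def haar1(I):
    """Cancellative one-parameter Haar function h_I on [0, 1), sampled on the cells."""
    j, k = I
    half = CELLS >> (j + 1)          # cells in each child of I
    start = 2 * k * half
    v = [0.0] * CELLS
    for c in range(start, start + half):
        v[c] = 2.0 ** (j / 2)        # |I|^(-1/2) on the left child
    for c in range(start + half, start + 2 * half):
        v[c] = -(2.0 ** (j / 2))     # -|I|^(-1/2) on the right child
    return v


def delta_1d(vec, I):
    h = haar1(I)
    coeff = sum(a * b for a, b in zip(vec, h)) * H
    return [coeff * x for x in h]


def delta_first(F, I1):
    """Delta^1_{I1} F: the one-parameter Delta_{I1} acting in x1 for each fixed x2."""
    columns = [delta_1d([F[a][b] for a in range(CELLS)], I1) for b in range(CELLS)]
    return [[columns[b][a] for b in range(CELLS)] for a in range(CELLS)]


def delta_second(F, I2):
    """Delta^2_{I2} F: the one-parameter Delta_{I2} acting in x2 for each fixed x1."""
    return [delta_1d(row, I2) for row in F]


def delta_rectangle(F, I1, I2):
    """Delta_R F = <F, h_{I1} (x) h_{I2}> h_{I1} (x) h_{I2} for R = I1 x I2."""
    h1, h2 = haar1(I1), haar1(I2)
    coeff = sum(F[a][b] * h1[a] * h2[b] for a in range(CELLS) for b in range(CELLS)) * H * H
    return [[coeff * h1[a] * h2[b] for b in range(CELLS)] for a in range(CELLS)]


def max_difference(F, G):
    return max(abs(x - y) for rf, rg in zip(F, G) for x, y in zip(rf, rg))


if __name__ == "__main__":
    F = [[math.sin(a + 2 * b) + 0.1 * a * b for b in range(CELLS)] for a in range(CELLS)]
    I1, I2 = (1, 1), (2, 0)
    print(max_difference(delta_rectangle(F, I1, I2), delta_first(delta_second(F, I2), I1)))
```

The printed difference is at the level of floating-point rounding, reflecting that the two one-parameter operators act in different variables and therefore commute.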
We now systematically collect maximal function and square function bounds. First, some notation. When we integrate with respect to only one of the parameters we may e.g. write
If $\mathcal D = \mathcal D^1\times\mathcal D^2$ we define the dyadic bi-parameter maximal function
$$M_{\mathcal D} f := \sup_{R\in\mathcal D} \frac{1_R}{|R|}\int_R |f|.$$

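Now define the square functions
$$S_{\mathcal D} f := \Big(\sum_{R\in\mathcal D}|\Delta_R f|^2\Big)^{1/2}, \qquad S^1_{\mathcal D^1} f := \Big(\sum_{I^1\in\mathcal D^1}|\Delta^1_{I^1} f|^2\Big)^{1/2},$$
and define $S^2_{\mathcal D^2} f$ analogously. Define also
and define $P^2_{K^2,k_2}$ similarly. Then, we define $P_{K,k} := P^1_{K^1,k_1}P^2_{K^2,k_2}$.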

Lemma 2.4
For $p \in (1,\infty)$ and a bi-parameter weight $w \in A_p$ we have
and the analogous estimate with $P^2_{K^2,k_2}$. Here $M$ can e.g. be $M^1$.
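Lemma 2.4 concerns weighted $L^p$ bounds; its simplest special case, $p = 2$ and $w = 1$ on $[0,1)$, is the exact identity $\|S_{\mathcal D} f\|_{L^2} = \|f - \langle f\rangle_{[0,1)}\|_{L^2}$ coming from orthonormality of the Haar system. The sketch below checks this one-parameter identity numerically; the cell discretization and all function names are assumptions for illustration only.

```python
import math

N = 6                 # functions are constant on the 2**N cells of [0, 1)
CELLS = 2 ** N
H = 1.0 / CELLS


def haar_coefficient(f, j, k):
    """<f, h_I> for I = [k 2^-j, (k + 1) 2^-j) and h_I = |I|^(-1/2) (1_{I_l} - 1_{I_r})."""
    half = CELLS >> (j + 1)
    start = 2 * k * half
    left = sum(f[start:start + half])
    right = sum(f[start + half:start + 2 * half])
    return 2.0 ** (j / 2) * (left - right) * H


def square_function(f):
    """S_D f = ( sum_I |Delta_I f|^2 )^(1/2) with Delta_I f = <f, h_I> h_I.

    On every cell of I we have |h_I|^2 = 1/|I| = 2^j.
    """
    s = [0.0] * CELLS
    for j in range(N):
        width = CELLS >> j
        for k in range(2 ** j):
            contribution = haar_coefficient(f, j, k) ** 2 * 2.0 ** j
            for c in range(k * width, (k + 1) * width):
                s[c] += contribution
    return [math.sqrt(x) for x in s]


def l2_norm(f):
    return math.sqrt(sum(x * x for x in f) * H)


if __name__ == "__main__":
    f = [math.cos(7 * c) + c % 3 for c in range(CELLS)]
    mean = sum(f) / CELLS
    print(l2_norm(square_function(f)), l2_norm([x - mean for x in f]))  # equal
```

The weighted and bi-parameter versions are not equalities and need the $A_p$ theory; this example only illustrates what $S_{\mathcal D}$ measures.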

Bi-parameter Singular Integrals
Bi-parameter SIOs
We say that $\omega$ is a modulus of continuity if it is an increasing and subadditive function with $\omega(0) = 0$. A relevant quantity is the modified Dini condition
$$\|\omega\|_{\mathrm{Dini}_\alpha} := \int_0^1 \omega(t)\Big(1+\log\frac1t\Big)^{\alpha}\frac{dt}{t}, \qquad \alpha \ge 0. \tag{3.1}$$
In practice, the quantity (3.1) arises as follows:
$$\sum_{k=1}^\infty \omega(2^{-k})k^\alpha \lesssim \|\omega\|_{\mathrm{Dini}_\alpha}.$$
For many standard arguments $\alpha = 0$ is enough. For the $T1$ type arguments we will always need $\alpha = 1/2$. Some further applications can require a higher $\alpha$.
Let $\mathbb R^d = \mathbb R^{d_1}\times\mathbb R^{d_2}$ and consider an $n$-linear operator $T$ on $\mathbb R^d$. We define what it means for $T$ to be an $n$-linear bi-parameter SIO. Let $\omega_i$, $i = 1, 2$, be moduli of continuity. First, we set up notation for the adjoints of $T$. We let $T^{j*}$, $j \in \{0,\dots,n\}$, denote the full adjoints, i.e., $T^{0*} = T$ and otherwise
$$\langle T(f_1,\dots,f_n), f_{n+1}\rangle = \langle T^{j*}(f_1,\dots,f_{j-1},f_{n+1},f_{j+1},\dots,f_n), f_j\rangle.$$
A subscript 1 or 2 denotes a partial adjoint in the given parameter; for example, we define
Finally, we can also take partial adjoints with respect to different parameters in different slots; in that case we denote the adjoint by $T^{j_1*,j_2*}_{1,2}$. It simply interchanges the functions $f^1_{j_1}$ and $f^1_{n+1}$ and the functions $f^2_{j_2}$ and $f^2_{n+1}$. Of course, we e.g. have $T^{j*,j*}_{1,2} = T^{j*}$ and $T^{0*,j*}_{1,2} = T^{j*}_2$, so everything can be obtained, if desired, with the most general notation $T^{j_1*,j_2*}_{1,2}$. In any case, there are $(n+1)^2$ adjoints (including $T$ itself). Similarly, the dyadic model operators that we later define always have $(n+1)^2$ different forms.
Full Kernel Representation
Here we assume that given $m \in \{1,2\}$ there exist $j_1, j_2 \in \{1,\dots,n+1\}$ so that $\operatorname{spt} f^m_{j_1} \cap \operatorname{spt} f^m_{j_2} = \emptyset$. In this case we demand that
$$\langle T(f_1,\dots,f_n), f_{n+1}\rangle = \int_{\mathbb R^{(n+1)d}} K(x_{n+1},x_1,\dots,x_n)\prod_{j=1}^{n+1} f_j(x_j)\,dx,$$
where $f_j = f^1_j\otimes f^2_j$ and $K$ is a kernel satisfying a set of estimates which we specify next. The kernel $K$ is assumed to satisfy the size estimate
We also require the following continuity estimates, which we continue to refer to as Hölder estimates despite the general continuity moduli. For example, we require that
Of course, we also require all the other natural symmetric estimates, where $c^1$ can be in any of the given $n+1$ slots and similarly for $c^2$.
There are, of course, $(n+1)^2$ different estimates.
Finally, we require the following mixed Hölder and size estimates. For example, we ask that
Again, we also require all the other natural symmetric estimates.
Partial Kernel Representations
Suppose now only that there exist $j_1, j_2 \in \{1,\dots,n+1\}$ so that $\operatorname{spt} f^1_{j_1}\cap\operatorname{spt} f^1_{j_2} = \emptyset$. Then we assume that
$$\langle T(f_1,\dots,f_n), f_{n+1}\rangle = \int_{\mathbb R^{(n+1)d_1}} K_{(f^2_j)}(x^1_{n+1},x^1_1,\dots,x^1_n)\prod_{j=1}^{n+1} f^1_j(x^1_j)\,dx^1,$$
where $K_{(f^2_j)}$ is a one-parameter $\omega_1$-Calderón-Zygmund kernel as e.g. in [17] but with a constant depending on the fixed functions $f^2_1,\dots,f^2_{n+1}$. For example, this means that the size estimate takes the form
The continuity estimates are analogous. We assume the following $T1$ type control on the constant $C(f^2_1,\dots,f^2_{n+1})$. We have
$$C(1_{I^2},\dots,1_{I^2}) \lesssim |I^2|$$
and
$$C(a_{I^2},1_{I^2},\dots,1_{I^2}) + C(1_{I^2},a_{I^2},1_{I^2},\dots,1_{I^2}) + \dots + C(1_{I^2},\dots,1_{I^2},a_{I^2}) \lesssim |I^2|$$
for all cubes $I^2 \subset \mathbb R^{d_2}$ and all functions $a_{I^2}$ satisfying $a_{I^2} = 1_{I^2}a_{I^2}$, $|a_{I^2}| \le 1$ and $\int a_{I^2} = 0$. An analogous partial kernel representation in the second parameter is assumed when $\operatorname{spt} f^2_{j_1}\cap\operatorname{spt} f^2_{j_2} = \emptyset$ for some $j_1, j_2$.
Definition 3.4 If $T$ is an $n$-linear operator with full and partial kernel representations as defined above, we call $T$ an $n$-linear bi-parameter $(\omega_1, \omega_2)$-SIO.
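The modified Dini quantity (3.1) is what controls dyadic sums of the form $\sum_k \omega(2^{-k})k^\alpha$. For increasing $\omega$ and $\alpha \ge 0$ one has the elementary comparison $\sum_{k\ge1}\omega(2^{-k})k^\alpha \le (\log 2)^{-\alpha-1}\|\omega\|_{\mathrm{Dini}_\alpha}$; the explicit constant is derived in the docstring below and is our own bookkeeping, not taken from the source. The sketch checks the inequality on test moduli whose Dini norms have closed forms (these examples are assumptions chosen for convenience).

```python
import math

LOG2 = math.log(2.0)


def dyadic_sum(omega, alpha, terms=1000):
    """Partial sum of sum_{k >= 1} omega(2^-k) k^alpha (all terms are nonnegative)."""
    return sum(omega(2.0 ** -k) * k ** alpha for k in range(1, terms + 1))


def comparison_constant(alpha):
    """C_alpha = (log 2)^(-alpha - 1).

    For increasing omega and t in [2^-k, 2^(-k+1)]: omega(2^-k) <= omega(t) and
    k log 2 <= 1 + (k - 1) log 2 <= 1 + log(1/t). Each such interval has dt/t-measure
    log 2, so sum_k omega(2^-k) k^alpha <= C_alpha ||omega||_{Dini_alpha}.
    """
    return LOG2 ** (-alpha - 1)


# (omega, alpha, closed form of ||omega||_{Dini_alpha}), computed via t = exp(-s):
#   t^gamma:                  alpha = 1 -> 1/gamma + 1/gamma^2
#   (1 + log(1/t))^(-beta):   1 / (beta - alpha - 1) when beta > alpha + 1
cases = [
    (lambda t: t, 0.0, 1.0),
    (lambda t: t, 1.0, 2.0),
    (lambda t: t ** 0.5, 1.0, 2.0 + 4.0),
    (lambda t: (1.0 + math.log(1.0 / t)) ** -3.0, 0.5, 1.0 / 1.5),
]

if __name__ == "__main__":
    for omega, alpha, dini in cases:
        print(f"{dyadic_sum(omega, alpha):.4f} <= {comparison_constant(alpha) * dini:.4f}")
```

This is why $\mathrm{Dini}_{1/2}$ is the natural threshold once the model operator bounds grow like the square root of the complexity $k$.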

Bi-parameter CZOs
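We say that $T$ satisfies the weak boundedness property if
An SIO $T$ satisfies the diagonal BMO assumption if the following holds. For all rectangles $R = I^1\times I^2$ we have
and
The product BMO space originates from Chang and Fefferman [5,6], and it is the right bi-parameter BMO space for many considerations. An SIO $T$ satisfies the product BMO assumption if $S1 \in \mathrm{BMO}_{\mathrm{prod}}$ for all the $(n+1)^2$ adjoints $S = T^{j_1*,j_2*}_{1,2}$. Here $S1 := S(1,\dots,1)$. This can be interpreted in the sense that
$$\|S1\|_{\mathrm{BMO}_{\mathrm{prod}}} = \sup_{\mathcal D = \mathcal D^1\times\mathcal D^2}\ \sup_{\Omega}\Big(\frac{1}{|\Omega|}\sum_{\substack{R\in\mathcal D\\ R\subset\Omega}}|\langle S1, h_R\rangle|^2\Big)^{1/2} < \infty,$$
where $h_R = h_{I^1}\otimes h_{I^2}$, the supremum is over all dyadic grids $\mathcal D^i$ on $\mathbb R^{d_i}$ and open sets $\Omega \subset \mathbb R^d = \mathbb R^{d_1}\times\mathbb R^{d_2}$ with $0 < |\Omega| < \infty$, and the pairings $\langle S1, h_R\rangle$ can be defined, in a natural way, using the kernel representations.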
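The product BMO norm is a supremum, over open sets $\Omega$, of a Carleson-type quantity built from the Haar coefficients $\langle b, h_R\rangle$. The sketch below evaluates that quantity for one fixed $\Omega$ only; it assumes $d_1 = d_2 = 1$, the standard dyadic grid restricted to $[0,1)^2$, an $\Omega$ given as a union of grid cells, and coefficients supplied directly as a dictionary. It is therefore a lower bound for the norm, not the norm itself, and the names are hypothetical.

```python
import math

N = 4                 # standard dyadic grid on [0, 1)^2, resolved down to 2**N x 2**N cells
CELLS = 2 ** N


def cells_of(I1, I2):
    """Grid cells covered by R = I1 x I2, where I = (j, k) ~ [k 2^-j, (k + 1) 2^-j)."""
    (j1, k1), (j2, k2) = I1, I2
    w1, w2 = CELLS >> j1, CELLS >> j2
    return {(a, b) for a in range(k1 * w1, (k1 + 1) * w1)
            for b in range(k2 * w2, (k2 + 1) * w2)}


def carleson_quantity(coeffs, omega_cells):
    """( |Omega|^-1 * sum_{R subset Omega} |<b, h_R>|^2 )^(1/2) for one fixed Omega.

    coeffs maps (I1, I2) to the Haar coefficient <b, h_{I1} (x) h_{I2}>;
    omega_cells is a set of grid cells whose union is Omega.
    """
    area = len(omega_cells) / CELLS ** 2
    total = sum(abs(c) ** 2 for (I1, I2), c in coeffs.items()
                if cells_of(I1, I2) <= omega_cells)
    return math.sqrt(total / area)


if __name__ == "__main__":
    R0 = ((1, 0), (2, 3))
    print(carleson_quantity({R0: 1.0}, cells_of(*R0)))  # 1 / sqrt(|R0|) = sqrt(8)
```

Allowing arbitrary open sets $\Omega$ (not only rectangles) is what distinguishes product BMO from the smaller "rectangular" BMO; an L-shaped $\Omega$ such as the union of $[0,1/2)^2$ and $[1/2,1)\times[0,1/4)$ can be passed in the same way.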

Bi-parameter Model Operators
For hybrid operators we will use suggestive notation, such as $(S\pi)_i$, to denote a bi-parameter operator that behaves like an ordinary $n$-linear shift $S_i$ in the first parameter and like an $n$-linear paraproduct $\pi$ in the second; but this is just notation, and our operators are not of tensor product form.
An $n$-linear bi-parameter shift $S_i$ takes the form
and
Here we assume that for $m \in \{1,2\}$ there exist two indices
and for the remaining
We continue by defining modified shifts; they are important for the weak kernel regularity. Let
for some $j_1, j_2$. Moreover, $a_{K,(R_j)} = a_{K,R_1,\dots,R_{n+1}}$ is a scalar satisfying the usual normalization (3.8).
We now define the hybrid operators that behave like a modified shift in one of the parameters and like a standard shift in the other. A modified/standard $n$-linear
Partial Paraproducts
Partial paraproducts are hybrids of $\pi$ and $S$ or of $\pi$ and $Q$.
where the functions $h_{I^1_j}$ and $u_{j,K^2}$ satisfy the following. There are $j_0$,
There is $j_2 \in \{1,\dots,n+1\}$ so that $u_{j_2,K^2} = h_{K^2}$, and for the remaining indices $j \ne j_2$ we have $u_{j,K^2} = 1_{K^2}/|K^2|$. Moreover, the coefficients are assumed to satisfy
Of course, $(\pi S)_i$ is defined symmetrically. A modified $n$-linear partial paraproduct $(Q\pi)_k$ with the paraproduct component on $\mathbb R^{d_2}$ takes the form
for $j = j_0$, and $u_{j,K^2}$ are as in (3.10). The constants satisfy the same normalization.
Full Paraproducts
An $n$-linear bi-parameter full paraproduct takes the form
where the functions $u_{j,K^1}$ and $u_{j,K^2}$ are as in (3.10). The coefficients are assumed to satisfy

Comparison to the Usual Model Operators
The modified model operators can be written as suitable sums of the standard operators. This is practical when one is willing to lose $1/2$ of kernel regularity, or if some estimates are too difficult to carry out for the more complicated modified operators. However, some regularity is always lost if this decomposition is used, so it is preferable to do without it when possible. To communicate the gist we only give the following formulation.

Let $Q_k$ be a modified $n$-linear bi-parameter shift. Then
Similarly, a modified/standard shift can be represented using standard shifts and a modified partial paraproduct can be represented using standard partial paraproducts.
Proof For notational convenience we consider a shift $Q_k$ of the particular form
There is no essential difference in the general case.
We define We can write the shift with these similarly as in (3.12) just by replacing a with b and A with B.
For the moment we define the following shorthand. For a cube $I$ and integers $l, j_0 \in \{1, 2, \dots\}$ we define
where id denotes the identity operator. Let $R_1,\dots,R_{n+1}$ be as in the summation of $Q_k$. We use the above notation in both parameters, and we denote this, as usual, with superscripts: $D^1_{I,l}(j, j_0)$ and $D^2_{I,l}(j, j_0)$. With some work (we omit the details) it can be shown that
Also, we have that
and $B_{n+1,n+1}$
Finally, we write that
Using the above decompositions we have the identity
The terms indexed by $m_1, m_2 \in \{1,\dots,n\}$ and the terms inside the parentheses will be written as sums of standard shifts.
for j ∈ {m 2 + 1, . . . , n}. Using this notation we have that the term inside the brackets is n j=1 g j K 1 − n j=1 g j I 1

n+1
. We write that Then, we write n j=1 g j ( This identity splits 1,2 m 2 ,i 2 further as 1,2
We fix some $m_1$ and $i_1$ and consider the corresponding term. For convenience of notation we look at the case $m_1 = m_2 =: m$. There holds that
This is seen to be a standard shift once we reorganize the summation and verify the normalization. We take $(I^1_{n+1})^{(i_1+1)}$ as the new "top cube" in the first parameter ($(I^1_{n+1})^{(i_1+1)}$ corresponds to $(L^1)^{(1)}$ in the summation below). There holds that
We have the estimate
Notice that the term in the first line on the right-hand side is $2^{d_1(n-m)/2}$ times the right normalization of the shift, since in the term indexed by $(m, m, i_1, i_2)$ we have the cubes $L^1$ related to $f_j$ with $j \in \{m+1,\dots,n\}$. Also, the term in the second line almost cancels out when one changes the averages in that term into pairings against non-cancellative Haar functions.
We conclude that for some $C \ge 1$ we have
where $S$ is a standard $n$-linear bi-parameter shift of the given complexity. The case of general $m_1, m_2$ is analogous.
Finally, we look at the remaining combination of terms with index $n+1$, which by definition is given by (3.14). Consider the rectangles $K, R_1,\dots,R_{n+1}$ as fixed for the moment. There holds that
The sum of (3.15) and (3.16) can similarly be split as
When one recalls the definition of the functions $g^{m_1,i_1}_j$ and writes this in terms of the functions $f_j$, one has that in the first parameter $f_j$ is paired with
Each $f_j$ is paired similarly in the second parameter. In the case $m_1 = m_2 =: m$ the summand in (3.17) can be written as
The splitting in (3.17) expresses this remaining combination as a sum of terms indexed by $(m_1, m_2, i_1, i_2)$. We fix some $i_1$ and $i_2$ and consider the case $m_1 = m_2 =: m$. From (3.18) we see that
The coefficient satisfies the estimate
Thus, we see that $C^{-1}$ times the term indexed by $(m, m, i_1, i_2)$ is a standard $n$-linear bi-parameter shift. The complexity of the shift is $((0,0),\dots,(0,0),(1,1),\dots,(1,1),(i_1+1,i_2+1))$, where $(0,0)$ appears $m$ times. The case of general $m_1$ and $m_2$ is analogous.

Bi-parameter Representation Theorem
We set
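and denote the expectation over the product probability space by $\mathbb E_\sigma = \mathbb E_{\sigma_1}\mathbb E_{\sigma_2}$. We also set $\mathcal D_0 = \mathcal D^1_0\times\mathcal D^2_0$, where $\mathcal D^i_0$ is the standard dyadic grid of $\mathbb R^{d_i}$. We use the notation
Given $\sigma = (\sigma_1, \sigma_2)$ and $R = I^1\times I^2 \in \mathcal D_0$ we set
Proof We decompose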
The Main Terms
For $j_1, j_2$ we let
These are symmetric and we choose to deal with the term corresponding to $(j_1, j_2) = (n, n+1)$. After collapsing the relevant sums we have
Using this notation we write
(3.23)
where inside the brackets we have the corresponding term as in (3.21) and (3.22). The identity (3.23) splits this term into four terms.
The Shift Case
We begin by looking at the first term, that is, the term coming from $[\,\cdot\,]$ in (3.23). Let us further define the abbreviation
If $I^1_i = I^1_j$ for all $i, j$ or $I^2_i = I^2_j$ for all $i, j$, then $\varphi_{R_1,\dots,R_{n+1}} = 0$. Thus, there holds that
As in [17] we say that $I \in \mathcal D^i_\sigma$ is $k$-good for $k \ge 2$, and denote this by
Notice that for all $I \in \mathcal D^i_0$ we have
Next, we consider the expectation $\mathbb E_\sigma$ of the first term and add goodness to the rectangles $R$. Recall that $\mathbb E_\sigma = \mathbb E_{\sigma_1}\mathbb E_{\sigma_2}$. We write $\mathcal D_{\sigma,\mathrm{good}}(k_1,k_2) := \mathcal D_{\sigma_1,\mathrm{good}}(k_1)\times\mathcal D_{\sigma_2,\mathrm{good}}(k_2)$. There holds that
Therefore, we have shown that
and $C$ is a large enough constant. Let $m_1,\dots,m_{n+1}$ and $R = I^1\times I^2$ be as in the definition of $Q_{k_1,k_2}$. The goodness of the rectangle $R$ easily implies (we omit the details, see [17]) that $(R+m_j)^{(k_1,k_2)} = R^{(k_1,k_2)} =: K$ for all $j \in \{1,\dots,n+1\}$. Recall the definition of $\varphi_{R+m_1,\dots,R+m_{n+1}}$ from (3.24). Therefore, to conclude that $Q_{k_1,k_2}$ is a modified bi-parameter $n$-linear shift it remains to prove the normalization
Let us first assume that $k_1 \sim 1 \sim k_2$. Since $m^1_i \ne 0$ and $m^2_j \ne 0$ for some $i$ and $j$, we may use the full kernel representation of $T$ to see that the left-hand side of (3.27) is at most
Applying the size estimate of the kernel $K$ this is further dominated by
Notice that this is the right estimate, since $\omega_i(2^{-k_i}) \sim 1$ and $|K| = |R^{(k_1,k_2)}| \sim |R| = |I^1||I^2|$.
Suppose then that $k_1$ and $k_2$ are large enough so that we can use the continuity assumption of the full kernel $K$. Using the zero integrals of $h_{I^1}$ and $h_{I^2}$, the left-hand side of (3.27) equals
where $c_{I^i}$ denotes the center of the corresponding cube. Here one can use the continuity assumption of $K$, which leads to a product of two one-parameter integrals that can be easily estimated.
What remains is the case that, for example, $k_1 \sim 1$ and $k_2$ is large. This is handled similarly to the above two cases using the mixed size and continuity assumption of $K$. This concludes the proof of (3.27), and we are done with the expectation of the first term.
The Partial Paraproduct Cases
Next, we look at the expectations $\mathbb E_\sigma$ of the second and third terms, which are symmetric. We explicitly consider the second term here. Recall that it equals
(3.29)
Let us write the summand in (3.29) as $\varphi_{I^1_1,\dots,I^1_{n+1},I^2}$. By proceeding in the same way as above with the first term we have that
The $k$-goodness of $I^1$ implies that here $(I^1+m_j)^{(k)} = (I^1)^{(k)} =: K^1$ for all $j$. Therefore, to conclude that $(Q\pi)_k$ is a modified partial paraproduct with the paraproduct component in $\mathbb R^{d_2}$, it remains to show that if we fix $m_1,\dots,m_{n+1}$ and $I^1$ as in the above sum, then
(3.31)
We verify the above BMO condition by taking a cube $I^2$ and a function $a_{I^2}$ such that $a_{I^2} = a_{I^2}1_{I^2}$, $|a_{I^2}| \le 1$ and $\int a_{I^2} = 0$, and showing that
(3.32)
For a suitably large constant $C$ (so that we can use the continuity assumption of the kernel below) we split the pairing as
Let us show that the first term in (3.33) is dominated by
We have two cases. The case $k \sim 1$ is handled with the mixed size and continuity assumption of $K$. The case that $k$ is large is handled with the continuity assumption of $K$. We show the details for the case $k \sim 1$; the other case is done similarly (see also the paragraph containing (3.28)). We assume that $k \sim 1$. Since $a_{I^2}$ has zero integral, the pairing that we are estimating equals (by definition)
The mixed size and continuity property of $K$ implies that the absolute value of the last integral is dominated by
The integral related to $\mathbb R^{d_1}$ is dominated by $|I^1|^{-(n-1)/2}$. Consider the integral related to $\mathbb R^{d_2}$.
By first estimating that
with some work we see that the integral over $\mathbb R^{(n+1)d_2}$ is dominated by
In conclusion, we showed that the first term in (3.33) is dominated by $|I^1|^{-(n-1)/2}|I^2|$, which is the right estimate in the case $k \sim 1$. We turn to the second term in (3.33). We again split it into two by writing $1 = 1_{(CI^2)^c} + 1_{CI^2}$ in the second slot. The part with $1_{(CI^2)^c}$ is estimated in the same way as above, and then one continues with the part related to $1_{CI^2}$. This is repeated until we are only left with the term
The estimate for this uses the partial kernel representations of $T$. Again, we have the two cases that either $k \sim 1$ or $k$ is large. These are handled in the same way using either the size or the continuity of the partial kernels. We consider explicitly the case that $k$ is large. Using the zero integral of $h_{I^1}$ we have that the above pairing equals
Taking absolute values and using the continuity of the partial kernel leads to
By assumption there holds that $C(1_{CI^2},\dots,1_{CI^2},a_{I^2}) \lesssim |I^2|$, and the integral is dominated by $\omega_1(2^{-k})|I^1|^{(n+1)/2}/|K^1|^n$. This concludes the proof of (3.32) and also finishes our treatment of the second term.
The Full Paraproduct Case
The expectation $\mathbb E_\sigma$ of the fourth term equals
This is directly a full paraproduct, as its coefficients are pairings of the form $\langle S(1,\dots,1), h_R\rangle$ with $S$ an adjoint of $T$, and so we are done with this term. Therefore, we are done with the main terms, and no more full paraproducts will appear.
The Remainder $\mathrm{Rem}_\sigma$
To finish the proof of the bi-parameter representation theorem it remains to discuss the remainder term $\mathrm{Rem}_\sigma$. Some of the weak boundedness type assumptions are used here, but there is nothing surprising in how they are used, and we do not focus on that. We only explain the structural idea. An $(n+1)$-tuple $(I^i_1,\dots,I^i_{n+1})$ of cubes $I^i_j \in \mathcal D^i_\sigma$ belongs to $\mathcal I^i_\sigma$ if the following holds: if $j$ is an index such that $\ell(I^i_j) \le \ell(I^i_k)$ for all $k$, then there exists at least one index $k_0 \ne j$ so that $\ell(I^i_j) = \ell(I^i_{k_0})$.
The remainder term can be written as
where as usual $R_i = I^1_i\times I^2_i$. Let us write this as
First, we look at the terms $\mathrm{Rem}^1_{\sigma,j_1}$ and $\mathrm{Rem}^2_{\sigma,j_2}$, which are analogous. Consider for example $\mathrm{Rem}^1_{\sigma,n+1}$. We further divide $\mathcal I^2_\sigma$ into subcollections by specifying the slots where the smallest cubes are. For example, we consider here the part of the sum with the tuples $(I^2_1,\dots,I^2_{n+1})$ such that $\ell(I^2_i) > \ell(I^2_n) = \ell(I^2_{n+1})$ for all $i = 1,\dots,n-1$. By collapsing the relevant sums of martingale differences the term we are dealing with can be written as
In the first parameter there is only one martingale difference and in the second parameter there are two (in the general case at least two). Thus, the strategy is to write this in terms of model operators that have a modified shift or a paraproduct structure in the first parameter and a standard shift structure in the second parameter. We omit the details. Finally, we consider $\mathrm{Rem}^3_\sigma$. This is also divided into several cases by specifying the places of the smallest cubes in both parameters. For example, for notational convenience we take the part where $\ell(I^1_1) = \ell(I^1_{n+1}) < \ell(I^1_i)$ and $\ell(I^2_1) = \ell(I^2_{n+1}) < \ell(I^2_i)$ for all $i = 2,\dots,n$. Notice that in general the places and the number of the smallest cubes do not need to be the same in both parameters. After collapsing the relevant sums of martingale differences the term we are looking at is
(3.36)
Here we have two (in the general case at least two) martingale differences in each parameter, so this will be written in terms of standard bi-parameter $n$-linear shifts. We omit the details. This completes the proof.
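Membership in the remainder collection $\mathcal I^i_\sigma$ depends only on the side lengths of the cubes in the tuple: the defining condition says exactly that the minimal side length is attained at least twice. The sketch below (an illustration; the function names are hypothetical) implements both this reformulation and the literal definition, checks that they agree on all small tuples, and records which slots hold the smallest cubes, which is how the text splits $\mathcal I^i_\sigma$ into subcollections.

```python
from itertools import product


def in_remainder_collection(side_lengths):
    """True iff the smallest side length in the tuple is attained at least twice."""
    smallest = min(side_lengths)
    return side_lengths.count(smallest) >= 2


def literal_definition(side_lengths):
    """Whenever l(I_j) <= l(I_k) for all k, some k0 != j has l(I_k0) = l(I_j)."""
    n1 = len(side_lengths)
    for j in range(n1):
        if all(side_lengths[j] <= s for s in side_lengths):
            if not any(k0 != j and side_lengths[k0] == side_lengths[j] for k0 in range(n1)):
                return False
    return True


def smallest_slots(side_lengths):
    """0-based slots holding the smallest cubes (used to split into subcollections)."""
    smallest = min(side_lengths)
    return tuple(j for j, s in enumerate(side_lengths) if s == smallest)


def agree_on_all_tuples(n, lengths=(1.0, 0.5, 0.25)):
    """Compare both formulations on every (n + 1)-tuple of the given side lengths."""
    return all(in_remainder_collection(list(t)) == literal_definition(list(t))
               for t in product(lengths, repeat=n + 1))


if __name__ == "__main__":
    print(in_remainder_collection([1.0, 0.5, 0.5]), smallest_slots([1.0, 0.5, 0.5]))
```

In this 0-based indexing, the subcollection singled out in the text for $\mathrm{Rem}^1_{\sigma,n+1}$ (smallest cubes exactly in the slots $n$ and $n+1$) corresponds to `smallest_slots(...) == (n - 1, n)`.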
Corollaries
We indicate some corollaries, starting with the most basic unweighted boundedness in the Banach range of exponents.

Suppose that $Q_k$ is a modified $n$-linear bi-parameter shift. Then the estimate
Suppose that $(QS)_{k,i}$ is a modified/standard shift (here $k \in \{1, 2, \dots\}$ and $i = (i_1,\dots,i_{n+1})$). Then the estimate
Proof We only prove the statement for the operator $Q_k$; this essentially contains the proof for $(QS)_{k,i}$.
We assume $Q_k$ has the explicit form
Using the notation (3.13) there holds that
We do the same decomposition with the other three terms inside the bracket $[\,\cdot\,]$. This splits $[\,\cdot\,]$ into a sum over $m_1, m_2 \in \{1,\dots,n+1\}$. Then, we notice that all the terms in the sum with $m_1 = n+1$ or $m_2 = n+1$ cancel out. Thus, we get a splitting of $\langle Q_k(f_1,\dots,f_n), f_{n+1}\rangle$ into a sum over $m_1, m_2 \in \{1,\dots,n\}$. The terms corresponding to different pairs $(m_1, m_2)$ are estimated separately.
In what follows, for notational convenience, we focus on the case $m_1 = m_2 =: m \in \{1,\dots,n\}$, and we define $D^1_k(j, m)$.
The term in the splitting of $\langle Q_k(f_1,\dots,f_n), f_{n+1}\rangle$ corresponding to $m = m_1 = m_2$ can be written as the sum
and $U_2$, $U_3$ and $U_4$ are defined similarly just by replacing $h^0_{R_j}$, $j \in \{1,\dots,n\}$, by $h^0$ and $h^0_{R_{n+1}}$, respectively. With some direct calculations it can be shown that for all $i \in \{1,\dots,4\}$ we have
Next, we look at the modified partial paraproducts. We will use the well-known one-parameter $H^1$-BMO duality estimate
$$\sum_I |a_I||b_I| \lesssim \|(a_I)\|_{\mathrm{BMO}}\Big\|\Big(\sum_I |b_I|^2\frac{1_I}{|I|}\Big)^{1/2}\Big\|_{L^1}, \qquad \|(a_I)\|_{\mathrm{BMO}} := \sup_{J}\Big(\frac{1}{|J|}\sum_{I\subset J}|a_I|^2\Big)^{1/2}, \tag{3.39}$$
where the cubes $I$ and $J$ are in some dyadic grid.
We consider $U_1$ first. From the one-parameter $H^1$-BMO duality estimate (3.39) we have that, with fixed $K^1$ and $I^1_1,\dots,I^1_{n+1}$, the sum over $K^2$ of the absolute value of the summand in (3.41) is dominated by
The sum of this over $K^1$ and $I^1_1,\dots,I^1_{n+1}$ such that $(I^1_j)^{(k)} = K^1$ is at most
Notice that the square function related to $f_{n+1}$ is just the bi-parameter square function $S_{\mathcal D}$. To finish the estimate it remains to use the Fefferman-Stein inequality and square function estimates, see Lemma 2.4. The second term $|\langle U_2(f_1,\dots,f_n), f_{n+1}\rangle|$ satisfies the same upper bound (3.42), and can therefore be estimated in the same way. The proof is concluded.
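The one-parameter $H^1$-BMO duality estimate pairs a Carleson-type BMO norm of one coefficient sequence with the $L^1$ norm of a dyadic square function of the other. The sketch below computes both sides on the dyadic intervals of $[0,1)$ down to a finite depth (an assumed truncation; names are hypothetical). The two test cases, a single interval and the constant sequence, are examples where the two sides coincide, showing that the estimate cannot be improved beyond its constant; no claim about the optimal constant in general is made.

```python
import math

N = 6                 # dyadic intervals of generation 0..N-1 in [0, 1); 2**N cells
CELLS = 2 ** N


def all_intervals():
    return [(j, k) for j in range(N) for k in range(2 ** j)]


def contains(J, I):
    """Is the dyadic interval I = (j, k) ~ [k 2^-j, (k + 1) 2^-j) contained in J?"""
    (jJ, kJ), (jI, kI) = J, I
    return jI >= jJ and (kI >> (jI - jJ)) == kJ


def bmo_norm(a):
    """||(a_I)||_BMO = sup_J ( |J|^-1 sum_{I subset J} |a_I|^2 )^(1/2)."""
    best = 0.0
    for J in all_intervals():
        mass = sum(abs(v) ** 2 for I, v in a.items() if contains(J, I))
        best = max(best, math.sqrt(mass * 2.0 ** J[0]))
    return best


def h1_norm(b):
    """|| ( sum_I |b_I|^2 1_I / |I| )^(1/2) ||_{L^1}, evaluated cell by cell."""
    total = 0.0
    for c in range(CELLS):
        s = sum(abs(v) ** 2 * 2.0 ** j for (j, k), v in b.items() if (c >> (N - j)) == k)
        total += math.sqrt(s) / CELLS
    return total


def pairing(a, b):
    """sum_I |a_I| |b_I|, the left-hand side of the duality estimate."""
    return sum(abs(v) * abs(b.get(I, 0.0)) for I, v in a.items())


if __name__ == "__main__":
    ones = {I: 1.0 for I in all_intervals()}
    print(pairing(ones, ones), bmo_norm(ones) * h1_norm(ones))  # both 2**N - 1
```

In the proof above, the roles are played by the coefficients of the modified partial paraproduct (in the BMO slot) and by Haar pairings of the functions $f_j$ (in the square function slot).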
The above, together with known estimates for standard operators, directly leads to Banach range boundedness of $n$-linear bi-parameter $(\omega_1, \omega_2)$-CZOs with $\omega_i \in \mathrm{Dini}_{1/2}$. We do not push this further in this paper. For state-of-the-art estimates with genuinely multilinear weights (in the full multilinear range) see [31]. There we recorded some of the estimates with $\mathrm{Dini}_1$ using the above representation theorem and the decomposition of modified operators in terms of standard operators.
We are unable to perform the estimates of [31] with the regularity $\mathrm{Dini}_{1/2}$. However, the linear case is special: weighted estimates of linear modified model operators with a bound depending on the square root of the complexity are easy. Notice that in principle we have already done all the necessary work. For example, if we want to estimate $\|Q_k f\|_{L^p(w)}$, we study the unweighted pairings $\langle Q_k f, g\rangle$. Then, we proceed as in the linear case of Proposition 3.37. Depending on the form of the shift this leads us to terms corresponding to (3.38) such as
By Hölder's inequality this is at most
Proposition 3.43 For every $p \in (1,\infty)$ and bi-parameter $A_p$ weight $w$ we have
For completeness, we record the corresponding result for CZOs. Again, for multilinear weighted estimates with the optimal weight classes see [31].

Commutator Estimates
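The basic form of a commutator is
$$[b,T]f := bTf - T(bf).$$
We are interested in various iterated versions in the multi-parameter setting and with mild kernel regularity. For a bi-parameter weight $w \in A_2(\mathbb R^{d_1}\times\mathbb R^{d_2})$ and a locally integrable function $b$ we define the weighted product BMO norm
$$\|b\|_{\mathrm{BMO}_{\mathrm{prod}}(w)} := \sup_{\mathcal D}\ \sup_{\Omega}\Big(\frac{1}{w(\Omega)}\sum_{\substack{R\in\mathcal D\\ R\subset\Omega}}\frac{|\langle b, h_R\rangle|^2}{\langle w\rangle_R}\Big)^{1/2}, \tag{4.1}$$
where the supremum is over all dyadic grids $\mathcal D^i$ on $\mathbb R^{d_i}$ and $\mathcal D = \mathcal D^1\times\mathcal D^2$, and over all open sets $\Omega \subset \mathbb R^d := \mathbb R^{d_1}\times\mathbb R^{d_2}$ for which $0 < w(\Omega) < \infty$. The following theorem, which is the two-weight Bloom version of [9], was proved in [29] with $\omega_i(t) = t^{\gamma_i}$.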
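A quick finite-dimensional experiment shows why product BMO is the natural space for the iterated commutator $[T_1,[T_2,b]]$: it vanishes whenever $b(x_1,x_2) = b_1(x_1) + b_2(x_2)$, matching the fact that such $b$ have $\langle b, h_{I^1}\otimes h_{I^2}\rangle = 0$. The sketch uses a hypothetical discrete kernel $1/(i-j)$ as a stand-in for a one-parameter CZO acting in each variable; it illustrates the algebra only and says nothing about the bounds in Theorem 4.2.

```python
import math


def hilbert_matrix(n):
    """Hypothetical discrete stand-in for a one-parameter CZO: K(i, j) = 1/(i - j), i != j."""
    return [[0.0 if i == j else 1.0 / (i - j) for j in range(n)] for i in range(n)]


def act_first(T, F):
    """(T^1 F)(x1, x2) = sum_{y1} T[x1][y1] F[y1][x2]: T acts in the first variable."""
    n, m = len(F), len(F[0])
    return [[sum(T[x1][y1] * F[y1][x2] for y1 in range(n)) for x2 in range(m)]
            for x1 in range(n)]


def act_second(T, F):
    """(T^2 F)(x1, x2) = sum_{y2} T[x2][y2] F[x1][y2]: T acts in the second variable."""
    m = len(F[0])
    return [[sum(T[x2][y2] * row[y2] for y2 in range(m)) for x2 in range(m)] for row in F]


def multiply(b, F):
    return [[bv * fv for bv, fv in zip(rb, rf)] for rb, rf in zip(b, F)]


def subtract(F, G):
    return [[f - g for f, g in zip(rf, rg)] for rf, rg in zip(F, G)]


def commutator_second(T2, b, F):
    """[T^2, b] F = T^2(bF) - b T^2 F."""
    return subtract(act_second(T2, multiply(b, F)), multiply(b, act_second(T2, F)))


def iterated_commutator(T1, T2, b, F):
    """[T^1, [T^2, b]] F = T^1([T^2, b] F) - [T^2, b](T^1 F)."""
    return subtract(act_first(T1, commutator_second(T2, b, F)),
                    commutator_second(T2, b, act_first(T1, F)))


def sup_norm(F):
    return max(abs(x) for row in F for x in row)


n = 6
T = hilbert_matrix(n)
F = [[1.0 + 0.5 * math.sin(i + 2 * j) for j in range(n)] for i in range(n)]
b_sum = [[i * i + math.cos(j) for j in range(n)] for i in range(n)]  # b1(x1) + b2(x2)
b_prod = [[float(i * j) for j in range(n)] for i in range(n)]       # genuinely mixed

if __name__ == "__main__":
    print(sup_norm(iterated_commutator(T, T, b_sum, F)))   # rounding-level
    print(sup_norm(iterated_commutator(T, T, b_prod, F)))  # clearly nonzero
```

For $b(x_1,x_2) = x_1x_2$ one can check by hand that, with this kernel, the iterated commutator becomes $F \mapsto \sum_{y_1\ne x_1}\sum_{y_2\ne x_2}F(y_1,y_2)$, which is why it is far from zero here.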

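Theorem 4.2 Suppose that $T_i$ is a one-parameter $\omega_i$-CZO on $\mathbb R^{d_i}$, where $\omega_i \in \mathrm{Dini}_{3/2}$, $i = 1, 2$. Let $p \in (1,\infty)$, let $\mu, \lambda \in A_p(\mathbb R^d)$ be bi-parameter weights and let $\nu = \mu^{1/p}\lambda^{-1/p} \in A_2(\mathbb R^d)$ be the associated bi-parameter Bloom weight. Then we have
$$\|[T_1,[T_2,b]]\|_{L^p(\mu)\to L^p(\lambda)} \lesssim \|b\|_{\mathrm{BMO}_{\mathrm{prod}}(\nu)}.$$
Proof The argument uses the representation of $T_i$ in terms of one-parameter modified shifts (which have a similar definition as in the bi-parameter case). It seems non-trivial to fully exploit the operators $Q_k$ here, and we content ourselves with splitting the operators into standard shifts and bounding
and other similar terms, where $S_{k_i,j_i}$ is a linear one-parameter shift on $\mathbb R^{d_i}$ of complexity $(k_i, j_i)$.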
On page 11 of [29] it is recorded that
Interestingly, this part of the argument can be improved: there actually holds that
We will get back to this after completing the proof. Therefore, we have
Handling the other terms of the shift expansion, and controlling commutators like $[Q_{k_1},[\pi,b]]$ similarly, we get the claim.
We return to (4.3) now. Decompositions are very involved in the bi-commutator case, and we prefer to give the idea of the improvement (4.3) by studying the simpler one-parameter situation $[b, S_{i,j}]$, where
Here we only have use for the expression on the right-hand side, which is the analogue of the bi-parameter definition (4.1). However, it is customary to define things as on the left-hand side in this one-parameter situation. The equivalence follows from the weighted John-Nirenberg inequality [34].
Of course, one-parameter commutators $[b,T]$ can be handled even with $\mathrm{Dini}_0$, but e.g. sparse domination proofs [25,26] are restricted to the one-parameter setting, unlike these decompositions. To get started, we define the one-parameter paraproducts (with some implicit dyadic grid)
We now decompose the commutator as follows
We have the well-known fact that $\|A_k(b,f)\|_{L^p(\lambda)} \lesssim \|b\|_{\mathrm{BMO}(\nu)}\|f\|_{L^p(\mu)}$ for $k = 1, 2$; this can be seen by using the weighted $H^1$-BMO duality [37]
(with $a_I = \langle b, h_I\rangle$), where
$$\|(a_I)\|_{\mathrm{BMO}(\nu)} = \sup_J\Big(\frac{1}{\nu(J)}\sum_{I\subset J}\frac{|a_I|^2}{\langle\nu\rangle_I}\Big)^{1/2}.$$
Combining this with the well-known estimate
The complexity dependence comes from the remaining term
There are many ways to bound this, but the following way, based on the $H^1$-BMO duality and executed in the particular way that we do below, gives the best dependence that we are aware of:
where we further write
and similarly for $b_I - b_K$. We dualize and e.g. look at
where we used the weighted $H^1$-BMO duality. Here
and we can bound
We are done with the one-parameter case; the desired bi-parameter case can now be done completely similarly by tweaking the proof in [29] using the above idea.

Remark 4.5
The previous way to use the $H^1$-BMO duality was to look at where $l = 0, \ldots, j-1$ is fixed, and to apply the $H^1$-BMO duality to the whole $K, L$ summation. With $l$ fixed this yields a uniform estimate, and there is also a curious 'extra' cancellation present: we can even bound that is, forget the $K, j$ from $g$. Then it remains to sum over $l$, which yields the dependence $j$ instead of $j^{1/2}$. The approach in our proof above is more efficient, and it utilizes all of the cancellation as well.
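The loss of $j$ versus $j^{1/2}$ in Remark 4.5 is, loosely speaking, the difference between summing $j$ uniform bounds and exploiting orthogonality. The following Python sketch is only a toy analogy under that reading, not the argument of the remark: it sums $j$ Haar functions on nested dyadic intervals, each of $L^2$ norm one. The triangle inequality gives the bound $j$, while orthogonality gives the exact norm $j^{1/2}$. The helper names `haar_on_level` and `l2_norm` are our own.

```python
import numpy as np

def haar_on_level(level, n):
    """L^2([0,1))-normalized Haar function of the dyadic interval [0, 2**-level),
    sampled on 2**n equal cells; requires 0 <= level < n."""
    N = 2 ** n
    width = N >> level                            # number of cells in the interval
    h = np.zeros(N)
    h[: width // 2] = 2 ** (level / 2)            # |I|^{-1/2} on the left half
    h[width // 2 : width] = -(2 ** (level / 2))   # -|I|^{-1/2} on the right half
    return h

def l2_norm(v):
    """L^2([0,1)) norm of the step function taking the values v on equal cells."""
    return np.sqrt(np.sum(v ** 2) / len(v))

j = 9
pieces = [haar_on_level(l, j) for l in range(j)]   # j pieces, each of norm 1
triangle_bound = sum(l2_norm(g) for g in pieces)   # summing uniform bounds: j
actual = l2_norm(sum(pieces))                      # orthogonality: j ** 0.5
print(f"triangle inequality: {triangle_bound:.3f}, actual norm: {actual:.3f}")
```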

Remark 4.6
An interesting question is whether we can have $\alpha = 1$ instead of $\alpha = 3/2$ by exploiting the operators $Q_k$ more carefully; this would appear to be the optimal result theoretically obtainable by the current methods. We also note that it is certainly possible to handle higher order commutators, such as,
We will continue with more multi-parameter commutator estimates; the difference to the above is that now even the singular integrals are allowed to be multi-parameter.
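For a weight $w$ on $\mathbb{R}^d := \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$ we say that a locally integrable function $b\colon \mathbb{R}^d \to \mathbb{C}$ belongs to the weighted little BMO space $\mathrm{bmo}(w)$ if
$$\|b\|_{\mathrm{bmo}(w)} := \sup_R \frac{1}{w(R)} \int_R |b - \langle b \rangle_R| < \infty,$$
where the supremum is over rectangles $R = I_1 \times I_2 \subset \mathbb{R}^d$. If $w = 1$ we denote the unweighted little BMO space by bmo. There holds that
$$\|b\|_{\mathrm{bmo}(w)} \sim \max\Big( \operatorname*{ess\,sup}_{x_1 \in \mathbb{R}^{d_1}} \|b(x_1, \cdot)\|_{\mathrm{BMO}(w(x_1, \cdot))},\ \operatorname*{ess\,sup}_{x_2 \in \mathbb{R}^{d_2}} \|b(\cdot, x_2)\|_{\mathrm{BMO}(w(\cdot, x_2))} \Big),$$
see [19]. Here $\mathrm{BMO}(w(x_1, \cdot))$ and $\mathrm{BMO}(w(\cdot, x_2))$ are the one-parameter weighted BMO spaces. For example,
$$\|b(x_1, \cdot)\|_{\mathrm{BMO}(w(x_1, \cdot))} := \sup_{I_2} \frac{1}{w(x_1, I_2)} \int_{I_2} |b(x_1, y_2) - \langle b(x_1, \cdot) \rangle_{I_2}| \, dy_2,$$
where the supremum is over cubes $I_2 \subset \mathbb{R}^{d_2}$ and $w(x_1, I_2) := \int_{I_2} w(x_1, y_2)\, dy_2$.

The following theorem was proved in [28] with $\omega_i(t) = t^{\gamma_i}$. The first order case $[b, T]$ appeared before in [19]. See also [29] for the optimality of the space $\mathrm{bmo}(\nu^{1/m})$ in the case $b_1 = \cdots = b_m = b$.

Theorem 4.8 Let $p \in (1, \infty)$, let $\mu, \lambda \in A_p$ be bi-parameter weights and let $\nu := \mu^{1/p}\lambda^{-1/p}$. Suppose that $T$ is a bi-parameter $(\omega_1, \omega_2)$-CZO and $m \in \mathbb{N}$. Then we have
$$\|[b_m, \ldots [b_2, [b_1, T]] \ldots]\|_{L^p(\mu) \to L^p(\lambda)} \lesssim \prod_{i=1}^m \|b_i\|_{\mathrm{bmo}(\nu^{1/m})}$$
if one of the following conditions holds:
(1) $T$ is paraproduct free and $\omega_i \in \mathrm{Dini}_{m/2+1}$;
(2) $m = 1$ and $\omega_i \in \mathrm{Dini}_{3/2}$;
(3) $\omega_i \in \mathrm{Dini}_{m+1}$.

Proof. The proof is similar in spirit to that of Theorem 4.2. We use Lemma 3.11 and estimates for the commutators of the usual bi-parameter model operators. If we use the bounds from [28] directly, we, e.g., immediately get (4.9) Similarly, we can read off an estimate for all the other model operators from [28]. This gives us the result under the higher regularity assumption (3). Indeed, when using the estimate (4.9) in connection with the representation theorem, one ends up with the series We split this into two according to whether $k_1 \le k_2$ or $k_1 > k_2$ and, for example, there holds that The first order case $m = 1$ with the desired regularity (assumption (2)) For $m \ge 2$ the new square root save becomes tricky. The paper [28] is not at all based on the $H^1$-BMO duality strategy on which this save is based (see the proof of Theorem 4.2). We can improve the strategy of [28] for shifts. Thus, we are able to make the square root save for paraproduct free $T$ (assumption (1)).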
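As a hedged illustration of the weighted little BMO quantity, the following Python sketch computes a discretized, dyadic version on $[0,1)^2$: the maximum of $\frac{1}{w(R)}\int_R |b - \langle b \rangle_R|$ over dyadic rectangles $R = I_1 \times I_2$, for $b$ and $w$ constant on the cells of a $2^n \times 2^n$ grid. Here $\langle b \rangle_R$ is the Lebesgue average and $w(R) = \int_R w$, as in the usual definition of the weighted little BMO norm. Restricting to dyadic rectangles only gives a lower bound for the supremum over all rectangles. The function name `dyadic_little_bmo` is hypothetical.

```python
import numpy as np

def dyadic_little_bmo(b, w):
    """Max over dyadic rectangles R = I1 x I2 in [0,1)^2 of (1/w(R)) * int_R |b - <b>_R|.

    b, w: (2**n, 2**n) arrays of cell values, w > 0; <b>_R is the unweighted average.
    """
    N = b.shape[0]
    n = N.bit_length() - 1
    assert b.shape == w.shape == (N, N) and N == 2 ** n
    cell = 1.0 / N ** 2                    # area of one grid cell
    best = 0.0
    for k1 in range(n + 1):                # side length of I1 is 2**-k1
        s1 = N >> k1
        for k2 in range(n + 1):            # side length of I2 is 2**-k2
            s2 = N >> k2
            for t1 in range(2 ** k1):
                for t2 in range(2 ** k2):
                    rows = slice(t1 * s1, (t1 + 1) * s1)
                    cols = slice(t2 * s2, (t2 + 1) * s2)
                    block = b[rows, cols]
                    w_R = w[rows, cols].sum() * cell               # w(R)
                    osc = np.abs(block - block.mean()).sum() * cell  # int_R |b - <b>_R|
                    best = max(best, osc / w_R)
    return best

# b = indicator of {x1 < 1/2}: only rectangles with I1 = [0, 1) see any oscillation
b_half = np.zeros((8, 8))
b_half[:4, :] = 1.0
print(dyadic_little_bmo(b_half, np.ones((8, 8))))
```

Doubling the weight halves the quantity, reflecting the normalization by $w(R)$ rather than $|R|$.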
By this we mean that (both partial and full) paraproducts in the dyadic representation of $T$ vanish, which could also be stated in terms of (both partial and full) "$T1 = 0$" type conditions. The reader may think of convolution-form SIOs as a model case.
We start by considering $[b_2, [b_1, S_i]]$, where $S_i$ is a standard bi-parameter shift of complexity $i$. The reductions on pages 23 and 24 of [28] (Sect. 5.1) show that we only need to bound the key term where, as usual, $K = K_1 \times K_2$ and $R_j = I_1$ This splits $U^{b_1,b_2}$ into 16 different terms $U^{b_1,b_2}_{m_1,m_2}$, where $m_i \in \{1, \ldots, 4\}$ tells which one of the above terms we have for $b_i$. These can be handled quite similarly, but there are some variations in the arguments. We will handle two representative ones.
We begin by looking at the term $\langle U^{b_1,b_2}_{3,4} f, g \rangle :=$ Writing $b_1$, The last line can be dominated by We have now reached the term Recall that with fixed $x_2$ we have $b(\cdot, x_2) \in \mathrm{BMO}(\nu^{1/2}(\cdot, x_2))$, see (4.7). By the weighted $H^1$-BMO duality we now have that The term $(i_1^1)^{1/2} \|b_2\|_{\mathrm{bmo}(\nu^{1/2})}$ is fine, and we do not drag it along in the following estimates. We are left with the task of bounding We now put the $\int_{\mathbb{R}^{d_2}}$ inside and get the term Then, we are left with By the weighted $H^1$-BMO duality we have, analogously as above, that Forgetting the factor $(i_2^1)^{1/2} \|b_1\|_{\mathrm{bmo}(\nu^{1/2})}$, which is as desired, we are then left with Writing $\nu$.
It remains to use square function bounds together with the Fefferman-Stein inequality.
For the more complicated term with the function $f$, the key things to notice are, first, that $\mu^{1/2}\lambda^{1/2} \in A_p$ and, second, that $\nu^{p/2}\mu^{1/2}\lambda^{1/2} = \mu$. We have controlled $\langle U^{b_1,b_2}_{3,4} f, g \rangle$. The bound for $\langle U^{b_1,b_2} f, g \rangle$ follows by handling the other similar terms $\langle U^{b_1,b_2}_{m_1,m_2} f, g \rangle$. A slight variation in the argument is needed, for example, in the term $\langle U^{b_1,b_2}_{1,1} f, g \rangle :=$ We expand the differences of averages as which, after using the $H^1$-BMO duality, produces $(i_1^2)^{1/2} \|b_1\|_{\mathrm{bmo}(\nu^{1/2})}$ multiplied by Similarly as with $U^{b_1,b_2}_{3,4}$, this term is under control. The term with $U_1 V_1$ is symmetric, and so we are also done with $U^{b_1,b_2}_{1,1}$. This ends our treatment of $U^{b_1,b_2}$, since the above arguments showcased the only major difference between the various terms $U^{b_1,b_2}_{m_1,m_2}$. Thus, we are done with $[b_2, [b_1, S_i]]$. By Lemma 3.11 we conclude that By handling the higher order commutators similarly, we get the claim related to assumption (1). We omit these details.
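The two facts used for the term with $f$ are elementary, and a quick numerical sanity check may help. The exponent bookkeeping is: $\mu$ appears with power $\frac{1}{p}\cdot\frac{p}{2} + \frac12 = 1$ and $\lambda$ with power $-\frac{1}{p}\cdot\frac{p}{2} + \frac12 = 0$. The sketch below is a one-parameter dyadic toy model of our own (the helper `dyadic_ap_characteristic` is hypothetical). It checks the pointwise identity $\nu^{p/2}\mu^{1/2}\lambda^{1/2} = \mu$ for $\nu = \mu^{1/p}\lambda^{-1/p}$, and the Hölder (here Cauchy-Schwarz) bound $[\mu^{1/2}\lambda^{1/2}]_{A_p} \le [\mu]_{A_p}^{1/2}[\lambda]_{A_p}^{1/2}$, which is one way to see that $\mu^{1/2}\lambda^{1/2} \in A_p$. The characteristic is computed only over dyadic intervals, for randomly generated step weights.

```python
import numpy as np

def dyadic_ap_characteristic(w, p):
    """max over dyadic I in [0,1) of <w>_I * <w^(-1/(p-1))>_I^(p-1), w on 2**n equal cells."""
    N = len(w)
    n = N.bit_length() - 1
    sigma = w ** (-1.0 / (p - 1))          # dual weight w^{1 - p'}
    best = 0.0
    for k in range(n + 1):
        width = N >> k
        for t in range(2 ** k):
            I = slice(t * width, (t + 1) * width)
            best = max(best, w[I].mean() * sigma[I].mean() ** (p - 1))
    return best

rng = np.random.default_rng(1)
p = 3.0
mu = np.exp(rng.standard_normal(64))       # random positive step weights
lam = np.exp(rng.standard_normal(64))
nu = mu ** (1 / p) * lam ** (-1 / p)       # Bloom weight

identity_ok = np.allclose(nu ** (p / 2) * np.sqrt(mu * lam), mu)
lhs = dyadic_ap_characteristic(np.sqrt(mu * lam), p)
rhs = np.sqrt(dyadic_ap_characteristic(mu, p) * dyadic_ap_characteristic(lam, p))
print(identity_ok, lhs, rhs)
```

The Cauchy-Schwarz bound holds interval by interval, for both $\langle \mu^{1/2}\lambda^{1/2} \rangle_I$ and the average of the dual weight, which is why it survives the supremum.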

Remark 4.11
The new square root save from the $H^1$-BMO arguments reduces the required regularity from $m + 1$ to $m/2 + 1$. In these higher order commutators this is more significant than the save that could theoretically be obtained by not using Lemma 3.11, which could change the $+1$ to $+1/2$. where each $T_i$ can be a completely general $m$-parameter CZO. The BMO norm that appears is then some suitable combination of little BMO and product BMO. See [1,2] for a fully satisfactory Bloom-type upper estimate in this generality, however, only for CZOs with the standard kernel regularity. The general case of [1,2] is hard to digest, but let us formulate a model theorem of this type with mild kernel regularity. where $\omega_{j,i} \in \mathrm{Dini}_{3/2}$. Let $b\colon \mathbb{R}^d \to \mathbb{C}$, $p \in (1, \infty)$, let $\mu, \lambda \in A_p(\mathbb{R}^d)$ be 4-parameter weights and let $\nu = \mu^{1/p}\lambda^{-1/p}$ be the associated Bloom weight. Then we have Here $\mathrm{bmo}^{\mathcal{I}}(\nu)$ is the following weighted little product BMO space: is such that $u_i \in \mathcal{I}_i$ and $\mathrm{BMO}^{\bar{u}}_{\mathrm{prod}}(\nu)$ is the natural weighted bi-parameter product BMO space on the parameters $\bar{u}$. For example, $\|b\|_{\mathrm{BMO}^{(1,3)}_{\mathrm{prod}}(\nu)} := \sup$ where the last weighted product BMO norm is defined in (4.1).
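To make the difference between $\mathrm{Dini}_{m+1}$ and $\mathrm{Dini}_{m/2+1}$ concrete, one can test the modified Dini condition $\|\omega\|_{\mathrm{Dini}_\alpha} = \int_0^1 \omega(t)\big(1 + \log\frac{1}{t}\big)^{\alpha} \frac{dt}{t}$ from the introduction on the logarithmic moduli $\omega_\beta(t) = (1 + \log\frac{1}{t})^{-\beta}$, an illustrative family of our choosing. The substitution $s = \log\frac{1}{t}$ gives $\int_0^\infty (1+s)^{\alpha - \beta}\, ds$, which equals $1/(\beta - \alpha - 1)$ if $\beta > \alpha + 1$ and is infinite otherwise. The Python sketch below (function name hypothetical) shows that $\omega_{m+2}$ satisfies the $\mathrm{Dini}_{m/2+1}$ condition but not the $\mathrm{Dini}_{m+1}$ condition.

```python
import math

def dini_alpha_norm_log_modulus(beta, alpha):
    """Modified Dini norm ||omega||_{Dini_alpha} of omega(t) = (1 + log(1/t))**(-beta).

    With s = log(1/t) the integral becomes int_0^inf (1 + s)**(alpha - beta) ds,
    which is 1 / (beta - alpha - 1) if beta > alpha + 1 and +infinity otherwise.
    """
    gamma = beta - alpha
    return 1.0 / (gamma - 1.0) if gamma > 1.0 else math.inf

for m in range(1, 5):
    beta = m + 2.0                                        # illustrative modulus omega_{m+2}
    new = dini_alpha_norm_log_modulus(beta, m / 2 + 1)    # regularity m/2 + 1
    old = dini_alpha_norm_log_modulus(beta, m + 1)        # regularity m + 1
    print(f"m={m}: Dini_(m/2+1) norm = {new:.3f}, Dini_(m+1) norm = {old}")
```

Power-type moduli $t^\gamma$ lie in every $\mathrm{Dini}_\alpha$, so this logarithmic family is where the exponent $\alpha$ actually matters.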
The proof is again a combination of Lemma 3.11 with the known estimates for the commutators of standard model operators [1,2]. However, there is again the additional square root save. Unlike in Theorem 4.8 above, this poses no significant new challenges, since these references are based entirely on the $H^1$-BMO strategy. In this regard the situation is closer to that of Theorem 4.2.
Funding Open Access funding provided by University of Helsinki including Helsinki University Central Hospital.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.