Two-weight inequalities for multilinear commutators in product spaces

This note establishes two-weight estimates for commutators of singular integrals in a setting that combines multilinearity with product spaces. A new type of two-weight extrapolation result is used to yield the quasi-Banach range of estimates.

The corresponding two-weight problem concerns estimates from $L^p(\mu)$ to $L^p(\lambda)$ for two different weights $\mu, \lambda$, and has recently attracted interest after the work of Holmes, Lacey and Wick [10]. See also e.g. [11, 14, 15].
In this note we establish that two-weight estimates for commutators can be proved under the joint difficulty of multilinearity and product spaces. Both have been considered separately before: see e.g. [1, 2, 10, 20, 22] for the multi-parameter work, and [12] and [17] for the multilinear work. The recent satisfactory multilinear result of [17] is based on sparse domination, and that approach cannot be used in our setting due to the product space nature of the problem. For given exponents $1 < p_1, \ldots, p_n \le \infty$ and $1/p = \sum_{i=1}^n 1/p_i > 0$, a natural form of a weighted estimate in the $n$-variable context is
$$\|T(f_1, \ldots, f_n)\|_{L^p(w^p)} \lesssim \prod_{i=1}^n \|f_i\|_{L^{p_i}(w_i^{p_i})}, \qquad w := \prod_{i=1}^n w_i.$$
The key point is to impose only a joint condition on the tuple of weights $\vec w = (w_1, \ldots, w_n) \in A_{\vec p}$ rather than to assume the individual conditions $w_i^{p_i} \in A_{p_i}$. See Lerner, Ombrosi, Pérez, Torres and Trujillo-González [13], and [21] for multi-parameter versions. Naturally, this interplay is trickier still in our two-weight setting.
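To orient the reader, the shape of the two-weight commutator estimate we are after can be sketched as follows; this uses the notation $\nu = \lambda_j^{-1} w_j$ and the weighted little BMO norm appearing later in the text, and is a sketch of the target rather than the precise statement:

```latex
% Bloom type two-weight bound for the commutator in the j-th slot:
\big\| [b, T]_j(f_1, \dots, f_n) \big\|_{L^p(\nu^{-p} w^p)}
  \;\lesssim\; \|b\|_{\mathrm{bmo}(\nu)}
  \prod_{i=1}^{n} \|f_i\|_{L^{p_i}(w_i^{p_i})},
\qquad \nu := \lambda_j^{-1} w_j, \quad w := \prod_{i=1}^{n} w_i .
```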
Our result is the following.
For the exact definitions see the main text. Extrapolation methods are important in our current work: they are used to yield the quasi-Banach range $p < 1$. The extrapolation theorem of Rubio de Francia says that if $\|g\|_{L^{p_0}(w)} \lesssim \|f\|_{L^{p_0}(w)}$ for some $p_0 \in (1, \infty)$ and all $w \in A_{p_0}$, then $\|g\|_{L^{p}(w)} \lesssim \|f\|_{L^{p}(w)}$ for all $p \in (1, \infty)$ and all $w \in A_p$. In [9] (see also [6]) a multivariable analogue was developed in the setting $w_i^{p_i} \in A_{p_i}$, $i = 1, \ldots, n$. Very recently, in [18, 19, 24] it was shown that also the genuinely multilinear weighted estimates can be extrapolated. We prove a suitable two-weight adaptation that can be used in our current work.

1.2. Theorem. Let $(f, f_1, \ldots, f_n)$ be a tuple of non-negative functions. Let $\vec p = (p_1, \ldots, p_n)$ and $j \in \{1, \ldots, n\}$. Assume that for all $(w_1, \ldots, w_n), (w_1, \ldots, \lambda_j, \ldots, w_n) \in A_{\vec p}$ with $w_j \lambda_j^{-1} \in A_\infty$ the assumed weighted estimate at the exponent $\vec p$ holds. Then for all $(w_1, \ldots, w_n), (w_1, \ldots, \lambda_j, \ldots, w_n) \in A_{\vec q}$ with $w_j \lambda_j^{-1} \in A_\infty$ and $1 < q_i \le \infty$, $i \ne j$, $1/q = 1/p_j + \sum_{i \ne j} 1/q_i > 0$, the corresponding estimate holds at the exponent $\vec q$.

PRELIMINARIES
Throughout this paper, $A \lesssim B$ means that $A \le CB$ with some constant $C$ that we deem unimportant to track at that point. We write $A \sim B$ if $A \lesssim B \lesssim A$. Sometimes we write e.g. $A \lesssim_\epsilon B$ if we want to make the point that $A \le C(\epsilon) B$.

2.A. Dyadic notation.
Given a dyadic grid D in R d , I ∈ D and k ∈ Z, k ≥ 0, we use the following notation: (1) ℓ(I) is the side length of I.
(4) $E_I f = \langle f \rangle_I 1_I$ is the averaging operator, where $\langle f \rangle_I = \fint_I f$. For an interval $J \subset \mathbb{R}$ we denote by $J_l$ and $J_r$ the left and right halves of $J$, respectively. We define $h^0_J = |J|^{-1/2} 1_J$ and $h^1_J = |J|^{-1/2}(1_{J_l} - 1_{J_r})$. Let now $I = I_1 \times \cdots \times I_d \subset \mathbb{R}^d$ be a cube, and define the Haar function $h^\eta_I$, $\eta = (\eta_1, \ldots, \eta_d) \in \{0,1\}^d$, by setting $h^\eta_I = h^{\eta_1}_{I_1} \otimes \cdots \otimes h^{\eta_d}_{I_d}$. If $\eta \ne 0$ the Haar function is cancellative: $\int h^\eta_I = 0$. We abuse notation by suppressing the presence of $\eta$, and write $h_I$ for some $h^\eta_I$, $\eta \ne 0$. Notice that for $I \in \mathcal{D}$ we have $\Delta_I f = \langle f, h_I \rangle h_I$ (where the finite $\eta$ summation is suppressed), $\langle f, h_I \rangle := \int f h_I$.
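Recall (standard dyadic analysis, not specific to this paper) that, for say $f \in L^2$, the martingale differences give the expansion:

```latex
% Martingale difference decomposition on a dyadic grid D:
f = \sum_{I \in \mathcal{D}} \Delta_I f, \qquad
\Delta_I f = \sum_{\eta \in \{0,1\}^d \setminus \{0\}}
  \langle f, h_I^{\eta} \rangle\, h_I^{\eta},
\qquad
\|f\|_{L^2}^2 = \sum_{I \in \mathcal{D}} \|\Delta_I f\|_{L^2}^2 .
```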

2.B.
Multi-parameter notation. We will be working on the bi-parameter product space $\mathbb{R}^d = \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$. We denote a general dyadic grid in $\mathbb{R}^{d_i}$ by $\mathcal{D}^i$. We denote cubes in $\mathcal{D}^i$ by $I^i, J^i, K^i$, etc. Thus, our dyadic rectangles take the forms $I^1 \times I^2$, $J^1 \times J^2$, $K^1 \times K^2$, etc. We usually denote the collection of dyadic rectangles by $\mathcal{D} = \mathcal{D}^1 \times \mathcal{D}^2$. If $A$ is an operator acting on $\mathbb{R}^{d_1}$, we can always let it act on the product space $\mathbb{R}^d$ by setting $A^1 f(x) = A(f(\cdot, x_2))(x_1)$. Similarly, we use the notation $A^i f$ if $A$ is originally an operator acting on $\mathbb{R}^{d_i}$. Our basic multi-parameter dyadic operators, martingale differences and averaging operators, are obtained by simply chaining together the relevant one-parameter operators. For instance, a bi-parameter martingale difference is $\Delta_{I^1 \times I^2} f = \Delta^1_{I^1} \Delta^2_{I^2} f$. When we integrate with respect to only one of the parameters we may e.g. write
$$\langle f, h_{I^1} \rangle_1(x_2) := \int_{\mathbb{R}^{d_1}} f(x_1, x_2) h_{I^1}(x_1)\, dx_1 \quad \text{or} \quad \langle f \rangle_{I^1, 1}(x_2) := \fint_{I^1} f(x_1, x_2)\, dx_1.$$

2.C. Adjoints. Consider an $n$-linear operator $T$; below $f_i$, $i = 1, \ldots, n+1$, denote the functions in its dual forms. We set up notation for the adjoints of $T$ in the bi-parameter situation. We let $T^{j*}$, $j \in \{0, \ldots, n\}$, denote the full adjoints, i.e., $T^{0*} = T$ and otherwise
$$\langle T^{j*}(f_1, \ldots, f_{j-1}, f_{n+1}, f_{j+1}, \ldots, f_n), f_j \rangle = \langle T(f_1, \ldots, f_n), f_{n+1} \rangle.$$
A subscript 1 or 2 denotes a partial adjoint in the given parameter; for example, $T^{j*}_1$ dualizes only the first-parameter variables in the $j$-th slot. Finally, we can also take partial adjoints with respect to different parameters in different slots, and everything can be obtained, if desired, with the most general notation $T^{(j_1*, j_2*)}$. In any case, there are $(n+1)^2$ adjoints (including $T$ itself). Similarly, the bi-parameter dyadic model operators that we later define always have $(n+1)^2$ different forms.

2.D. Multilinear bi-parameter weights.
A weight $w(x_1, x_2)$ (i.e. a locally integrable a.e. positive function) belongs to the bi-parameter weight class $A_p = A_p(\mathbb{R}^{d_1} \times \mathbb{R}^{d_2})$, $p \in (1, \infty)$, if
$$[w]_{A_p} := \sup_R \langle w \rangle_R \langle w^{1-p'} \rangle_R^{p-1} < \infty,$$
where the supremum is taken over rectangles $R$, that is, over $R = I^1 \times I^2$ where $I^i \subset \mathbb{R}^{d_i}$ is a cube. In contrast to the one-parameter definition, we take the supremum over rectangles instead of cubes. We have
$$\max\Big( \operatorname*{ess\,sup}_{x_1} [w(x_1, \cdot)]_{A_p(\mathbb{R}^{d_2})},\ \operatorname*{ess\,sup}_{x_2} [w(\cdot, x_2)]_{A_p(\mathbb{R}^{d_1})} \Big) \le [w]_{A_p},$$
while the constant $[w]_{A_p}$ is dominated by the maximum to some power. For basic bi-parameter weighted theory see e.g. [10]. We say that $w \in A_\infty$ if $w \in A_p$ for some $p < \infty$.

It is well-known that $A_\infty$ weights satisfy a reverse Hölder inequality. The following multilinear reverse Hölder property is also well-known; for the history and a very short proof see e.g. [17, Lemma 2.5]. The proof in our bi-parameter setting is the same.
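The multilinear reverse Hölder property referred to can be sketched as follows, assuming the formulation matches [17, Lemma 2.5] (with rectangles in place of cubes in our bi-parameter setting):

```latex
% Multilinear reverse Hölder: for weights w_1, ..., w_m in A_infty,
\prod_{i=1}^{m} \langle w_i \rangle_R
  \;\lesssim\; \Big\langle \prod_{i=1}^{m} w_i \Big\rangle_R
\qquad \text{uniformly over rectangles } R .
```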
Next we define multilinear bi-parameter Muckenhoupt weights. The original one-parameter versions appeared in [13]. Our definition in the bi-parameter case is the same as in [21].
We say that $\vec w = (w_1, \ldots, w_n) \in A_{\vec p}$ if $0 < w_i < \infty$ almost everywhere and
$$[\vec w]_{A_{\vec p}} := \sup_R \, \langle w^p \rangle_R^{1/p} \prod_{i=1}^n \langle w_i^{-p_i'} \rangle_R^{1/p_i'} < \infty,$$
where the supremum is over rectangles $R$, and for $p_i = \infty$ the corresponding factor is interpreted as $\operatorname*{ess\,sup}_R w_i^{-1}$.
Conveniently, we can characterize the class A p using the standard A p class.The lemma is proven in [13] and the bi-parameter analog of the same proof is recorded in [21].
In the case $p_i = 1$ the corresponding average is interpreted as an essential supremum, and the case $p = \infty$ is interpreted analogously. Most of the proofs are duality based, which makes the following lemma relevant. Let $\vec w = (w_1, \ldots, w_n) \in A_{\vec p}$ with $w = \prod_{i=1}^n w_i$, and define $\vec w_i = (w_1, \ldots, w_{i-1}, w^{-1}, w_{i+1}, \ldots, w_n)$.

Then we have $\vec w_i \in A_{\vec p_i}$ for the corresponding tuple of dual exponents $\vec p_i$, with the natural control of the constants. In the main theorems of this paper we will be using the multilinear bi-parameter weights introduced above. Throughout this paper, we will use the notation $w = \prod_{i=1}^n w_i$ and $\sigma_i = w_i^{-p_i'}$, as these objects appear regularly.
The assumption that ν ∈ A ∞ is necessary as it is not implied by the other two assumptions, see a counter-example in [17].
Instead of the two separate conditions, one could impose only the joint assumption. Yet, it is unlikely that this assumption alone is enough for the boundedness of the commutator, as conjectured for the linear case in [16]. However, we will show below that this assumption is enough for the boundedness of Bloom type paraproducts in the Banach range, and that it is also sufficient to conclude the lower bound of the commutator.
On the other hand, the joint assumption for the weights is very natural for the two-weight commutator estimates, since the assumption $(w_1, \ldots, w_n, \nu w^{-1}) \in A_{(p_1, \ldots, p_n, p')}$ is implied by the two separate multilinear weight conditions together with $\nu \in A_\infty$. This is easy to verify: in the key step $(*)$ of the computation one applies [17, Lemma 2.9] for $\nu \in A_\infty$.
Motivated by the above discussion we give the following definition, where $p'$ does not appear and hence $p > 1$ is not needed. 2.6. Definition. Given $\vec p = (p_1, \ldots, p_n)$ with $1 \le p_1, \ldots, p_n \le \infty$, we say that $\vec w = (w_1, \ldots, w_n, w_{n+1}) \in A^*_{\vec p}$ if the weights are positive and finite almost everywhere and the corresponding joint constant, where the supremum is over rectangles $R$, is finite. Morally the difference is that with $A^*_{\vec p}$ we do not necessarily have the $A_\infty$ property compared to assuming the two separate conditions $\vec w \in A_{\vec p}$ and $\nu \in A_\infty$, but we are equipped with $\nu \in A_{n+1}$.
Then for all $0 < p < \infty$ and $w \in A_\infty$ we have $\int f^p w \lesssim \int g^p w$.
In addition, let $\{(f_i, g_i)\}_i$ be a sequence of pairs of non-negative functions defined on $\mathbb{R}^d$. Suppose that for some $0 < p_0 < \infty$, $(f_i, g_i)$ satisfies inequality (2.8) for every $i$. Then, for all $0 < p, q < \infty$ and $w \in A_\infty(\mathbb{R}^d)$ we have
$$\Big\| \Big( \sum_i f_i^q \Big)^{1/q} \Big\|_{L^p(w)} \lesssim \Big\| \Big( \sum_i g_i^q \Big)^{1/q} \Big\|_{L^p(w)}.$$
An efficient proof can be found in [21] (originally proved in [8]). We also often need the following result of R. Fefferman [7]; a proof is also recorded in [22, Appendix B]. 2.10. Proposition. Let $\lambda \in A_p$, $p \in (1, \infty)$, be a bi-parameter weight. Then for all $s \in (1, \infty)$ the corresponding weighted estimate holds.

2.F. Square functions. We begin with the classical (dyadic) square function in the bi-parameter framework. Let $\mathcal{D} = \mathcal{D}^1 \times \mathcal{D}^2$ be a fixed lattice of dyadic rectangles. We define the square functions and record their weighted estimates for all $p \in (0, \infty)$ and bi-parameter weights $w \in A_\infty$.
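The square functions referred to above are presumably the standard ones; for orientation, a sketch in our notation:

```latex
% Bi-parameter dyadic square functions over D = D^1 x D^2:
S_{\mathcal{D}} f
  = \Big( \sum_{R \in \mathcal{D}} |\Delta_R f|^2 \Big)^{1/2},
\qquad
S_{\mathcal{D}^1} f
  = \Big( \sum_{I^1 \in \mathcal{D}^1} |\Delta^1_{I^1} f|^2 \Big)^{1/2},
\qquad
S_{\mathcal{D}^2} f
  = \Big( \sum_{I^2 \in \mathcal{D}^2} |\Delta^2_{I^2} f|^2 \Big)^{1/2} .
```

The text then records two weighted inequalities for these objects for all $p \in (0, \infty)$ and $w \in A_\infty$.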
The first inequality is the classical result found e.g. in [25, Theorem 2.5] and the latter inequality can be deduced using the A ∞ extrapolation, Lemma 2.7.
Notice that by disjointness of supports we have, for example, the corresponding pointwise bounds for all $k$. Next, we take the definition of the $n$-linear square functions from [21]. For $k = (k_1, k_2)$ we define the operators $A_{1,k}$ as in [21]. In addition, we understand this so that $A_{1,k}$ can also take any one of the symmetric forms, where each factor can alternatively be associated with any of the other functions $f_2, \ldots, f_n$. That is, $A_{1,k}$ can also take other forms, which we again understand as a family of square functions. First, the three martingale blocks that appear can be associated with different functions, too. Second, we can have the $K^1$ summation out and the $K^2$ summation in (we can interchange them), but then we have two martingale blocks with $K^2$ and one martingale block with $K^1$.
Finally, for $k = (k_1, k_2, k_3, k_4)$ we define a family of square functions with two martingale blocks in each parameter, which can be moved around.
Moreover, we need a certain linear estimate which appears regularly when dealing with the commutator estimates.
2.14. Proposition ([21, Proposition 5.8.]). For $u \in A_\infty$ and $p, s \in (1, \infty)$ the corresponding weighted estimate holds.

For $b \in L^1_{\mathrm{loc}}$ and a bi-parameter weight $\nu \in A_\infty$ we define the usual dyadic weighted little BMO norm of $b$ as
$$\|b\|_{\mathrm{bmo}(\nu)} := \sup_{R \in \mathcal{D}} \frac{1}{\nu(R)} \int_R |b - \langle b \rangle_R|.$$
In fact, the direct definition is not used that often, and we will mostly invoke it through the following $H^1$-BMO type inequalities for $i = 1$ and $i = 2$. The first estimate follows from the one-parameter result [26], see e.g. [10]. For the second inequality, concerning square functions only, see e.g. [1, Lemma 2.5].
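The first of these inequalities is, in its one-parameter model case, the weighted $H^1$-BMO duality; as a sketch (standard statement, our normalization):

```latex
% One-parameter weighted H^1-BMO duality (model for the estimates above):
\sum_{I \in \mathcal{D}} |\langle b, h_I \rangle|\, |\langle f, h_I \rangle|
  \;\lesssim\; \|b\|_{\mathrm{BMO}(\nu)}\,
  \Big\| \Big( \sum_{I \in \mathcal{D}}
    |\langle f, h_I \rangle|^2 \frac{1_I}{|I|} \Big)^{1/2} \Big\|_{L^1(\nu)} .
```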
Often when a supremum is taken over rectangles we also have a formulation of the norm uniformly in each parameter separately. We have
$$\|b\|_{\mathrm{bmo}(\nu)} \sim \max\Big( \operatorname*{ess\,sup}_{x_1} \|b(x_1, \cdot)\|_{\mathrm{BMO}(\nu(x_1, \cdot))},\ \operatorname*{ess\,sup}_{x_2} \|b(\cdot, x_2)\|_{\mathrm{BMO}(\nu(\cdot, x_2))} \Big),$$
where $\|\cdot\|_{\mathrm{BMO}(\rho)}$ is the standard one-parameter dyadic weighted BMO norm. For a proof see e.g. [10].
The following proposition gives an equivalent definition of the little BMO norm in the Bloom type two-weight setting. The equivalent definition is needed for the proof of the lower bound of the commutator.
The proof can be adapted from the one-parameter version (see, for example, [17]). In our case, the sparse method poses no problems, as it can be adapted to rectangles when the dyadic and sparse families inside a rectangle $R$ are obtained by iteratively bisecting $R$. We omit the details.
We now formulate the Muckenhoupt-Wheeden type estimates.
In particular, the above is a special case of the two-weight version. We state this as a little bmo version.
Also, we have a similar estimate when the cancellation is in the second parameter.
Proof. Let us consider the first estimate above and use duality. By the reverse Hölder property of $A_\infty$ weights, Lemma 2.2, we obtain the required pointwise bound, and hence an estimate for all $R \in \mathcal{D}$. The second part of the extrapolation result, Lemma 2.7, then yields the claim, as desired.
For the second claim observe that, for example, we can use the one-parameter duality with the variable in the second parameter fixed. The proof is concluded as above.
Using the characterizations (3.1) and (2.1), we obtain the corresponding estimate, where $g_{x_1}$ denotes the one-parameter function $g(x_1, \cdot)$. We have a similar estimate for a fixed variable on $\mathbb{R}^{d_2}$.
We omit the proof as it is analogous to the previous one.

MULTILINEAR BI-PARAMETER SINGULAR INTEGRALS
We call a function $\omega$ a modulus of continuity if it is increasing and subadditive with $\omega(0) = 0$. A relevant quantity is the modified Dini condition.
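The missing display is presumably the standard modified Dini quantity (cf. the requirement $\omega_i \in \mathrm{Dini}_{3/2}$ later in the text); as a sketch:

```latex
% Modified Dini condition with logarithmic weight, exponent alpha >= 0:
\|\omega\|_{\mathrm{Dini}_{\alpha}}
  := \int_0^1 \omega(t)\, \Big( 1 + \log \frac{1}{t} \Big)^{\alpha}\, \frac{dt}{t},
\qquad
\omega \in \mathrm{Dini}_{\alpha}
  \iff \|\omega\|_{\mathrm{Dini}_{\alpha}} < \infty .
```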

4.A. Bi-parameter SIOs.
We consider an $n$-linear operator $T$. Let $\omega_i$ be a modulus of continuity associated with the parameter $\mathbb{R}^{d_i}$. We say that $T$ is an $n$-linear bi-parameter $(\omega_1, \omega_2)$-SIO if it satisfies the full and partial kernel representations defined below.
Full kernel representation. Suppose the tuple $(f_1, \ldots, f_{n+1})$ is such that in each parameter some pair of the functions has disjoint supports. We demand that in this case we have the representation
$$\langle T(f_1, \ldots, f_n), f_{n+1} \rangle = \iint K(x, y_1, \ldots, y_n) \prod_{i=1}^n f_i(y_i)\, f_{n+1}(x)\, dy\, dx,$$
where the kernel $K$ satisfies a set of estimates which we specify next. The kernel $K$ is assumed to satisfy the size estimate. In addition, we require continuity estimates: for example, a Hölder-type bound when one of the arguments is perturbed. Of course, we also require all the other natural symmetric estimates, where the perturbation $c^1$ can be in any of the given $n + 1$ slots, and similarly for $c^2$. There are, of course, $(n + 1)^2$ different estimates.
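The size estimate presumably takes the standard $n$-linear bi-parameter product form (cf. e.g. [3]); as a sketch, with $x = (x^1, x^2)$ and $y_i = (y_i^1, y_i^2)$:

```latex
% Size estimate for an n-linear bi-parameter kernel:
|K(x, y_1, \dots, y_n)|
  \;\lesssim\;
  \prod_{m=1}^{2}
  \frac{1}{\big( \sum_{i=1}^{n} |x^m - y_i^m| \big)^{n d_m}} .
```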
Moreover, we require the following mixed continuity and size estimates: for example, a continuity estimate in one parameter combined with the size estimate in the other. Again, we also require all the other natural symmetric estimates.

Partial kernel representations. Suppose now only that there exists a pair of indices for which the first-parameter supports are disjoint. Then we assume a representation where $K_{(f_i^2)}$ is a one-parameter $\omega_1$-Calderón-Zygmund kernel with a constant depending on the fixed functions $f_1^2, \ldots, f_{n+1}^2$. For example, this means that the size estimate takes the corresponding one-parameter form; the continuity estimates are analogous. We assume the following $T1$ type control on the constant $C(f_1^2, \ldots, f_{n+1}^2)$: we have (4.1) and its companion estimate. The analogous partial kernel representation in the second parameter is assumed when the roles of the parameters are interchanged.

4.B. Multilinear bi-parameter Calderón-Zygmund operators.
We say that $T$ satisfies the weak boundedness property if $|\langle T(1_R, \ldots, 1_R), 1_R \rangle| \lesssim |R|$ for all rectangles $R$. An SIO $T$ satisfies the diagonal BMO assumption if the stated condition holds for all rectangles $R = I^1 \times I^2$. An SIO $T$ satisfies the product BMO assumption if $S(1, \ldots, 1)$ belongs to product BMO for all the adjoints $S$ of $T$. This can be interpreted in the sense that the corresponding norm, where $h_R = h_{I^1} \otimes h_{I^2}$ and the supremum is over all dyadic grids $\mathcal{D}^i$ on $\mathbb{R}^{d_i}$ and open sets, is finite; the pairings $\langle S(1, \ldots, 1), h_R \rangle$ can be defined, in a natural way, using the kernel representations.
We simplify the study of the above operators through the following representation theorem. 4.5. Proposition. Suppose $T$ is an $n$-linear bi-parameter $(\omega_1, \omega_2)$-CZO. Then $T$ admits a dyadic representation, where $C_T$ enjoys a linear bound with respect to the CZO quantities and $U_{u,\sigma}$ denotes some $n$-linear bi-parameter dyadic operator (defined in the grid $\mathcal{D}_\sigma$) with the following property: $U_u = U_{u,\sigma}$ can be decomposed using the standard dyadic model operators, where each piece is a shift, a partial paraproduct or a full paraproduct. In the above, $\mathbb{E}_\sigma$ denotes the expectation over a natural probability space $\Omega = \Omega_1 \times \Omega_2$, the details of which are not relevant for us here, so that to each $\sigma = (\sigma_1, \sigma_2) \in \Omega$ we can associate a random collection of dyadic rectangles $\mathcal{D}_\sigma$. The proposition is a consequence of [3, Theorem 5.35 and Lemma 5.12].
It was proven in [3] that a certain minimal kernel regularity suffices. For the optimal dependence, the dyadic representation is in terms of certain modified model operators. The modified versions of the standard operators are much more difficult to handle, and we are forced to rely on the lemma that these can be written as a sum of the standard ones. However, as explained in [3], this causes a loss in the kernel regularity. Yet another problem appears when dealing with the genuinely multilinear weights. Thus, in some cases we need to stick to the usual Hölder type kernel regularity $\omega_i(t) = t^{\alpha_i}$. In the paper [21] it was proven that the standard model operators are bounded with respect to the genuinely multilinear weight class introduced earlier. We now move on to introduce the model operators and state the very recent results for them.

4.C. Dyadic model operators.
All the operators in this section are defined relative to some fixed lattice of dyadic rectangles $\mathcal{D} = \mathcal{D}^1 \times \mathcal{D}^2$. We do not emphasise this dependence in the notation.

4.D. Shifts
Here we assume that for $m \in \{1, 2\}$ there exist two indices carrying the cancellative Haar functions, while for the remaining indices the Haar functions are non-cancellative. 4.8. Theorem ([21, Theorem 6.2.]). Suppose $S_k$ is an $n$-linear bi-parameter shift, $1 < p_1, \ldots, p_n \le \infty$ and $1/p = \sum_{i=1}^n 1/p_i > 0$. Then we have
$$\|S_k(f_1, \ldots, f_n)\|_{L^p(w^p)} \lesssim \prod_{i=1}^n \|f_i\|_{L^{p_i}(w_i^{p_i})}$$
for all multilinear bi-parameter weights $\vec w \in A_{\vec p}$. The implicit constant does not depend on $k$.
where the functions $h_{I^1_i}$ and $u_{i, K^2}$ satisfy the following. There are $i_0, i_1 \in \{1, \ldots, n+1\}$, $i_0 \ne i_1$, for which the Haar functions are cancellative, and for the remaining indices $i \notin \{i_0, i_1\}$ they are non-cancellative. Moreover, the coefficients are assumed to satisfy the normalization (4.10). Of course, $(\pi S)_k$ is defined symmetrically.
In fact, the above theorem is a special case of a Bloom type inequality. The following operator and result have obvious extensions in the product BMO setting. We consider an $n$-linear bi-parameter paraproduct. Here we assume that for $m \in \{1, 2\}$ there exist two indices $i^m_0, i^m_1 \in \{0, \ldots, n+1\}$ carrying the cancellative Haar functions. Moreover, here we will assume that we at least have $0 \in \{i^1_0, i^1_1\}$ or $0 \in \{i^2_0, i^2_1\}$. Later on, paraproducts will also appear as a result of standard expansions of products $bf$. In the first term, the worst case arises for a particular index configuration, and often it is enough to consider this worst-case scenario.
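For orientation, the standard expansion of a product $bf$ reads, in the one-parameter model, as follows (a sketch in generic notation; which of the three terms is the illegal one depends on the indexing used in the main text):

```latex
% One-parameter paraproduct decomposition of a pointwise product:
bf = \sum_{I \in \mathcal{D}} \Delta_I b \, \langle f \rangle_I
   + \sum_{I \in \mathcal{D}} \langle b \rangle_I \, \Delta_I f
   + \sum_{I \in \mathcal{D}} \Delta_I b \, \Delta_I f .
```

The bi-parameter expansions used below chain two such decompositions, one in each parameter.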
Case 2. We have $0 \in \{i^1_0, i^1_1\}$ but $0 \notin \{i^2_0, i^2_1\}$ (or the other way around). We consider a concrete case, for which the desired bound follows from the stated estimate. The second claim is obtained by using extrapolation, Theorem 1.2.
4.18. Proposition. Let $p \in (1, \infty)$. Let $\lambda$ and $w$ be bi-parameter weights such that $\lambda^{-p'}, w^{-p'} \in A_\infty$, and let $\Pi_{b,\eta}$ be a weighted paraproduct operator defined via (4.17). Then the stated bound holds.

Proof. The result follows from a variant of the techniques seen in the proof of Proposition 4.15. For example, by duality we have terms like (4.17). Introducing weight averages, where $\sigma = w^{-p'}$, we can apply Lemma 3.4 and conclude. In the same setting as above we can have, for example, the following mixed type weighted paraproduct.
The definition is symmetric when the cancellation associated with $b$ is in the other parameter; we also consider the corresponding case. 4.19. Proposition. For a weighted paraproduct operator $\Pi_{b,\eta}$ as described above, the stated bound holds, where $i$ is either 1 or 2 depending on which parameter carries the cancellation.
Proof. Let us, for example, consider the paraproduct written above, where $b$ is paired with a Haar function. Similarly to the previous proof, we use Lemma 3.4, but this time the second claim. The main difference to the previous proof is the type of term we then face.
Nevertheless, the claim follows quite easily via an extrapolation trick (see [21, Lemma 9.2]), since for the fixed exponent $p' = 2$ we have the required estimate. For the references below, we state a lemma regarding the square functions of partial paraproducts. For the lemma, it is relevant in which slots the cancellation appears. The square function can be taken corresponding to the cancellation in the $(n+1)$-th slot. For example, if $(S\pi)_k$ is a form of partial paraproduct such that there is cancellation in the $(n+1)$-th slot in the second parameter, then we have the boundedness of the second parameter square function of this operator, namely $S_{\mathcal{D}^2}(S\pi)_k$. Similarly, $S_{\mathcal{D}^1}(S\pi)_k$ and $S_{\mathcal{D}}(S\pi)_k$ must have the corresponding cancellation to be bounded. 4.20. Lemma. Let $U$ be a square function of a partial paraproduct as stated above. Let $1 < p_i \le \infty$ and $1/p = \sum_{i=1}^n 1/p_i > 0$, where $w = \prod_{i=1}^n w_i$ and $(w_1, \ldots, w_n) \in A_{(p_1, \ldots, p_n)}$; then the corresponding weighted bound holds. Proof. The result follows almost identically to the proof of [21, Theorem 6.7.]. We take the partial paraproduct of the given form. Using the dualisation trick in [21] for $p > 1$, we choose a suitable sequence of functions with norms $\le 1$, and study the resulting dual form. We write out the non-cancellative Haar functions, except when the complexity is zero. We are reduced to bounding a term where the Haar function is non-cancellative for at least one index $i$, and in the remaining positions we have complexity $\ell_i = 0$.
We consider an example to see how we can use the idea of [21] in this setting. The goal is to prove the stated estimate, where $g$ is the relevant dual function. By extrapolation [18], we just need to prove the fixed-exponent case. Following the proof in [21], everything will be the same except that for $f_{n+1}$ we need to control an additional term involving $\gamma_{n+1} = v_{n+1}^{-2}$. For brevity, below we just write $I^1_{n+1}$ instead of $(I^1_{n+1})^{(k_{n+1})} = K^1$. So it remains to prove a variant of Proposition 2.14, which is straightforward. In fact, for the above model case, since $\gamma_{n+1} \in A_{2(n+1)}$, we have the required bound. If $k_{n+1} = 0$, the term simplifies, and then it is just a matter of vector-valued estimates for the multilinear maximal function and we are done. If $k_{n+1} > 0$, let $s > 1$ be such that $d_1/s'$ is sufficiently small. Then the fact that the relevant characteristic is independent of $K^1$, together with Minkowski's inequality, leaves us with one final term to estimate. This completes the proof. The case $p \le 1$ follows from extrapolation [19].
The boundedness of these model operator commutators yields the boundedness of the commutators of Calderón-Zygmund operators via Proposition 4.5. The use of Proposition 4.5 and the complexity dependencies (5.2) restrict the kernel regularity of the $(\omega_1, \omega_2)$-CZOs in Theorem 1.1. For paraproduct free CZOs, we can use the milder kernel regularity $\omega_i \in \mathrm{Dini}_{3/2}$, $i = 1, 2$. By paraproduct free we mean that the paraproducts in the dyadic representation of $T$ vanish, which could also be stated in terms of (both partial and full) "$T1 = 0$" type conditions. In the paraproduct free case, the reader can think of convolution form SIOs. Otherwise, we must use the standard Hölder type kernel regularity $\omega_i(t) = t^{\alpha_i}$, $\alpha_i \in (0, 1]$. In the proof, we consider the boundedness $\prod_{i=1}^n L^{p_i}(w_i^{p_i}) \to L^p(\nu^{-p} w^p)$ for $p > 1$, since Theorem 1.2 extends the result to the quasi-Banach range. Recall the notation for the dual weights: $\sigma_i = w_i^{-p_i'}$. Here we choose to consider the commutators acting on the first function slot, as the other ones are symmetrical.
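For concreteness, the commutator acting in the $j$-th slot of an $n$-linear operator has the following form (here we consider $j = 1$):

```latex
% Commutator in the j-th slot of an n-linear operator T:
[b, T]_j(f_1, \dots, f_n)
  := b\, T(f_1, \dots, f_n)
   - T(f_1, \dots, f_{j-1},\, b f_j,\, f_{j+1}, \dots, f_n) .
```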
The idea is to expand the commutator so that a product $bf$ paired with Haar functions is expanded in the bi-parameter fashion only if both of the Haar functions are cancellative. In a mixed situation, we expand only in $\mathbb{R}^{d_1}$ or $\mathbb{R}^{d_2}$, and in the remaining fully non-cancellative situation we do not expand at all. This strategy has been important in the recent multi-parameter results; see e.g. [1, 3, 20, 22].
We focus on a commutator where the cancellation appears in a mixed situation in the first and last slots; that is, we have a commutator that is expanded as in (5.3). This case essentially gathers all the methods for estimating these commutators. More involved expansions are considered with partial paraproducts. Both terms are handled separately whenever we have a bounded paraproduct, that is $\Pi^i_{j_i}$, $j_i \ne 3$ (or bi-parameter $\Pi_{j_1, j_2}$, $(j_1, j_2) \ne (3, 3)$). Otherwise, we need to add and subtract certain averages of the function $b$ to obtain enough cancellation. We analyse the second term in (5.3), as the first term is similar (swap the roles of the functions $f_1$ and $f_{n+1}$ together with the weights $\eta_1$ and $w^p$).
We begin with the following term.
By the zero average of the Haar functions, we always have $J^2 \subset I^2_1$. Now the important observation is that when $|J^2|$ is small enough we can replace the relevant average by a martingale block. Thus, we can change the order of the operators, and we can split the dual form of the term accordingly, where $S^{1*}_{k, J^2}$ differs from the usual adjoint in that we have $I^2_1 \subset J^2$. We do not explicitly handle the term $E$, as it is similar to the case $j_2 = 2$ (note that we have more cancellation than we need). Since the truncated operator $S^{1*}_{k, J^2}$ can be dominated by the $A_\infty$-weighted square function lower bound, we can drop the dependence on the cube $J^2$. Then the estimations of the first two terms are very similar, hence one might think of $S^{1*}_k$ as such or as $S_{\mathcal{D}^2} S^{1*}_k$ below. The boundedness follows simply by using Proposition 4.18 and the boundedness of multilinear shifts; namely, $w = \prod_{i=2}^n w_i$ and $(w_2, \ldots, w_n, \nu w^{-1}) \in A_{(p_2, \ldots, p_n, p')}$ by Lemma 2.5. The term where $j_2 = 2$ is significantly more straightforward to estimate. We consider the dual form and estimate it, where the operator with $i^m_1 \in \{2, \ldots, n\}$ comes from a family of operators for which the square sum is an $A_{2,k}$ type square function. We use Lemma 3.5 with a fixed variable in the first parameter. In the above estimates, it is enough to note that the maximal function is bounded since, by Fubini's theorem, we can work with a fixed variable in the first parameter and use the classical one-parameter result. Lastly, we are left with the paraproducts of the illegal form. Here we introduce the martingale blocks to the function $b$: the extra $\langle b \rangle_{K^1 \times K^2}$ simply cancels with the one from $\langle b \rangle_{I^2_1, 2}$. Hence, in the commutator we can expand accordingly. Observe that we have omitted the terms arising from $\langle b \rangle_{I^2_1, 2}$, because they are similar. On the other hand, we shall only work with (5.4) and (5.5), because (5.6) is analogous.
We begin with the dual form of (5.4). By arguments similar to those in the proof of Lemma 3.4, and then by standard calculus, we can reduce the problem to an integral over $\mathbb{R}^d$. The rest follows from estimates such as Hölder's inequality, Theorem 2.13, and Proposition 2.14.
Finally, we consider the dual form of (5.5). This resembles the term that we faced earlier with the paraproduct $\Pi_2$; the only meaningful difference is the extra summation. The estimations are similar when we divide and multiply by $\langle \sigma_{n+1} \rangle_{J^1 \times K^2}$. To be more precise, we write the term out using Lemma 3.4. The rest of the argument is rather standard, and thus the object is bounded as desired. This completes the analysis of the commutator of this form.
Although other forms of shifts lead to different expansions, the methods shown above are sufficient to handle those as well. Since we are dealing with multilinear shifts, we now encounter terms in the shift case that are non-cancellative. In comparison, this does not happen in the linear case in [22], where we always expand in the bi-parameter fashion. For example, if we look at the term $b S(f_1, \ldots, f_n) - S(\Pi^{1,2}_{3,3}(b, f_1), \ldots, f_n)$, we obtain an expansion whose terms are expanded into martingale blocks and differences in a standard way, like the terms (5.4) and (5.5). Note that the first term on the right-hand side produces a bi-parameter martingale difference inside the rectangle $I^1_{n+1} \times I^2_{n+1}$. We will analyse similar terms in the following subsection.

Partial paraproducts.
As explained earlier, we will now focus on more involved expansions of the commutator. We show the most representative case out of those. Although we demonstrated the main ideas of the estimates already in the shift case, we need to use more complex estimates due to the more complicated structure of the partial paraproducts.
We do not repeat the expansion strategy, and instead straight away consider separately the terms with $(j_1, j_2) \ne (3, 3)$. We collect most of the mixed index ($j_1 \ne j_2$) cases, as the methods for the rest can be obtained from these. Let us begin with the term where $j_1 = 1$, $j_2 = 2$. Similarly to the previously seen techniques, for the second term we use the square function lower bound to get rid of the restriction $I^1_1 \subset J^1$. Thus, via Proposition 4.19 we can bound the first two terms, and clearly Lemma 4.20 is enough to conclude the claim. The estimate for the remaining term is easier: we apply Lemma 3.4 and note that we have more cancellation than we need. The desired estimate then follows by Hölder's inequality and Lemma 4.20. We remark that the remaining term essentially contains the idea needed to handle $\Pi_{1,1}$.
The term with $\Pi_{2,1}$ is analogous to the previous one. We remark that in this case the weighted paraproduct operator has the weight average over $I^1_1$ of $\eta_1$, as the localization of the operator is at that level in the first parameter. The cases $\Pi_{3,1}$ and $\Pi_{1,3}$ can be handled similarly. For the sake of completeness, we give a sketch of the case $\Pi_{3,1}$. As before, for the second term we again use Lemma 3.4, and the estimate is completed by Proposition 2.14 and Lemma 4.20. For the first term, we split as usual. We focus on the first resulting term, as the other one is very similar once the square function lower bound is applied inside the average over $J^2$. Rewriting the first term, the estimate follows from Proposition 4.19 and Lemma 4.20.
We continue with the term where $j_1 = 2$, $j_2 = 3$, which we can rewrite in a suitable form. Symmetrically, we can work with $\Pi_{3,2}$. Lastly, we focus on terms with $\Pi_{3,3}$ type illegal paraproducts. We choose here the type of term which we did not consider in the shift section. Notice that we have the identity (5.7). On the right-hand side of (5.7) we have two distinct cases, where the first part is similar to the ones seen in the analysis of the shift commutator. We begin with this familiar case, however now without using the sharper expansion (5.5), since in this case it does not matter whether we have a square root dependence or a linear one. Our term is then bounded by a constant multiple of $\|b\|_{\mathrm{bmo}(\nu)}$ times the desired quantity. We first consider the term involving the average of $\nu$ over $I^1_{n+1} \times K^2$. We fix $j_{n+1} \in \{1, \ldots, k_{n+1}\}$, and it suffices to bound the corresponding term, where we have applied Lemma 3.3. Recall the strategy in [21]: when $h_{I^1_i}$ is non-cancellative we do not do anything, and otherwise we expand. In the first case, the same proof as in [21, Section 6.B] yields the desired estimate. The proof with $i_{n+1} \ge j_{n+1}$ is similar, so we only focus on $i_{n+1} < j_{n+1}$. By simple calculus, we reduce to bounding a suitable term, which we write out. By the reverse Hölder property and $A_\infty$ extrapolation, we can pull $\sigma_{n+1}$ out of the square sum. By Hölder's inequality, it then suffices to bound the $L^p(w^p)$ norm of the remaining factor.
Simply control the outer $\ell^2$ norm by the $\ell^1$ norm; then we can again use the estimate in [21, Section 6.B] to conclude the first term. The second term can again be handled exactly as in [21, p. 23]. Now we turn to the remaining case. Similarly as in [21], we may without loss of generality assume that either $h_{I^1_i}$ is non-cancellative or we are in the complementary situation. As before, by the reverse Hölder property and $A_\infty$ extrapolation the object is dominated by a suitable quantity, which we write out. Then, by Hölder's inequality, the estimate is reduced to two terms, $A$ and $B$.
Again, the estimate for $A$ can be found in [21, Section 6.B] and we omit the details. For $B$, by the extrapolation theorem it suffices to prove a fixed-exponent estimate, and this follows from the vector-valued estimate for the weighted dyadic maximal function and Hölder's inequality.
Recall that when $h_{I^1_i}$ is non-cancellative, then according to our convention $I^1_i = K^1$. Again, the rest can be estimated as in [21, Section 6.B].
Next, we consider the latter part of (5.7). Notice that by Lemma 3.4 we have a suitable bound. Then, a domination step allows us to view the resulting square functions (which are bounded on $L^{p'}(\sigma_{n+1})$) as the new $f_{n+1}$. Thus, by Hölder's inequality, the related term in the commutator boils down to estimating the partial paraproduct, which is exactly the standard one. Following the expansion methods and estimations introduced earlier, we can handle the other forms of commutators similarly. Compared to the shift case, the more difficult challenges arise from four types of terms. We already handled the first and the symmetric case of the last one; by modifying the above methods, we can estimate the other two terms.
Full paraproducts. Although the full paraproducts have the more complicated product BMO coefficients, they do not require as much analysis as the partial paraproducts. Since no previously unseen methods are needed to conclude the boundedness of the full paraproduct commutators, we omit the details.

THE LOWER BOUND
Let $K$ be a standard bi-parameter full kernel as described earlier. In this section, we additionally assume that $K$ is a multilinear non-degenerate kernel. That is, for any given rectangle $R = I^1 \times I^2$ there exists $\widetilde R = \widetilde I^1 \times \widetilde I^2$ such that $\ell(I^i) = \ell(\widetilde I^i)$, $d(I^i, \widetilde I^i) \sim \ell(I^i)$, and there exists some $\zeta \in \mathbb{C}$ with $|\zeta| = 1$ such that for all $x \in \widetilde R$ and $y_1, \ldots, y_n \in R$ there holds
$$\operatorname{Re} \zeta K(x, y_1, \ldots, y_n) \gtrsim \frac{1}{|R|^n}.$$
We are going to assume a weak type boundedness of the commutator: a testing condition where the suprema are over rectangles $R$ and subsets $A \subset R$, normalized by $\prod_{i=1}^n \sigma_i(R)$, where we recall that $\sigma_i = w_i^{-p_i'}$ and $\nu = \lambda_j^{-1} w_j$. Clearly, this is a weaker assumption than $\|[b, T]_j \colon \prod_{i=1}^n L^{p_i}(w_i^{p_i}) \to L^p(\nu^{-p} w^p)\| < \infty$.
We employ the idea of the median method to prove that
$$b \in \mathrm{bmo}_\nu(\sigma_j) := \Big\{ b \in L^1_{\mathrm{loc}} : \sup_R \inf_c \frac{1}{\nu(R)} \int_R |b - c| \sigma_j < \infty \Big\}$$
under the weaker assumption above. We additionally need to assume that $\nu \sigma_j \in A_\infty$, since when $\nu, \sigma_j, \nu\sigma_j \in A_\infty$ it follows that this is equivalent to the Bloom type little BMO definition, see Proposition 3.2.
6.1. Remark. We get $\nu \sigma_j \in A_\infty$ for free in certain natural situations. Fix a rectangle $R \in \mathcal{D}$. We take arbitrary $\alpha \in \mathbb{R}$ and $x \in \widetilde R \cap \{b \ge \alpha\}$, where $\widetilde R$ is a rectangle that satisfies the non-degeneracy property. As $\sigma_{n+1} \in A_\infty$ we have that $\sigma_{n+1}(\widetilde R \cap \{b \ge \alpha\}) \sim \sigma_{n+1}(\widetilde R) \sim \sigma_{n+1}(R)$. Thus, we get (6.2). Then the goal is to prove the stated estimate, which can also be written in an equivalent form. We split the proof into the following cases. Case 1: $1/s := 1/q - 1/p = 1/q_n - 1/p_n > 0$. Without loss of generality we may normalize, so that we have the majorant $h$, where $M'_{\bar w} g = M_{\bar w}(g W^{-q_n'} \bar w) W^{q_n'} \bar w$ and $M'_\lambda$ is defined analogously with $\lambda$ in place of $\bar w$. Let us explain why $R'$ is well-defined. Indeed, since $W_{\bar w} \in A_{q_n, q}(\bar w)$, the operator $M'_{\bar w}$ is bounded on the relevant weighted space. Then the above discussion easily yields $h \le H$ with $\|H\|_{L^{q_n}(\bar w)}$ under control.

Commutators have the general form $[b, T] \colon f \mapsto bTf - T(bf)$. Here $T$ is a singular integral operator $Tf(x) = \int_{\mathbb{R}^d} K(x, y) f(y)\, dy$. Well-known examples include the Hilbert transform $H$ in dimension $d = 1$, which has the kernel $K(x, y) = \frac{1}{x - y}$, and the Riesz transforms $R_j$ in dimensions $d \ge 2$, which have the kernels $K_j(x, y) = \frac{x_j - y_j}{|x - y|^{d+1}}$, $j = 1, \ldots, d$. Our work revolves around the Coifman-Rochberg-Weiss [4] result, where the two-sided estimate
$$\|b\|_{\mathrm{BMO}} \lesssim \|[b, T]\|_{L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d)} \lesssim \|b\|_{\mathrm{BMO}}, \qquad p \in (1, \infty),$$
was proved for a class of non-degenerate singular integrals $T$ on $\mathbb{R}^d$. Here BMO stands for functions of bounded mean oscillation:
$$\|b\|_{\mathrm{BMO}} := \sup_I \fint_I |b - \langle b \rangle_I|,$$
where the supremum is over all cubes $I \subset \mathbb{R}^d$ and $\langle b \rangle_I = \fint_I b := \frac{1}{|I|} \int_I b$.