1 Introduction

The commutators of the Riesz transforms with a symbol b are bounded (respectively, compact) on \(L^p({\mathbb {R}}^n)\), \(1<p<\infty \), if and only if b belongs to the BMO space (respectively, the VMO space). This is well known, see [10, 44]. A finer property of the commutators is quantified by the Schatten norm, that is, the \( \ell ^{p}\) norm of the singular values. This was studied by Peller for the Hilbert transform on \({\mathbb {R}}\) [33] (see also [34]), in higher dimensions by Janson–Wolff on \({\mathbb {R}}^n\), \(n\ge 2\) [25], and later by Rochberg–Semmes [38, 39]. Membership in the Schatten class is characterized by the symbol belonging to certain Besov spaces. We summarize the known results as follows. Let H denote the Hilbert transform and let \(R_\ell \) denote the \(\ell \)-th Riesz transform on \({\mathbb {R}}^n\).

  • If \(n=1\) and \(0< p<\infty \), then [b, H] is in Schatten class \(S^p\) if and only if the symbol b is in the Besov space \(B_{p,p}^{1/p}({\mathbb {R}})\) [33, 34].

  • Suppose \(n\ge 2\) and \(b\in L^1_{\textrm{loc}}({\mathbb {R}}^n)\). When \(p>n\), \([b,R_{\ell }]\in S^p\) if and only if \(b\in B_{p,p}^{{n}/{p}}({\mathbb {R}}^n)\); when \(0<p\le n\), \([b,R_{\ell }]\in S^p\) if and only if b is a constant [25, 39].

Notice that the cases of dimensions \( n =1\) and \( n \ge 2\) differ somewhat. This is due to the distinguished nature of the Hilbert transform, particularly its close connection to analyticity. Similar results have been demonstrated in [15] for the Szegő projection and for big and little Hankel operators on the unit ball and the Heisenberg group, in [2] for the big Hankel operator on the Bergman space of the disk, and in [48] for Hankel operators on the Bergman space of the unit ball.

The Janson–Wolff characterization has bearing on the quantised derivative of Alain Connes introduced in [11, Chapter 4]. In this setting, the (weak) Schatten norm of the commutator is relevant [30]. See also recent progress in various settings, especially in non-commutative analysis [1, 14, 18, 24, 32, 36, 37].

In this paper we investigate Schatten class estimates for commutators of Riesz transforms on Heisenberg groups, where the boundedness and compactness were established in [12] and [8], respectively. This requires us to revisit the methods of Janson–Wolff and Rochberg–Semmes, replacing the Fourier analytic methods they used with more robust real variable arguments. Our result not only recovers the result of Janson–Wolff [25] and Rochberg–Semmes [39] on \({\mathbb {R}}^n\), \(n\ge 2\), with a quantitative estimate of the Schatten norm (which was not shown explicitly before), but also opens the door to the study of Schatten classes for commutators with certain Calderón–Zygmund operators in other important settings beyond \({\mathbb {R}}^n\). Examples of such Calderón–Zygmund operators include

  1. (1)

    the Cauchy–Szegő projection from Siegel upper half space to its boundary (identified with the Heisenberg group), see [41, Chapter 12, Sect. 2.4] and [15];

  2. (2)

    certain second order Riesz transforms, such as the well-known Beurling–Ahlfors operator on the complex plane \({\mathbb {C}}\) and second order Riesz transforms on \({\mathbb {H}}^n\). Details will be provided in the last section;

  3. (3)

    Riesz transforms in the Bessel setting [4, 23] and Neumann Laplacian setting [29], which will be addressed in subsequent papers.

To state our result explicitly, let \({\mathbb {H}}^{n}\) be the Heisenberg group. It is a nilpotent Lie group with underlying manifold \({\mathbb {C}}^{n}\times {\mathbb {R}}=\{[z,t]:z\in {\mathbb {C}}^{n},\ t\in {\mathbb {R}}\}\), the multiplication law

$$\begin{aligned}&[z,t]\big [z^{\prime },t^{\prime }\big ]=[z_{1},\ldots ,z_{n},t]\big [z_{1}^{\prime }, \ldots ,z_{n}^{\prime },t^{\prime }\big ]\nonumber \\&:=\Bigg [z_{1}+z_{1}^{\prime },\ldots ,z_{n}+z_{n}^{\prime },t+t^{\prime }+2\textrm{Im}\Big (\sum _{j=1}^{n}z_{j}\overline{z_{j}^{\prime }}\Big )\Bigg ], \end{aligned}$$
(1.1)

and the homogeneous norm \(\rho (g)\) defined by \(\rho (g)=\rho ([z,t])=\max \{|z|_{{\mathbb {C}}^n},|t|^{1/2}\},\) where \(|z|_{{\mathbb {C}}^n}^2=\sum _{j=1}^{n} |z_j|^2\). For any \(\ell =1,2,\ldots ,2n\), let \(R_{\ell }\) be the \(\ell \)-th Riesz transform on the Heisenberg group \({\mathbb {H}}^{n}\). The commutator of a function b with \(R_{\ell }\) is defined as follows.

$$\begin{aligned}{}[b,R_{\ell }](f)(x):= b(x)R_{\ell }(f)(x) - R_{\ell }(bf)(x). \end{aligned}$$

Recently, the theory of Besov spaces on spaces of homogeneous type (in particular, on Lie groups [7, 16]) has attracted considerable attention (see [22, 46] and the references therein). Several equivalent characterizations have been obtained. For our purpose, we use the homogeneous Besov space defined via the difference characterization as follows.

Definition 1.1

Suppose \(1<p,q< \infty \) and \(0<\alpha <1\). Let \(f\in L_\textrm{loc}^{1}({\mathbb {H}}^{n})\). Then we say that f belongs to the Besov space \(B_{p,q}^{\alpha }({\mathbb {H}}^{n})\) if

$$\begin{aligned} {\int _{{\mathbb {H}}^{n}}\frac{\Vert f(g \cdot )-f(\cdot )\Vert _{L^{p}({\mathbb {H}}^{n})}^{q}}{\rho (g)^{2n+2+q\alpha }}dg<\infty .} \end{aligned}$$

We recall the definition of the Schatten class \(S^{p}\). Note that if T is any compact operator on \(L^{2}({\mathbb {H}}^{n})\), then \(T^{*}T\) is compact, symmetric and positive, and hence diagonalizable. For \(0<p<\infty \), we say that \(T\in S^{p}\) if \(\{\lambda _{n}\}\in \ell ^{p}\), where \(\{\lambda _{n}\}\) is the sequence of square roots of the eigenvalues of \(T^{*}T\) (counted according to multiplicity).
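As a finite-dimensional illustration of this definition (it plays no role in the proofs), the following Python sketch computes the singular values of a real \(2\times 2\) matrix as the square roots of the eigenvalues of \(T^{*}T\), and from them the \(S^p\) (quasi-)norm. The function names are hypothetical, and the closed formulas used here, via the trace and determinant of \(T^{*}T\), are specific to the \(2\times 2\) case.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Square roots of the eigenvalues of T^*T for the real matrix T = [[a, b], [c, d]]."""
    frob = a * a + b * b + c * c + d * d      # trace of T^*T, equal to s1^2 + s2^2
    det = a * d - b * c                       # |det T| equals s1 * s2
    disc = math.sqrt(max(frob * frob - 4 * det * det, 0.0))
    s1 = math.sqrt((frob + disc) / 2)
    s2 = math.sqrt(max((frob - disc) / 2, 0.0))
    return s1, s2

def schatten_norm_2x2(a, b, c, d, p):
    """The l^p (quasi-)norm of the two singular values, for 0 < p < infinity."""
    s1, s2 = singular_values_2x2(a, b, c, d)
    return (s1 ** p + s2 ** p) ** (1.0 / p)
```

For instance, the diagonal matrix with entries 3 and 4 has singular values 4 and 3, so its \(S^2\) norm is 5 and its \(S^1\) norm is 7.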

Our main theorem is the following.

Theorem 1.2

Suppose that \(0<p<\infty \) and \(b\in L^1_{\textrm{loc}}({\mathbb {H}}^n)\). Then for any \(\ell \in \{1,2,\ldots ,2n\}\), one has \([b,R_{\ell }]\in S^p\) if and only if

  1. (1)

    \(b\in B_{p,p}^{\frac{2n+2}{p}}({\mathbb {H}}^n)\), if \(p>2n+2\); in this case we have \(\Vert b\Vert _{B_{p,p}^{\frac{2n+2}{p}}({\mathbb {H}}^n)}\approx \Vert [b,R_{\ell }]\Vert _{S^p}\).

  2. (2)

    b is a constant, if \(0<p\le 2n+2\).
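The dichotomy in Theorem 1.2 is governed by the number \(2n+2\), the homogeneous dimension of \({\mathbb {H}}^n\). The small Python helper below is only a bookkeeping sketch with hypothetical names: given n and p, it reports which alternative of the theorem applies and, in the Besov case, the smoothness index \((2n+2)/p\).

```python
from fractions import Fraction

def homogeneous_dimension(n):
    """Homogeneous dimension 2n + 2 of the Heisenberg group H^n."""
    return 2 * n + 2

def schatten_criterion(n, p):
    """Return which alternative of Theorem 1.2 applies for the given n and p."""
    if p <= 0:
        raise ValueError("p must be positive")
    Q = homogeneous_dimension(n)
    if p > Q:
        # Case (1): b must lie in the Besov space with smoothness (2n+2)/p.
        return ("besov", Fraction(Q) / Fraction(p))
    # Case (2): only constant symbols give a commutator in S^p.
    return ("constant", None)
```

Note that the smoothness index \((2n+2)/p\) lies in (0, 1), the range allowed in Definition 1.1, exactly when \(p>2n+2\).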

In the Euclidean setting, the Riesz transform kernels have the explicit form \(\frac{\Omega (x)}{|x|^n}\), where n is the dimension of the underlying space and \(\Omega (x)\) is a smooth homogeneous function of degree 0. This leads to arguments that depend heavily on the form of the kernel. However, the Riesz transform kernel on the Heisenberg group has no such convenient form, and our argument instead depends upon recent developments. A pointwise lower bound of the Riesz transform kernel on stratified Lie groups (which include the Heisenberg group) was established in [12] to characterize the boundedness of the commutator with the Riesz transform. We have to develop this theme further to prove the main result; see Theorem 3.1 below. Indeed, Theorem 3.1, a canonical ‘non-degenerate’ condition, is key to our proof. It depends upon the kernel of the Riesz transforms being zero only on a set of measure zero, and being suitably large. In addition, the property aligns well with the martingale structure on Heisenberg groups. Verifying this property should be central in settings beyond the Euclidean. We return to this point in Sect. 6.

Our proof uses a natural martingale structure on the Heisenberg group, an associated Haar basis, and crucially the notion of nearly weakly orthogonal sequences due to Rochberg–Semmes [39]. This notion is very well adapted to the analysis of Schatten norms in harmonic analysis settings; see (2.2). As with other methods, the median of the symbol on the atoms of the martingale is important.

The paper is organized as follows. In Sect. 2, we recall the tilings and the Haar basis on the Heisenberg group and a characterization of the Schatten class. In Sect. 3 we recall the basic properties of the Riesz transform and then prove the pointwise lower bound for the Riesz kernel (Theorem 3.1). In Sects. 4 and 5, we give the proof of Theorem 1.2 for the cases \(p>2n+2\) and \(0<p\le 2n+2\), respectively, which is contained in Propositions 4.4, 4.5 and 5.5. In Sect. 6, we extend our approach to some other well-known Calderón–Zygmund operators beyond the Euclidean setting.

Throughout the paper we denote the \(L^{p}({\mathbb {H}}^n)\) norm of a function f by \(\Vert f\Vert _{p}\), \(1\le p\le \infty \). The indicator function of a subset \(E\subseteq X\) is denoted by \(\chi _{E}\). We use \(A\lesssim B\) to denote the statement that \(A\le CB\) for some constant \(C>0\), and \(A\approx B\) to denote the statement that \(A\lesssim B\) and \(B\lesssim A\).

2 Preliminaries on \({\mathbb {H}}^n\)

Let \({\mathbb {H}}^{n}\) be the Heisenberg group, which is a nilpotent Lie group with underlying manifold \({\mathbb {C}}^{n}\times {\mathbb {R}}=\{[z,t]:z\in {\mathbb {C}}^{n},\ t\in {\mathbb {R}}\}\) and multiplication law as in (1.1). Then the identity of \({\mathbb {H}}^{n}\) is the origin and the inverse is given by \([z,t]^{-1}=[-z,-t]\). In addition to the Heisenberg group multiplication law, for each positive number \(\lambda \), non-isotropic dilations \(\delta _{\lambda }\) on \({\mathbb {H}}^{n}\) are given by

$$\begin{aligned} \delta _{\lambda }(g):=\delta _{\lambda }[z,t]:=\big [\lambda z,\lambda ^{2}t\big ]. \end{aligned}$$

Besides, the norm structure \(\rho \) on \({\mathbb {H}}^{n}\) is defined by

$$\begin{aligned} \rho (g)=\rho ([z,t])=\max \bigg \{|z|_{{\mathbb {C}}^n},|t|^{1/2}\bigg \}, \end{aligned}$$

where \(|z|_{{\mathbb {C}}^n}^2=\sum _{j=1}^{n} |z_j|^2\). The Haar measure on \({\mathbb {H}}^{n}\) coincides with Lebesgue measure on \({\mathbb {R}}^{2n+1}\). For any measurable set \(E\subset {\mathbb {H}}^{n}\), |E| denotes its Haar measure. It is direct to see that \(\rho (g^{-1})=\rho (-g)=\rho (g)\) and \( \rho (\delta _{\lambda }(g))=\lambda \rho (g)\).
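The group operations above are easy to experiment with numerically. The following Python sketch is an illustration only: it assumes that the twist term in (1.1) is \(2\,\textrm{Im}\sum _{j} z_{j}\overline{z_{j}^{\prime }}\), represents a point \([z,t]\) as a pair consisting of a tuple of complex numbers and a real number, and uses hypothetical function names.

```python
import math

def mul(g, gp):
    """Group law (1.1): [z,t][z',t'] = [z+z', t+t'+2 Im sum_j z_j conj(z'_j)]."""
    (z, t), (zp, tp) = g, gp
    twist = 2 * sum((a * b.conjugate()).imag for a, b in zip(z, zp))
    return (tuple(a + b for a, b in zip(z, zp)), t + tp + twist)

def inv(g):
    """Inverse [z,t]^{-1} = [-z,-t]."""
    z, t = g
    return (tuple(-a for a in z), -t)

def dilate(lam, g):
    """Non-isotropic dilation delta_lam[z,t] = [lam z, lam^2 t]."""
    z, t = g
    return (tuple(lam * a for a in z), lam ** 2 * t)

def rho(g):
    """Homogeneous norm max(|z|, |t|^(1/2))."""
    z, t = g
    return max(math.sqrt(sum(abs(a) ** 2 for a in z)), math.sqrt(abs(t)))
```

With these definitions one can check on sample points that `mul(g, inv(g))` is the origin, that `dilate` respects the group law, and that `rho(dilate(lam, g))` equals `lam * rho(g)`.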

The \(2n+1\) vector fields

$$\begin{aligned} X_{\ell }:=\frac{\partial }{\partial x_{\ell }}-2y_{\ell }\frac{\partial }{\partial t},\ \ Y_{\ell }:=\frac{\partial }{\partial y_{\ell }}+2x_{\ell }\frac{\partial }{\partial t},\ \ {\mathcal {T}}:=\frac{\partial }{\partial t},\ \ \ell =1,2,\ldots ,n, \end{aligned}$$

form a natural basis for the Lie algebra of left-invariant vector fields on \({\mathbb {H}}^{n}\). For convenience, we set \(X_{n+\ell }:=Y_\ell , \ell =1,2,\ldots ,n\) and set \(X_{2n+1}:={\mathcal {T}}\). The standard sub-Laplacian \(\Delta _{{\mathbb {H}}}\) on the Heisenberg group is defined by \(\Delta _{{\mathbb {H}}}:=\sum _{\ell =1}^{2n}X_{\ell }^{2}\). For any multi-index \(I=(i_{1},\ldots ,i_{2n+1})\in {\mathbb {N}}^{2n+1}\), we set \(X^{I}:=X_{1}^{i_{1}}X_{2}^{i_{2}}\cdots X_{2n+1}^{i_{2n+1}}\) and further set

$$\begin{aligned} |I|:=i_{1}+\cdots +i_{2n+1} \quad \text {and} \quad \vartheta (I):=i_{1}+\cdots +i_{2n}+2i_{2n+1}. \end{aligned}$$
(2.1)

The integers |I| and \(\vartheta (I)\) are said to be the topological degree and the homogeneous degree of the differential operator \(X^{I}\), respectively.

2.1 Tiles on \({\mathbb {H}}^n\)

We recall the metrics and tilings in \({\mathbb {H}}^n\) summarized in [9]. We shall use the gauge distance d, which is defined by setting

$$\begin{aligned} d(g, g'):= \left\| g'^{-1} \cdot g \right\| = \left\| g^{-1} \cdot g'\right\| , \qquad \forall g, g' \in {\mathbb {H}}^n, \end{aligned}$$

where \(\left\| {}\cdot {}\right\| \) is given by \(\left\| (z,t) \right\| := \max \left\{ |x_1|, |y_1|, \dots , |x_n|, |y_n|, |t|^{1/2} \right\} , \forall (z,t)\in {\mathbb {H}}^n.\) It is easy to see that d is equivalent to the homogeneous norm \(\rho \). See [43, Sect. 2.2] for a discussion. We write B(g, r) for the ball in \({\mathbb {H}}^n\) with center g and radius r constructed using the distance d. We also use balls in the (algebraic) center of \({\mathbb {H}}^n\), which may be identified with \({\mathbb {R}}\): we define \(B^{*}(t, s):= \{ t' \in {\mathbb {R}}: |t-t'| < s \}\). Tubes are sets of the form \(g \cdot B(o, r) \cdot B^{*}(0, s)\), which are images of products of balls in \({\mathbb {H}}^n \times {\mathbb {R}}\) under the multiplication in (1.1). Here we use the notation o to denote the zero point \((z,t)=(0,0)\) of \({\mathbb {H}}^n\), and we explain the notation \(B(o, r) \cdot B^{*}(0, s)\) in more detail: for any \((z,t) \in B(o, r)\) and any \(t'\in B^{*}(0, s)\), we interpret \(t'\) as \((0,t') \in {\mathbb {H}}^n\), and hence by (1.1),

$$\begin{aligned} (z,t)\cdot (0,t') = (z, t+t'). \end{aligned}$$

We now recall that T(g, r, s) is defined as

$$\begin{aligned} T(g,r,s) := g \cdot B(o, r) \cdot B^{*}(0, s) = B(g, r) \cdot B^{*}(0, s). \end{aligned}$$
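The gauge distance and the tubes can be sketched in Python as follows. This is a hedged illustration with hypothetical function names; it assumes the group law (1.1) with twist term \(2\,\textrm{Im}\sum _{j} z_{j}\overline{z_{j}^{\prime }}\) and represents \((z,t)\) as a tuple of complex numbers together with a real number. The membership test for a tube uses the observation that \(g^{-1}\cdot g'\in B(o,r)\cdot B^{*}(0,s)\) exactly when its z-part has sup-norm less than r and its t-part has absolute value less than \(r^2+s\).

```python
import math

def mul(g, gp):
    """Group law (1.1), with the assumed twist 2 Im sum_j z_j conj(z'_j)."""
    (z, t), (zp, tp) = g, gp
    twist = 2 * sum((a * b.conjugate()).imag for a, b in zip(z, zp))
    return (tuple(a + b for a, b in zip(z, zp)), t + tp + twist)

def inv(g):
    z, t = g
    return (tuple(-a for a in z), -t)

def gauge(g):
    """Gauge norm max(|x_1|, |y_1|, ..., |x_n|, |y_n|, |t|^(1/2))."""
    z, t = g
    coords = [abs(c) for a in z for c in (a.real, a.imag)]
    return max(coords + [math.sqrt(abs(t))])

def dist(g, gp):
    """Gauge distance d(g, g') = ||g'^{-1} . g||."""
    return gauge(mul(inv(gp), g))

def in_tube(point, g, r, s):
    """Membership of `point` in the tube T(g, r, s) = g . B(o, r) . B*(0, s)."""
    z, t = mul(inv(g), point)
    sup_norm = max(abs(c) for a in z for c in (a.real, a.imag))
    return sup_norm < r and abs(t) < r * r + s
```

In particular, `dist` is symmetric and invariant under left translations, in agreement with the two expressions for d given above.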

We use the work of [42, 43] on self-similar tilings to find a “nice” decomposition of \({\mathbb {H}}^n\), analogous to the decomposition of \({\mathbb {R}}^n\) into dyadic cubes in classical harmonic analysis, and describe an analogue of a lemma of Journé [27]. We identify \({\mathbb {C}}^n\) with \({\mathbb {R}}^{2n}\). We write \(|z|_\infty \) for \(\max \{ |x_1|, |y_1|, \dots , |x_n|, |y_n| \}\), \(Q_0\) for the cube \([-1/2,1/2)^{2n}\), and \({\mathbb {H}}^n_{{\mathbb {Z}}}\) for the subgroup \(\{ (z,t) \in {\mathbb {H}}^n: z \in {\mathbb {Z}}^{2n}, t \in (2n)^{-1} {\mathbb {Z}}\}\).

Theorem 2.1

([42, 43]) There is a measurable function \(f: Q_0 \rightarrow {\mathbb {R}}\) such that \(f(0) = {1\over 2(n+1)}\) and

$$\begin{aligned} \frac{1}{4n(n+ 1)} \le f(z) \le \frac{2n+ 1}{4n(n+ 1)} \qquad \forall z \in Q_0, \end{aligned}$$

such that the set \(T_o\), defined by

$$\begin{aligned} T_o:= \left\{ (z,t): z \in Q_0, f(z) - \frac{1}{2n} \le t < f(z) \right\} , \end{aligned}$$

has the property that

$$\begin{aligned} \delta _{2n+1} (T_o) = \bigcup _{g \in \Delta } g \cdot T_o, \end{aligned}$$

where \(\Delta := \{ (z,t) \in {\mathbb {H}}^n_{{\mathbb {Z}}}: |z|_\infty \le n,\ |t| \le n+ 1 \}\).

The definitions of \(T_o\) and the metrics that we use show that

$$\begin{aligned} T_o\subset & {} \left\{ (z,t) \in {\mathbb {H}}^n: |z|_\infty \le 1/2, |t| \le 3/8 \right\} \subseteq {\bar{B}}(o, 1/2)\cdot {\bar{B}}^{*}(o, 1/8)\\= & {} {\bar{T}}(o,1/2,1/8), \end{aligned}$$

where the barred symbols indicate closures. We note that \(|T_o| = 1/(2n)\) while \(|T(o,1/2,1/8)| = 3/4\).
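Two numerical facts implicit in Theorem 2.1 can be checked by direct counting. The Python sketch below (hypothetical function names, exact rational arithmetic) counts the elements of \(\Delta \) and computes the largest value of |t| permitted in \(T_o\) by the bounds on f. The count is consistent with the self-similarity relation, since \(\delta _{2n+1}\) multiplies Haar measure by \((2n+1)^{2n+2}\).

```python
from fractions import Fraction

def count_delta(n):
    """Number of (z, t) in H^n_Z with |z|_inf <= n and |t| <= n + 1."""
    # z ranges over Z^{2n} with every coordinate in {-n, ..., n}.
    horizontal = (2 * n + 1) ** (2 * n)
    # t = m / (2n) with m an integer; scan a range slightly larger than needed.
    bound = 2 * n * (n + 1)
    vertical = sum(1 for m in range(-bound - 5, bound + 6)
                   if abs(Fraction(m, 2 * n)) <= n + 1)
    return horizontal * vertical

def t_extent(n):
    """Largest |t| allowed in T_o by the bounds on f in Theorem 2.1."""
    upper = Fraction(2 * n + 1, 4 * n * (n + 1))               # sup of f
    lower = Fraction(1, 4 * n * (n + 1)) - Fraction(1, 2 * n)  # inf of f - 1/(2n)
    return max(abs(upper), abs(lower))
```

For every n tested, `count_delta(n)` equals \((2n+1)^{2n+2}\), and `t_extent(n)` is at most 3/8, with equality for \(n=1\), matching the inclusion displayed above.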

Definition 2.2

We define

$$\begin{aligned} \mathfrak {T}_0:= \{ g \cdot T_o: g \in {\mathbb {H}}^n_{{\mathbb {Z}}} \}, \qquad \mathfrak {T}_j:= \delta _{(2n+1)^j} \mathfrak {T}_0 \quad \text {and}\quad \mathfrak {T}:= \bigcup _{j \in {\mathbb {Z}}} \mathfrak {T}_j. \end{aligned}$$

We call the sets \(T \in \mathfrak {T}\) tiles. If \(j \in {\mathbb {Z}}\) and \(g \in {\mathbb {H}}^n_{{\mathbb {Z}}}\) and \(T = \delta _{(2n+1)^j} (g \cdot T_o)\), then \(T = \delta _{(2n+1)^j} (g) \cdot \delta _{(2n+1)^j} (T_o)\), and we further define

$$\begin{aligned} {\text {cent}}(T):= \delta _{(2n+1)^j} (g), \quad {\text {width}}(T):= (2n+1)^j \quad \text {and}\quad {\text {height}}(T):= \frac{(2n+1)^{2j}}{2n}. \end{aligned}$$

We define \(I_{j}\) to be the j-th center set, consisting of all the centers of the tiles \(T\in \mathfrak {T}_j\). That is,

$$\begin{aligned} I_{j}=\{{\text {cent}}(T):T\in \mathfrak {T}_{j}\}. \end{aligned}$$

Lemma 2.3

([42, 43]) Let \(\mathfrak {T}_j\) and \(\mathfrak {T}\) be defined as above. Then the following hold:

  1. (1)

    for each \(j \in {\mathbb {Z}}\), \(\mathfrak {T}_j\) is a partition of \({\mathbb {H}}^n\), that is, \({\mathbb {H}}^n = \bigcup _{T \in \mathfrak {T}_j} T\);

  2. (2)

    \(\mathfrak {T}\) is nested, that is, if \(T, T' \in \mathfrak {T}\), then either T and \(T'\) are disjoint or one is a subset of the other;

  3. (3)

    for each \(j \in {\mathbb {Z}}\) and \(T\in \mathfrak {T}_j\), T is a union of \((2n+1)^{2n+2}\) disjoint congruent subtiles in \(\mathfrak {T}_{j-1}\);

  4. (4)

    \(B(g, C_1 q) \subseteq T \subseteq B(g, C_2 q)\), where \(g = {\text {cent}}(T)\) and \(q = {\text {width}}(T)\) for each \(T \in \mathfrak {T}\); the constants \(C_1\) and \(C_2\) depend only on \(n\);

  5. (5)

    if \(T \in \mathfrak {T}_j\), then \(g \cdot T \in \mathfrak {T}_j\) for all \(g \in \delta _{(2n+1)^j} {\mathbb {H}}^n_{{\mathbb {Z}}}\), and \(\delta _{(2n+1)^k} T \in \mathfrak {T}_{j+k}\) for all \(k \in {\mathbb {Z}}\).

Every tile is a dilate and translate of the basic tile \(T_o\), so all have similar geometry. Hence each tile in \(\mathfrak {T}_j\) is a fractal set—its boundary is a set of Lebesgue measure 0 and (Euclidean Hausdorff) dimension \(2n\)—and is “approximately” a Heisenberg ball of radius \((2n+1)^{j}\). The decompositions are product-like in the sense that the tiles project onto cubes in the factor \({\mathbb {C}}^n\), and their centers form a product set. If two tiles in \(\mathfrak {T}_j\) are “horizontal neighbors”, then the distance between their centers is \((2n+1)^{j}\), while if they are “vertical neighbors”, then the distance is \((2n+1)^{2j}/2n\).

2.2 An Explicit Haar Basis on Heisenberg Group

Next we recall the explicit construction of a Haar basis in [28]. Note that in [28] the Haar basis was constructed on a system of dyadic cubes for general metric spaces with a positive Borel measure. Here we apply it to the specific setting of the Heisenberg group \({\mathbb {H}}^n\) with the system of tiles.

There exists a Haar basis on \({\mathbb {H}}^n\): \(\{h_{T}^{\epsilon }: T\in \mathfrak {T}, \epsilon = 1,\dots ,M_n - 1\}\) for \(L^p({\mathbb {H}}^n)\), \(1< p < \infty \), where \(M_n:=\# {\mathfrak {H}}(T)= (2n+1)^{2n+2}\) denotes the number of sub-tiles of T and \({\mathfrak {H}}(T)\) denotes the collection of sub-tiles of T.

Lemma 2.4

([28]) For each \(f\in L^p\), we have

$$\begin{aligned} f(x) = \sum _{T\in \mathfrak {T}}\sum _{\epsilon =1}^{M_n-1} \langle f,h^\epsilon _T\rangle h^\epsilon _T(x), \end{aligned}$$

where the sum converges (unconditionally) both in the \(L^p\)-norm and pointwise almost everywhere.

The following lemma collects several basic properties of the functions \(h_{T}^{\epsilon }\).

Lemma 2.5

([28]) The Haar functions \(h_{T}^{\epsilon }\), \(T\in \mathfrak {T}\), \(\epsilon = 1,\ldots ,M_n - 1\), have the following properties:

  1. (i)

    \(h_{T}^{\epsilon }\) is a simple Borel-measurable real function on \({\mathbb {H}}^{n}\);

  2. (ii)

    \(h_{T}^{\epsilon }\) is supported on T;

  3. (iii)

    \(h_{T}^{\epsilon }\) is constant on each \(R\in {\mathfrak {H}}(T)\);

  4. (iv)

    \(\int _T h_{T}^{\epsilon }(g)\, dg = 0\) (cancellation);

  5. (v)

    \(\langle h_{T}^{\epsilon },h_T^{\epsilon '}\rangle = 0\) for \(\epsilon \ne \epsilon '\), \(\epsilon \), \(\epsilon '\in \{1, \ldots , M_n - 1\}\);

  6. (vi)

    the collection \( \big \{|T|^{-1/2}\chi _T\big \} \cup \{h_{T}^{\epsilon }: \epsilon = 1, \ldots , M_n - 1\} \) is an orthogonal basis for the vector space V(T) of all functions on T that are constant on each sub-tile \(R\in {\mathfrak {H}}(T)\);

  7. (vii)

    if \(h_{T}^{\epsilon }\not \equiv 0\) then \(\Vert h_{T}^{\epsilon }\Vert _{p} \approx |T|^{\frac{1}{p} - \frac{1}{2}} \quad \text {for}~1 \le p \le \infty ;\)

  8. (viii)

    \(\Vert h_{T}^{\epsilon }\Vert _{1}\cdot \Vert h_{T}^{\epsilon }\Vert _{\infty } \approx 1\).
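To make properties (iv), (v) and (viii) concrete, the Python sketch below builds \(M-1\) Haar-type functions on a single tile from the measures of its M sub-tiles. It is a minimal illustration under stated assumptions: the telescoping construction used here (split off one sub-tile at a time from the union of the remaining ones) is one standard choice and may differ in detail from the construction in [28]; the function names are hypothetical, and a function that is constant on sub-tiles is stored as the list of its values.

```python
import math

def haar_functions(measures):
    """Values on each sub-tile of M-1 Haar-type functions, M = len(measures)."""
    M = len(measures)
    funcs = []
    for k in range(M - 1):
        head = measures[k]              # measure of the k-th sub-tile
        tail = sum(measures[k + 1:])    # measure of the union of the later sub-tiles
        scale = math.sqrt(head * tail / (head + tail))
        values = [0.0] * M
        values[k] = scale / head        # positive on the k-th sub-tile
        for i in range(k + 1, M):
            values[i] = -scale / tail   # negative on the later sub-tiles
        funcs.append(values)
    return funcs

def inner(u, v, measures):
    """L^2 inner product of two functions that are constant on sub-tiles."""
    return sum(a * b * m for a, b, m in zip(u, v, measures))
```

With \(M=81\) equal sub-tiles (the value of \(M_n\) for \(n=1\)), the functions returned have integral zero, are pairwise orthogonal with unit \(L^2\) norm, and the product of their \(L^1\) and \(L^\infty \) norms stays between 1 and 2.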

2.3 Characterization of Schatten Class

The Schatten norm is defined in a non-linear fashion. Estimating it from above and from below is not necessarily straightforward. Operators with kernels, such as commutators, admit general upper bounds in terms of norms on the kernels. These general facts are recalled, and used, in Sect. 4.2.

Characterizations of Schatten norms for general operators are well known, and are frequently expressed in terms of suprema or infima over all choices of orthonormal bases for the Hilbert space in question.

Rochberg and Semmes [39] proposed a notion of nearly weakly orthogonal (NWO) sequences of functions. This notion is closely connected to Carleson measures. For our purposes, we do not need to recall the full definition of NWO sequences. With the tiles developed in Sect. 2.1, we have the inequality below, for any compact operator A on \( L ^2 ({\mathbb {H}} ^{n})\) and \(1<p<\infty \):

$$\begin{aligned} \left[ \sum _{T\in \mathfrak {T}} |\langle A e_T, f_T \rangle |^{p} \right] ^{1/p} \lesssim \Vert A \Vert _{S ^{p}}, \end{aligned}$$
(2.2)

where \(\{e_T\}_T\) and \(\{f_T\}_T\) are function sequences satisfying \( |e_T|, |f_T|\le |T|^{-1/2} \chi _{T} \). This inequality can be found in [39, (1.10), Sect. 3].

3 Lower Bound of the Riesz Transform Kernel on \({\mathbb {H}}^n\)

For any \(\ell =1,2,\ldots ,2n\), the Riesz transform on Heisenberg groups \({\mathbb {H}}^{n}\) is given by \(R_{\ell }=X_{\ell } (-\Delta _{{\mathbb {H}}})^{-1/2}\). It is well known that the heat kernel \(p_h\) on \({\mathbb {H}}^n\) has this form (cf. [19]): for \(g=[z,t]\in {\mathbb {H}}^n\),

$$\begin{aligned} p_h(g) = \frac{1}{2 (4 \pi h)^{n+1}} \int _{{\mathbb {R}}} \exp {\Big (\frac{\lambda }{4 h} ( i\, t - | z |_{{\mathbb {C}}^n}^2 \coth {\lambda }) \Big )} \Big ( \frac{\lambda }{\sinh {\lambda }} \Big )^n \, d\lambda ,\quad i^2=-1. \end{aligned}$$

Moreover, writing \(p:=p_1\), the heat kernel \(p_h\) on \({\mathbb {H}}^n\) satisfies (cf., for example, [17, Eq. (1.73)])

$$\begin{aligned} p_h(g) = h^{-n-1} p(\delta _{\frac{1}{\sqrt{h}}}(g)), \qquad \forall h > 0, \ g \in {\mathbb {H}}^n. \end{aligned}$$
(3.2)
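The scaling identity (3.2) can be sanity-checked numerically from the integral formula for \(p_h\). The Python sketch below is only an illustration: it approximates the integral by a midpoint rule on a truncated interval (the truncation length and the number of steps are ad hoc assumptions), keeps only the real part of the integrand since the imaginary part is odd in \(\lambda \), and uses hypothetical function names.

```python
import math

def heat_kernel(z, t, h, half_width=40.0, steps=4000):
    """Midpoint-rule approximation of p_h([z, t]) from the integral formula."""
    n = len(z)
    r2 = sum(abs(a) ** 2 for a in z)          # |z|^2 in C^n
    dlam = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        lam = -half_width + (k + 0.5) * dlam
        if abs(lam) < 1e-8:
            lam_coth, ratio = 1.0, 1.0        # limits of lam*coth(lam) and lam/sinh(lam)
        else:
            lam_coth = lam / math.tanh(lam)
            ratio = lam / math.sinh(lam)
        # Real part of the integrand; the imaginary part integrates to zero.
        total += (math.cos(lam * t / (4 * h))
                  * math.exp(-r2 * lam_coth / (4 * h)) * ratio ** n)
    return total * dlam / (2 * (4 * math.pi * h) ** (n + 1))

def dilate(lam, z, t):
    """Non-isotropic dilation delta_lam[z, t] = [lam z, lam^2 t]."""
    return [lam * a for a in z], lam ** 2 * t
```

Comparing `heat_kernel(z, t, h)` with `h ** (-n - 1) * heat_kernel(*dilate(1 / math.sqrt(h), z, t), 1.0)` on sample points reproduces (3.2) up to rounding error.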

The kernel of the \(\ell {\textrm{th}}\) Riesz transform \(R_\ell \) (\(1 \le \ell \le 2n\)) is written simply as \(K_\ell (g)\). It is well-known that \( K_\ell \in C^{\infty }({\mathbb {H}}^n \setminus \{o\})\), and it satisfies the scaling condition

$$\begin{aligned} \ K_\ell (\delta _r(g)) = r^{-2n-2} K_\ell (g), \quad \forall g \ne o, \ r > 0, \ 1 \le \ell \le 2n. \end{aligned}$$
(3.3)

Indeed, this follows from the relationship between the Riesz transform and heat kernel (3.2) given by

$$\begin{aligned} K_\ell (g) = \frac{1}{\sqrt{\pi }} \int _0^{+\infty } h^{-\frac{1}{2}} X_\ell p_h(g) \, dh = \frac{1}{\sqrt{\pi }} \int _0^{+\infty } h^{- n - 2} \left( X_\ell p \right) (\delta _{\frac{1}{\sqrt{h}}}(g)) \, dh. \end{aligned}$$

We recall that, by the classical estimates for the heat kernel and its derivatives on stratified groups (see for example [45]), it is well known (e.g. [17]) that for any multi-index \(I=(i_1,\ldots ,i_{2n})\in {\mathbb {N}}^{2n}\) and any \(1 \le \ell \le 2n\), the Riesz transform kernel satisfies the following smoothness inequality:

$$\begin{aligned} \bigg |X^I K_\ell (g)\bigg | \lesssim \rho (g)^{-2n-2-|I|}. \end{aligned}$$

We now establish the following fundamental result on the pointwise lower bound of the Riesz transform kernel, which is one of the key properties needed to prove our main theorem. It is of independent interest, in that this property can be seen to hold for other Calderón–Zygmund operators.

Theorem 3.1

There exists a positive integer \(A_0\) such that:

  • for any \(T\in \mathfrak {T}_{j}\), there is a unique \(T_{A_0}\in \mathfrak {T}_{j+A_0}\) such that \(T\subset T_{A_0}\).

  • furthermore, for each \(\ell \in \{1,2,\cdots ,2n\}\), there exist positive constants \(3\le A_{1}\le A_{2}\) and \(C>0\) such that for any tile \(T\in \mathfrak {T}_{j}\), there exists a tile \({\hat{T}}\in \mathfrak {T}_{j}\) satisfying:

    1. (1)

      \({\hat{T}}\subset T_{A_0}\);

    2. (2)

      \(A_{1}(2n+1)^{j}\le d({\text {cent}}{(T)},{\text {cent}}({\hat{T}}))\le A_{2}(2n+1)^{j}\);

    3. (3)

      for all \((g,{{\hat{g}}})\in T\times {\hat{T}}\), \(K_{\ell }(({{\hat{g}}})^{-1} g)\) does not change sign;

    4. (4)

      for all \((g,{{\hat{g}}})\in T\times {\hat{T}}\), \(|K_{\ell }(({{\hat{g}}})^{-1} g)|\ge C (2n+1)^{-(2n+2)j}\).

Proof

Begin with this fundamental fact of the Riesz transform kernel from [13, Theorem 1.5]:

$$\begin{aligned} K_\ell (g)\not =0\qquad \mathrm{a.e.}\ g\in {\mathbb {H}}^n,\ \textrm{for each fixed } \ell \in \{1,2,\ldots ,2n\}. \end{aligned}$$

From the scaling property of \(K_\ell \) (cf. (3.3)) and the property above, we obtain that

$$\begin{aligned} \quad K_\ell ( g)\not =0\qquad \mathrm{a.e.}\ \ g\in {\mathbb {S}}^n, \end{aligned}$$

where \({\mathbb {S}}^n=\{g\in {\mathbb {H}}^n:\ \rho ( g)=1\}\) is the unit sphere in \({\mathbb {H}}^n\). Let \(E_\ell :=\{ g\in {\mathbb {S}}^n:\ K_\ell ( g)=0 \}\). Then \(\sigma (E_\ell )=0\), where \(\sigma \) represents the surface measure, and for every small positive number \(\epsilon \), there exists an open set \({\mathcal {E}}_\ell \) covering \(E_\ell \) such that \(\sigma ({\mathcal {E}}_\ell )<\epsilon \). Since \(K_\ell \) is a \(C^\infty \) function in \({\mathbb {H}}^n \backslash \{o\}\), there exists \(g_\ell \) in \({\mathbb {H}}^n\) with \(\rho (g_ \ell )=1\) such that

$$\begin{aligned} |K_\ell (g_\ell )|=\min _{g\in {\mathcal {F}}_\ell } |K_\ell (g) |>0, \end{aligned}$$

where \({\mathcal {F}}_\ell :={\mathbb {S}}^n\backslash {\mathcal {E}}_\ell \).

Hence, there exists \( 0<\varepsilon _o\ll 1\) such that

$$\begin{aligned} |K_\ell ( g)|>{1\over 2} |K_\ell ( g_\ell )|, \end{aligned}$$
(3.4)

for all \(g\in B({\mathcal {F}}_\ell , 4\varepsilon _o) = \{g\in {\mathbb {H}}^n: \exists {\tilde{g}}\in {\mathcal {F}}_\ell \mathrm{\ \ such \ that} \ \ d(g,{\tilde{g}})<4\varepsilon _o \}\).

We now turn to the tiles. Based on the construction of tiles, for every \(T\in \mathfrak {T}_{j}\), there exists a unique \(T_{A_0}\in \mathfrak {T}_{j+A_0}\) such that \(T\subset T_{A_0}\). Here \(A_0\) is a positive integer to be determined later. We now choose an arbitrary \(T\in \mathfrak {T}_{j}\).

We first claim that for the chosen \(T\in \mathfrak {T}_{j}\) and the unique tile \(T_{A_0}\in \mathfrak {T}_{j+A_0}\) with \(T\subset T_{A_0}\), there must be some \({{\hat{g}}} \in T_{A_0}\) with \(d(h, {{\hat{g}}}) = {\mathfrak {C}} (2n+1)^{j+A_0}\) and \(d({{\hat{g}}}, T_{A_0}^c) >10C_2(2n+1)^j\) such that

$$\begin{aligned} (\delta _{{\mathfrak {C}}^{-1}(2n+1)^{-j-A_0}} (h^{-1} {{\hat{g}}}) )^{-1}\in {\mathcal {F}}_\ell , \end{aligned}$$
(3.5)

where \(h= {\text {cent}}{(T)}\), \({\mathfrak {C}}\) is a positive constant such that \({ C_1\over 2}<{\mathfrak {C}} < {3 C_1\over 4}\), \(C_1\) and \(C_2\) are the constants in Lemma 2.3.

We now prove this claim. Suppose that for all \({{\hat{g}}} \in T_{A_0}\) with \(d(h, {{\hat{g}}}) = {\mathfrak {C}} (2n+1)^{j+A_0}\) and \(d({{\hat{g}}}, T_{A_0}^c) >10C_2(2n+1)^j\), (3.5) does not hold. Then since \(\rho ((\delta _{{\mathfrak {C}}^{-1}(2n+1)^{-j-A_0}} (h^{-1} {{\hat{g}}}))^{-1})=1\), we obtain that \((\delta _{\mathfrak C^{-1}(2n+1)^{-j-A_0}} (h^{-1} {{\hat{g}}}))^{-1} \in {\mathcal {E}}_\ell .\) However, due to the construction of the system of tiles, we obtain that

$$\begin{aligned} {\ \sigma ( \{{{\hat{g}}} \in T_{A_0}:\ d(h, {{\hat{g}}}) = {\mathfrak {C}} (2n+1)^{j+A_0},\ d({{\hat{g}}}, T_{A_0}^c)>10C_2(2n+1)^j\} )\ \over \sigma (\{{{\hat{g}}} \in {\mathbb {H}}^n:\ d(h, {{\hat{g}}}) = {\mathfrak {C}} (2n+1)^{j+A_0}\})}> {\mathfrak {D}}>0, \end{aligned}$$

where \( {\mathfrak {D}}\in (0,1)\) is a constant depending on n and \(A_0\) only, but independent of j and T. This contradicts the fact that \(\sigma ({\mathcal {E}}_\ell )<\epsilon \) for any small positive \(\epsilon \) given at the beginning. Thus, the claim holds.

Now based on the claim, we choose \({{\hat{h}}} \in T_{A_0}\) with \(d(h, {{\hat{h}}}) = {\mathfrak {C}} (2n+1)^{j+A_0}\) and \(d({{\hat{h}}}, T_{A_0}^c) >10C_2(2n+1)^j\) such that \((\delta _{{\mathfrak {C}}^{-1}(2n+1)^{-j-A_0}} (h^{-1} {{\hat{h}}}))^{-1} \in {\mathcal {F}}_\ell \). Let \({\tilde{g}}_\ell :=(\delta _{{\mathfrak {C}}^{-1}(2n+1)^{-j-A_0}} (h^{-1} {{\hat{h}}}))^{-1} \). Without loss of generality, we assume that \(K_\ell ({\tilde{g}}_\ell )\) is positive.

From the definition of \({\tilde{g}}_\ell \) we see that

$$\begin{aligned} {{\hat{h}}}= h \cdot \delta _{{\mathfrak {C}} (2n+1)^{j+A_0}}\bigg ( \tilde{g}_\ell ^{-1}\bigg ). \end{aligned}$$
(3.6)

Next, we choose the integer \(A_0\) so that \((2n+1)^{A_0}> 5C_2{\mathfrak {C}}^{-1} \varepsilon _o^{-1}\). Then fix some \(\eta \in (0, 2 \varepsilon _o)\) such that the two balls \(B(h, \eta r)\) and \(B( {{\hat{h}}}, \eta r) \) with \(r={\mathfrak {C}} (2n+1)^{j+A_0}\) satisfy the following condition:

$$\begin{aligned} 5 C_2(2n+1)^j< \eta r< 10 C_2(2n+1)^j. \end{aligned}$$

Then we can deduce that \(T\subset B(h, \eta r)\) and \(B( {{\hat{h}}}, \eta r) \subset T_{A_0}\).
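The choice of \(A_0\) and \(\eta \) just described can be made explicit once numerical values of the constants are fixed. The Python sketch below is purely illustrative: the inputs `C2`, `frakC` and `eps` stand for \(C_2\), \({\mathfrak {C}}\) and \(\varepsilon _o\), whose actual values are not specified in the text, the function names are hypothetical, and the particular choice \(\eta r=\frac{15}{2}C_2(2n+1)^j\) (the midpoint of the allowed range) is an assumption made for definiteness.

```python
from fractions import Fraction

def choose_A0(n, C2, frakC, eps):
    """Smallest positive integer A0 with (2n+1)^A0 > 5 * C2 / (frakC * eps)."""
    threshold = 5 * C2 / (frakC * eps)
    A0 = 1
    while (2 * n + 1) ** A0 <= threshold:
        A0 += 1
    return A0

def choose_eta(n, C2, frakC, A0):
    """An eta with eta * r = (15/2) * C2 * (2n+1)^j, where r = frakC * (2n+1)^(j+A0)."""
    # The factor (2n+1)^j cancels, so eta does not depend on the level j.
    return Fraction(15, 2) * C2 / (frakC * (2 * n + 1) ** A0)
```

Since \((2n+1)^{A_0}>5C_2{\mathfrak {C}}^{-1}\varepsilon _o^{-1}\), this \(\eta \) is smaller than \(\frac{3}{2}\varepsilon _o\), so it lies in \((0,2\varepsilon _o)\) as required.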

It is direct that for every \(g\in B(h, \eta r)\), we can write

$$\begin{aligned} g =h\cdot \delta _r (g'_1), \end{aligned}$$

where \(g'_1 \in B(o, \eta )\). Similarly, for every \({\hat{g}}\in B( {{\hat{h}}}, \eta r)\), we can write

$$\begin{aligned} {\hat{g}} = {{\hat{h}}}\cdot \delta _r (g'_2), \end{aligned}$$

where \(g'_2 \in B(o, \eta )\).

As a consequence, writing \(K_\ell (g,{\hat{g}}):=K_\ell ({\hat{g}}^{-1} g)\), we have

$$\begin{aligned} K_\ell (g,{\hat{g}})&= K_\ell \big ( h\cdot \delta _r (g'_1) , {{\hat{h}}}\cdot \delta _r (g'_2) \big )\nonumber \\&= K_\ell \big ( h\cdot \delta _r (g'_1) , h \cdot \delta _r( \tilde{g}_\ell ^{-1} )\cdot \delta _r (g'_2) \big )\nonumber \\&= K_\ell \big ( \delta _r (g'_1) , \delta _r( {\tilde{g}}_\ell ^{-1} )\cdot \delta _r (g'_2) \big )\nonumber \\&= K_\ell \big ( \delta _r (g'_1) , \delta _r( {\tilde{g}}_\ell ^{-1} \cdot g'_2) \big )\nonumber \\&= r^{-2n-2} K_\ell \big ( g'_1, {\tilde{g}}_\ell ^{-1} \cdot g'_2 \big )\nonumber \\&= r^{-2n-2} K_\ell \big ( (g'_2)^{-1} \cdot {\tilde{g}}_\ell \cdot g'_1 \big ), \end{aligned}$$
(3.7)

where the second equality comes from (3.6), the third from left-invariance, the fourth from the fact that the dilations are group automorphisms, and the fifth from (3.3).

Next, we note that

$$\begin{aligned} d\big ( (g'_2)^{-1} \cdot {\tilde{g}}_\ell \cdot g'_1, \tilde{g}_\ell \big )&= d\big ( {\tilde{g}}_\ell \cdot g'_1, g'_2 \cdot {\tilde{g}}_\ell \big )\\&\le \, \left[ d\big ( {\tilde{g}}_\ell \cdot g'_1, \tilde{g}_\ell \big )+ d\big ( {\tilde{g}}_\ell , g'_2 \cdot {\tilde{g}}_\ell \big ) \right] \\&= \, \left[ d\big ( g'_1, o \big )+ d\big ( o, g'_2 \big ) \right] \\&\le 2 \eta \\&<4 \varepsilon _o, \end{aligned}$$

which shows that \( (g'_2)^{-1} \cdot {\tilde{g}}_\ell \cdot g'_1\) is contained in the ball \(B({\tilde{g}}_\ell , 4 \varepsilon _o)\) for all \(g'_1 \in B(o, \eta )\) and for all \(g'_2 \in B(o, \eta )\).

Thus, from (3.4), we obtain that

$$\begin{aligned} | K_\ell \big ( (g'_2)^{-1} \cdot {\tilde{g}}_\ell \cdot g'_1 \big )| > {1\over 2 } | K_\ell ({\tilde{g}}_\ell )| \end{aligned}$$
(3.8)

and for all \(g'_1 \in B(o, \eta )\) and for all \(g'_2 \in B(o, \eta )\), \(K_\ell \big ( (g'_2)^{-1} \cdot {\tilde{g}}_\ell \cdot g'_1 \big )\) and \(K_\ell ({\tilde{g}}_\ell )\) have the same sign.

Now combining the equality (3.7) and (3.8) above, we obtain that

$$\begin{aligned} |K_\ell (g,{\hat{g}})| > {1\over 2 } r^{-2n-2} |K_\ell ({\tilde{g}}_\ell )|, \end{aligned}$$
(3.9)

for every \(g\in B(h, \eta r)\) and for every \({{\hat{g}}}\in B( {{\hat{h}}}, \eta r)\), where \(K_\ell (g,{\hat{g}})\) and \(K_\ell ({\tilde{g}}_\ell )\) have the same sign. Here \(K_\ell ({\tilde{g}}_\ell )\) is a fixed constant independent of \(\eta \), r, h, \(g'_1\) and \(g'_2\). We denote

$$\begin{aligned} C(\ell ,n)= {1\over 2}|K_\ell ({\tilde{g}}_\ell )|. \end{aligned}$$

From the lower bound (3.9) above, we further obtain that for the suitable \(\eta \in (0,2\varepsilon _o)\) fixed above,

$$\begin{aligned} |K_\ell (g,{\hat{g}})| > C(\ell ,n) r^{-2n-2}, \end{aligned}$$

for every \(g\in B(h, \eta r)\) and for every \({\hat{g}}\in B( {{\hat{h}}}, \eta r)\). Moreover, the sign of \(K_\ell (g,{\hat{g}})\) is invariant for every \(g\in B(h, \eta r)\) and for every \({\hat{g}}\in B( {{\hat{h}}}, \eta r)\).

Based on the fact that \(B( {{\hat{h}}}, \eta r)\subset T_{A_0}\) and \(\eta r> 5 C_2(2n+1)^j \), there must be some tile \({{\hat{T}}} \in \mathfrak {T}_{j}\) such that \({\hat{T}}\subset B( {{\hat{h}}}, \eta r)\). Also note that \(T\subset B( h, \eta r)\). Hence we obtain that \(A_{1}(2n+1)^{j}\le d({\text {cent}}{(T)},{\text {cent}}({\hat{T}}))\le A_{2}(2n+1)^{j}\), where \(A_1\) and \(A_2\) depend only on \(A_0\) and \({\mathfrak {C}}\). Moreover, we see that for all \((g,{{\hat{g}}})\in T\times {\hat{T}}\), \(K_{\ell }(({{\hat{g}}})^{-1} g)\) does not change sign and that for all \((g,{{\hat{g}}})\in T\times {\hat{T}}\), \(|K_{\ell }(({{\hat{g}}})^{-1} g)| \gtrsim (2n+1)^{-(2n+2)j}\), where the implicit constant depends on \(C(\ell ,n)\) and \(A_0\).

The proof of Theorem 3.1 is complete. \(\square \)

4 Theorem 1.2: \(2n+2<p<\infty \)

4.1 Proof of the Necessary Condition

In this subsection, we assume that \([b,R_{\ell }]\in S^p\) for some \(2n+2<p<\infty \) and then prove that \(b\in B_{p,p}^{\frac{2n+2}{p}}({\mathbb {H}}^{n})\).

We need these preliminary observations. Let \(\mathfrak {T}_{k}\) be the decomposition of \({\mathbb {H}}^{n}\) into tiles T as in Sect. 2.1. We define the conditional expectation of a locally integrable function f on \({\mathbb {H}}^{n}\) with respect to the increasing family of \(\sigma \)-algebras \(\sigma (\mathfrak {T}_{-k})\) by the expression:

$$\begin{aligned} E_{k}(f)(g)=\sum _{T\in \mathfrak {T}_{-k}}(f)_{T}\chi _{T}(g),\ g\in {\mathbb {H}}^n, \end{aligned}$$

where \((f)_{T}\) denotes the average of f over T, that is, \((f)_{T}:=\frac{1}{|T|}\int _{T}f(g)\,dg\).

For \(T\in \mathfrak {T}_{k}\), we let \(h_{T}^{1}\), \(h_{T}^{2},\ldots , h_{T}^{M_n-1}\) be a family of Haar functions defined in Lemma 2.5. Next, we choose \(h_{T}\) among these Haar functions \(h_{T}^{\epsilon }\), \(\epsilon =1,\ldots ,M_n-1\), such that \(\left| \int _{T}b(g)h_{T}^{\epsilon }(g)\,dg\right| \) is maximal.

Note that the function \((E_{k+1}(b)(g)-E_{k}(b)(g))\chi _T(g)\) is a sum of \(M_n\) Haar functions. That is, we are in a finite dimensional setting and all \(L^p\)-spaces have comparable norms. So we have that

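$$\begin{aligned} \left( \frac{1}{|T|}\int _{T}|E_{k+1}(b)(g)-E_{k}(b)(g)|^{p}\,dg\right) ^{1/p}\le C|T|^{-1/2}\left| \int _{T}b(g)h_{T}(g)\,dg\right| , \end{aligned}$$
(4.1)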

where C is a constant only depending on p and n.

This is the main Lemma.

Lemma 4.1

Let \(1< p<\infty \) and suppose that \(b\in L_{\textrm{loc}}^{1}({\mathbb {H}}^{n})\) satisfies \(\Vert [b,R_{\ell }]\Vert _{S^p}<\infty \) for some \(\ell \in \{1,2,\ldots ,2n\}\). Then there exists a constant \(C>0\) such that

$$\begin{aligned} \sum _{k}(2n+1)^{(2n+2)k}\Vert E _{k+1}(b) -E_{k}(b)\Vert _{p} ^{p} \le C \Vert [b,R_{\ell }]\Vert _{S^p} ^{p}. \end{aligned}$$

Proof

We will ultimately apply the Rochberg–Semmes [39] notion of NWO sequences, namely the inequality (2.2). By (4.1), we have

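$$\begin{aligned} (2n+1)^{(2n+2)k}\Vert E_{k+1}(b)-E_{k}(b)\Vert _{p}^{p}\lesssim \sum _{T\in \mathfrak {T}_{-k}}|T|^{-p/2}\left| \int _{T}b(g)h_{T}(g)\,dg\right| ^{p}. \end{aligned}$$
(4.2)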

To continue, for any \(T\in \mathfrak {T}_{-k}\), let \({\hat{T}}\) be the tile chosen in Theorem 3.1, then \(K_{\ell }({{{\hat{g}}}}^{-1}g)\) does not change sign for all \((g,{{{\hat{g}}}})\in T\times {\hat{T}}\) and

$$\begin{aligned} |K_{\ell }({{{\hat{g}}}}^{-1}g)|\ge \frac{C}{|T|}, \end{aligned}$$

for some constant \(C>0\). Also, let \(\alpha _{{\hat{T}}}(b)\) be a median value of b over \({\hat{T}}\). This means \(\alpha _{{\hat{T}}}(b)\) is a real number such that defining for a tile S,

$$\begin{aligned} E_{1}^{S}:=\left\{ g\in S:b(g) < \alpha _{{\hat{T}}}(b)\right\} \ \ \textrm{and}\ \ E_{2}^{S}:=\left\{ g\in S:b(g)>\alpha _{{\hat{T}}}(b)\right\} , \end{aligned}$$
(4.3)

we have, with \( S= {{\hat{T}}} \), the upper bound \( |E ^{{{\hat{T}}}} _{j}|\le \tfrac{1}{2} |{\hat{T}}|\) for \( j=1,2\). A median value always exists, but may not be unique (see for example [26]).
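As a toy illustration of non-uniqueness (a one-dimensional example that is not taken from the source, with Lebesgue measure on \(S=[0,1]\) standing in for a tile), consider the two-valued function \(b=\chi _{[1/2,1]}\). For every \(\alpha \in [0,1]\),

$$\begin{aligned} \left\{ g\in S:b(g)<\alpha \right\} \subseteq [0,1/2)\quad \textrm{and}\quad \left\{ g\in S:b(g)>\alpha \right\} \subseteq [1/2,1]. \end{aligned}$$

Both sets have measure at most \(\tfrac{1}{2}|S|\), so every \(\alpha \in [0,1]\) is a median value of b over S in the sense above.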

Next we decompose T into a union of sub-tiles by writing \(T=\bigcup _{i=1}^{M_{n}}P_{i}\), where \(P_{i}\in \mathfrak {T}_{-k-1}\) and \(P_{i}\subseteq T\) satisfying \(P_{i}\ne P_{j}\) if \(i\ne j\). By the cancellation property of \(h_{T}\), we see that

$$\begin{aligned} |T|^{-1/2}\left| \int _{T}b(g)h_{T}(g)dg\right|&=|T|^{-1/2} \left| \int _{T}(b(g)-\alpha _{{\hat{T}}}(b))h_{T}(g)\,dg\right| \nonumber \\&\quad \le \frac{1}{|T|}\int _{T}\left| b(g)-\alpha _{{\hat{T}}}(b)\right| dg\nonumber \\&\quad \le \frac{1}{|T|}\sum _{i=1}^{M_n}\int _{P_{i}}\left| b(g) -\alpha _{{\hat{T}}}(b)\right| dg\nonumber \\&\quad \le \frac{1}{|T|}\sum _{i=1}^{M_n}\int _{P_{i}\cap E_{1}^{T}} \left| b(g)-\alpha _{{\hat{T}}}(b)\right| dg+ \frac{1}{|T|}\nonumber \\&\sum _{i=1}^{M_n}\int _{P_{i}\cap E_{2}^{T}}\left| b(g)-\alpha _{{\hat{T}}}(b)\right| dg\nonumber \\&\quad =:\textrm{I}_{1}^{T}+\textrm{I}_{2}^{T}. \end{aligned}$$
(4.4)

Above, we are using the notation in (4.3).

Now we denote

$$\begin{aligned} F_{1}^{T}:=\bigg \{{{{\hat{g}}}}\in {\hat{T}}:b({{{\hat{g}}}})\ge \alpha _{{\hat{T}}}(b)\bigg \}\ \ \textrm{and}\ \ F_{2}^{T}:=\bigg \{{{{\hat{g}}}}\in {\hat{T}}:b({{{\hat{g}}}})\le \alpha _{{\hat{T}}}(b)\bigg \}. \end{aligned}$$

Then by the definition of \(\alpha _{{\hat{T}}}(b)\), we have \(|F_{s}^{T}|\ge \tfrac{1}{2}|{\hat{T}}|\), and hence \(|F_{s}^{T}|\sim |{\hat{T}}|\), for \(s=1,2\), and \(F_{1}^{T}\cup F_{2}^{T}={\hat{T}}\). Note that for \(s=1,2\), if \(g\in E_{s}^{T}\) and \( {{\hat{g}}}\in F_{s}^{T}\), then

$$\begin{aligned} \left| b(g)-\alpha _{{\hat{T}}}(b)\right|&\le \left| b(g) -\alpha _{{\hat{T}}}(b)\right| +\left| \alpha _{{\hat{T}}}(b)-b({{{\hat{g}}}})\right| \\&=\left| b(g)-\alpha _{{\hat{T}}}(b)+\alpha _{{\hat{T}}}(b)-b({{{\hat{g}}}})\right| = \left| b({{{\hat{g}}}})-b(g)\right| . \end{aligned}$$

Therefore, for \( s=1,2\),

$$\begin{aligned} \textrm{I}_{s}^{T}&\lesssim \frac{1}{|T|}\sum _{i=1}^{M_n} \int _{P_{i}\cap E_{s}^{T}}\left| b(g) -\alpha _{{\hat{T}}}(b)\right| dg\frac{|F_{s}^{T}|}{|T|}\\&\lesssim \frac{1}{|T|}\sum _{i=1}^{M_n} \int _{P_{i}\cap E_{s}^{T}}\int _{F_{s}^{T}}\left| b(g) -\alpha _{{\hat{T}}}(b)\right| \left| K_{\ell }({{{\hat{g}}}}^{-1}g)\right| d{{{\hat{g}}}}dg\\&\lesssim \frac{1}{|T|}\sum _{i=1}^{M_n}\int _{P_{i} \cap E_{s}^{T}}\int _{F_{s}^{T}}\left| b({{{\hat{g}}}})-b(g) \right| \left| K_{\ell }({{{\hat{g}}}}^{-1}g)\right| d{{{\hat{g}}}}dg \\&=\frac{1}{|T|}\sum _{i=1}^{M_n}\left| \int _{P_{i} \cap E_{s}^{T}}\int _{F_{s}^{T}}(b({{{\hat{g}}}})-b(g)) K_{\ell }({{{\hat{g}}}}^{-1}g)d{{{\hat{g}}}}dg\right| , \end{aligned}$$

where in the last equality we used the fact that \(K_{\ell }({{{\hat{g}}}}^{-1}g)\) and \(b({{{\hat{g}}}})-b(g)\) do not change sign for \((g,{{{\hat{g}}}})\in (P_{i}\cap E_{s}^{T})\times F_{s}^{T}\), \(s=1,2\). This, in combination with the inequalities (4.2) and (4.4), implies that

$$\begin{aligned}&(2n+1)^{(2n+2)k}\int _{{\mathbb {H}}^{n}} |E_{k+1}(b)(g)-E_{k}(b)(g)|^{p}dg \\&\quad \lesssim \sum _{T\in \mathfrak {T}_{-k}}|T| ^{-p/2}\left| \int _{T}b(g)h_{T}(g)dg\right| ^{p} \\&\quad \lesssim \sum _{s=1}^{2} \sum _{T\in \mathfrak {T}_{-k}}\left| \textrm{I}_{s}^{T}\right| ^{p} \\&\quad \lesssim \sum _{s=1}^{2}\sum _{T\in \mathfrak {T}_{-k}} \left( \sum _{i=1}^{M_n}\left| \left\langle [b,R_{\ell }] \frac{|P_{i}|^{1/2} \chi _{F_{s}^{T}}}{|T|}, \frac{\chi _{P_i\cap E_{s}^{T}}}{|P_{i}|^{1/2}} \right\rangle \right| \right) ^{p}. \end{aligned}$$

Note that \(e_T:=\frac{|P_{i}|^{1/2} \chi _{F_{s}^{T}}}{|T|}\) is supported in \({{\hat{T}}}\) and \(f_T:=\frac{\chi _{P_i\cap E_{s}^{T}}}{|P_{i}|^{1/2}}\) is supported in T. Based on Theorem 3.1, we see that for each \(T\in \mathfrak {T}_{-k}\), there is a unique \(T_{A_0}\in \mathfrak {T}_{-k+A_0}\) such that \(T, {{\hat{T}}}\subset T_{A_0}\). Hence, \(|e_T|, |f_T|\le C|T_{A_0}|^{-{1\over 2}}\chi _{T_{A_0}} \), where C is a constant depending only on n and \(A_0\). Note also that each \(T_{A_0}\in \mathfrak {T}_{-k+A_0}\) contains only a finite number (depending on \(n,A_0\)) of \(T\in \mathfrak {T}_{-k}\) with \(T, {{\hat{T}}}\subset T_{A_0}\). Summing this last inequality over \( k\in {\mathbb {Z}} \) and appealing to (2.2), we conclude the proof of the lemma. \(\square \)
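For the reader's convenience, we sketch how the pairing in the last display arises. The sketch assumes the usual kernel representation \([b,R_{\ell }]\varphi (g)=\int _{{\mathbb {H}}^{n}}(b(g)-b({\hat{g}}))K_{\ell }({\hat{g}}^{-1}g)\varphi ({\hat{g}})\,d{\hat{g}}\) for g outside the support of \(\varphi \), which we take for granted here because T and \({\hat{T}}\) are separated. With \(e_T\) and \(f_T\) as above, the normalizing factors cancel:

$$\begin{aligned} \left\langle [b,R_{\ell }]e_{T},f_{T}\right\rangle&=\frac{|P_{i}|^{1/2}}{|T|}\cdot \frac{1}{|P_{i}|^{1/2}}\int _{P_{i}\cap E_{s}^{T}}\int _{F_{s}^{T}}(b(g)-b({\hat{g}}))K_{\ell }({\hat{g}}^{-1}g)\,d{\hat{g}}\,dg\\&=\frac{1}{|T|}\int _{P_{i}\cap E_{s}^{T}}\int _{F_{s}^{T}}(b(g)-b({\hat{g}}))K_{\ell }({\hat{g}}^{-1}g)\,d{\hat{g}}\,dg. \end{aligned}$$

Taking absolute values and summing over i gives exactly the bound for \(\textrm{I}_{s}^{T}\) obtained before.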

Corollary 4.2

Let \(1<p<\infty \) and suppose that \(b\in L_{\textrm{loc}}^{1}({\mathbb {H}}^{n})\) satisfies \(\Vert [b,R_{\ell }]\Vert _{S^p}<\infty \) for some \(\ell \in \{1,2,\ldots ,2n\}\). Then there exists a constant \(C>0\) such that for any \(k\in {\mathbb {Z}}\),

$$\begin{aligned} \Vert b-E_{k}(b)\Vert _{p}\le C(2n+1)^{-(2n+2)k/p}\bigg \Vert [b,R_{\ell }]\bigg \Vert _{S^p}. \end{aligned}$$

Proof

Note that \(E_{k}(b)\rightarrow b\) a.e. as \(k\rightarrow \infty \). Besides, by Lemma 4.1, \(\Vert E_{k+1}(b)-E_{k}(b)\Vert _{p}\le C(2n+1)^{-(2n+2)k/p}\Vert [b,R_{\ell }]\Vert _{S^{p}}\). Combining these two facts and summing the geometric series yields the conclusion. \(\square \)
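The summation can be written out as follows. This is only a sketch, with the shorthand \(\lambda :=(2n+1)^{2n+2}>1\) introduced here for illustration. Telescoping, the a.e. convergence \(E_{j}(b)\rightarrow b\) and Fatou's lemma give

$$\begin{aligned} \Vert b-E_{k}(b)\Vert _{p}&\le \sum _{j=k}^{\infty }\Vert E_{j+1}(b)-E_{j}(b)\Vert _{p}\\&\le C\Vert [b,R_{\ell }]\Vert _{S^{p}}\sum _{j=k}^{\infty }\lambda ^{-j/p} =\frac{C}{1-\lambda ^{-1/p}}\,\lambda ^{-k/p}\,\Vert [b,R_{\ell }]\Vert _{S^{p}}. \end{aligned}$$

Since \(\lambda >1\), the series converges, and the resulting constant depends only on n, p and the constant of Lemma 4.1.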

Lemma 4.3

Let \(1<p<\infty \) and suppose that \(b\in L_{\textrm{loc}}^{1}({\mathbb {H}}^{n})\) satisfies \(\Vert [b,R_{\ell }]\Vert _{S^p}<\infty \) for some \(\ell \in \{1,2,\ldots ,2n\}\). Then

$$\begin{aligned} \left( \sum _k (2n+1)^{(2n+2)k}\Vert b-E_{k}(b)\Vert _{p}^{p}\right) ^{1/p}&\lesssim \Vert [b,R_{\ell }]\Vert _{S^{p}}. \end{aligned}$$

Proof

It suffices to show that

$$\begin{aligned} \left( \sum _{k=L}^M (2n+1)^{(2n+2)k}\Vert b-E_{k}(b)\Vert _{p}^{p}\right) ^{1/p}&\le C \Vert [b,R_{\ell }]\Vert _{S^{p}}, \end{aligned}$$

for some constant \(C>0\) independent of \(L<M\in {\mathbb {N}}\). To this end, we denote the term in the left hand side above by \({\mathfrak {J}}\) and then note that

$$\begin{aligned} {\mathfrak {J}}&\le \left( \sum _{k=L}^M (2n+1)^{(2n+2)k}\Vert b-E_{k+1}(b)\Vert _{L^p({\mathbb {H}}^{n})}^{p}\right) ^{1/p}\\&\quad +\left( \sum _{k=L}^M(2n+1)^{(2n+2)k}\Vert E_{k+1}(b)-E_{k}(b)\Vert _{L^p({\mathbb {H}}^{n})}^{p}\right) ^{1/p}\\&=\left( \sum _{k=L+1}^{M+1}(2n+1)^{(2n+2)(k-1)}\Vert b-E_{k}(b)\Vert _{L^p({\mathbb {H}}^{n})}^{p}\right) ^{1/p}\\&\quad +\left( \sum _{k=L}^M(2n+1)^{(2n+2)k}\Vert E_{k+1}(b)-E_{k}(b)\Vert _{L^p({\mathbb {H}}^{n})}^{p}\right) ^{1/p}\\&=: {Term _{1}}+{Term _{2}}. \end{aligned}$$

To continue, we first note that Lemma 4.1 controls \( {Term _{2}}\). Besides, \({Term _{1}}\) is dominated by

$$\begin{aligned}&(2n+1)^{-(2n+2)/p}\left( \sum _{k=L}^{M}(2n+1)^{(2n+2)k}\Vert b-E_{k}(b)\Vert _{L^p({\mathbb {H}}^{n})}^{p}\right) ^{1/p} \nonumber \\&\quad +(2n+1)^{(2n+2)M/p}\Vert b-E_{M+1}(b)\Vert _{L^p({\mathbb {H}}^{n})}. \end{aligned}$$
(4.8)

The first term on the right-hand side of (4.8) can be absorbed into \( {\mathfrak {J}}\), since its prefactor is strictly less than 1, while by Corollary 4.2 the second term is dominated by \(C\Vert [b,R_{\ell }]\Vert _{S^{p}}\).

This ends the proof of Lemma 4.3. \(\square \)
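In more detail, writing \(\lambda :=(2n+1)^{2n+2}\) (a shorthand introduced here, not used elsewhere in the text) and letting \(C'\) collect the constants from Lemma 4.1 and Corollary 4.2, the estimates above combine into an inequality that can be rearranged:

$$\begin{aligned} {\mathfrak {J}}\le \lambda ^{-1/p}\,{\mathfrak {J}}+C'\Vert [b,R_{\ell }]\Vert _{S^{p}} \quad \Longrightarrow \quad {\mathfrak {J}}\le \frac{C'}{1-\lambda ^{-1/p}}\Vert [b,R_{\ell }]\Vert _{S^{p}}. \end{aligned}$$

The rearrangement is legitimate because \({\mathfrak {J}}\) is a finite sum of terms that are finite by Corollary 4.2, and the resulting bound is uniform in L and M.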

Proposition 4.4

Let \(2n+2<p<\infty \) and suppose that \(b\in L_{\textrm{loc}}^{1}({\mathbb {H}}^{n})\), then there exists a constant \(C>0\) such that

$$\begin{aligned} \Vert b\Vert _{B_{p,p}^{\frac{2n+2}{p}}({\mathbb {H}}^n)}\le C\Vert [b,R_{\ell }]\Vert _{S^p}. \end{aligned}$$

Proof

To begin with, we note that

$$\begin{aligned}&\int _{{\mathbb {H}}^{n}}\int _{{\mathbb {H}}^{n}}\frac{|b(g)-b({{{\hat{g}}}}) |^{p}}{d(g,{{{\hat{g}}}})^{2(2n+2)}}dgd{{\hat{g}}}\\&\quad \lesssim \sum _{k\in {\mathbb {Z}}}(2n+1)^{2(2n+2)k}\iint _{d(g,{{{\hat{g}}}})\le (2n+1)^{-k-1}}|b(g)-b({{{\hat{g}}}})|^{p}dgd{{{\hat{g}}}}. \end{aligned}$$
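The dyadic bookkeeping behind this bound can be sketched as follows; the sets \(D_{k}\) are notation introduced here for illustration. Put \(D_{k}:=\{(g,{\hat{g}}):(2n+1)^{-k-2}<d(g,{\hat{g}})\le (2n+1)^{-k-1}\}\) for \(k\in {\mathbb {Z}}\). These sets cover \(\{g\ne {\hat{g}}\}\), and on \(D_{k}\) we have

$$\begin{aligned} \frac{1}{d(g,{\hat{g}})^{2(2n+2)}}<(2n+1)^{2(2n+2)(k+2)}=(2n+1)^{4(2n+2)}\,(2n+1)^{2(2n+2)k}, \end{aligned}$$

so the integral over \(D_{k}\) is at most a constant (depending only on n) times the k-th term of the sum, once \(D_{k}\) is enlarged to \(\{d(g,{\hat{g}})\le (2n+1)^{-k-1}\}\).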

Hence, it suffices to show that

$$\begin{aligned} \sum _{k=L}^{M}(2n+1)^{2(2n+2)k}\iint _{d(g,{{\hat{g}}})\le (2n+1)^{-k-1}}\bigg |b(g)-b({{{\hat{g}}}})\bigg |^{p}dgd{{{\hat{g}}}}\le C\bigg \Vert [b,R_{\ell }]\bigg \Vert _{S^{p}}^p, \end{aligned}$$
(4.9)

where C is a constant independent of \(L<M\in {\mathbb {Z}}\).

Recall that a tile T in \(\mathfrak {T}_{-k}\) is approximately a Heisenberg ball of radius \((2n+1)^{-k}\). Fix a Heisenberg ball B centered at the origin with radius \((2n+1)^{-L+A}\) for a large fixed integer A, and then denote \(b_{{\tilde{g}}}(g):= b({\tilde{g}} g)\) for \(\tilde{g}\in {\mathbb {H}}^n\). Then the left-hand side of (4.9) is dominated by a constant times

$$\begin{aligned}&{1\over |B|}\int _B \sum _{k=L}^{M}\sum _{T\in \mathfrak {T}_{-k}} (2n+1)^{2(2n+2)k} \int _T\int _T |b_{{\tilde{g}}}(g)-b_{{\tilde{g}}}({{{\hat{g}}}})|^p\, d{{{\hat{g}}}}\, dg\, d{\tilde{g}}\nonumber \\&\quad \lesssim {1\over |B|}\int _B \sum _{k=L}^{M}\sum _{T\in \mathfrak {T}_{-k}} (2n+1)^{(2n+2)k} \int _T| b_{{\tilde{g}}}(g)- E_k(b_{{\tilde{g}}})(g) |^p\,dg\,d{\tilde{g}}\nonumber \\&\quad \lesssim {1\over |B|}\int _B C\Vert [b_{\tilde{g}},R_{\ell }]\Vert _{S^{p}}^p\, d {\tilde{g}}, \end{aligned}$$
(4.10)

where in the first inequality we added and subtracted the term \(E_k(b_{{\tilde{g}}})\) by noting that for \(g,{{\hat{g}}}\in T\in \mathfrak {T}_{-k}\), \(E_k(b_{{\tilde{g}}})(g)=E_k(b_{{\tilde{g}}})({{\hat{g}}})\), and in the second inequality we used Lemma 4.3. Next, since the Riesz transform is a convolution operator, we have \(\Vert [b_{\tilde{g}},R_{\ell }]\Vert _{S^{p}}=\Vert [b,R_{\ell }]\Vert _{S^{p}}\), and hence the right-hand side of (4.10) is bounded by \(C\Vert [b,R_{\ell }]\Vert _{S^{p}}^p\). Hence, (4.9) holds.

Therefore, the proof of Proposition 4.4 is complete. \(\square \)
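The translation invariance of the Schatten norm used above can be checked directly. In the following sketch, \(U_{{\tilde{g}}}\) is notation introduced here for the left translation \(U_{{\tilde{g}}}f(g):=f({\tilde{g}}g)\), which is unitary on \(L^{2}({\mathbb {H}}^{n})\) by the left invariance of the Haar measure and commutes with \(R_{\ell }\) because the kernel of \(R_{\ell }\) has the form \(K_{\ell }({\hat{g}}^{-1}g)\):

$$\begin{aligned} U_{{\tilde{g}}}[b,R_{\ell }]f&=U_{{\tilde{g}}}(b\,R_{\ell }f)-U_{{\tilde{g}}}R_{\ell }(bf)\\&=b_{{\tilde{g}}}\,R_{\ell }(U_{{\tilde{g}}}f)-R_{\ell }(b_{{\tilde{g}}}\,U_{{\tilde{g}}}f) =[b_{{\tilde{g}}},R_{\ell }]\,U_{{\tilde{g}}}f. \end{aligned}$$

Hence \([b_{{\tilde{g}}},R_{\ell }]=U_{{\tilde{g}}}[b,R_{\ell }]U_{{\tilde{g}}}^{-1}\), and the singular values, and therefore the \(S^{p}\) norm, are unchanged under this unitary conjugation.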

4.2 Proof of the Sufficient Condition

Proposition 4.5

Suppose \(\ell \in \{1,2,\ldots ,2n\}\), \(2n+2<p<\infty \) and \(b\in L^1_{\textrm{loc}}({\mathbb {H}}^n)\). If \(b\in B_{p,p}^{\frac{2n+2}{p}}({\mathbb {H}}^n)\), then \([b,R_{\ell }]\in S^p\).

Proof

We follow the proof in [25], which relies upon general estimates for Schatten norms of integral operators. For the convenience of the readers, we briefly sketch the proof here. We first recall that \([b,R_{\ell }]\) is compact [8] when \(b\in B_{p,p}^{\frac{2n+2}{p}}({\mathbb {H}}^n)\subset \textrm{VMO}({\mathbb {H}}^n)\). Note that Russo [40] proved that for a general measure space \((X,\mu )\), if \(p>2\) and \(K(x,y)\in L^{2}(X\times X)\), then the integral operator T associated to the kernel \(K(x,y)\) satisfies the following bound:

$$\begin{aligned} \Vert T\Vert _{S^{p}}\le \Vert K\Vert _{L^p,L^{p^{\prime }}}^{1/2}\Vert K^{*}\Vert _{L^p,L^{p^{\prime }}}^{1/2}, \end{aligned}$$

where \(p'\) is the conjugate index of p, \(K^{*}(x,y)=\overline{K(y,x)}\), and \(\Vert \cdot \Vert _{L^p, L^{p^{\prime }}}\) denotes the mixed-norm: \( \Vert K\Vert _{L^p,L^{p^{\prime }}}:=\big \Vert \Vert K(x,y)\Vert _{L^p(dx)}\big \Vert _{L^{p^{\prime }}(dy)}. \) Later on Goffeng [21] showed that the condition \(K(x,y)\in L^{2}(X\times X)\) in the above statement can be removed.

Moreover, Janson–Wolff [25, Lemmas 1 and 2] extended the above statement to the corresponding weak-type version on a general measure space \((X,\mu )\): if \(p>2\) and \(1/p+1/p^{\prime }=1\), then

$$\begin{aligned} \Vert T\Vert _{S^{p,\infty }}\le \Vert K\Vert _{L^{p},L^{p^{\prime },\infty }}^{1/2}\Vert K^{*}\Vert _{L^{p},L^{p^{\prime },\infty }}^{1/2}, \end{aligned}$$
(4.13)

where \(\Vert \cdot \Vert _{L^p, L^{p^{\prime },\infty }}\) denotes the mixed-norm: \(\Vert K\Vert _{L^p,L^{p^{\prime },\infty }}{:=}\big \Vert \Vert K(x,y)\Vert _{L^p(dx)}\big \Vert _{L^{p^{\prime },\infty }(dy)}\).

Next, returning to our setting on the Heisenberg group, we note that for \(1/q=1-2/p\),

$$\begin{aligned} \left\| \frac{1}{d(g,{{\hat{g}}})^{(2n+2)(1-2/p)}}\right\| _{L^{\infty },L^{q,\infty }}&=\sup \limits _{g\in {\mathbb {H}}^n}\sup \limits _{\alpha>0}\alpha \left| \left\{ {\hat{g}}\in {\mathbb {H}}^n:\frac{1}{d(g,{{\hat{g}}})^{(2n+2)(1-2/p)}}>\alpha \right\} \right| ^{1/q}\\&=\sup \limits _{g\in {\mathbb {H}}^n}\sup \limits _{\alpha>0}\alpha \left| B(g,\alpha ^{-\frac{1}{(2n+2)(1-2/p)}})\right| ^{1/q}\\&\approx \sup \limits _{g\in {\mathbb {H}}^n}\sup \limits _{\alpha >0}\alpha \ \big ( \alpha ^{-\frac{1}{(2n+2)(1-2/p)}}\big )^{2n+2\over q}\\&\lesssim 1. \end{aligned}$$
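The last step is pure exponent arithmetic, which we record as a check. Using \(|B(g,r)|\approx r^{2n+2}\) and \(1/q=1-2/p\), the power of \(\alpha \) cancels exactly:

$$\begin{aligned} \alpha \,\Big ( \alpha ^{-\frac{1}{(2n+2)(1-2/p)}}\Big )^{\frac{2n+2}{q}} =\alpha ^{\,1-\frac{1/q}{1-2/p}}=\alpha ^{0}=1. \end{aligned}$$

In particular the supremum over \(\alpha >0\) is finite precisely because of this choice of q.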

Then by weak-type Young’s inequality,

$$\begin{aligned}&\left\| (b(g)-b({{\hat{g}}}))K(g,{{\hat{g}}})\right\| _{L^p, L^{p^{\prime },\infty }} \nonumber \\&\quad \le \left\| \frac{b(g)-b({{\hat{g}}})}{d(g,{{\hat{g}}})^{2n+2}}\right\| _{L^p, L^{p^{\prime },\infty }}\nonumber \\&\quad \le \left\| \frac{b(g)-b({{\hat{g}}})}{d(g,{{\hat{g}}})^{2(2n+2)/p}}\right\| _{L^{p},L^{p}}\left\| \frac{1}{d(g,{{\hat{g}}})^{(2n+2)(1-2/p)}}\right\| _{L^{\infty },L^{q,\infty }}\nonumber \\&\quad \le C\Vert b\Vert _{B_{p,p}^{(2n+2)/p}({\mathbb {H}}^n)}. \end{aligned}$$
(4.14)
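As a consistency check on the second inequality in (4.14) (added here, not part of the original argument), the powers of d add up to the size of the kernel, and the indices in the \({\hat{g}}\) variable satisfy the Hölder relation:

$$\begin{aligned} \frac{2(2n+2)}{p}+(2n+2)\Big (1-\frac{2}{p}\Big )=2n+2,\qquad \frac{1}{p}+\frac{1}{q}=\frac{1}{p}+1-\frac{2}{p}=1-\frac{1}{p}=\frac{1}{p^{\prime }}. \end{aligned}$$

Moreover, the p-th power of the \(L^{p},L^{p}\) norm of the first factor is \(\iint |b(g)-b({\hat{g}})|^{p}\,d(g,{\hat{g}})^{-2(2n+2)}\,dg\,d{\hat{g}}\), which is the Besov seminorm of b appearing in the proof of Proposition 4.4.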

Similarly,

$$\begin{aligned} \left\| (b(g)-b({{\hat{g}}}))\overline{K({{\hat{g}}},g)}\right\| _{L^p, L^{p^{\prime },\infty }}\le C\Vert b\Vert _{B_{p,p}^{(2n+2)/p}({\mathbb {H}}^n)}. \end{aligned}$$
(4.15)

Combining the inequalities (4.14), (4.15) and then applying the weak-type Russo’s inequality (4.13), we see that

$$\begin{aligned} \Vert [b,R_{\ell }]\Vert _{S^{p,\infty }}\le C\Vert b\Vert _{B_{p,p}^{(2n+2)/p}({\mathbb {H}}^n)}. \end{aligned}$$

Since this inequality holds for all \(2n+2<p<\infty \), we can apply the interpolation \((S^{p_1,\infty },S^{p_2,\infty })_{\theta _p}=S^{p}\) and \((B_{p_1,p_1}^{(2n+2)/p_1},B_{p_2,p_2}^{(2n+2)/p_2})_{\theta _{p}}=B_{p,p}^{(2n+2)/p}\) (see for example [31, Theorem 4.1] and [47, Theorem 3.1]), where \(\frac{1-\theta _p}{p_1}+\frac{\theta _p}{p_2}=\frac{1}{p}\), to obtain that

$$\begin{aligned} \Vert [b,R_{\ell }]\Vert _{S^{p}}\le C\Vert b\Vert _{B_{p,p}^{(2n+2)/p}({\mathbb {H}}^n)}. \end{aligned}$$

This finishes the proof of the sufficient condition for the case \(2n+2<p<\infty \). \(\square \)

5 Theorem 1.2: \(0<p\le 2n+2\)

In this section, we prove the second assertion of Theorem 1.2. That is, for each \(\ell \in \{1,2,\ldots ,2n\}\) and for \(0<p\le 2n+2\), the commutator \([b,R_{\ell }]\) is in \(S^p\) if and only if b is a constant. The sufficient condition is obvious, since \([b,R_{\ell }]=0\) when b is a constant. Thus, it remains to show the necessary condition, and by the inclusion \( S^p\subset S^{q}\) for \(p<q\) it suffices to consider the critical case \(p=2n+2\).

To formulate our argument simply, we will usually identify \({\mathbb {C}}^{n}\) with \({\mathbb {R}}^{2n}\) in Lemmas 5.1–5.3 and use the following notation to denote the points of \({\mathbb {C}}^{n}\times {\mathbb {R}}\equiv {\mathbb {R}}^{2n+1}: g=[z,t]\equiv [x,y,t]=[x_{1},\ldots ,x_{n},y_{1},\ldots ,y_{n},t]\) with \(z=[z_{1},\ldots ,z_{n}]\), \(z_{j}=x_{j}+iy_{j}\) and \(x_{j},y_{j},t\in {\mathbb {R}}\) for \(j=1,\ldots ,n\). Then the multiplication law can be explicitly expressed as

$$\begin{aligned} gg^{\prime }=[x,y,t][x^{\prime },y^{\prime },t^{\prime }]=\bigg [x+x^{\prime },y+y^{\prime },t+t^{\prime }+2\langle y,x^{\prime }\rangle -2\langle x,y^{\prime } \rangle \bigg ], \end{aligned}$$

where \(\langle \cdot ,\cdot \rangle \) denotes the standard inner product in \({\mathbb {R}}^{n}\).
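As a quick sanity check of this formula (a routine verification added here), the inverse of \(g=[x,y,t]\) is \([-x,-y,-t]\), and the group is not commutative:

$$\begin{aligned} [x,y,t][-x,-y,-t]&=\big [0,0,\,t-t+2\langle y,-x\rangle -2\langle x,-y\rangle \big ]=[0,0,0],\\ (gg^{\prime })_{t}-(g^{\prime }g)_{t}&=4\langle y,x^{\prime }\rangle -4\langle x,y^{\prime }\rangle , \end{aligned}$$

where \((\cdot )_{t}\) denotes the last coordinate. The first 2n coordinates of \(gg^{\prime }\) and \(g^{\prime }g\) always coincide, so the non-commutativity is confined to the t variable.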

Lemma 5.1

There exists a positive integer \(B_0\) such that for any tile \(T\in \mathfrak {T}_{-k}\) and \(a_{j}=\pm 1\) (\(j=1,2,\ldots ,2n\)), there are tiles \(T^{\prime }\in \mathfrak {T}_{-k-B_0}\), \(T^{\prime \prime }\in \mathfrak {T}_{-k-B_0}\) such that \(T^{\prime }\subset T\), \(T^{\prime \prime }\subset T\) and if \(g=(g_{1},\ldots ,g_{2n},t)\in T^{\prime \prime }\), \(h=(h_{1},\ldots ,h_{2n},t^{\prime })\in T^{\prime }\), then \(a_{j}(g_{j}-h_{j}) \gtrsim {\text {width}}(T)\) \((j=1,2,\ldots ,2n)\).

Proof

Consider first \(T = \delta _{(2n+1)^k} (T_o)\). Based on (4) in Lemma 2.3, we see that \(B\bigg (o, C_1(2n+1)^k\bigg )\subset T\). Then one can choose \(g_{o,1}\in B\bigg (o, C_1(2n+1)^k\bigg )\) such that \(d(g_{o,1},o) = {3C_1\over 4}(2n+1)^k\), and such that the first 2n components of \(g_{o,1}\) are all positive and equal to \({3C_1\over 4}(2n+1)^k\). Thus, we have \(B(g_{o,1}, {C_1\over 40} (2n+1)^k) \subset B(o, C_1(2n+1)^k)\) and for every \(x=(x_1,\ldots ,x_{2n},t_x)\in B(g_{o,1}, {C_1\over 40} (2n+1)^k),\) we have \(x_i>0\) and \(x_i\) is comparable to \({3C_1\over 4}(2n+1)^k\). Then taking the inverse of the ball \(B\bigg (g_{o,1}, {C_1\over 40} (2n+1)^k\bigg )\), we get another ball \(B\bigg (g_{o,2}, {C_1\over 40} (2n+1)^k\bigg )\) such that \(g_{o,2}=g_{o,1}^{-1}\) and for every \(y=(y_1,\ldots ,y_{2n},t_y)\in B\bigg (g_{o,2}, {C_1\over 20} (2n+1)^k\bigg ),\) we have \(y_i<0\) and \(y_i\) is comparable to \(-{3C_1\over 4}(2n+1)^k\). As a consequence, we see that there exist \(T^{\prime }\in \mathfrak {T}_{-k-B_0}\) such that \(T'\subset B\bigg (g_{o,1}, {C_1\over 40} (2n+1)^k\bigg )\) and \(T^{\prime \prime }\in \mathfrak {T}_{-k-B_0}\) such that \(T''\subset B\bigg (g_{o,2}, {C_1\over 20} (2n+1)^k\bigg )\). Then it is clear that if \(g\in T^{\prime \prime }\), \(h\in T^{\prime }\), then \(g_{j}-h_{j} \gtrsim {\text {width}}(T)\) \((j=1,2,\ldots ,2n)\).

For general \(T\in \mathfrak {T}_{-k}\) with \(u={\text {cent}}(T)\), we know that \(T = \delta _{(2n+1)^k} (u) \cdot \delta _{(2n+1)^k} (T_o)\). Hence, the claim follows by translation and dilation. This ends the proof of Lemma 5.1. \(\square \)

Recall the following first-order Taylor inequality on the Heisenberg group from [5].

Lemma 5.2

Let \(f\in C^{\infty }({\mathbb {H}}^{n})\), then for every \(g=(x_{1},\ldots ,x_{2n},t),g_{0}=(x_{0}^{1},\ldots ,x_{0}^{2n},t_0)\in {\mathbb {H}}^{n}\), we have

$$\begin{aligned} f(g)=f(g_{0})+\sum _{j=1}^{2n}X_{j}f(g_{0})(x_{j}-x_{0}^{j})+R(g,g_{0}), \end{aligned}$$

where the remainder \(R(g,g_{0})\) satisfies the following inequality:

$$\begin{aligned} |R(g,g_{0})|\le C\left( \sum _{j=1}^{2}\frac{c^{j}}{j!}\sum _{\begin{array}{c} i_{1},\ldots ,i_{j}\le 2n+1,\\ I=(i_{1},\ldots ,i_{j}),\ \vartheta (I)\ge 2 \end{array}}\rho (g_{0}^{-1}g)^{\vartheta (I)}\sup \limits _{\rho (z)\le c\rho (g_{0}^{-1}g)}|X^{I}f(g_{0}z)|\right) , \end{aligned}$$

for some constant \(c>0\). Here \(\vartheta (I)\) is the homogeneous degree with respect to I given in (2.1).

We denote by \(\nabla \) the horizontal gradient on \({\mathbb {H}}^{n}\), defined by \(\nabla f:=(X_{1}f,\ldots ,X_{2n}f)\). Then we can show a lower bound for a local pseudo-oscillation of the symbol b in the commutator.

Lemma 5.3

Let \(b\in C^{\infty }({\mathbb {H}}^n)\). Assume that there is a point \(g_{0}\in {\mathbb {H}}^{n}\) such that \(\nabla b(g_{0})\ne 0\). Then there exist \(C>0\), \(\varepsilon >0\) and \(N>0\) such that if \(k>N\), then for any tile \(T\in \mathfrak {T}_{-k}\) satisfying \(d({\text {cent}}(T),g_{0})<\varepsilon \), one has

(5.4)

Above, \(T^{\prime }\) and \( T''\) are the tiles chosen in Lemma 5.1.

Proof

Denote \(c_{T}:={\text {cent}}(T):=(c_{T}^{1},\ldots ,c_{T}^{2n},t_T)\) and \(g=(g_1,\ldots ,g_{2n},t)\). Then by Lemma 5.2,

$$\begin{aligned} b(g)=b(c_{T})+\sum _{j=1}^{2n}X_{j}b(c_{T})(g_{j}-c_{T}^{j})+R(g,c_{T}), \end{aligned}$$
(5.5)

where the remainder term \(R(g,c_{T})\) satisfies

$$\begin{aligned} |R(g,c_{T})|\le C\left( \sum _{j=1}^{2}\frac{c^{j}}{j!}\sum _{\begin{array}{c} i_{1},\ldots ,i_{j}\le 2n+1,\\ I=(i_{1},\ldots ,i_{j}),\ \vartheta (I)\ge 2 \end{array}}\rho (c_{T}^{-1}g)^{\vartheta (I)}\sup \limits _{\rho (z)\le c\rho (c_{T}^{-1}g)}|X^{I}b(c_{T}z)|\right) . \end{aligned}$$

Note that the condition \(\rho (z)\le c\rho (c_{T}^{-1}g)\) implies that \(d(c_{T}z,c_{T})=\rho (z)\le c\rho (c_{T}^{-1}g)\lesssim {\text {width}}(T)\) whenever \(g\in T\). Hence, if \(g\in T\), then

$$\begin{aligned} |R(g,c_{T})| \lesssim {\text {width}}(T)^{2}\sum _{j=1}^{2}\sum _{\begin{array}{c} i_{1},\ldots ,i_{j}\le 2n+1,\\ I=(i_{1},\ldots ,i_{j}),\ \vartheta (I)\ge 2 \end{array}}\Vert X^{I}b\Vert _{L^{\infty }(B(g_{0},1))}. \end{aligned}$$

For \( \epsilon = \epsilon _{b} >0\) sufficiently small, this last estimate is smaller than the right hand side of (5.4). That is, in (5.5), we are only concerned with the first two terms on the right.

Apply Lemma 5.1, with the choice of signs \( a_j = \textrm{sgn}(X_{j}b)(c_{T})\). Let \(T', T''\) be the tiles that this Lemma provides to us. For \( g'= (g_j')\in T' \) and \( g'' = (g''_j) \in T''\), we have

$$\begin{aligned} \textrm{sgn}(X_{j}b)(c_{T})(g_{j}'-g_{j}'') \gtrsim {\text {width}}(T), \qquad j=1,\dotsc , 2n. \end{aligned}$$

Therefore, we can estimate

This inequality completes the Lemma. \(\square \)

Lemma 5.4

A function \(b\in L_{\textrm{loc}}^{1}({\mathbb {H}}^{n})\) is constant if

(5.7)

(In the display, \( T\in \mathfrak {T}_k\), and both T and k vary. And \( \tau ^{h}\) denotes translation by h).

Proof

The assumption is that \( b\in L_{\textrm{loc}}^{1}({\mathbb {H}}^{n})\), but the previous Lemmas require b to be smooth. Denote \(\psi _{\epsilon }(g):=\epsilon ^{-2n-2}\psi (\delta _{\epsilon ^{-1}}g)\), where \(\psi \) is a smooth compactly supported bump function which integrates to one, and \(\epsilon \) is a small positive constant. Then \(b _{\epsilon } = b*\psi _{\epsilon }\) is smooth. We argue that each \(b_{\epsilon }\) is constant. Since \(b_{\epsilon }\rightarrow b\) almost everywhere as \(\epsilon \rightarrow 0\), this is sufficient.

The point is that \( b _{\epsilon }\) is smooth and we observe that

(5.8)

If \( b _{\epsilon }\) is not constant, we argue that the norm on the left-hand side above is actually infinite, which is a contradiction. It follows from [6, Proposition 1.5.6] that there exists a point \(g_{0}\in {\mathbb {H}}^{n}\) such that \(\nabla (b*\psi _{\epsilon })(g_{0})\ne 0\). Hence Lemma 5.3 applies: there exist \(\varepsilon >0\) and \(N>0\) such that if \(k>N\), then for any tile \(T\in \mathfrak {T}_{-k}\) satisfying \(d({\text {cent}}(T),g_{0})<\varepsilon \),

Note that for \(k>N\), the number of tiles \(T\in \mathfrak {T}_{-k}\) with \(d({\text {cent}}(T),g_{0})<\varepsilon \) is at least

$$\begin{aligned} c(2n+1)^{k(2n+2)} \simeq {\text {width}}(T) ^{- (2n+2)}. \end{aligned}$$

But then, it is clear that the norm in (5.8) is infinite. \(\square \)

Proposition 5.5

Suppose \(b\in L^{1}_{\textrm{loc}}({\mathbb {H}}^n)\). Then for any \(\ell \in \{1,2,\ldots ,2n\}\), the commutator \([b,R_{\ell }]\in S^{2n+2}\) if and only if b is a constant.

Proof

A constant function b is associated with the zero commutator. So, we only consider the direction in which we assume \([b,R_{\ell }]\in S^ {2n+2}\), and we need to verify that (5.7) holds. That inequality involves a supremum over translations. Since the Riesz transforms are themselves convolution operators, it suffices to verify (5.7) without translations. That is,

(5.10)

(In the display, \( T\in \mathfrak {T}_k\), and both T and k vary).

This is in fact a corollary to Lemma 4.1, and is seen by way of a general remark. For a random variable Z, we have for \( 1\le p < \infty \),

$$\begin{aligned} \Vert Z - {\mathbb {E}} Z \Vert _p \simeq \Vert Z - Z'\Vert _p, \end{aligned}$$

where \( Z'\) is an independent copy of Z. Indeed,

$$\begin{aligned} \Vert Z - {\mathbb {E}} Z \Vert _p&= \Vert Z - {\mathbb {E}} Z ' \Vert _p \\&\le \Vert Z - Z'\Vert _p \le 2 \Vert Z - {\mathbb {E}} Z \Vert _p. \end{aligned}$$

The first inequality is by convexity and the second by the triangle inequality.
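Spelled out, with \({\mathbb {E}}^{\prime }\) denoting expectation in \(Z^{\prime }\) only (a notation introduced here for this sketch), and using that \(Z^{\prime }\) is independent of Z with \({\mathbb {E}}Z^{\prime }={\mathbb {E}}Z\):

$$\begin{aligned} \Vert Z-{\mathbb {E}}Z\Vert _{p}^{p}&={\mathbb {E}}\big |{\mathbb {E}}^{\prime }(Z-Z^{\prime })\big |^{p}\le {\mathbb {E}}\,{\mathbb {E}}^{\prime }|Z-Z^{\prime }|^{p}=\Vert Z-Z^{\prime }\Vert _{p}^{p},\\ \Vert Z-Z^{\prime }\Vert _{p}&\le \Vert Z-{\mathbb {E}}Z\Vert _{p}+\Vert {\mathbb {E}}Z^{\prime }-Z^{\prime }\Vert _{p}=2\Vert Z-{\mathbb {E}}Z\Vert _{p}. \end{aligned}$$

The first line is Jensen's inequality for the convex function \(t\mapsto |t|^{p}\), \(p\ge 1\), and the second uses that \(Z^{\prime }\) has the same distribution as Z.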

Thus, Lemma 4.1 implies

as \( B_0\) is a fixed integer. And then (5.10) follows. \(\square \)

6 Applications

As stated in the introduction, our approach depends upon a standard non-degeneracy condition on the kernel of the singular integral operator, and then on robust real variable techniques. (In particular, no Fourier analysis.) The approach applies to the following non-Euclidean Calderón–Zygmund operators.

  1. (1)

    The Cauchy–Szegő projection \({\mathcal {C}}\) [41, Chapter 12, Sect. 2.4] is an important singular integral on \({\mathbb {H}}^n\). It recovers an analytic function in the Siegel upper half space from its boundary value. Its restriction to the boundary is a convolution operator, that is, \({\mathcal {C}}(f)(g)=\int _{{\mathbb {H}}^n} f (g') k_{CS}((g')^{-1}g)dg'\), and the convolution kernel \(k_{CS}\) is given by

    $$\begin{aligned} k_{CS}(g)= \frac{c}{(|z|^2+\i \, t)^{n+1}},\quad \i ^2=-1, \qquad \forall g=(z,t)\in {\mathbb {H}}^n. \end{aligned}$$

    It is well known that this \(k_{CS}\) is a Calderón–Zygmund kernel. From the explicit kernel, we see that the non-degeneracy condition in our Theorem 3.1 holds for \({\mathcal {C}}\). Hence, Theorem 1.2 holds for \([b, {\mathcal {C}}]\). This recovers Theorem A of Feldman–Rochberg [15], whose proof relied on the Cayley transform and the Fourier transform.

  2. (2)

    Second order Riesz transforms appear naturally in the study of PDEs (see for instance [20]) and have been extensively studied in the literature. They are mostly interpreted as iterations of Riesz transforms and their adjoints, or second derivatives of the fundamental solution operator for the Laplacian: \(\partial _i\partial _j (-\Delta )^{-1}\). On Euclidean spaces, second order Riesz transforms are well understood as Calderón–Zygmund singular integrals and are bounded on \(L^p\) for \(1<p<\infty \).

  3. (2a)

    A particularly interesting example is the classical Beurling–Ahlfors operator \({\mathcal {B}}\) on the complex plane defined by (see for example [3, 35])

    $$\begin{aligned} {\mathcal {B}}(f)(z)&= \mathrm{p.v.} {1\over \pi } \int _{{\mathbb {C}}} {f(w) \over \big ( z-w\big )^2} \ dw. \end{aligned}$$

    Equivalently, we have

    $$\begin{aligned} {\mathcal {B}} = \partial ^2 (-\Delta )^{-1}, \end{aligned}$$

    where \(\partial = {\partial \over \partial x_1}- \i {\partial \over \partial x_2}\) is the Cauchy–Riemann operator and \(\Delta \) is the Laplacian on \({\mathbb {R}}^2\). Note that the kernel of \({\mathcal {B}}\) is homogeneous and smooth away from the diagonal. Hence, the Schatten class membership of \([b,{\mathcal {B}}]\) was covered by Rochberg–Semmes [39]. Our approach can also be applied to \([b,{\mathcal {B}}]\), to obtain an explicit quantitative estimate for the Schatten norm.

  4. (2b)

    Second order Riesz transform \({\mathcal {T}} (-\Delta _{\mathbb {H}})^{-1}\) on \({\mathbb {H}}^n\) (recall that \({\mathcal {T}} = {1\over 4} (X_j X_{n+j} - X_{n+j} X_{j})\)). By using functional calculus for \((-\Delta _{\mathbb {H}})^{-1}\), one sees directly that

    $$\begin{aligned} {\mathcal {T}} (-\Delta _{\mathbb {H}})^{-1}= \int _0^\infty {\mathcal {T}} e^{h \Delta _{\mathbb {H}}} \, dh, \end{aligned}$$

    which gives that the kernel K of \({\mathcal {T}} (-\Delta _{\mathbb {H}})^{-1}\) is a convolution kernel. Together with the size and smoothness estimates for the heat kernel [45], we obtain that for \(g\not =[0,0]\),

    $$\begin{aligned} |K(g)|\lesssim {1\over \rho (g)^{2n+2}}\quad \textrm{and}\quad |X_\ell K(g)|\lesssim {1\over \rho (g)^{2n+3}}, \end{aligned}$$

    for \(\ell =1,2,\ldots ,2n\). Hence, \({\mathcal {T}} (-\Delta _{\mathbb {H}})^{-1}\) is a Calderón–Zygmund operator on \({\mathbb {H}}^n\). We now verify the non-degeneracy condition in our Theorem 3.1. We follow the idea in [13, Sect. 7]. Recall from [19] that the explicit expression of the heat kernel on the Heisenberg group \(\mathbb H^n\) is as follows: for \(g=[z,t]\in {\mathbb {H}}^n\),

    $$\begin{aligned} p_h(g)={1\over 2(4\pi h)^{n+1}}\int _{\mathbb R}\exp \Big ({\lambda \over 4h}\big (\i t-|z|_{\mathbb C^n}^2\coth \lambda \big )\Big )\Big ({\lambda \over \sinh \lambda }\Big )^nd\lambda ,\quad \i ^2=-1. \end{aligned}$$
    (6.1)

    For any \(g=[z,t]\in {\mathbb {H}}^n\), by using the explicit expression of the heat kernel above and by Fubini’s theorem, we have that for \(g\not =[0,0]\),

    $$\begin{aligned} K(g)&={1\over 2(4\pi )^{n+1}}{\partial \over \partial t}\int _{0}^{+\infty }h^{-n-1}\int _{{\mathbb {R}}}\exp \Big ({\lambda \over 4h}\big (\i t-|z|_{\mathbb C^n}^2\coth \lambda \big )\Big )\Big ({\lambda \over \sinh \lambda }\Big )^n\ d\lambda \ dh\\&={1\over 2(4\pi )^{n+1}}{\partial \over \partial t}\int _{\mathbb R}\,\int _{0}^{+\infty }h^{-n-1}\exp \Big ({\lambda \over 4h}\big (\i t-|z|_{{\mathbb {C}}^n}^2\coth \lambda \big )\Big )dh\ \Big ({\lambda \over \sinh \lambda }\Big )^nd\lambda \\&=C_1{\partial \over \partial t}\int _{{\mathbb {R}}}\big (|z|_{\mathbb C^n}^2\lambda \coth \lambda -\i \lambda t\big )^{-n}\Big ({\lambda \over \sinh \lambda }\Big )^nd\lambda \\&=C_2 \int _{{\mathbb {R}}}\big (|z|_{\mathbb C^n}^2\lambda \coth \lambda -\i \lambda t\big )^{-n-1}\Big ({\lambda \over \sinh \lambda }\Big )^n \lambda \ d\lambda , \end{aligned}$$

    where in the next-to-last equality we applied the Cauchy integral formula to deform the ray in the right half of the complex plane \({\mathbb {C}}_{+}\) into the real axis. Here we also note that

    $$\begin{aligned} C_{2}=-n\, \i C_{1}=-\frac{n\, \i }{8\pi ^{n+1}}\int _{0}^{\infty }s^{-n-1}e^{-s^{-1}}ds\ne 0. \end{aligned}$$
    (6.2)

    Observe that

    $$\begin{aligned} |z|_{{\mathbb {C}}^n}^2\lambda \coth \lambda -\i \lambda t&={\lambda \over \sinh \lambda }d_K^2(g)\bigg ({|z|_{{\mathbb {C}}^n}^2\over d_K^2(g)}\cosh \lambda -\i {t\over d_K^2(g)}\sinh \lambda \bigg )\nonumber \\&={\lambda \over \sinh \lambda }d_K^2(g)\cosh (\lambda -\i \phi ), \end{aligned}$$

    where \(d_K\) is the Korányi metric given by \(d_K(g) = (|z|_{\mathbb C^n}^4+t^2)^{1\over 4}\) for \(g=[z,t]\in {\mathbb {H}}^n\), and

    $$\begin{aligned} -{\pi \over 2}\le \phi =\phi (|z|_{{\mathbb {C}}^n},t)\le {\pi \over 2},\quad e^{\i \phi }=d_K^{-2}(g)(|z|_{{\mathbb {C}}^n}^2+\i \,t). \end{aligned}$$
    (6.3)

    Thus, we have

    $$\begin{aligned} K(g)&=C_2 \int _{{\mathbb {R}}}\bigg ( {\lambda \over \sinh \lambda }d_K^2(g)\cosh (\lambda -\i \phi ) \bigg )^{-n-1}\Big ({\lambda \over \sinh \lambda }\Big )^n \lambda \ d\lambda \\&=C_2 d_K^{-2n-2}(g) \int _{{\mathbb {R}}}\big ( \cosh (\lambda -\i \phi ) \big )^{-n-1} \Big ({\lambda \over \sinh \lambda }\Big )^{-n-1}\Big ({\lambda \over \sinh \lambda }\Big )^n \lambda \ d\lambda \\&=C_2 d_K^{-2n-2}(g) \int _{{\mathbb {R}}}\big ( \cosh (\lambda -\i \phi ) \big )^{-n-1} \ \sinh \lambda \ d\lambda \\&=C_2 d_K^{-2n-2}(g) \int _{{\mathbb {R}}}\big ( \cosh (\lambda ) \big )^{-n-1} \ \sinh (\lambda +\i \phi ) \ d\lambda , \end{aligned}$$

    where the last equality follows from the Cauchy integral formula again. We now define

    $$\begin{aligned} F(g):=\int _{{\mathbb {R}}}\big ( \cosh (\lambda ) \big )^{-n-1} \ \sinh (\lambda +\i \phi ) \ d\lambda . \end{aligned}$$

    Then we have \(K(g)=C_2F(g) d_K^{-2n-2}(g) \). We now investigate the function

    $$\begin{aligned} {\mathfrak {F}}(w):=\int _{{\mathbb {R}}}\big ( \cosh (\lambda ) \big )^{-n-1} \ \sinh (\lambda +w) \ d\lambda , \quad w\in {\mathbb {C}}. \end{aligned}$$

    Then we have \(F(g) = {\mathfrak {F}}(\i \phi )\) with \(g=[z,t]\not =0\) and \(\phi =\phi (|z|_{{\mathbb {C}}^n}, t)\) such that \(e^{\i \phi }=d_K^{-2}(g)(|z|_{{\mathbb {C}}^n}^2+\i \,t)\). Note that \({\mathfrak {F}}(w)\) is analytic in some domain in the complex plane \({\mathbb {C}}\) which contains the line segment \([-{\pi \over 2}\i ,{\pi \over 2}\i ]\) on the imaginary axis, and that \({\mathfrak {F}}({\pi \over 4}\i )\not =0\). Since the zeros of an analytic function that is not identically zero are isolated and the segment is compact, \({\mathfrak {F}}(w)\) has at most finitely many zeros on \([-{\pi \over 2}\i ,{\pi \over 2}\i ]\), i.e., there exist \(\{\phi _\ell \}_{\ell =1}^N\subset [-{\pi \over 2}, {\pi \over 2}]\) such that \({\mathfrak {F}}(\i \phi _\ell )=0\). From the mapping in (6.3), we see that for each \(\ell =1,\ldots ,N\), \(\phi _\ell \) corresponds to a hyperplane \({\mathcal {H}}_\ell \) in \({\mathbb {H}}^n\) defined by

    $$\begin{aligned} {\mathcal {H}}_\ell :=\bigg \{ (z,t)\in {\mathbb {H}}^n:\ \phi _\ell =\phi (|z|_{{\mathbb {C}}^n},t)\bigg \}. \end{aligned}$$

    Let

    $$\begin{aligned} {\mathcal {H}} = \bigcup _{\ell =1}^N{\mathcal {H}}_\ell . \end{aligned}$$

    Then we see that \(\{g\in {\mathbb {H}}^n: F(g)=0\} \subset {\mathcal {H}}\), and that \({\mathcal {H}}\) has measure zero. Consequently, the measure of the set \(\{g\in {\mathbb {H}}^n: F(g)=0\}\) is zero. Hence, we see that the convolution kernel K(g) is homogeneous of degree \(-2n-2\) and that

    $$\begin{aligned} K(g)= C_2F(g), \quad g\in {\mathbb {S}}^n, \end{aligned}$$

    which is non-zero almost everywhere on \({\mathbb {S}}^n\) (the unit sphere in \({\mathbb {H}}^n\)). Thus, the non-degeneracy condition in Theorem 3.1 holds.
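
    The zero set of \(F\) can also be described explicitly. The following computation is a supplementary sketch and is not needed for the argument above. By the addition formula \(\sinh (\lambda +w)=\sinh \lambda \cosh w+\cosh \lambda \sinh w\), and since \(\lambda \mapsto (\cosh \lambda )^{-n-1}\sinh \lambda \) is odd and integrable on \({\mathbb {R}}\),

    $$\begin{aligned} {\mathfrak {F}}(w)&=\cosh w\int _{{\mathbb {R}}}\big (\cosh \lambda \big )^{-n-1}\sinh \lambda \ d\lambda +\sinh w\int _{{\mathbb {R}}}\big (\cosh \lambda \big )^{-n}\ d\lambda \\&=\sinh w\int _{{\mathbb {R}}}\big (\cosh \lambda \big )^{-n}\ d\lambda . \end{aligned}$$

    Hence \({\mathfrak {F}}(\i \phi )=\i \sin \phi \int _{{\mathbb {R}}}(\cosh \lambda )^{-n}\,d\lambda \), which confirms \({\mathfrak {F}}({\pi \over 4}\i )\not =0\) and shows that the only zero on the segment is \(\phi =0\). By (6.3) this corresponds to the set \(\{t=0\}\), which has measure zero in \({\mathbb {H}}^n\).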

  5. (2c)

    Second order Riesz transform \(X_j X_k (-\Delta _{\mathbb {H}})^{-1}\) on \({\mathbb {H}}^n\), \(j,k\in \{1,2,\ldots ,2n\}\). Again, by using the functional calculus for \((-\Delta _{\mathbb {H}})^{-1}\), it is straightforward to see that

    $$\begin{aligned} X_j X_k(-\Delta _{\mathbb {H}})^{-1}= \int _0^\infty X_j X_k e^{h \Delta _{\mathbb {H}}} \, dh, \end{aligned}$$

    which, together with the size and smoothness estimates for the heat kernel [45], shows that \(X_j X_k (-\Delta _{\mathbb {H}})^{-1}\) is a Calderón–Zygmund operator on \({\mathbb {H}}^n\). Denote the kernel of \(X_j X_k (-\Delta _{\mathbb {H}})^{-1}\) by \(K_{j,k}(g)\). We now verify the non-degeneracy condition in Theorem 3.1. In fact, this follows from an approach similar to the one used in (2b). Without loss of generality, we take

    $$\begin{aligned} X_j={\partial \over \partial x_j} + 2x_{n+j} {\partial \over \partial t},\ \ j<n\quad \textrm{and}\quad X_k={\partial \over \partial x_k} + 2x_{n-k} {\partial \over \partial t},\ \ k>n. \end{aligned}$$

    Then, based on the formula (6.1) for the heat kernel, we get that for \(g\not =[0,0]\),

    $$\begin{aligned} K_{j,k}(g)=&\, d_K^{-2n-4}(g) \Bigg ( F_1(g)+\i F_2(g) + F_3(g) + \i F_4(g) \Bigg ), \end{aligned}$$

    where

    $$\begin{aligned} F_1(g)&=C_3 x_jx_k\int _{{\mathbb {R}}} \big (\cosh (\lambda )\big )^{-n-2} \cosh (\lambda +\i \phi )^2\, d\lambda ,\\ F_2(g)&= - C_3 x_{n+j}x_k \int _{\mathbb R}\big (\cosh (\lambda )\big )^{-n-2} \cosh (\lambda +\i \phi ) \sinh (\lambda +\i \phi )^2\, d\lambda ,\\ F_3(g)&= C_4 x_{n-k}x_j \int _{\mathbb R}\big (\cosh (\lambda )\big )^{-n-2}\ \cosh (\lambda +\i \phi ) \sinh (\lambda +\i \phi )^2 \ d\lambda ,\\ F_4(g)&= - C_4 x_{n-k}x_{n+j} \int _{{\mathbb {R}}}\big (\cosh (\lambda )\big )^{-n-2}\ \sinh (\lambda +\i \phi )^2 \ d\lambda , \end{aligned}$$

    with \(\phi \) defined as in (6.3), \(C_3:=2n(2n+2)C_1 \), \(C_4:=2(2n+2)C_2\), and \(C_1, C_2\) as in (6.2). By analytic continuation as in (2b), together with the fact that the zeros of an analytic function that is not identically zero are isolated, we get that \(K_{j,k}(g)\not =0\) for a.e. \(g\in {\mathbb {H}}^n\). Thus, the non-degeneracy condition in Theorem 3.1 holds.