1 Introduction and preliminaries

1.1 Introduction

Density estimation plays an important role in both statistics and econometrics. This paper considers the density model under bias and multiplicative censoring introduced by Abbaszadeh et al. [1]. Let \(Z_{1}, Z_{2}, \ldots, Z_{n}\) be independent and identically distributed (i.i.d.) random variables given by

$$Z_{i}= U_{i} Y_{i}, \quad i=1, \ldots , n, $$

where \(U_{1}, U_{2}, \ldots, U_{n}\) are unobserved i.i.d. random variables with the common uniform distribution on \([0, 1]\), and \(Y_{1}, Y_{2}, \ldots, Y_{n}\) are also unobserved i.i.d. random variables whose common density function \(f_{Y}\) is given by

$$ f_{Y}(x)=\frac{\omega(x)f_{X}(x)}{\theta}, \quad x\in[0,1]. $$
(1.1)

Here, \(\omega(x)>0\) denotes a known weight function, \(f_{X}(x)\) stands for the unknown density function of a random variable X, and \(\theta=E(\omega(X))=\int_{0}^{1}\omega(x)f_{X}(x)\,dx\) represents the unknown normalization constant (\(E\) denotes expectation). We suppose that \(U_{i}\) and \(Y_{i}\) are independent for each \(i\in\{1, 2, \ldots, n\}\). Our aim is to estimate \(f_{X}\) when only \(Z_{1}, Z_{2}, \ldots, Z_{n}\) are observed.
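To fix ideas, the following sketch simulates one sample from this observation scheme; the particular choices \(f_{X}=\operatorname{Beta}(2,2)\), \(\omega(x)=1+x\) and the rejection sampler are illustrative assumptions only and are not part of the model.

```python
# A minimal simulation sketch of the observation scheme Z = U * Y, where Y is drawn
# from the weighted density f_Y = w * f_X / theta.  The concrete choices below
# (f_X = Beta(2, 2), w(x) = 1 + x, rejection sampling) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def w(x):
    return 1.0 + x                    # known weight, bounded away from 0 on [0, 1]

def f_X(x):
    return 6.0 * x * (1.0 - x)        # Beta(2, 2) density on [0, 1]

def sample_Y(n):
    """Draw from f_Y(x) = w(x) f_X(x) / theta by rejection sampling."""
    out = np.empty(0)
    M = 2.0 * 1.5                     # upper bound of w * f_X on [0, 1] (w <= 2, f_X <= 1.5)
    while out.size < n:
        x = rng.uniform(0.0, 1.0, size=2 * n)
        u = rng.uniform(0.0, M, size=2 * n)
        out = np.concatenate([out, x[u <= w(x) * f_X(x)]])
    return out[:n]

n = 10_000
Y = sample_Y(n)                       # unobserved
U = rng.uniform(0.0, 1.0, size=n)     # unobserved
Z = U * Y                             # the only observed sample
```

Only the array Z is available to the estimators constructed in Section 2.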

In particular, when \(\omega(x)=1\), this model reduces to the classical density estimation problem under multiplicative censoring described by Vardi [2], which unifies several well-studied statistical problems, including non-parametric inference for renewal processes, certain non-parametric deconvolution problems and the estimation of decreasing densities [2–4]. Many methods have been proposed to deal with that problem, including a series expansion method [4], the kernel method [5] and wavelet methods [1, 6]. For the standard biased density model (1.1) (estimating \(f_{X}\) from \(Y_{1}, Y_{2}, \ldots, Y_{n}\)), we refer to [7–9]. However, the estimation of \(f_{X}\) from \(Z_{1}, Z_{2}, \ldots, Z_{n}\) is a new statistical problem which has potential applications in statistics and econometrics. As far as we know, only Abbaszadeh et al. [1] dealt with that problem. Using the wavelet method, they established a convergence rate of estimators in the \(L^{2}\) norm over the Besov space \(B^{s}_{r,q}\).

It is well known that in many statistical models the error of estimators is measured in the \(L^{p}\) norm [10–12]. In this paper, we consider \(L^{p}\) (\(1\leq p< \infty\)) risk estimation over the Besov space \(B^{s}_{r,q}\) based on wavelet bases. We define a linear estimator and a nonlinear adaptive one, motivated by Abbaszadeh et al.’s and Youming and Junlian’s work. We prove that the nonlinear adaptive estimator achieves a faster rate of convergence than the linear one for \(r< p\). Our results can be considered as an extension of Abbaszadeh et al.’s theorems from \(p=2\) to \(p\in [1, +\infty)\).

Section 1.2 introduces some notations and classical results on wavelets and Besov spaces, which will be used in our discussions; the assumptions on the model and the main results are presented in Section 2. In order to prove our theorems, we show several lemmas in Section 3 and give the proofs in Section 4.

1.2 Some preparations

In recent years, the wavelet method has turned out to be effective for density estimation [1, 6, 11, 12] because of its time-frequency localization and fast algorithms for numerical computation. In this subsection, we introduce a wavelet basis of the real line \(\mathbb{R}\) (not necessarily on the fixed interval \([0, 1]\) as in [1]), which will be used in our discussions. Let \(\phi\in C_{0}^{m}(\mathbb{R})\) be an orthonormal scaling function with \(m>s\). The corresponding wavelet function is denoted by ψ. It is well known that \(\{ \phi_{J,k}, \psi_{j,k}, j\geq J, k\in\mathbb {Z}\}\) constitutes an orthonormal basis of \(L^{2}(\mathbb{R})\), where \(f_{j,k}(x):=2^{j/2}f(2^{j}x-k)\) as usual in wavelet analysis. Then, for each \(f\in L^{2}(\mathbb{R})\),

$$f(x)=\sum_{k\in\mathbb{Z}}\alpha_{J,k} \phi_{J,k}(x)+\sum_{j\geq J}\sum _{k\in\mathbb{Z}}\beta_{j,k}\psi_{j,k}(x), $$

where \(\alpha_{J,k}=\int f(x)\phi_{J,k}(x)\,dx\) and \(\beta_{j,k}=\int f(x)\psi_{j,k}(x)\,dx\). Details on wavelet bases can be found in [13].

One of the advantages of wavelet bases is that they can characterize Besov spaces. Throughout the paper, we work with Besov spaces on a compact subset of the real line \(\mathbb{R}\) (not necessarily on the fixed interval \([0, 1]\) as in [1]). To introduce those spaces, we need the well-known Sobolev spaces with integer exponents, \(W_{p}^{n}(\mathbb {R}):=\{ f\mid f\in L^{p}(\mathbb{R}), f^{(n)}\in L^{p}(\mathbb{R})\}\), with the norm \(\|f\|_{W_{p}^{n}}:=\|f\|_{p}+\|f^{(n)}\|_{p}\). Then \(L^{p}(\mathbb{R})\) can be considered as \(W^{0}_{p}(\mathbb{R})\). For \(1\leq p, q\leq\infty\) and \(s=n+\alpha\) with \(\alpha\in(0,1]\), the Besov spaces on \(\mathbb{R}\) are defined by

$$B_{p, q}^{s}(\mathbb{R}):=\bigl\{ f\in W^{n}_{p}( \mathbb{R}), \bigl\| t^{-\alpha}\omega_{p}^{2} \bigl(f^{(n)}, t\bigr)\bigr\| _{q}^{\ast} < \infty \bigr\} , $$

where \(\omega_{p}^{2}(f, t):= \sup_{|h|\leq t}\|f (\cdot+2h)-2f (\cdot+h)+f (\cdot)\|_{p}\) denotes the smoothness modulus of f and

$$\| h\|_{q}^{\ast}:= \left \{ \textstyle\begin{array}{@{}l@{\quad}l} (\int_{0}^{\infty} |h(t) |^{q}\frac{dt}{t} )^{\frac {1}{q}} , & \mbox{if }1\leq q< \infty,\\ \operatorname{ess} \sup_{t} |h(t) | , & \mbox{if }q=\infty. \end{array}\displaystyle \right . $$

The associated norm is \(\|f\|_{B_{p, q}^{s}}:=\|f\|_{p}+\|t^{-\alpha}\omega _{p}^{2}(f^{(n)}, t)\|_{q}^{\ast}\). It should be pointed out that Besov spaces contain Hölder spaces and Sobolev spaces with non-integer exponents for particular choices of s, p, and q [13].

The following theorems are fundamental in our discussions.

Theorem 1.1

([14])

Let \(f\in L^{r}(\mathbb {R})\) (\(1\leq r\leq\infty\)), \(\alpha_{J,k}=\int f(x)\phi_{J,k}(x)\,dx\), \(\beta_{j,k}=\int f(x)\psi_{j,k}(x)\,dx\). Then the following assertions are equivalent.

  1. (i)

    \(f\in B_{r,q}^{s}(\mathbb{R})\), \(s>0\), \(1\leq q\leq\infty\);

  2. (ii)

    \(\{2^{js}\|P_{j}f-f\|_{r}\}_{j\geq0}\in l^{q}\), where \(P_{j}f(x):=\sum_{k\in\mathbb{Z}}\alpha_{j,k}\phi_{j,k}(x)\) is the projection operator onto \(V_{j}\);

  3. (iii)

    \(\|\alpha_{J,\cdot}\|_{r}+\|\{2^{j(s+1/2-1/r)}\|\beta_{j,\cdot}\| _{r}\}_{j\geq0}\|_{q}< \infty\).
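In particular, if \(f\in B^{s}_{r,q}(H)\) (the Besov ball defined in (A1) below), then assertion (iii) yields \(\sup_{j\geq0}2^{j(s+\frac{1}{2}-\frac{1}{r})}\|\beta_{j,\cdot}\|_{r}\leq C(H)\) for some constant depending only on H (and the chosen wavelet basis); this consequence is used repeatedly in Section 4.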

Theorem 1.2

([14])

Let ϕ be a scaling function or a wavelet with \(\theta(\phi):=\sup_{x\in \mathbb{R}}\sum_{k\in\mathbb{Z}}|\phi(x-k)|< \infty\). Then

$$\biggl\| \sum_{k\in\mathbb{Z}}\lambda_{k} \phi_{j,k}\biggr\| _{p}\sim 2^{j(\frac{1}{2}- \frac{1}{p})}\|\lambda\|_{p} $$

for \(\lambda=\{\lambda_{k}\}\in l^{p}(\mathbb{Z})\) and \(1\leq p\leq \infty\), where

$$\|\lambda\|_{p}:= \left \{ \textstyle\begin{array}{@{}l@{\quad}l} (\sum_{k\in\mathbb{Z}}|\lambda_{k}|^{p})^{\frac{1}{p}} , & \textit{if }p< \infty,\\ \sup_{k\in\mathbb{Z}}|\lambda_{k}| , & \textit{if }p=\infty. \end{array}\displaystyle \right . $$

Here and in what follows, \(A\lesssim B\) denotes \(A\leq CB\) for some constant \(C>0\); \(A\sim B\) stands for both \(A\lesssim B\) and \(B\lesssim A\). Clearly, Daubechies and Meyer’s scaling and wavelet functions satisfy the condition \(\theta(\phi)<\infty\).
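As a quick illustration of Theorem 1.2, take the Haar scaling function \(\phi=1_{[0,1)}\) (used only in this example); then \(\theta(\phi)=1\) and the \(\phi_{j,k}\) have disjoint supports, so that for \(1\leq p<\infty\)

$$\biggl\| \sum_{k\in\mathbb{Z}}\lambda_{k} \phi_{j,k}\biggr\| _{p}^{p}=\sum_{k\in\mathbb{Z}}|\lambda_{k}|^{p}2^{\frac{1}{2}jp}2^{-j}=2^{j(\frac{1}{2}p-1)}\|\lambda\|_{p}^{p}, $$

and the norm equivalence even holds with equality in this case.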

2 Main results

This section is devoted to the statement of our main results. To do that, we make the following assumptions as described in [1]:

  1. (A1)

    The two density functions \(f_{X}\) and \(f_{Y}\) have the support \([0, 1]\) and \(f_{X}\) belongs to the Besov ball \(B_{r,q}^{s}(H)\) (\(H>0\)) defined as

    $$B_{r,q}^{s}(H):=\bigl\{ f\in B_{r,q}^{s}( \mathbb{R}), f\mbox{ is a probability density and }\|f\|_{B_{r,q}^{s}}\leq H \bigr\} . $$
  2. (A2)

    The density of \(Z_{i}\) is

    $$f_{Z}(x)=\int_{x}^{1} \frac{f_{Y}(y)}{y}\,dy. $$
  3. (A3)

    There exists a constant \(C>0\) such that

    $$\sup_{x\in[0,1]} f_{X}(x)\leq C,\qquad \sup _{x\in[0,1]} f_{Z}(x)\leq C. $$
  4. (A4)

    There exist constants \(C>0\) and \(c>0\) such that

    $$\sup_{x\in[0,1]} \omega(x)\leq C, \qquad \sup_{x\in[0,1]} \bigl|\omega'(x)\bigr|\leq C, \qquad \inf_{x\in[0,1]} \omega(x)\geq c. $$

To introduce the wavelet estimator, we define the operator T by

$$T(h) (x):=\frac{h(x)\omega(x)+xh'(x)\omega(x)-xh(x)\omega'(x)}{\omega^{2}(x)} $$

for \(h\in C^{1}(\mathbb{R})\), the set of all continuously differentiable functions on \(\mathbb{R}\). Then the linear estimator is given as follows:

$$ \hat{f}^{\mathrm{lin}}(x):=\sum_{k\in\wedge}\hat{ \alpha}_{j_{0}, k}\phi _{j_{0}, k}(x), $$
(2.1)

where \(j_{0}\) is chosen such that \(2^{j_{0}}\sim n^{\frac{1}{2s'+3}}\) with \(s':=s-(1/r-1/p)_{+}\) (see Theorem 2.1), and \(\wedge:=\{k\in\mathbb{Z}, \operatorname{supp} f_{X} \cap \operatorname{supp} \phi _{j_{0}, k}\neq\varnothing\}\).

To obtain a nonlinear estimator, we take \(j_{0}\) and \(j_{1}\) such that \(2^{j_{1}}\sim\frac{n}{\ln n}\) and \(2^{j_{0}}\sim n^{\frac {1}{2m+3}}\) with \(m>s\). By definition

$$ \hat{\alpha}_{j, k}=\frac{\hat{\theta}}{n}\sum_{i=1}^{n}T( \phi _{j, k}) (Z_{i}), \qquad \hat{\beta}_{j, k}= \frac{\hat{\theta}}{n}\sum_{i=1}^{n}T( \psi_{j, k}) (Z_{i}) $$
(2.2)

are the estimators of \(\alpha_{j, k}=\int f(x)\phi_{j, k}(x)\,dx\) and \(\beta_{j, k}=\int f(x)\psi_{j, k}(x)\,dx\), respectively, with

$$\hat{\theta}:= \Biggl[\frac{1}{n}\sum_{i=1}^{n} \frac{\omega (Z_{i})-Z_{i}\omega'(Z_{i})}{\omega^{2}(Z_{i})} \Biggr]^{-1}. $$

Then the nonlinear estimator is given by

$$ \hat{f}^{\mathrm{non}}(x):=\sum_{k\in\wedge}\hat{ \alpha}_{j_{0}, k}\phi _{j_{0}, k}(x)+\sum_{j=j_{0}}^{j_{1}} \sum_{k\in\wedge _{j}}\hat{\beta}_{j,k}1_{\{|\hat{\beta}_{j,k}|>\lambda\}} \psi_{j, k}(x), $$
(2.3)

where \(\wedge:=\{k\in\mathbb{Z}, \operatorname{supp} f_{X}\cap \operatorname{supp} \phi _{j_{0}, k}\neq\varnothing\}\), \(\wedge_{j}:=\{k\in\mathbb{Z}, \operatorname{supp} f_{X}\cap \operatorname{supp} \psi_{j, k}\neq\varnothing\}\), \(1_{D}\) denotes the indicator function of the set D, and \(\lambda:=T2^{j}\sqrt{\frac{\ln n}{n}}\), where \(T>0\) is a sufficiently large constant (not to be confused with the operator T) specified in Lemma 3.2.
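The construction (2.1)-(2.3) can be summarized by the following sketch, which is not the authors' implementation: it evaluates the operator T on the dilated and translated scaling and wavelet functions, forms \(\hat{\theta}\), \(\hat{\alpha}_{j,k}\), \(\hat{\beta}_{j,k}\) as in (2.2), and applies the hard threshold of (2.3). The weight \(\omega(x)=1+x\), the Daubechies db6 functions (tabulated by PyWavelets and evaluated by interpolation, with numerical derivatives), and the threshold constant are illustrative assumptions only.

```python
# A sketch (not the authors' implementation) of the empirical coefficients (2.2),
# the plug-in normalization estimate, and the hard threshold used in (2.3).
# Assumptions for illustration only: weight w(x) = 1 + x, Daubechies 'db6'
# scaling/wavelet functions tabulated by PyWavelets, numerical derivatives on the
# tabulation grid, and threshold constant T_const = 1.
import numpy as np
import pywt

wav = pywt.Wavelet('db6')
phi_tab, psi_tab, grid = wav.wavefun(level=12)   # phi, psi sampled on a fine grid

def eval_tab(tab, x):
    """Evaluate a tabulated function (zero outside its support) at the points x."""
    return np.interp(x, grid, tab, left=0.0, right=0.0)

def eval_tab_deriv(tab, x):
    d = np.gradient(tab, grid)                   # numerical derivative of the table
    return np.interp(x, grid, d, left=0.0, right=0.0)

def w(x):  return 1.0 + x
def dw(x): return np.ones_like(x)

def T_op(tab, j, k, x):
    """The operator T applied to g_{j,k}(x) = 2^{j/2} g(2^j x - k)."""
    y = 2.0**j * x - k
    g  = 2.0**(j / 2.0) * eval_tab(tab, y)
    dg = 2.0**(3.0 * j / 2.0) * eval_tab_deriv(tab, y)
    return (g * w(x) + x * dg * w(x) - x * g * dw(x)) / w(x)**2

def theta_hat(Z):
    return 1.0 / np.mean((w(Z) - Z * dw(Z)) / w(Z)**2)

def alpha_hat(Z, j, k):
    return theta_hat(Z) * np.mean(T_op(phi_tab, j, k, Z))

def beta_hat(Z, j, k):
    return theta_hat(Z) * np.mean(T_op(psi_tab, j, k, Z))

def beta_hat_thresholded(Z, j, k, T_const=1.0):
    """Hard threshold lambda = T_const * 2^j * sqrt(ln n / n), as in (2.3)."""
    n = Z.size
    lam = T_const * 2.0**j * np.sqrt(np.log(n) / n)
    b = beta_hat(Z, j, k)
    return b if abs(b) > lam else 0.0
```

With these ingredients, \(\hat{f}^{\mathrm{lin}}\) and \(\hat{f}^{\mathrm{non}}\) are obtained by plugging the estimated coefficients into (2.1) and (2.3). For instance, with \(n=10^{4}\) and \(m=2\), the resolution levels of this section would be roughly \(j_{0}=2\) (since \(2^{j_{0}}\sim n^{\frac{1}{2m+3}}=n^{1/7}\approx3.7\)) and \(j_{1}=10\) (since \(2^{j_{1}}\sim n/\ln n\approx1086\)).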

Remark 2.1

From the definition of \(\hat {f}^{\mathrm{non}}_{n}\), we find that the nonlinear estimator has the advantage of being adaptive, since its construction does not depend on the parameters s, r, q, and H.

Remark 2.2

The definitions of \(\hat {f}^{\mathrm{lin}}\) and \(\hat{f}^{\mathrm{non}}\) are essentially the same as in [1]. However, the selection of \(j_{0}\) and \(j_{1}\) is different from that in [1], and the wavelet functions are defined on the real line \(\mathbb{R}\), not necessarily on \([0, 1]\).

Then we have the following approximation result, which extends Abbaszadeh et al.’s theorems [1] from \(p=2\) to \(p\in[1, +\infty)\).

Theorem 2.1

Let \(f_{X}(x)\in B_{r,q}^{s}(H)\) (\(s>\frac{1}{r}\), \(r,q\geq1\)) and \(\hat{f}^{\mathrm{lin}}\) be the estimator defined by (2.1). If (A1)-(A4) hold, then for each \(1\leq p<\infty\), \(s'=s-(1/r-1/p)_{+}\), and \(x_{+}:=\max(x,0)\),

$$\sup_{f_{X}\in B_{r,q}^{s}(H)} E\bigl\| \hat{f}^{\mathrm{lin}}(x)-f_{X}(x)\bigr\| _{p}^{p}\lesssim n^{-\frac{s'p}{2s'+3}}. $$

Remark 2.3

The condition \(s>\frac{1}{r}\) seems natural, since \(B^{s}_{r,q}(\mathbb{R})\subseteq C(\mathbb{R})\) for \(sr>1\), where \(C(\mathbb{R})\) denotes the set of all continuous functions on \(\mathbb{R}\).

Remark 2.4

If \(r\geq2\) and \(p=2\), then Abbaszadeh et al.’s Theorem 4.1 [1] follows directly from our theorem, since in this case \(s'=s\). That is, Theorem 2.1 extends the corresponding theorem of [1] from \(p=2\) to \(p\in[1, +\infty)\).

Remark 2.5

When \(\omega(x)=1\) and \(\theta =E(\omega(X))=1\), the model reduces to the standard multiplicative censoring one considered by Abbaszadeh et al. [6]. In [6], they estimate the convergence rate of wavelet estimators in the \(L^{p}\) norm for a density and its derivatives in Besov spaces. Our result is consistent with Theorem 4.1 of [6] taken with \(m=0\) (i.e., for the density itself).

Theorem 2.2

Let \(f_{X}(x)\in B_{r,q}^{s}(H)\) (\(\frac{1}{r} < s < m\), \(r, q\geq1\)) and \(\hat {f}^{\mathrm{non}}_{n}\) be the estimator given by (2.3). If (A1)-(A4) hold, then there exists \(C>0\) such that for each \(1\leq p<\infty\) and \(\alpha :=\min\{\frac{s}{2s+3}, \frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac {1}{r})+3}\}\),

$$\sup_{f_{X}\in B_{r,q}^{s}(H)} E\bigl\| \hat{f}^{\mathrm{non}}_{n}(x)-f_{X}(x) \bigr\| _{p}^{p}\lesssim (\ln n)^{p}\biggl( \frac{\ln n}{n}\biggr)^{\alpha p}. $$

Remark 2.6

When \(p=2\) and either \(r\geq2\) or \(\{ 1\leq r<2, s>3/r\}\), our result is exactly the same as Theorem 4.2 of [1], ignoring the logarithmic factor. In this case, \(\alpha=\frac{s}{2s+3}\). In other words, our theorem can be considered as an extension of Theorem 4.2 in [1].

Remark 2.7

When \(\omega(x)=1\) and \(\theta =E(\omega(X))=1\), our result coincides with Theorem 4.2 of [6] taken with \(m=0\) (i.e., for the density itself), ignoring the logarithmic factor. In this case, the model reduces to the standard density estimation problem under multiplicative censoring.

Remark 2.8

When \(r< p\), the nonlinear estimator attains a better rate of convergence than the linear one, since \(\frac{s'p}{2s'+3}<\alpha p\); indeed, \(\frac{s'p}{2s'+3}<\frac{sp}{2s+3}\) and \(\frac{s'p}{2s'+3}<\frac{(s-\frac{1}{r}+\frac{1}{p})p}{2(s-\frac{1}{r})+3}\). When \(r\geq p\), the nonlinear estimator attains the same rate of convergence as the linear one, namely \(n^{-\frac{sp}{2s+3}}\), ignoring the logarithmic factor. However, taking into account that the nonlinear estimator is adaptive, it is preferable to the linear one for the estimation of \(f_{X}\).
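For a concrete instance of this comparison (the values \(s=2\), \(r=1\), \(p=2\) are chosen only for illustration),

$$s'=s-\biggl(\frac{1}{r}-\frac{1}{p}\biggr)_{+}=\frac{3}{2},\qquad \frac{s'p}{2s'+3}=\frac{1}{2},\qquad \alpha=\min\biggl\{ \frac{2}{7}, \frac{3}{10}\biggr\} =\frac{2}{7},\qquad \alpha p=\frac{4}{7}, $$

so the nonlinear estimator attains the rate \(n^{-4/7}\) (up to logarithmic factors), whereas the linear one only attains \(n^{-1/2}\).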

3 Lemmas

We present several important lemmas in this section, which will be needed for the proofs of our main theorems. To show Lemma 3.1, we need Rosenthal’s inequality [14].

Rosenthal’s inequality

Let \(X_{1}, X_{2}, \ldots, X_{n}\) be independent random variables such that \(E(X_{i})=0\) and \(E |X_{i} |^{p}<\infty\) (\(i=1,2,\ldots, n\)). Then

$$E \Biggl|\sum_{i=1}^{n}X_{i} \Biggr|^{p}\leq \left \{ \textstyle\begin{array}{@{}l@{\quad}l} C_{p} [\sum_{i=1}^{n}E |X_{i} |^{p}+ (\sum_{i=1}^{n}E |X_{i} |^{2} )^{\frac{1}{2}p} ] , & p\geq2,\\ C_{p} (\sum_{i=1}^{n}E |X_{i} |^{2} )^{\frac {1}{2}p}, & 0< p\leq2. \end{array}\displaystyle \right . $$

Lemma 3.1

Let \(\hat{\alpha}_{j,k}\), \(\hat {\beta}_{j,k}\) be defined by (2.2). If (A1)-(A4) hold, then there exists a constant \(C>0\) such that

$$E |\hat{\alpha}_{j,k}-\alpha_{j,k} |^{p} \leq C2^{pj}n^{-\frac {1}{2}p} \quad \textit{and}\quad E |\hat{ \beta}_{j,k}-\beta_{j,k} |^{p}\leq C2^{pj}n^{-\frac{1}{2}p} $$

for \(1\leq p<\infty\) and \(2^{j}\leq n\).

Remark 3.1

When \(p=4\), Lemma 3.1 reduces to Proposition 4.1 in [1].

Proof of Lemma 3.1

One only proves the first inequality; the proof of the second one is similar. Clearly,

$$\hat{\alpha}_{j,k}-\alpha_{j,k}=\frac{\hat{\theta}}{\theta} \frac {1}{n}\sum_{i=1}^{n} \bigl[ \theta T(\phi_{j,k}) (Z_{i})-\alpha _{j,k} \bigr]+ \alpha_{j,k} \hat{\theta} \biggl(\frac{1}{\theta}-\frac {1}{\hat{\theta}} \biggr). $$

By (A1) and (A3), \(|\alpha_{j,k}|\leq\int_{0}^{1}f_{X}(x)|\phi _{j,k}(x)|\,dx\lesssim\int_{0}^{1}|\phi_{j,k}(x)|\,dx\lesssim1\). On the other hand, \(c\leq|\theta|= |E (\omega(X) ) |= |\int_{0}^{1}\omega(x)f_{X}(x)\,dx |\leq C\) and \(|\hat{\theta }|\lesssim1\) thanks to (A4). Therefore,

$$E |\hat{\alpha}_{j,k}-\alpha_{j,k} |^{p}\lesssim E \Biggl|\frac{1}{n}\sum_{i=1}^{n} \bigl[\theta T(\phi_{j,k}) (Z_{i})-\alpha_{j,k} \bigr] \Biggr|^{p}+E \biggl|\frac{1}{\hat{\theta}}-\frac{1}{\theta} \biggr|^{p}:=T_{1}+T_{2}. $$

To estimate \(T_{1}\), one defines \(\xi_{i}:=\theta T(\phi _{j,k})(Z_{i})-\alpha_{j,k}\). Then \(T_{1}=E |\frac{1}{n}\sum_{i=1}^{n}\xi_{i} |^{p}\) and \(E(\xi_{i})=0\) by Lemma 4.2 in [1]. By the definition of the operator T,

$$E \bigl|\theta T(\phi_{j,k}) (Z_{i}) \bigr|^{p}=| \theta|^{p}\int_{0}^{1}\frac { |\phi_{j,k}(x)\omega(x)+x(\phi_{j,k})'(x)\omega(x)-x\phi _{j,k}(x)\omega'(x) |^{p}}{\omega^{2}(x)} f_{Z}(x)\,dx. $$

Moreover, \(E |\theta T(\phi_{j,k})(Z_{i}) |^{p}\lesssim\int_{0}^{1} ( |\phi_{j,k}(x) |^{p}+ |\phi_{j,k}'(x) |^{p} )\,dx\lesssim2^{j(\frac{3}{2}p-1)}\) due to (A3) and (A4). Note that \(|\alpha_{j,k}|^{p}\lesssim1\). Then for \(p\geq1\),

$$ E |\xi_{i} |^{p}\lesssim E \bigl|\theta T(\phi_{j,k}) (Z_{i}) \bigr|^{p}+ |\alpha_{j,k} |^{p} \lesssim2^{j(\frac{3}{2}p-1)}. $$
(3.1)

This with Rosenthal’s inequality leads to

$$E \Biggl|\frac{1}{n}\sum_{i=1}^{n} \xi_{i} \Biggr|^{p}\lesssim n^{-p}\max \bigl\{ nE | \xi_{i} |^{p}, \bigl(nE |\xi_{i} |^{2} \bigr)^{\frac{1}{2}p} \bigr\} \lesssim n^{-p}\max \bigl\{ n2^{j(\frac {3}{2}p-1)},n^{\frac{1}{2}p}2^{pj} \bigr\} . $$

Using the assumption \(2^{j}\leq n\), one obtains \(T_{1}=E |\frac {1}{n}\sum_{i=1}^{n}\xi_{i} |^{p}\lesssim2^{pj}n^{-p/2}\). To end the proof, one needs only to show

$$ T_{2}=E \biggl|\frac{1}{\hat{\theta}}-\frac{1}{\theta} \biggr|^{p}\lesssim 2^{pj}n^{-\frac{1}{2}p}. $$
(3.2)

Denote

$$\eta_{i}:=\frac{\omega(Z_{i})-Z_{i}\omega'(Z_{i})}{\omega ^{2}(Z_{i})}-\frac{1}{\theta} \quad (i=1,2,\ldots, n). $$

Then \(E|\eta_{i}|^{p}\leq C\) due to (A4). By Theorem 4.1 [1], \(\eta_{1}, \eta_{2}, \ldots, \eta_{n}\) are i.i.d. and \(E(\eta_{i})=0\). Then it follows from Rosenthal’s inequality that

$$T_{2}=n^{-p}E \Biggl|\sum_{i=1}^{n} \eta_{i} \Biggr|^{p}\lesssim n^{-p} \Biggl(\sum _{i=1}^{n}E|\eta_{i}|^{2} \Biggr)^{\frac {1}{2}p}\lesssim n^{-\frac{1}{2}p}\leq2^{pj}n^{-\frac{1}{2}p} $$

for \(1\leq p\leq2\) and

$$T_{2}=n^{-p}E \Biggl|\sum_{i=1}^{n} \eta_{i} \Biggr|^{p}\lesssim n^{-p} \Biggl[\sum _{i=1}^{n}E|\eta_{i}|^{p}+ \Biggl(\sum_{i=1}^{n}E|\eta_{i}|^{2} \Biggr)^{\frac{1}{2}p} \Biggr]\lesssim n^{-(p-1)}+n^{-\frac{1}{2}p}\lesssim n^{-\frac{1}{2}p} $$

for \(p\geq2\), which proves the desired conclusion (3.2). This finishes the proof of Lemma 3.1. □

The well-known Bernstein inequality [14] is needed in order to prove Lemma 3.2.

Bernstein’s inequality

Let \(X_{1}, X_{2}, \ldots, X_{n}\) be i.i.d. random variables with \(E(X_{i})=0\), \(\|X_{i}\|_{\infty}\leq M\). Then, for each \(\gamma>0\),

$$P \Biggl\{ \Biggl|\frac{1}{n}\sum_{i=1}^{n}X_{i} \Biggr|>\gamma \Biggr\} \leq 2\exp \biggl(-\frac{n\gamma^{2}}{2[E(X_{i}^{2})+M\gamma /3]} \biggr). $$

Lemma 3.2

Let \(\hat{\beta}_{j,k}\) be defined by (2.2). If \(2^{j}\leq n/\ln n\) and (A1)-(A4) hold, then for each \(\varepsilon>0\), there exists \(T>0\) such that

$$ P \biggl\{ |\hat{\beta}_{j,k}-\beta_{j,k}|> \frac{T}{2}2^{j}\sqrt{\frac{\ln n}{n}} \biggr\} \lesssim2^{-\varepsilon j}. $$
(3.3)

Proof

By the definition of \(\hat{\beta}_{j,k}\),

$$\hat{\beta}_{j,k}-\beta_{j,k}=\frac{\hat{\theta}}{n}\sum _{i=1}^{n}T(\psi_{j,k}) (Z_{i})-\beta_{j,k}=\frac{\hat{\theta}}{\theta }\frac{1}{n}\sum _{i=1}^{n} \bigl[\theta T( \psi_{j,k}) (Z_{i})-\beta _{j,k} \bigr]+\hat{ \theta} \beta_{j,k} \biggl(\frac{1}{\theta}-\frac{1}{\hat {\theta}} \biggr). $$

Then

$$ |\hat{\beta}_{j,k}-\beta_{j,k}|\leq \biggl|\frac{\hat{\theta}}{\theta} \biggr| \Biggl| \frac{1}{n}\sum_{i=1}^{n}\bigl[\theta T(\psi_{j,k}) (Z_{i})-\beta _{j,k}\bigr] \Biggr|+|\hat{\theta}| |\beta_{j,k}| \biggl|\frac{1}{\hat{\theta}}-\frac {1}{\theta} \biggr|. $$
(3.4)

The proof of Lemma 3.1 shows \(|\hat{\theta}|\leq C\) and \(c\leq|\theta |\leq C\). By (A1) and (A3), \(|\beta_{j,k}|= |\int_{0}^{1}f_{X}(x)\psi_{j,k}(x)\,dx |\lesssim\int_{0}^{1} |\psi _{j,k}(x) |\,dx\lesssim1\). Then (3.4) reduces to

$$|\hat{\beta}_{j,k}-\beta_{j,k} |\leq C \Biggl|\frac{1}{n} \sum_{i=1}^{n} \bigl[\theta T( \psi_{j,k}) (Z_{i})-\beta_{j,k} \bigr] \Biggr|+C \biggl| \frac{1}{\hat{\theta}}-\frac{1}{\theta} \biggr|. $$

Furthermore,

$$ P \biggl\{ |\hat{\beta}_{j,k}-\beta_{j,k}|> \frac{T}{2}2^{j}\sqrt{\frac {\ln n}{n}} \biggr\} \leq I_{1}+I_{2}, $$
(3.5)

where

$$I_{1}:=P \Biggl\{ \Biggl|\frac{1}{n}\sum_{i=1}^{n} \bigl[\theta T(\psi _{j,k}) (Z_{i})-\beta_{j,k} \bigr] \Biggr|>\frac{T}{4C}2^{j}\sqrt{\frac{\ln n}{n}} \Biggr\} $$

and

$$I_{2}:=P \biggl\{ \biggl|\frac{1}{\hat{\theta}}-\frac{1}{\theta} \biggr|> \frac {T}{4C}2^{j}\sqrt{\frac{\ln n}{n}} \biggr\} . $$

By (3.5), one needs only to prove

$$ |I_{i}|\lesssim2^{-\varepsilon j} \quad (i=1, 2) $$
(3.6)

for the desired conclusion (3.3).

To estimate \(I_{1}\), one defines \(U_{i}=\theta T(\psi_{j, k})(Z_{i})-\beta_{j, k}\). Then

$$\bigl|T(\psi_{j, k}) (x) \bigr|=\omega^{-2}(x) \bigl|\psi_{j, k}(x) \omega (x)+x(\psi_{j,k})'(x)\omega(x)-x \psi_{j, k}(x)\omega' (x)\bigr| $$

by the definition of the operator T. Using (A4), \(|T(\psi_{j, k})(x)|\lesssim|\psi_{j, k}(x)|+|(\psi_{j, k})'(x)|\lesssim2^{\frac {3}{2}j}\) for \(x\in[0,1]\) and

$$|U_{i}|= \bigl|\theta T(\psi_{j, k}) (Z_{i})- \beta_{j, k} \bigr|\lesssim |\theta| \bigl|T(\psi_{j, k}) (Z_{i}) \bigr|+|\beta_{j, k}|\lesssim2^{\frac{3}{2}j} $$

due to \(|\theta|\lesssim1\) and \(|\beta_{j, k}|\lesssim1\). Moreover, using (3.1) with ψ instead of ϕ and \(p=2\), one obtains \(E |U_{i}|^{2}\lesssim2^{2j}\). Because \(U_{1}, U_{2}, \ldots, U_{n}\) are i.i.d. and \(E(U_{i})=0\) (\(i=1, 2, \ldots, n\)) thanks to Lemma 4.2 [1], Bernstein’s inequality tells us that

$$I_{1}:=P \Biggl\{ \Biggl|\frac{1}{n}\sum_{i=1}^{n}U_{i} \Biggr|>\frac {T}{4C}2^{j}\sqrt{\frac{\ln n}{n}} \Biggr\} \leq2\exp \biggl(-\frac {n\gamma^{2}}{2[E(U_{i}^{2})+\frac{\gamma}{3}\|U\|_{\infty}]} \biggr) $$

with \(\gamma=\frac{T}{4C}2^{j}\sqrt{\frac{\ln n}{n}}\). It is easy to see that \(\frac{n\gamma^{2}}{2[E(U_{i}^{2})+\frac{\gamma}{3}\|U\|_{\infty}]}\geq \frac{n\frac{T^{2}}{16C^{2}}2^{2j}\frac{\ln n}{n}}{2(2^{2j}+\frac {T}{12C}2^{\frac{5}{2}j}\sqrt{\frac{\ln n}{n}})}\geq\frac{T^{2}\ln n}{32C^{2}(1+\frac{T}{12C})}\) because of \(2^{\frac{j}{2}}\sqrt{\frac {\ln n}{n}}\leq1\) by the assumption \(2^{j}\leq\frac{n}{\ln n}\). Note that \(\ln n> j\ln2\) due to \(n\geq2^{j}\ln n>2^{j}\). Hence, \(\frac {n\gamma^{2}}{2[E(U_{i}^{2})+\frac{\gamma}{3}\|U\|_{\infty}]}\geq\frac {T^{2}\ln2}{32C^{2}(1+\frac{T}{12C})}j\). One chooses \(T>0\) such that \(\frac{T^{2}\ln2}{32C^{3}(1+\frac{T}{12C})}>\varepsilon\). Then \(\frac {T^{2}\ln2}{32C^{2}(1+\frac{T}{12C})}>\varepsilon\) due to \(C\geq1\) and \(I_{1}\lesssim \exp(-\frac{T^{2}\ln2}{32C^{2}(1+\frac {T}{12C})}j)\lesssim 2^{-\varepsilon j}\), which shows (3.6) for \(i=1\).

Next, one estimates \(I_{2}\). Define \(W_{i}:=\frac{\omega (Z_{i})-Z_{i}\omega'(Z_{i})}{\omega^{2}(Z_{i})}-\frac{1}{\theta}\). Then \(W_{1}, W_{2}, \ldots, W_{n}\) are i.i.d. and \(E(W_{i})=0\). On the other hand, (A4) implies \(|W_{i}|\leq C\) and \(E|W_{i}|^{2}\leq C\). Applying Bernstein’s inequality, one obtains

$$I_{2}=P \Biggl\{ \Biggl|\frac{1}{n}\sum_{i=1}^{n}W_{i} \Biggr|>\frac {T}{4C}2^{j}\sqrt{\frac{\ln n}{n}} \Biggr\} \leq2\exp \biggl(-\frac {n\gamma^{2}}{2[E(W_{i}^{2})+\frac{\gamma}{3}\|W\|_{\infty}]} \biggr) $$

with \(\gamma=\frac{T}{4C}2^{j}\sqrt{\frac{\ln n}{n}}\). Note that \(\frac{n\gamma^{2}}{2[E(W_{i}^{2})+\frac{\gamma}{3}\|W\|_{\infty}]}\geq \frac{n\frac{T^{2}}{16C^{2}}2^{2j}\frac{\ln n}{n}}{2(C+\frac {TC}{12C}2^{j}\sqrt{\frac{\ln n}{n}})}\geq\frac{\frac {T^{2}}{16C^{2}}2^{j}\ln n}{2C(1+\frac{T}{12C})}\geq\frac{T^{2}\ln 2}{32C^{3}(1+\frac{T}{12C})}j\) because of \(\sqrt{\frac{\ln n}{n}}\leq1\) and \(\ln n >j\ln2\). The desired conclusion (3.6) (\(i=2\)) follows by taking \(T>0\) such that \(\frac {T^{2}\ln2}{32C^{3}(1+\frac{T}{12C})}>\varepsilon\). This completes the proof of Lemma 3.2. □

4 Proofs

This section is devoted to the proof of Theorems 2.1 and 2.2, based on the knowledge of Section 3. We begin with the proof of Theorem 2.1.

Proof of Theorem 2.1

Clearly, \(\hat{f}^{\mathrm{lin}}_{n}-f_{X}=(\hat{f}^{\mathrm{lin}}_{n}-P_{j_{0}}f_{X})+(P_{j_{0}}f_{X}-f_{X})\) and

$$ E\bigl\| \hat{f}^{\mathrm{lin}}_{n}-f_{X}\bigr\| _{p}^{p} \lesssim\|P_{j_{0}}f_{X}-f_{X}\| _{p}^{p}+E \bigl\| \hat{f}^{\mathrm{lin}}_{n}-P_{j_{0}}f_{X} \bigr\| _{p}^{p}. $$
(4.1)

It follows from the proof of Theorem 4.1 [15] that

$$ \|P_{j_{0}}f_{X}-f_{X}\|_{p}^{p} \lesssim2^{-j_{0}ps'}\lesssim n^{-\frac {ps'}{2s'+3}} $$
(4.2)

due to \(2^{j_{0}}\sim n^{\frac{1}{2s'+3}}\). By (4.1) and (4.2), it is sufficient to show

$$ E\bigl\| \hat{f}^{\mathrm{lin}}_{n}-P_{j_{0}}f_{X} \bigr\| _{p}^{p}\lesssim n^{-\frac {ps'}{2s'+3}} $$
(4.3)

for the conclusion of Theorem 2.1. By the definition of \(\hat{f}^{\mathrm{lin}}_{n}\), \(\hat{f}^{\mathrm{lin}}_{n}-P_{j_{0}}f_{X}=\sum_{k\in\wedge}(\hat{\alpha }_{j_{0}, k}-\alpha_{j_{0}, k})\phi_{j_{0}, k}\). Then \(\|\hat {f}^{\mathrm{lin}}_{n}-P_{j_{0}}f_{X}\|_{p}^{p}\lesssim2^{j_{0}(\frac {1}{2}p-1)}\sum_{k\in\wedge}|\hat{\alpha}_{j_{0},k}-\alpha _{j_{0},k}|^{p}\) thanks to Theorem 1.2. This with Lemma 3.1 and the choice of \(j_{0}\) leads to

$$E\bigl\| \hat{f}^{\mathrm{lin}}_{n}-P_{j_{0}}f_{X} \bigr\| _{p}^{p}\lesssim2^{\frac {p}{2}j_{0}}E|\hat{\alpha}_{j_{0},k}- \alpha_{j_{0},k}|^{p}\lesssim 2^{\frac{3}{2}p j_{0}}n^{-\frac{1}{2}p} \lesssim n^{-\frac{ps'}{2s'+3}}, $$

which is the desired conclusion (4.3). This finishes the proof of Theorem 2.1. □

Next, we prove Theorem 2.2.

Proof of Theorem 2.2

It is sufficient to prove the case \(r\leq p\). In fact, when \(r>p\), \(\hat{f}^{\mathrm{non}}_{n}-f_{X}\) has compact support, because ϕ, ψ, and \(f_{X}\) are compactly supported. Then

$$E\bigl\| \hat{f}^{\mathrm{non}}_{n}(x)-f_{X}(x) \bigr\| _{p}^{p}\lesssim\bigl(E\bigl\| \hat {f}^{\mathrm{non}}_{n}(x)-f_{X}(x) \bigr\| _{r}^{r}\bigr)^{\frac{p}{r}} $$

using the Hölder inequality. For \(f_{X}\in B_{r,q}^{s}(H)\), using Theorem 2.2 for the case \(r=p\), one has

$$\sup_{f_{X}\in B_{r,q}^{s}(H)} E\bigl\| \hat{f}^{\mathrm{non}}_{n}(x)-f_{X}(x) \bigr\| _{r}^{r}\lesssim(\ln n)^{r}\biggl( \frac{\ln n}{n}\biggr)^{\alpha r} $$

and

$$\sup_{f_{X}\in B_{r,q}^{s}(H)} E\bigl\| \hat{f}^{\mathrm{non}}_{n}(x)-f_{X}(x) \bigr\| _{p}^{p}\lesssim(\ln n)^{p}\biggl( \frac{\ln n}{n}\biggr)^{\alpha p}. $$

Now, one estimates the case \(r\leq p\). By the definition of \(\hat{f}_{n}^{\mathrm{non}}\),

$$\hat{f}_{n}^{\mathrm{non}}-f=\bigl(\hat{f}_{n}^{\mathrm{lin}}-P_{j_{0}}f \bigr)+(P_{j_{1}+1}f-f)+\sum_{j=j_{0}}^{j_{1}} \sum_{k\in\wedge_{j}}(\hat{\beta }_{j,k}1_{\{|\hat{\beta}_{j,k}|>\lambda\}}- \beta_{j,k})\psi_{j,k}. $$

Then

$$\begin{aligned} E\bigl\| \hat{f}_{n}^{\mathrm{non}}-f\bigr\| _{p}^{p}\lesssim{}& E\bigl\| \hat {f}_{n}^{\mathrm{lin}}-P_{j_{0}}f\bigr\| _{p}^{p}+ \|P_{j_{1}+1}f-f\|_{p}^{p} \\ &{}+E\Biggl\| \sum _{j=j_{0}}^{j_{1}}\sum_{k\in\wedge_{j}}( \hat{\beta }_{j,k}1_{\{|\hat{\beta}_{j,k}|>\lambda\}}-\beta_{j,k}) \psi_{j,k}\Biggr\| _{p}^{p}. \end{aligned}$$
(4.4)

From the proof of Theorem 2.1, one knows

$$\|P_{j_{1}+1}f-f\|_{p}^{p}\lesssim2^{-j_{1}s'p}, \qquad E\bigl\| \hat {f}_{n}^{\mathrm{lin}}-P_{j_{0}}f \bigr\| _{p}^{p}\lesssim2^{\frac {3}{2}pj_{0}}n^{-\frac{p}{2}}. $$

Note that \(2^{j_{0}}\sim n^{\frac{1}{2m+3}}\), \(2^{j_{1}}\sim\frac {n}{\ln n}\), and \(\alpha=\min\{\frac{s}{2s+3},\frac{s-\frac{1}{r}+\frac {1}{p}}{2(s-\frac{1}{r})+3}\}\leq s-\frac{1}{r}+\frac{1}{p}=s'\) thanks to \(s>\frac{1}{r}\). Then

$$ \|P_{j_{1}+1}f-f\|_{p}^{p}\lesssim\biggl( \frac{\ln n}{n}\biggr)^{\alpha p}, \qquad E\bigl\| \hat{f}_{n}^{\mathrm{lin}}-P_{j_{0}}f \bigr\| _{p}^{p}\lesssim\biggl(\frac{\ln n}{n} \biggr)^{\alpha p} . $$
(4.5)

To estimate \(E\|\sum_{j=j_{0}}^{j_{1}}\sum_{k\in \wedge_{j}} (\hat{\beta}_{j,k}1_{\{|\hat{\beta}_{j,k}|>\lambda\} }-\beta_{j,k} )\psi_{j,k}\|_{p}^{p}\), one defines

$$\sum_{j=j_{0}}^{j_{1}}\sum _{k\in\wedge_{j}} (\hat {\beta}_{j,k}1_{\{|\hat{\beta}_{j,k}|>\lambda\}}- \beta_{j,k} )\psi _{j,k}(x):=T_{1}+T_{2}+T_{3}+T_{4}, $$

where

$$\begin{aligned}& T_{1}:=\sum_{j=j_{0}}^{j_{1}}\sum _{k\in\wedge_{j}} (\hat{\beta}_{j,k}- \beta_{j,k} )\psi_{j,k}(x)1_{\{|\hat{\beta}_{j, k}|>\lambda, |\beta_{j, k}|< \lambda/2\}}, \\& T_{2}:=\sum_{j=j_{0}}^{j_{1}}\sum _{k\in\wedge_{j}} (\hat{\beta}_{j,k}- \beta_{j,k} )\psi_{j,k}(x)1_{\{|\hat{\beta}_{j, k}|>\lambda, |\beta_{j, k}|\geq\lambda/2\}}, \\& T_{3}:=\sum_{j=j_{0}}^{j_{1}}\sum _{k\in\wedge_{j}}\beta _{j,k}\psi_{j,k}(x)1_{\{|\hat{\beta}_{j, k}|\leq\lambda, |\beta_{j, k}|> 2\lambda\}}, \\& T_{4}:=\sum_{j=j_{0}}^{j_{1}}\sum _{k\in\wedge_{j}}\beta _{j,k}\psi_{j,k}(x)1_{\{|\hat{\beta}_{j, k}|\leq\lambda, |\beta_{j, k}|\leq2\lambda\}}. \end{aligned}$$

Then \(E\|\sum_{j=j_{0}}^{j_{1}}\sum_{k\in\wedge_{j}} (\hat{\beta}_{j,k}1_{\{|\hat{\beta}_{j,k}|>\lambda\}}-\beta_{j,k} )\psi_{j,k}\|_{p}^{p}\lesssim\sum_{i=1}^{4}E\|T_{i}\|_{p}^{p}\). By (4.4) and (4.5), it is sufficient to show

$$ E\|T_{i}\|_{p}^{p}\lesssim(\ln n)^{p} \biggl(\frac{\ln n}{n}\biggr)^{\alpha p} \quad(i=1, 2, 3, 4) $$
(4.6)

for the conclusion of Theorem 2.2.

To prove (4.6) for \(i=1\), applying Theorem 1.2, one has

$$E\|T_{1}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}}2^{j(\frac{p}{2}-1)}\sum _{k\in\wedge_{j}}E \bigl[ |\hat{\beta}_{j, k}- \beta_{j, k} |^{p}1_{\{|\hat{\beta}_{j, k}-\beta_{j, k}|\geq\lambda/2\}} \bigr] $$

due to the fact that \(\{|\hat{\beta}_{j, k}|> \lambda,|\beta_{j, k}|<\lambda/2 \}\subseteq \{ |\hat{\beta}_{j, k}-\beta_{j, k} |\geq\lambda/2 \}\). By the Hölder inequality,

$$\begin{aligned} E\|T_{1}\|_{p}^{p}&\lesssim(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}}2^{j(\frac{p}{2}-1)}\sum _{k\in\wedge_{j}} \bigl( E |\hat{\beta}_{j, k}- \beta_{j, k} |^{2p} \bigr)^{\frac{1}{2}} \bigl[E ( 1_{\{|\hat{\beta}_{j, k}-\beta_{j, k}|\geq\lambda/2\}} ) \bigr]^{\frac{1}{2}} \\ &\lesssim(j_{1}-j_{0}+1)^{p-1}\sum _{j=j_{0}}^{j_{1}}2^{j(\frac{p}{2}-1)}\sum _{k\in\wedge_{j}}\bigl( E|\hat{\beta}_{j, k}- \beta_{j, k}|^{2p}\bigr)^{\frac{1}{2}}\bigl[P\bigl(|\hat{\beta }_{j, k}-\beta_{j, k}|\geq\lambda/2\bigr)\bigr]^{\frac{1}{2}}. \end{aligned}$$

This with Lemma 3.1 and Lemma 3.2 leads to

$$E\|T_{1}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1}n^{-\frac{p}{2}} \sum_{j=j_{0}}^{j_{1}}2^{\frac{3}{2}pj}2^{-\frac{1}{2}\varepsilon j} \lesssim(\ln n)^{p-1}n^{-\frac{1}{2}p}\sum_{j=j_{0}}^{j_{1}}2^{(\frac{3}{2}p-\frac{1}{2}\varepsilon)j} $$

thanks to \(j_{1}-j_{0}\sim\ln n\) by the choice of \(j_{0}\) and \(j_{1}\). Take ε such that \(\varepsilon>3 p\). Then \(E\|T_{1}\| _{p}^{p}\lesssim(\ln n)^{p-1}n^{-\frac{p}{2}}2^{\frac{3}{2}pj_{0}} \lesssim(\ln n)^{p-1}n^{-\frac{ps}{2s+3}}\lesssim(\ln n)^{p}n^{-\alpha p}\) due to the choice of \(j_{0}\) and \(\alpha\leq\frac {s}{2s+3}\). That is, (4.6) holds for \(i=1\).

To show (4.6) for \(i=3\), one uses the fact that \(\{|\hat{\beta}_{j, k}|\leq\lambda, |\beta_{j, k}|>2\lambda \} \subseteq \{|\hat{\beta}_{j, k}-\beta_{j, k}|\geq\lambda/2 \}\). Hence,

$$E\|T_{3}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}}2^{j(\frac{p}{2}-1)}\sum _{k\in\wedge_{j}}E \bigl[|\beta_{j, k}|^{p}1_{\{|\hat{\beta}_{j, k}-\beta_{j, k}|> \lambda /2\}} \bigr] $$

thanks to Theorem 1.2. When \(|\hat{\beta}_{j, k}|\leq\lambda< |\beta _{j, k}|/2\), \(|\hat{\beta}_{j, k}-\beta_{j, k}|\geq|\beta_{j, k}|-|\hat {\beta}_{j, k}|>|\beta_{j, k}|/2>\lambda\). Then

$$E\|T_{3}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}}2^{j(\frac{p}{2}-1)}\sum _{k\in\wedge _{j}}E \bigl[ |\hat{\beta}_{j, k}- \beta_{j, k} |^{p}1_{\{|\hat {\beta}_{j, k}-\beta_{j, k}|> \lambda/2\}} \bigr] $$

due to Theorem 1.2. The same arguments as above show \(E\|T_{3}\| _{p}^{p}\lesssim(\ln n)^{p}n^{-\alpha p}\), which is the desired conclusion (4.6) with \(i=3\).

In order to estimate \(E\|T_{2}\|_{p}^{p}\) and \(E\|T_{4}\| _{p}^{p}\), one defines

$$2^{j_{0}^{\ast}}\sim\biggl(\frac{n}{\ln n}\biggr)^{\frac{1-2\alpha }{3}},\qquad 2^{j_{1}^{\ast}}\sim\biggl(\frac{n}{\ln n}\biggr)^{\frac{\alpha}{s-\frac {1}{r}+\frac{1}{p}}}. $$

Recall that \(2^{j_{0}}\sim n^{\frac{1}{2m+3}}\), \(2^{j_{1}}\sim\frac {n}{\ln n}\), and \(\alpha:=\min\{ \frac{s}{2s+3}, \frac{s-\frac{1}{r}+\frac {1}{p}}{2(s-\frac{1}{r})+3}\}\). Then \(\frac{1-2\alpha}{3}\geq\frac{1}{2s+3}>\frac{1}{2m+3}\) and \(\frac {\alpha}{s-\frac{1}{r}+\frac{1}{p}}\leq\frac{1}{2(s-\frac {1}{r})+3}\leq1\). Hence, \(2^{j_{0}}\leq2^{j_{0}^{\ast}}\) and \(2^{j_{1}^{\ast}}\leq 2^{j_{1}}\). Moreover, a simple computation shows \(\frac{1-2\alpha }{3}\leq\frac{\alpha}{s-\frac{1}{r}+\frac{1}{p}}\), which implies \(2^{j_{0}^{\ast}}\leq2^{j_{1}^{\ast}}\).
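For completeness, the last claim can be checked as follows. Write \(s'=s-\frac{1}{r}+\frac{1}{p}\); then \(\frac{1-2\alpha}{3}\leq\frac{\alpha}{s'}\) is equivalent to \(s'\leq\alpha(2s'+3)\). If \(\alpha=\frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+3}\), this holds because \(2(s-\frac{1}{r})+3\leq2s'+3\); if \(\alpha=\frac{s}{2s+3}\), it is equivalent to \(s'(2s+3)\leq s(2s'+3)\), i.e., \(s'\leq s\), which is true in the present case \(r\leq p\).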

One estimates \(E\|T_{2}\|_{p}^{p}\) by dividing \(T_{2}\) into

$$ T_{2}=\sum_{j=j_{0}}^{j_{1}}\sum _{k\in\wedge_{j}}(\hat {\beta}_{j, k}- \beta_{j, k})\psi_{j, k}(x)1_{\{|\hat{\beta}_{j, k}|>\lambda, |\beta_{j, k}|\geq\lambda/2\}}=\sum _{j=j_{0}}^{j_{0}^{\ast}}+\sum_{j=j_{0}^{\ast }+1}^{j_{1}}=:t_{1}+t_{2}. $$
(4.7)

Then \(E\|t_{1}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1}\sum_{j=j_{0}}^{j_{0}^{\ast}}2^{j(\frac{1}{2}p-1)}\sum_{k\in\wedge _{j}}E |\hat{\beta}_{j, k}-\beta_{j, k}|^{p}\) due to Theorem 1.2. This with Lemma 3.1 and the definition of \(j_{0}^{\ast}\) leads to

$$ E\|t_{1}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{0}^{\ast}}2^{\frac{3}{2}p j}n^{-\frac{1}{2}p} \lesssim (\ln n)^{p-1}n^{-\frac{1}{2}p}2^{\frac{3}{2}p j_{0}^{\ast}}\lesssim (\ln n)^{\frac{1}{2}p-1}\biggl(\frac{\ln n}{n}\biggr)^{\alpha p}. $$
(4.8)

To estimate \(E\|t_{2}\|_{p}^{p}\), one observes that \(1_{\{|\hat{\beta}_{j, k}|>\lambda, |\beta_{j, k}|\geq\lambda/2\}}\leq 1_{\{|\beta_{j, k}|\geq\lambda/2\}}\leq(\frac{|\beta_{j, k}|}{\lambda /2})^{r}\). Then it follows from Theorem 1.2 that

$$ E\|t_{2}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}^{\ast}+1}^{j_{1}}2^{j(\frac{1}{2}p-1)}\sum _{k\in\wedge _{j}}E |\hat{\beta}_{j, k}-\beta_{j, k}|^{p} \biggl(\frac{|\beta_{j, k}|}{\lambda/2}\biggr)^{r}. $$
(4.9)

By Lemma 3.1, \(E |\hat{\beta}_{j, k}-\beta_{j, k}|^{p}\leq n^{-\frac {p}{2}}2^{pj}\). On the other hand, \(\|\beta_{j, \cdot}\|_{r}\leq 2^{-j(s+\frac{1}{2}-\frac{1}{r})}\) for \(f\in B^{s}_{r,q}(H)\) due to Theorem 1.1. Then (4.9) reduces to

$$E\|t_{2}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1}n^{-\frac{p}{2}} \sum_{j=j_{0}^{\ast}+1}^{j_{1}} 2^{-j(sr+\frac{1}{2}r-\frac{3}{2}p)} \lambda^{-r}. $$

Substituting \(\lambda=T2^{j}\sqrt{\frac{\ln n}{n}}\) into the above inequality, one gets

$$E\|t_{2}\|_{p}^{p}\lesssim(\ln n)^{p-\frac{1}{2}r-1}n^{\frac {1}{2}(r-p)}\sum_{j=j_{0}^{\ast}+1}^{j_{1}} 2^{-j(sr+\frac{3}{2}r-\frac{3}{2}p)} $$

due to \(j_{1}-j_{0}\sim\ln n\). Denote \(\theta:=sr+\frac{3}{2}r-\frac{3}{2}p\) (with a slight abuse of notation, this θ is unrelated to the normalization constant in (1.1)). When \(\theta>0\), one has \(r>\frac{3p}{2s+3}\) and hence \(\alpha=\frac{s}{2s+3}\), and

$$ E\|t_{2}\|_{p}^{p}\lesssim(\ln n)^{p-\frac{1}{2}r-1}n^{\frac {1}{2}(r-p)}2^{-j_{0}^{\ast}(sr+\frac{3}{2}r-\frac{3}{2}p)}\lesssim (\ln n)^{\frac{1}{2}p-1} \biggl(\frac{\ln n}{n}\biggr)^{\alpha p} $$
(4.10)

thanks to the definition of \(2^{j_{0}^{\ast}}\).

Moreover, a bound of the same order holds for \(\theta\leq0\). In fact, the same analysis as in (4.9) produces

$$E\|t_{2}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}^{\ast}+1}^{j_{1}}2^{j(\frac{1}{2}p-1)}\sum _{k\in\wedge _{j}}E |\hat{\beta}_{j, k}-\beta_{j, k}|^{p} \biggl(\frac{|\beta_{j, k}|}{\lambda/2}\biggr)^{r_{1}}, $$

where \(r_{1}:=(1-2\alpha)p\). When \(\theta\leq0\), \(\alpha=\frac{s-\frac {1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+3}\leq\frac{s}{2s+3}\), and \(r\leq \frac{3p}{2s+3}\leq(1-2\alpha)p=r_{1}\). Then \(\|\beta_{j, \cdot}\| _{r_{1}}\leq\|\beta_{j, \cdot}\|_{r}\leq2^{-j(s-\frac{1}{r}+\frac {1}{2})}\) for \(f\in B^{s}_{r, q}(H)\) thanks to Theorem 1.1. Using Lemma 3.1 and the definition of λ, one has

$$E\|t_{2}\|_{p}^{p}\lesssim(\ln n)^{p-1-\frac{1}{2}r_{1}}n^{\frac {1}{2}(r_{1}-p)}\sum_{j=j_{0}^{\ast}+1}^{j_{1}}2^{j[\frac {3}{2}p-1-(s-\frac{1}{r}+\frac{3}{2})r_{1}]}. $$

Note that \(\frac{3}{2}p-1-(s-\frac{1}{r}+\frac{3}{2})r_{1}=0\) because of \(r_{1}=(1-2\alpha)p\) and \(\alpha=\frac{s-\frac{1}{r}+\frac {1}{p}}{2(s-\frac{1}{r})+3}\). Hence, \(E\|t_{2}\|_{p}^{p}\lesssim(\ln n)^{p-\frac{1}{2}r_{1}}n^{\frac {1}{2}(r_{1}-p)}\lesssim(\ln n)^{\frac{1}{2}p}(\frac{\ln n}{n})^{\alpha p}\), which is of the same order as (4.10) up to an extra logarithmic factor and is still sufficient for (4.6). Combining this with (4.7), (4.8), and (4.10), one obtains the desired conclusion (4.6) with \(i=2\).

To end the proof, it is sufficient to estimate \(E\|T_{4}\| _{p}^{p}\). When \(\theta> 0\), one splits \(T_{4}\) into

$$ T_{4}=\sum_{j=j_{0}}^{j_{1}}\sum _{k\in\wedge_{j}}\beta _{j, k}\psi_{j, k}(x)1_{\{|\hat{\beta}_{j, k}|\leq\lambda, |\beta_{j, k}|\leq2 \lambda\}}= \sum_{j=j_{0}}^{j_{0}^{\ast}}+\sum _{j=j_{0}^{\ast}+1}^{j_{1}}=:e_{1}+e_{2}. $$
(4.11)

Since \(|\beta_{j, k}|1_{\{|\hat{\beta}_{j, k}|\leq\lambda, |\beta_{j, k}|\leq2 \lambda\}}\leq2\lambda\), \(E\|e_{1}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1}\sum_{j=j_{0}}^{j_{0}^{\ast}}2^{j(\frac{1}{2}p-1)}2^{j}\lambda^{p}\) due to Theorem 1.2. Note that \(\lambda=T 2^{j}\sqrt{\frac{\ln n}{n}}\), \(2^{j_{0}^{\ast}}\sim(\frac{n}{\ln n})^{\frac{1-2\alpha}{3}}\), and \(\alpha=\frac{s}{2s+3}\) when \(\theta>0\). Then

$$ E\|e_{1}\|_{p}^{p}\lesssim (\ln n)^{\frac{3}{2}p-1}n^{-\frac{p}{2}}\sum_{j=j_{0}}^{j_{0}^{\ast}}2^{\frac{3}{2}pj} \lesssim (\ln n)^{\frac{3}{2}p-1}n^{-\frac{p}{2}}2^{\frac{3}{2}pj_{0}^{\ast }}\lesssim(\ln n)^{p-1}\biggl(\frac{\ln n}{n}\biggr)^{\alpha p}. $$
(4.12)

To estimate \(E\|e_{2}\|_{p}^{p}\) with \(e_{2}=\sum_{j=j_{0}^{\ast}+1}^{j_{1}}\sum_{k\in\wedge_{j}}\beta_{j, k}\psi _{j, k}(x)1_{\{|\hat{\beta}_{j, k}|\leq\lambda, |\beta_{j, k}|\leq2 \lambda\}}\), one uses the fact that \(1_{\{|\hat{\beta}_{j, k}|\leq \lambda, |\beta_{j, k}|\leq2 \lambda\}}\leq(\frac{2\lambda}{|\beta _{j, k}|})^{p-r}\) because of \(r\leq p\). Then

$$\begin{aligned} E\|e_{2}\|_{p}^{p}& \lesssim(j_{1}-j_{0}+1)^{p-1}\sum _{j=j_{0}^{\ast}+1}^{j_{1}}2^{j(\frac{1}{2}p-1)}\sum _{k\in\wedge _{j}}|\beta_{j, k}|^{p}\biggl( \frac{2\lambda}{|\beta_{j, k}|}\biggr)^{p-r} \\ &\lesssim(j_{1}-j_{0}+1)^{p-1}\sum _{j=j_{0}^{\ast }+1}^{j_{1}}2^{j(\frac{1}{2}p-1)}|\lambda|^{p-r} \sum_{k\in\wedge _{j}}|\beta_{j, k}|^{r} \end{aligned}$$
(4.13)

due to Theorem 1.2. By \(f\in B^{s}_{r, q}(H)\) and Theorem 1.1, \(\|\beta_{j, \cdot}\| _{r}\leq2^{-j(s-\frac{1}{r}+\frac{1}{2})}\). Furthermore,

$$\begin{aligned} E\|e_{2}\|_{p}^{p}& \lesssim(j_{1}-j_{0}+1)^{p-1}\biggl( \frac{\ln n}{n}\biggr)^{\frac {1}{2}(p-r)}\sum _{j=j_{0}^{\ast }+1}^{j_{1}}2^{-j(sr+\frac{3}{2}r-\frac{3}{2}p)} \\ &\lesssim(\ln n)^{p-1}\biggl(\frac{\ln n}{n}\biggr)^{\frac {1}{2}(p-r)}2^{-j_{0}^{\ast}(sr+\frac{3}{2}r-\frac{3}{2}p)} \lesssim(\ln n)^{p-1}\biggl(\frac{\ln n}{n}\biggr)^{\alpha p}. \end{aligned}$$
(4.14)

In the last inequality, one used the assumption \(2^{j_{0}^{\ast}}\sim (\frac{n}{\ln n})^{\frac{1-2\alpha}{3}}\) and \(\alpha=\frac{s}{2s+3}\) for \(\theta> 0\). This with (4.11) and (4.12) leads to

$$ E\|T_{4}\|_{p}^{p}\lesssim(\ln n)^{p-1} \biggl(\frac{\ln n}{n}\biggr)^{\alpha p}. $$
(4.15)

Then one needs only to show that (4.15) holds for \(\theta\leq0\). Similarly, one divides \(T_{4}\) into

$$T_{4}=\sum_{j=j_{0}}^{j_{1}}\sum _{k\in \wedge_{j}}\beta_{j, k}\psi_{j, k}(x)1_{\{|\hat{\beta}_{j, k}|\leq\lambda, |\beta_{j, k}|\leq2 \lambda\}}= \sum_{j=j_{0}}^{j_{1}^{\ast}}+\sum _{j=j_{1}^{\ast}+1}^{j_{1}}=:e_{1}^{\ast}+e_{2}^{\ast}. $$

For the first sum, proceeding as (4.13) and (4.14), one has \(E\|e_{1}^{\ast}\|_{p}^{p}\lesssim(j_{1}-j_{0}+1)^{p-1} (\frac{\ln n}{n})^{\frac{p-r}{2}}\sum_{j=j_{0}}^{j_{1}^{\ast}}2^{-j(sr+\frac {3}{2}r-\frac{3}{2}p)} \lesssim(j_{1}-j_{0}+1)^{p-1}(\frac{\ln n}{n})^{\frac{p-r}{2}}2^{-j_{1}^{\ast}(sr+\frac{3}{2}r-\frac{3}{2}p)}\). Note that \(j_{1}-j_{0}\sim\ln n\) and \(2^{j_{1}^{\ast}}\sim(\frac {n}{\ln n})^{\frac{\alpha}{s-\frac{1}{r}+\frac{1}{p}}}\). Then \(E\| e_{1}^{\ast}\|_{p}^{p}\lesssim(\ln n)^{p-1}(\frac{\ln n}{n})^{\alpha p}\) due to \(\alpha=\frac{s-\frac{1}{r}+\frac{1}{p}}{2(s-\frac{1}{r})+3}\) for \(\theta\leq0\).

To estimate the second sum, using Theorem 1.2,

$$E\bigl\| e_{2}^{\ast}\bigr\| _{p}^{p} \lesssim(j_{1}-j_{0}+1)^{p-1}\sum _{j=j_{1}^{\ast}+1}^{j_{1}}2^{j(\frac{1}{2}p-1)}\sum _{k\in\wedge _{j}}|\beta_{j, k}|^{p}. $$

By \(f\in B^{s}_{r, q}(H)\) and Theorem 1.1, \(\|\beta_{j, \cdot}\| _{p}\leq\|\beta_{j, \cdot}\|_{r}\lesssim2^{-j(s-\frac{1}{r}+\frac {1}{2})}\). Hence, \(E\|e_{2}^{\ast}\|_{p}^{p}\lesssim (j_{1}-j_{0}+1)^{p-1}\sum_{j=j_{1}^{\ast}+1}^{j_{1}}2^{-j(s-\frac {1}{r}+\frac{1}{p})p}\lesssim(j_{1}-j_{0}+1)^{p-1}2^{-j_{1}^{\ast }(s-\frac{1}{r}+\frac{1}{p})p}\). By the choice of \(j_{1}^{\ast}\), \(E\|e_{2}^{\ast}\|_{p}^{p}\lesssim(\ln n)^{p-1}(\frac{\ln n}{n})^{\alpha p}\) because of \(\alpha=\frac{s-\frac{1}{r}+\frac {1}{p}}{2(s-\frac{1}{r})+3}\), when \(\theta\leq0\). Then the desired conclusion (4.15) follows. This completes the proof of Theorem 2.2. □