1 Introduction

Random variables \(X_{1}, X_{2}, \ldots , X_{n}\) are said to be negatively dependent (ND) if, for any \(x_{1}, x_{2},\ldots , x_{n} \in \mathbb{R}\),

$$ \mathbf{P}(X_{1}\leq x_{1}, X_{2}\leq x_{2},\dots ,X_{n}\leq x_{n}) \leq \prod _{i=1}^{n} \mathbf{P}(X_{i}\leq x_{i}), $$

and

$$ \mathbf{P}(X_{1}> x_{1}, X_{2}> x_{2}, \dots ,X_{n}> x_{n})\leq \prod_{i=1}^{n} \mathbf{P}(X_{i}>x_{i}). $$
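
As a quick numerical illustration (not part of the original development), the following Python sketch simulates the antithetic pair \((U, 1-U)\) with \(U\) uniform on \([0,1]\), a standard example of an ND pair, and checks the lower-orthant inequality empirically; all names in the code are illustrative.

```python
import numpy as np

# A standard example of an ND pair: the antithetic couple (U, 1 - U), U ~ Uniform(0, 1).
rng = np.random.default_rng(0)
n = 100_000
u = rng.uniform(size=n)
x_samples, y_samples = u, 1.0 - u

# Check the lower-orthant inequality P(X <= x, Y <= y) <= P(X <= x) P(Y <= y) on a grid.
grid = np.linspace(0.05, 0.95, 19)
max_violation = -np.inf
for x in grid:
    for y in grid:
        joint = np.mean((x_samples <= x) & (y_samples <= y))
        product = np.mean(x_samples <= x) * np.mean(y_samples <= y)
        max_violation = max(max_violation, joint - product)

print(f"largest value of P(X<=x, Y<=y) - P(X<=x)P(Y<=y): {max_violation:.4f}")
# Up to Monte Carlo error this stays <= 0, consistent with the ND definition.
```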

The definition was introduced by Bozorgnia [3]; further discussion and related concepts can be found in [2, 10]. ND random variables are widely used in reliability theory and its applications, and the notion has therefore received increasing attention; a series of useful results have been established (see [13,14,15,16]). Hence, we consider density estimation for ND random variables in this paper.

For density estimation, Donoho et al. [6] defined wavelet estimators and derived their convergence rates under \(L^{p}\) loss when \(X_{1}, X_{2}, \ldots , X_{n}\) are independent. They found that the convergence rate of the nonlinear estimator is better than that of the linear one. In many cases, however, the random variables \(X_{1}, X_{2}, \ldots , X_{n}\) are dependent. Doosti et al. [8] proposed a linear wavelet estimator and evaluated its \(L^{p}\) (\(1\leq p<\infty \)) risk for negatively associated (NA) random variables. Soon afterwards, those results were extended to negatively dependent sequences [7]. Chesneau [4] and Liu [12] also considered density estimation for an NA sample. Kou [11] defined linear and nonlinear wavelet estimators for mixing data and obtained their convergence rates.

Motivated by the above work, this paper estimates the unknown density function f from a sequence of ND data \(X_{1}, X_{2}, \ldots , X_{n}\). We define wavelet estimators and give upper bounds for their risks under \(L^{p}\) loss. It turns out that our results reduce to Donoho’s classical theorems in [6] when the random sample is independent.

We establish our results in Besov spaces, for densities supported on a compact subset of the real line \(\mathbb{R}\). As usual, the Sobolev spaces with integer exponents are defined as

$$ W_{r}^{n}(\mathbb{R}):=\bigl\{ f\in L^{r}( \mathbb{R}), f^{(n)}\in L^{r}( \mathbb{R})\bigr\} $$

with \(\|f\|_{W_{r}^{n}}:=\|f\|_{r}+\|f^{(n)}\|_{r}\). Then \(L^{r}( \mathbb{R})\) can be considered as \(W_{r}^{0}(\mathbb{R})\). For \(1\leq r,q\leq \infty \) and \(s=n+\alpha \) with \(\alpha \in (0,1]\), the Besov space on \(\mathbb{R}\) is defined as

$$ B^{s}_{r,q}(\mathbb{R}):=\bigl\{ f\in W_{r}^{n}( \mathbb{R}), \bigl\Vert t^{-\alpha } \omega _{r}^{2} \bigl(f^{(n)},t\bigr) \bigr\Vert _{q}^{*}< \infty \bigr\} $$

with the norm \(\|f\|_{srq}:=\|f\|_{W_{r}^{n}}+\|t^{-\alpha }\omega _{r}^{2}(f^{(n)},t)\|_{q}^{*}\), where \(\omega _{r}^{2}(f,t):=\sup_{|h|\leq t}\|f(\cdot +2h)-2f(\cdot +h)+f(\cdot )\|_{r}\) stands for the second-order smoothness modulus of f and

$$ \Vert h \Vert _{q}^{*}= \textstyle\begin{cases} (\int _{0}^{\infty } \vert h(t) \vert ^{q}\frac{dt}{t})^{\frac{1}{q}}, & \text{if } 1\leq q< \infty ; \\ \operatorname{ess}\sup _{t} \vert h(t) \vert , & \text{if } q=\infty . \end{cases} $$

We always assume \(f\in B^{s}_{r,q}(\mathbb{R}, L)=\{f\in B^{s}_{r,q}( \mathbb{R}), f \text{ is a probability density and } \|f\|_{srq}\leq L\}\) with \(L>0\). Let \(\phi \in C_{0}^{t}(\mathbb{R})\) be an orthonormal scaling function with \(t>\max \{s,1\}\). Then ϕ is a function of bounded variation (BV). The corresponding wavelet function is denoted by ψ. It is well known that \(\{\phi _{J,k}, \psi _{j,k}, j\geq J, k\in \mathbb{Z}\}\) constitutes an orthonormal basis of \(L^{2}(\mathbb{R})\), where \(\phi _{J,k}(x):=2^{\frac{J}{2}} \phi (2^{J}x-k)\) and \(\psi _{j,k}(x):=2^{\frac{j}{2}}\psi (2^{j}x-k)\), as in wavelet analysis [5]. Then for each \(f\in L^{2}(\mathbb{R})\), with \(\alpha _{J,k}=\int f(x)\phi _{J,k}(x)\,dx\) and \(\beta _{j,k}=\int f(x)\psi _{j,k}(x)\,dx\), we have

$$ f(x)=\sum_{k\in \mathbb{Z}}\alpha _{J,k}\phi _{J,k}(x)+\sum_{j\geq J}\sum _{k\in \mathbb{Z}}\beta _{j,k}\psi _{j,k}(x). $$

Here and in what follows, \(A\lesssim B\) denotes \(A\leq C B\) for some constant \(C>0\); \(A\gtrsim B\) means \(B\lesssim A\); \(A\sim B\) stands for both \(A\lesssim B\) and \(B\lesssim A\). The following theorems are needed in our discussion:

Theorem 1.1

(Härdle et al. [9])

Let \(f\in L^{r}(\mathbb{R})\) (\(1\leq r\leq \infty \)), \(\alpha _{J,k}=\int f(x) \phi _{J,k}(x)\,dx\) and \(\beta _{j,k}=\int f(x)\psi _{j,k}(x)\,dx\). The following assertions are equivalent:

  1. (i)

    \(f\in B^{s}_{r,q}(\mathbb{R})\), \(s>0\), \(1\leq q\leq \infty \);

  2. (ii)

    \(\{2^{js}\|P_{j}f-f\|_{r}\}_{j\geq 0}\in l^{q}\) with \(P_{j}f:=\sum_{k\in \mathbb{Z}}\alpha _{j,k}\phi _{j,k}\);

  3. (iii)

    \(\|\alpha _{J\cdot }\|_{r}+\|\{2^{j(s+\frac{1}{2}-\frac{1}{r})} \|\beta _{j\cdot }\|_{r}\}_{j\geq 0}\|_{q}<+\infty \).

Moreover,

$$ \Vert f \Vert _{srq}\sim \bigl\Vert \bigl(2^{js} \Vert P_{j}f-f \Vert _{r}\bigr)_{j\geq 0} \bigr\Vert _{q}\sim \Vert \alpha _{J\cdot } \Vert _{r}+ \bigl\Vert \bigl\{ 2^{j(s+\frac{1}{2}-\frac{1}{r})} \Vert \beta _{j\cdot } \Vert _{r}\bigr\} _{j\geq 0} \bigr\Vert _{q}. $$

Theorem 1.2

(Härdle et al. [9])

Let \(\theta _{\phi }(x):=\sum_{k}|\phi (x-k)|\) and \(\operatorname{ess} \sup_{x}\theta _{\phi }(x)<\infty \). Then for \(\lambda =\{ \lambda _{k}\}\in l^{r}(\mathbb{Z})\) and \(1\leq r\leq \infty \),

$$ \biggl\Vert \sum_{k\in \mathbb{Z}}\lambda _{k}\phi _{jk} \biggr\Vert _{r}\sim 2^{j( \frac{1}{2}-\frac{1}{r})} \Vert \lambda \Vert _{r}. $$
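
As a hedged numerical sanity check (not in the original), the sketch below verifies the norm equivalence of Theorem 1.2 for the Haar scaling function, where the disjoint supports of the translates at a fixed level make the equivalence an exact equality; the level, exponent and coefficients are illustrative.

```python
import numpy as np

# Check || sum_k lambda_k phi_{j,k} ||_r ~ 2^{j(1/2 - 1/r)} ||lambda||_r for the Haar
# scaling function phi = 1_[0,1), whose translates at a fixed level have disjoint supports.
j, r = 3, 3.0
rng = np.random.default_rng(1)
lam = rng.normal(size=2 ** j)                  # coefficients lambda_k, k = 0, ..., 2^j - 1
xs = np.linspace(0.0, 1.0, 400_000, endpoint=False)

f = np.zeros_like(xs)
for k, lk in enumerate(lam):
    indicator = (2.0 ** j * xs - k >= 0.0) & (2.0 ** j * xs - k < 1.0)
    f += lk * 2.0 ** (j / 2) * indicator       # lambda_k * phi_{j,k}(x)

lhs = np.mean(np.abs(f) ** r) ** (1.0 / r)     # Riemann sum for the L^r norm on [0, 1]
rhs = 2.0 ** (j * (0.5 - 1.0 / r)) * np.sum(np.abs(lam) ** r) ** (1.0 / r)
print(lhs, rhs)                                # equal up to floating-point/discretization error
```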

Negatively dependent random variables possess the following property which will be used in this paper.

Theorem 1.3

(Bozorgnia et al. [3])

Let \(X_{1},\ldots ,X_{n}\) be a sequence of ND random variables and let \(A_{1},\ldots ,A_{m}\) be pairwise disjoint nonempty subsets of \(\{1,\ldots ,n\}\) with \(\alpha _{i}=\sharp (A_{i})\), where \(\sharp (A)\) denotes the number of elements of the set A. If \(f_{i}: \mathbb{R} ^{\alpha _{i}}\rightarrow \mathbb{R}\) (\(i=1,\ldots ,m\)) are m coordinatewise nondecreasing (or all nonincreasing) functions, then \(f_{1}(X_{i}, i\in A_{1}), \ldots , f_{m}(X_{i}, i\in A_{m})\) are also ND. In particular, for any \(t_{i}\geq 0\) (or all \(t_{i}\leq 0\)), \(1\leq i\leq n\),

$$ \mathbf{E}\Biggl[\exp \Biggl(\sum_{i=1}^{n}t_{i}X_{i} \Biggr)\Biggr]\leq \prod_{i=1}^{n} \mathbf{E} \bigl[\exp (t_{i}X_{i})\bigr]. $$
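
The exponential-moment inequality above is exactly what is needed for the Bernstein-type bound in Section 3. Here is a brief empirical check using the same antithetic ND pair as before (again purely illustrative).

```python
import numpy as np

# Empirical check of E[exp(t1 X + t2 Y)] <= E[exp(t1 X)] E[exp(t2 Y)] for t1, t2 >= 0,
# using the ND pair (X, Y) = (U, 1 - U) with U ~ Uniform(0, 1).
rng = np.random.default_rng(2)
u = rng.uniform(size=200_000)
x, y = u, 1.0 - u

for t1, t2 in [(0.5, 0.5), (1.0, 2.0), (3.0, 0.1)]:
    joint = np.mean(np.exp(t1 * x + t2 * y))
    product = np.mean(np.exp(t1 * x)) * np.mean(np.exp(t2 * y))
    print(f"t = ({t1}, {t2}):  joint = {joint:.4f},  product bound = {product:.4f}")
    # The joint expectation stays below the product, up to Monte Carlo error.
```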

2 Linear estimators

In this section, we shall give a linear wavelet estimator for a density function \(f(x)\) in a Besov space.

The linear wavelet estimator of \(f(x)\) is defined as follows:

$$ \hat{f}^{\mathrm{lin}}_{n}(x)=\sum _{k\in K_{0}}\hat{\alpha }_{j_{0},k} \phi _{j_{0},k}(x), $$
(1)

where \(K_{0}=\{k\in \mathbb{Z}, \operatorname{supp} f\cap \operatorname{supp} \phi _{j_{0},k}\neq \emptyset \}\),

$$ \hat{\alpha }_{j_{0},k}=\frac{1}{n}\sum _{i=1}^{n}\phi _{j_{0},k}(X _{i}). $$
(2)
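
For concreteness, here is a minimal sketch of the estimator (1)–(2) using the Haar scaling function (which is compactly supported and of bounded variation); the level choice and all function names are illustrative, not part of the paper.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def linear_wavelet_estimator(data, j0):
    """Return x -> sum_k alpha_hat_{j0,k} phi_{j0,k}(x), as in (1)-(2)."""
    scale = 2.0 ** j0
    # Only k with supp(phi_{j0,k}) meeting the data range contribute (the set K_0).
    ks = np.arange(np.floor(scale * data.min()), np.ceil(scale * data.max()) + 1.0)
    # Empirical scaling coefficients alpha_hat_{j0,k} = (1/n) sum_i phi_{j0,k}(X_i).
    alpha_hat = np.array(
        [np.mean(np.sqrt(scale) * haar_phi(scale * data - k)) for k in ks]
    )

    def f_hat(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        phis = np.sqrt(scale) * haar_phi(scale * x[:, None] - ks[None, :])
        return phis @ alpha_hat

    return f_hat

# Usage on a simulated sample (i.i.d. here, purely for illustration).
rng = np.random.default_rng(3)
X = rng.beta(2, 5, size=2000)
j0 = 4                          # a crude choice; see Theorem 2.1 for the rate-optimal level
f_hat = linear_wavelet_estimator(X, j0)
print(f_hat([0.1, 0.3, 0.6]))
```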

The following inequalities play important roles in this paper.

Lemma 2.1

(Rosenthal’s inequality, see Asadian et al. [1])

Let \(X_{1},\ldots ,X_{n}\) be a sequence of ND random variables satisfying \(\mathbf{E}X_{i}=0\) and \(\mathbf{E}|X_{i}|^{p}< \infty \) for \(i=1,\dots ,n\). Then

$$\begin{aligned}& \mathbf{E}\Biggl( \Biggl\vert \sum_{i=1}^{n}X_{i} \Biggr\vert ^{p}\Biggr)\lesssim \sum_{i=1} ^{n}\mathbf{E} \vert X_{i} \vert ^{p}+\Biggl( \sum_{i=1}^{n}\mathbf{E}X_{i}^{2} \Biggr)^{ \frac{p}{2}}, \quad p\geq 2, \\& \mathbf{E}\Biggl( \Biggl\vert \sum_{i=1}^{n}X_{i} \Biggr\vert ^{p}\Biggr)\leq \Biggl(\sum_{i=1} ^{n}\mathbf{E}X_{i}^{2}\Biggr)^{\frac{p}{2}}, \quad 0< p\leq 2. \end{aligned}$$

Lemma 2.2

Let \(X_{1}, X_{2}, \ldots , X _{n}\) be ND random variables and let the density function f be bounded and compactly supported with support length less than \(H>0\). Then for \(\hat{\alpha }_{{j_{0}},k}\) defined by (2) we have

$$ \mathbf{E} \vert \hat{\alpha }_{{j_{0}},k}-\alpha _{{j_{0}},k} \vert ^{p}\lesssim n^{-\frac{p}{2}} $$

for \(1\leq p<\infty \) and \(2^{j_{0}}\leq n\).

Proof

By the definition of \(\hat{\alpha }_{{j_{0}},k}\), one has

$$ \mathbf{E} \vert \hat{\alpha }_{{j_{0}},k}-\alpha _{{j_{0}},k} \vert ^{p} =\frac{1}{n ^{p}}\mathbf{E} \Biggl\vert \sum_{i=1}^{n}\bigl[\phi _{{j_{0}},k}(X_{i})- \alpha _{{j_{0}},k}\bigr] \Biggr\vert ^{p}. $$
(3)

Let \(\xi _{i}:=\phi _{{j_{0}},k}(X_{i})-\alpha _{{j_{0}},k}\) (\(i=1,2, \ldots ,n\)). Clearly,

$$ \mathbf{E} \Biggl\vert \sum _{i=1}^{n}\bigl[\phi _{{j_{0}},k}(X_{i})- \alpha _{{j_{0}},k}\bigr] \Biggr\vert ^{p} =\mathbf{E} \Biggl\vert \sum_{i=1}^{n}\xi _{i} \Biggr\vert ^{p}. $$
(4)

Since ϕ is a function of bounded variation, one can write \(\phi =\tilde{\phi }-\bar{\phi }\), where ϕ̃ and ϕ̄ are bounded, nonnegative and nondecreasing functions. Define

$$ \tilde{\alpha }_{{j_{0}},k}:= \int \tilde{\phi }_{{j_{0}},k}(x)f(x)\,dx, \qquad \bar{\alpha }_{{j_{0}},k}:= \int \bar{\phi }_{{j_{0}},k}(x)f(x)\,dx, $$

and

$$ \tilde{\xi }_{i}:=\tilde{\phi }_{{j_{0}},k}(X_{i})- \tilde{\alpha } _{{j_{0}},k}, \qquad \bar{\xi }_{i}:=\bar{\phi }_{{j_{0}},k}(X_{i})-\bar{\alpha }_{{j_{0}},k}. $$

Then \(\alpha _{{j_{0}},k}=\tilde{\alpha }_{{j_{0}},k}-\bar{\alpha } _{{j_{0}},k}\), \(\xi _{i}=\tilde{\xi }_{i}-\bar{\xi }_{i}\) and

$$ \mathbf{E} \Biggl\vert \sum _{i=1}^{n}\xi _{i} \Biggr\vert ^{p} =\mathbf{E} \Biggl\vert \sum_{i=1}^{n}( \tilde{\xi }_{i}-\bar{\xi }_{i}) \Biggr\vert ^{p}. $$
(5)

It is easy to see that \(\mathbf{E}\tilde{\xi }_{i}=0\), and the random variables \(\tilde{\xi }_{1}, \ldots , \tilde{\xi }_{n}\) are ND due to the nondecreasing property of ϕ̃ and Theorem 1.3. To apply Rosenthal’s inequality, one first shows the inequality

$$ \mathbf{E} \vert \tilde{\xi }_{i} \vert ^{m}\lesssim 2^{\frac{(m-2)j_{0}}{2}} $$
(6)

for \(m\geq 2\). In fact,

$$ \mathbf{E} \vert \tilde{\xi }_{i} \vert ^{m} =\mathbf{E} \bigl\vert \tilde{\phi }_{j_{0},k}(X _{i})-\tilde{\alpha }_{{j_{0}},k} \bigr\vert ^{m} \lesssim \mathbf{E} \bigl\vert \tilde{\phi }_{j_{0},k}(X_{i}) \bigr\vert ^{m}+ \vert \tilde{\alpha }_{{j_{0}},k} \vert ^{m}. $$
(7)

Note that \(|\tilde{\phi }_{j_{0},k}(x)|\lesssim 2^{\frac{{j_{0}}}{2}}\). Then for \(m\geq 2\),

$$\begin{aligned} \mathbf{E} \bigl\vert \tilde{\phi }_{j_{0},k}(X_{i}) \bigr\vert ^{m} =&\mathbf{E}\bigl[ \bigl\vert \tilde{\phi }_{j_{0},k}(X_{i}) \bigr\vert ^{2} \bigl\vert \tilde{\phi }_{j_{0},k}(X_{i}) \bigr\vert ^{m-2}\bigr] \\ \lesssim &2^{\frac{(m-2){j_{0}}}{2}}\mathbf{E} \bigl\vert \tilde{\phi }_{j_{0},k} ^{2}(X_{i}) \bigr\vert . \end{aligned}$$
(8)

Note that \(f\in B^{s}_{r,q}(\mathbb{R},L)\subseteq B^{s-\frac{1}{r}} _{\infty ,q}(\mathbb{R},L)\). Then \(\|f\|_{\infty }\lesssim L\). Using \(\tilde{\phi }\in L^{2}(\mathbb{R})\), one knows that

$$ \mathbf{E} \bigl\vert \tilde{\phi }_{j_{0},k}^{2}(X_{i}) \bigr\vert \lesssim \int ( \tilde{\phi }_{j_{0},k})^{2}(x)f(x)\,dx = \int \bigl\vert \tilde{\phi }(x-k) \bigr\vert ^{2}f \bigl(2^{-j_{0}}x\bigr)\,dx \lesssim 1, $$

and \(|\tilde{\alpha }_{{j_{0}},k}|=|\int f(x)\tilde{\phi }_{j_{0},k}(x)\,dx| \lesssim 1\), because \(\operatorname{supp} f\) is contained in some interval I with length \(|I|\leq H\). This, together with (8) and (7), leads to (6).

By Rosenthal’s inequality with \(1\leq p\leq 2\),

$$ \mathbf{E} \Biggl\vert \sum_{i=1}^{n} \tilde{\xi }_{i} \Biggr\vert ^{p}\leq \Biggl[\sum _{i=1}^{n}\mathbf{E}(\tilde{\xi }_{i})^{2} \Biggr]^{\frac{p}{2}} \lesssim n^{\frac{p}{2}}. $$

Similarly, \(\mathbf{E}|\sum_{i=1}^{n}\bar{\xi }_{i}|^{p}\lesssim n^{\frac{p}{2}}\). Combining this with (5), one has

$$ \mathbf{E} \Biggl\vert \sum _{i=1}^{n}\xi _{i} \Biggr\vert ^{p}\lesssim \mathbf{E} \Biggl\vert \sum _{i=1}^{n}\tilde{\xi }_{i} \Biggr\vert ^{p} +\mathbf{E} \Biggl\vert \sum_{i=1}^{n} \bar{\xi }_{i} \Biggr\vert ^{p}\lesssim n^{\frac{p}{2}}. $$
(9)

Substituting (9) into (4), one obtains

$$ \mathbf{E} \Biggl\vert \sum_{i=1}^{n} \bigl[\phi _{{j_{0}},k}(X_{i})- \alpha _{{j_{0}},k}\bigr] \Biggr\vert ^{p} \lesssim n^{\frac{p}{2}}. $$

This, together with (3), shows that for \(1\leq p\leq 2\),

$$ \mathbf{E} \vert \hat{\alpha }_{{j_{0}},k}-\alpha _{{j_{0}},k} \vert ^{p} \lesssim \frac{1}{n ^{p}}\times n^{\frac{p}{2}}= n^{-\frac{p}{2}}. $$
(10)

When \(2\leq p<\infty \), Rosenthal’s inequality and (6) show that

$$ \mathbf{E} \Biggl\vert \sum_{i=1}^{n} \tilde{\xi }_{i} \Biggr\vert ^{p}\lesssim \sum _{i=1}^{n}\mathbf{E} \vert \tilde{\xi }_{i} \vert ^{p} +\Biggl[\sum _{i=1} ^{n}\mathbf{E}(\tilde{\xi }_{i})^{2} \Biggr]^{\frac{p}{2}}\lesssim n2^{\frac{(p-2)j _{0}}{2}}+n^{\frac{p}{2}}. $$

Similarly, \(\mathbf{E}|\sum_{i=1}^{n}\bar{\xi }_{i}|^{p}\lesssim n2^{\frac{(p-2)j_{0}}{2}}+n^{\frac{p}{2}}\). Hence \(\mathbf{E}|\sum_{i=1}^{n}\xi _{i}|^{p}\lesssim n2^{\frac{(p-2)j_{0}}{2}}+n^{ \frac{p}{2}}\). Furthermore, it follows from (4), (3) and \(2^{j_{0}}\leq n \) that

$$ \mathbf{E} \vert \hat{\alpha }_{{j_{0}},k}-\alpha _{{j_{0}},k} \vert ^{p} \lesssim \frac{1}{n ^{p}}\bigl[n2^{\frac{(p-2)j_{0}}{2}}+n^{\frac{p}{2}} \bigr]\lesssim n^{- \frac{p}{2}}. $$

Combining this with (10), one concludes the desired inequality of the lemma. □

Theorem 2.1

Let \(f(x)\in B^{s}_{r,q}( \mathbb{R},L)\) (\(s>\frac{1}{r}\), \(r, q \geq 1\)) and let \(\hat{f}^{\mathrm{lin}} _{n}\) be defined by (1). Under the conditions of Lemma 2.2, for each \(1\leq p<\infty \), one has

$$ \sup _{f\in B^{s}_{r,q}(\mathbb{R},L)}\mathbf{E} \bigl\Vert \hat{f}^{\mathrm{lin}} _{n}-f \bigr\Vert ^{p}_{p}\lesssim n^{-\frac{s^{\prime }p}{2s^{\prime }+1}}, $$

where \(s^{\prime }:=s-(\frac{1}{r}-\frac{1}{p})_{+}\) and \(x_{+}= \max (x, 0)\).

Proof

Since

$$ \mathbf{E} \bigl\Vert \hat{f}^{\mathrm{lin}}_{n}-f \bigr\Vert _{p}^{p} \lesssim \Vert P_{{j_{0}}}f-f \Vert _{p}^{p}+\mathbf{E} \bigl\Vert \hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f \bigr\Vert _{p}^{p}, $$
(11)

it is sufficient to estimate \(\|P_{{j_{0}}}f-f\|_{p}^{p}\) and \(\mathbf{E}\|\hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f\|_{p}^{p}\).

When \(r\leq p\), one has \(s^{\prime }=s-(\frac{1}{r}-\frac{1}{p})_{+}=s- \frac{1}{r}+\frac{1}{p}\) and the embedding \(B^{s}_{r,q}(\mathbb{R})\subset B^{s^{ \prime }}_{p,q}(\mathbb{R})\), so that

$$\begin{aligned} \sup _{f\in B^{s}_{r,q}(\mathbb{R},L)} \Vert P_{{j_{0}}}f-f \Vert _{p} ^{p} \lesssim & \sup _{f\in B^{s^{\prime }}_{p,q}(\mathbb{R},L)} \Vert P_{{j_{0}}}f-f \Vert _{p}^{p}. \end{aligned}$$

By the approximation theorem in Besov spaces (Theorem 9.4 in [9]), one gets

$$ \Vert P_{{j_{0}}}f-f \Vert _{p}^{p}\lesssim 2^{-{j_{0}}s^{\prime }p}. $$
(12)

When \(r>p\), because both f and ϕ have compact supports, one can assume that \(\operatorname{supp} (P_{j_{0}}f-f)\subseteq I\) with \(|I|\leq H\). Then Hölder’s inequality shows

$$ \Vert P_{{j_{0}}}f-f \Vert _{p}^{p}= \int _{I} \bigl\vert P_{{j_{0}}}f(y)-f(y) \bigr\vert ^{p}\,dy\lesssim \Vert P_{{j_{0}}}f-f \Vert _{r}^{p}. $$

Since \(f\in B^{s}_{r,q}(\mathbb{R},L)\), one knows \(\|P_{{j_{0}}}f-f\| _{r}\lesssim 2^{-{j_{0}}s}\). Moreover, \(\|P_{{j_{0}}}f-f\|_{p}^{p} \lesssim 2^{-{j_{0}}sp}\). Note that \(s^{\prime }=s\) for \(r>p\). Then \(\|P_{j_{0}}f-f\|_{p}^{p}\lesssim 2^{-{j_{0}}s^{\prime }p}\). This, together with (12), shows that for \(1\leq p<\infty \),

$$ \Vert P_{{j_{0}}}f-f \Vert _{p}^{p}\lesssim 2^{-{j_{0}}s^{\prime }p}. $$
(13)

Next, one estimates \(\mathbf{E}\|\hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f\|^{p} _{p}\). It is easy to see that

$$ \hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f=\sum _{k\in K_{0}}(\hat{\alpha }_{ {j_{0}},k}-\alpha _{{j_{0}},k}) \phi _{{j_{0}},k} $$

by the definitions of \(\hat{f}^{\mathrm{lin}}_{n}\) and \(P_{{j_{0}}}f\). Furthermore,

$$ \bigl\Vert \hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f \bigr\Vert ^{p}_{p}\lesssim 2^{{j_{0}}p( \frac{1}{2}-\frac{1}{p})} \sum _{k\in K_{0}} \vert \hat{\alpha }_{{j_{0}},k}- \alpha _{{j_{0}},k} \vert ^{p} $$

due to Theorem 1.2. Let \(|K_{0}|\) denote the number of elements in \(K_{0}\). Then \(|K_{0}|\sim 2^{j_{0}}\), because \(K_{0}:=\{k\in \mathbb{Z}, \operatorname{supp} f\cap \operatorname{supp} \phi _{{j_{0}},k}\neq \emptyset \}\) and f, ϕ have compact supports. This, together with Lemma 2.2, leads to

$$ \mathbf{E} \bigl\Vert \hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f \bigr\Vert ^{p}_{p} \lesssim 2^{\frac{ {j_{0}}p}{2}} \mathbf{E} \vert \hat{\alpha }_{{j_{0}},k}-\alpha _{{j_{0}},k} \vert ^{p} \lesssim \biggl(\frac{2^{{j_{0}}}}{n}\biggr)^{\frac{p}{2}}. $$
(14)

Substituting (13) and (14) into (11), one obtains

$$ \mathbf{E} \bigl\Vert \hat{f}^{\mathrm{lin}}_{n}-f \bigr\Vert _{p}^{p} \lesssim \biggl( \frac{2^{{j_{0}}}}{n} \biggr)^{\frac{p}{2}}+2^{-{j_{0}}s^{\prime }p}. $$

Taking \(2^{{j_{0}}}\sim n^{\frac{1}{2s^{\prime }+1}}\), the desired conclusion follows. □
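
In practice, the only tuning required by Theorem 2.1 is the resolution level; the following hedged numerical sketch implements the choice \(2^{j_{0}}\sim n^{\frac{1}{2s'+1}}\) made at the end of the proof (the function name is illustrative).

```python
import numpy as np

def linear_level(n, s, r, p):
    """Level j0 with 2^{j0} ~ n^{1/(2 s' + 1)}, where s' = s - (1/r - 1/p)_+ (Theorem 2.1)."""
    s_prime = s - max(1.0 / r - 1.0 / p, 0.0)
    return int(round(np.log2(n) / (2.0 * s_prime + 1.0)))

print(linear_level(n=10_000, s=2.0, r=2.0, p=2.0))   # gives j0 = 3 for n = 10^4, s = 2, r = p = 2
```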

3 Nonlinear estimators

In this part, we give a nonlinear wavelet estimator for \(f(x)\), which attains a better convergence rate than the linear one in some cases. The nonlinear (hard thresholding) wavelet estimator is defined as follows:

$$ \hat{f}^{\mathrm{non}}_{n}(y):=\sum _{k\in K_{0}}\hat{\alpha }_{j_{0},k} \phi _{j_{0},k}(y)+ \sum_{j=j_{0}}^{j_{1}}\sum _{k\in K _{j}}\hat{\beta }_{j,k}^{*}\psi _{j,k}(y). $$
(15)

Here \(K_{0}=\{k\in \mathbb{Z}, \operatorname{supp} f\cap \operatorname{supp} \phi _{j_{0},k}\neq \emptyset \}\), \(K_{j}=\{k\in \mathbb{Z}, \operatorname{supp} f \cap \operatorname{supp} \psi _{j,k}\neq \emptyset \}\),

$$ \hat{\alpha }_{j_{0},k}=\frac{1}{n}\sum _{i=1}^{n}\phi _{j_{0},k}(X _{i}) \quad \text{and} \quad \hat{\beta }_{j,k}= \frac{1}{n}\sum_{i=1}^{n}\psi _{j,k}(X_{i}) $$
(16)

with \(\hat{\beta }_{j,k}^{*}=\hat{\beta }_{j,k}\mathcal{X}_{\{|\hat{\beta }_{j,k}|>\lambda \}}\), where the threshold is \(\lambda =c\sqrt{\frac{j}{n}}\) and the constant c (determined later on) depends on s, r, p and L.
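
A minimal sketch of the thresholding estimator (15)–(16), again with Haar wavelets; the level and constant choices and all function names are illustrative, not part of the paper.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def haar_psi(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    return haar_phi(2.0 * x) - haar_phi(2.0 * x - 1.0)

def empirical_coeffs(gen, data, j):
    """Empirical coefficients (1/n) sum_i g_{j,k}(X_i) for g = phi or psi."""
    scale = 2.0 ** j
    ks = np.arange(np.floor(scale * data.min()), np.ceil(scale * data.max()) + 1.0)
    vals = np.array([np.mean(np.sqrt(scale) * gen(scale * data - k)) for k in ks])
    return ks, vals

def hard_threshold_estimator(data, j0, j1, c=1.0):
    """Nonlinear estimator (15): linear part at level j0 plus hard-thresholded details."""
    n = len(data)
    ks0, alpha_hat = empirical_coeffs(haar_phi, data, j0)
    details = []
    for j in range(j0, j1 + 1):
        lam = c * np.sqrt(j / n)                                    # threshold lambda = c sqrt(j/n)
        ks, beta_hat = empirical_coeffs(haar_psi, data, j)
        beta_star = np.where(np.abs(beta_hat) > lam, beta_hat, 0.0)  # hard thresholding
        details.append((j, ks, beta_star))

    def f_hat(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        s0 = 2.0 ** j0
        out = (np.sqrt(s0) * haar_phi(s0 * x[:, None] - ks0[None, :])) @ alpha_hat
        for j, ks, beta_star in details:
            sj = 2.0 ** j
            out += (np.sqrt(sj) * haar_psi(sj * x[:, None] - ks[None, :])) @ beta_star
        return out

    return f_hat

# Usage on a simulated sample (i.i.d. here, purely for illustration).
rng = np.random.default_rng(4)
X = rng.beta(2, 5, size=2000)
f_hat = hard_threshold_estimator(X, j0=2, j1=6, c=1.0)
print(f_hat([0.1, 0.3, 0.6]))
```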

For the wavelet coefficients, we have the following lemma, whose proof is very similar to that of Lemma 2.2 and is therefore omitted.

Lemma 3.1

Let \(\hat{\beta }_{j,k}\) be defined by (16). Then under the assumptions of Lemma 2.2,

$$ \mathbf{E} \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert ^{p}\lesssim n^{- \frac{p}{2}} $$

for \(1\leq p<\infty \) and \(2^{j}\leq n\).

To prove Lemma 3.3, we need an important inequality.

Lemma 3.2

(Bernstein’s inequality)

Let \(X_{1},\ldots ,X_{n}\) be a sequence of ND random variables such that \(\mathbf{E}(X_{i})=0\), \(\mathbf{E}(X^{2}_{i})=\sigma ^{2}\) and \(|X_{i}|\leq M<\infty \) (\(i=1,\dots ,n\)). Then for each \(v>0\),

$$ \mathbb{P}\Biggl( \Biggl\vert \frac{1}{n}\sum _{i=1}^{n}X_{i} \Biggr\vert >v\Biggr)\leq 2\exp \biggl(-\frac{nv ^{2}}{2(\sigma ^{2}+\frac{vM}{3})} \biggr). $$

The above inequality is well known when \(X_{1}, \ldots , X_{n}\) are independent; see Theorem C.1 on page 241 in [9]. Checking the details, one finds that the same inequality holds for ND samples. In fact, because Theorem C.1 is a direct corollary of Lemma C.1 (page 239), it suffices to prove that lemma in the ND case. Note that

$$ \mathbf{E}\Biggl[\exp \Biggl(\sum_{i=1}^{n}tX_{i} \Biggr)\Biggr]=\prod_{i=1}^{n}\mathbf{E} \bigl[ \exp (tX_{i})\bigr] $$

for an independent sample \(X_{1}, \ldots ,X _{n}\), while

$$ \mathbf{E}\Biggl[\exp \Biggl(\sum_{i=1}^{n}tX_{i} \Biggr)\Biggr]\leq \prod_{i=1}^{n}\mathbf{E} \bigl[ \exp (tX_{i})\bigr] $$

for ND samples, according to Theorem 1.3. Then we only need to replace the equality

$$ \exp (-{\lambda }t)\mathbf{E}\Biggl[\exp \Biggl(\sum _{i=1}^{n}tX_{i}\Biggr)\Biggr]=\exp \Biggl\{ -\Biggl[ {\lambda } t-\sum_{i=1}^{n}\log \mathbf{E}\bigl(e^{tX_{i}}\bigr)\Biggr]\Biggr\} $$

by

$$ \exp (-{\lambda }t)\mathbf{E}\Biggl[\exp \Biggl(\sum _{i=1}^{n}tX_{i}\Biggr)\Biggr]\leq \exp \Biggl\{ -\Biggl[{\lambda } t-\sum_{i=1}^{n} \log \mathbf{E}\bigl(e^{tX_{i}}\bigr)\Biggr]\Biggr\} $$

on page 240 (lines 8–9) in order to complete the proof of Lemma C.1 when \(X_{1}, \ldots , X_{n}\) are ND.
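
As a brief Monte Carlo check of the Bernstein-type bound for an ND (in fact negatively associated) sequence, the sketch below samples without replacement from a finite symmetric population; all parameters are illustrative.

```python
import numpy as np

# Sampling without replacement from a finite population yields negatively associated
# (hence ND) variables; here the values are centered, bounded by M = 1 and have variance 1.
rng = np.random.default_rng(5)
population = np.concatenate([np.ones(100), -np.ones(100)])
n, trials, v = 50, 20_000, 0.3
sigma2, M = 1.0, 1.0

means = np.array([rng.choice(population, size=n, replace=False).mean()
                  for _ in range(trials)])
empirical_tail = np.mean(np.abs(means) > v)
bernstein_bound = 2.0 * np.exp(-n * v ** 2 / (2.0 * (sigma2 + v * M / 3.0)))
print(empirical_tail, bernstein_bound)   # the empirical tail should stay below the bound
```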

Lemma 3.3

Let \(\hat{\beta }_{j,k}\) be given by (16). Under the assumptions of Lemma 2.2, if \(j2^{j}\leq n\), then for each \(\omega >0\) there exists \(c>0\) such that

$$ \mathbb{P}\biggl( \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert >\lambda =c\sqrt{ \frac{j}{n}}\biggr)\lesssim 2^{-\omega j}. $$

Proof

It is easy to see that

$$ \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert = \frac{1}{n} \Biggl\vert \sum_{i=1}^{n} \bigl[ \psi _{j,k}(X_{i})-\beta _{j,k}\bigr] \Biggr\vert . $$

Hence,

$$ I:=\mathbb{P}\bigl( \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert >\lambda \bigr) =\mathbb{P}\Biggl( \frac{1}{n} \Biggl\vert \sum _{i=1}^{n}\bigl[\psi _{j,k}(X_{i})- \beta _{j,k}\bigr] \Biggr\vert > \lambda \Biggr). $$

In order to estimate I, denote \(\eta _{i}:=\psi _{j,k}(X_{i})-\beta _{j,k}\) (\(i=1,2,\ldots ,n\)). Then

$$ I=\mathbb{P}\Biggl(\frac{1}{n} \Biggl\vert \sum _{i=1}^{n}\eta _{i} \Biggr\vert >\lambda \Biggr). $$

Since ψ is a function of bounded variation, one can write \(\psi =\tilde{\psi }-\bar{\psi }\), where ψ̃ and ψ̄ are bounded, nonnegative and nondecreasing functions. Denote

$$ \tilde{\beta }_{j,k}:= \int \tilde{\psi }_{j,k}(x)f(x)\,dx, \qquad \bar{\beta }_{j,k}:= \int \bar{\psi }_{j,k}(x)f(x)\,dx, $$

and

$$ \tilde{\eta }_{i}:=\tilde{\psi }_{j,k}(X_{i})- \tilde{\beta }_{j,k}, \qquad \bar{\eta }_{i}:=\bar{\psi }_{j,k}(X_{i})-\bar{\beta }_{j,k}. $$

Then \(\beta _{j,k}=\tilde{\beta }_{j,k}-\bar{\beta }_{j,k}\), \(\eta _{i}= \tilde{\eta }_{i}-\bar{\eta }_{i}\) and

$$\begin{aligned} I =&\mathbb{P} \Biggl(\frac{1}{n} \Biggl\vert \sum_{i=1}^{n}(\tilde{\eta } _{i}- \bar{\eta }_{i}) \Biggr\vert >{\lambda } \Biggr) \\ \leq &\mathbb{P} \Biggl(\frac{1}{n} \Biggl\vert \sum _{i=1}^{n} \tilde{\eta }_{i} \Biggr\vert >\frac{\lambda }{2} \Biggr) +\mathbb{P} \Biggl(\frac{1}{n} \Biggl\vert \sum_{i=1}^{n}\bar{\eta }_{i} \Biggr\vert >\frac{ \lambda }{2} \Biggr). \end{aligned}$$
(17)

Note that \(\tilde{\eta }_{1}, \ldots , \tilde{\eta }_{n}\) are ND thanks to the monotonicity of ψ̃ and Theorem 1.3. On the other hand, \(\mathbf{E}\tilde{\eta }_{i}=0\), \(\mathbf{E}(\tilde{\eta }_{i})^{2} \lesssim 1\) and \(|\tilde{\eta }_{i}|\lesssim 2^{\frac{j}{2}}\). Using Bernstein’s inequality, one obtains that

$$\begin{aligned} \mathbb{P} \Biggl(\frac{1}{n} \Biggl\vert \sum _{i=1}^{n}\tilde{\eta }_{i} \Biggr\vert > \frac{ \lambda }{2}=\frac{c}{2}\sqrt{\frac{j}{n}} \Biggr)\leq 2\exp \biggl(-\frac{c ^{2}j}{C(1+c\sqrt{\frac{j2^{j}}{n}})} \biggr) \end{aligned}$$

for some fixed constant \(C>0\). Due to \(j2^{j}\leq n\), one can take \(c>0\) such that \(\frac{c^{2}}{C(1+c)}\geq \omega \) and

$$ \mathbb{P} \Biggl(\frac{1}{n} \Biggl\vert \sum _{i=1}^{n}\tilde{\eta }_{i} \Biggr\vert >\frac{ \lambda }{2} \Biggr)\lesssim 2^{-\omega j}. $$
(18)

Similarly, \(\mathbb{P} (\frac{1}{n} |\sum_{i=1}^{n}\bar{ \eta }_{i}|>\frac{\lambda }{2} )\lesssim 2^{-\omega j}\). This, with (18) and (17), leads to

$$ I=\mathbb{P} \Biggl(\frac{1}{n} \Biggl\vert \sum _{i=1}^{n}\eta _{i} \Biggr\vert >\lambda \Biggr)\lesssim 2^{-\omega j}. $$

The desired conclusion follows. □

Theorem 3.1

Let \(f(x)\in B^{s}_{r,q}( \mathbb{R},L)\) (\(s>\frac{1}{r}\), \(r, q \geq 1\)), and let \(\hat{f}^{\mathrm{non}} _{n}\) be defined by (15). Under the assumptions of Lemma 2.2, for each \(1\leq p<\infty \), with \(s^{\prime }:=s-(\frac{1}{r}-\frac{1}{p})_{+}\) and \(x_{+}=\max (x, 0)\), there exist \(\theta _{i}\in \mathbb{R}\) (\(i=1, 2, 3\)) such that

$$ \sup _{f\in B^{s}_{r,q}(\mathbb{R},L)}\mathbf{E} \bigl\Vert \hat{f}^{\mathrm{non}} _{n}-f \bigr\Vert _{p}^{p} \lesssim \textstyle\begin{cases} (\ln n)^{{\theta }_{1}}n^{-\frac{sp}{2s+1}} , & \frac{p}{2s+1}< r< p, \\ (\ln n)^{\theta _{2}}(\frac{\ln n}{n})^{\frac{s'p}{2(s-1/r)+1}} , & r= \frac{p}{2s+1}, \\ (\ln n)^{\theta _{3}}(\frac{\ln n}{n})^{\frac{s'p}{2(s-1/r)+1}}, & r< \frac{p}{2s+1}. \end{cases} $$
(19)

Proof

Clearly,

$$ \hat{f}_{n}^{\mathrm{non}}-f=\bigl(\hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f \bigr)+(P_{j_{1}+1}f-f)+ \sum_{j=j_{0}}^{j_{1}} \sum_{k\in K_{j}}\bigl(\hat{\beta } ^{*}_{jk}- \beta _{jk}\bigr)\psi _{jk}. $$

Then

$$ \mathbf{E} \bigl\Vert \hat{f}_{n}^{\mathrm{non}}-f \bigr\Vert ^{p}_{p}\lesssim T_{1}+T_{2}+T _{3}, $$
(20)

where \(T_{1}:=\mathbf{E}\|\hat{f}^{\mathrm{lin}}_{n}-P_{{j_{0}}}f\|^{p}_{p}\), \(T _{2}:=\|P_{j_{1}+1}f-f\|^{p}_{p}\) and \(T_{3}:=\mathbf{E}\|\sum_{j=j_{0}}^{j_{1}}\sum_{k\in K_{j}}(\hat{\beta }^{*} _{jk}-\beta _{jk})\psi _{jk}\|^{p}_{p}\). By (13) and (14),

$$ T_{1}\lesssim \biggl(\frac{2^{{j_{0}}}}{n} \biggr)^{\frac{p}{2}} \quad \text{and} \quad T_{2}\lesssim 2^{-j_{1}s'p}. $$
(21)

For estimating \(T_{3}\), one uses Minkowski and Jensen’s inequalities to get

$$ \Biggl\Vert \sum_{j=j_{0}}^{j_{1}}\sum _{k\in K_{j}}\bigl(\hat{\beta } ^{*}_{j,k}-\beta _{j,k}\bigr)\psi _{j,k} \Biggr\Vert ^{p}_{p} \leq (j_{1}-j_{0}+1)^{p-1} \sum _{j=j_{0}}^{j_{1}} \biggl\Vert \sum _{k\in K_{j}}\bigl(\hat{\beta }^{*}_{j,k}- \beta _{j,k}\bigr)\psi _{j,k} \biggr\Vert ^{p}_{p}. $$

This, together with Theorem 1.2, leads to

$$ T_{3}\leq (j_{1}-j_{0}+1)^{p-1} \mathbf{E}\sum_{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)} \biggl(\sum_{k\in K_{j}} \bigl\vert \hat{\beta }^{*}_{j,k}-\beta _{j,k} \bigr\vert ^{p} \biggr). $$

Since \(\hat{\beta }^{*}_{j,k}\) is the hard-thresholded coefficient \(\hat{\beta }_{j,k}\mathcal{X}_{\{|\hat{\beta }_{j,k}|>\lambda \}}\),

$$\begin{aligned} \bigl\vert \hat{\beta }^{*}_{j,k}-\beta _{j,k} \bigr\vert ^{p} =& \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert ^{p}[\mathcal{X}_{\{ \vert \hat{\beta }_{j,k} \vert >\lambda , \vert \beta _{j,k} \vert < \frac{\lambda }{2}\}} + \mathcal{X}_{\{ \vert \hat{\beta }_{j,k} \vert >\lambda , \vert \beta _{j,k} \vert \geq \frac{\lambda }{2}\}}] \\ &{}+ \vert \beta _{j,k} \vert ^{p}[\mathcal{X}_{\{ \vert \hat{\beta }_{j,k} \vert \leq \lambda , \vert \beta _{j,k} \vert >2\lambda \}} +\mathcal{X}_{\{ \vert \hat{\beta }_{j,k} \vert \leq \lambda , \vert \beta _{j,k} \vert \leq 2\lambda \}}]. \end{aligned}$$

Therefore,

$$\begin{aligned} T_{3} \lesssim & (j_{1}-j_{0}+1)^{p-1} \Biggl\{ \mathbf{E} \sum_{j=j _{0}}^{j_{1}}2^{j(\frac{p}{2}-1)} \sum_{k\in K_{j}} \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert ^{p}[\mathcal{X}_{\{ \vert \hat{\beta }_{j,k} \vert > \lambda , \vert \beta _{j,k} \vert < \frac{\lambda }{2}\}} \\ &{}+\mathcal{X}_{\{ \vert \hat{\beta }_{j,k} \vert >\lambda , \vert \beta _{j,k} \vert \geq \frac{ \lambda }{2}\}}]+\mathbf{E}\sum_{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)} \sum_{k\in K_{j}} \vert \beta _{j,k} \vert ^{p}[\mathcal{X} _{\{ \vert \hat{\beta }_{j,k} \vert \leq \lambda , \vert \beta _{j,k} \vert > 2\lambda \}} \\ &{}+\mathcal{X}_{\{ \vert \hat{\beta }_{j,k} \vert \leq \lambda , \vert \beta _{j,k} \vert \leq 2\lambda \}}]\Biggr\} . \end{aligned}$$
(22)

When \(|\hat{\beta }_{jk}|>\lambda \) and \(|\beta _{jk}|< \frac{\lambda }{2}\), one has \(|\hat{\beta }_{jk}-\beta _{jk}|\geq |\hat{\beta }_{jk}|-|\beta _{jk}|> \frac{\lambda }{2}\), so that

$$ \mathcal{X}_{\{ \vert \hat{\beta }_{jk} \vert >\lambda , \vert \beta _{jk} \vert < \frac{\lambda }{2}\}} \leq \mathcal{X}_{\{ \vert \hat{\beta }_{jk}-\beta _{jk} \vert >\frac{\lambda }{2}\}}. $$

Similarly, when \(|\hat{\beta }_{jk}|\leq \lambda \) and \(|\beta _{jk}|> 2\lambda \), \(|\hat{\beta }_{jk}|\leq \lambda <\frac{|\beta _{jk}|}{2}\). Hence,

$$ \vert \hat{\beta }_{jk}-\beta _{jk} \vert \geq \vert \beta _{jk} \vert - \vert \hat{\beta }_{jk} \vert > \frac{ \vert \beta _{jk} \vert }{2}> \lambda \quad \text{and} \quad \vert \beta _{jk} \vert < 2 \vert \hat{\beta }_{jk}-\beta _{jk} \vert . $$

Furthermore,

$$ \vert \beta _{jk} \vert ^{p}\mathcal{X}_{\{ \vert \hat{\beta }_{jk} \vert \leq \lambda , \vert \beta _{jk} \vert > 2 \lambda \}}\lesssim \vert \hat{\beta }_{jk}-\beta _{jk} \vert ^{p} \mathcal{X}_{\{ \vert \hat{\beta }_{jk}-\beta _{jk} \vert >\frac{\lambda }{2}\}}. $$

Then (22) reduces to

$$ T_{3}\lesssim T_{31}+T_{32}+T_{33}, $$

where

$$\begin{aligned}& T_{31}:=(j_{1}-j_{0}+1)^{p-1}\mathbf{E} \sum_{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)}\sum _{k\in K_{j}} \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert ^{p} \mathcal{X}_{\{ \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert >\frac{\lambda }{2}\}}, \\& T_{32}:=(j_{1}-j_{0}+1)^{p-1}\mathbf{E} \sum_{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)}\sum _{k\in K_{j}} \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert ^{p} \mathcal{X}_{\{ \vert \beta _{j,k} \vert \geq \frac{\lambda }{2}\}} \end{aligned}$$

and \(T_{33}:=(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)}\sum_{k\in K_{j}}|\beta _{j,k}|^{p}\mathcal{X} _{\{|\beta _{j,k}| \leq 2\lambda \}}\).

In order to estimate \(T_{31}\), let \(q'\) satisfy \(\frac{1}{q}+ \frac{1}{q'}=1\). Then Hölder’s inequality shows that

$$\begin{aligned} T_{31} \leq &(j_{1}-j_{0}+1)^{p-1}\sum _{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)}\sum _{k\in K_{j}}\bigl[\mathbf{E} \vert \hat{\beta }_{j,k}- \beta _{j,k} \vert ^{qp}\bigr]^{\frac{1}{q}} \bigl[ \mathbf{E}(\mathcal{X}_{\{ \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert > \frac{\lambda }{2}\}})^{q'}\bigr]^{ \frac{1}{q'}} \\ \leq & (j_{1}-j_{0}+1)^{p-1}\sum _{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)}\sum _{k\in K_{j}}\bigl(\mathbf{E} \vert \hat{\beta }_{j,k}- \beta _{j,k} \vert ^{qp}\bigr)^{\frac{1}{q}} \biggl[P\biggl( \vert \hat{\beta }_{j,k}-\beta _{j,k} \vert > \frac{ \lambda }{2}\biggr)\biggr]^{\frac{1}{q'}}. \end{aligned}$$

This, together with Lemmas 3.1 and 3.3, leads to

$$\begin{aligned} T_{31} \lesssim & (j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)}2^{j}n^{-\frac{p}{2}}2^{-\frac{\omega j}{q'}}=(j_{1}-j _{0}+1)^{p-1}n^{-\frac{p}{2}}\sum _{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-\frac{\omega }{q'})} \\ \lesssim & (j_{1}-j_{0}+1)^{p-1}n^{-\frac{p}{2}}2^{j_{0}(\frac{p}{2}-\frac{ \omega }{q'})} \leq (j_{1}-j_{0}+1)^{p-1}n^{-\frac{p}{2}}2^{\frac{j _{0}p}{2}} \end{aligned}$$
(23)

by choosing ω such that \(\frac{p}{2}<\frac{\omega }{q'}\).

It is easy to see that \(\|\beta _{j\cdot }\|_{r}\lesssim 2^{-j(s+ \frac{1}{2}-\frac{1}{r})}\) thanks to Theorem 1.1. Combining this with Lemma 3.1 and \(\mathcal{X}_{\{|\beta _{j,k}|\geq \frac{\lambda }{2}\}} \leq (\frac{|\beta _{j,k}|}{\frac{\lambda }{2}})^{r}\), one has

$$\begin{aligned} T_{32} \lesssim &(j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}} 2^{j( \frac{p}{2}-1)} \sum _{k\in K_{j}}n^{-\frac{p}{2}} \biggl\vert \frac{\beta _{j,k}}{\frac{ \lambda }{2}} \biggr\vert ^{r} \\ \lesssim & (j_{1}-j_{0}+1)^{p-1}n^{-\frac{p}{2}} \sum_{j=j_{0}} ^{j_{1}} \lambda ^{-r}2^{j(\frac{p-r}{2}-rs)}. \end{aligned}$$
(24)

Similarly, it can be shown that

$$\begin{aligned} T_{33} \leq & (j_{1}-j_{0}+1)^{p-1} \sum_{j=j_{0}}^{j_{1}}2^{j( \frac{p}{2}-1)}\sum _{k\in K_{j}} \vert \beta _{j,k} \vert ^{p} \biggl(\frac{2 \lambda }{ \vert \beta _{j,k} \vert }\biggr)^{p-r} \\ \lesssim &(j_{1}-j_{0}+1)^{p-1}\sum _{j=j_{0}}^{j_{1}}{\lambda }^{p-r}2^{j(\frac{p-r}{2}-rs)} \end{aligned}$$
(25)

due to \(r< p\) and \(\mathcal{X}_{\{|\beta _{j,k}|\leq 2\lambda \}}\leq (\frac{2 \lambda }{|\beta _{j,k}|})^{p-r}\).

Take

$$ 2^{j_{0}}\sim \textstyle\begin{cases} [(\ln n)^{\frac{p-r}{r}}n]^{\frac{1}{2s+1}}, & r>\frac{p}{2s+1}, \\ n^{\frac{1-2/p}{2(s-1/r)+1}} , & r\leq \frac{p}{2s+1}, \end{cases}\displaystyle \quad \text{and}\quad 2^{j_{1}} \sim \textstyle\begin{cases} n^{\frac{s}{s'(2s+1)}}, & r>\frac{p}{2s+1}, \\ (n/\ln n)^{\frac{1}{2(s-1/r)+1}} , & r\leq \frac{p}{2s+1}. \end{cases} $$
(26)

Then \(j_{0}< j_{1}\), \(j_{1}-j_{0}\sim \ln n\) and for \(j_{0}\leq j\leq j _{1}\), \(\lambda :=c\sqrt{\frac{j}{n}}\sim c\sqrt{\frac{\ln n}{n}}\). Moreover, (24) and (25) reduce to

$$ T_{32}\lesssim (j_{1}-j_{0}+1)^{p-1}n^{\frac{r-p}{2}}( \ln n)^{- \frac{r}{2}} \bigl[2^{j_{0}\xi }\mathcal{X}_{\{\xi < 0\}} +(j_{1}-j_{0}+1) \mathcal{X}_{\{\xi =0\}}+2^{j_{1}\xi } \mathcal{X}_{\{\xi >0\}}\bigr] $$
(27)

and

$$ T_{33}\lesssim (j_{1}-j_{0}+1)^{p-1} \biggl(\frac{\ln n}{n}\biggr)^{\frac{p-r}{2}}\bigl[2^{j _{0}\xi } \mathcal{X}_{\{\xi < 0\}}+ (j_{1}-j_{0}+1) \mathcal{X}_{\{ \xi =0\}}+2^{j_{1}\xi }\mathcal{X}_{\{\xi >0\}}\bigr], $$
(28)

where \(\xi =\frac{p-r}{2}-rs\).

Note that \(\xi \geq 0\) holds if and only if \(r\leq \frac{p}{2s+1}\). Then substituting (26) into (23), (27) and (28), one obtains

$$ T_{3}\lesssim T_{31}+T_{32}+T_{33} \lesssim \textstyle\begin{cases} (\ln n)^{{\theta }_{1}}n^{-\frac{sp}{2s+1}} , & \frac{p}{2s+1}< r< p, \\ (\ln n)^{\theta _{2}}(\frac{\ln n}{n})^{\frac{s'p}{2(s-1/r)+1}} , & r= \frac{p}{2s+1}, \\ (\ln n)^{\theta _{3}}(\frac{\ln n}{n})^{\frac{s'p}{2(s-1/r)+1}}, & r< \frac{p}{2s+1}. \end{cases} $$
(29)

Similarly, it is easy to check that

$$ T_{1}+T_{2}\lesssim \textstyle\begin{cases} n^{-\frac{sp}{2s+1}} , & \frac{p}{2s+1}< r< p, \\ (\frac{\ln n}{n})^{\frac{s'p}{2(s-1/r)+1}}, & r\leq \frac{p}{2s+1} \end{cases} $$
(30)

by (26) and (21). Finally, the desired conclusion (19) follows from (20), (29), and (30). □

Remark 3.1

From Theorems 2.1 and 3.1 one easily finds that our results are consistent with those in [6] for independent samples.

Remark 3.2

In [7], Doosti and Chaubey provided a convergence rate of \(n^{-\frac{s'p}{2s'+1}}\) for ND samples, which is a little weaker than the rate \(n^{-\frac{sp}{2s+1}}\) in Theorem 3.1 for \(r< p\) (note that \(s'< s\) when \(r< p\)).