1 Introduction

In practice, it often happens that drawing a direct sample from a random variable $X$ is impossible. In this paper, we consider the problem of estimating the density function $f_X(x)$ without directly observing an i.i.d. sample $X_1, X_2, \ldots, X_n$. Instead, we observe samples $Y_1, Y_2, \ldots, Y_n$ drawn from the biased density

$$f_Y(x) = \frac{g(x) f_X(x)}{\mu},$$
(1.1)

where $g(x)$ is the so-called weight or bias function and $\mu = E(g(X))$. The purpose of this paper is to estimate the density function $f_X(x)$ from the samples $Y_1, Y_2, \ldots, Y_n$.
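To make the model concrete, here is a minimal simulation sketch (our own illustration, not part of the source): with the weight $g(x) = 1 + x$, which satisfies the two-sided bound $0 < g_1 \le g(x) \le g_2 < \infty$ assumed later, biased samples with density (1.1) can be drawn from a target $f_X$ by rejection sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_biased(n, g, g_max, sample_x):
    """Draw n samples from f_Y = g * f_X / mu by rejection sampling.

    g must satisfy g <= g_max on the support of f_X; sample_x draws
    from the (in practice unobservable) target density f_X.
    """
    out = []
    while len(out) < n:
        x = sample_x()
        # accepting x with probability g(x)/g_max tilts f_X into f_Y
        if rng.uniform() < g(x) / g_max:
            out.append(x)
    return np.array(out)

# example: weight g(x) = 1 + x on a Beta(2, 5) target (g_1 = 1, g_2 = 2)
y = sample_biased(1000, g=lambda x: 1 + x, g_max=2.0,
                  sample_x=lambda: rng.beta(2, 5))
```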

Several examples of such biased data can be found in the literature. For instance, [1] studies the distribution of blood alcohol concentration among intoxicated drivers; since a more intoxicated driver has a larger chance of being arrested, the collected data are size-biased.

The density estimation problem for biased data (1.1) has been discussed in several papers. In 1982, Vardi [2] considered nonparametric maximum likelihood estimation of $f_X(x)$. In 1991, Jones [3] discussed the mean squared error properties of kernel density estimation. In 2004, Efromovich [4] developed the Efromovich-Pinsker adaptive Fourier estimator, based on a blockwise shrinkage algorithm, which achieves the minimax rate of convergence under the $L^2$ risk over a Besov class $B^s_{2,2}$.

In 2010, Ramírez and Vidakovic [5] proposed a linear wavelet estimator and proved its consistency in $L^2[0,1]$ in the mean integrated squared error (MISE) sense; however, their estimator involves the unknown parameter $\mu$. In the same year, Chesneau [6] constructed a nonlinear wavelet estimator and evaluated its $L^p$ risk over the Besov space $B^s_{r,q}$. However, the Sobolev space $W_r^N$ ($N \in \mathbb{N}^+$) is not a special case of the Besov space $B^s_{r,q}$ except when $r = 2$.

In this paper, we consider nonlinear hard thresholding wavelet density estimation for biased data in the Sobolev spaces $W_r^N$ ($N \in \mathbb{N}^+$). We derive an upper bound on the rate of convergence under the $L^p$ risk without particular restrictions on the parameters $r$ and $p$, and show that this rate is optimal up to at most a logarithmic factor.

2 Preliminaries

In this section, we shall recall some well-known concepts and lemmas.

2.1 Wavelets

In this paper, we always assume that the scaling function $\varphi$ is orthonormal, compactly supported, and $(N+1)$-regular.

Definition 2.1 The scaling function $\varphi(x)$ is called $m$-regular if $\varphi(x)$ has continuous derivatives up to order $m$ and its corresponding wavelet $\psi(x)$ has vanishing moments of order $m$, i.e., $\int x^k \psi(x)\,dx = 0$, $k = 0, 1, \ldots, m-1$.
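As a quick numerical illustration of Definition 2.1 (our own sketch; it assumes the PyWavelets package), the Daubechies wavelet db4 has four vanishing moments, so the moments $\int x^k\psi(x)\,dx$ for $k = 0, 1, 2, 3$ should all be numerically close to zero:

```python
import numpy as np
import pywt

# evaluate the db4 wavelet on a fine dyadic grid, then integrate x^k psi(x)
phi, psi, x = pywt.Wavelet('db4').wavefun(level=12)
dx = x[1] - x[0]
for k in range(4):
    moment = np.sum(x ** k * psi) * dx  # Riemann sum for the k-th moment
    print(f"k={k}: integral of x^k psi(x) dx = {moment:.2e}")  # ~ 0
```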

The following conditions on the scaling function $\varphi$ and the kernel function $K(x,y)$ will be very useful in Section 3.

Condition (θ)

The function $\theta_\varphi(x) = \sum_{k \in \mathbb{Z}} |\varphi(x-k)|$ satisfies $\operatorname{ess\,sup}_{x \in \mathbb{R}} \theta_\varphi(x) < \infty$.

Condition H(N)

There exists an integrable function $F(x)$ such that $|K(x,y)| \le F(x-y)$ for any $x, y \in \mathbb{R}$, where $\int |x|^N F(x)\,dx < \infty$.

Condition M(N)

Condition $H(N)$ is satisfied and $\int K(x,y)(y-x)^k\,dy = \delta_{0k}$ for $k = 0, \ldots, N$ and $x \in \mathbb{R}$.

For any $x \in \mathbb{R}$ and $j, k \in \mathbb{Z}$, denote $\varphi_{jk}(x) := 2^{\frac{j}{2}}\varphi(2^j x - k)$ and $\psi_{jk}(x) := 2^{\frac{j}{2}}\psi(2^j x - k)$. Then for any $f(x) \in L^r(\mathbb{R}) := \{f(x) \mid \int_{\mathbb{R}} |f(x)|^r\,dx < \infty\}$ with $1 \le r < \infty$, we have the following expansion [7]:

$$f(x) = \sum_{k \in \mathbb{Z}} \alpha_{J,k}\, \varphi_{J,k}(x) + \sum_{j \ge J}\sum_{k \in \mathbb{Z}} \beta_{j,k}\, \psi_{j,k}(x) \quad \text{a.e.},$$
(2.1)

where

$$\alpha_{J,k} = \int_{\mathbb{R}} f(x)\, \varphi_{J,k}(x)\,dx, \qquad \beta_{j,k} = \int_{\mathbb{R}} f(x)\, \psi_{j,k}(x)\,dx.$$
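As a concrete check of the expansion (2.1) (our own sketch; the Haar basis is chosen purely because $\varphi$ and $\psi$ have closed forms), the following code expands the Beta(2,2) density $f(x) = 6x(1-x)$ and verifies that the truncated series approaches $f$ as levels are added:

```python
import numpy as np

x = np.linspace(0, 1, 2 ** 12, endpoint=False)
dx = x[1] - x[0]
f = 6 * x * (1 - x)                              # a Beta(2, 2) density

def phi_jk(t, j, k):                             # Haar scaling function
    u = 2 ** j * t - k
    return 2 ** (j / 2) * ((0 <= u) & (u < 1))

def psi_jk(t, j, k):                             # Haar wavelet
    u = 2 ** j * t - k
    return 2 ** (j / 2) * (1.0 * ((0 <= u) & (u < 0.5))
                           - 1.0 * ((0.5 <= u) & (u < 1)))

J = 0
alpha = np.sum(f * phi_jk(x, J, 0)) * dx         # alpha_{J,k} of (2.1)
approx = alpha * phi_jk(x, J, 0)
for j in range(J, 8):
    for k in range(2 ** j):
        beta = np.sum(f * psi_jk(x, j, k)) * dx  # beta_{j,k} of (2.1)
        approx += beta * psi_jk(x, j, k)
print(np.max(np.abs(approx - f)))                # sup error shrinks as levels grow
```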

2.2 Sobolev space

The Sobolev space $W_r^N(\mathbb{R})$ ($N \in \mathbb{N}^+$) is defined by $W_r^N(\mathbb{R}) := \{f : f \in L^r(\mathbb{R}),\ f^{(N)} \in L^r(\mathbb{R})\}$ and is equipped with the norm $\|f\|_{W_r^N} := \|f\|_r + \|f^{(N)}\|_r$. The Sobolev balls $\widetilde{W}_r^N(A,L)$ are defined as follows:

$$\widetilde{W}_r^N(A,L) := \big\{f \in W_r^N(\mathbb{R}) : f \text{ is a probability density function},\ \operatorname{supp} f \subseteq A,\ \|f^{(N)}\|_r \le L\big\}.$$

The following embedding relations hold between Sobolev and Besov spaces.

Lemma 2.1 [8]

Let $s > 0$ and $1 \le p, q, r \le \infty$. Then

(i) $W_r^N \hookrightarrow B_{r,\infty}^N \hookrightarrow B_{\infty,\infty}^{N-1/r}$, $N > 1/r$;

(ii) $B_{r,q}^s \hookrightarrow B_{p,q}^{s'}$, $r < p$, $s' = s - 1/r + 1/p$,

where $A \hookrightarrow B$ denotes that the Banach space $A$ is continuously embedded in the Banach space $B$, i.e., there exists a constant $c > 0$ such that $\|u\|_B \le c\|u\|_A$ for all $u \in A$.

2.3 Auxiliary lemmas

The following lemmas, given in [9], will be used in the next section.

Lemma 2.2 If the scaling function $\varphi$ satisfies Condition $(\theta)$, then for any sequence $\{\lambda_k\}_{k \in \mathbb{Z}}$ with $\|\lambda\|_{\ell^p} := \big(\sum_k |\lambda_k|^p\big)^{\frac1p} < \infty$, we have

$$C_1\, \|\lambda\|_{\ell^p}\, 2^{j(\frac12-\frac1p)} \le \Big\|\sum_k \lambda_k\, \varphi_{j,k}\Big\|_p \le C_2\, \|\lambda\|_{\ell^p}\, 2^{j(\frac12-\frac1p)},$$

where $1 \le p \le \infty$, $\frac1p + \frac1q = 1$, and the constants $C_1, C_2 > 0$ depend only on $\|\theta_\varphi\|_\infty$, $\|\varphi\|_1$, $p$, and $q$.
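For the Haar scaling function, the $\varphi_{j,k}$ at a fixed level $j$ have disjoint supports, so the two-sided bound of Lemma 2.2 holds with equality; the following sketch (our own) checks this numerically for $p = 1.5$:

```python
import numpy as np

rng = np.random.default_rng(2)
j, p = 3, 1.5
lam = rng.normal(size=2 ** j)                  # an arbitrary sequence lambda_k
x = np.linspace(0, 1, 2 ** 16, endpoint=False)
dx = x[1] - x[0]

s = np.zeros_like(x)
for k in range(2 ** j):
    u = 2 ** j * x - k
    s += lam[k] * 2 ** (j / 2) * ((0 <= u) & (u < 1))  # Haar phi_{j,k}

lhs = (np.sum(np.abs(s) ** p) * dx) ** (1 / p)  # ||sum_k lambda_k phi_{j,k}||_p
rhs = 2 ** (j * (0.5 - 1 / p)) * np.sum(np.abs(lam) ** p) ** (1 / p)
print(lhs, rhs)                                 # equal up to discretization error
```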

Lemma 2.3 For some integer $N \ge 0$, if the kernel function $K(x,y)$ satisfies Conditions $M(N)$ and $H(N+1)$, and $f \in B_{p,q}^s(\mathbb{R})$ with $1 \le p, q \le \infty$ and $0 < s < N+1$, then $\|K_j f - f\|_p = 2^{-js}\varepsilon_j$, where $\{\varepsilon_j\} \in \ell^q$.

Lemma 2.4 (Rosenthal's inequality)

Let $X_1, \ldots, X_n$ be independent random variables such that $E(X_i) = 0$ and $E(|X_i|^p) < \infty$. Then there exists a constant $C(p) > 0$ such that

$$E\Big(\Big|\sum_{i=1}^n X_i\Big|^p\Big) \le C(p)\Big(\sum_{i=1}^n E\big(|X_i|^p\big) + \Big(\sum_{i=1}^n E\big(X_i^2\big)\Big)^{p/2}\Big), \quad p > 2,$$

$$E\Big(\Big|\sum_{i=1}^n X_i\Big|^p\Big) \le \Big(\sum_{i=1}^n E\big(X_i^2\big)\Big)^{p/2}, \quad 0 < p \le 2.$$

Lemma 2.5 (Bernstein's inequality)

Let $X_1, X_2, \ldots, X_n$ be independent random variables such that $E(X_i) = 0$, $E(X_i^2) \le \sigma^2$, and $|X_i| \le M < \infty$. Then for any $\lambda > 0$,

$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n X_i\Big| > \lambda\Big) \le 2\exp\Big(-\frac{n\lambda^2}{2(\sigma^2 + M\lambda/3)}\Big).$$
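A small Monte Carlo check of Lemma 2.5 (our own sketch) with Rademacher variables, for which $\sigma^2 = 1$ and $M = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, reps = 100, 0.3, 50_000

# empirical tail P(|mean| > lam) over many means of n Rademacher variables
means = rng.choice([-1.0, 1.0], size=(reps, n)).mean(axis=1)
empirical = np.mean(np.abs(means) > lam)

bound = 2 * np.exp(-n * lam ** 2 / (2 * (1.0 + lam / 3)))  # Bernstein bound
print(empirical, "<=", bound)                              # the bound dominates
```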

Remark In this paper, we use the notation $A \lesssim B$ to indicate that $A \le cB$ with a positive constant $c$ that is independent of $A$ and $B$. If $A \lesssim B$ and $B \lesssim A$, we write $A \sim B$.

3 Main results

In this paper, our hard thresholding wavelet density estimator is defined as follows:

$$\hat{f}_n^{X,\mathrm{non}}(x) = \sum_k \hat{\alpha}_{j_0 k}\, \varphi_{j_0 k}(x) + \sum_{j=j_0}^{j_1}\sum_k \hat{\beta}^*_{jk}\, \psi_{jk}(x),$$
(3.1)

where

$$\hat{\alpha}_{j_0 k} := \frac{\hat{\mu}}{n}\sum_{i=1}^n \frac{\varphi_{j_0 k}(Y_i)}{g(Y_i)}, \qquad \hat{\beta}_{jk} := \frac{\hat{\mu}}{n}\sum_{i=1}^n \frac{\psi_{jk}(Y_i)}{g(Y_i)}, \qquad \hat{\mu} := \frac{n}{\sum_{i=1}^n 1/g(Y_i)}.$$

The hard thresholded wavelet coefficients are $\hat{\beta}^*_{jk} := \hat{\beta}_{jk}\, I\{|\hat{\beta}_{jk}| \ge \lambda\}$, where

$$I\{|\hat{\beta}_{jk}| \ge \lambda\} := \begin{cases} 1, & |\hat{\beta}_{jk}| \ge \lambda, \\ 0, & |\hat{\beta}_{jk}| < \lambda. \end{cases}$$
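The following Python sketch assembles the whole estimator (3.1). It is purely illustrative: it uses the Haar wavelet, which is not $(N+1)$-regular as the theory requires, and the names (`hard_threshold_estimator`, `k_range`) are our own. The threshold is passed as a function of the level $j$, anticipating the level-dependent choice $\lambda = c\sqrt{j/n}$ in (3.4) below.

```python
import numpy as np

def phi(x):   # Haar father wavelet: indicator of [0, 1)
    return 1.0 * ((0.0 <= x) & (x < 1.0))

def psi(x):   # Haar mother wavelet
    return phi(2 * x) - phi(2 * x - 1)

def hard_threshold_estimator(y, g, j0, j1, lam, x_grid, k_range):
    """Evaluate the estimator (3.1) on x_grid from biased samples y."""
    n = len(y)
    w = 1.0 / g(y)                    # weights 1/g(Y_i)
    mu_hat = n / w.sum()              # hat(mu) = n / sum_i 1/g(Y_i)
    f_hat = np.zeros_like(x_grid)
    for k in k_range:
        # hat(alpha)_{j0,k} = (mu_hat/n) sum_i phi_{j0,k}(Y_i)/g(Y_i)
        alpha = (mu_hat / n) * np.sum(2 ** (j0 / 2) * phi(2 ** j0 * y - k) * w)
        f_hat += alpha * 2 ** (j0 / 2) * phi(2 ** j0 * x_grid - k)
    for j in range(j0, j1 + 1):
        for k in k_range:
            beta = (mu_hat / n) * np.sum(2 ** (j / 2) * psi(2 ** j * y - k) * w)
            if abs(beta) >= lam(j):   # hard thresholding: keep or kill
                f_hat += beta * 2 ** (j / 2) * psi(2 ** j * x_grid - k)
    return f_hat
```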

Suppose that the parameters $j_0$, $j_1$, $\lambda$ of the wavelet thresholding estimator (3.1) satisfy the following assumptions:

$$2^{j_0} \sim \begin{cases} \big((\ln n)^{\frac{p-r}{r}}\, n\big)^{\frac{1}{2N+1}}, & r > \frac{p}{2N+1}, \\ n^{\frac{1-2/p}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}, \end{cases}$$
(3.2)
$$2^{j_1} \sim \begin{cases} n^{\frac{N}{N'(2N+1)}}, & r > \frac{p}{2N+1}, \\ \big(\frac{n}{\ln n}\big)^{\frac{1}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}, \end{cases}$$
(3.3)
$$\lambda = c\sqrt{\frac{j}{n}}, \quad j_0 \le j \le j_1,$$
(3.4)

where $N' = N - 1/r + 1/p$ and $c$ is a suitably chosen positive constant.
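One possible way to resolve the calibrations (3.2)-(3.4) into concrete numbers is sketched below (our own illustration: the constants hidden in the '$\sim$' of (3.2) and (3.3) are set to 1, and $c$ must be chosen large enough for Lemma 3.2 to apply):

```python
import numpy as np

def tuning_parameters(n, N, r, p, c=1.0):
    """Resolution levels (j0, j1) and threshold lam(j) from (3.2)-(3.4)."""
    Np = N - 1.0 / r + 1.0 / p                     # N' = N - 1/r + 1/p
    if r > p / (2 * N + 1):
        j0 = np.log2(np.log(n) ** ((p - r) / r) * n) / (2 * N + 1)
        j1 = N * np.log2(n) / (Np * (2 * N + 1))
    else:
        j0 = (1 - 2.0 / p) * np.log2(n) / (2 * (N - 1.0 / r) + 1)
        j1 = np.log2(n / np.log(n)) / (2 * (N - 1.0 / r) + 1)
    lam = lambda j: c * np.sqrt(j / n)             # threshold (3.4)
    return int(np.floor(j0)), int(np.ceil(j1)), lam
```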

Lemma 3.1 Suppose that there exist two constants $g_1$ and $g_2$ such that $0 < g_1 \le g(x) \le g_2 < \infty$ for $x \in \mathbb{R}$. Let $\alpha_{jk}$, $\beta_{jk}$ be the coefficients in the expansion (2.1), and let $\hat{\alpha}_{jk}$, $\hat{\beta}_{jk}$ be defined as in (3.1). If $2^j \le n$, then for any $1 \le p < \infty$, we have

(i) $E|\alpha_{jk} - \hat{\alpha}_{jk}|^p \lesssim n^{-\frac{p}{2}}$;

(ii) $E|\beta_{jk} - \hat{\beta}_{jk}|^p \lesssim n^{-\frac{p}{2}}$.

Proof (i) From the definition of $\hat{\alpha}_{jk}$ in (3.1) and the triangle inequality, we have

$$\begin{aligned} |\hat{\alpha}_{j,k} - \alpha_{j,k}| &= \Big|\frac{\hat{\mu}}{n}\sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}\Big| = \Big|\frac{\hat{\mu}}{\mu}\Big(\frac{\mu}{n}\sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}\Big) + \hat{\mu}\alpha_{j,k}\Big(\frac{1}{\mu} - \frac{1}{\hat{\mu}}\Big)\Big| \\ &\le \frac{\hat{\mu}}{\mu}\Big|\frac{\mu}{n}\sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}\Big| + \hat{\mu}\,|\alpha_{j,k}|\,\Big|\frac{1}{\hat{\mu}} - \frac{1}{\mu}\Big|. \end{aligned}$$

Since $g_1 \le g(y) \le g_2$, we have

$$\hat{\mu} = \frac{n}{\sum_{i=1}^n 1/g(Y_i)} \le g_2, \qquad \mu = E g(X) \ge g_1,$$

and

$$|\alpha_{j,k}| \le \int |f_X(y)|\,|\varphi_{j,k}(y)|\,dy \le \Big(\int |f_X(y)|^2\,dy\Big)^{\frac12}\Big(\int |\varphi_{j,k}(y)|^2\,dy\Big)^{\frac12} \le A^{1/2}\|f\|_\infty.$$

Furthermore, by the embeddings $W_r^N \hookrightarrow B_{r,\infty}^N \hookrightarrow B_{\infty,\infty}^{N-1/r}$ of Lemma 2.1, valid for any integer $N > 1/r$, we have $\|f\|_\infty \lesssim \|f\|_{B_{\infty,\infty}^{N-1/r}} \lesssim \|f\|_{W_r^N} =: c$. Therefore, by the convexity inequality, we get

$$\begin{aligned} E|\hat{\alpha}_{j,k} - \alpha_{j,k}|^p &\le E\Big(\frac{g_2}{g_1}\Big|\frac{\mu}{n}\sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}\Big| + g_2 A^{1/2} c\,\Big|\frac{1}{\hat{\mu}} - \frac{1}{\mu}\Big|\Big)^p \\ &\le 2^{p-1}\max\Big\{\frac{g_2}{g_1},\, g_2 A^{1/2} c\Big\}^p\, E\Big(\Big|\frac{\mu}{n}\sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}\Big|^p + \Big|\frac{1}{\hat{\mu}} - \frac{1}{\mu}\Big|^p\Big) \\ &\lesssim E\Big|\frac{\mu}{n}\sum_{i=1}^n \frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}\Big|^p + E\Big|\frac{1}{\hat{\mu}} - \frac{1}{\mu}\Big|^p =: T_1 + T_2. \end{aligned}$$

We estimate $T_1$ first. Let $\xi_i := \mu\frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}$; the $\xi_i$ are i.i.d. with $E(\xi_i) = 0$. Moreover, for any $m \ge 2$,

$$E|\xi_i|^m = E\Big|\mu\frac{\varphi_{j,k}(Y_i)}{g(Y_i)} - \alpha_{j,k}\Big|^m \le 2^{m-1}\Big(E\Big|\mu\frac{\varphi_{j,k}(Y_i)}{g(Y_i)}\Big|^m + |\alpha_{j,k}|^m\Big),$$

where

$$E\Big|\mu\frac{\varphi_{j,k}(Y_i)}{g(Y_i)}\Big|^m = E\Big[\frac{\mu^{m-1}|\varphi_{j,k}(Y_i)|^{m-2}}{g(Y_i)^{m-1}}\,\Big|\frac{\mu\varphi_{j,k}^2(Y_i)}{g(Y_i)}\Big|\Big] \le \frac{g_2^{m-1}}{g_1^{m-1}}\, 2^{\frac{j}{2}(m-2)}\,\|\varphi\|_\infty^{m-2}\, E\Big|\frac{\mu\varphi_{j,k}^2(Y_i)}{g(Y_i)}\Big|,$$

and

$$E\Big|\frac{\mu\varphi_{j,k}^2(Y_i)}{g(Y_i)}\Big| = \int \frac{\mu\varphi_{j,k}^2(y)}{g(y)}\, f_Y(y)\,dy = \int \frac{\mu\varphi_{j,k}^2(y)}{g(y)}\cdot\frac{g(y) f_X(y)}{\mu}\,dy = \int \varphi_{j,k}^2(y)\, f_X(y)\,dy \le \|f\|_\infty \lesssim \|f\|_{W_r^N} = c.$$

So, we have

$$E\Big|\mu\frac{\varphi_{j,k}(Y_i)}{g(Y_i)}\Big|^m \le \frac{g_2^{m-1}}{g_1^{m-1}}\, 2^{\frac{j}{2}(m-2)}\,\|\varphi\|_\infty^{m-2}\, c.$$

Since $2^j \le n$, we obtain

$$E|\xi_i|^m \le C\, 2^{j(m-2)/2} \le C\, n^{(m-2)/2}.$$

By Rosenthal’s inequality, we have

$$T_1 = E\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big|^p \lesssim n^{-p}\Big(n\, E|\xi_i|^p + n^{p/2}\big(E|\xi_i|^2\big)^{p/2}\Big) \lesssim n^{-p/2}.$$
(3.5)

To estimate $T_2$, let $\eta_i = \frac{1}{g(Y_i)} - \frac{1}{\mu}$, so that $\frac{1}{\hat{\mu}} - \frac{1}{\mu} = \frac{1}{n}\sum_{i=1}^n \eta_i$. One easily checks that $E(\eta_i) = 0$ and that $E|\eta_i|^m \le C$ for any $m \ge 2$.

If $p \ge 2$ (so that $n^{1-p} \le n^{-p/2}$), using Rosenthal's inequality, we have

$$T_2 = E\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big|^p \lesssim n^{-p}\Big(n\, E|\eta_i|^p + n^{p/2}\big(E|\eta_i|^2\big)^{p/2}\Big) \lesssim n^{-p+1} + n^{-p/2} \lesssim n^{-p/2}.$$
(3.6)

If $1 \le p < 2$, we get

$$T_2 = E\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big|^p \le n^{-p}\Big(\sum_{i=1}^n E\big(\eta_i^2\big)\Big)^{p/2} = n^{-p/2}\big(E|\eta_i|^2\big)^{p/2} \lesssim n^{-p/2}.$$
(3.7)

By (3.5), (3.6) and (3.7), we obtain

$$E|\hat{\alpha}_{j,k} - \alpha_{j,k}|^p \lesssim T_1 + T_2 \lesssim n^{-p/2}.$$
(ii) The proof is similar to that of (i), so we omit it. □

Lemma 3.2 If $j\,2^j \le n$, then for any $\omega > 0$, there exists a constant $c > 0$ such that, with $\lambda = c\sqrt{j/n}$,

$$P\big(|\hat{\beta}_{jk} - \beta_{jk}| > \lambda\big) \lesssim 2^{-\omega j}.$$
(3.8)

Proof As in the proof of Lemma 3.1, we easily get

$$\hat{\mu} \le g_2, \qquad \mu \ge g_1, \qquad \frac{1}{\mu} \le g_1^{-1}, \qquad |\beta_{j,k}| \le A^{1/2}\|f\|_\infty \lesssim A^{1/2}\|f_X\|_{W_r^N}.$$

Therefore,

$$\begin{aligned} |\hat{\beta}_{j,k} - \beta_{j,k}| &= \Big|\frac{\hat{\mu}}{\mu}\Big(\frac{\mu}{n}\sum_{i=1}^n \frac{\psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k}\Big) + \hat{\mu}\beta_{j,k}\Big(\frac{1}{\mu} - \frac{1}{\hat{\mu}}\Big)\Big| \\ &\le \frac{g_2}{g_1}\Big|\frac{1}{n}\sum_{i=1}^n\Big(\mu\frac{\psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k}\Big)\Big| + g_2 A^{1/2}\|f_X\|_{W_r^N}\Big|\frac{1}{n}\sum_{i=1}^n\Big(\frac{1}{g(Y_i)} - \frac{1}{\mu}\Big)\Big| \\ &=: \frac{g_2}{g_1}\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big| + g_2 A^{1/2}\|f_X\|_{W_r^N}\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big|, \end{aligned}$$

where $\xi_i = \mu\frac{\psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k}$ and $\eta_i = \frac{1}{g(Y_i)} - \frac{1}{\mu}$. So we get

$$\begin{aligned} P\big(|\hat{\beta}_{j,k} - \beta_{j,k}| > \lambda\big) &\le P\Big(\frac{g_2}{g_1}\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big| + g_2 A^{1/2}\|f_X\|_{W_r^N}\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big| > \lambda\Big) \\ &\le P\Big(\frac{g_2}{g_1}\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big| > \frac{\lambda}{2}\Big) + P\Big(g_2 A^{1/2}\|f_X\|_{W_r^N}\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big| > \frac{\lambda}{2}\Big) \\ &= P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big| > \frac{\lambda g_1}{2 g_2}\Big) + P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big| > \frac{\lambda}{2 g_2 A^{1/2}\|f_X\|_{W_r^N}}\Big) =: P_1 + P_2. \end{aligned}$$

Now, we estimate $P_1$. Clearly, $E\xi_i = 0$, and

$$E\xi_i^2 = E\Big(\mu\frac{\psi_{j,k}(Y_i)}{g(Y_i)} - \beta_{j,k}\Big)^2 \le 2\Big(E\Big|\mu\frac{\psi_{j,k}(Y_i)}{g(Y_i)}\Big|^2 + \beta_{j,k}^2\Big) = 2\Big(E\Big[\Big|\frac{\mu\psi_{j,k}^2(Y_i)}{g(Y_i)}\Big|\frac{\mu}{g(Y_i)}\Big] + \beta_{j,k}^2\Big) \lesssim 2\Big(\frac{g_2}{g_1}\|f_X\|_{W_r^N} + A\|f_X\|_{W_r^N}^2\Big) =: \sigma^2.$$

Furthermore, we have

$$|\xi_i| = \Big|\mu\frac{\psi_{j,k}(Y_i)}{g(Y_i)} - E\Big(\mu\frac{\psi_{j,k}(Y_i)}{g(Y_i)}\Big)\Big| \le 2\cdot 2^{j/2}\,\frac{g_2}{g_1}\,\|\psi\|_\infty.$$

By Bernstein’s inequality, we obtain

$$\begin{aligned} P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big| > \frac{\lambda g_1}{2 g_2}\Big) &\le 2\exp\Big(-\frac{n\lambda^2 g_1^2/(4g_2^2)}{2\big(\sigma^2 + \frac{\lambda g_1}{2g_2}\cdot 2\cdot 2^{j/2}\frac{g_2}{g_1}\|\psi\|_\infty/3\big)}\Big) \\ &= 2\exp\Big(-\frac{c^2 j\, g_1^2/(4g_2^2)}{2\big(\sigma^2 + \sqrt{j2^j/n}\,\|\psi\|_\infty\, c/3\big)}\Big), \end{aligned}$$

where we substituted $\lambda = c\sqrt{j/n}$.

Since $j\,2^j \le n$, we then have

$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big| > \frac{\lambda g_1}{2 g_2}\Big) \le 2\exp\Big(-\frac{c^2 g_1^2/(4g_2^2)}{2(\sigma^2 + \|\psi\|_\infty c/3)}\, j\Big).$$

Taking $c_1 > 0$ such that $\frac{c_1^2 g_1^2/(4g_2^2)}{2(\sigma^2 + \|\psi\|_\infty c_1/3)} \ge \omega$, we obtain

$$P_1 = P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \xi_i\Big| > \frac{\lambda g_1}{2 g_2}\Big) \le 2e^{-\omega j} \le 2\cdot 2^{-\omega j} \lesssim 2^{-\omega j}.$$
(3.9)

Next, we estimate $P_2$. First, $E\eta_i = 0$:

$$E\eta_i = E\Big(\frac{1}{g(Y_i)}\Big) - \frac{1}{\mu} = \int \frac{1}{g(y)}\, f_Y(y)\,dy - \frac{1}{\mu} = \int \frac{1}{g(y)}\cdot\frac{g(y) f_X(y)}{\mu}\,dy - \frac{1}{\mu} = \frac{1}{\mu}\int f_X(y)\,dy - \frac{1}{\mu} = 0,$$

and

$$E\eta_i^2 = E\Big(\frac{1}{g(Y_i)} - \frac{1}{\mu}\Big)^2 \le 2\Big(E\Big|\frac{1}{g(Y_i)}\Big|^2 + \frac{1}{\mu^2}\Big) \le \frac{4}{g_1^2}, \qquad |\eta_i| = \Big|\frac{1}{g(Y_i)} - E\frac{1}{g(Y_i)}\Big| \le \frac{2}{g_1}.$$

By Bernstein’s inequality, we obtain

$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big| > \frac{\lambda}{2 g_2 A^{1/2}\|f_X\|_{W_r^N}}\Big) \le 2\exp\Big(-\frac{n\lambda^2/(4 g_2^2 A\|f_X\|_{W_r^N}^2)}{2\big(4g_1^{-2} + \lambda/(3 g_1 g_2 A^{1/2}\|f_X\|_{W_r^N})\big)}\Big) = 2\exp\Big(-\frac{c^2 j/(4 g_2^2 A\|f_X\|_{W_r^N}^2)}{2\big(4g_1^{-2} + c\sqrt{j/n}/(3 g_1 g_2 A^{1/2}\|f_X\|_{W_r^N})\big)}\Big).$$

Since $j \le n$, we then have

$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big| > \frac{\lambda}{2 g_2 A^{1/2}\|f_X\|_{W_r^N}}\Big) \le 2\exp\Big(-\frac{c^2/(4 g_2^2 A\|f_X\|_{W_r^N}^2)}{2\big(4g_1^{-2} + c/(3 g_1 g_2 A^{1/2}\|f_X\|_{W_r^N})\big)}\, j\Big).$$

Taking $c_2 > 0$ such that $\frac{c_2^2/(4 g_2^2 A\|f_X\|_{W_r^N}^2)}{2\big(4g_1^{-2} + c_2/(3 g_1 g_2 A^{1/2}\|f_X\|_{W_r^N})\big)} \ge \omega$, we have

$$P_2 = P\Big(\Big|\frac{1}{n}\sum_{i=1}^n \eta_i\Big| > \frac{\lambda}{2 g_2 A^{1/2}\|f_X\|_{W_r^N}}\Big) \le 2e^{-\omega j} \lesssim 2^{-\omega j}.$$
(3.10)

Taking $c = \max\{c_1, c_2\}$ and combining (3.9) and (3.10), we have

$$P\big(|\hat{\beta}_{j,k} - \beta_{j,k}| > \lambda\big) \le P_1 + P_2 \lesssim 2^{-\omega j}.$$

 □

Lemma 3.3 Suppose that there exist two constants $g_1$ and $g_2$ such that $0 < g_1 \le g(x) \le g_2 < \infty$ for $x \in \mathbb{R}$, and let $\hat{\beta}_{jk}$, $\hat{\beta}^*_{jk}$ be given by (3.1). Then

$$E\Big\|\sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4}\big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases}$$

where $N' = N - 1/r + 1/p$ and $c_3$, $c_4$ are constants.

Proof By the triangle inequality and Lemma 2.2, we obtain

$$E\Big\|\sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p \le E\sum_{j=j_0}^{j_1}\Big\|\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p \lesssim E\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\hat{\beta}^*_{jk} - \beta_{jk}|^p\Big)^{\frac1p}.$$

Furthermore, since $\hat{\beta}^*_{jk} = \hat{\beta}_{jk}\, I\{|\hat{\beta}_{jk}| \ge \lambda\}$, we have

$$\begin{aligned} E\Big\|\sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p \lesssim{}& E\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\hat{\beta}_{jk} - \beta_{jk}|^p\big(I\{|\hat{\beta}_{jk}| \ge \lambda,\, |\beta_{jk}| \ge \tfrac{\lambda}{2}\} + I\{|\hat{\beta}_{jk}| \ge \lambda,\, |\beta_{jk}| < \tfrac{\lambda}{2}\}\big)\Big)^{\frac1p} \\ &+ E\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\beta_{jk}|^p\big(I\{|\hat{\beta}_{jk}| < \lambda,\, |\beta_{jk}| \le 2\lambda\} + I\{|\hat{\beta}_{jk}| < \lambda,\, |\beta_{jk}| > 2\lambda\}\big)\Big)^{\frac1p}. \end{aligned}$$

Note that

$$I\{|\hat{\beta}_{jk}| \ge \lambda,\, |\beta_{jk}| < \tfrac{\lambda}{2}\} \le I\{|\hat{\beta}_{jk} - \beta_{jk}| > \tfrac{\lambda}{2}\}, \qquad I\{|\hat{\beta}_{jk}| < \lambda,\, |\beta_{jk}| > 2\lambda\} \le I\{|\hat{\beta}_{jk} - \beta_{jk}| > \tfrac{\lambda}{2}\},$$

and that if $|\hat{\beta}_{jk}| < \lambda$ and $|\beta_{jk}| > 2\lambda$, then $|\hat{\beta}_{jk} - \beta_{jk}| \ge |\beta_{jk}| - |\hat{\beta}_{jk}| > \frac{|\beta_{jk}|}{2}$, i.e., $|\beta_{jk}| < 2|\hat{\beta}_{jk} - \beta_{jk}|$. Therefore, we have

$$\begin{aligned} E\Big\|\sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p \lesssim{}& E\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\hat{\beta}_{jk} - \beta_{jk}|^p\, I\{|\beta_{jk}| \ge \tfrac{\lambda}{2}\}\Big)^{\frac1p} \\ &+ E\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\hat{\beta}_{jk} - \beta_{jk}|^p\, I\{|\hat{\beta}_{jk} - \beta_{jk}| > \tfrac{\lambda}{2}\}\Big)^{\frac1p} \\ &+ \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\beta_{jk}|^p\, I\{|\beta_{jk}| \le 2\lambda\}\Big)^{\frac1p} =: W_1 + W_2 + W_3. \end{aligned}$$
(i) Firstly, we estimate $W_1$.

By Lemma 3.1, we have

$$E|\hat{\beta}_{jk} - \beta_{jk}|^p \lesssim n^{-\frac{p}{2}}.$$

Using $I\{|\beta_{jk}| \ge \frac{\lambda}{2}\} \le \big(\frac{|\beta_{jk}|}{\lambda/2}\big)^r$ and Jensen's inequality, we obtain

$$W_1 = E\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\hat{\beta}_{jk} - \beta_{jk}|^p\, I\{|\beta_{jk}| \ge \tfrac{\lambda}{2}\}\Big)^{\frac1p} \le \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k E|\hat{\beta}_{jk} - \beta_{jk}|^p\Big(\frac{|\beta_{jk}|}{\lambda/2}\Big)^r\Big)^{\frac1p} \lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\, n^{-\frac12}\, \lambda^{-\frac{r}{p}}\, \|\beta_j\|_r^{\frac{r}{p}}.$$

Since $f_X \in W_r^N(\mathbb{R})$, we have $\|\beta_j\|_r := \big(\sum_k |\beta_{jk}|^r\big)^{\frac1r} \lesssim 2^{-j(N+\frac12-\frac1r)}$; moreover, $\lambda = c\sqrt{j/n} \gtrsim \sqrt{\ln n/n}$ for $j \ge j_0$. Hence

$$W_1 \lesssim \sum_{j=j_0}^{j_1} n^{-\frac12}\, 2^{j(\frac12-\frac1p)}\, 2^{-j(N+\frac12-\frac1r)\frac{r}{p}}\Big(\frac{n}{\ln n}\Big)^{\frac{r}{2p}} = n^{-\frac{p-r}{2p}}(\ln n)^{-\frac{r}{2p}}\sum_{j=j_0}^{j_1} 2^{-j\xi} \lesssim n^{-\frac{p-r}{2p}}(\ln n)^{-\frac{r}{2p}}\big(2^{-j_0\xi}\, I\{\xi>0\} + 2^{-j_1\xi}\, I\{\xi<0\} + (j_1-j_0+1)\, I\{\xi=0\}\big),$$

where $\xi := \frac12\big(\frac{r}{p}(2N+1)-1\big)$.

Using (3.2) and (3.3), we obtain

$$W_1 \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4}\big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases}$$
(3.11)

where $c_3$, $c_4$ are constants.

(ii) Next, we estimate $W_3$. Using $I\{|\beta_{jk}| \le 2\lambda\} \le \big(\frac{2\lambda}{|\beta_{jk}|}\big)^{p-r}$ (recall $r < p$), we have

$$W_3 \le \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\beta_{jk}|^p\Big(\frac{2\lambda}{|\beta_{jk}|}\Big)^{p-r}\Big)^{\frac1p} = \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}(2\lambda)^{\frac{p-r}{p}}\Big(\sum_k |\beta_{jk}|^r\Big)^{\frac1p} = \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}(2\lambda)^{\frac{p-r}{p}}\, \|\beta_j\|_r^{\frac{r}{p}}.$$

Since $f_X \in W_r^N(\mathbb{R})$, again $\|\beta_j\|_r \lesssim 2^{-j(N+\frac12-\frac1r)}$. Taking $\lambda = c\sqrt{j/n} \lesssim \sqrt{\ln n/n}$ and $j_1 - j_0 \lesssim \ln n$, we have

$$W_3 \lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\, \lambda^{\frac{p-r}{p}}\, 2^{-j(N+\frac12-\frac1r)\frac{r}{p}} \lesssim \Big(\frac{\ln n}{n}\Big)^{\frac{p-r}{2p}}\sum_{j=j_0}^{j_1} 2^{-\frac{j}{2}[\frac{r}{p}(2N+1)-1]} = \Big(\frac{\ln n}{n}\Big)^{\frac{p-r}{2p}}\sum_{j=j_0}^{j_1} 2^{-j\xi} \lesssim \Big(\frac{\ln n}{n}\Big)^{\frac{p-r}{2p}}\big(2^{-j_0\xi}\, I\{\xi>0\} + 2^{-j_1\xi}\, I\{\xi<0\} + (j_1-j_0+1)\, I\{\xi=0\}\big).$$

Note that $\xi > 0$ if and only if $r > \frac{p}{2N+1}$. When $\xi = 0$, i.e., $p = r(2N+1)$, one computes $\frac{N'}{2(N-1/r)+1} = \frac{p-r}{2p}$. Using (3.2) and (3.3), we obtain

$$W_3 \lesssim \begin{cases} \big(\frac{\ln n}{n}\big)^{\frac{p-r}{2p}}\, 2^{-j_0\xi} = (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{p-r}{2p}}(j_1-j_0+1) \lesssim (\ln n)\big(\frac{\ln n}{n}\big)^{\frac{p-r}{2p}}, & r = \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{p-r}{2p}}\, 2^{-j_1\xi} = \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}. \end{cases}$$
(3.12)
(iii) Finally, we estimate $W_2$.

Let $1 < q', q < \infty$ with $\frac{1}{q'} + \frac{1}{q} = 1$. Using Jensen's inequality and Hölder's inequality, we have

$$\begin{aligned} W_2 &= E\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k |\hat{\beta}_{jk} - \beta_{jk}|^p\, I\{|\hat{\beta}_{jk} - \beta_{jk}| > \tfrac{\lambda}{2}\}\Big)^{\frac1p} \\ &\le \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k E\big(|\hat{\beta}_{jk} - \beta_{jk}|^p\, I\{|\hat{\beta}_{jk} - \beta_{jk}| > \tfrac{\lambda}{2}\}\big)\Big)^{\frac1p} \\ &\le \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(\sum_k \big(E|\hat{\beta}_{jk} - \beta_{jk}|^{q'p}\big)^{\frac1{q'}}\Big(P\big(|\hat{\beta}_{jk} - \beta_{jk}| > \tfrac{\lambda}{2}\big)\Big)^{\frac1q}\Big)^{\frac1p}. \end{aligned}$$

By Lemma 3.1 and Lemma 3.2, we obtain

$$W_2 \lesssim \sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac1p)}\Big(2^j\, n^{-\frac{p}{2}}\, 2^{-\frac{\omega j}{q}}\Big)^{\frac1p} = n^{-\frac12}\sum_{j=j_0}^{j_1} 2^{j(\frac12-\frac{\omega}{pq})}.$$

Taking $\omega$ large enough that $\frac12 < \frac{\omega}{pq}$, we get

$$W_2 \lesssim n^{-\frac12}\, 2^{j_0(\frac12-\frac{\omega}{pq})} \le n^{-\frac12}\, 2^{\frac{j_0}{2}}.$$

Taking $2^{j_0}$ as in (3.2), we have

$$W_2 \lesssim \begin{cases} (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ n^{-\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}. \end{cases}$$
(3.13)

Putting (3.11), (3.12) and (3.13) together, we obtain

$$E\Big\|\sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4}\big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases}$$

where $c_3$, $c_4$ are constants. □

Theorem 3.4 Let the scaling function $\varphi(x)$ be orthonormal, compactly supported, and $(N+1)$-regular, and suppose there exist two positive constants $g_1$ and $g_2$ such that $g_1 \le g(x) \le g_2$ for $x \in \mathbb{R}$. If $\hat{f}_n^{X,\mathrm{non}}$ is the nonlinear wavelet estimator in (3.1) and assumptions (3.2), (3.3), and (3.4) are satisfied, then for any $f_X \in \widetilde{W}_r^N(A,L)$ with $1 \le r < p < \infty$ and $N > \frac1r$, we have

$$\sup_{f_X \in \widetilde{W}_r^N(A,L)} E\|\hat{f}_n^{X,\mathrm{non}} - f_X\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4}\big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}, \end{cases}$$

where $N' = N - 1/r + 1/p$ and $c_3$, $c_4$ are constants.

Proof By the definition of $\hat{f}_n^{X,\mathrm{non}}$ in (3.1) and the expansion of $f_X$ in (2.1), one has

$$\hat{f}_n^{X,\mathrm{non}} - f_X = \sum_k (\hat{\alpha}_{j_0 k} - \alpha_{j_0 k})\varphi_{j_0 k} + \sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk} + P_{j_1+1} f_X - f_X.$$

Then

$$E\|\hat{f}_n^{X,\mathrm{non}} - f_X\|_p \le E\Big\|\sum_k (\hat{\alpha}_{j_0 k} - \alpha_{j_0 k})\varphi_{j_0 k}\Big\|_p + E\Big\|\sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p + \|P_{j_1+1} f_X - f_X\|_p =: I_1 + I_2 + I_3.$$

Firstly, we estimate $I_1$.

By Lemma 2.2 and Jensen's inequality,

$$I_1 \lesssim 2^{j_0(\frac12-\frac1p)}\, E\Big(\sum_k |\hat{\alpha}_{j_0 k} - \alpha_{j_0 k}|^p\Big)^{\frac1p} \le 2^{j_0(\frac12-\frac1p)}\Big(\sum_k E|\hat{\alpha}_{j_0 k} - \alpha_{j_0 k}|^p\Big)^{\frac1p}.$$

Since $f_X(x)$ and $\varphi(x)$ are compactly supported, the number of elements of $\{k : \alpha_{j_0 k} \ne 0\}$ is $O(2^{j_0})$. By Lemma 3.1, we have $E|\hat{\alpha}_{j_0 k} - \alpha_{j_0 k}|^p \lesssim n^{-\frac{p}{2}}$.

Therefore,

$$I_1 \lesssim 2^{j_0(\frac12-\frac1p)}\big(2^{j_0}\, n^{-\frac{p}{2}}\big)^{\frac1p} = n^{-\frac12}\, 2^{\frac{j_0}{2}}.$$

Using (3.2), we have

$$I_1 \lesssim \begin{cases} (\ln n)^{\frac{p-r}{2r(2N+1)}}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ n^{-\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}, \end{cases}$$
(3.14)

where $N' = N - \frac1r + \frac1p$.

Next, we estimate $I_3$.

It is shown in [9] that if the scaling function $\varphi(x)$ is orthonormal, compactly supported, and $(N+1)$-regular, then the associated kernel function $K(x,y) := \sum_k \varphi(x-k)\varphi(y-k)$ satisfies Conditions $H(N+1)$ and $M(N)$, and $K_j f(x) = P_j f(x)$.

By the embeddings $\widetilde{W}_r^N \hookrightarrow \widetilde{B}_{r,\infty}^N \hookrightarrow \widetilde{B}_{p,\infty}^{N'}$ of Lemma 2.1, where $N' = N - \frac1r + \frac1p$, we have $f_X \in \widetilde{B}_{p,\infty}^{N'}$. By Lemma 2.3,

$$\|P_{j_1+1} f_X - f_X\|_p \lesssim 2^{-j_1 N'}.$$

Taking $2^{j_1}$ as in (3.3), we have

$$I_3 \lesssim \begin{cases} n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}. \end{cases}$$
(3.15)

Finally, we estimate $I_2$. Using Lemma 3.3, we obtain

$$I_2 = E\Big\|\sum_{j=j_0}^{j_1}\sum_k (\hat{\beta}^*_{jk} - \beta_{jk})\psi_{jk}\Big\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4}\big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}. \end{cases}$$
(3.16)

By (3.14), (3.15) and (3.16), we obtain

$$\sup_{f_X \in \widetilde{W}_r^N(A,L)} E\|\hat{f}_n^{X,\mathrm{non}} - f_X\|_p \lesssim \begin{cases} (\ln n)^{c_3}\, n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ (\ln n)^{c_4}\big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r = \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r < \frac{p}{2N+1}. \end{cases}$$

 □

4 Optimality

Now, we discuss the optimality of the rates of convergence. Using techniques similar to those in [10], we can obtain the following lower bound theorem.

Theorem 4.1 Let the scaling function $\varphi(x)$ be orthonormal, compactly supported, and $(N+1)$-regular, and let $f_X \in \widetilde{W}_r^N(A,L)$. If there exist two positive constants $g_1$ and $g_2$ such that $g_1 \le g(x) \le g_2$ for $x \in \mathbb{R}$, then for any estimator $\hat{f}_n^X$, we have

$$\inf_{\hat{f}_n^X}\ \sup_{f_X \in \widetilde{W}_r^N(A,L)} E\|\hat{f}_n^X - f_X\|_p \gtrsim \begin{cases} n^{-\frac{N}{2N+1}}, & r > \frac{p}{2N+1}, \\ \big(\frac{\ln n}{n}\big)^{\frac{N'}{2(N-1/r)+1}}, & r \le \frac{p}{2N+1}, \end{cases}$$

where $1 \le r, p < \infty$, $N > \frac1r$, and $N' = N - 1/r + 1/p$.

Remark The proof is very similar to that in [10], where the author studied the lower bound of the convergence rates in Besov spaces for unbiased samples.

According to Theorem 4.1, we can see that:

(i) When $r < \frac{p}{2N+1}$, our nonlinear estimator attains the optimal rate.

(ii) When $r = \frac{p}{2N+1}$, our convergence rate differs from the optimal rate by a logarithmic factor, so it is sub-optimal.

(iii) When $r > \frac{p}{2N+1}$, the logarithmic factor is an extra penalty for the chosen wavelet thresholding, and our convergence rate is sub-optimal.