1 Introduction

Consider the problem of estimating an underlying regression function from a set of noisy data. Nonparametric regression provides a natural framework for this problem and has the following standard form:

$$ y_{i}=g(t_{i})+\epsilon _{i}, \quad i=1,\ldots ,n, $$
(1.1)

where the \(y_{i}\) are the noisy samples of an unknown function \(g(\cdot )\) defined on \([0,1]\), \(\{t_{i}\}\) are non-random design points with \(0\leq t_{1}\leq\cdots \leq t_{n}\leq 1\), and the \(\epsilon _{i}\) are i.i.d. random errors with mean zero.
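
To fix ideas, here is a minimal simulation sketch of model (1.1) in Python; it is not part of the original analysis, and the piecewise-constant test signal and the Student-t errors are illustrative choices meant to mimic an inhomogeneous curve observed with heavy-tailed noise.

```python
import numpy as np

# Sketch: simulate data from model (1.1), y_i = g(t_i) + eps_i on [0, 1].
# The blocky signal g and the t(2) errors are illustrative assumptions.
rng = np.random.default_rng(0)

def g(t):
    # An inhomogeneous, piecewise-constant signal with jumps.
    t = np.asarray(t, dtype=float)
    return 2.0 * (t < 0.3) - 1.0 * ((t >= 0.3) & (t < 0.7)) + 0.5 * (t >= 0.7)

n = 512
t = (np.arange(1, n + 1) - 0.5) / n    # fixed design points in [0, 1]
eps = rng.standard_t(df=2, size=n)     # i.i.d. heavy-tailed errors with mean and median 0
y = g(t) + eps                         # noisy samples
```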

Nonparametric regression is a classic smoothing technique for recovering a signal function from data without imposing strong prior restrictions on its form [1, 2]. There is an extensive literature on nonparametric regression. Most methods developed so far estimate the mean regression function by least squares (\(L_{2}\)). For example, Gasser and Müller [3] proposed a kernel estimator based on the Gasser–Müller kernel weights; Fan [4, 5] added further insights to the local linear method for mean regression; and Braun and Huang [6] proposed kernel spline regression by replacing the polynomial approximation in local polynomial kernel regression with a spline basis. Least-squares methods have attractive properties under Gaussian errors, but they are highly sensitive to extreme outliers and perform poorly when the errors have a heavy-tailed distribution. More robust estimation methods are therefore required. Local median and M-estimators have been studied; see, for example, [7–13]. See also [14, 15] for more details on quantile regression and robust estimation, respectively. As pointed out in [9], among the many robust estimation methods, the \(L_{1}\) method based on least absolute deviations behaves quite well: it downweights outliers, yields unique solutions, and has no transition point in the influence function (such as the additional parameter c in Huber’s \(\rho _{c}\) function). The above methods basically require the unknown function g to be highly smooth, a condition that may fail in practice: objects arising in areas such as signal and image processing are frequently inhomogeneous. In this paper, we use the wavelet technique to recover the signal function g, combined with the \(L_{1}\) method to handle the robust case.

We aim to study the asymptotic properties of the \(L_{1}\)-wavelet estimator for the nonparametric model (1.1). Wavelet techniques, owing to their ability to adapt to local features of curves, have received a lot of attention and have been used to estimate nonparametric curves; see, for example, [16–19]. Wavelet methods are attractive because of their computational ease and their minimax results over very wide classes of function spaces for the signal function g. For linear wavelet smoothing, Antoniadis et al. [16] is a key reference; it introduces wavelet versions of some classical kernel and orthogonal series estimators and studies their asymptotic properties, such as mean-square consistency, bias, variance and asymptotic normality. Huang [20] also derived the asymptotic bias and variance of the wavelet density estimator via wavelet-based reproducing kernels. Zhou and You [21] constructed wavelet estimators for varying-coefficient partially linear regression models and established their asymptotic normality and convergence rates. For varying-coefficient models, the convergence rate and asymptotic normality of wavelet estimators were considered in [22, 23], which also provided the asymptotic bias and variance of the wavelet estimator of the regression function under a mixing stochastic process. Recently, Chesneau et al. [24] proposed nonparametric wavelet estimators of the quantile density function and established their consistency. Li and Xiao [25] considered a wavelet estimator of the mean regression function with strong mixing errors and investigated its asymptotic rate of convergence by thresholding the empirical wavelet coefficients. Berry–Esseen type bounds for wavelet estimators in semiparametric regression models were studied in [26, 27]. To the best of our knowledge, no study of \(L_{1}\)-wavelet estimators has been reported for the nonparametric model (1.1). For this model, the estimation procedure should be combined with the special features of the model.

In this paper, we develop an \(L_{1}\)-wavelet method for the nonparametric regression model (1.1), adopting wavelets to detect and represent localized features of the signal function g and applying the \(L_{1}\) criterion to yield better recovery in the presence of outliers or heavy-tailed data. The advantage of the \(L_{1}\)-wavelet method is that it avoids the restrictive smoothness requirements on the nonparametric function imposed by traditional smoothing approaches, such as kernel and local polynomial methods, while robustifying the usual mean regression. Finally, we investigate the asymptotic properties of the \(L_{1}\)-wavelet estimator, including the Bahadur representation, the rate of convergence and asymptotic normality.

The paper is organized as follows. In Sect. 2, we provide some necessary background on wavelets and develop the \(L_{1}\)-wavelet estimation for model (1.1). Asymptotic properties of the \(L_{1}\)-wavelet estimators are presented in Sect. 3. Technical proofs are deferred to Sect. 4.

2 \(L_{1}\)-Wavelet estimation

Wavelet analysis requires a description of two related and suitably chosen orthonormal basis functions: the scaling function ϕ and the wavelet ψ. A wavelet system is generated by dilation and translation of ϕ and ψ through

$$ \phi _{m,k}(t)=2^{m/2}\phi \bigl(2^{m}t-k \bigr),\qquad \psi _{m,k}(t)=2^{m/2}\psi \bigl(2^{m}t-k \bigr),\quad m,k\in \mathbb{Z}. $$
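
As a concrete illustration (not taken from the paper), the following sketch builds the dilated and translated family from the Haar scaling function and wavelet, an assumed choice of ϕ and ψ.

```python
import numpy as np

# Sketch of the dilation/translation family phi_{m,k}, psi_{m,k} for the Haar
# pair (an illustrative choice of phi and psi).
def phi(t):
    # Haar scaling function: the indicator of [0, 1).
    t = np.asarray(t, dtype=float)
    return ((t >= 0.0) & (t < 1.0)).astype(float)

def psi(t):
    # Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1).
    return phi(2.0 * np.asarray(t)) - phi(2.0 * np.asarray(t) - 1.0)

def dilate_translate(f, m, k):
    # Returns the function t -> 2^{m/2} f(2^m t - k).
    return lambda t: 2.0 ** (m / 2.0) * f(2.0 ** m * np.asarray(t) - k)

phi_32 = dilate_translate(phi, m=3, k=2)   # phi_{3,2}, supported on [1/4, 3/8)
```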

A multiresolution analysis of \(\mathcal{L}^{2}(\mathbb{R})\) consists of a nested sequence of closed subspaces \(V_{m}\), \(m\in \mathbb{Z}\), of \(\mathcal{L}^{2}(\mathbb{R})\),

$$ \cdots \subset V_{-2}\subset V_{-1}\subset V_{0}\subset V_{1}\subset V_{2} \subset \cdots , $$

where \(\mathcal{L}^{2}(\mathbb{R})\) is the space of square-integrable functions on the real line. Since \(\{\phi (\cdot -k), k\in \mathbb{Z}\}\) is an orthonormal family in \(\mathcal{L}^{2}(\mathbb{R})\) and \(V_{0}\) is the subspace it spans, \(\{\phi _{0,k}, k\in \mathbb{Z}\}\) and \(\{\phi _{m,k}, k\in \mathbb{Z}\}\) are orthonormal bases of \(V_{0}\) and \(V_{m}\), respectively. From the Moore–Aronszajn theorem [28], it follows that

$$ E(t,s)=\sum_{k}\phi (t-k)\phi (s-k) $$

is a reproducing kernel of \(V_{0}\). By self-similarity of multiresolution subspaces,

$$ E_{m}(t,s)=2^{m}E\bigl(2^{m}t,2^{m}s \bigr) $$

is a reproducing kernel of \(V_{m}\). Thus, the projection of g on the space \(V_{m}\) is given by

$$ \mathbb{P} _{V_{m}}g(t)= \int 2^{m}E\bigl(2^{m}t,2^{m}s \bigr)g(s)\,ds. $$
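
For intuition, here is a small sketch (illustrative, Haar case only, not from the paper): with the Haar scaling function, \(E(t,s)=\sum_{k}\phi (t-k)\phi (s-k)\) equals 1 exactly when t and s lie in the same unit interval, so \(E_{m}(t,s)=2^{m}\) when t and s share the same dyadic interval of length \(2^{-m}\), and \(\mathbb{P}_{V_{m}}g(t)\) is simply the average of g over that interval.

```python
import numpy as np

# Sketch: Haar reproducing kernel E_m and the projection of g onto V_m.
# For Haar, E_m(t, s) = 2^m * 1{floor(2^m t) == floor(2^m s)}.
def E_m(t, s, m):
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    return 2.0 ** m * (np.floor(2.0 ** m * t) == np.floor(2.0 ** m * s))

def project_Vm(g, t, m, n_grid=4096):
    # Numerical approximation of P_{V_m} g(t) = int_0^1 E_m(t, s) g(s) ds;
    # for Haar this is the average of g over the dyadic interval containing t.
    s = (np.arange(n_grid) + 0.5) / n_grid          # midpoint grid on [0, 1]
    return float(np.mean(E_m(t, s, m) * g(s)))      # midpoint-rule integral
```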

This motivates us to define an \(L_{1}\)-wavelet estimator of g by

$$ \hat{g}(t)=\operatorname{argmin}_{a}\sum _{i=1}^{n} \vert y_{i}-a \vert \int _{A_{i}}E_{m}(t,s)\,ds, $$
(2.1)

where \(A_{i}\) are intervals that partition \([0,1]\), so that \(t_{i}\in A_{i}\). One way of defining the intervals \(A_{i}=[s_{i-1},s_{i})\) is by taking \(s_{0}=0\), \(s_{n}=1\), and \(s_{i}=(t_{i}+t_{i+1})/2\), \(i=1,\ldots ,n-1\).
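
The weights \(w_{i}(t)=\int _{A_{i}}E_{m}(t,s)\,ds\) and the minimization in (2.1) can be computed directly. The sketch below is an illustration for the Haar kernel (an assumed choice, for which the weights are nonnegative and available in closed form); it evaluates the weighted-\(L_{1}\) objective at the candidate values \(a=y_{j}\), since for nonnegative weights the minimizer of a weighted sum of absolute deviations can always be taken among the observed \(y_{i}\) (a weighted median).

```python
import numpy as np

# Sketch of estimator (2.1) with the Haar kernel.  The weight of observation i
# is w_i(t) = int_{A_i} E_m(t, s) ds = 2^m * |A_i ∩ I_m(t)|, where I_m(t) is
# the dyadic interval of length 2^{-m} containing t.
def haar_weights(t, t_design, m):
    # Interval endpoints: s_0 = 0, s_i = (t_i + t_{i+1})/2, s_n = 1.
    s = np.concatenate(([0.0], (t_design[:-1] + t_design[1:]) / 2.0, [1.0]))
    lo = np.floor(2.0 ** m * t) / 2.0 ** m
    hi = lo + 2.0 ** (-m)
    overlap = np.maximum(0.0, np.minimum(s[1:], hi) - np.maximum(s[:-1], lo))
    return 2.0 ** m * overlap                      # w_1(t), ..., w_n(t)

def l1_wavelet_fit(t, t_design, y, m):
    # g_hat(t) = argmin_a sum_i w_i(t)|y_i - a|, searched over a in {y_1, ..., y_n}.
    w = haar_weights(t, t_design, m)
    objective = np.abs(y[None, :] - y[:, None]) @ w   # objective[j] = sum_i w_i |y_i - y_j|
    return y[np.argmin(objective)]
```

For instance, with the simulated `t` and `y` from the sketch in the Introduction, `g_hat = np.array([l1_wavelet_fit(tt, t, y, m=5) for tt in t])` evaluates the fit at the design points.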

For the ith sample point, let \(e_{i}=y_{i}-a\) and define \(e_{i}^{+}\) and \(e_{i}^{-}\) to be the positive and negative parts of \(e_{i}\). Then, with the noisy samples, problem (2.1) can be reduced to the following linear program:

$$\begin{aligned} \mbox{Minimize }&\quad \sum_{i=1}^{n} \bigl(e_{i}^{+}+e_{i}^{-}\bigr) \int _{A_{i}}E_{m}(t,s)\,ds, \\ \mbox{Subject to }&\quad \boldsymbol {a}+\boldsymbol {e}^{+}-\boldsymbol {e}^{-}= \boldsymbol {y}, \\ &\quad \boldsymbol {e}^{+}, \boldsymbol {e}^{-}\geq 0, \end{aligned}$$

where \(\boldsymbol {a}=a\mathbf{1}_{n}\), \(\mathbf{1}_{n}\) is the n-dimensional vector whose components are all 1, \(\boldsymbol {e}^{+}=(e_{1}^{+},\ldots ,e_{n}^{+})^{T}\), \(\boldsymbol {e}^{-}=(e_{1}^{-},\ldots ,e_{n}^{-})^{T}\) and \(\boldsymbol {y}=(y_{1},\ldots , y_{n})^{T}\). In addition, \(\int _{A_{i}}E_{m}(t,s)\,ds\) can be calculated by the cascade algorithm of [16]. Thus, the \(L_{1}\)-wavelet estimator can be obtained easily. This linear program is used only for computing the estimator; to establish the asymptotic properties, we work directly with (2.1).
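
For completeness, here is a sketch of this linear program using `scipy.optimize.linprog` (an assumed external solver, not mentioned in the paper); it takes the weights \(w_{i}=\int _{A_{i}}E_{m}(t,s)\,ds\) as input and assumes they are nonnegative, as in the Haar case above.

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of the LP form of (2.1).  Decision variables: (a, e^+, e^-) in R^{1+2n}.
def l1_fit_lp(y, w):
    n = len(y)
    c = np.concatenate(([0.0], w, w))                           # minimize sum_i w_i (e_i^+ + e_i^-)
    A_eq = np.hstack([np.ones((n, 1)), np.eye(n), -np.eye(n)])  # a*1 + e^+ - e^- = y
    bounds = [(None, None)] + [(0.0, None)] * (2 * n)           # a free; e^+, e^- >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[0]                                             # the minimizer a, i.e. g_hat(t)
```

For nonnegative weights this agrees (up to ties) with the direct weighted-median search sketched after (2.1).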

3 Asymptotic properties

We begin with the following assumptions required to derive the asymptotic properties of the proposed estimator in Sect. 2.

  1. (A1)

    The random errors \(\epsilon _{i}\) are i.i.d. with median 0 and a density \(f_{\epsilon }\) that is continuous and positive in a neighborhood of 0.

  2. (A2)

    g belongs to the Sobolev space \(\mathcal{H}^{\nu }(\mathbb{R})\) with order \(\nu >1/2\).

  3. (A3)

    g satisfies a Lipschitz condition of order \(\gamma >0\).

  4. (A4)

    ϕ has compact support, belongs to the Schwartz space of order \(l>\nu \), and satisfies a Lipschitz condition of order l. Furthermore, \(|\hat{\phi }(\xi )-1|=O(\xi )\) as \(\xi \rightarrow 0\), where ϕ̂ is the Fourier transform of ϕ.

  5. (A5)

    \(\max_{i}|t_{i}-t_{i-1}|=O(n^{-1})\).

  6. (A6)

    We also assume that, for some Lipschitz function \(\kappa (\cdot )\),

    $$ \rho (n)=\max_{i} \biggl\vert s_{i}-s_{i-1}- \frac{\kappa (s_{i})}{n} \biggr\vert =o\bigl(n^{-1}\bigr). $$
  7. (A7)

    (i) \(n2^{-m}\rightarrow \infty \); (ii) \(2^{m}=O(n^{1-2p})\), where \(1/(2+\delta )\leq p\leq 1/2\) for some \(\delta >0\); (iii) let \(v^{*}=\min (3/2,\nu ,\gamma +1/2)-\epsilon _{1}\), where \(\epsilon _{1}=0\) for \(\nu \neq 3/2\) and \(\epsilon _{1}>0\) for \(\nu =3/2\), and assume that \(n2^{-2mv^{*}}\rightarrow 0\).

Remark 3.1

The above conditions are mild and easily satisfied. (A1) is crucial to the asymptotic behavior of \(\hat{g}(\cdot )\) based on the \(L_{1}\) estimation (2.1). (A2)–(A6) and (A7)(i), (iii) have been used in [16]. Note that, if \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) with \(\nu >3/2\), then g is continuously differentiable; thus (A3) is redundant when \(\nu >3/2\), and (A2) is weaker than the usual smoothness assumptions. For (A7), m acts as a tuning parameter, playing the role that the bandwidth does for standard kernel smoothers; (A7) is a standard assumption for the asymptotic analysis. For example, taking \(\delta =2\) and \(p=1/4\) gives \(2^{m}=O(n^{1/2})\), so (i) holds; if, in addition, \(\nu >1\) and \(\gamma >1/2\), then (iii) holds.
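
As a sketch of this tuning choice (illustrative only, not a data-driven rule), one can set the resolution level so that \(2^{m}\) is of order \(n^{1-2p}\):

```python
import numpy as np

# Sketch: pick m so that 2^m is of order n^{1-2p} (cf. Remark 3.1); with
# p = 1/4 this gives 2^m of order sqrt(n).
def resolution_level(n, p=0.25):
    return int(np.round((1.0 - 2.0 * p) * np.log2(n)))

# Example: resolution_level(1024) == 5, i.e. 2^5 = 32 = sqrt(1024).
```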

Our results are as follows.

Theorem 3.1

  1. (i)

    (Bahadur representation) Suppose that (A1)–(A5) and (A7)(i) hold. Then

    $$ \hat{g}(t)=g(t)-\frac{1}{2}f_{\epsilon }^{-1}(0)\sum _{i=1}^{n} \operatorname{sign}(\epsilon _{i}) \int _{A_{i}}E_{m}(t,s)\,ds +R_{n}(m; \gamma ,\nu ), $$

    with

    $$ R_{n}(m;\gamma ,\nu )=O_{p} \biggl\{ n^{-\gamma }+\eta _{m}+\sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \biggr\} , $$

    where \(\operatorname{sign}(\cdot )\) is the sign function and \(\eta _{m}\) is defined in Lemma 4.2 below.

  2. (ii)

    (Rate of convergence) Assume that (A1)–(A5) and (A7)(ii) hold. Then

    $$ \sup_{t\in [0,1]} \bigl\vert \hat{g}(t)-g(t) \bigr\vert =O_{p} \biggl\{ \sqrt{ \frac{2^{m}}{n}}\log n+n^{-\gamma }+\eta _{m} \biggr\} . $$

Remark 3.2

Theorem 3.1(i) gives the Bahadur representation of the \(L_{1}\)-wavelet estimator for a nonparametric model. For \(1/2<\nu <3/2\), \(\eta _{m}\) yields a slower rate of convergence than for \(\nu \geq 3/2\); moreover, \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) need not be differentiable when \(1/2<\nu <3/2\). If we take \(2^{m}=O(n^{1-\gamma })\) and \(\nu =(\gamma +1)/[2(1-\gamma )]\) with \(0<\gamma <1/2\), then \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) (\(1/2<\nu <3/2\)) and \(R_{n}(m;\gamma ,\nu )=O_{p}(n^{-\gamma })\) (\(0<\gamma <1/2\)), which implies that \(R_{n}(m;\gamma ,\nu )\) can have an order arbitrarily close to \(O_{p}(n^{-1/2})\). This is comparable with the Bahadur order \(o_{p}((nh)^{-1/2})\) for kernel-weighted local polynomial estimation [29], where the bandwidth \(h\rightarrow 0\) and the regression function is required to be twice differentiable. For example, the triangular function \(\Lambda (t)=(1- \vert t \vert )_{+}\), whose Fourier transform is \(\sin ^{2}(\xi /2)/(\xi /2)^{2}\), belongs to \(\mathcal{H}^{1}(\mathbb{R})\) and is Lipschitz of order 1, so it satisfies our conditions on g although it is not differentiable. Such a function is not covered by previous studies.

Remark 3.3

Theorem 3.1(ii) states the rate of convergence of the \(L_{1}\)-wavelet estimator for a nonparametric model. As in Remark 3.2, we consider the slower-rate case of \(\eta _{m}\), i.e., \(1/2<\nu <3/2\). If we take \(2^{m}=O(n^{\gamma })\) (\(\gamma \geq 1/3\)), then \(\sup_{t\in [0,1]}|\hat{g}(t)-g(t)|=O_{p} (n^{-(1-\gamma )/2} \log n )\). Furthermore, taking \(\gamma =1/3\), one gets

$$ \sup_{t\in [0,1]} \bigl\vert \hat{g}(t)-g(t) \bigr\vert =O_{p} \bigl(n^{-1/3}\log n \bigr), $$

which is comparable with the optimal convergence rate in nonparametric estimation. Meanwhile, it is the same as the results of [21] (in probability) and [22] (almost surely) for any \(t\in [0,1]\) based on least-squares wavelet estimators, but those results require g to be continuously differentiable, that is, \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) with \(\nu >3/2\).

To obtain an asymptotic expansion of the variance and an asymptotic normality result, we need to consider an approximation to \(\hat{g}(t)\) based on its values at dyadic points of order m. That is, we define \(\hat{g}^{d}(t)=\hat{g}(t^{(m)})\) with \(t^{(m)}= \lfloor 2^{m}t\rfloor /2^{m}\), where \(\lfloor z\rfloor \) denotes the largest integer not greater than z.

Theorem 3.2

(Asymptotic normality)

Suppose that (A1)–(A6) and (A7)(iii) hold. Then

$$ \sqrt{n2^{-m}}\bigl(\hat{g}^{d}(t)-g(t)\bigr) \stackrel{D}{\longrightarrow }N \bigl(0,4^{-1}f_{\epsilon }^{-2}(0) \omega _{0}^{2}\kappa (t) \bigr), $$

where \(\omega _{0}^{2}=\int _{\mathbb{R}}E_{0}^{2}(0,u)\,du=\sum_{k\in \mathbb{Z}}\phi ^{2}(k)\).
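
The constant \(\omega _{0}^{2}=\sum_{k\in \mathbb{Z}}\phi ^{2}(k)\) can be evaluated numerically for a given scaling function. The sketch below uses PyWavelets (an assumed dependency, not used in the paper) with a Daubechies scaling function as an illustrative choice; for the Haar case, \(\phi (0)=1\) and \(\phi (k)=0\) otherwise, so \(\omega _{0}^{2}=1\) exactly.

```python
import numpy as np
import pywt  # PyWavelets; an assumed dependency for illustration only

# Sketch: approximate omega_0^2 = sum_k phi(k)^2 for the Daubechies-4 scaling
# function by sampling phi on a fine dyadic grid and interpolating at integers.
phi_vals, psi_vals, x = pywt.Wavelet("db4").wavefun(level=12)
ks = np.arange(int(np.ceil(x[0])), int(np.floor(x[-1])) + 1)
omega0_sq = float(np.sum(np.interp(ks, x, phi_vals) ** 2))
print(omega0_sq)
```

By Theorem 3.2, the asymptotic standard deviation of \(\hat{g}^{d}(t)\) is then \(\sqrt{2^{m}/n}\,\omega _{0}\sqrt{\kappa (t)}/(2f_{\epsilon }(0))\).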

Remark 3.4

\(\hat{g}^{d}(t)\) is the piecewise-constant approximation of \(\hat{g}(t)\) at resolution \(2^{-m}\). The reason for considering this approximation is that the variance of \(\hat{g}(t)\) is unstable as a function of t, because, asymptotically, \(\operatorname{var}(\hat{g}(t))\) is proportional to \(2^{m}n^{-1}\kappa (t)\int _{0}^{1}E_{0}^{2}(t_{m},s)\,ds\), where \(t_{m}=2^{m}t-\lfloor 2^{m}t\rfloor \). If t is non-dyadic, the sequence \(t_{m}\) wanders around the unit interval and fails to converge; see also [16].

4 Technical proofs

In order to prove the main results, we first present several lemmas.

Lemma 4.1

Suppose that (A4) holds. We have:

  1. (i)

    \(|E_{0}(t,s)|\leq c_{k}/(1+|t-s|)^{k}\) and \(|E_{m}(t,s)|\leq 2^{m}c_{k}/(1+2^{m}|t-s|)^{k}\), where k is a positive integer and \(c_{k}\) is a constant depending on k only.

  2. (ii)

    \(\sup_{0\leq t,s \leq 1}|E_{m}(t,s)|=O(2^{m})\).

  3. (iii)

    \(\sup_{0\leq t\leq 1}\int _{0}^{1}|E_{m}(t,s)|\,ds\leq c\), where c is a positive constant.

  4. (iv)

    \(\int _{0}^{1} E_{m}(t,s)\,ds\rightarrow 1\) uniformly in \(t\in [0,1]\), as \(m\rightarrow \infty \).

The proofs of (i) and (ii) can be found in [16], and (iii) follows from (i); the proof of (iv) can be found in [30].

Lemma 4.2

Suppose that (A4)–(A5) hold and \(h(\cdot )\) satisfies (A2)–(A3). Then

$$ \sup_{0\leq t\leq 1} \Biggl\vert h(t)-\sum _{i=1}^{n} h(t_{i}) \int _{A_{i}}E_{m}(t,s)\,ds \Biggr\vert =O \bigl(n^{-\gamma }\bigr)+O(\eta _{m}), $$

where

$$ \eta _{m}= \textstyle\begin{cases} (1/2^{m})^{\nu -1/2} & \textit{if } 1/2< \nu < 3/2, \\ \sqrt{m}/2^{m} & \textit{if } \nu =3/2, \\ 1/2^{m} & \textit{if } \nu >3/2. \end{cases} $$

It follows easily from Theorem 3.2 of [16].

Lemma 4.3

Let \(\{V_{i}, i=1,\ldots ,n\}\) be a sequence of independent random variables with mean zero and finite \((2+\delta )\)th moments, and \(\{a_{ij}, i,j=1,\ldots ,n\}\) a set of positive numbers such that \(\max_{ij}|a_{ij}|\leq n^{-p_{1}}\) for some \(0\leq p_{1}\leq 1\) and \(\sum_{i=1}^{n}a_{ij}=O(n^{p_{2}})\) for some \(p_{2}\geq \max (0,2/(2+\delta )-p_{1})\). Then

$$ \max_{1\leq j\leq n} \Biggl\vert \sum_{i=1}^{n}a_{ij}V_{i} \Biggr\vert =O\bigl(n^{-(p_{1}-p_{2})/2} \log n\bigr),\quad \textit{a.s.} $$

It can be found in [31].

Lemma 4.4

Let \(\{\lambda _{n}(\theta ),\theta \in \varTheta \}\) be a sequence of random convex functions defined on a convex, open subset Θ of \(\mathbb{R}^{d}\). Suppose \(\lambda (\cdot )\) is a real-valued function on Θ for which \(\lambda _{n}(\theta )\rightarrow \lambda (\theta )\) in probability, for each θ in Θ. Then, for each compact subset K of Θ, in probability,

$$ \sup_{\theta \in K} \bigl\vert \lambda _{n}(\theta )- \lambda (\theta ) \bigr\vert \rightarrow 0. $$

See [32].

Below, we give the proofs of the main results. The proof of Theorem 3.1 uses the ideas of [32] and the convexity lemma (Lemma 4.4). To complete the proof of Theorem 3.2, it is enough to check a Lindeberg-type condition.

Proof of Theorem 3.1

(i) From (2.1), note that \(\hat{g}(t)=\hat{a}\) and â minimizes

$$ \sum_{i=1}^{n} \vert y_{i}-a \vert \int _{A_{i}}E_{m}(t,s)\,ds. $$

Let \(\theta =a-g(t)\) and \(\epsilon _{i}^{*}=\epsilon _{i}+[g(t_{i})-g(t)]\). Then \(\hat{\theta }=\hat{a}-g(t)\) minimizes the function

$$ G_{n}(\theta )=\sum_{i=1}^{n} \bigl\{ \bigl\vert \epsilon _{i}^{*}-\theta \bigr\vert - \bigl\vert \epsilon _{i}^{*} \bigr\vert \bigr\} \int _{A_{i}}E_{m}(t,s)\,ds. $$

The idea behind the proof, as in [32], is to approximate \(G_{n}(\theta )\) by a quadratic function whose minimizer has an explicit expression, and then to show that θ̂ is close enough to this minimizer to share its asymptotic behavior.

We now set out to approximate \(G_{n}(\theta )\) by a quadratic function of θ. Write

$$ G_{n}(\theta )=W_{n}\theta +R_{n}(\theta ) , $$

where \(W_{n}=-\sum_{i=1}^{n}\operatorname{sign}(\epsilon _{i})\int _{A_{i}}E_{m}(t,s)\,ds\), which does not depend on θ, and

$$ R_{n}(\theta )=\sum_{i=1}^{n} \bigl\{ \bigl\vert \epsilon _{i}^{*}-\theta \bigr\vert - \bigl\vert \epsilon _{i}^{*} \bigr\vert + \operatorname{sign}(\epsilon _{i})\theta \bigr\} \int _{A_{i}}E_{m}(t,s)\,ds. $$
(4.1)

We have

$$ G_{n}(\theta )=E\bigl(G_{n}(\theta ) \bigr)+W_{n}\theta +\bigl[R_{n}(\theta )-E \bigl(R_{n}( \theta )\bigr)\bigr]. $$
(4.2)

Assumption (A1) ensures that the function \(\Delta (t)=E[|\epsilon _{i}-t|-|\epsilon _{i}|]\) has a unique minimum at zero and that \(\Delta (t)=t^{2}f_{\epsilon }(0)+o(t^{2})\) as \(t\rightarrow 0\). Therefore, by Lemmas 4.1 and 4.2,

$$\begin{aligned} E\bigl(G_{n}(\theta )\bigr) =&\sum _{i=1}^{n}\bigl\{ f_{\epsilon }(0) \theta ^{2}-2f_{\epsilon }(0)\bigl[g(t_{i})-g(t) \bigr]\theta \bigr\} \int _{A_{i}}E_{m}(t,s)\,ds+o\bigl( \delta _{n}^{2}\bigr) \\ =&f_{\epsilon }(0)\theta ^{2}-2f_{\epsilon }(0)\theta \Biggl\{ \sum_{i=1}^{n}g(t_{i}) \int _{A_{i}}E_{m}(t,s)\,ds-g(t) \Biggr\} +o\bigl( \delta _{n}^{2}\bigr) \\ =&f_{\epsilon }(0)\theta ^{2}+O\bigl[\bigl(n^{-\gamma }+ \eta _{m}\bigr)\theta \bigr]+o\bigl( \delta _{n}^{2} \bigr), \end{aligned}$$
(4.3)

where \(\delta _{n}=\max \{ (n^{-\gamma }+\eta _{m}),|\theta | \} \). For (4.1), note that \(\vert |\epsilon _{i}^{*}-\theta |-|\epsilon _{i}^{*}|+\operatorname{sign}( \epsilon _{i})\theta \vert \leq 2|\theta |I\{|\epsilon _{i}| \leq |\theta |+|g(t_{i})-g(t)|\}\); hence

$$\begin{aligned} ER_{n}^{2}(\theta ) \leq & 4\theta ^{2}\sum _{i=1}^{n}EI\bigl\{ \vert \epsilon _{i} \vert \leq \vert \theta \vert + \bigl\vert g(t_{i})-g(t) \bigr\vert \bigr\} \biggl\{ \int _{A_{i}}E_{m}(t,s)\,ds \biggr\} ^{2} \\ =&8\theta ^{2}f_{\epsilon }(0)\sum _{i=1}^{n} \bigl\vert g(t_{i})-g(t) \bigr\vert \biggl\{ \int _{A_{i}}E_{m}(t,s)\,ds \biggr\} ^{2} \bigl(1+o(1)\bigr) \\ =&O \biggl(\frac{2^{m}}{n^{1+\gamma }}\theta ^{2} \biggr). \end{aligned}$$

We get

$$ R_{n}(\theta )-E\bigl(R_{n}(\theta ) \bigr)=O_{p} \biggl(\theta \sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \biggr). $$
(4.4)

Let \(a_{n}=O_{p} \{ n^{-\gamma }+\eta _{m}+\sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \} \). Combining (4.2)–(4.4), for each fixed θ, we have

$$\begin{aligned} G_{n}(\theta ) =&f_{\epsilon }(0)\theta ^{2}+W_{n} \theta +O\bigl[\bigl(n^{-\gamma }+ \eta _{m}\bigr)\theta \bigr]+ O_{p} \biggl(\theta \sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \biggr) \\ =&f_{\epsilon }(0)\theta ^{2}+(W_{n}+a_{n}) \theta , \end{aligned}$$
(4.5)

with \(a_{n}=o_{p}(1)\) uniformly. Note that

$$ W_{n}=-\sum_{i=1}^{n} \operatorname{sign}(\epsilon _{i}) \int _{A_{i}}E_{m}(t,s)\,ds. $$

It is easy to see that \(W_{n}\) has a bounded second moment and hence is stochastically bounded. Since the convex function \(G_{n}(\theta )-(W_{n}+a_{n})\theta \) converges in probability to the convex function \(f_{\epsilon }(0)\theta ^{2}\), it follows from the convexity lemma, Lemma 4.4, that, for every compact set K,

$$ \sup_{\theta \in K} \bigl\vert G_{n}( \theta )-(W_{n}+a_{n})\theta -f_{\epsilon }(0) \theta ^{2} \bigr\vert =o_{p}(1). $$
(4.6)

Thus, the quadratic approximation to the convex function \(G_{n}(\theta )\) holds uniformly for θ in any compact set. Hence, using convexity again, the minimizer θ̂ of \(G_{n}(\theta )\) converges in probability to the minimizer

$$ \bar{\theta }=-\frac{1}{2}f_{\epsilon }^{-1}(0) (W_{n}+a_{n}), $$
(4.7)

that is,

$$ P\bigl( \vert \hat{\theta }-\bar{\theta } \vert >\delta \bigr)\rightarrow 0. $$

This assertion can be proved by elementary arguments similar to the proof of Theorem 1 in [32]. Based on (4.6), write \(G_{n}(\theta )=(W_{n}+a_{n})\theta +f_{\epsilon }(0)\theta ^{2}+r_{n}( \theta )\), which can be rewritten as

$$ G_{n}(\theta )=f_{\epsilon }(0)\bigl\{ \vert \theta -\bar{\theta } \vert ^{2}- \vert \bar{\theta } \vert ^{2}\bigr\} +r_{n}(\theta ), $$
(4.8)

with \(\sup_{\theta \in K}|r_{n}(\theta )|=o_{p}(1)\). Because θ̄ is stochastically bounded, the compact set K can be chosen to contain a closed ball \(B(n)\) with center θ̄ and radius δ with probability tending to one, which implies that

$$ \Delta _{n}=\sup_{\theta \in B(n)} \bigl\vert r_{n}(\theta ) \bigr\vert =o_{p}(1). $$

Now consider the behavior of \(G_{n}(\theta )\) outside \(B(n)\). Suppose \(\theta =\bar{\theta }+\beta \mu \), where \(\beta >\delta \) and μ is a unit vector. Define \(\theta ^{*}\) as the boundary point of \(B(n)\) that lies on the line segment from θ̄ to θ, i.e., \(\theta ^{*}=\bar{\theta }+\delta \mu \). Convexity of \(G_{n}(\theta )\), (4.8) and the definition of \(\Delta _{n}\) imply

$$\begin{aligned} \frac{\delta }{\beta }G_{n}(\theta )+\biggl(1- \frac{\delta }{\beta }\biggr)G_{n}( \bar{\theta }) \geq & G_{n}\bigl(\theta ^{*}\bigr) \\ \geq &f_{\epsilon }(0)\delta ^{2}-f_{\epsilon }(0) \vert \bar{\theta } \vert ^{2}- \Delta _{n} \\ \geq &f_{\epsilon }(0)\delta ^{2}+G_{n}(\bar{ \theta })-2\Delta _{n}. \end{aligned}$$

It follows that

$$ \inf_{|\theta -\bar{\theta }|>\delta }G_{n}(\theta )\geq G_{n}( \bar{\theta })+\frac{\beta }{\delta }\bigl[f_{\epsilon }(0)\delta ^{2}-2\Delta _{n}\bigr]. $$

Hence, when \(2\Delta _{n}< f_{\epsilon }(0)\delta ^{2}\), which happens with probability tending to one, the minimum of \(G_{n}(\theta )\) cannot occur at any θ with \(|\theta -\bar{\theta }|>\delta \). This implies that, for any \(\delta >0\) and all sufficiently large n, the minimum of \(G_{n}(\theta )\) must be achieved within \(B(n)\), i.e., \(|\hat{\theta }-\bar{\theta }|\leq \delta \) with probability tending to one. This completes the proof of (i).

(ii) In the following, we will prove that

$$ W_{n}=O \biggl\{ \biggl(\frac{2^{m}}{n} \biggr)^{1/2}\log n \biggr\} ,\quad \mbox{a.s.} $$
(4.9)

By Lemma 4.1, we have

$$ \max_{i,m} \biggl\vert \int _{A_{i}}E_{m}(t,s)\,ds \biggr\vert =O \bigl(2^{m}/n\bigr)=O\bigl(n^{-2p}\bigr) $$

and

$$ \sum_{i=1}^{n} \int _{A_{i}}E_{m}(t,s)\,ds= \int _{0}^{1} E_{m}(t,s)\,ds=O(1)=O \bigl(n^{p_{2}}\bigr), $$

where \(p_{1}=2p\) with \(0\leq p_{1}\leq 1\) and \(p_{1}\geq 2/(2+\delta )\), and \(p_{2}=0\); these requirements are satisfied under Conditions (A1) and (A7)(ii). By Lemma 4.3, \(W_{n}=O(n^{-p}\log n)\) a.s., which gives (4.9). Combining (4.9) with the Bahadur representation in Theorem 3.1(i) yields (ii). □

Proof of Theorem 3.2

From Theorem 3.1(i), we have

$$ 2f_{\epsilon }(0)\bigl\{ \hat{g}(t)-g(t)\bigr\} =Z_{n}(t)+R_{n}(m;\gamma ,\nu ), $$
(4.10)

where \(Z_{n}(t)=\sum_{i=1}^{n}\operatorname{sign}(\epsilon _{i})\int _{A_{i}}E_{m}(t,s)\,ds\) and \(R_{n}(m;\gamma ,\nu )=O_{p} \{ n^{-\gamma }+\eta _{m}+\sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \} \). From (A7)(iii), i.e., \(n2^{-2mv^{*}}\rightarrow 0\), one gets

$$ \sqrt{n2^{-m}}R_{n}(m;\gamma ,\nu )=o_{p}(1). $$
(4.11)

Now, let us verify the asymptotic normality of \(\sqrt{n2^{-m}}Z_{n}(t^{(m)})\). First, we calculate its variance. By the proofs of Theorem 3.3 and Lemma 6.1 of [16], we have

$$\begin{aligned}& \bigl\vert \operatorname{var} \bigl(\sqrt{n2^{-m}}Z_{n} \bigl(t^{(m)}\bigr) \bigr)-\kappa (t) \omega _{0}^{2} \bigr\vert \\& \quad = \Biggl\vert n2^{-m}\sum _{i=1}^{n} \biggl( \int _{A_{i}}E_{m}\bigl(t^{(m)},s\bigr)\,ds \biggr)^{2}-\kappa (t)\omega _{0}^{2} \Biggr\vert \\& \quad \leq \Biggl\vert n2^{-m}\sum_{i=1}^{n} \biggl( \int _{A_{i}}E_{m}\bigl(t^{(m)},s\bigr)\,ds \biggr)^{2}-2^{-m} \int _{0}^{1}E_{m}^{2} \bigl(t^{(m)},s\bigr)\kappa (s)\,ds \Biggr\vert \\& \qquad {} + \biggl\vert 2^{-m} \int _{0}^{1}E_{m}^{2} \bigl(t^{(m)},s\bigr)\kappa (s)\,ds-\kappa (t) \omega _{0}^{2} \biggr\vert \\& \quad \leq n2^{-m} \Biggl\vert \sum_{i=1}^{n} (s_{i}-s_{i-1})^{2}E_{m}^{2} \bigl(t^{(m)},u_{i}\bigr)- \frac{1}{n}(s_{i}-s_{i-1})E_{m}^{2} \bigl(t^{(m)},v_{i}\bigr)\kappa (v_{i}) \Biggr\vert +o(1) \\& \qquad (\mbox{where }u_{i}\mbox{ and }v_{i}\mbox{ belong to }A_{i}) \\& \quad = n2^{-m}O\bigl(n^{-1}\bigr)O\bigl(n2^{-m} \bigr) \biggl(\rho (n)2^{2m}+ \frac{2^{2m}}{n^{2}}+ \frac{2^{2m}}{n}\frac{2^{m}}{n} \biggr)+o(1) \\& \quad \leq O \bigl(n\rho (n)+2^{m}/n \bigr)=o(1). \end{aligned}$$

So,

$$ \operatorname{var} \bigl(\sqrt{n2^{-m}}Z_{n} \bigl(t^{(m)}\bigr) \bigr)=\kappa (t)\omega _{0}^{2}+o(1). $$
(4.12)

To complete the proof, we only need to check the Lindeberg-type condition

$$ \max_{1\leq i\leq n} \frac{n2^{-m} (\int _{A_{i}}E_{m}(t,s)\,ds )^{2}}{\operatorname{var} (\sqrt{n2^{-m}}Z_{n}(t^{(m)}) )} \rightarrow 0. $$

From (4.12) and Lemma 4.1, one sees that this quantity is of order \(O(2^{m}/n)\rightarrow 0\). This completes the proof of Theorem 3.2. □