1 Introduction

Consider the problem of estimating an underlying regression function from a set of noisy data. Nonparametric regression provides a natural framework for this problem and has the following standard form:

$$ y_{i}=g(t_{i})+\epsilon _{i}, \quad i=1,\ldots ,n, $$
(1.1)

where the \(y_{i}\) are the noisy samples of an unknown function \(g(\cdot )\) defined on \([0,1]\), \(\{t_{i}\}\) are non-random design points with \(0\leq t_{1}\leq\cdots \leq t_{n}\leq 1\), and the \(\epsilon _{i}\) are i.i.d. random errors with mean zero.
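
To fix ideas, here is a minimal simulation sketch of model (1.1) in Python; it is not part of the original analysis, and the piecewise-constant test signal and the Student-t errors are illustrative choices meant to mimic an inhomogeneous curve observed with heavy-tailed noise.

```python
import numpy as np

# Sketch: simulate data from model (1.1), y_i = g(t_i) + eps_i on [0, 1].
# The blocky signal g and the t(2) errors are illustrative assumptions.
rng = np.random.default_rng(0)

def g(t):
    # An inhomogeneous, piecewise-constant signal with jumps.
    t = np.asarray(t, dtype=float)
    return 2.0 * (t < 0.3) - 1.0 * ((t >= 0.3) & (t < 0.7)) + 0.5 * (t >= 0.7)

n = 512
t = (np.arange(1, n + 1) - 0.5) / n    # fixed design points in [0, 1]
eps = rng.standard_t(df=2, size=n)     # i.i.d. heavy-tailed errors with mean and median 0
y = g(t) + eps                         # noisy samples
```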

Nonparametric regression is a classic smoothing technique for recovering a signal function from data without imposing strong prior restrictions on its form [1, 2]. There is an extensive literature on nonparametric regression. Most methods developed so far estimate the mean regression function by least squares (\(L_{2}\)). For example, Gasser and Müller [3] proposed a kernel estimator based on the Gasser–Müller kernel weights; Fan [4, 5] added further insights to the local linear method for mean regression; and Braun and Huang [6] proposed kernel spline regression by replacing the polynomial approximation in local polynomial kernel regression with a spline basis. Least-squares methods have attractive properties under Gaussian errors, but they are highly sensitive to extreme outliers and perform poorly when the errors have a heavy-tailed distribution. More robust estimation methods are therefore required. Local median and M-estimators have been studied; see, for example, [7–13]. See also [14, 15] for more details on quantile regression and robust estimation, respectively. As pointed out in [9], among the many robust estimation methods, the \(L_{1}\) method based on least absolute deviations behaves quite well: it downweights outliers, yields unique solutions, and has no transition point in the influence function (such as the additional parameter c in Huber’s \(\rho _{c}\) function). The above methods basically require the unknown function g to be highly smooth, a condition that may fail in practice: objects arising in areas such as signal and image processing are frequently inhomogeneous. In this paper, we use the wavelet technique to recover the signal function g, combined with the \(L_{1}\) method to handle the robust case.

We aim to study the asymptotic properties of the \(L_{1}\)-wavelet estimator for the nonparametric model (1.1). Wavelet techniques, owing to their ability to adapt to local features of curves, have received a lot of attention and have been used to estimate nonparametric curves; see, for example, [16–19]. Wavelet methods are attractive because of their computational ease and their minimax results over very wide classes of function spaces for the signal function g. For linear wavelet smoothing, Antoniadis et al. [16] is a key reference; it introduces wavelet versions of some classical kernel and orthogonal series estimators and studies their asymptotic properties, such as mean-square consistency, bias, variance and asymptotic normality. Huang [20] also derived the asymptotic bias and variance of the wavelet density estimator via wavelet-based reproducing kernels. Zhou and You [21] constructed wavelet estimators for varying-coefficient partially linear regression models and established their asymptotic normality and convergence rates. For varying-coefficient models, the convergence rate and asymptotic normality of wavelet estimators were considered in [22, 23], which also provided the asymptotic bias and variance of the wavelet estimator of the regression function under a mixing stochastic process. Recently, Chesneau et al. [24] proposed nonparametric wavelet estimators of the quantile density function and established their consistency. Li and Xiao [25] considered a wavelet estimator of the mean regression function with strong mixing errors and investigated its asymptotic rate of convergence by thresholding the empirical wavelet coefficients. Berry–Esseen type bounds for wavelet estimators in semiparametric regression models were studied in [26, 27]. To the best of our knowledge, no study of \(L_{1}\)-wavelet estimators has been reported for the nonparametric model (1.1). For this model, the estimation procedure should be combined with the special features of the model.

In this paper, we develop an \(L_{1}\)-wavelet method for the nonparametric regression model (1.1), adopting wavelets to detect and represent localized features of the signal function g and applying the \(L_{1}\) criterion to yield better recovery in the presence of outliers or heavy-tailed data. The advantage of the \(L_{1}\)-wavelet method is that it avoids the restrictive smoothness requirements on the nonparametric function imposed by traditional smoothing approaches, such as kernel and local polynomial methods, while robustifying the usual mean regression. Finally, we investigate the asymptotic properties of the \(L_{1}\)-wavelet estimator, including the Bahadur representation, the rate of convergence and asymptotic normality.

The paper is organized as follows. In Sect. 2, we provide some necessary background on wavelets and develop the \(L_{1}\)-wavelet estimation for model (1.1). Asymptotic properties of the \(L_{1}\)-wavelet estimators are presented in Sect. 3. Technical proofs are deferred to Sect. 4.

2 \(L_{1}\)-Wavelet estimation

Wavelet analysis requires a description of two related and suitably chosen orthonormal basis functions: the scaling function ϕ and the wavelet ψ. A wavelet system is generated by dilation and translation of ϕ and ψ through

$$ \phi _{m,k}(t)=2^{m/2}\phi \bigl(2^{m}t-k \bigr),\qquad \psi _{m,k}(t)=2^{m/2}\psi \bigl(2^{m}t-k \bigr),\quad m,k\in \mathbb{Z}. $$
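
As a concrete illustration (not taken from the paper), the following sketch builds the dilated and translated family from the Haar scaling function and wavelet, an assumed choice of ϕ and ψ.

```python
import numpy as np

# Sketch of the dilation/translation family phi_{m,k}, psi_{m,k} for the Haar
# pair (an illustrative choice of phi and psi).
def phi(t):
    # Haar scaling function: the indicator of [0, 1).
    t = np.asarray(t, dtype=float)
    return ((t >= 0.0) & (t < 1.0)).astype(float)

def psi(t):
    # Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1).
    return phi(2.0 * np.asarray(t)) - phi(2.0 * np.asarray(t) - 1.0)

def dilate_translate(f, m, k):
    # Returns the function t -> 2^{m/2} f(2^m t - k).
    return lambda t: 2.0 ** (m / 2.0) * f(2.0 ** m * np.asarray(t) - k)

phi_32 = dilate_translate(phi, m=3, k=2)   # phi_{3,2}, supported on [1/4, 3/8)
```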

A multiresolution analysis of \(\mathcal{L}^{2}(\mathbb{R})\) consists of a nested sequence of closed subspaces \(V_{m}\), \(m\in \mathbb{Z}\), of \(\mathcal{L}^{2}(\mathbb{R})\),

$$ \cdots \subset V_{-2}\subset V_{-1}\subset V_{0}\subset V_{1}\subset V_{2} \subset \cdots , $$

where \(\mathcal{L}^{2}(\mathbb{R})\) is the space of square-integrable functions on the real line. Since \(\{\phi (\cdot -k), k\in \mathbb{Z}\}\) is an orthonormal family in \(\mathcal{L}^{2}(\mathbb{R})\) and \(V_{0}\) is the subspace it spans, \(\{\phi _{0,k}, k\in \mathbb{Z}\}\) and \(\{\phi _{m,k}, k\in \mathbb{Z}\}\) are orthonormal bases of \(V_{0}\) and \(V_{m}\), respectively. From the Moore–Aronszajn theorem [28], it follows that

$$ E(t,s)=\sum_{k}\phi (t-k)\phi (s-k) $$

is a reproducing kernel of \(V_{0}\). By self-similarity of multiresolution subspaces,

$$ E_{m}(t,s)=2^{m}E\bigl(2^{m}t,2^{m}s \bigr) $$

is a reproducing kernel of \(V_{m}\). Thus, the projection of g on the space \(V_{m}\) is given by

$$ \mathbb{P} _{V_{m}}g(t)= \int 2^{m}E\bigl(2^{m}t,2^{m}s \bigr)g(s)\,ds. $$
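
For intuition, here is a small sketch (illustrative, Haar case only, not from the paper): with the Haar scaling function, \(E(t,s)=\sum_{k}\phi (t-k)\phi (s-k)\) equals 1 exactly when t and s lie in the same unit interval, so \(E_{m}(t,s)=2^{m}\) when t and s share the same dyadic interval of length \(2^{-m}\), and \(\mathbb{P}_{V_{m}}g(t)\) is simply the average of g over that interval.

```python
import numpy as np

# Sketch: Haar reproducing kernel E_m and the projection of g onto V_m.
# For Haar, E_m(t, s) = 2^m * 1{floor(2^m t) == floor(2^m s)}.
def E_m(t, s, m):
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    return 2.0 ** m * (np.floor(2.0 ** m * t) == np.floor(2.0 ** m * s))

def project_Vm(g, t, m, n_grid=4096):
    # Numerical approximation of P_{V_m} g(t) = int_0^1 E_m(t, s) g(s) ds;
    # for Haar this is the average of g over the dyadic interval containing t.
    s = (np.arange(n_grid) + 0.5) / n_grid          # midpoint grid on [0, 1]
    return float(np.mean(E_m(t, s, m) * g(s)))      # midpoint-rule integral
```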

This motivates us to define an \(L_{1}\)-wavelet estimator of g by

$$ \hat{g}(t)=\operatorname{argmin}_{a}\sum _{i=1}^{n} \vert y_{i}-a \vert \int _{A_{i}}E_{m}(t,s)\,ds, $$
(2.1)

where \(A_{i}\) are intervals that partition \([0,1]\), so that \(t_{i}\in A_{i}\). One way of defining the intervals \(A_{i}=[s_{i-1},s_{i})\) is by taking \(s_{0}=0\), \(s_{n}=1\), and \(s_{i}=(t_{i}+t_{i+1})/2\), \(i=1,\ldots ,n-1\).
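
The weights \(w_{i}(t)=\int _{A_{i}}E_{m}(t,s)\,ds\) and the minimization in (2.1) can be computed directly. The sketch below is an illustration for the Haar kernel (an assumed choice, for which the weights are nonnegative and available in closed form); it evaluates the weighted-\(L_{1}\) objective at the candidate values \(a=y_{j}\), since for nonnegative weights the minimizer of a weighted sum of absolute deviations can always be taken among the observed \(y_{i}\) (a weighted median).

```python
import numpy as np

# Sketch of estimator (2.1) with the Haar kernel.  The weight of observation i
# is w_i(t) = int_{A_i} E_m(t, s) ds = 2^m * |A_i ∩ I_m(t)|, where I_m(t) is
# the dyadic interval of length 2^{-m} containing t.
def haar_weights(t, t_design, m):
    # Interval endpoints: s_0 = 0, s_i = (t_i + t_{i+1})/2, s_n = 1.
    s = np.concatenate(([0.0], (t_design[:-1] + t_design[1:]) / 2.0, [1.0]))
    lo = np.floor(2.0 ** m * t) / 2.0 ** m
    hi = lo + 2.0 ** (-m)
    overlap = np.maximum(0.0, np.minimum(s[1:], hi) - np.maximum(s[:-1], lo))
    return 2.0 ** m * overlap                      # w_1(t), ..., w_n(t)

def l1_wavelet_fit(t, t_design, y, m):
    # g_hat(t) = argmin_a sum_i w_i(t)|y_i - a|, searched over a in {y_1, ..., y_n}.
    w = haar_weights(t, t_design, m)
    objective = np.abs(y[None, :] - y[:, None]) @ w   # objective[j] = sum_i w_i |y_i - y_j|
    return y[np.argmin(objective)]
```

For instance, with the simulated `t` and `y` from the sketch in the Introduction, `g_hat = np.array([l1_wavelet_fit(tt, t, y, m=5) for tt in t])` evaluates the fit at the design points.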

For the ith sample point, let \(e_{i}=y_{i}-a\) and define \(e_{i}^{+}\) and \(e_{i}^{-}\) to be the positive and negative parts of \(e_{i}\). Then, with the noisy samples, problem (2.1) can be reduced to the following linear program:

$$\begin{aligned} \mbox{Minimize }&\quad \sum_{i=1}^{n} \bigl(e_{i}^{+}+e_{i}^{-}\bigr) \int _{A_{i}}E_{m}(t,s)\,ds, \\ \mbox{Subject to }&\quad \boldsymbol {a}+\boldsymbol {e}^{+}-\boldsymbol {e}^{-}= \boldsymbol {y}, \\ &\quad \boldsymbol {e}^{+}, \boldsymbol {e}^{-}\geq 0, \end{aligned}$$

where \(\boldsymbol {a}=a\mathbf{1}_{n}\), \(\mathbf{1}_{n}\) is the n-dimensional vector whose components are all 1, \(\boldsymbol {e}^{+}=(e_{1}^{+},\ldots ,e_{n}^{+})^{T}\), \(\boldsymbol {e}^{-}=(e_{1}^{-},\ldots ,e_{n}^{-})^{T}\) and \(\boldsymbol {y}=(y_{1},\ldots , y_{n})^{T}\). In addition, \(\int _{A_{i}}E_{m}(t,s)\,ds\) can be calculated by the cascade algorithm of [16]. Thus, the \(L_{1}\)-wavelet estimator can be obtained easily. This linear program is used only for computing the estimator; to establish the asymptotic properties, we work directly with (2.1).
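
For completeness, here is a sketch of this linear program using `scipy.optimize.linprog` (an assumed external solver, not mentioned in the paper); it takes the weights \(w_{i}=\int _{A_{i}}E_{m}(t,s)\,ds\) as input and assumes they are nonnegative, as in the Haar case above.

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of the LP form of (2.1).  Decision variables: (a, e^+, e^-) in R^{1+2n}.
def l1_fit_lp(y, w):
    n = len(y)
    c = np.concatenate(([0.0], w, w))                           # minimize sum_i w_i (e_i^+ + e_i^-)
    A_eq = np.hstack([np.ones((n, 1)), np.eye(n), -np.eye(n)])  # a*1 + e^+ - e^- = y
    bounds = [(None, None)] + [(0.0, None)] * (2 * n)           # a free; e^+, e^- >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[0]                                             # the minimizer a, i.e. g_hat(t)
```

For nonnegative weights this agrees (up to ties) with the direct weighted-median search sketched after (2.1).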

3 Asymptotic properties

We begin with the following assumptions required to derive the asymptotic properties of the proposed estimator in Sect. 2.

  1. (A1)

    The random errors \(\epsilon _{i}\) are i.i.d. with median 0 and a density \(f_{\epsilon }\) that is continuous and positive in a neighborhood of 0.

  2. (A2)

    g belongs to the Sobolev space \(\mathcal{H}^{\nu }(\mathbb{R})\) with order \(\nu >1/2\).

  3. (A3)

    g satisfies a Lipschitz condition of order \(\gamma >0\).

  4. (A4)

    ϕ has compact support, belongs to the Schwartz space of order \(l>\nu \), and satisfies a Lipschitz condition of order l. Furthermore, \(|\hat{\phi }(\xi )-1|=O(\xi )\) as \(\xi \rightarrow 0\), where ϕ̂ is the Fourier transform of ϕ.

  5. (A5)

    \(\max_{i}|t_{i}-t_{i-1}|=O(n^{-1})\).

  6. (A6)

    We also assume that, for some Lipschitz function \(\kappa (\cdot )\),

    $$ \rho (n)=\max_{i} \biggl\vert s_{i}-s_{i-1}- \frac{\kappa (s_{i})}{n} \biggr\vert =o\bigl(n^{-1}\bigr). $$
  7. (A7)

    (i) \(n2^{-m}\rightarrow \infty \); (ii) \(2^{m}=O(n^{1-2p})\), where \(1/(2+\delta )\leq p\leq 1/2\) for some \(\delta >0\); (iii) let \(v^{*}=\min (3/2,\nu ,\gamma +1/2)-\epsilon _{1}\), where \(\epsilon _{1}=0\) for \(\nu \neq 3/2\) and \(\epsilon _{1}>0\) for \(\nu =3/2\), and assume that \(n2^{-2mv^{*}}\rightarrow 0\).

Remark 3.1

The above conditions are mild and easily satisfied. (A1) is crucial to the asymptotic behavior of \(\hat{g}(\cdot )\) based on the \(L_{1}\) estimation (2.1). (A2)–(A6) and (A7)(i), (iii) have been used in [16]. Note that, if \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) with \(\nu >3/2\), then g is continuously differentiable; thus (A3) is redundant when \(\nu >3/2\), and (A2) is weaker than the usual smoothness assumptions. For (A7), m acts as a tuning parameter, playing the role that the bandwidth does for standard kernel smoothers; (A7) is a standard assumption for the asymptotic analysis. For example, taking \(\delta =2\) and \(p=1/4\) gives \(2^{m}=O(n^{1/2})\), so (i) holds; if, in addition, \(\nu >1\) and \(\gamma >1/2\), then (iii) holds.
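
As a sketch of this tuning choice (illustrative only, not a data-driven rule), one can set the resolution level so that \(2^{m}\) is of order \(n^{1-2p}\):

```python
import numpy as np

# Sketch: pick m so that 2^m is of order n^{1-2p} (cf. Remark 3.1); with
# p = 1/4 this gives 2^m of order sqrt(n).
def resolution_level(n, p=0.25):
    return int(np.round((1.0 - 2.0 * p) * np.log2(n)))

# Example: resolution_level(1024) == 5, i.e. 2^5 = 32 = sqrt(1024).
```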

Our results are as follows.

Theorem 3.1

  1. (i)

    (Bahadur representation) Suppose that (A1)–(A5) and (A7)(i) hold. Then

    $$ \hat{g}(t)=g(t)-\frac{1}{2}f_{\epsilon }^{-1}(0)\sum _{i=1}^{n} \operatorname{sign}(\epsilon _{i}) \int _{A_{i}}E_{m}(t,s)\,ds +R_{n}(m; \gamma ,\nu ), $$

    with

    $$ R_{n}(m;\gamma ,\nu )=O_{p} \biggl\{ n^{-\gamma }+\eta _{m}+\sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \biggr\} , $$

    where \(\operatorname{sign}(\cdot )\) is the sign function and \(\eta _{m}\) is defined in Lemma 4.2 below.

  2. (ii)

    (Rate of convergence) Assume that (A1)–(A5) and (A7)(ii) hold. Then

    $$ \sup_{t\in [0,1]} \bigl\vert \hat{g}(t)-g(t) \bigr\vert =O_{p} \biggl\{ \sqrt{ \frac{2^{m}}{n}}\log n+n^{-\gamma }+\eta _{m} \biggr\} . $$

Remark 3.2

Theorem 3.1(i) gives the Bahadur representation of the \(L_{1}\)-wavelet estimator for a nonparametric model. For \(1/2<\nu <3/2\), \(\eta _{m}\) yields a slower rate of convergence than for \(\nu \geq 3/2\); moreover, \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) need not be differentiable when \(1/2<\nu <3/2\). If we take \(2^{m}=O(n^{1-\gamma })\) and \(\nu =(\gamma +1)/[2(1-\gamma )]\) with \(0<\gamma <1/2\), then \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) (\(1/2<\nu <3/2\)) and \(R_{n}(m;\gamma ,\nu )=O_{p}(n^{-\gamma })\) (\(0<\gamma <1/2\)), which implies that \(R_{n}(m;\gamma ,\nu )\) can have an order arbitrarily close to \(O_{p}(n^{-1/2})\). This is comparable with the Bahadur order \(o_{p}((nh)^{-1/2})\) for kernel-weighted local polynomial estimation [29], where the bandwidth \(h\rightarrow 0\) and the regression function is required to be twice differentiable. For example, the triangular function \(\Lambda (t)=(1- \vert t \vert )_{+}\), whose Fourier transform is \(\sin ^{2}(\xi /2)/(\xi /2)^{2}\), belongs to \(\mathcal{H}^{1}(\mathbb{R})\) and is Lipschitz of order 1, so it satisfies our conditions on g although it is not differentiable. Such a function is not covered by previous studies.

Remark 3.3

Theorem 3.1(ii) states the rate of convergence of the \(L_{1}\)-wavelet estimator for a nonparametric model. As in Remark 3.2, we consider the slower-rate case of \(\eta _{m}\), i.e., \(1/2<\nu <3/2\). If we take \(2^{m}=O(n^{\gamma })\) (\(\gamma \geq 1/3\)), then \(\sup_{t\in [0,1]}|\hat{g}(t)-g(t)|=O_{p} (n^{-(1-\gamma )/2} \log n )\). Furthermore, taking \(\gamma =1/3\), one gets

$$ \sup_{t\in [0,1]} \bigl\vert \hat{g}(t)-g(t) \bigr\vert =O_{p} \bigl(n^{-1/3}\log n \bigr), $$

which is comparable with the optimal convergence rate in nonparametric estimation. Meanwhile, it is the same as the results of [21] (in probability) and [22] (almost surely) for any \(t\in [0,1]\) based on least-squares wavelet estimators, but those results require g to be continuously differentiable, that is, \(g\in \mathcal{H}^{\nu }(\mathbb{R})\) with \(\nu >3/2\).

To obtain an asymptotic expansion of the variance and an asymptotic normality result, we need to consider an approximation to \(\hat{g}(t)\) based on its values at dyadic points of order m. That is, we define \(\hat{g}^{d}(t)=\hat{g}(t^{(m)})\) with \(t^{(m)}= \lfloor 2^{m}t\rfloor /2^{m}\), where \(\lfloor z\rfloor \) denotes the largest integer not greater than z.

Theorem 3.2

(Asymptotic normality)

Suppose that (A1)–(A6) and (A7)(iii) hold. Then

$$ \sqrt{n2^{-m}}\bigl(\hat{g}^{d}(t)-g(t)\bigr) \stackrel{D}{\longrightarrow }N \bigl(0,4^{-1}f_{\epsilon }^{-2}(0) \omega _{0}^{2}\kappa (t) \bigr), $$

where \(\omega _{0}^{2}=\int _{\mathbb{R}}E_{0}^{2}(0,u)\,du=\sum_{k\in \mathbb{Z}}\phi ^{2}(k)\).
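
The constant \(\omega _{0}^{2}=\sum_{k\in \mathbb{Z}}\phi ^{2}(k)\) can be evaluated numerically for a given scaling function. The sketch below uses PyWavelets (an assumed dependency, not used in the paper) with a Daubechies scaling function as an illustrative choice; for the Haar case, \(\phi (0)=1\) and \(\phi (k)=0\) otherwise, so \(\omega _{0}^{2}=1\) exactly.

```python
import numpy as np
import pywt  # PyWavelets; an assumed dependency for illustration only

# Sketch: approximate omega_0^2 = sum_k phi(k)^2 for the Daubechies-4 scaling
# function by sampling phi on a fine dyadic grid and interpolating at integers.
phi_vals, psi_vals, x = pywt.Wavelet("db4").wavefun(level=12)
ks = np.arange(int(np.ceil(x[0])), int(np.floor(x[-1])) + 1)
omega0_sq = float(np.sum(np.interp(ks, x, phi_vals) ** 2))
print(omega0_sq)
```

By Theorem 3.2, the asymptotic standard deviation of \(\hat{g}^{d}(t)\) is then \(\sqrt{2^{m}/n}\,\omega _{0}\sqrt{\kappa (t)}/(2f_{\epsilon }(0))\).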

Remark 3.4

\(\hat{g}^{d}(t)\) is the piecewise-constant approximation of \(\hat{g}(t)\) at resolution \(2^{-m}\). The reason for considering this approximation is that the variance of \(\hat{g}(t)\) is unstable as a function of t, because, asymptotically, \(\operatorname{var}(\hat{g}(t))\) is proportional to \(2^{m}n^{-1}\kappa (t)\int _{0}^{1}E_{0}^{2}(t_{m},s)\,ds\), where \(t_{m}=2^{m}t-\lfloor 2^{m}t\rfloor \). If t is non-dyadic, the sequence \(t_{m}\) wanders around the unit interval and fails to converge; see also [16].

4 Technical proofs

In order to prove the main results, we first present several lemmas.

Lemma 4.1

Suppose that (A4) holds. We have:

  1. (i)

    \(|E_{0}(t,s)|\leq c_{k}/(1+|t-s|)^{k}\) and \(|E_{m}(t,s)|\leq 2^{m}c_{k}/(1+2^{m}|t-s|)^{k}\), where k is a positive integer and \(c_{k}\) is a constant depending on k only.

  2. (ii)

    \(\sup_{0\leq t,s \leq 1}|E_{m}(t,s)|=O(2^{m})\).

  3. (iii)

    \(\sup_{0\leq t\leq 1}\int _{0}^{1}|E_{m}(t,s)|\,ds\leq c\), where c is a positive constant.

  4. (iv)

    \(\int _{0}^{1} E_{m}(t,s)\,ds\rightarrow 1\) uniformly in \(t\in [0,1]\), as \(m\rightarrow \infty \).

The proofs of (i) and (ii) can be found in [16], and (iii) follows from (i); the proof of (iv) can be found in [30].

Lemma 4.2

Suppose that (A4)–(A5) hold and \(h(\cdot )\) satisfies (A2)–(A3). Then

$$ \sup_{0\leq t\leq 1} \Biggl\vert h(t)-\sum _{i=1}^{n} h(t_{i}) \int _{A_{i}}E_{m}(t,s)\,ds \Biggr\vert =O \bigl(n^{-\gamma }\bigr)+O(\eta _{m}), $$

where

$$ \eta _{m}= \textstyle\begin{cases} (1/2^{m})^{\nu -1/2} & \textit{if } 1/2< \nu < 3/2, \\ \sqrt{m}/2^{m} & \textit{if } \nu =3/2, \\ 1/2^{m} & \textit{if } \nu >3/2. \end{cases} $$

It follows easily from Theorem 3.2 of [16].

Lemma 4.3

Let \(\{V_{i}, i=1,\ldots ,n\}\) be a sequence of independent random variables with mean zero and finite \((2+\delta )\)th moments, and \(\{a_{ij}, i,j=1,\ldots ,n\}\) a set of positive numbers such that \(\max_{ij}|a_{ij}|\leq n^{-p_{1}}\) for some \(0\leq p_{1}\leq 1\) and \(\sum_{i=1}^{n}a_{ij}=O(n^{p_{2}})\) for some \(p_{2}\geq \max (0,2/(2+\delta )-p_{1})\). Then

$$ \max_{1\leq j\leq n} \Biggl\vert \sum_{i=1}^{n}a_{ij}V_{i} \Biggr\vert =O\bigl(n^{-(p_{1}-p_{2})/2} \log n\bigr),\quad \textit{a.s.} $$

It can be found in [31].

Lemma 4.4

Let \(\{\lambda _{n}(\theta ),\theta \in \varTheta \}\) be a sequence of random convex functions defined on a convex, open subset Θ of \(\mathbb{R}^{d}\). Suppose \(\lambda (\cdot )\) is a real-valued function on Θ for which \(\lambda _{n}(\theta )\rightarrow \lambda (\theta )\) in probability, for each θ in Θ. Then, for each compact subset K of Θ, in probability,

$$ \sup_{\theta \in K} \bigl\vert \lambda _{n}(\theta )- \lambda (\theta ) \bigr\vert \rightarrow 0. $$

See [32].

Below, we give the proofs of the main results. The proof of Theorem 3.1 uses the ideas of [32] and the convexity lemma (Lemma 4.4). To complete the proof of Theorem 3.2, it is enough to check a Lindeberg-type condition.

Proof of Theorem 3.1

(i) From (2.1), note that \(\hat{g}(t)=\hat{a}\) and â minimizes

$$ \sum_{i=1}^{n} \vert y_{i}-a \vert \int _{A_{i}}E_{m}(t,s)\,ds. $$

Let \(\theta =a-g(t)\) and \(\epsilon _{i}^{*}=\epsilon _{i}+[g(t_{i})-g(t)]\). Then \(\hat{\theta }=\hat{a}-g(t)\) minimizes the function

$$ G_{n}(\theta )=\sum_{i=1}^{n} \bigl\{ \bigl\vert \epsilon _{i}^{*}-\theta \bigr\vert - \bigl\vert \epsilon _{i}^{*} \bigr\vert \bigr\} \int _{A_{i}}E_{m}(t,s)\,ds. $$

The idea behind the proof, as in [32], is to approximate \(G_{n}(\theta )\) by a quadratic function whose minimizer has an explicit expression, and then to show that θ̂ is close enough to this minimizer to share its asymptotic behavior.

We now set out to approximate \(G_{n}(\theta )\) by a quadratic function of θ. Write

$$ G_{n}(\theta )=W_{n}\theta +R_{n}(\theta ) , $$

where \(W_{n}=-\sum_{i=1}^{n}\operatorname{sign}(\epsilon _{i})\int _{A_{i}}E_{m}(t,s)\,ds\), which does not depend on θ, and

$$ R_{n}(\theta )=\sum_{i=1}^{n} \bigl\{ \bigl\vert \epsilon _{i}^{*}-\theta \bigr\vert - \bigl\vert \epsilon _{i}^{*} \bigr\vert + \operatorname{sign}(\epsilon _{i})\theta \bigr\} \int _{A_{i}}E_{m}(t,s)\,ds. $$
(4.1)

We have

$$ G_{n}(\theta )=E\bigl(G_{n}(\theta ) \bigr)+W_{n}\theta +\bigl[R_{n}(\theta )-E \bigl(R_{n}( \theta )\bigr)\bigr]. $$
(4.2)

Assumption (A1) ensures that the function \(\Delta (t)=E[|\epsilon _{i}-t|-|\epsilon _{i}|]\) has a unique minimum at zero and that \(\Delta (t)=t^{2}f_{\epsilon }(0)+o(t^{2})\) as \(t\rightarrow 0\). Therefore, by Lemmas 4.1 and 4.2,

$$\begin{aligned} E\bigl(G_{n}(\theta )\bigr) =&\sum _{i=1}^{n}\bigl\{ f_{\epsilon }(0) \theta ^{2}-2f_{\epsilon }(0)\bigl[g(t_{i})-g(t) \bigr]\theta \bigr\} \int _{A_{i}}E_{m}(t,s)\,ds+o\bigl( \delta _{n}^{2}\bigr) \\ =&f_{\epsilon }(0)\theta ^{2}-2f_{\epsilon }(0)\theta \Biggl\{ \sum_{i=1}^{n}g(t_{i}) \int _{A_{i}}E_{m}(t,s)\,ds-g(t) \Biggr\} +o\bigl( \delta _{n}^{2}\bigr) \\ =&f_{\epsilon }(0)\theta ^{2}+O\bigl[\bigl(n^{-\gamma }+ \eta _{m}\bigr)\theta \bigr]+o\bigl( \delta _{n}^{2} \bigr), \end{aligned}$$
(4.3)

where \(\delta _{n}=\max \{ (n^{-\gamma }+\eta _{m}),|\theta | \} \). For (4.1), note that \(\vert |\epsilon _{i}^{*}-\theta |-|\epsilon _{i}^{*}|+\operatorname{sign}( \epsilon _{i})\theta \vert \leq 2|\theta |I\{|\epsilon _{i}| \leq |\theta |+|g(t_{i})-g(t)|\}\); hence

$$\begin{aligned} ER_{n}^{2}(\theta ) \leq & 4\theta ^{2}\sum _{i=1}^{n}EI\bigl\{ \vert \epsilon _{i} \vert \leq \vert \theta \vert + \bigl\vert g(t_{i})-g(t) \bigr\vert \bigr\} \biggl\{ \int _{A_{i}}E_{m}(t,s)\,ds \biggr\} ^{2} \\ =&8\theta ^{2}f_{\epsilon }(0)\sum _{i=1}^{n} \bigl\vert g(t_{i})-g(t) \bigr\vert \biggl\{ \int _{A_{i}}E_{m}(t,s)\,ds \biggr\} ^{2} \bigl(1+o(1)\bigr) \\ =&O \biggl(\frac{2^{m}}{n^{1+\gamma }}\theta ^{2} \biggr). \end{aligned}$$

We get

$$ R_{n}(\theta )-E\bigl(R_{n}(\theta ) \bigr)=O_{p} \biggl(\theta \sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \biggr). $$
(4.4)

Let \(a_{n}=O_{p} \{ n^{-\gamma }+\eta _{m}+\sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \} \). Combining (4.2)–(4.4), for each fixed θ, we have

$$\begin{aligned} G_{n}(\theta ) =&f_{\epsilon }(0)\theta ^{2}+W_{n} \theta +O\bigl[\bigl(n^{-\gamma }+ \eta _{m}\bigr)\theta \bigr]+ O_{p} \biggl(\theta \sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \biggr) \\ =&f_{\epsilon }(0)\theta ^{2}+(W_{n}+a_{n}) \theta , \end{aligned}$$
(4.5)

with \(a_{n}=o_{p}(1)\) uniformly. Note that

$$ W_{n}=-\sum_{i=1}^{n} \operatorname{sign}(\epsilon _{i}) \int _{A_{i}}E_{m}(t,s)\,ds. $$

It is easy to see that \(W_{n}\) has a bounded second moment and hence is stochastically bounded. Since the convex function \(G_{n}(\theta )-(W_{n}+a_{n})\theta \) converges in probability to the convex function \(f_{\epsilon }(0)\theta ^{2}\), it follows from the convexity lemma, Lemma 4.4, that, for every compact set K,

$$ \sup_{\theta \in K} \bigl\vert G_{n}( \theta )-(W_{n}+a_{n})\theta -f_{\epsilon }(0) \theta ^{2} \bigr\vert =o_{p}(1). $$
(4.6)

Thus, the quadratic approximation to the convex function \(G_{n}(\theta )\) holds uniformly for θ in any compact set. Hence, using convexity again, the minimizer θ̂ of \(G_{n}(\theta )\) converges in probability to the minimizer

$$ \bar{\theta }=-\frac{1}{2}f_{\epsilon }^{-1}(0) (W_{n}+a_{n}), $$
(4.7)

that is,

$$ P\bigl( \vert \hat{\theta }-\bar{\theta } \vert >\delta \bigr)\rightarrow 0. $$

This assertion can be proved by elementary arguments similar to the proof of Theorem 1 in [32]. Based on (4.6), write \(G_{n}(\theta )=(W_{n}+a_{n})\theta +f_{\epsilon }(0)\theta ^{2}+r_{n}( \theta )\), which can be rewritten as

$$ G_{n}(\theta )=f_{\epsilon }(0)\bigl\{ \vert \theta -\bar{\theta } \vert ^{2}- \vert \bar{\theta } \vert ^{2}\bigr\} +r_{n}(\theta ), $$
(4.8)

with \(\sup_{\theta \in K}|r_{n}(\theta )|=o_{p}(1)\). Because θ̄ is stochastically bounded, the compact set K can be chosen to contain a closed ball \(B(n)\) with center θ̄ and radius δ with probability tending to one, which implies that

$$ \Delta _{n}=\sup_{\theta \in B(n)} \bigl\vert r_{n}(\theta ) \bigr\vert =o_{p}(1). $$

Now consider the behavior of \(G_{n}(\theta )\) outside \(B(n)\). Suppose \(\theta =\bar{\theta }+\beta \mu \), where \(\beta >\delta \) and μ is a unit vector. Define \(\theta ^{*}\) as the boundary point of \(B(n)\) that lies on the line segment from θ̄ to θ, i.e., \(\theta ^{*}=\bar{\theta }+\delta \mu \). Convexity of \(G_{n}(\theta )\), (4.8) and the definition of \(\Delta _{n}\) imply

$$\begin{aligned} \frac{\delta }{\beta }G_{n}(\theta )+\biggl(1- \frac{\delta }{\beta }\biggr)G_{n}( \bar{\theta }) \geq & G_{n}\bigl(\theta ^{*}\bigr) \\ \geq &f_{\epsilon }(0)\delta ^{2}-f_{\epsilon }(0) \vert \bar{\theta } \vert ^{2}- \Delta _{n} \\ \geq &f_{\epsilon }(0)\delta ^{2}+G_{n}(\bar{ \theta })-2\Delta _{n}. \end{aligned}$$

It follows that

$$ \inf_{|\theta -\bar{\theta }|>\delta }G_{n}(\theta )\geq G_{n}( \bar{\theta })+\frac{\beta }{\delta }\bigl[f_{\epsilon }(0)\delta ^{2}-2\Delta _{n}\bigr]. $$

Hence, when \(2\Delta _{n}< f_{\epsilon }(0)\delta ^{2}\), which happens with probability tending to one, the minimum of \(G_{n}(\theta )\) cannot occur at any θ with \(|\theta -\bar{\theta }|>\delta \). This implies that, for any \(\delta >0\) and all sufficiently large n, the minimum of \(G_{n}(\theta )\) must be achieved within \(B(n)\), i.e., \(|\hat{\theta }-\bar{\theta }|\leq \delta \) with probability tending to one. This completes the proof of (i).

(ii) In the following, we will prove that

$$ W_{n}=O \biggl\{ \biggl(\frac{2^{m}}{n} \biggr)^{1/2}\log n \biggr\} ,\quad \mbox{a.s.} $$
(4.9)

By Lemma 4.1, we have

$$ \max_{i,m} \biggl\vert \int _{A_{i}}E_{m}(t,s)\,ds \biggr\vert =O \bigl(2^{m}/n\bigr)=O\bigl(n^{-2p}\bigr) $$

and

$$ \sum_{i=1}^{n} \int _{A_{i}}E_{m}(t,s)\,ds= \int _{0}^{1} E_{m}(t,s)\,ds=O(1)=O \bigl(n^{p_{2}}\bigr), $$

where \(p_{1}=2p\) with \(0\leq p_{1}\leq 1\) and \(p_{1}\geq 2/(2+\delta )\), and \(p_{2}=0\); these requirements are satisfied under Conditions (A1) and (A7)(ii). By Lemma 4.3, \(W_{n}=O(n^{-p}\log n)\) a.s., which gives (4.9). Combining (4.9) with the Bahadur representation in Theorem 3.1(i) yields (ii). □

Proof of Theorem 3.2

From Theorem 3.1(i), we have

$$ 2f_{\epsilon }(0)\bigl\{ \hat{g}(t)-g(t)\bigr\} =Z_{n}(t)+R_{n}(m;\gamma ,\nu ), $$
(4.10)

where \(Z_{n}(t)=\sum_{i=1}^{n}\operatorname{sign}(\epsilon _{i})\int _{A_{i}}E_{m}(t,s)\,ds\) and \(R_{n}(m;\gamma ,\nu )=O_{p} \{ n^{-\gamma }+\eta _{m}+\sqrt{ \frac{2^{m}}{n^{1+\gamma }}} \} \). From (A7)(iii), i.e., \(n2^{-2mv^{*}}\rightarrow 0\), one gets

$$ \sqrt{n2^{-m}}R_{n}(m;\gamma ,\nu )=o_{p}(1). $$
(4.11)

Now, let us verify the asymptotic normality of \(\sqrt{n2^{-m}}Z_{n}(t^{(m)})\). First, we calculate its variance. By the proofs of Theorem 3.3 and Lemma 6.1 of [16], we have

$$\begin{aligned}& \bigl\vert \operatorname{var} \bigl(\sqrt{n2^{-m}}Z_{n} \bigl(t^{(m)}\bigr) \bigr)-\kappa (t) \omega _{0}^{2} \bigr\vert \\& \quad = \Biggl\vert n2^{-m}\sum _{i=1}^{n} \biggl( \int _{A_{i}}E_{m}\bigl(t^{(m)},s\bigr)\,ds \biggr)^{2}-\kappa (t)\omega _{0}^{2} \Biggr\vert \\& \quad \leq \Biggl\vert n2^{-m}\sum_{i=1}^{n} \biggl( \int _{A_{i}}E_{m}\bigl(t^{(m)},s\bigr)\,ds \biggr)^{2}-2^{-m} \int _{0}^{1}E_{m}^{2} \bigl(t^{(m)},s\bigr)\kappa (s)\,ds \Biggr\vert \\& \qquad {} + \biggl\vert 2^{-m} \int _{0}^{1}E_{m}^{2} \bigl(t^{(m)},s\bigr)\kappa (s)\,ds-\kappa (t) \omega _{0}^{2} \biggr\vert \\& \quad \leq n2^{-m} \Biggl\vert \sum_{i=1}^{n} (s_{i}-s_{i-1})^{2}E_{m}^{2} \bigl(t^{(m)},u_{i}\bigr)- \frac{1}{n}(s_{i}-s_{i-1})E_{m}^{2} \bigl(t^{(m)},v_{i}\bigr)\kappa (v_{i}) \Biggr\vert +o(1) \\& \qquad (\mbox{where }u_{i}\mbox{ and }v_{i}\mbox{ belong to }A_{i}) \\& \quad = n2^{-m}O\bigl(n^{-1}\bigr)O\bigl(n2^{-m} \bigr) \biggl(\rho (n)2^{2m}+ \frac{2^{2m}}{n^{2}}+ \frac{2^{2m}}{n}\frac{2^{m}}{n} \biggr)+o(1) \\& \quad \leq O \bigl(n\rho (n)+2^{m}/n \bigr)=o(1). \end{aligned}$$

So,

$$ \operatorname{var} \bigl(\sqrt{n2^{-m}}Z_{n} \bigl(t^{(m)}\bigr) \bigr)=\kappa (t)\omega _{0}^{2}+o(1). $$
(4.12)

To complete the proof, we only need to check the Lindeberg-type condition

$$ \max_{1\leq i\leq n} \frac{n2^{-m} (\int _{A_{i}}E_{m}(t,s)\,ds )^{2}}{\operatorname{var} (\sqrt{n2^{-m}}Z_{n}(t^{(m)}) )} \rightarrow 0. $$

From (4.12) and Lemma 4.1, one sees that this quantity is of order \(O(2^{m}/n)\rightarrow 0\). This completes the proof of Theorem 3.2. □