1 Introduction

Consider the generalized semiparametric model

$$ y_{i}=h \bigl(\mathbf{x}_{i}^{T} \beta \bigr)+f(t_{i})+e_{i} , \quad 1 \leq i\leq n, $$
(1)

where \(y_{i}\) are scalar response variables, \(h(\cdot )\) is a known continuously differentiable function, the superscript T denotes the transpose, \(\mathbf{x}_{i}=(x_{i1},\ldots ,x_{id})^{T}\) are explanatory variables, β is a d-dimensional unknown parameter, \(f(\cdot )\) is an unknown function, and \(0\leq t_{1}\leq t_{2}\leq \cdots \leq t_{n} \leq 1\). Several authors have noted that the assumption of independent errors is a serious restriction (see Huber [1] and Hampel [2]), so for the errors \(e_{i}\) we confine ourselves to negatively superadditive dependent (NSD) errors. NSD random variables were introduced by Hu [3] and are widely used in statistics; see [4,5,6,7,8,9,10,11,12].

The theory of the GSPM extends the classical theory of partially linear models: the generalized parametric component \(h (\mathbf{x}_{i}^{T}\beta )\) of the GSPM includes the linear parametric component \(\mathbf{x}^{T}_{i}\beta \), the exponential parametric component \(e^{\mathbf{x}^{T}_{i}\beta }\), and so on.

As is well known, the generalized partially linear model and the partially linear single-index model (where \(h(\cdot )\) is an unknown link function) are also derived from the partially linear model. There is a substantial body of work on the generalized partially linear model (see [13,14,15,16,17,18]) and on the partially linear single-index model (see [19,20,21,22,23,24]); this research is devoted to presenting various methods for obtaining estimators of β and \(f(t_{i})\) and to investigating the large-sample properties of these estimators.

In this paper, we consider a difference-based method to estimate the unknown parametric component β. This difference-based estimator is optimal in the sense that the estimator of the unknown parametric component is asymptotically efficient. For example, Tabakan et al. [25] studied a difference-based ridge estimator in a partially linear model. Wang et al. [26] developed a difference-based approach to the semiparametric partially linear model. Zhao and You [27] used a difference-based method to estimate the parametric component of partially linear regression models with measurement errors. Duran et al. [28] investigated difference-based ridge and Liu-type estimators in semiparametric regression models. Wu [29] discussed a restricted difference-based Liu estimator in partially linear models. Hu et al. [30] presented a difference-based Huber–Dutter (DHD) estimator of the root variance σ and the parameter β in a partially linear model. However, most of these results rely on independent errors. Wu [31] studied the difference-based ridge-type estimator of parameters in a restricted partial linear model with correlated errors, but that paper focuses only on estimating the linear component. Zeng and Liu [32] used a difference-based ordinary least-squares method to estimate the unknown parametric component, but that paper ignores the fact that a difference-based estimator may incur greater bias in moderately sized samples than other estimators. Inspired by these papers, we propose a difference-based M-estimation (DM) method for the generalized semiparametric model with NSD errors. The M-estimator, introduced by Huber [33], is one of the most famous robust estimators. In addition, once β is estimated, we can estimate \(f(\cdot )\) by a variety of nonparametric techniques; in this paper, the estimator of \(f(\cdot )\) is obtained by the wavelet method.

The paper has the following structure. In Sect. 2, we present the estimation procedure. In Sect. 3, we establish the main results. The proofs of the main results are provided in the Appendix.

2 Estimation method

2.1 Notation

Throughout the paper, Z is the set of integers, N is the set of natural numbers, and R is the set of real numbers. A sequence of random variables \(\eta _{n}\) is said to be of smaller order in probability than a sequence \(d_{n}\) (denoted by \(\eta _{n}=o_{P}(d_{n})\)) if \(\eta _{n}/d_{n}\) converges to 0 in probability, and \(\eta _{n}=O_{P}(d_{n})\) if \(\eta _{n}/d_{n}\) is bounded in probability. Convergence in distribution is denoted by \(H_{n}\stackrel{D}{ \rightarrow }H\). For an arbitrary function \(h(\cdot )\), \(h'(\cdot )\), \(h''(\cdot )\), and \(h'''(\cdot )\) denote its first, second, and third derivatives, respectively. \(\|\mathbf{x}\|\) is the Euclidean norm of x, and \(\lfloor x\rfloor =\max \{k\in \mathbf{Z}:k\leq x\}\). Let \(C_{0}\), \(C_{1}\), \(C_{2}\), \(C_{3}\), \(C_{4}\) be positive constants, and let \(\beta _{0}\) be the true parameter. Let \(\varTheta = \{\beta :\|\beta -\beta _{0}\|\leq C_{0} \}\).

2.2 Difference-based M-estimation

Let \(\tilde{y}_{i}=\sum_{q=0}^{m}d_{q}y_{i+q}\), \(\tilde{h}_{i}(\beta )=\sum_{q=0}^{m}d_{q}h (\mathbf{x}^{T}_{i+q}\beta )\), \(\tilde{f}(t_{i})=\sum_{q=0}^{m}d_{q}f(t_{i+q})\), and \(\tilde{e}_{i}= \sum_{q=0}^{m}d_{q}e_{i+q}\), where \(d_{0},d_{1},\ldots ,d _{m}\) satisfy the conditions

$$ \sum_{q=0}^{m} d_{q}=0, \qquad \sum_{q=0}^{m} d^{2}_{q}=1. $$
(2)
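
For instance, for \(m=1\) the weights \(d_{0}=1/\sqrt{2}\) and \(d_{1}=-1/\sqrt{2}\) satisfy (2), giving the first-order differences \(\tilde{y}_{i}=(y_{i}-y_{i+1})/\sqrt{2}\); weight sequences of higher order m can be constructed analogously.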

Then \(\tilde{y}_{i}\), \(\tilde{h}_{i}(\beta )\), \(\tilde{f}(t_{i})\), and \(\tilde{e}_{i}\) can be seen as the mth-order differences of \(y_{i}\), \(h(\mathbf{x}_{i}^{T}\beta )\), \(f(t_{i})\), and \(e_{i}\), respectively. Hence, applying the differencing procedures, model (1) becomes

$$ \tilde{y}_{i}=\tilde{h}_{i}(\beta )+ \tilde{f}(t_{i})+\tilde{e}_{i}, \quad 1\leq i\leq n-m. $$
(3)

From Yatchew [34] we know that the differencing procedure removes the nonparametric effect in model (1) in large samples, so we ignore the presence of \(\tilde{f}(\cdot )\). Thus (3) becomes

$$ \tilde{y}_{i}=\tilde{h}_{i}(\beta )+ \tilde{e}_{i}, \quad 1\leq i\leq n-m. $$
(4)
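
Computationally, the differencing step is a simple weighted moving sum. The following minimal sketch (in Python; the function name `difference` and all numerical choices are ours, not from the paper) computes the transformed data of (4) with the first-order weights given after (2) and illustrates why the smooth term \(f(\cdot )\) is nearly eliminated:

```python
import numpy as np

def difference(v, d):
    """m-th order differencing: returns v_tilde[i] = sum_q d[q] * v[i + q]."""
    m = len(d) - 1
    return np.array([np.dot(d, v[i:i + m + 1]) for i in range(len(v) - m)])

# First-order (m = 1) weights satisfying (2): sum d_q = 0, sum d_q^2 = 1.
d = np.array([1.0, -1.0]) / np.sqrt(2.0)
assert np.isclose(d.sum(), 0.0) and np.isclose((d ** 2).sum(), 1.0)

# Differencing nearly eliminates a smooth trend, which is why the
# nonparametric term f(t_i) is ignored when passing from (3) to (4).
t = np.linspace(0.0, 1.0, 200)
print(np.abs(difference(np.sin(2.0 * np.pi * t), d)).max())  # ~ 0.02
```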

Let ρ be a convex function. Assume that ρ has a continuous derivative ψ and that there is a constant a such that \(\psi (a)=0\). We propose the difference-based M-estimator given by minimizing

$$ Q(\beta )=\sum_{i=1}^{n-m}\rho \bigl(\tilde{y}_{i}-\tilde{h}_{i}( \beta )+a \bigr). $$
(5)

Let a \(d\times 1\) vector \(\hat{\beta }_{n}\) be the minimizer of (5) with \(\hat{\beta }_{n}\in \varTheta \). Write \(\tilde{\mathbf{h}}'_{i}(\beta )= \sum_{q=0}^{m}d_{q}h'(\mathbf{x}^{T}_{i+q}\beta )\mathbf{x}_{i+q}\), \(\tilde{h}'_{ik}(\beta )=\sum_{q=0}^{m}d_{q}h'(\mathbf{x}^{T}_{i+q} \beta )x_{(i+q)k}\) with \(1 \leq k \leq d\), \(\tilde{\mathbf{h}}''_{i}( \beta )=\sum_{q=0}^{m}d_{q}h''(\mathbf{x}^{T}_{i+q}\beta )\mathbf{x} _{i+q}\mathbf{x}^{T}_{i+q}\), and \(\tilde{\mathbf{h}}'_{i}(\beta ) \tilde{\mathbf{h}}^{\prime \,T}_{j}(\beta )=\sum_{q=0}^{m}d_{q}h'(\mathbf{x} ^{T}_{i+q}\beta )\mathbf{x}_{i+q}\sum_{q=0}^{m}d_{q}h'(\mathbf{x}^{T} _{j+q}\beta )\mathbf{x}^{T}_{j+q}\). Then the estimator satisfies

$$ \frac{\partial Q(\hat{\beta }_{n})}{\partial \beta }=-\sum_{i=1}^{n-m} \psi (\hat{\tilde{e}}_{i}+a)\tilde{\mathbf{h}}'_{i}( \hat{\beta }_{n})=0 $$
(6)

with \(\hat{\tilde{e}}_{i}=\tilde{y}_{i}-\tilde{h}_{i}( \hat{\beta }_{n})\). The convexity of ρ guarantees that minimizing (5) is equivalent to solving (6) and that the solution is asymptotically unique; if (6) has multiple solutions, the particular choice is unimportant.
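
For concreteness, here is a minimal computational sketch of the DM estimator, assuming Huber's ρ (whose derivative ψ vanishes at zero, so that \(a=0\) satisfies \(\psi (a)=0\)) and a derivative-free optimizer; the names `dm_estimate` and `huber_rho` and the tuning constant are illustrative, not prescriptions of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def huber_rho(t, k=1.345):
    """Huber's convex loss; its derivative psi satisfies psi(0) = 0."""
    return np.where(np.abs(t) <= k, 0.5 * t ** 2, k * (np.abs(t) - 0.5 * k))

def dm_estimate(y, X, h, d, beta_init):
    """Difference-based M-estimator: minimize Q(beta) of (5) with a = 0."""
    m = len(d) - 1
    idx = range(len(y) - m)
    y_t = np.array([np.dot(d, y[i:i + m + 1]) for i in idx])
    def Q(beta):
        hx = h(X @ beta)  # h(x_i^T beta) for every observation
        h_t = np.array([np.dot(d, hx[i:i + m + 1]) for i in idx])
        return huber_rho(y_t - h_t).sum()
    return minimize(Q, beta_init, method="Nelder-Mead").x
```

For the linear case \(h(u)=u\), this reduces to a robust regression of the differenced responses on the differenced covariates; for the exponential case, one passes `np.exp` as `h`.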

We estimate the nonparametric function \(f(\cdot )\) by the wavelet method, which is defined as follows.

Suppose that there exist a scaling function \(\phi (\cdot )\) in the Schwartz space \(S_{l}\) and a multiresolution analysis \(\{V_{\tilde{m}} \}\) in the concomitant Hilbert space \(L^{2}(\mathbf{R})\) with the reproducing kernel \(E_{\tilde{m}}(t,s)\) given by

$$\begin{aligned} E_{\tilde{m}}(t,s)=2^{\tilde{m}}E_{0}\bigl(2^{\tilde{m}}t, 2^{\tilde{m}}s\bigr)=2^{ \tilde{m}}\sum_{k\in \mathbf{Z}}\phi \bigl(2^{\tilde{m}}t-k\bigr)\phi \bigl(2^{ \tilde{m}}s-k\bigr). \end{aligned}$$

Let \(A_{i}=[s_{i-1}, s_{i}]\) denote intervals that partition \([0, 1]\) with \(t_{i} \in A_{i}\) for \(1\leq i\leq n\). Then the estimator of the nonparametric component \(f(t)\) is given by

$$\begin{aligned} \hat{f}_{n}(t)=\sum_{i=1}^{n} \bigl(y_{i}-h\bigl(\mathbf{x}^{T}_{i}\hat{\beta } _{n}\bigr)\bigr) \int _{A_{i}}{E_{\tilde{m}}(t,s)}\,ds. \end{aligned}$$
(7)
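
To make (7) concrete, here is a rough numerical sketch using the Haar scaling function \(\phi =1_{[0,1)}\), for which \(E_{\tilde{m}}(t,s)=2^{\tilde{m}}\) when t and s lie in the same dyadic cell of width \(2^{-\tilde{m}}\) and 0 otherwise. The Haar function does not belong to the Schwartz space \(S_{l}\) required by Condition (C8) below, so this is only a computable stand-in for illustration:

```python
import numpy as np

def haar_weight(t, s_lo, s_hi, m_tilde):
    """Integral of E_m(t, s) over A_i = [s_lo, s_hi] for the Haar phi:
    2^m times the overlap of A_i with the dyadic cell containing t."""
    k = np.floor(2.0 ** m_tilde * t)
    lo, hi = k / 2.0 ** m_tilde, (k + 1) / 2.0 ** m_tilde
    return 2.0 ** m_tilde * max(0.0, min(s_hi, hi) - max(s_lo, lo))

def f_hat(t, y, X, beta_hat, h, s, m_tilde):
    """Wavelet estimator (7); s is the partition 0 = s_0 <= ... <= s_n = 1."""
    resid = y - h(X @ beta_hat)
    w = np.array([haar_weight(t, s[i], s[i + 1], m_tilde)
                  for i in range(len(y))])
    return float(resid @ w)
```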

3 Main results

We now list some conditions used to obtain the main results.

  1. (C1)

    \(\max_{1\leq i \leq n}\|\mathbf{x}_{i}\|=O(1) \), and the eigenvalues of \(n^{-1}\sum^{n}_{i=1}\mathbf{x}_{i}\mathbf{x}^{T}_{i}\) are bounded above and away from zero.

  2. (C2)

    \(b>0\) and \(bc-d^{2}>0\), where \(b=E\{\psi '(\eta )\}\), \(c=E\{\eta ^{2} \psi '(\eta )\}\), and \(d=E\{\eta \psi '(\eta )\}\), with \(\eta =\tilde{e}_{i}+a\).

  3. (C3)

    \(E\psi (\tilde{e}_{i}+a )=0\).

  4. (C4)

    The function ρ is assumed to be convex, not monotone, and to possess bounded derivatives of sufficiently high order in a neighborhood of the point \(\mathbf{x}_{i}^{T}\beta _{0}\). In particular, \(\psi (t)\) should be continuous and bounded in a neighborhood of \(\mathbf{x}_{i}^{T}\beta _{0}\).

  5. (C5)

    \(h(\cdot )\) is assumed to possess bounded derivatives of sufficiently high order in a neighborhood of the point \(\mathbf{x}_{i} ^{T}\beta _{0}\).

  6. (C6)

    \(f(\cdot )\in H^{\alpha }\) (Sobolev space) for some \(\alpha >1/2\).

  7. (C7)

    \(f(\cdot )\) is a Lipschitz function of order \(\gamma >0\).

  8. (C8)

    \(\phi (\cdot )\) belongs to \(S_{l}\), the Schwartz space with \(l\geq \alpha \), is a Lipschitz function of order 1, and has compact support; in addition, \(|\hat{\phi }(\xi )-1|=O(\xi )\) as \(\xi \rightarrow 0\), where ϕ̂ denotes the Fourier transform of ϕ.

  9. (C9)

    \(s_{i}\), \(1\leq i\leq n\), satisfy \(\max_{1\leq i\leq n}(s_{i}-s _{i-1})=O(n^{-1})\), and \(2^{\tilde{m}}=O(n^{1/3})\).

Remark 1

Condition (C1) is often imposed in the M-estimation theory of regression models. Condition (C2) is used by Silvapullé [35] for Huber–Dutter (HD) estimation; in this paper, it is also necessary for M-estimation. Condition (C3) is used by Wu [36] and Zeng and Hu [37] with \(a=0\); we require it so that the expectation of (5) attains its minimum at the true value \(\beta _{0}\). For Condition (C4), higher-order derivatives are technically convenient (Taylor expansions), but their existence is hardly essential for the results to hold; see Huber [1]. Condition (C5) is quite mild and easily satisfied. Conditions (C6)–(C9) are used by Hu et al. [38].

Remark 2

The assumption \(\psi (a)=0\) and Condition (C4) are serious restrictions, which show that the M-estimator in our paper is a particular case of the classical M-estimator. However, these conditions are necessary in our study.

Theorem 3.1

Let \(\{e_{n}, n\geq 1\}\) be a sequence of NSD random variables with \(Ee_{n}=0\), and suppose that, for some \(\delta >0\),

$$ \sup_{n\geq 1}E \vert e_{n} \vert ^{2+\delta }< \infty . $$
(8)

Suppose that

$$ \sup_{j\geq 1}\sum_{i: \vert i-j \vert \geq u} \bigl\vert \operatorname{cov}(e_{i},e_{j}) \bigr\vert \rightarrow 0 \quad \textit{as } u\rightarrow \infty . $$
(9)

Set \(\tilde{e}_{i}=\sum_{q=0}^{m}d_{q}e_{i+q}\), where \(\{d_{q}, 0 \leq q \leq m\}\) are defined in (2). Let \(\{c_{i}, 1\leq i \leq n-m\}\) be an array of constants satisfying \(\max_{1\leq i\leq n-m}|c_{i}|=O(1)\), and suppose that \(\psi (a)=0\) and Conditions (C3) and (C4) hold. Then

$$ (n-m)^{-1/2}\tau ^{-1}\sum _{i=1}^{n-m}c_{i}\psi (\tilde{e}_{i}+a) \stackrel{D}{ \rightarrow } N(0,1), $$
(10)

provided that

$$ \tau ^{2}=\lim_{n\rightarrow \infty }(n-m)^{-1} \Biggl\{ \sum_{i=1}^{n-m}c _{i}^{2} \operatorname{Var}\bigl(\psi (\tilde{e}_{i}+a)\bigr)+2\sum _{i=1}^{n-m}\sum_{j=i+1}^{n-m}c _{i}c_{j}\operatorname{Cov}\bigl(\psi (\tilde{e}_{i}+a),\psi ( \tilde{e}_{j}+a)\bigr) \Biggr\} >0. $$

Theorem 3.2

Let \(\{e_{n}, n\geq 1\}\) be a sequence of NSD random variables with \(Ee_{n}=0\) satisfying conditions (8) and (9). Assume that Conditions (C1)–(C5) hold. Then

$$\begin{aligned} &(n-m)^{-1/2}\tau ^{-1}_{\beta }E \biggl( \frac{\partial ^{2}Q(\beta _{0})}{ \partial \beta \partial \beta ^{T}} \biggr) (\hat{\beta }_{n}-\beta _{0}) \stackrel{D}{ \rightarrow } N(0,I_{d}), \end{aligned}$$
(11)

provided that

$$\begin{aligned} \tau ^{2}_{\beta }={}& \lim_{n\rightarrow \infty } \frac{1}{n-m} \Biggl\{ \sum_{i=1}^{n-m} \tilde{\mathbf{h}}_{i}'(\beta _{0})\tilde{ \mathbf{h}}_{i}^{\prime \,T}( \beta _{0})\operatorname{Var} \bigl(\psi (\tilde{e}_{i}+a ) \bigr) \\ &{}+2\sum_{i=1}^{n-m}\sum _{j= i+1}^{n-m}\tilde{\mathbf{h}}_{i}'( \beta _{0})\tilde{\mathbf{h}}_{j}^{\prime \,T}(\beta _{0})\operatorname{Cov} \bigl(\psi (\tilde{e} _{i}+a ),\psi ( \tilde{e}_{j}+a ) \bigr) \Biggr\} \end{aligned}$$

is a positive definite matrix, where \(I_{d}\) is the identity matrix of order d.
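
In practice, the normalizing matrices in (11) must be estimated from the data. A rough plug-in sketch follows: it replaces \(E (\partial ^{2}Q(\beta _{0})/\partial \beta \,\partial \beta ^{T} )\) by \(\sum_{i}\psi '(\hat{\tilde{e}}_{i}+a)\tilde{\mathbf{h}}'_{i}\tilde{\mathbf{h}}^{\prime \,T}_{i}\) (dropping the term involving \(\tilde{\mathbf{h}}''_{i}\), whose summands are centered under (C3)) and truncates the covariances in \(\tau ^{2}_{\beta }\) at a finite lag, in the spirit of Corollary 3.2 below. The function name, the lag choice, and the truncation itself are our assumptions, not part of the theorem:

```python
import numpy as np

def sandwich_cov(u, du, G, lag=5):
    """Plug-in covariance for beta_hat in the spirit of Theorem 3.2.

    u  : psi(e_tilde_hat_i + a)   -- scores, shape (n - m,)
    du : psi'(e_tilde_hat_i + a)  -- score derivatives, shape (n - m,)
    G  : rows h_tilde'_i(beta_hat), shape (n - m, d)
    Covariances between scores are truncated at |i - j| > lag.
    """
    H = (du[:, None] * G).T @ G                  # Hessian surrogate, d x d
    U = u[:, None] * G                           # psi(...) times gradient rows
    S = U.T @ U                                  # lag-0 part of tau^2_beta
    for k in range(1, lag + 1):
        A = U[k:].T @ U[:-k]                     # lag-k cross terms
        S += A + A.T
    Hinv = np.linalg.inv(H)
    return Hinv @ S @ Hinv                       # approximate Cov(beta_hat)
```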

Corollary 3.1

Let \(h(\mathbf{x}_{i}^{T}\beta )= \mathbf{x}_{i}^{T}\beta \), and let \(\{e_{n}, n\geq 1\}\) be a sequence of NSD random variables with \(Ee_{n}=0\) satisfying conditions (8) and (9). Assume that Conditions (C1)–(C4) hold. Then

$$\begin{aligned} &(n-m)^{-1/2}\tau ^{-1}_{\beta }E \biggl( \frac{\partial ^{2}Q(\beta _{0})}{ \partial \beta \partial \beta ^{T}} \biggr) (\hat{\beta }_{n}-\beta _{0}) \stackrel{D}{ \rightarrow } N(0,I_{d}), \end{aligned}$$
(12)

provided that

$$ \tau ^{2}_{\beta }=\lim_{n\rightarrow \infty }\frac{1}{n-m} \Biggl\{ \sum_{i=1}^{n-m}\tilde{ \mathbf{x}}_{i}\tilde{\mathbf{x}}_{i}^{T} \operatorname{Var} \bigl(\psi (\tilde{e}_{i}+a ) \bigr)+2\sum _{i=1}^{n-m}\sum_{j= i+1} ^{n-m}\tilde{\mathbf{x}}_{i}\tilde{\mathbf{x}}_{j}^{T}\operatorname{Cov} \bigl(\psi (\tilde{e}_{i}+a ),\psi (\tilde{e}_{j}+a ) \bigr) \Biggr\} $$

is a positive definite matrix, where \(\tilde{\mathbf{x}}_{i}=\sum_{q=0}^{m}d_{q}\mathbf{x}_{i+q}\).

Corollary 3.2

Let \(\{e_{n}, n\geq 1\}\) be a sequence of NSD random variables with \(Ee_{n}=0\) satisfying \(\operatorname{Cov}(e_{i},e_{j})=0\) for \(|i-j|> \bar{m}\), where \(\bar{m}<\infty \). Assume that Conditions (C1)–(C5) and condition (8) hold. Then

$$\begin{aligned} &(n-m)^{-1/2}\tau ^{-1}_{\beta }E \biggl( \frac{\partial ^{2}Q(\beta _{0})}{ \partial \beta \partial \beta ^{T}} \biggr) (\hat{\beta }_{n}-\beta _{0}) \stackrel{D}{ \rightarrow } N(0,I_{d}), \end{aligned}$$

provided that

$$\begin{aligned} \tau ^{2}_{\beta }={}&\lim_{n\rightarrow \infty } \frac{1}{n-m} \Biggl\{ \sum_{i=1}^{n-m} \tilde{\mathbf{h}}_{i}'(\beta _{0})\tilde{ \mathbf{h}}_{i}^{\prime \,T}( \beta _{0})\operatorname{Var} \bigl(\psi (\tilde{e}_{i}+a ) \bigr) \\ &{}+2\sum_{k=1}^{\bar{m}}\sum _{i=1}^{n-m-k}\tilde{\mathbf{h}} _{i+k}'( \beta _{0})\tilde{\mathbf{h}}_{i}^{\prime \,T}(\beta _{0})\operatorname{Cov} \bigl(\psi (\tilde{e}_{i+k}+a ),\psi ( \tilde{e}_{i}+a ) \bigr) \Biggr\} \end{aligned}$$

is a positive definite matrix.

From Theorem 3.2 we also easily obtain the following corresponding results for \(\rho (t)=t^{2}\); we omit their proofs.

Corollary 3.3

(Zeng and Liu [32])

Let \(\rho (t)=t^{2}\), \(h(\mathbf{x}_{i}^{T}\beta )=\mathbf{x}_{i}^{T}\beta \), and let \(\{e_{n}, n\geq 1\}\) be a sequence of NSD random variables with \(Ee_{n}=0\) satisfying conditions (8) and (9). Assume that Conditions (C1) and (C2) hold. Then

$$\begin{aligned} &(n-m)^{-1/2}\tau ^{-1}_{\beta }\sum ^{n-m}_{i=1}\tilde{\mathbf{x}}_{i} \tilde{ \mathbf{x}}_{i}^{T}(\hat{\beta }_{n}-\beta _{0})\stackrel{D}{ \rightarrow } N(0,I_{d}), \end{aligned}$$

provided that

$$ \tau ^{2}_{\beta }=\lim_{n\rightarrow \infty }(n-m)^{-1} \Biggl\{ \sum_{i=1}^{n-m}\tilde{\mathbf{x}}_{i} \tilde{\mathbf{x}}^{T}_{i}\operatorname{Var} (\tilde{e}_{i} )+2 \sum_{i=1}^{n-m}\sum _{j=i+1}^{n-m}\tilde{\mathbf{x}}_{i} \tilde{\mathbf{x}}^{T}_{j}\operatorname{Cov} (\tilde{e}_{i}, \tilde{e}_{j} ) \Biggr\} $$

is a positive definite matrix.
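
As a sanity check on Corollary 3.3, one can simulate a partially linear model with NSD errors and apply first-order differencing followed by ordinary least squares. The sketch below draws Gaussian MA(1) errors \(e_{i}=z_{i}-\theta z_{i+1}\) with \(\theta >0\); such jointly Gaussian vectors have non-positive correlations and are therefore negatively associated, hence NSD. All numerical settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def nsd_errors(n, theta=0.5):
    """Gaussian MA(1): Cov(e_i, e_{i+1}) = -theta < 0, other lags 0."""
    z = rng.standard_normal(n + 1)
    return z[:n] - theta * z[1:]

n, beta0 = 500, np.array([1.0, -2.0])
estimates = []
for _ in range(200):
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    t = np.sort(rng.uniform(0.0, 1.0, n))
    y = X @ beta0 + np.sin(2.0 * np.pi * t) + nsd_errors(n)
    # First-order differencing with weights (1/sqrt(2), -1/sqrt(2)).
    y_t = (y[:-1] - y[1:]) / np.sqrt(2.0)
    X_t = (X[:-1] - X[1:]) / np.sqrt(2.0)
    estimates.append(np.linalg.lstsq(X_t, y_t, rcond=None)[0])

print(np.mean(estimates, axis=0))   # close to beta0 across replications
```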

Corollary 3.4

Let \(\rho (t)=t^{2}\), \(h(\mathbf{x} _{i}^{T}\beta )=e^{\mathbf{x}_{i}^{T}\beta }\), and let \(\{e_{n}, n \geq 1\}\) be a sequence of NSD random variables with \(Ee_{n}=0\) satisfying conditions (8) and (9). Assume that Conditions (C1) and (C2) hold. Then

$$\begin{aligned} &(n-m)^{-\frac{1}{2}}\tau ^{-1}_{\beta }\sum ^{n-m}_{i=1} \Biggl(\sum_{q=0}^{m}d_{q}e^{\mathbf{x}^{T}_{i+q}\beta _{0}} \mathbf{x}_{i+q} \Biggr) \Biggl(\sum_{q=0}^{m}d_{q}e^{\mathbf{x}^{T}_{i+q}\beta _{0}} \mathbf{x}_{i+q} \Biggr)^{T}(\hat{\beta }_{n}-\beta _{0})\stackrel{D}{\rightarrow } N(0,I_{d}), \end{aligned}$$

provided that \(\tau ^{2}_{\beta }=\lim_{n\rightarrow \infty }(n-m)^{-1}\operatorname{Var} (\sum_{i=1}^{n-m}\tilde{e}_{i}\sum_{q=0}^{m}d_{q}e^{\mathbf{x} ^{T}_{i+q}\beta _{0}}\mathbf{x}_{i+q} )\) is a positive definite matrix, where Var denotes the covariance matrix of the indicated random vector.

Theorem 3.3

Under the conditions of Theorem 3.2, assume further that Conditions (C6)–(C9) hold. Then

$$\begin{aligned} \sup_{0\leq t \leq 1} \bigl\vert \hat{f}_{n}(t)-f(t) \bigr\vert =O_{P}\bigl(n^{- \gamma }\bigr)+O_{P}(\tau _{\tilde{m}})+O_{P}\bigl(n^{-1/3}M_{n}\bigr) \quad \textit{as } n\rightarrow \infty , \end{aligned}$$
(13)

where \(M_{n}\rightarrow \infty \) arbitrarily slowly, and \(\tau _{\tilde{m}}=2^{-\tilde{m}(\alpha -1/2)}\) if \(1/2< \alpha <3/2\), \(\tau _{\tilde{m}}=\sqrt{\tilde{m}}2^{-\tilde{m}}\) if \(\alpha =3/2\), and \(\tau _{\tilde{m}}=2^{-\tilde{m}}\) if \(\alpha >3/2\).