1 Introduction

Longitudinal data (Diggle et al. [1]) are characterized by repeated observations over time on the same set of individuals. They are common in medical and epidemiological studies; examples are easily found in clinical trials and follow-up studies for monitoring disease progression. Interest often focuses on evaluating the effects of time and covariates on the outcome variables. Let $t_{ij}$ be the time of the $j$th measurement of the $i$th subject, and let $x_{ij}\in\mathbb{R}^{p}$ and $y_{ij}$ be the $i$th subject's observed covariate and outcome at time $t_{ij}$, respectively. We assume that the full dataset $\{(x_{ij}, y_{ij}, t_{ij}),\ i=1,\ldots,n,\ j=1,\ldots,m_i\}$, where $n$ is the number of subjects and $m_i$ is the number of repeated measurements of the $i$th subject, is observed and can be modeled by the following partially linear model

$$y_{ij} = x_{ij}^{T}\beta + g(t_{ij}) + e_{ij},$$
(1.1)

where $\beta$ is a $p\times 1$ vector of unknown parameters, $g(\cdot)$ is an unknown smooth function, and the $e_{ij}$ are random errors with $E(e_{ij})=0$. We assume without loss of generality that the $t_{ij}$ are all scaled into the interval $I=[0,1]$. Although the observations, and therefore the $e_{ij}$, from different subjects are independent, they can be dependent within each subject.

Partially linear models keep the flexibility of nonparametric models while maintaining the explanatory power of parametric models (Fan and Li [2]). Many authors have studied models of the form (1.1) under additional assumptions or restrictions. If the nonparametric component $g(\cdot)$ is known or absent, the models become general linear models with repeated measurements, which have been studied under Gaussian errors in a large body of literature; some of this work has been integrated into PROC MIXED of the SAS System for estimation and inference in such models. If $g(\cdot)$ is unknown but there are no repeated measurements, that is, $m_1=\cdots=m_n=1$, the models (1.1) reduce to non-longitudinal partially linear regression models, which were first introduced by Engle et al. [3] to study the effect of weather on electricity demand, and further studied by Heckman [4], Speckman [5] and Robinson [6], among others. A survey of the estimation and application of these models can be found in the monograph of Härdle et al. [7]. When the random errors of the models (1.1) are independent replicates of a zero-mean stationary Gaussian process, Zeger and Diggle [8] obtained estimators of the unknown quantities and analyzed the time trend of CD4 cell counts among HIV seroconverters; Moyeed and Diggle [9] gave the rate of convergence of such estimators; and Zhang et al. [10] proposed the maximum penalized Gaussian likelihood estimator. Introducing the counting process technique into the estimation scheme, Fan and Li [2] established asymptotic normality and the rate of convergence of the resulting estimators. Under the models (1.1) for panel data with a one-way error structure, You and Zhou [11] and You et al. [12] developed weighted semiparametric least squares estimators and derived their asymptotic properties. In practice, a great deal of data in econometrics, engineering and the natural sciences occur in the form of time series whose observations exhibit evident dependence. Recently, non-longitudinal partially linear regression models with complex error structures have attracted increasing attention from statisticians; see, for example, Schick [13] for AR(1) errors, Gao and Anh [14] for long-memory errors, Sun et al. [15] for MA(∞) errors, Baek and Liang [16] and Zhou et al. [17] for negatively associated (NA) errors, and Li and Liu [18], Chen and Cui [19] and Liang and Jing [20] for martingale difference errors, among others.

For longitudinal data, an inherent characteristic is the dependence among observations within the same subject. Some authors have studied the asymptotic behavior of estimators in semiparametric models without modeling the within-subject dependence, under the assumption that the $m_i$ are all bounded; see, for example, He et al. [21], Xue and Zhu [22] and the references therein. Li et al. [23] and Bai et al. [24] showed that ignoring the data dependence within each subject causes a loss of efficiency in statistical inference on the parameters of interest. Hu et al. [25] and Wang et al. [26] took within-subject correlations into consideration in analyzing longitudinal data and obtained some asymptotic results under the assumption that $\max_{1\le i\le n} m_i$ is bounded for all $n$. Chi and Reinsel [27] considered linear models for longitudinal data that contain both individual random effects and within-individual errors following an autoregressive AR(1) time series process, and gave estimation procedures, but they did not investigate the asymptotic properties of the estimators. In fact, the observed responses within the same subject are correlated and may be represented by a sequence of responses $\{y_{ij}, j\ge 1\}$ for the $i$th individual with an intrinsic dependence structure, such as a mixing condition. For example, in hydrology, many measurements may be represented by a sequence of responses $\{y_{ij}, j\ge 1\}$ for the $i$th year at times $t_{ij}$, where $t_{ij}$ represents the time elapsed from the beginning of the $i$th year, and $\{e_{ij}, j\ge 1\}$ are the deviations from the mean $\{x_{ij}^{T}\beta + g(t_{ij}), j\ge 1\}$. It is not reasonable to assume that $E(e_{ij_1}e_{ij_2})=0$ for $j_1\ne j_2$. In practice, $\{e_{ij}, j\ge 1\}$ may have a weakly dependent error structure, such as a mixing-dependent structure. In this paper, we consider the estimation problems for the models (1.1) with φ-mixing and ρ-mixing error structures, respectively, to capture the dependence among observations within the same subject, and we are mainly devoted to the strong consistency of the estimators.

Let $\{X_m, m\ge 1\}$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, let $\mathcal{F}_k^l = \sigma(X_i, k\le i\le l)$ be the σ-algebra generated by $X_k, \ldots, X_l$, and let $L_2(\mathcal{F}_k^l)$ denote the set of all $\mathcal{F}_k^l$-measurable random variables with finite second moments.

A sequence of random variables $\{X_m, m\ge 1\}$ is called φ-mixing if

$$\varphi(m) = \sup_{k\ge 1,\ A\in\mathcal{F}_1^{k},\ P(A)\ne 0,\ B\in\mathcal{F}_{k+m}^{\infty}} \left|P(B\mid A) - P(B)\right| \to 0, \quad \text{as } m\to\infty.$$

A sequence of random variables $\{X_m, m\ge 1\}$ is called ρ-mixing if the maximal correlation coefficient

$$\rho(m) = \sup_{k\ge 1,\ X\in L_2(\mathcal{F}_1^{k}),\ Y\in L_2(\mathcal{F}_{k+m}^{\infty})} \frac{|\operatorname{cov}(X,Y)|}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} \to 0, \quad \text{as } m\to\infty.$$

The concept of a mixing sequence is central in many areas of economics, finance and other sciences. A mixing time series can be viewed as a sequence of random variables for which the past and distant future are asymptotically independent. Limit theorems for φ-mixing and ρ-mixing random variables have been studied by many authors; see, for example, Shao [28], Peligrad [29], Utev [30], Kiesel [31], Chen et al. [32] and Zhou [33] for φ-mixing, and Peligrad [34], Peligrad and Shao [35, 36], Shao [37] and Bradley [38] for ρ-mixing. Further limit theorems can be found in the monograph of Lin and Lu [39]. Recently, mixing-dependent error structures have also been used in nonparametric and semiparametric regression models; see, for instance, Roussas [40], Truong [41], Fraiman and Iribarren [42], Roussas and Tran [43], Masry and Fan [44], Aneiros and Quintela [45], and Fan and Yao [46].
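To make these definitions concrete, the following minimal numerical sketch (ours, not taken from the cited works) simulates a stationary Gaussian AR(1) sequence, a classical example of a ρ-mixing sequence whose dependence on the past dies out geometrically, and shows that its sample lag-$m$ autocorrelations track that geometric decay.

```python
# A minimal numerical sketch (an illustration, not from the paper): the
# stationary Gaussian AR(1) sequence X_j = a*X_{j-1} + eps_j with |a| < 1 is a
# classical example of a rho-mixing sequence; its sample lag-m autocorrelation
# tracks a**m, mirroring the decay of the mixing coefficients.
import numpy as np

rng = np.random.default_rng(0)
a, N = 0.5, 200_000
x = np.empty(N)
x[0] = rng.standard_normal()
for j in range(1, N):
    x[j] = a * x[j - 1] + rng.standard_normal()

for m in (1, 2, 5, 10):
    r = np.corrcoef(x[:-m], x[m:])[0, 1]
    print(f"lag {m:2d}: sample corr = {r:+.4f}, a**m = {a**m:+.4f}")
```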

The rest of this paper is organized as follows. In Section 2, we present the least squares estimator (LSE) $\hat\beta_n$ of $\beta$ based on a nonparametric estimator of $g(\cdot)$ under the mixing-dependent error structure and state the main results. Section 3 is devoted to sketches of several technical lemmas and corollaries. The proofs of the main results are given in Section 4. A simulation study is reported in Section 5, and we close with concluding remarks in the last section.

2 Estimators and main results

For models (1.1), if $\beta$ is known to be the true parameter, then since $Ee_{ij}=0$, we have

$$g(t_{ij}) = E\left(y_{ij} - x_{ij}^{T}\beta\right), \qquad 1\le i\le n,\ 1\le j\le m_i.$$

Hence, a natural nonparametric estimator of g(·) given β is

$$g_n^{*}(t,\beta) = \sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t)\left(y_{ij} - x_{ij}^{T}\beta\right),$$
(2.1)

where $W_{nij}(t) = W_{nij}(t; t_{11}, t_{12}, \ldots, t_{nm_n})$ is a weight function defined on $I$. To estimate $\beta$, we minimize

$$SS(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\left[y_{ij} - x_{ij}^{T}\beta - g_n^{*}(t_{ij},\beta)\right]^{2}.$$

The minimizer of $SS(\beta)$ is

$$\hat\beta_n = \left(\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\tilde x_{ij}^{T}\right)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\tilde y_{ij},$$
(2.2)

where $\tilde x_{ij} = x_{ij} - \sum_{k=1}^{n}\sum_{l=1}^{m_k} W_{nkl}(t_{ij})x_{kl}$ and $\tilde y_{ij} = y_{ij} - \sum_{k=1}^{n}\sum_{l=1}^{m_k} W_{nkl}(t_{ij})y_{kl}$.

So, a plug-in estimator of the nonparametric component g(·) is given by

$$\hat g_n(t) = \sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t)\left(y_{ij} - x_{ij}^{T}\hat\beta_n\right).$$
(2.3)
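To make the estimators operational, the following minimal sketch (ours, not code from the paper) computes $\hat\beta_n$ in (2.2) and the plug-in estimator (2.3) from stacked data; the function name `estimate` and the layout of the weight matrix `W` (rows indexed by the evaluation observation $(i,j)$, columns by $(k,l)$) are our own conventions.

```python
# A minimal sketch of (2.2)-(2.3), assuming the N(n) x N(n) weight matrix W
# (e.g. the Nadaraya-Watson weights of Remark 2.3) is precomputed.
import numpy as np

def estimate(X: np.ndarray, y: np.ndarray, W: np.ndarray):
    X_tilde = X - W @ X   # tilde-x_{ij} = x_{ij} - sum_{k,l} W_{nkl}(t_{ij}) x_{kl}
    y_tilde = y - W @ y   # tilde-y_{ij}, the same partial-residual form
    # Least squares estimator (2.2): regress tilde-y on tilde-x.
    beta_hat, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
    # Plug-in estimator (2.3) of g, evaluated at the observed time points.
    g_hat = W @ (y - X @ beta_hat)
    return beta_hat, g_hat
```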

In this paper, let $\{e_{ij}, 1\le j\le m_i\}$ be φ-mixing or ρ-mixing with $Ee_{ij}=0$ for each $i$ $(1\le i\le n)$, and let $\{e_i, 1\le i\le n\}$ be mutually independent, where $e_i = (e_{i1},\ldots,e_{im_i})^{T}$. For each $i$, let $\varphi_i(\cdot)$ and $\rho_i(\cdot)$ denote the mixing coefficients of the $i$th φ-mixing and ρ-mixing sequence, respectively. Define $S_n^{2} = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\tilde x_{ij}^{T}$ and $\tilde g(t) = g(t) - \sum_{k=1}^{n}\sum_{l=1}^{m_k} W_{nkl}(t)g(t_{kl})$, let $I(\cdot)$ denote the indicator function and $\|\cdot\|$ the Euclidean norm, and let $\lfloor z\rfloor$ denote the integer part of $z$, so that $\lfloor z\rfloor \le z < \lfloor z\rfloor + 1$. In the sequel, $C$ and $C_1$ denote positive constants whose values may vary at each occurrence.

To obtain our main results, we list some assumptions:

A1 (i) $\{e_{ij}, 1\le j\le m_i\}$ are φ-mixing with $Ee_{ij}=0$ for each $i$;

  (ii) $\{e_{ij}, 1\le j\le m_i\}$ are ρ-mixing with $Ee_{ij}=0$ for each $i$.

A2 (i) $\max_{1\le i\le n} m_i = o(n^{\delta})$ for some $0<\delta<\frac{r-2}{2r}$ and $r>2$;

  (ii) $\lim_{n\to\infty}\frac{1}{N(n)}S_n^{2} = \Sigma$, where $\Sigma$ is a positive definite matrix and $N(n) = \sum_{i=1}^{n} m_i$;

  (iii) $g(\cdot)$ satisfies a first-order Lipschitz condition on $[0,1]$.

A3 For $n$ large enough, the probability weight functions $W_{nij}(\cdot)$ satisfy

  (i) $\sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t) = 1$ for each $t\in[0,1]$;

  (ii) $\sup_{0\le t\le 1}\max_{1\le i\le n,\,1\le j\le m_i} W_{nij}(t) = O\left(n^{-\frac12}\right)$;

  (iii) $\sup_{0\le t\le 1}\sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t)\,I(|t_{ij}-t|>\varepsilon) = o(1)$ for any $\varepsilon>0$;

  (iv) $\max_{1\le k\le n,\,1\le l\le m_k}\left\|\sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t_{kl})x_{ij}\right\| = O(1)$;

  (v) $\sup_{0\le t\le 1}\left\|\sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t)x_{ij}\right\| = O(1)$;

  (vi) $\max_{1\le i\le n,\,1\le j\le m_i}\left|W_{nij}(s)-W_{nij}(t)\right| \le C|s-t|$ uniformly for $s,t\in[0,1]$.

Remark 2.1 To obtain asymptotic properties of estimators of the models (1.1), many authors have assumed that $\{m_i, 1\le i\le n\}$ is bounded. Under the weaker condition A2(i), we obtain the strong consistency of estimators of the models (1.1) with mixing-dependent structure; the case of $\{m_i, 1\le i\le n\}$ being a bounded sequence is a special case of A2(i).

Remark 2.2 Assumption A2(ii) implies that

$$\frac{1}{N(n)}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\|\tilde x_{ij}\| = O(1) \quad\text{and}\quad \max_{1\le i\le n,\,1\le j\le m_i}\|\tilde x_{ij}\| = o\left(N(n)^{\frac12}\right).$$

Remark 2.3 There do exist weights satisfying assumption A3. For example, under some regularity conditions, the following Nadaraya-Watson kernel weight satisfies assumption A3:

$$W_{nij}(t) = K\left(\frac{t-t_{ij}}{h_n}\right)\left[\sum_{k=1}^{n}\sum_{l=1}^{m_k} K\left(\frac{t-t_{kl}}{h_n}\right)\right]^{-1},$$

where $K(\cdot)$ is a kernel function and $h_n$ is a bandwidth parameter. Assumption A3 has also been used by Härdle et al. [7], Baek and Liang [16], Liang and Jing [20] and Chen and You [47].
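As a concrete instance, the sketch below (our illustration; the helper names and the fixed bandwidth are assumptions) builds these Nadaraya-Watson weights with the Epanechnikov kernel used in Section 5 and numerically checks A3(i) and the spirit of A3(ii) on a toy design.

```python
# A hedged sketch of the Nadaraya-Watson weights above.
import numpy as np

def epanechnikov(u: np.ndarray) -> np.ndarray:
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def nw_weights(t_eval: np.ndarray, t_obs: np.ndarray, h: float) -> np.ndarray:
    """Rows: evaluation points t; columns: stacked observation times t_{kl}."""
    K = epanechnikov((t_eval[:, None] - t_obs[None, :]) / h)
    # h must be large enough that every row has positive kernel mass.
    return K / K.sum(axis=1, keepdims=True)

t_obs = np.linspace(0.0, 1.0, 600)       # stands in for the stacked t_{ij}
W = nw_weights(t_obs, t_obs, h=0.1)
print(np.allclose(W.sum(axis=1), 1.0))   # A3(i): each row sums to one
print(W.max())                           # the largest weight is small, cf. A3(ii)
```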

Theorem 2.1 Suppose that either A1(i) or A1(ii) holds, and that A2 and A3(i)-(iii) hold. If

$$\max_{1\le i\le n,\,1\le j\le m_i} E\left(|e_{ij}|^{p}\right) \le C, \quad a.s.$$
(2.4)

for p > 3, then

$$\hat\beta_n \to \beta, \quad a.s.$$
(2.5)

Theorem 2.2 Suppose that either A1(i) or A1(ii) holds, and that A2, A3(i)-(iv) and (2.4) hold. For any $t\in[0,1]$, we have

$$\hat g_n(t) \to g(t), \quad a.s.$$
(2.6)

Theorem 2.3 Suppose that either A1(i) or A1(ii) holds, and that A2, A3(i)-(iii), A3(v)-(vi) and (2.4) hold. We have

$$\sup_{0\le t\le 1}\left|\hat g_n(t) - g(t)\right| = o(1), \quad a.s.$$
(2.7)

3 Several technical lemmas and corollaries

In order to prove the main results, we first introduce some lemmas and corollaries. Let $S_j = \sum_{l=1}^{j} X_l$ for $j\ge 1$, and $S_k(i) = \sum_{j=k+1}^{k+i} X_j$ for $i\ge 1$ and $k\ge 0$.

Lemma 3.1. (Shao [28]) Let $\{X_m, m\ge 1\}$ be a φ-mixing sequence.

(1) If $EX_i = 0$, then

$$E S_k^{2}(i) \le 8000\, i \exp\left\{6\sum_{j=1}^{\lfloor\log i\rfloor}\varphi^{\frac12}(2^{j})\right\}\max_{k+1\le j\le k+i} EX_j^{2}.$$

(2) Suppose that there exists an array $\{c_{km}\}$ of positive numbers such that $\max_{1\le i\le m} ES_k^{2}(i) \le c_{km}$ for every $k\ge 0$, $m\ge 1$. Then, for any $q\ge 2$, there exists a positive constant $C = C(q, \varphi(\cdot))$ such that

$$E\max_{1\le i\le m}\left|S_k(i)\right|^{q} \le C\left(c_{km}^{\frac q2} + E\max_{k<i\le k+m}|X_i|^{q}\right).$$

Lemma 3.2. (Shao [37]) Let $\{X_m, m\ge 1\}$ be a ρ-mixing sequence with $EX_i = 0$. Then, for any $q\ge 2$, there exists a positive constant $C = C(q, \rho(\cdot))$ such that

$$E\max_{1\le j\le m}|S_j|^{q} \le C\left(m^{\frac q2}\exp\left\{C\sum_{j=1}^{\lfloor\log m\rfloor}\rho(2^{j})\right\}\max_{1\le j\le m}\left(E|X_j|^{2}\right)^{\frac q2} + m\exp\left\{C\sum_{j=1}^{\lfloor\log m\rfloor}\rho^{\frac 2q}(2^{j})\right\}\max_{1\le j\le m}E|X_j|^{q}\right).$$

Lemma 3.3. Suppose that A1(i) or A1(ii) holds. Let $\alpha>1$, $0<r<\alpha$, and

$$e_{ij}' = e_{ij}\, I\left(|e_{ij}| \le \varepsilon i^{\frac1r} m_i\right),$$
(3.1)
$$e_{ij}'' = e_{ij} - e_{ij}' = e_{ij}\, I\left(e_{ij} > \varepsilon i^{\frac1r} m_i\right) + e_{ij}\, I\left(e_{ij} < -\varepsilon i^{\frac1r} m_i\right)$$
(3.2)

for any ε > 0. If

$$\max_{1\le i\le n}\max_{1\le j\le m_i} E\left(|e_{ij}|^{\alpha}\right) \le C, \quad a.s.,$$
(3.3)

we have

$$\sum_{i=1}^{\infty}\sum_{j=1}^{m_i}|e_{ij}''| < \infty, \quad a.s.$$

Proof Note that $|e_{ij}''| = |e_{ij}|\, I\left(|e_{ij}| > \varepsilon i^{\frac1r} m_i\right)$. Let $\xi_i = \sum_{j=1}^{m_i}|e_{ij}|$, $\xi_i' = \sum_{j=1}^{m_i}|e_{ij}|\, I\left(\sum_{j=1}^{m_i}|e_{ij}| \le \varepsilon i^{\frac1r} m_i\right)$, $\xi_i'' = \xi_i - \xi_i' = \sum_{j=1}^{m_i}|e_{ij}|\, I\left(\sum_{j=1}^{m_i}|e_{ij}| > \varepsilon i^{\frac1r} m_i\right)$, and $|\xi_i''|_d = |\xi_i''|\, I(|\xi_i''| \le d)$ for fixed $d>0$. First, we prove

$$\sum_{i=1}^{\infty}|\xi_i''| < \infty, \quad a.s.$$
(3.4)

Note that

$$\{|\xi_i''| > d\} = \left\{\sum_{j=1}^{m_i}|e_{ij}|\, I\left(\sum_{j=1}^{m_i}|e_{ij}| > \varepsilon i^{\frac1r} m_i\right) > d\right\} = \left\{\sum_{j=1}^{m_i}|e_{ij}| > \varepsilon i^{\frac1r} m_i\right\}$$
(3.5)

for $i$ large enough. By Markov's inequality, the $C_r$-inequality and (3.3), we have

$$\sum_{i=1}^{\infty} P\left(|\xi_i''| \ge d\right) \le C\sum_{i=1}^{\infty} P\left(\sum_{j=1}^{m_i}|e_{ij}| > \varepsilon i^{\frac1r} m_i\right) \le C\sum_{i=1}^{\infty} i^{-\frac{\alpha}{r}} m_i^{-\alpha} E\left(\sum_{j=1}^{m_i}|e_{ij}|\right)^{\alpha} \le C\sum_{i=1}^{\infty} i^{-\frac{\alpha}{r}} m_i^{-1}\sum_{j=1}^{m_i}E|e_{ij}|^{\alpha} \le C\lim_{n\to\infty}\sum_{i=1}^{n} i^{-\frac{\alpha}{r}}\max_{1\le i\le n}\max_{1\le j\le m_i}E|e_{ij}|^{\alpha} \le C\sum_{i=1}^{\infty} i^{-\frac{\alpha}{r}} < \infty.$$
(3.6)

From (3.5), $\{|\xi_i''| \le d\} = \left\{\sum_{j=1}^{m_i}|e_{ij}| \le \varepsilon i^{\frac1r} m_i\right\}$ for $i$ large enough. One gets

$$E\left(|\xi_i''|_d\right) = E\left(|\xi_i''|\, I(|\xi_i''| \le d)\right) = E\left[\sum_{j=1}^{m_i}|e_{ij}|\, I\left(\sum_{j=1}^{m_i}|e_{ij}| > \varepsilon i^{\frac1r} m_i\right) I\left(\sum_{j=1}^{m_i}|e_{ij}| \le \varepsilon i^{\frac1r} m_i\right)\right] = 0$$

and

$$\operatorname{Var}\left(|\xi_i''|_d\right) \le E\left(|\xi_i''|_d^{2}\right) = E\left(|\xi_i''|^{2}\, I(|\xi_i''| \le d)\right) \le d\, E\left(|\xi_i''|\, I(|\xi_i''| \le d)\right) = 0$$

for i large enough. Therefore,

$$\sum_{i=1}^{\infty} E\left(|\xi_i''|_d\right) < \infty, \qquad \sum_{i=1}^{\infty}\operatorname{Var}\left(|\xi_i''|_d\right) < \infty.$$
(3.7)

Since $\{\xi_i'', i\ge 1\}$ is a sequence of independent random variables, (3.4) follows from (3.6) and (3.7) by the three-series theorem. Then,

$$\sum_{i=1}^{\infty}\sum_{j=1}^{m_i}|e_{ij}''| = \sum_{i=1}^{\infty}\sum_{j=1}^{m_i}|e_{ij}|\, I\left(|e_{ij}| > \varepsilon i^{\frac1r} m_i\right) \le \sum_{i=1}^{\infty}\sum_{j=1}^{m_i}|e_{ij}|\, I\left(\sum_{j=1}^{m_i}|e_{ij}| > \varepsilon i^{\frac1r} m_i\right) = \sum_{i=1}^{\infty}|\xi_i''| < \infty, \quad a.s.$$

Thus, we complete the proof of Lemma 3.3.

Lemma 3.4. Let $\{e_{ij}, 1\le j\le m_i\}$ be φ-mixing with $Ee_{ij}=0$ for each $i$ $(1\le i\le n)$. Assume that $\{a_{nij}(\cdot), 1\le i\le n, 1\le j\le m_i\}$ is a function array defined on $[0,1]$ satisfying $\sum_{i=1}^{n}\sum_{j=1}^{m_i}|a_{nij}(t)| = O(1)$ and $\max_{1\le i\le n,\,1\le j\le m_i}|a_{nij}(t)| = O\left(n^{-\frac12}\right)$ for any $t\in[0,1]$, and that A2(i) and (2.4) hold. Then, for any $t\in[0,1]$, we have

$$\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)e_{ij} = o(1), \quad a.s.$$
(3.8)

Proof Based on (3.1) and (3.2), we denote $\zeta_{nij} = e_{ij}' - E(e_{ij}')$ and $\eta_{nij} = e_{ij}'' - E(e_{ij}'')$, and take $r$ satisfying $2 < r < p-1$. Since $e_{ij} = \zeta_{nij} + \eta_{nij}$, we have

$$\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)e_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)\zeta_{nij} + \sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)e_{ij}'' - \sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)E(e_{ij}'') =: A_{1n} + A_{2n} - A_{3n}.$$
(3.9)

First, we prove

$$A_{1n} \to 0, \quad a.s.$$
(3.10)

Denoting $\tilde\zeta_{ni} = \sum_{j=1}^{m_i} a_{nij}(t)\zeta_{nij}$, we know that $\{\tilde\zeta_{ni}, 1\le i\le n\}$ is a sequence of independent random variables with $E\tilde\zeta_{ni} = 0$. By Markov's inequality and Rosenthal's inequality, for any $\varepsilon>0$ and $q\ge 2$, one gets

$$P\left(\left|\sum_{i=1}^{n}\tilde\zeta_{ni}\right| > \varepsilon\right) \le \varepsilon^{-q} E\left|\sum_{i=1}^{n}\tilde\zeta_{ni}\right|^{q} \le C\left[\sum_{i=1}^{n} E|\tilde\zeta_{ni}|^{q} + \left(\sum_{i=1}^{n} E\tilde\zeta_{ni}^{2}\right)^{\frac q2}\right] =: A_{11n} + A_{12n}.$$
(3.11)

Note that $\varphi_i(m)\to 0$ as $m\to\infty$; hence $\sum_{k=1}^{\lfloor\log m_i\rfloor}\varphi_i^{\frac12}(2^{k}) = o(\log m_i)$. Further, $\exp\left\{\lambda\sum_{k=1}^{\lfloor\log m_i\rfloor}\varphi_i^{\frac12}(2^{k})\right\} = o(m_i^{\tau})$ for any $\lambda>0$ and $\tau>0$.

For $A_{11n}$, by Lemma 3.1, A2(i) and (2.4), and taking $q > p$, we have

$$A_{11n} = C\sum_{i=1}^{n} E\left|\sum_{j=1}^{m_i} a_{nij}(t)\zeta_{nij}\right|^{q} \le C\sum_{i=1}^{n}\left\{\left[m_i\exp\left\{6\sum_{k=1}^{\lfloor\log m_i\rfloor}\varphi_i^{\frac12}(2^{k})\right\}\max_{1\le k\le m_i}E|a_{nik}(t)\zeta_{nik}|^{2}\right]^{\frac q2} + \sum_{j=1}^{m_i}E|a_{nij}(t)\zeta_{nij}|^{q}\right\} \le C\sum_{i=1}^{n}\left[\left(m_i^{1+\tau} n^{-1}\right)^{\frac q2} + \sum_{j=1}^{m_i} n^{-\frac q2} E\left(|\zeta_{nij}|^{p}\,|\zeta_{nij}|^{q-p}\right)\right] \le C n^{-\frac q2}\sum_{i=1}^{n} m_i^{(\tau+1)\frac q2} + C n^{-\frac q2}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(i^{\frac1r} m_i\right)^{q-p} \le C n^{-\left(\frac q2 - (\tau+1)\frac{\delta q}{2} - 1\right)} + C n^{-\left(\frac q2 - \frac qr + \frac pr - (q-p+1)\delta - 1\right)}.$$

Take $q > \max\left\{\frac{2r(2+\delta)}{r-2r\delta-2},\ \frac{4}{1-\delta},\ p\right\}$. We have $\frac q2 - \frac{\delta q}{2} > 2$ and $\frac q2 - \frac qr + \frac pr - (q-p+1)\delta > 2$.

Next, take $\tau>0$ small enough such that $\frac q2 - (\tau+1)\frac{\delta q}{2} > 2$. Thus, we have

$$\sum_{n=1}^{\infty} A_{11n} < \infty.$$
(3.12)

For $A_{12n}$, by Lemma 3.1 and (2.4), we have

$$A_{12n} = C\left(\sum_{i=1}^{n} E\left|\sum_{j=1}^{m_i} a_{nij}(t)\zeta_{nij}\right|^{2}\right)^{\frac q2} \le C\left(\sum_{i=1}^{n} m_i\exp\left\{6\sum_{k=1}^{\lfloor\log m_i\rfloor}\varphi_i^{\frac12}(2^{k})\right\}\max_{1\le j\le m_i}E|a_{nij}(t)\zeta_{nij}|^{2}\right)^{\frac q2} \le C\left(\sum_{i=1}^{n} m_i^{\tau+1}\sum_{j=1}^{m_i}E|a_{nij}(t)\zeta_{nij}|^{2}\right)^{\frac q2} \le C n^{-\left(\frac q4 - (\tau+1)\frac{\delta q}{2}\right)}.$$

Note that $\delta < \frac{r-2}{2r} < \frac12$. Taking $q > \frac{4}{1-2\delta}$, we have $\frac q4 - \frac{\delta q}{2} > 1$. Next, take $\tau>0$ small enough such that $\frac q4 - (\tau+1)\frac{\delta q}{2} > 1$. Thus, we have

$$\sum_{n=1}^{\infty} A_{12n} < \infty.$$
(3.13)

Combining (3.11)-(3.13), we obtain (3.10).

By Lemma 3.3 and $\max_{1\le i\le n,\,1\le j\le m_i}|a_{nij}(t)| = O\left(n^{-\frac12}\right)$ for any $t\in[0,1]$, we have

$$|A_{2n}| \le \max_{1\le i\le n,\,1\le j\le m_i}|a_{nij}(t)|\sum_{i=1}^{n}\sum_{j=1}^{m_i}|e_{ij}''| = O\left(n^{-\frac12}\right), \quad a.s.$$
(3.14)

Note that $\frac{p-1}{r} > 1$ and $\delta > 0$. From (2.4), we have

$$|A_{3n}| = \left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)E(e_{ij}'')\right| \le n^{-\frac12}\sum_{i=1}^{n}\sum_{j=1}^{m_i} E\left[|e_{ij}|\, I\left(|e_{ij}| > \varepsilon i^{\frac1r} m_i\right)\right] = n^{-\frac12}\sum_{i=1}^{n}\sum_{j=1}^{m_i} E\left[|e_{ij}|^{p}\,|e_{ij}|^{1-p}\, I\left(|e_{ij}| > \varepsilon i^{\frac1r} m_i\right)\right] \le C n^{-\frac12}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(i^{\frac1r} m_i\right)^{1-p} \le C n^{-\frac12}\sum_{i=1}^{n} i^{-\frac{p-1}{r}} m_i^{2-p} \le C n^{-\left((p-2)\delta + \frac12\right)} = o(1).$$
(3.15)

From (3.9), (3.10), (3.14) and (3.15), we have (3.8).

Corollary 3.1. In Lemma 3.4, if $\{e_{ij}, 1\le j\le m_i\}$ are ρ-mixing with $Ee_{ij}=0$ for each $i$ $(1\le i\le n)$, then (3.8) holds.

Proof From the proof of Lemma 3.4, it is enough to prove that $\sum_{n=1}^{\infty} A_{11n} < \infty$ and $\sum_{n=1}^{\infty} A_{12n} < \infty$.

Note that $\rho_i(m)\to 0$ as $m\to\infty$; hence $\sum_{k=1}^{\lfloor\log m_i\rfloor}\rho_i^{\frac2q}(2^{k}) = o(\log m_i)$. Further, $\exp\left\{\lambda\sum_{k=1}^{\lfloor\log m_i\rfloor}\rho_i^{\frac2q}(2^{k})\right\} = o(m_i^{\tau})$ for any $\lambda>0$ and $\tau>0$.

For $A_{11n}$, by Lemma 3.2 and (2.4), and taking $q > p$, we get

$$A_{11n} = C\sum_{i=1}^{n} E\left|\sum_{j=1}^{m_i} a_{nij}(t)\zeta_{nij}\right|^{q} \le C\sum_{i=1}^{n}\left[m_i^{\frac q2}\exp\left\{C_1\sum_{k=1}^{\lfloor\log m_i\rfloor}\rho_i(2^{k})\right\}\max_{1\le k\le m_i}\left(E|a_{nik}(t)\zeta_{nik}|^{2}\right)^{\frac q2} + m_i\exp\left\{C_1\sum_{k=1}^{\lfloor\log m_i\rfloor}\rho_i^{\frac2q}(2^{k})\right\}\max_{1\le k\le m_i}E|a_{nik}(t)\zeta_{nik}|^{q}\right] \le C\sum_{i=1}^{n}\left[m_i^{\tau+\frac q2} n^{-\frac q2} + m_i^{\tau+1} n^{-\frac q2}\left(i^{\frac1r} m_i\right)^{q-p}\right] \le C n^{-\left(\frac q2 - \left(\tau+\frac q2\right)\delta - 1\right)} + C n^{-\left(\frac q2 - \frac qr + \frac pr - (q-p+\tau+1)\delta - 1\right)}.$$

Take $q > \max\left\{\frac{2r(2+\delta)}{r-2r\delta-2},\ \frac{4}{1-\delta},\ p\right\}$. We have $\frac q2 - \frac{q\delta}{2} > 2$ and $\frac q2 - \frac qr + \frac pr - (q-p+1)\delta > 2$.

Next, take $\tau>0$ small enough such that $\frac q2 - \left(\tau+\frac q2\right)\delta > 2$ and $\frac q2 - \frac qr + \frac pr - (q-p+\tau+1)\delta > 2$. Thus, $\sum_{n=1}^{\infty} A_{11n} < \infty$.

For $A_{12n}$, by Lemma 3.2 and (2.4), we have

$$A_{12n} = C\left(\sum_{i=1}^{n} E\left|\sum_{j=1}^{m_i} a_{nij}(t)\zeta_{nij}\right|^{2}\right)^{\frac q2} \le C\left(\sum_{i=1}^{n} m_i\exp\left\{C_1\sum_{k=1}^{\lfloor\log m_i\rfloor}\rho_i(2^{k})\right\}\max_{1\le j\le m_i}E|a_{nij}(t)\zeta_{nij}|^{2}\right)^{\frac q2} \le C\left(\sum_{i=1}^{n} m_i^{\tau+1}\sum_{j=1}^{m_i}E|a_{nij}(t)\zeta_{nij}|^{2}\right)^{\frac q2} \le C n^{-\left(\frac q4 - (\tau+1)\frac{\delta q}{2}\right)}.$$

Note that $\delta < \frac12$ from A2(i). Taking $q > \frac{4}{1-2\delta}$, we have $\frac q4 - \frac{\delta q}{2} > 1$. Next, take $\tau>0$ small enough such that $\frac q4 - (\tau+1)\frac{\delta q}{2} > 1$. Thus, $\sum_{n=1}^{\infty} A_{12n} < \infty$.

So, we complete the proof of Corollary 3.1.

Remark 3.1 If the function array $\{a_{nij}(t), 1\le i\le n, 1\le j\le m_i\}$ is replaced with a real constant array $\{a_{nij}, 1\le i\le n, 1\le j\le m_i\}$, the results of Lemma 3.4 and Corollary 3.1 obviously still hold.

Lemma 3.5. Let $\{e_{ij}, 1\le j\le m_i\}$ be φ-mixing with $Ee_{ij}=0$ for each $i$ $(1\le i\le n)$. Assume that $\{a_{nij}(\cdot), 1\le i\le n, 1\le j\le m_i\}$ is a function array defined on $[0,1]$ satisfying $\sum_{i=1}^{n}\sum_{j=1}^{m_i}|a_{nij}(t)| = O(1)$ and $\max_{1\le i\le n,\,1\le j\le m_i}|a_{nij}(t)| = O\left(n^{-\frac12}\right)$ uniformly for $t\in[0,1]$, and $\max_{1\le i\le n,\,1\le j\le m_i}|a_{nij}(s) - a_{nij}(t)| \le C|s-t|$ uniformly for $s,t\in[0,1]$, where $C$ is a constant. If A2(i) and (2.4) hold, then

$$\sup_{0\le t\le 1}\left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)e_{ij}\right| = o(1), \quad a.s.$$
(3.16)

Proof Based on (3.1) and (3.2), we denote $\zeta_{nij} = e_{ij}' - E e_{ij}'$ and take $r$ satisfying $2 < r < p-1$. By the finite covering theorem, $[0,1]$ is covered by $O\left(n^{2+\frac1r}\right)$ neighborhoods $D_n$ with centers $s_n$ and radius $n^{-(2+\frac1r)}$, and for each $t\in[0,1]$ there exists some neighborhood $D_n(s_n(t))$ with center $s_n(t)$ and radius $n^{-(2+\frac1r)}$ such that $t\in D_n(s_n(t))$. Since $E(e_{ij})=0$, we have

$$\left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)e_{ij}\right| \le \left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)e_{ij}''\right| + \left|\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(a_{nij}(t)-a_{nij}(s_n(t))\right)e_{ij}'\right| + \left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(s_n(t))\zeta_{nij}\right| + \left|\sum_{i=1}^{n}\sum_{j=1}^{m_i}\left(a_{nij}(t)-a_{nij}(s_n(t))\right)E(e_{ij}')\right| + \left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(t)E(e_{ij}')\right| =: B_{1n}(t) + B_{2n}(t) + B_{3n}(t) + B_{4n}(t) + B_{5n}(t).$$

Denote $\sup\max_{t,i,j} = \sup_{0\le t\le 1}\max_{1\le i\le n,\,1\le j\le m_i}$. By Lemma 3.3 and the proof of (3.15), noting that $\delta < \frac12$, we have

$$\sup_{0\le t\le 1} B_{1n}(t) \le \sup\max_{t,i,j}|a_{nij}(t)|\sum_{i=1}^{n}\sum_{j=1}^{m_i}|e_{ij}''| = O\left(n^{-\frac12}\right), \quad a.s.,$$
$$\sup_{0\le t\le 1} B_{2n}(t) \le \sup\max_{t,i,j}\left|a_{nij}(t)-a_{nij}(s_n(t))\right|\sum_{i=1}^{n}\sum_{j=1}^{m_i}|e_{ij}'| \le C n^{-\left(2+\frac1r\right)} n^{2\delta} n^{1+\frac1r} = o(1),$$
$$\sup_{0\le t\le 1} B_{4n}(t) \le \sup\max_{t,i,j}\left|a_{nij}(t)-a_{nij}(s_n(t))\right|\sum_{i=1}^{n}\sum_{j=1}^{m_i} E|e_{ij}'| = o(1),$$
$$\sup_{0\le t\le 1} B_{5n}(t) \le \sup\max_{t,i,j}|a_{nij}(t)|\sum_{i=1}^{n}\sum_{j=1}^{m_i} E\left[|e_{ij}|\, I\left(|e_{ij}| > \varepsilon i^{\frac1r} m_i\right)\right] = o(1).$$

Now, it is enough to show that $\sup_{0\le t\le 1} B_{3n}(t) = o(1)$, a.s.

From (3.11) and the bounds on $A_{11n}$ and $A_{12n}$, for given $t\in[0,1]$ and $u\in D_n(s_n(t))$, we have

$$P\left(\left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(u)\zeta_{nij}\right| > \varepsilon\right) \le C\left[n^{-\left(\frac q2 - \frac qr + \frac pr - (q-p+1)\delta - 1\right)} + n^{-\left(\frac q2 - (\tau+1)\frac{\delta q}{2} - 1\right)} + n^{-\left(\frac q4 - (\tau+1)\frac{\delta q}{2}\right)}\right].$$

Then, we obtain

$$P\left(\sup_{0\le t\le 1} B_{3n}(t) > \varepsilon\right) \le P\left(\sup_{0\le t\le 1}\sup_{u\in D_n(s_n(t))}\left|\sum_{i=1}^{n}\sum_{j=1}^{m_i} a_{nij}(u)\zeta_{nij}\right| > \varepsilon\right) \le C n^{2+\frac1r}\left[n^{-\left(\frac q2 - \frac qr + \frac pr - (q-p+1)\delta - 1\right)} + n^{-\left(\frac q2 - (\tau+1)\frac{\delta q}{2} - 1\right)} + n^{-\left(\frac q4 - (\tau+1)\frac{\delta q}{2}\right)}\right] \le C\left[n^{-\left(\frac q2 - \frac qr - \delta q - \delta - 4\right)} + n^{-\left(\frac q2 - (\tau+1)\frac{\delta q}{2} - 4\right)} + n^{-\left(\frac q4 - (\tau+1)\frac{\delta q}{2} - 3\right)}\right].$$

Take $q > \max\left\{\frac{2r(5+\delta)}{r-2r\delta-2},\ \frac{16}{1-2\delta},\ p\right\}$. We have $\frac q2 - \frac qr - \delta q - \delta > 5$, $\frac q2 - \frac{\delta q}{2} > 5$ and $\frac q4 - \frac{\delta q}{2} > 4$. Next, take $\tau>0$ small enough such that $\frac q2 - (\tau+1)\frac{\delta q}{2} > 5$ and $\frac q4 - (\tau+1)\frac{\delta q}{2} > 4$. Thus, we have $\sum_{n=1}^{\infty} P\left(\sup_{0\le t\le 1} B_{3n}(t) > \varepsilon\right) < \infty$, and hence, by the Borel-Cantelli lemma, $\sup_{0\le t\le 1} B_{3n}(t) = o(1)$, a.s. Therefore, (3.16) holds.

Corollary 3.2. In Lemma 3.5, if $\{e_{ij}, 1\le j\le m_i\}$ are ρ-mixing with $Ee_{ij}=0$ for each $i$ $(1\le i\le n)$, then (3.16) holds.

Proof By Corollary 3.1, with arguments similar to the proof of Lemma 3.5, we have (3.16).

4 Proofs of theorems

Proof of Theorem 2.1 From (1.1) and (2.2), we have

$$\begin{aligned}
\hat\beta_n - \beta &= \left(\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\tilde x_{ij}^{T}\right)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\left(\tilde y_{ij}-\tilde x_{ij}^{T}\beta\right) \\
&= S_n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\left[\left(y_{ij}-x_{ij}^{T}\beta\right)-\sum_{k=1}^{n}\sum_{l=1}^{m_k}W_{nkl}(t_{ij})\left(y_{kl}-x_{kl}^{T}\beta\right)\right] \\
&= S_n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\left[\left(g(t_{ij})+e_{ij}\right)-\sum_{k=1}^{n}\sum_{l=1}^{m_k}W_{nkl}(t_{ij})\left(g(t_{kl})+e_{kl}\right)\right] \\
&= S_n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\tilde x_{ij}\left[e_{ij}-\sum_{k=1}^{n}\sum_{l=1}^{m_k}W_{nkl}(t_{ij})e_{kl}+\tilde g(t_{ij})\right] \\
&= \left(\frac{S_n^{2}}{N(n)}\right)^{-1}\left[\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\tilde x_{ij}}{N(n)}e_{ij}-\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\tilde x_{ij}}{N(n)}\sum_{k=1}^{n}\sum_{l=1}^{m_k}W_{nkl}(t_{ij})e_{kl}+\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\tilde x_{ij}}{N(n)}\tilde g(t_{ij})\right] \\
&=: D_{1n}+D_{2n}+D_{3n}.
\end{aligned}$$
(4.1)

From A2(ii), $\left\|\left(\frac{S_n^{2}}{N(n)}\right)^{-1}\right\| = O(1)$. By Remark 2.2, we have

$$\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\|\tilde x_{ij}\|}{N(n)} = O(1) \quad\text{and}\quad \max_{1\le i\le n,\,1\le j\le m_i}\frac{\|\tilde x_{ij}\|}{N(n)} = o\left(n^{-\frac12}\right).$$
(4.2)

According to (4.2) and Remark 3.1, we have

$$\|D_{1n}\| \le C\left\|\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\tilde x_{ij}}{N(n)}\, e_{ij}\right\| = o(1), \quad a.s.$$
(4.3)

By A3(i)-(ii), (4.2), and Lemma 3.4 or Corollary 3.1, we have

$$\|D_{2n}\| \le C\max_{1\le i\le n,\,1\le j\le m_i}\left|\sum_{k=1}^{n}\sum_{l=1}^{m_k} W_{nkl}(t_{ij})e_{kl}\right|\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\|\tilde x_{ij}\|}{N(n)} = o(1), \quad a.s.$$
(4.4)

From A2(iii) and A3(iii), we obtain

$$\max_{1\le i\le n,\,1\le j\le m_i}\left|\tilde g(t_{ij})\right| = \max_{1\le i\le n,\,1\le j\le m_i}\left|g(t_{ij}) - \sum_{k=1}^{n}\sum_{l=1}^{m_k} W_{nkl}(t_{ij})g(t_{kl})\right| \le \max_{1\le i\le n,\,1\le j\le m_i}\left|\sum_{k=1}^{n}\sum_{l=1}^{m_k} W_{nkl}(t_{ij})\left(g(t_{ij})-g(t_{kl})\right)I\left(|t_{ij}-t_{kl}|>\varepsilon\right)\right| + \max_{1\le i\le n,\,1\le j\le m_i}\left|\sum_{k=1}^{n}\sum_{l=1}^{m_k} W_{nkl}(t_{ij})\left(g(t_{ij})-g(t_{kl})\right)I\left(|t_{ij}-t_{kl}|\le\varepsilon\right)\right| = o(1).$$
(4.5)

Together with (4.2), one gets

$$\|D_{3n}\| \le C\max_{1\le i\le n,\,1\le j\le m_i}\left|\tilde g(t_{ij})\right|\sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\|\tilde x_{ij}\|}{N(n)} = o(1).$$
(4.6)

By (4.1), (4.3), (4.4) and (4.6), (2.5) holds.

Proof of Theorem 2.2 From (1.1) and (2.3), we have

$$\begin{aligned}
\hat g_n(t)-g(t) &= \sum_{i=1}^{n}\sum_{j=1}^{m_i}W_{nij}(t)\left(y_{ij}-x_{ij}^{T}\hat\beta_n\right)-g(t) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m_i}W_{nij}(t)\left[\left(y_{ij}-x_{ij}^{T}\hat\beta_n\right)-\left(y_{ij}-x_{ij}^{T}\beta\right)\right]+\sum_{i=1}^{n}\sum_{j=1}^{m_i}W_{nij}(t)\left(y_{ij}-x_{ij}^{T}\beta\right)-g(t) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m_i}W_{nij}(t)x_{ij}^{T}\left(\beta-\hat\beta_n\right)+\sum_{i=1}^{n}\sum_{j=1}^{m_i}W_{nij}(t)\left(g(t_{ij})+e_{ij}\right)-g(t) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m_i}W_{nij}(t)x_{ij}^{T}\left(\beta-\hat\beta_n\right)+\sum_{i=1}^{n}\sum_{j=1}^{m_i}W_{nij}(t)e_{ij}-\tilde g(t) \\
&=: E_{1n}+E_{2n}+E_{3n}.
\end{aligned}$$
(4.7)

By A3(iv) and (2.5), one gets

$$|E_{1n}| \le \left\|\sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t)x_{ij}\right\|\left\|\beta - \hat\beta_n\right\| = o(1), \quad a.s.$$
(4.8)

By Lemma 3.4 or Corollary 3.1, $E_{2n} = o(1)$, a.s.; with arguments similar to (4.5), we have $E_{3n} = o(1)$. Therefore, together with (4.7) and (4.8), (2.6) holds.

Proof of Theorem 2.3 Here, we still use (4.7), but the $E_{in}$ in (4.7) are replaced by $E_{in}(t)$ for $i = 1, 2, 3$. By A3(v) and (2.5), we get

$$\sup_{0\le t\le 1}|E_{1n}(t)| \le \sup_{0\le t\le 1}\left\|\sum_{i=1}^{n}\sum_{j=1}^{m_i} W_{nij}(t)x_{ij}\right\|\left\|\beta - \hat\beta_n\right\| = o(1), \quad a.s.$$

By Lemma 3.5 or Corollary 3.2, $\sup_{0\le t\le 1}|E_{2n}(t)| = o(1)$, a.s.; similar to the arguments in (4.5), we have $\sup_{0\le t\le 1}|E_{3n}(t)| = o(1)$. Hence, (2.7) is proved.

5 Simulation study

To evaluate the finite-sample performance of the least squares estimator $\hat\beta_n$ and the nonparametric estimator $\hat g_n(t)$, we take two forms for the function $g(\cdot)$:

$$\text{I. } g(t) = \exp(3t); \qquad \text{II. } g(t) = \cos\left(\tfrac{3\pi}{2}\, t\right),$$

consider the case where $p = 1$ and $m_i = m = 12$, and take the design points $t_{ij} = ((i-1)m + j)/(nm)$, $x_{ij} \sim N(1,1)$, and the errors $e_{ij} = 0.2\, e_{i,j-1} + \epsilon_{ij}$, where the $\epsilon_{ij}$ are i.i.d. $N(0,1)$ random variables and $e_{i,0} \sim N(0,1)$ for each $i$.

The kernel function is taken to be the Epanechnikov kernel $K(t) = \frac34(1-t^{2})I(|t|\le 1)$, and the weight function is the Nadaraya-Watson kernel weight $W_{nij}(t) = K\left(\frac{t-t_{ij}}{h_n}\right)\left[\sum_{k=1}^{n}\sum_{l=1}^{m_k} K\left(\frac{t-t_{kl}}{h_n}\right)\right]^{-1}$. The bandwidth $h_n$ is selected by a "leave-one-subject-out" cross-validation method. In the simulations, we draw $B = 1000$ random samples of sizes $n = 150, 200, 300$ and $500$ with $\beta = 2$. We obtain the estimators $\hat\beta_n$ and $\hat g_n(t)$ from (2.2) and (2.3), respectively. Let $\hat\beta_n^{(b)}$ be the $b$th least squares estimate of $\beta$ for sample size $n$. Summary statistics for $\hat\beta_n$ are computed as

$$\bar\beta_n = \frac1B\sum_{b=1}^{B}\hat\beta_n^{(b)}, \qquad \widehat{\mathrm{SD}}(\hat\beta_n) = \left[\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\beta_n^{(b)} - \bar\beta_n\right)^{2}\right]^{\frac12}, \qquad \widehat{\mathrm{Bias}}(\hat\beta_n) = \bar\beta_n - \beta, \qquad \widehat{\mathrm{MSE}}(\hat\beta_n) = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\beta_n^{(b)} - \beta\right)^{2},$$

which are listed in Table 1.
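The following condensed sketch (ours; it uses a fixed bandwidth and $B = 200$ replications rather than the paper's leave-one-subject-out bandwidth and $B = 1000$, purely to keep the example short) reproduces the structure of the Case I experiment.

```python
# A condensed Monte Carlo sketch of Case I under the stated design.
import numpy as np

rng = np.random.default_rng(1)
n, m, beta, B, h = 150, 12, 2.0, 200, 0.05
t = (np.arange(n * m) + 1.0) / (n * m)     # t_{ij} = ((i-1)m + j)/(nm), stacked
g = np.exp(3.0 * t)                        # Case I

u = (t[:, None] - t[None, :]) / h
K = 0.75 * np.clip(1.0 - u**2, 0.0, None)  # Epanechnikov kernel (zero for |u| > 1)
W = K / K.sum(axis=1, keepdims=True)       # Nadaraya-Watson weights

est = np.empty(B)
for b in range(B):
    x = rng.normal(1.0, 1.0, n * m)
    e = np.empty((n, m))
    e_prev = rng.standard_normal(n)        # e_{i,0} ~ N(0,1)
    for j in range(m):
        e_prev = 0.2 * e_prev + rng.standard_normal(n)
        e[:, j] = e_prev                   # e_{ij} = 0.2 e_{i,j-1} + eps_{ij}
    y = x * beta + g + e.ravel()
    xt, yt = x - W @ x, y - W @ y          # partial residuals
    est[b] = (xt @ yt) / (xt @ xt)         # (2.2) with p = 1
print(f"bias = {est.mean() - beta:+.4f}, SD = {est.std(ddof=1):.4f}, "
      f"MSE = {np.sum((est - beta)**2) / (B - 1):.4f}")
```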

Table 1 The estimates of β and some indices of their accuracy for different sample sizes n and nonparametric functions g(·)

In addition, to assess the estimator of the nonparametric component $g(\cdot)$, we study the square root of the mean squared error (RMSE) based on 1000 repetitions. Let $\hat g_n^{(b)}(t)$ denote the $b$th estimate of $g(t)$ for sample size $n$, and let $\bar{\hat g}_n(t) = \frac1B\sum_{b=1}^{B}\hat g_n^{(b)}(t)$ be the average estimate of $g(t)$. We compute

$$\mathrm{RMSE}_n = \left[\frac1M\sum_{s=1}^{M}\left(\bar{\hat g}_n(t_s) - g(t_s)\right)^{2}\right]^{\frac12},$$

and

$$\mathrm{RMSE}_n^{(b)} = \left[\frac1M\sum_{s=1}^{M}\left(\hat g_n^{(b)}(t_s) - g(t_s)\right)^{2}\right]^{\frac12}, \qquad b = 1, 2, \ldots, B,$$

where $\{t_s, s = 1, \ldots, M\}$ is a sequence of regular grid points on $[0,1]$. Figures 1 and 2 provide the average estimates of the nonparametric function $g(\cdot)$ together with the $\mathrm{RMSE}_n$ values for Cases I and II, respectively. The boxplots of the $\mathrm{RMSE}_n^{(b)}$ $(b = 1, 2, \ldots, B)$ values for Cases I and II are presented in Figure 3.
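A small helper in the same spirit (our illustration; `g_hat` is an assumed $B\times M$ array of the replicated estimates on the grid and `g_true` the corresponding values $g(t_s)$) computes both summaries:

```python
import numpy as np

def rmse_summaries(g_hat: np.ndarray, g_true: np.ndarray):
    g_bar = g_hat.mean(axis=0)                                # average estimate over B runs
    rmse_n = np.sqrt(np.mean((g_bar - g_true) ** 2))          # RMSE_n
    rmse_b = np.sqrt(np.mean((g_hat - g_true) ** 2, axis=1))  # RMSE_n^{(b)}, for the boxplots
    return rmse_n, rmse_b
```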

Figure 1 Estimators of the nonparametric component $g(\cdot)$ for Case I: $\hat g_n(\cdot)$ (dashed curve) and $g(\cdot)$ (solid curve).

Figure 2 Estimators of the nonparametric component $g(\cdot)$ for Case II: $\hat g_n(\cdot)$ (dashed curve) and $g(\cdot)$ (solid curve).

Figure 3 Boxplots of the $\mathrm{RMSE}_n^{(b)}$ $(b = 1, 2, \ldots, B)$ values for the estimators of $g(\cdot)$.

From Table 1, we see that (i) $|\widehat{\mathrm{Bias}}(\hat\beta_n)|$, $\widehat{\mathrm{SD}}(\hat\beta_n)$ and $\widehat{\mathrm{MSE}}(\hat\beta_n)$ decrease as the sample size $n$ increases; and (ii) the larger the sample size $n$, the closer $\bar\beta_n$ is to the true value 2. From Figures 1, 2 and 3, we observe that the biases of the estimators of the nonparametric component $g(\cdot)$ decrease as the sample size $n$ increases. These results show that, for semiparametric partially linear regression models for longitudinal data with a mixing error structure, the least squares estimator of the parametric component $\beta$ and the estimator of the nonparametric component $g(\cdot)$ both perform well.

6 Concluding remarks

An inherent characteristic of longitudinal data is the dependence among observations within the same subject. To capture this dependence, we have considered the estimation problems of partially linear models for longitudinal data with φ-mixing and ρ-mixing error structures, respectively. The strong consistency of the least squares estimator $\hat\beta_n$ of the parametric component $\beta$ is established. In addition, the strong consistency and uniform consistency of the estimator $\hat g_n(\cdot)$ of the nonparametric function $g(\cdot)$ are obtained under mild conditions.

In this paper, we only consider the case where $(x_{ij}^{T}, t_{ij})$ are known and nonrandom design points, as in Baek and Liang [16] and Liang and Jing [20]. In the monograph of Härdle et al. [7], both the fixed design and the random design are considered for non-longitudinal partially linear regression models. Our results can be extended to the case where $(x_{ij}^{T}, t_{ij})$ is random; interested readers may pursue this. In addition, we consider partially linear models for longitudinal data only with φ-mixing and ρ-mixing errors; analogous results for other mixing-dependent structures, such as α-mixing, φ*-mixing and ρ*-mixing, can be obtained by the same arguments. At present, we have not established the asymptotic normality of the estimators, since some details need further discussion; we will devote future work to the asymptotic normality of $\hat\beta_n$ and $\hat g_n(\cdot)$.