# Empirical likelihood semiparametric nonlinear regression analysis for longitudinal data with responses missing at random

Tang, N. & Zhao, P. Ann Inst Stat Math (2013) 65: 639. doi:10.1007/s10463-012-0387-4

## Abstract

This paper develops the empirical likelihood (EL) inference on parameters and baseline function in a semiparametric nonlinear regression model for longitudinal data in the presence of missing response variables. We propose two EL-based ratio statistics for regression coefficients by introducing the working covariance matrix and a residual-adjusted EL ratio statistic for baseline function. We establish asymptotic properties of the EL estimators for regression coefficients and baseline function. Simulation studies are used to investigate the finite sample performance of our proposed EL methodologies. An AIDS clinical trial data set is used to illustrate our proposed methodologies.

### Keywords

Empirical likelihood · Imputation · Longitudinal data · Missing at random · Semiparametric nonlinear regression model

## 1 Introduction

Longitudinal data are often encountered in economic, psychological, biomedical, behavioral, educational and social research. In longitudinal studies, subjects are observed repeatedly over time, and responses of interest are recorded together with covariates. Semiparametric regression models are often employed to fit longitudinal data because the parametric part provides an interpretable data summary, while the nonparametric part gives the data the flexibility to determine unknown or uncertain components such as the shape of the mean response over time. Various statistical methods have been developed in past years to estimate the regression coefficients and smoothing functions in a semiparametric regression model; for example, see Green (1987), Zeger and Diggle (1994), Lin and Carroll (2001), Ruppert et al. (2003) and Fan and Li (2004). However, nonlinear relations among the covariates are important for building more reasonable and meaningful models; see Bates and Watts (1988). Recently, semiparametric nonlinear regression models have received considerable attention; for example, see Zhu et al. (2000), Li and Nie (2008), and Wang and Ke (2009). These existing theories and methods were developed under the assumption that neither responses nor covariates in semiparametric nonlinear regression models are subject to missingness. Hence, this paper aims to develop an inference procedure for regression coefficients and smoothing functions in semiparametric nonlinear regression models with responses missing at random.

Missing data are often encountered in surveys, clinical trials and longitudinal studies (Little and Rubin 2002), for reasons such as study drop-out, refusal to answer some items on a questionnaire, or failure to attend a scheduled clinic visit. Various methods have therefore been developed to analyse semiparametric regression models with missing data; for example, see Yi and Cook (2002), Shardell and Miller (2008), and Chen et al. (2008). In particular, EL inference for semiparametric regression models with missing data has received much attention in recent years because it is especially useful for constructing confidence intervals or regions for the parameters of interest; for example, see Wang et al. (2004), Liang et al. (2007), Liang and Qin (2008), and Xue and Xue (2011). Nonlinear regression models with responses missing at random have also been studied in recent years; for example, see Müller (2009) and Ciuperca (2011). However, semiparametric nonlinear regression models for longitudinal data with responses missing at random are more challenging because of the nonlinearity in the unknown regression coefficients and the within-subject correlation. Moreover, little work has been done on developing the EL method for semiparametric nonlinear regression models for longitudinal data with responses missing at random.

The aim of this paper is to develop a general EL inference procedure for the parameters and baseline function, using either the complete-case data set or imputed values, in a semiparametric nonlinear regression model for longitudinal data with responses missing at random. In our proposed methods, the value of a missing response is imputed with the inverse-probability-weighted imputation method, and the within-subject correlation structure is accounted for by introducing a working covariance matrix into the proposed auxiliary random vectors. In particular, to avoid selecting an optimal bandwidth and the so-called “curse of dimensionality” in estimating the selection probability function by the kernel method, we employ a logistic regression model, which is widely used in missing data analysis (see Ibrahim et al. 2001; Lee and Tang 2006; Chen and Zhong 2010), and estimate the selection probability function by maximizing the likelihood of the logistic model. Our proposed EL method has the following features: (1) the EL ratio statistic for \(\beta \) asymptotically follows the central Chi-squared distribution, so it can be used directly to construct confidence regions for the parameters without the extra Monte Carlo approximation that would otherwise be needed; (2) unlike the normal-approximation-based (NA-based) method for constructing a confidence region for \(\beta \), a consistent estimator of the asymptotic covariance matrix is not required; (3) our empirical results show that the EL-based method outperforms the NA-based method in terms of coverage probability and interval width; and (4) our theoretical results are new, since the existing literature considered only nonlinear models with responses missing at random (Ciuperca 2011) or semiparametric linear regression models with responses missing at random or a within-subject independence structure.
Here we extend EL inference for semiparametric regression models with responses missing at random to semiparametric nonlinear regression models for longitudinal data by incorporating the within-subject correlation into the constructed auxiliary vectors, and we systematically investigate the asymptotic properties of the maximum EL estimators (MELEs) under this new setting.

The rest of the paper is organized as follows: Section 2 outlines the formulations of two ELs for \(\beta \) based on the complete-case data and the inverse-probability-weighted imputation technique; it also proposes a calibrated method for constructing EL ratios and an imputation estimator for \(g(t)\). In Sect. 3, we establish the asymptotic properties of the three proposed EL ratio functions and their corresponding EL estimators. Numerical illustrations, including two simulation studies and a real example, are presented in Sect. 4 to compare the finite sample performance of the proposed methods. Some concluding remarks are given in Sect. 5. Technical details are presented in the Appendix.

## 2 Methods

### 2.1 Model and notation

Throughout this paper, we assume that \(Y_{ij}\)’s are subject to missingness and \(X_{ij}\)’s and \(T_{ij}\)’s are completely observed. Let \(\delta _{ij}=0\) if \(Y_{ij}\) is missing and \(\delta _{ij}=1\) if \(Y_{ij}\) is observed. Generally, the missing components may vary across different subjects. Here we assume that \(Y_{ij}\) is missing at random (MAR), i.e. \(\delta _{ij}\) and \(Y_{ij}\) are conditionally independent given \(X_{ij}\) and \(T_{ij}\): \(P(\delta _{ij}=1|X_{ij}, Y_{ij}, T_{ij})=P(\delta _{ij}=1|X_{ij}, T_{ij})\triangleq p(X_{ij},T_{ij})\). It is assumed that \(\delta _{ij}\) is independent of \(\delta _{ik}\) for any \(j\not = k\). Without loss of generality, we also assume that \(T_{ij}\)’s are all scaled into the interval \([0,1]\).
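The MAR mechanism above is straightforward to simulate: with a logistic selection probability, \(\delta _{ij}\) depends only on the always-observed \((X_{ij}, T_{ij})\), never on \(Y_{ij}\). A minimal sketch (the function names and the value of \(\gamma \) are our own illustrative choices, not taken from the paper):

```python
import numpy as np

def selection_prob(x, t, gamma):
    """Logistic selection probability p(x, t) = P(delta = 1 | x, t)."""
    eta = gamma[0] + gamma[1] * x + gamma[2] * t
    return 1.0 / (1.0 + np.exp(-eta))

def draw_missingness(x, t, gamma, rng):
    """Draw delta_ij = 1 (observed) with probability p(x_ij, t_ij).

    Because p depends only on the always-observed (x, t) and not on y,
    the resulting mechanism is missing at random (MAR)."""
    p = selection_prob(x, t, gamma)
    return (rng.uniform(size=p.shape) < p).astype(int)

rng = np.random.default_rng(0)
x = rng.uniform(size=(50, 4))   # covariates X_ij for 50 subjects, 4 visits
t = rng.uniform(size=(50, 4))   # time points T_ij scaled into [0, 1]
delta = draw_missingness(x, t, (1.0, 0.5, 0.05), rng)
```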

### 2.2 MELE of \(\beta \) with the complete-case data

### 2.3 MELE of \(\beta \) with the imputed values

### 2.4 Maximum residual-adjusted EL estimator for \(g(t)\)

### 2.5 Imputation estimator for \(g(t)\)

All the estimators for \(g(t)\) presented above are obtained from the complete-case data set and do not fully use the information contained in the data, which may yield a biased estimator of \(g(t)\). Motivated by the imputation method for missing responses given in Sect. 2.3, we propose an imputation estimator for \(g(t)\) as follows:
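For intuition, a generic inverse-probability-weighted imputation of the kind referred to above can be sketched as follows. This is a textbook-style stand-in with placeholder inputs `mu_hat` and `p_hat`, not the paper's exact estimator from Sect. 2.3:

```python
import numpy as np

def ipw_impute(y, delta, mu_hat, p_hat):
    """Inverse-probability-weighted imputation of responses.

    y      : responses (arbitrary values where delta == 0)
    delta  : 1 if y is observed, 0 if missing
    mu_hat : fitted mean, e.g. f(X; beta_hat) + g_hat(T), at each point
    p_hat  : estimated selection probabilities p(X, T)

    Observed responses are reweighted by 1/p_hat and the remaining mass
    is filled in with the fitted mean; conditionally on (X, T), the
    imputed value is unbiased for E(Y | X, T) under MAR."""
    return delta * y / p_hat + (1.0 - delta / p_hat) * mu_hat
```

With `delta = 1` and `p_hat = 1` the observed value is returned unchanged; with `delta = 0` the fitted mean is used.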

## 3 Asymptotic properties

Based on the above notation, we consider the asymptotic distributions of the log-EL ratio functions (LELRFs) \(\ell _l(\beta )\) and the estimators \(\hat{\beta }_{l}\) (\(l=c,I\)) for the parameter \(\beta \) presented in Sects. 2.2 and 2.3.

**Theorem 1**

Suppose that the conditions (A1)–(A11) given in the Appendix hold. If \(\beta \) is the true parameter, then \(\ell _{l}(\beta )\stackrel{\mathcal{L}}{\rightarrow }\chi _{p}^{2}\) for \(l=c\) and \(I\), where \(\chi _{p}^{2}\) is the Chi-squared distribution with \(p\) degrees of freedom, and \(\stackrel{\mathcal{L}}{\rightarrow }\) denotes the convergence in distribution.

Let \(\chi _{p,\alpha }^2\) be the upper \(\alpha \)-percentile of the central Chi-squared distribution with \(p\) degrees of freedom for \(0<\alpha <1\). It follows from Theorem 1 that the approximate \(100(1-\alpha )~\%\) EL confidence region (ELCR) for \(\beta \) can be obtained by \(\{\beta : \ell _l(\beta )\le \chi _{p,\alpha }^2\}\) for \(l=c\) and \(I\).
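To illustrate how such a \(\chi ^2\)-calibrated region is computed in practice, here is the simplest scalar analogue: Owen's empirical likelihood for a population mean, with the confidence set obtained by thresholding the log-EL ratio at \(\chi ^2_{1,0.05}\approx 3.841\). This is a generic sketch, not the paper's statistic \(\ell _l(\beta )\):

```python
import numpy as np

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for the mean of sample x.

    Solves sum_i z_i / (1 + lam * z_i) = 0 for the Lagrange multiplier
    lam by bisection, where z_i = x_i - mu."""
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                    # mu outside the convex hull of x
    lo = -1.0 / z.max() + 1e-10          # keep 1 + lam * z_i > 0 for all i
    hi = -1.0 / z.min() - 1e-10
    for _ in range(200):                 # the score is decreasing in lam
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + lam * z)) > 0:
            lo = lam
        else:
            hi = lam
    return 2.0 * np.sum(np.log1p(lam * z))

def el_confidence_interval(x, grid, chi2_q=3.841):
    """Grid approximation of {mu : el_log_ratio(x, mu) <= chi2_{1,alpha}}."""
    keep = [mu for mu in grid if el_log_ratio(x, mu) <= chi2_q]
    return min(keep), max(keep)
```

The log-ratio is zero at the sample mean and grows as \(\mu \) moves away, so the region is an interval around the mean; no covariance estimate is needed, mirroring feature (2) noted in the Introduction.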

**Theorem 2**

Let \(\hat{\varOmega }_k=\hat{\varXi }_k^{-1}\hat{\varLambda }_k\hat{\varXi }_k^{-1}\), where \(\hat{\varLambda }_k=n^{-1}\sum _{i=1}^{n}Z_{ik}(\hat{\beta }_k)Z_{ik}^\mathrm{T}(\hat{\beta }_k)\) and \(\hat{\varXi }_k=n^{-1}\sum _{i=1}^{n}\{\partial Z_{ik}(\beta )/\partial \beta \}_{\beta =\hat{\beta }_k}\) for \(k=c\) and \(I\). It is easily shown that \(\hat{\varOmega }_k\) is a consistent estimator of \(\varXi _k^{-1}\varLambda _k\varXi _k^{-1}\) for \(k=c\) and \(I\). It then follows from Theorem 2 that \(\sqrt{n}\hat{\varOmega }_k^{-1/2}(\hat{\beta }_k-\beta )\stackrel{\mathcal{L}}{\rightarrow } N(0,I_p)\), which yields \(n(\hat{\beta }_k-\beta )^\mathrm{T}\hat{\varOmega }_k^{-1}(\hat{\beta }_k-\beta )\stackrel{\mathcal{L}}{\rightarrow } \chi _p^2\), where \(I_p\) is the \(p\times p\) identity matrix. Therefore, the approximate \(100(1-\alpha )~\%\) ELCR for \(\beta \) can be constructed as \(\{\beta : n(\hat{\beta }_k-\beta )^\mathrm{T}\hat{\varOmega }_k^{-1}(\hat{\beta }_k-\beta )\le \chi _{p,\alpha }^{2}\}\) for \(k=c\) and \(I\).
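Numerically, the Wald-type region above only needs the per-subject vectors \(Z_{ik}(\hat{\beta }_k)\) and their derivatives. A hedged sketch with placeholder arrays (`Z` and `dZ` stand in for the quantities defined in Sect. 2):

```python
import numpy as np

def sandwich_cov(Z, dZ):
    """Sandwich estimator Omega_hat = Xi_hat^{-1} Lambda_hat Xi_hat^{-1}.

    Z  : (n, p) array of per-subject auxiliary vectors Z_i(beta_hat)
    dZ : (n, p, p) array of per-subject derivatives dZ_i/dbeta
    """
    n = Z.shape[0]
    Lam = Z.T @ Z / n                  # Lambda_hat = mean of Z_i Z_i^T
    Xi = dZ.mean(axis=0)               # Xi_hat = mean of dZ_i/dbeta
    Xi_inv = np.linalg.inv(Xi)
    return Xi_inv @ Lam @ Xi_inv

def wald_statistic(beta_hat, beta0, Omega, n):
    """n (beta_hat - beta0)^T Omega^{-1} (beta_hat - beta0), asy. chi^2_p."""
    d = beta_hat - beta0
    return float(n * d @ np.linalg.solve(Omega, d))
```

As a sanity check, for the toy estimating function \(Z_i(\beta )=x_i-\beta \) (so \(\partial Z_i/\partial \beta =-I_p\)) the sandwich reduces to the sample covariance and the statistic is the usual Hotelling-type quadratic form.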

**Theorem 3**

Suppose that the conditions (A1)–(A11) given in the Appendix hold and the kernel function \(K(\cdot )\) is twice continuously differentiable on \([0,1]\). If \(g(t_0)\) is the true value of the baseline function \(g(t)\), then we have \(\ell _R(g(t_0))\stackrel{\mathcal{L}}{\rightarrow } \chi _1^2\).

By Theorem 3, an approximate \(100(1-\alpha )~\%\) pointwise EL confidence interval (CI) for \(g(t_0)\) can be constructed as \(\{g(t_0): \ell _R(g(t_0))\le \chi _{1,\alpha }^2\}\).

**Theorem 4**

**Proposition 1**

If condition (A2) is replaced by the condition that \(Nh^2/\log (N)\rightarrow \infty \) and \(Nh^5\rightarrow 0\), then the bias term \(b(t_0)\) is asymptotically zero and \(\sqrt{Nh}\{{\hat{g}}(t_0)-g(t_0)\}\stackrel{\mathcal{L}}{\rightarrow }N(0,\gamma ^2(t_0))\).

**Proposition 2**

If the condition presented in Proposition 1 holds, the approximate \(100(1-\alpha )~\%\) CI for \(g(t_0)\) can be expressed as \({\hat{g}}(t_0) \pm z_{\alpha /2}(Nh)^{-1/2}\hat{\gamma }(t_0)\).
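A minimal numerical illustration of such a normal-approximation pointwise interval, using a Nadaraya-Watson estimator on the complete cases with a Gaussian kernel. The residual-based `gamma_hat` below is a crude plug-in stand-in for the paper's \(\hat{\gamma }(t_0)\), and all names are our own:

```python
import numpy as np

def nw_estimate(t0, T, Y, delta, h):
    """Nadaraya-Watson estimate of g(t0) from complete cases only."""
    K = np.exp(-0.5 * ((T - t0) / h) ** 2)      # Gaussian kernel weights
    w = delta * K                                # zero weight if missing
    return np.sum(w * Y) / np.sum(w)

def na_pointwise_ci(t0, T, Y, delta, h, z=1.96):
    """Normal-approximation CI: g_hat(t0) +/- z (N h)^{-1/2} gamma_hat."""
    N = delta.sum()                              # number of observed pairs
    g_hat = nw_estimate(t0, T, Y, delta, h)
    fitted = np.array([nw_estimate(t, T, Y, delta, h) for t in T])
    resid = Y - fitted
    gamma_hat = np.sqrt(np.mean(resid[delta == 1] ** 2))  # crude plug-in
    half = z * gamma_hat / np.sqrt(N * h)
    return g_hat - half, g_hat + half
```

Near the boundary of \([0,1]\) the Nadaraya-Watson estimate is biased; the undersmoothing condition of Proposition 1 (\(Nh^5\rightarrow 0\)) is precisely what removes the interior bias term asymptotically.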

**Theorem 5**

Suppose that the conditions (A1)–(A11) given in the Appendix hold. Then, we have \(\hat{g}^\mathrm{MIP}(t)-g(t)=O_p((nh)^{-\frac{1}{2}}+(nb)^{-\frac{1}{2}}+b+h)\). In particular, if \(h=O(n^{-1/3})\) and \(b=O(n^{-1/3})\), we have \(\hat{g}^\mathrm{MIP}(t)-g(t)=O_p(n^{-1/3})\).

Theorem 5 shows that \(\hat{g}^\mathrm{MIP}(t)\) attains the optimal convergence rate of a nonparametric kernel regression estimator when \(h=O(n^{-1/3})\) and \(b=O(n^{-1/3})\) (Stone 1980).

## 4 Numerical examples

### 4.1 Simulation studies

**(1) One-dimensional case**

In the simulation study, the data set \(\{Y_{ij}:i=1,\ldots ,n,j=1,\ldots ,n_i\}\) was generated from the following semiparametric nonlinear model: \(Y_{ij}=\exp (X_{ij}\beta )+\cos (4\pi T_{ij})+\varepsilon _{ij}\) with the true value of parameter \(\beta \) being \(\beta =1.5\). To generate \(Y_{ij}\), we independently simulated \(X_{ij}\) and the time point \(T_{ij}\) from the uniform distribution \(U(0,1)\) and then generated \(\varepsilon _{ij}\) via \(\varepsilon _{ij}=e_i+v_{ij}\) in which \(e_i\) and \(v_{ij}\) were independently generated from \(N(0,\sigma _{e}^2)\) and \(N(0,\sigma _{v}^2)\) with the true values of parameters \(\sigma _e^2\) and \(\sigma _v^2\) being \(\sigma _e^2=1.0\) and \(\sigma _v^2=1.0\). This structure for generating \(\varepsilon _{ij}\) ensures dependence among the repeated measurements \(Y_{ij}\) for each subject \(i\) because \(\mathrm{cov}(\varepsilon _{ij},\varepsilon _{ik})=\sigma _e^2\) and the correlation coefficient between \(Y_{ij}\) and \(Y_{ik}\) is \(\sigma _{e}^2/(\sigma _{e}^2+\sigma _{v}^2)\) for \(j\not = k\). For simplicity, we consider the balanced design, i.e. \(n_1=\cdots =n_n=J\). To create the missing data for responses \(Y_{ij}\), we consider the following four cases for the selection probability function \(p(x,t;\gamma )=\exp (\gamma _0+\gamma _1 x+\gamma _2 t)/(1+{\exp }(\gamma _0+\gamma _1x+\gamma _2t))\) with \(\gamma =(\gamma _0,\gamma _1,\gamma _2)\) specified by (1) \(\gamma =(1.85,0.02,0.05)\), (2) \(\gamma =(1.0,0.5,0.05)\), (3) \(\gamma =(1.0,0.001,0.012)\) and (4) \(\gamma =(0.4,0.01,0.02)\). Clearly, the considered missing data mechanism is MAR. 
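The data-generating design above can be reproduced directly. The following is our reconstruction of the stated design (not the authors' code), including a check that the within-subject correlation equals \(\sigma _e^2/(\sigma _e^2+\sigma _v^2)=0.5\):

```python
import numpy as np

def simulate(n=100, J=4, beta=1.5, sig_e2=1.0, sig_v2=1.0, seed=0):
    """Generate Y_ij = exp(X_ij * beta) + cos(4 pi T_ij) + e_i + v_ij."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, J))                         # X_ij ~ U(0, 1)
    T = rng.uniform(size=(n, J))                         # T_ij ~ U(0, 1)
    e = rng.normal(0.0, np.sqrt(sig_e2), size=(n, 1))    # subject effect
    v = rng.normal(0.0, np.sqrt(sig_v2), size=(n, J))    # measurement error
    eps = e + v                       # cov(eps_ij, eps_ik) = sig_e2, j != k
    Y = np.exp(X * beta) + np.cos(4 * np.pi * T) + eps
    return X, T, Y, eps

# within-subject correlation should be sig_e2 / (sig_e2 + sig_v2) = 0.5
X, T, Y, eps = simulate(n=5000)
r = np.corrcoef(eps[:, 0], eps[:, 1])[0, 1]
```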
For each case of the selection probability \(p(x,t;\gamma )\), the missing responses \(Y_{ij}\) were created as follows: (a) we generated a random number \(\tau \) from the uniform distribution \(U(0,1)\); (b) the observation \(Y_{ij}\) was set to be missing (\(\delta _{ij}=0\)) if \(\tau \le 1-p(X_{ij},T_{ij};\gamma )\), and \(\delta _{ij}=1\) otherwise. In evaluating the MELE and CI for \(\beta \) and estimating the baseline function \(g(t)=\cos (4\pi t)\), we took the kernel function to be the Gaussian kernel \(K(u)=(2\pi )^{-1/2}\exp (-u^2/2)\), set the bandwidths \(h\) and \(b\) to \(n^{-1/5}\), and used the iteratively reweighted least squares algorithm to estimate the parameter \(\gamma \). We considered three kinds of working covariance matrices in the simulation study: \(V=I_J\) (working independence), \(V=\varSigma _i\) (the true covariance matrix) and \(V=\tilde{V}_i\) (an estimator of \(V\)), where \(\tilde{V}_i\) is evaluated using the formulae introduced in Sects. 2.2 and 2.3.
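The reweighted least squares iterative algorithm for \(\gamma \) amounts to Newton-Raphson (IRLS) on the logistic log-likelihood. A self-contained sketch under case (2) of the design, with variable names of our own choosing:

```python
import numpy as np

def fit_logistic(D, delta, n_iter=25):
    """IRLS / Newton-Raphson for a logistic regression of delta on D.

    D     : (N, q) design matrix, rows (1, x_ij, t_ij)
    delta : (N,) observed-response indicators
    Returns gamma_hat maximizing the Bernoulli likelihood."""
    gamma = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ gamma))      # current fitted probs
        W = p * (1.0 - p)                         # IRLS weights
        grad = D.T @ (delta - p)                  # score vector
        H = (D * W[:, None]).T @ D                # Fisher information
        gamma = gamma + np.linalg.solve(H, grad)  # Newton step
    return gamma

rng = np.random.default_rng(0)
N = 20000
D = np.column_stack([np.ones(N), rng.uniform(size=N), rng.uniform(size=N)])
true_gamma = np.array([1.0, 0.5, 0.05])           # case (2) of the design
p = 1.0 / (1.0 + np.exp(-D @ true_gamma))
delta = (rng.uniform(size=N) < p).astype(float)
gamma_hat = fit_logistic(D, delta)
```

With a moderate number of observations the estimate recovers \(\gamma \) closely, which is why the parametric logistic model sidesteps the bandwidth selection that a kernel estimate of \(p(x,t)\) would require.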

**Table 1** Bias, RMS, coverage probability and average length for \(\beta \) under different missing functions \(P(X,T)\) and sample sizes when the nominal level is 0.95 and \(p=1\). Columns 2–7 correspond to \(n=50\) and columns 8–13 to \(n=100\)

| | CEL \(I\) | CEL \(\varSigma _i\) | CEL \(\tilde{V}_i\) | IEL \(I\) | IEL \(\varSigma _i\) | IEL \(\tilde{V}_i\) | CEL \(I\) | CEL \(\varSigma _i\) | CEL \(\tilde{V}_i\) | IEL \(I\) | IEL \(\varSigma _i\) | IEL \(\tilde{V}_i\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Case 1** | | | | | | | | | | | | |
| Bias | 0.003 | 0.003 | 0.004 | 0.003 | 0.003 | 0.004 | 0.003 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 |
| RMS | 0.087 | 0.071 | 0.073 | 0.087 | 0.073 | 0.076 | 0.062 | 0.050 | 0.051 | 0.062 | 0.051 | 0.052 |
| NACP | 0.922 | 0.938 | 0.920 | 0.946 | 0.938 | 0.908 | 0.932 | 0.934 | 0.926 | 0.966 | 0.944 | 0.926 |
| NAAL | 0.319 | 0.266 | 0.256 | 0.356 | 0.271 | 0.262 | 0.226 | 0.186 | 0.182 | 0.251 | 0.190 | 0.187 |
| ELCP | 0.922 | 0.936 | 0.918 | 0.922 | 0.932 | 0.912 | 0.930 | 0.940 | 0.930 | 0.930 | 0.946 | 0.940 |
| ELAL | 0.325 | 0.267 | 0.256 | 0.325 | 0.273 | 0.263 | 0.225 | 0.184 | 0.180 | 0.225 | 0.188 | 0.184 |
| **Case 2** | | | | | | | | | | | | |
| Bias | 0.003 | 0.003 | 0.004 | 0.003 | 0.003 | 0.004 | 0.002 | 0.000 | 0.000 | 0.002 | 0.001 | 0.001 |
| RMS | 0.091 | 0.074 | 0.077 | 0.091 | 0.079 | 0.083 | 0.063 | 0.052 | 0.053 | 0.063 | 0.053 | 0.055 |
| NACP | 0.912 | 0.940 | 0.926 | 0.974 | 0.934 | 0.920 | 0.946 | 0.936 | 0.928 | 0.980 | 0.944 | 0.924 |
| NAAL | 0.331 | 0.278 | 0.269 | 0.395 | 0.287 | 0.279 | 0.233 | 0.194 | 0.191 | 0.277 | 0.201 | 0.198 |
| ELCP | 0.920 | 0.936 | 0.924 | 0.922 | 0.924 | 0.918 | 0.946 | 0.932 | 0.932 | 0.944 | 0.944 | 0.930 |
| ELAL | 0.337 | 0.280 | 0.270 | 0.336 | 0.290 | 0.282 | 0.232 | 0.192 | 0.188 | 0.232 | 0.199 | 0.196 |
| **Case 3** | | | | | | | | | | | | |
| Bias | 0.004 | 0.005 | 0.006 | 0.004 | 0.004 | 0.006 | 0.003 | 0.001 | 0.001 | 0.003 | 0.000 | 0.000 |
| RMS | 0.092 | 0.078 | 0.081 | 0.092 | 0.082 | 0.087 | 0.064 | 0.053 | 0.055 | 0.064 | 0.055 | 0.057 |
| NACP | 0.924 | 0.942 | 0.918 | 0.978 | 0.930 | 0.916 | 0.944 | 0.932 | 0.934 | 0.982 | 0.944 | 0.930 |
| NAAL | 0.340 | 0.288 | 0.280 | 0.429 | 0.298 | 0.292 | 0.239 | 0.201 | 0.198 | 0.300 | 0.209 | 0.207 |
| ELCP | 0.924 | 0.940 | 0.916 | 0.928 | 0.926 | 0.916 | 0.944 | 0.936 | 0.932 | 0.942 | 0.948 | 0.940 |
| ELAL | 0.347 | 0.291 | 0.282 | 0.344 | 0.302 | 0.295 | 0.239 | 0.199 | 0.196 | 0.238 | 0.208 | 0.205 |
| **Case 4** | | | | | | | | | | | | |
| Bias | 0.002 | 0.002 | 0.003 | 0.001 | 0.001 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.001 | 0.000 |
| RMS | 0.099 | 0.084 | 0.087 | 0.099 | 0.092 | 0.098 | 0.067 | 0.058 | 0.060 | 0.067 | 0.061 | 0.063 |
| NACP | 0.922 | 0.936 | 0.922 | 0.984 | 0.922 | 0.914 | 0.944 | 0.942 | 0.934 | 0.998 | 0.944 | 0.938 |
| NAAL | 0.361 | 0.314 | 0.307 | 0.521 | 0.328 | 0.326 | 0.255 | 0.219 | 0.216 | 0.364 | 0.230 | 0.229 |
| ELCP | 0.918 | 0.940 | 0.928 | 0.920 | 0.930 | 0.908 | 0.944 | 0.942 | 0.938 | 0.946 | 0.954 | 0.942 |
| ELAL | 0.369 | 0.318 | 0.310 | 0.365 | 0.335 | 0.331 | 0.255 | 0.217 | 0.215 | 0.253 | 0.230 | 0.229 |

From Table 1, we have the following observations: (1) the CEL method yields shorter intervals than the IEL method; (2) the EL-based method produces shorter intervals but larger coverage probabilities than the NA-based method; (3) the coverage probabilities of both the EL-based and NA-based CIs are close to the prespecified nominal level when the sample size is large or the average proportion of missing data is small; (4) for every fixed selection probability function, the widths of the EL-based and NA-based CIs decrease as the sample size \(n\) increases; (5) the average length depends on the selection probability function: it increases as the missing rate increases; (6) the EL-based estimate of \(\beta \) is reasonably accurate under all considered selection probability functions and sample sizes, including the small-sample case; and (7) the Bias and RMS values under the true working covariance matrix are smaller than under the other two choices, while the method using the estimated working covariance matrix performs better than the one using the identity working covariance matrix; moreover, the CI based on the estimated working covariance matrix outperforms those based on the identity and true working covariance matrices in terms of interval length. These results show that increasing \(n\) or reducing the missing rate improves the accuracy of the estimators.

**(2) Two-dimensional case**

**Table 2** Bias, RMS, coverage probability and average length for \(\beta \) under different missing functions \(P(X,T)\) and sample sizes when the nominal level is 0.95 and \(p=2\). Columns 2–7 correspond to \(n=50\) and columns 8–13 to \(n=100\)

| | CEL \(I\) | CEL \(\varSigma _i\) | CEL \(\tilde{V}_i\) | IEL \(I\) | IEL \(\varSigma _i\) | IEL \(\tilde{V}_i\) | CEL \(I\) | CEL \(\varSigma _i\) | CEL \(\tilde{V}_i\) | IEL \(I\) | IEL \(\varSigma _i\) | IEL \(\tilde{V}_i\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Estimate of \(\beta _1\) with \(p_1(x,t)\)** | | | | | | | | | | | | |
| Bias | 0.001 | 0.004 | 0.004 | 0.002 | 0.003 | 0.002 | 0.001 | 0.001 | 0.000 | 0.001 | 0.000 | 0.001 |
| RMS | 0.121 | 0.100 | 0.105 | 0.122 | 0.103 | 0.110 | 0.088 | 0.072 | 0.072 | 0.088 | 0.076 | 0.078 |
| NACP | 0.946 | 0.868 | 0.922 | 0.944 | 0.948 | 0.924 | 0.938 | 0.876 | 0.944 | 0.942 | 0.936 | 0.926 |
| NAAL | 0.477 | 0.326 | 0.386 | 0.478 | 0.414 | 0.403 | 0.339 | 0.227 | 0.275 | 0.337 | 0.289 | 0.284 |
| ELCP | 0.937 | 0.953 | 0.930 | 0.945 | 0.945 | 0.928 | 0.957 | 0.965 | 0.943 | 0.949 | 0.967 | 0.947 |
| ELAL | 0.280 | 0.228 | 0.217 | 0.279 | 0.236 | 0.226 | 0.194 | 0.158 | 0.154 | 0.193 | 0.161 | 0.158 |
| **Estimate of \(\beta _1\) with \(p_2(x,t)\)** | | | | | | | | | | | | |
| Bias | 0.001 | 0.002 | 0.003 | 0.001 | 0.004 | 0.005 | 0.006 | 0.003 | 0.003 | 0.006 | 0.003 | 0.003 |
| RMS | 0.134 | 0.115 | 0.119 | 0.134 | 0.123 | 0.132 | 0.099 | 0.083 | 0.085 | 0.099 | 0.088 | 0.092 |
| NACP | 0.950 | 0.884 | 0.916 | 0.944 | 0.946 | 0.928 | 0.950 | 0.864 | 0.932 | 0.950 | 0.940 | 0.940 |
| NAAL | 0.534 | 0.375 | 0.449 | 0.529 | 0.490 | 0.486 | 0.378 | 0.261 | 0.319 | 0.375 | 0.342 | 0.339 |
| ELCP | 0.941 | 0.932 | 0.915 | 0.930 | 0.939 | 0.909 | 0.945 | 0.963 | 0.949 | 0.947 | 0.943 | 0.926 |
| ELAL | 0.319 | 0.267 | 0.261 | 0.319 | 0.287 | 0.282 | 0.215 | 0.183 | 0.180 | 0.213 | 0.191 | 0.188 |
| **Estimate of \(\beta _2\) with \(p_1(x,t)\)** | | | | | | | | | | | | |
| Bias | 0.004 | 0.009 | 0.008 | 0.003 | 0.007 | 0.005 | 0.007 | 0.004 | 0.004 | 0.006 | 0.003 | 0.002 |
| RMS | 0.141 | 0.112 | 0.118 | 0.141 | 0.116 | 0.124 | 0.099 | 0.079 | 0.081 | 0.099 | 0.083 | 0.086 |
| NACP | 0.944 | 0.892 | 0.944 | 0.942 | 0.952 | 0.926 | 0.948 | 0.902 | 0.946 | 0.952 | 0.954 | 0.944 |
| NAAL | 0.539 | 0.372 | 0.438 | 0.538 | 0.469 | 0.455 | 0.381 | 0.255 | 0.309 | 0.379 | 0.324 | 0.319 |
| ELCP | 0.937 | 0.953 | 0.930 | 0.945 | 0.945 | 0.928 | 0.957 | 0.965 | 0.943 | 0.949 | 0.967 | 0.947 |
| ELAL | 0.280 | 0.228 | 0.217 | 0.279 | 0.236 | 0.226 | 0.194 | 0.158 | 0.154 | 0.193 | 0.161 | 0.158 |
| **Estimate of \(\beta _2\) with \(p_2(x,t)\)** | | | | | | | | | | | | |
| Bias | 0.009 | 0.010 | 0.009 | 0.009 | 0.012 | 0.011 | 0.004 | 0.004 | 0.003 | 0.004 | 0.002 | 0.001 |
| RMS | 0.160 | 0.135 | 0.141 | 0.160 | 0.140 | 0.152 | 0.113 | 0.093 | 0.095 | 0.112 | 0.102 | 0.106 |
| NACP | 0.938 | 0.892 | 0.924 | 0.942 | 0.944 | 0.938 | 0.936 | 0.868 | 0.928 | 0.944 | 0.928 | 0.924 |
| NAAL | 0.604 | 0.426 | 0.509 | 0.600 | 0.556 | 0.551 | 0.426 | 0.294 | 0.358 | 0.421 | 0.383 | 0.381 |
| ELCP | 0.941 | 0.932 | 0.915 | 0.930 | 0.939 | 0.909 | 0.945 | 0.963 | 0.949 | 0.947 | 0.943 | 0.926 |
| ELAL | 0.319 | 0.267 | 0.261 | 0.319 | 0.287 | 0.282 | 0.215 | 0.183 | 0.180 | 0.213 | 0.191 | 0.188 |

### 4.2 A real example

A longitudinal data set from the pediatric AIDS clinical trial group ACTG 315 study was used to illustrate our proposed methodologies. In AIDS clinical trials, plasma HIV RNA copies (viral load) and CD4+ cell counts are two important surrogate markers for evaluating antiviral therapies (Saag et al. 1996; Mellors et al. 1996), and a main purpose of clinical investigators is to study their relationship during antiviral treatment. In this study, viral load and CD4+ cell counts from \(46\) patients were measured on treatment days \(t=0,2,4,5,6,7,8,9,10,11,12,13,14,15,16,25,27,\ldots ,175,182,196\) after initiation of an antiviral therapy, and 361 complete pairs of viral load and CD4+ cell count were obtained. The number of measurement time points per patient ranges from 4 to 8. This data set has previously been analysed by Liang et al. (2003) and Xue and Xue (2011), whose studies suggested that viral load depends linearly on CD4+ cell count but nonlinearly on treatment time; however, the scatterplot of viral load against CD4+ cell count shows no strict linear relationship between them. Therefore, we used the following semiparametric nonlinear model to formulate the relationship between viral load and CD4+ cell count: \(Y_{ij}=\exp (X_{ij}\beta )+g(T_{ij})+\varepsilon _{ij}\), where \(Y_{ij}\) and \(X_{ij}\) are the viral load and the CD4+ cell count for subject \(i\) at treatment time \(T_{ij}\), respectively. To illustrate the application of our proposed methodologies, we created missing data via the selection probability function \(p(x,t;\gamma )=\exp (\gamma _0+\gamma _1 x+\gamma _2t)/(1+\exp (\gamma _0+\gamma _1x+\gamma _2t))\) with \(\gamma =(\gamma _0,\gamma _1,\gamma _2)=(0.4,0.05,0.1)\).
Based on this selection probability function and the assumption that \(Y_{i1}\) is always observed, the missing data for \(Y_{ij}\) were created as follows: (a) we generated a random number \(\tau \) from the uniform distribution \(U(0,1)\); (b) \(Y_{ij}\) was set to be missing if \(\tau \le 1-p(X_{ij},T_{ij};\gamma )\) for \(i=1,\ldots ,46\) and \(j=1,\ldots ,n_i\). The corresponding missing proportion is roughly \(15~\%\). As is commonly done in AIDS clinical trials, we measured viral load on the \(\log _{10}\) scale and CD4+ cell counts on the \(100^{-1}\) scale to stabilize the variance and the computations.

## 5 Conclusions

By introducing the working covariance matrix into the auxiliary random vectors, we have developed an EL-based inference procedure for a semiparametric nonlinear regression model for longitudinal data with responses missing at random. Two MELEs for the unknown parameter \(\beta \) were presented, based on the complete-case data and on the imputed values of the missing responses, respectively. A maximum residual-adjusted EL estimator and an imputation estimator for the baseline function were also proposed, and the asymptotic properties of the MELEs were systematically investigated under this new setting. Our main contributions are: (1) the considered model is more general than the nonlinear regression model and the semiparametric regression model with responses missing at random, so our theoretical results are new; (2) a working covariance matrix is introduced to accommodate the within-subject correlation, which improves the efficiency of the MELEs; and (3) we proved that the constructed EL ratio statistic for \(\beta \) asymptotically follows the central Chi-squared distribution, so confidence regions for the parameters can be constructed directly, without the extra Monte Carlo approximation that would otherwise be needed. In short, we have extended EL inference for semiparametric regression models with responses missing at random to semiparametric nonlinear regression models for longitudinal data by incorporating the within-subject correlation into the constructed auxiliary vectors.

## 6 Appendix

- (A1)
The selection probability function \(p(x,t)\) and the \(X\)-density function \(\varGamma (x)\) have bounded partial derivatives up to order \(s\) with \(s\ge 2\).

- (A2)
Let \(S(\gamma )\) be the score function of the partial likelihood \(L(\gamma )\) for parameter \(\gamma =(\gamma _0,\gamma _1^\mathrm{T},\gamma _2)^\mathrm{T}\) defined in Sect. 2.1 and \(\gamma ^*\) be in the interior of compact set \(\Upsilon \). We assume \(\mathrm {var}(S(\gamma ))\) is a finite and positive definite matrix, and \(E(\partial S(\gamma )/\partial \gamma |_{\gamma =\gamma ^*})\) exists and is invertible. The missing propensity \(p(X_{ij},T_{ij};\gamma )>c_0>0\) for all \(i\in \{1,\ldots ,n\}\) and \(j\in \{1,\ldots ,n_i\}\).

- (A3)
The bandwidth satisfies \(h=h_0N^{-1/5}\) for some constant \(h_0>0\), and \(b=b_0N^{-1/5}\) for some constant \(b_0>0\).

- (A4)
The kernel function \(K(\cdot )\) is a symmetric and bounded probability density function with support \([-1,1]\).

- (A5)
The design points \(\{T_{ij}:i=1,\ldots ,n,\ j=1,\ldots ,n_i\}\) are assumed to be independent and identically distributed from a super-population density \(\kappa (t)\). Both \(q(t)\) and \(\kappa (t)\) have continuous and bounded derivatives on \((0,1)\) and are bounded away from zero and infinity on \([0,1]\).

- (A6)
The residuals \(\varepsilon _{ij}\) and \(u_{ij}\) are independent of each other, and \(\varepsilon _{ij}\) and \(u_{ij}\) are, respectively, independent of \(\varepsilon _{i^{\prime }j}\) and \(u_{i^{\prime }j}\) for any \(i\not = i^{\prime }\). Further, we assume that \(E|\varepsilon _{ij}|^{4+r}<\infty \), \(\max _{1\le i \le n}\Vert u_{ij}\Vert =o_p\{n^{\frac{2+r}{2(4+r)}}(\mathrm{log}n)^{-1}\}\) for some \(r>0\).

- (A7)
The matrices \(\varLambda _i\) and \(\varXi _i\) (\(i=c,I\)) defined in Theorem 2 are positive definite.

- (A8)
The functions \(g(t)\) and \(h(t)\) are twice continuously differentiable on (0,1).

- (A9)
The function \(f(X;\beta )\) is continuous with respect to \(\beta \) in a compact set \(\Theta \).

- (A10)
There exist two positive constants \(c_1\) and \(c_2\) such that
$$\begin{aligned} 0<c_1\le \min \limits _{1\le i \le n}\lambda _{i1}\le \max \limits _{1\le i \le n}\lambda _{in_i}\le c_2<\infty , \end{aligned}$$
where \(\lambda _{i1}\) and \(\lambda _{in_i}\) denote the smallest and largest eigenvalues of \(\varSigma _i\), respectively.

- (A11)
There exist two positive constants \(c_3\) and \(c_4\) such that
$$\begin{aligned} 0<c_3\le \min \limits _{1\le i \le n}\lambda _{i1}^{\prime }\le \max \limits _{1\le i \le n}\lambda _{in_i}^{\prime }\le c_4<\infty , \end{aligned}$$
where \(\lambda _{i1}^{\prime }\) and \(\lambda _{in_i}^{\prime }\) denote the smallest and largest eigenvalues of \(V_i\), respectively.

To complete the proofs of Theorems 1–5, the following lemmas are needed:

**Lemma 1**

*Proof*

For simplicity, we only prove the second equation; the other two equations can be proved similarly. By the inequality \((A+B)^2 \le 2A^2+2B^2\) for any constants \(A\) and \(B\), and the fact that \(\sum _{k=1}^n\sum _{l=1}^{n_k}W_{kl}^\mathrm{{C}}(T_{ij})=1\), we can show that \(E\{|\hat{g}_{2n}^\mathrm{{C}}(T_{ij}) - g_2^\mathrm{{C}}(T_{ij})|^2|T_{ij}=t\}\le I_1(t)+I_2(t)\), where \(I_1(t) = 2E\{|\sum _{k=1}^n\sum _{l=1}^{n_k}W_{kl}^\mathrm{{C}}(T_{ij})(Y_{kl} - g_2^\mathrm{{C}}(T_{kl}))|^2|T_{ij}=t\}\) and \(I_2(t) = 2E\{|\sum _{k=1}^n\sum _{l=1}^{n_k}W_{kl}^\mathrm{{C}}(T_{ij})(g_2^\mathrm{{C}}(T_{kl}) - g_2^\mathrm{{C}}(T_{ij}))|^2|T_{ij}=t\}\).

We first prove that \(\sup _{a \le t \le b}I_2(t)=O(n^{-1}h+h^{4})\). Let \(q(t)=E(\delta |T=t)\), \(m(t)=q(t)\kappa (t)\) and \(\hat{m}(t)=(nh)^{-1}\sum _{i=1}^n\sum _{j=1}^{n_i}\delta _{ij}K_h(T_{ij}-t)\). Following the standard procedure in nonparametric regression, it can be shown that \(\max _{a \le t \le b}|\hat{m}(t)-m(t)|=O(n^{-1/5})\) a.s. Hence, it follows from condition (A4) that there are two positive constants \(c_1\) and \(c_2\) such that \(\min _{0\le t \le 1}m(t)\ge c_1\) and \(\min _{0\le t \le 1}\hat{m}(t)\ge c_2\) a.s. Let \(\psi _{kl}(T_{ij})=K_h(T_{kl}-T_{ij})\delta _{kl}\{g_2^\mathrm{{C}}(T_{kl}) -g_2^\mathrm{{C}}(T_{ij})\}\). Then, by conditions (A3), (A4) and (A7), we have \(\max _{a \le t \le b}|E\{\psi _{kl}(T_{ij})|T_{ij}=t\}|=O(h^3)\) and \(\max _{a \le t \le b}|E\{\psi ^2_{kl}(T_{ij})|T_{ij}=t\}|=O(h^3)\). Based on these results, it is easy to show that \(I_2(t)\le cn^{-1}h+ch^{4}\).

Again, it is easy to show that \(E\{\delta _{kl}(Y_{kl}-g_2^\mathrm{{C}}(T_{kl}))\}=0\). Then, we can obtain that \(I_1(t)\le c(nh)^{-1}\). Combining the above inequalities finishes the proof of the second equation.\(\square \)

**Lemma 2**

*Proof*

Let \(\check{g}(T_{ij}) = g(T_{ij})-\hat{g}(T_{ij}) = g(T_{ij})-\hat{g}_{2n}^\mathrm{C}(T_{ij})+\hat{g}_{1n}^\mathrm{C}(T_{ij};\beta )\), and let \(\sigma _i^{kl}\) denote the \((k,l)\)th component of \(V_i^{-1}\). Then we have \(n^{-1/2}\sum _{i=1}^nZ_{i1}(\beta )\triangleq U_1+U_2+U_3+U_4\), where \(U_1 = n^{-1/2}\sum _{i=1}^n\sum _{k=1}^{n_i}\sum _{l=1}^{n_i}\delta _{ik}\delta _{il}u_{ik}\sigma _i^{kl}\varepsilon _{il}\), \(U_2=n^{-1/2}\sum _{i=1}^n\sum _{k=1}^{n_i}\sum _{l=1}^{n_i}\delta _{ik}\delta _{il}\sigma _i^{kl}\check{h}(T_{ik},\beta )\varepsilon _{il}\), \(U_3=n^{-1/2}\sum _{i=1}^n\sum _{k=1}^{n_i}\sum _{l=1}^{n_i}\delta _{ik}\delta _{il}\sigma _i^{kl}u_{ik}\check{g}(T_{il})\) and \(U_4= n^{-1/2}\sum _{i=1}^n\sum _{k=1}^{n_i}\sum _{l=1}^{n_i}\delta _{ik}\delta _{il}\sigma _i^{kl}\check{h}(T_{ik},\beta )\check{g}(T_{il})\).

**Lemma 3**

*Proof*

*Proof of Theorem 1*

From Lemmas 2 and 3, we obtain \(n\{ \frac{1}{n}\sum _{i=1}^{n}Z_{il}(\beta )\}^\mathrm{T}S_l^{-1}\{ \frac{1}{n}\sum _{i=1}^{n} Z_{il}(\beta )\}\stackrel{ \mathcal {L}}{ \rightarrow }\chi _{p}^{2}\) as \(n\rightarrow \infty \). It follows from the definitions of \(\xi _{nl}\) and \(S_l\) and the above equations that \(n\xi _{nl}^\mathrm{T}S_l^{-1}\xi _{nl} = no_{p}(n^{-\frac{1}{2}})O_{p}(1)o_{p}(n^{-\frac{1}{2}})=o_{p}(1)\) and \(2\sum _{i=1}^{n}\eta _{il}\le 2C\Vert \lambda _{nl}\Vert ^{3}\sum _{i=1}^{n} \Vert Z_{il}(\beta )\Vert ^{3}=O_{p}(n^{-\frac{3}{2}})o_{p}(n^{\frac{3}{2}}) = o_{p}(1)\). Then, combining the above equations leads to \(\ell _{l}(\beta )\stackrel{ \mathcal {L}}{ \rightarrow }\chi _{p}^{2}\) for \(l=c\) and \(I\). \(\square \)

*Proof of Theorem 2*

*Proof of Theorem 3*

*Proof of Theorem 4*

*Proof of Theorem 5*

## Acknowledgments

The authors thank two anonymous referees for their helpful comments and suggestions which have substantially improved the readability and the presentation of this paper. The research was fully supported by grants from the National Natural Science Foundation of China (10961026, 11171293), Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20115301110004) and the Natural Science Key Project of Yunnan Province (No. 2010CC003).