Kernel estimation in semiparametric mixed effect longitudinal modeling

Abstract

HIV damages the immune system by targeting CD4 cells, so modeling CD4 count data is important in the analysis of HIV infection. This paper considers a semiparametric mixed-effects model for the analysis of longitudinal CD4 data. The model is a natural extension of the linear mixed and semiparametric models: it uses a parametric linear model to represent the covariate effects, an arbitrary smooth nonparametric function to model the time effect, and random effects to account for the within-subject correlation. We approximate the nonparametric function by the profile-kernel method and use weighted least squares to estimate the regression coefficients. Under some regularity conditions, the asymptotic normality of the resulting estimator is established, and its performance is compared with that of the backfitting method. Although the two estimators share the same asymptotic variance matrix for independent data, we show that backfitting asymptotically produces larger bias and variance than the profile-kernel method; consequently, the backfitting method is not advised in semiparametric mixed-effects longitudinal models. For practical implementation, and to improve efficiency, the covariance function is estimated with an iterative algorithm. The performance of the proposed methods is compared through a simulation study and the analysis of the CD4 data.
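For intuition, here is a minimal simulation sketch of the data-generating model just described. The specific choices, a smooth time effect \(g(t)=\sin (2\pi t)\), a Gaussian random intercept, and the noise scales, are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the semiparametric mixed-effects model:
#   y_ij = x_ij' beta + g(t_ij) + b_i + eps_ij,
# with a hypothetical g(t) = sin(2*pi*t) and a random intercept b_i.
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 100, 5, 2                      # subjects, obs per subject, covariates
beta = np.array([1.0, -0.5])
g = lambda t: np.sin(2 * np.pi * t)      # assumed smooth time effect

t = rng.uniform(0, 1, size=(n, m))       # observation times t_ij
X = rng.normal(size=(n, m, p))           # covariates x_ij
b = rng.normal(scale=0.7, size=n)        # subject-level random intercepts
eps = rng.normal(scale=0.3, size=(n, m))
Y = X @ beta + g(t) + b[:, None] + eps   # longitudinal responses y_ij
```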


References

  • Akdeniz F, Roozbeh M (2017) Generalized difference-based weighted mixed almost unbiased ridge estimator in partially linear models. Stat Pap. https://doi.org/10.1007/s00362-017-0893-9

  • Carroll RJ, Fan J, Gijbels I, Wand MP (1997) Generalised linear single-index models. J Am Stat Assoc 92:477–489

  • Fan JQ, Li RZ (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723

  • Fan JQ, Huang T, Li RZ (2007) Analysis of longitudinal data with semiparametric estimation of covariance function. J Am Stat Assoc 102:632–641

  • Fitzmaurice GM, Laird NM, Ware JH (2004) Applied longitudinal analysis. Wiley, Hoboken

  • He XM, Zhu ZY, Fung WK (2002) Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika 89:579–590

  • Hu Z, Wang N, Carroll R (2004) Profile-kernel versus backfitting in the partially linear models for longitudinal/clustered data. Biometrika 91:251–262

  • Li Z, Zhu L (2010) On variance components in semiparametric mixed models for longitudinal data. Scand J Stat 37:442–457

  • Liang H (2009) Generalized partially linear mixed-effects models incorporating mismeasured covariates. Ann Inst Stat Math 61:27–46

  • Lin X, Carroll RJ (2001) Semiparametric regression for clustered data using generalised estimating equations. J Am Stat Assoc 96:1045–1056

  • Opsomer JD, Ruppert D (1999) A root-n consistent backfitting estimator for semiparametric additive modeling. J Comput Graph Stat 8:715–732

  • Qin GY, Zhu ZY (2007) Robust estimation in generalized semiparametric mixed models for longitudinal data. J Multivar Anal 98:1658–1683

  • Qin GY, Zhu ZY (2009) Robustified maximum likelihood estimation in generalized partial linear mixed model for longitudinal data. Biometrics 65:52–59

  • Rice JA (1986) Convergence rates for partially splined models. Stat Probab Lett 4:204–208

  • Roozbeh M (2015) Shrinkage ridge estimators in semiparametric regression models. J Multivar Anal 136:56–74

  • Roozbeh M (2016) Robust ridge estimator in restricted semiparametric regression models. J Multivar Anal 147:127–144

  • Severini TA, Staniswalis JG (1994) Quasi-likelihood estimation in semi-parametric models. J Am Stat Assoc 89:501–512

  • Speckman PE (1988) Regression analysis for partially linear models. J R Stat Soc B 50:413–436

  • Wang N (2003) Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika 90:43–52

  • Wang WL, Fan TH (2010) ECM-based maximum likelihood inference for multivariate linear mixed models with autoregressive errors. Comput Stat Data Anal 54:1328–1341

  • Wang WL, Lin TI (2014) Multivariate t nonlinear mixed-effects models for multi-outcome longitudinal data with missing values. Stat Med 33:3029–3046

  • Wang WL, Lin TI (2015) Bayesian analysis of multivariate t linear mixed models with missing responses at random. J Stat Comput Simul 85:3594–3612

  • Wang N, Carroll R, Lin XH (2005) Efficient semiparametric marginal estimation for longitudinal/clustered data. J Am Stat Assoc 100:147–157

  • Wang WL, Lin TI, Lachos VH (2015) Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails. Stat Methods Med Res 27:48–64

  • Wu CFJ (1983) On the convergence properties of the EM algorithm. Ann Stat 11:95–103

  • Yu Y, Ruppert D (2002) Penalised spline estimation for partially linear single index models. J Am Stat Assoc 97:1042–1054

  • Zeger SL, Diggle PJ (1994) Semi-parametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 50:689–699

  • Zhang D (2004) Generalized linear mixed models with varying coefficients for longitudinal data. Biometrics 60:8–15

Acknowledgements

We would like to thank the reviewers for their constructive comments, which significantly improved the presentation and led us to add many details to the paper.

Author information

Corresponding author

Correspondence to M. Arashi.


Appendix

In this section, we provide the proofs of the main results, along with a brief introduction to the EM method, which is used to establish the convergence of Algorithm 1. In the following proofs, recall that \({\varvec{t}}\), \(\varvec{X}\) and \({\varvec{Y}}\) denote the observations over all subjects; that is, \({\varvec{t}}=({\varvec{t}}_1^\top , \ldots , {\varvec{t}}_n^\top )^\top \) with \({\varvec{t}}_i=(t_{i1}, \ldots , t_{in_i})^\top \), and similarly for \(\varvec{X}\) and \({\varvec{Y}}\). Also, \({\varvec{V}}\) and \({\varvec{V}}_{o}\) denote the \(N \times N\) assumed and true covariance matrices for all the data, respectively.
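As a concrete illustration of this stacked notation, the sketch below builds \({\varvec{t}}\), \(\varvec{X}\) and the block-diagonal \({\varvec{V}}\) from subject-level pieces; the sizes and the exchangeable working covariance are illustrative assumptions only.

```python
# Stacking subject-level t_i, X_i into full-data objects; the assumed
# covariance V for all data is block-diagonal with subject blocks V_i.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
n, p = 5, 2                               # subjects, covariate dimension
n_i = rng.integers(3, 6, size=n)          # unbalanced cluster sizes n_i

t_list = [np.sort(rng.uniform(0, 1, m)) for m in n_i]
X_list = [rng.normal(size=(m, p)) for m in n_i]
V_list = [0.5 * np.eye(m) + 0.5 * np.ones((m, m)) for m in n_i]  # exchangeable V_i

t = np.concatenate(t_list)                # stacked times, length N = sum(n_i)
X = np.vstack(X_list)                     # stacked N x p design
V = block_diag(*V_list)                   # N x N block-diagonal covariance
print(t.shape, X.shape, V.shape)
```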

1.1 Proof of Theorem 1

For the kernel estimator, based on expression (4.1),

$$\begin{aligned} \sqrt{n}\Big ({\widehat{{\varvec{\beta }}}}_{K}-{\varvec{\beta }}\Big ) =D_n^{-1}\lbrace \sqrt{n}C_n\rbrace +o_p(1), \end{aligned}$$

where

$$\begin{aligned} D_n =\frac{1}{n}\sum _{i=1}^n {\tilde{\varvec{X}}}_i^\top {\varvec{V}}_i^{-1} {\tilde{\varvec{X}}}_i \end{aligned}$$

and

$$\begin{aligned} C_n=\frac{1}{n}\sum _{i=1}^n {\tilde{\varvec{X}}}_i^\top {\varvec{V}}_i^{-1} \big [{\varvec{Y}}_i - \big (\varvec{X}^\top _i {\varvec{\beta }}+ {\widehat{{\varvec{g}}}}({\varvec{t}}_i)\big )\big ] . \end{aligned}$$

Denote,

$$\begin{aligned} D=\underset{n\longrightarrow \infty }{\lim }D_n=E\big \lbrace {\tilde{\varvec{X}}}^\top {\varvec{V}}^{-1}{\tilde{\varvec{X}}}\big \rbrace . \end{aligned}$$

Simple calculations show that \(C_n\) can be expanded as \(C_n = C_{1n}- C_{2n} + o_p(1)\), where, denoting \({\varvec{\mu }}_i=\varvec{X}^\top _i {\varvec{\beta }}+ {\varvec{g}}({\varvec{t}}_i)\) and \({\varvec{W}}_{1i}={\varvec{V}}_i^{-1}{\tilde{\varvec{X}}}_i\),

$$\begin{aligned} C_{1n}=\frac{1}{n}\sum _{i=1}^n {\tilde{\varvec{X}}}_i^\top {\varvec{V}}_i^{-1} \big [{\varvec{Y}}_i - {\varvec{\mu }}_i\big ]=\frac{1}{n}\sum _{i=1}^n {\varvec{W}}_{1i}^{\top } \big [{\varvec{Y}}_i - {\varvec{\mu }}_i\big ] \end{aligned}$$

and

$$\begin{aligned} C_{2n}=\frac{1}{n}\sum _{i=1}^n {\tilde{\varvec{X}}}_i^\top {\varvec{V}}_i^{-1}\big [{\widehat{{\varvec{g}}}}({\varvec{t}}_i,{\varvec{\beta }}) -{\varvec{g}}({\varvec{t}}_i)]. \end{aligned}$$

Obtaining the asymptotic distribution of \(\sqrt{n}C_{1n}\) is straightforward; we now examine the distribution of \(\sqrt{n}C_{2n}\). Following Lin and Carroll (2001), we have

$$\begin{aligned} C_{2n}=\frac{1}{n}\sum _{i=1}^n {\varvec{W}}_{2i}^{\top } \big [{\varvec{Y}}_i - {\varvec{\mu }}_i\big ]+\frac{h^2}{2}E\big \lbrace \varvec{X}^\top {\varvec{V}}^{-1}{\varvec{g}}^{(2)}({\varvec{t}}) \big \rbrace +o_p(1), \end{aligned}$$

where \({\varvec{W}}_{2i}=\lbrace {\varvec{W}}_{2i1}, \ldots ,{\varvec{W}}_{2im}\rbrace ^\top \) and

$$\begin{aligned} {\varvec{W}}_{2ij}=\dfrac{\lbrace \sum _{k=1}^{m}\sum _{l=1}^{m}E({\tilde{\varvec{X}}}_k{\varvec{V}}^{kl}|{\varvec{t}}_l=t_{ij})\rbrace f_j(t_{ij})}{\sum _{l=1}^{m}f_l(t_{ij})}. \end{aligned}$$

It follows that

$$\begin{aligned} \sqrt{n}\Big ({\widehat{{\varvec{\beta }}}}_{K}-{\varvec{\beta }}\Big ) =D^{-1}\frac{1}{\sqrt{n}}\sum _{i=1}^{n}({\varvec{W}}_{1i}-{\varvec{W}}_{2i})({\varvec{Y}}_i-{\varvec{\mu }}_i)+\sqrt{nh^4}{\varvec{b}}_K({\varvec{\beta }}, {\varvec{g}})/2+o_p(1), \end{aligned}$$

where the bias term \({\varvec{b}}_K({\varvec{\beta }}, {\varvec{g}})=D^{-1}E\lbrace {\tilde{\varvec{X}}}^\top {\varvec{V}}^{-1}{\varvec{g}}^{(2)}({\varvec{t}}) \rbrace \). Equivalently,

$$\begin{aligned} \sqrt{n}\lbrace {\widehat{{\varvec{\beta }}}}_K-{\varvec{\beta }}+h^2{\varvec{b}}_K({\varvec{\beta }}, {\varvec{g}})/2\rbrace \overset{D}{ \rightarrow } N_p(0, {\varvec{V}}_K), \end{aligned}$$

where \({\varvec{V}}_K=D^{-1} E\big \lbrace ({\varvec{W}}_1-{\varvec{W}}_2)^\top {\varvec{V}}_o({\varvec{W}}_1-{\varvec{W}}_2)\big \rbrace D^{-1}\) with \({\varvec{V}}_o=\hbox {var}({\varvec{Y}}|\varvec{X},{\varvec{t}})\).
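To make the construction above concrete, the following sketch computes a profile-kernel estimate of \({\varvec{\beta }}\): a local-constant (Nadaraya-Watson) smoother matrix \(S\) profiles out the time effect, and weighted least squares on the partial residuals \((I-S)\varvec{X}\) and \((I-S){\varvec{Y}}\) mirrors \(D_n\) and \(C_n\). The Gaussian kernel, working-independence smoothing, and bandwidth are illustrative assumptions, not the paper's exact specification.

```python
# Sketch of the profile-kernel estimator: profile out g(.) with a kernel
# smoother matrix S, then solve the weighted least-squares system D_n^{-1} C_n
# on the partial residuals.
import numpy as np

def profile_kernel_beta(Y, X, t, V_inv, h):
    """Y, t: length-N arrays; X: N x p design; V_inv: N x N working precision."""
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)  # Gaussian kernel
    S = K / K.sum(axis=1, keepdims=True)   # Nadaraya-Watson smoother matrix
    Xt = X - S @ X                         # Xtilde = (I - S) X
    Yt = Y - S @ Y                         # Ytilde = (I - S) Y
    D = Xt.T @ V_inv @ Xt                  # sample analogue of D_n
    return np.linalg.solve(D, Xt.T @ V_inv @ Yt)
```

Applied to data simulated as in the sketch after the abstract, for example via profile_kernel_beta(Y.ravel(), X.reshape(-1, 2), t.ravel(), np.eye(500), 0.05), the estimate recovers \({\varvec{\beta }}\) up to the \(O(h^2)\) kernel bias appearing in Theorem 1.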

1.2 Proof of Theorem 3

The asymptotic variance \({\varvec{V}}_{BF}\) has its central component generated from \(\varvec{X}^\top {\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^*\), where \({\varvec{\varepsilon }}^*={\varvec{Z}}^\top {\varvec{b}}+{\varvec{\varepsilon }}\). Similarly, the central component in the asymptotic variance \({\varvec{V}}_K\) is from \(\varvec{X}^\top ({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1} ({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^*\), which is \({\tilde{\varvec{X}}}^\top {\varvec{V}}^{-1} ({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^*\) asymptotically. To compare \({\varvec{V}}_K\) and \({\varvec{V}}_{BF}\), it is thus sufficient to compare the variances of the two central terms.

We now show that, under the condition that \({\varvec{V}}\) and \({\varvec{V}}_o\) depend only on \({\varvec{t}}\),

$$\begin{aligned} \hbox {cov} \lbrace \varvec{X}^\top {\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^* \rbrace \ge \hbox {cov} \lbrace {\tilde{\varvec{X}}}^\top {\varvec{V}}^{-1} ({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^* \rbrace . \end{aligned}$$

For the backfitting estimator,

$$\begin{aligned} \hbox {cov} \lbrace \varvec{X}^\top {\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^* \rbrace&=\hbox {E}\Big \lbrace \varvec{X}^\top {\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{V}}_o({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1} \varvec{X}| {\varvec{t}}\Big \rbrace \\&= \mu _{\varvec{X}}^\top ({\varvec{t}}) {\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{V}}_o({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1}\mu _{\varvec{X}}({\varvec{t}}) \\&\quad +\hbox {tr} \big [{\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{V}}_o({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1} \hbox {cov} (\varvec{X}|{\varvec{t}})\big ]. \end{aligned}$$
(A1)

In this expression, \({\varvec{\mu }}_X({\varvec{t}})\) is generally nonzero and the first term is positive semidefinite because \({\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{V}}_o({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1}\) is positive semidefinite. Also,

$$\begin{aligned} \hbox {cov} \lbrace {\tilde{\varvec{X}}}^\top {\varvec{V}}^{-1} ({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^* \rbrace&=\hbox {E}\Big \lbrace {\tilde{\varvec{X}}}^\top {\varvec{V}}^{-1} ({\varvec{I}}-{\varvec{S}}){\varvec{V}}_o ({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1} {\tilde{\varvec{X}}}| {\varvec{t}}\Big \rbrace \\&=\hbox {E}({\tilde{\varvec{X}}}|{\varvec{t}})^\top {\varvec{V}}^{-1} ({\varvec{I}}-{\varvec{S}}){\varvec{V}}_o({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1} \hbox {E}({\tilde{\varvec{X}}}|{\varvec{t}}) \\&\quad + \hbox {tr} \big [{\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{V}}_o({\varvec{I}}-{\varvec{S}})^\top {\varvec{V}}^{-1} \hbox {cov} ({\tilde{\varvec{X}}}|{\varvec{t}})\big ]. \end{aligned}$$
(A2)

Note that

$$\begin{aligned} \hbox {E}({\tilde{\varvec{X}}}_i|{\varvec{t}}_i)=\hbox {E}\Big \lbrace \varvec{X}_i-{\varvec{\mu }}_X({\varvec{t}}_i)|{\varvec{t}}_i\Big \rbrace =0, \qquad \hbox {cov}({\tilde{\varvec{X}}}_i|{\varvec{t}}_i)=\hbox {cov} \Big \lbrace \varvec{X}_i-{\varvec{\mu }}_X({\varvec{t}}_i)|{\varvec{t}}_i\Big \rbrace =\hbox {cov} (\varvec{X}_i|{\varvec{t}}_i). \end{aligned}$$

Therefore, the first term in (A2) is 0, and the second terms in (A1) and (A2) are identical. It follows that \(\hbox {cov} \lbrace \varvec{X}^\top {\varvec{V}}^{-1}({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^* \rbrace \ge \hbox {cov} \lbrace {\tilde{\varvec{X}}}^\top {\varvec{V}}^{-1} ({\varvec{I}}-{\varvec{S}}){\varvec{\varepsilon }}^* \rbrace \), and consequently \({\varvec{V}}_{BF}\ge {\varvec{V}}_K\).
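A quick numerical check of this comparison is given below; the smoother \(S\), the exchangeable \({\varvec{V}}\) and \({\varvec{V}}_o\), and the choice of \({\varvec{\mu }}_X({\varvec{t}})\) are all hypothetical. It confirms that the gap between (A1) and (A2) is exactly the first term of (A1), which is positive semidefinite.

```python
# Check that the covariance gap (A1) - (A2) equals mu_X' A mu_X with
# A = V^{-1}(I - S) V_o (I - S)' V^{-1}, a positive semidefinite matrix.
import numpy as np

rng = np.random.default_rng(2)
N, h = 8, 0.2
t = np.sort(rng.uniform(0, 1, N))
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
S = K / K.sum(axis=1, keepdims=True)           # kernel smoother matrix
V = 0.7 * np.eye(N) + 0.3 * np.ones((N, N))    # assumed covariance
V_o = 0.5 * np.eye(N) + 0.5 * np.ones((N, N))  # true covariance
V_inv = np.linalg.inv(V)
I = np.eye(N)

A = V_inv @ (I - S) @ V_o @ (I - S).T @ V_inv
mu_X = np.column_stack([np.cos(np.pi * t), t ** 2])  # hypothetical E(X | t)
gap = mu_X.T @ A @ mu_X                # (A1) minus (A2), a p x p matrix
print(np.linalg.eigvalsh(gap))         # eigenvalues nonnegative (up to rounding)
```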

1.3 EM algorithm related to Algorithm 1

The EM algorithm, which consists of an expectation (E) step and a maximization (M) step at each iteration, has several appealing features, such as simplicity of implementation and a monotone increase of the likelihood at each iteration. To fix notation, let \({\varvec{\Theta }}=({\varvec{\beta }}, {\varvec{D}}, {\varvec{\Sigma }}, \phi )\) denote the full set of model parameters. Treating the unobservable random effects \({\varvec{b}}_i\) as missing data, the complete data are \(\lbrace ({\varvec{y}}_i, {\varvec{b}}_i), i = 1, \ldots ,n\rbrace \). Let \( f({\varvec{y}}_i, {\varvec{b}}_i|{\varvec{\Theta }}) \) be the density of \(({\varvec{y}}_i, {\varvec{b}}_i)\), with parameters \( {\varvec{\Theta }}\in \Omega \), and let the marginal density of the observed data \({\varvec{Y}}\) be

$$\begin{aligned} f({\varvec{Y}}|{\varvec{\Theta }})=\prod _{i=1}^{n}\int f({\varvec{y}}_i,{\varvec{b}}_i|{\varvec{\Theta }})d{\varvec{b}}_i. \end{aligned}$$
(7.1)

The parameters \( {\varvec{\Theta }}\) are estimated by maximizing \( f({\varvec{Y}}|{\varvec{\Theta }}) \) over \( {\varvec{\Theta }}\in \Omega \). In many statistical problems, the complete-data specification \( f({\varvec{y}}_i,{\varvec{b}}_i|{\varvec{\Theta }}) \) is simpler to work with than the incomplete-data specification \( f({\varvec{Y}}|{\varvec{\Theta }}) \), and the EM algorithm exploits this by iteratively maximizing an expected complete-data log-likelihood. The observed-data log-likelihood decomposes as

$$\begin{aligned} L({\varvec{\Theta }}^{'})=\log f({\varvec{Y}}|{\varvec{\Theta }}^{'})= Q({\varvec{\Theta }}^{'}|{\varvec{\Theta }})-H({\varvec{\Theta }}^{'}|{\varvec{\Theta }}), \end{aligned}$$
(7.2)

where \( Q({\varvec{\Theta }}^{'}|{\varvec{\Theta }})=\hbox {E} \{\log f({\varvec{Y}},{\varvec{b}}|{\varvec{\Theta }}^{'})\,|\,{\varvec{Y}},{\varvec{\Theta }}\} \) and \( H({\varvec{\Theta }}^{'}|{\varvec{\Theta }})= \hbox {E}\{\log f({\varvec{b}}|{\varvec{Y}},{\varvec{\Theta }}^{'})\,|\,{\varvec{Y}},{\varvec{\Theta }}\} \), with \( {\varvec{b}}=({\varvec{b}}_1^\top , \ldots , {\varvec{b}}_n^\top )^\top \), are assumed to exist for all pairs \(({\varvec{\Theta }}^{'}, {\varvec{\Theta }})\). We now define the EM iteration \( {\varvec{\Theta }}_{r}\rightarrow {\varvec{\Theta }}_{r+1} \) as follows:

E-step:

Determine \(Q({\varvec{\Theta }}|{\varvec{\Theta }}_{r})\).

M-step:

Choose \( {\varvec{\Theta }}_{r+1} \) to be any value of \( {\varvec{\Theta }}\in \Omega \) which maximizes \( Q({\varvec{\Theta }}|{\varvec{\Theta }}_{r}) \).

Note that \(M\) is a point-to-set map; that is, \( M({\varvec{\Theta }}_{r}) \) is the set of values of \( {\varvec{\Theta }}\) that maximize \(Q({\varvec{\Theta }}|{\varvec{\Theta }}_{r})\). Many applications of EM involve the curved exponential family, for which the E-step and M-step take special forms. Sometimes it is not numerically feasible to perform the M-step exactly. We therefore define a generalized EM (GEM) algorithm to be an iterative scheme \( {\varvec{\Theta }}_{r}\rightarrow {\varvec{\Theta }}_{r+1} \in M({\varvec{\Theta }}_{r})\), where \( {\varvec{\Theta }}\rightarrow M({\varvec{\Theta }}) \) is a point-to-set map such that every \({\varvec{\Theta }}^{'} \in M({\varvec{\Theta }})\) satisfies

$$\begin{aligned} Q({\varvec{\Theta }}^{'}|{\varvec{\Theta }})\ge Q({\varvec{\Theta }}|{\varvec{\Theta }}). \end{aligned}$$
(7.3)

EM is a special case of GEM. For any sequence \( \{{\varvec{\Theta }}_{r}\} \) generated by a GEM algorithm,

$$\begin{aligned} L({\varvec{\Theta }}_{r+1})\ge L({\varvec{\Theta }}_{r}), \end{aligned}$$
(7.4)

and

$$\begin{aligned} H({\varvec{\Theta }}|{\varvec{\Theta }})\ge H({\varvec{\Theta }}^{'}|{\varvec{\Theta }}). \end{aligned}$$
(7.5)

See Wu (1983) for more details.
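As an illustration of these steps, here is a minimal EM sketch for a Gaussian random-intercept linear mixed model; it is a deliberately simplified stand-in for Algorithm 1, not the paper's implementation. The E-step computes the posterior mean and variance of each \(b_i\); the M-step maximizes the resulting \(Q\) in closed form, so each iteration satisfies (7.3) and, by (7.4), the observed-data log-likelihood never decreases.

```python
# EM for y_ij = x_ij' beta + b_i + eps_ij, with b_i ~ N(0, d) and
# eps_ij ~ N(0, s2); Theta = (beta, d, s2).
import numpy as np

def em_random_intercept(Y, X, n_iter=100):
    """Y: n x m responses; X: n x m x p covariates (balanced design assumed)."""
    n, m, p = X.shape
    beta, d, s2 = np.zeros(p), 1.0, 1.0
    Xf = X.reshape(-1, p)
    for _ in range(n_iter):
        # E-step: posterior moments of b_i given y_i and the current Theta.
        r = Y - X @ beta                     # n x m residuals
        v = 1.0 / (1.0 / d + m / s2)         # posterior variance of b_i
        mb = v * r.sum(axis=1) / s2          # posterior means, length n
        # M-step: closed-form maximizers of Q(Theta | Theta_r).
        beta = np.linalg.solve(Xf.T @ Xf, Xf.T @ (Y - mb[:, None]).ravel())
        d = np.mean(mb ** 2 + v)             # E(b_i^2 | y_i) averaged over i
        resid = Y - X @ beta - mb[:, None]
        s2 = (np.sum(resid ** 2) + n * m * v) / (n * m)
    return beta, d, s2
```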

About this article

Cite this article

Taavoni, M., Arashi, M. Kernel estimation in semiparametric mixed effect longitudinal modeling. Stat Papers 62, 1095–1116 (2021). https://doi.org/10.1007/s00362-019-01125-8