Abstract
The literature on measurement error for time-dependent covariates has mostly focused on empirical models, such as linear mixed effects models. Motivated by an AIDS study, we propose a joint modeling method in which a mechanistic nonlinear model addresses the measurement error in a time-varying covariate for a longitudinal outcome that may be either discrete (e.g., binary or count) or continuous. We implement an inference procedure that uses a first-order Taylor approximation to linearize both the covariate model and the response model. We study the asymptotic properties of the joint-model-based estimator and prove its consistency and asymptotic normality. We then evaluate the finite-sample performance of the estimator through simulation. Finally, we apply the new method to real data from an HIV/AIDS study.
References
Acosta E, Walawander HWA, Eron J, Pettinelli C, Yu S, Neath D (2004) Comparison of two indinavir/ritonavir regimens in treatment-experienced HIV-infected individuals. J Acquir Immune Defic Syndr 37:1358–1366
Barndorff-Nielsen O, Cox D (1989) Asymptotic techniques for use in statistics. Chapman and Hall, New York
Booth J, Hobert J (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc Ser B 61:265–285
Bradley R, Gart J (1962) The asymptotic properties of ML estimators when sampling from associated populations. Biometrika 49:205–214
Breslow N, Clayton D (1993) Approximate inference in generalized linear mixed models. J Am Stat Assoc 88:9–25
Carroll R, Ruppert D, Stefanski L, Crainiceanu C (2006) Measurement error in nonlinear models: a modern perspective, 2nd edn. Chapman and Hall, London
Cruz R, Marshall G, Quintana F (2011) Logistic regression when covariates are random effects from a non-linear mixed model. Biom J 53:735–749
Demidenko E (2004) Mixed models: theory and applications. Wiley, New York
Fitzmaurice G, Laird N, Ware J (2011) Applied longitudinal analysis, 2nd edn. Wiley, New York
Fu L, Lei Y, Sharma R, Tang S (2013) Parameter estimation of nonlinear mixed-effects models using first-order conditional linearization and EM algorithm. J Appl Stat 40(2):252–265
Ibrahim J, Lipsitz S, Chen M (1999) Missing covariates in generalized linear models when the missing data mechanism is nonignorable. J R Stat Soc Ser B 61:173–190
Laird N, Ware J (1982) Random-effects models for longitudinal data. Biometrics 38:963–974
Lee Y, Nelder J, Pawitan Y (2006) Generalized linear models with random effects: unified analysis via H-likelihood. Chapman and Hall/CRC, London
Lindstrom M, Bates D (1990) Nonlinear mixed effects models for repeated measures data. Biometrics 46:673–687
Liu W, Wu L (2010) Some asymptotic results for semiparametric nonlinear mixed-effects models with incomplete data. J Stat Plan Inference 140:52–64
McLachlan G, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York
Prentice R, Zhao L (1991) Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47:825–839
Serfling R (1980) Approximation theorems of mathematical statistics. Wiley, New York
Vonesh E, Chinchilli V (1997) Linear and nonlinear models for the analysis of repeated measurements. Marcel Dekker, New York
Vonesh E, Wang H, Nie L, Majumdar D (2002) Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. J Am Stat Assoc 97:271–283
Wei G, Tanner M (1990) A Monte-Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithm. J Am Stat Assoc 85:699–704
Wu L (2010) Mixed effects models for complex data. Chapman and Hall, London
Wu H, Ding A (1999) Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55:410–418
Zhang H, Wong H, Wu L (2018) A mechanistic nonlinear model for censored and mis-measured covariates in longitudinal models, with application in AIDS studies. Stat Med 37(1):167–178
Acknowledgements
This work is partially supported by the City University of New York High-Performance Computing Center, College of Staten Island, funded in part by the City and State of New York, City University of New York Research Foundation and National Science Foundation grants CNS-0958379, CNS-0855217, and ACI-112611.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
A Proofs of asymptotic properties
1.1 A.1 Regularity conditions
Consider models (4) and (5). Denote \({\varvec{\gamma }}=({\varvec{\alpha }}^{T},{\varvec{\beta }}^{T})^{T} \in \varGamma \), \({\varvec{\omega }}_{i}=({\varvec{a}}_{i}^{T}, {\varvec{b}}_{i}^{T})^{T}\), and
where \(f(\cdot )\) is a generic density function, \(f({\varvec{x}}|{\varvec{y}})\) is the conditional density of X given Y, and \(k_{i}=n_{i}+m_{i}\). Denote the estimates by \(\hat{{\varvec{\gamma }}}\) and \({\varvec{\hat{\omega }_{i}}}\) and
Similarly, we can define \(l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\), \(l_{i,{\varvec{\gamma }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\), \(l_{i,{\varvec{\gamma }}{\varvec{\omega }}_{i}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\) and derivatives for \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\).
We denote convergence in probability as \(k_{i} \rightarrow \infty \) by \(o_{p}(1_{k_{i}})\), convergence in probability as \(n \rightarrow \infty \) by \(o_{p}(1_{n})\), and convergence in probability as both \(k_{i} \rightarrow \infty \) and \(n \rightarrow \infty \) by \(o_{p}(1_{k_{i},n})\). We show consistency and asymptotic normality under the following regularity conditions.
-
R1.
\(k_{i} = O(N) \) uniformly for \(i=1,\ldots ,n\), where \(N= \text {min}_{i}\{k_{i}\}\).
-
R2.
The variance-covariance parameters \({\varvec{\lambda }}=(\sigma ^{2}, A, B) \) are fixed and known, and the true parameter \({\varvec{\gamma }}_{0}\) is in the interior of \(\varGamma \). Quantities \(U_{i}\), \(\varLambda _{i}\), \(V_{i}\), \(\varSigma _{i}\) and D are evaluated at \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\). When \({\varvec{\lambda }}\) is unknown, we can simply replace it by its consistent estimate (in (12)).
-
R3.
The density \(f(x_{ij}|{\varvec{\omega }}_{i};{\varvec{\gamma }})\) and \(f(y_{ij}|{\varvec{\omega }}_{i};{\varvec{\gamma }})\) satisfy the necessary regularity conditions (e.g., Bradley and Gart 1962) such that, for fixed \({\varvec{\gamma }}\), the estimate of \({\varvec{\omega }}_{i}\) is \(\sqrt{k_{i}}\)-consistent for \({\varvec{\omega }}_{i}\) as \(k_{i} \rightarrow \infty \). Also, the necessary regularity conditions are assumed (e.g., Serfling 1980, p. 27, Theorem C) such that, by the Law of Large Numbers, the following hold (as \(k_{i} \rightarrow \infty \)):
$$\begin{aligned} \begin{aligned}&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i} + o_{p}(1_{k_{i}}), \\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}V_{i} + o_{p}(1_{k_{i}}),\\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}V_{i} + o_{p}(1_{k_{i}}), \\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}U_{i} + o_{p}(1_{k_{i}}), \\ \end{aligned} \end{aligned}$$$$\begin{aligned} \begin{aligned}&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = U_{i}^{T}\varLambda _{i}^{-1}U_{i}, \\&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = V_{i}^{T}\varLambda _{i}^{-1}V_{i},\\&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = U_{i}^{T}\varLambda _{i}^{-1}V_{i}.\\ \end{aligned} \end{aligned}$$Finally, the matrices \(k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i}\) and \(k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}V_{i}\) are both assumed to be positive definite with finite determinants such that, for example, the smallest eigenvalue of \(k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i}\) 
exceeds \(\lambda _{0}\) for some \(\lambda _{0} > 0\).
-
R4.
For all \({\varvec{\gamma }}\in \varGamma \) and all the s-dimensional \({\varvec{\omega }}_{i}\in R^{s}\), the function \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) is three times continuously differentiable in \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\) for all \(x_{ij}\) and \(y_{ij}\), and \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) satisfies the conditions needed to interchange the order of integration and differentiation, as indicated in the proof.
-
R5.
For any \({\varvec{\gamma }}\in \varGamma \), there exist \(d_{1} > 0\) and \(\lambda _{1} > 0\) such that (a): for all \({\varvec{\gamma }}^{*} \in B_{d_{1}}({\varvec{\gamma }})\), where \(B_{d_{1}}({\varvec{\gamma }})\) is the \(\tau \)-dimensional sphere centered at \({\varvec{\gamma }}\) with radius \(d_{1}\), the following holds:
$$\begin{aligned} -\frac{1}{n} \sum _{i=1}^{n} \frac{\partial }{\partial {\varvec{\gamma }}^{T}}l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) \Big |_{{\varvec{\gamma }}={\varvec{\gamma }}^{*}} = \varOmega ({\varvec{\gamma }}^{*})^{-1} + o_{p}(1_{n}),\quad as \quad n \rightarrow \infty , \end{aligned}$$where \(\varOmega ({\varvec{\gamma }}^{*})^{-1}\) is positive definite with minimum eigenvalue greater than \(\lambda _{1}\) and
$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial {\varvec{\gamma }}^{T}}l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) \Big |_{{\varvec{\gamma }}={\varvec{\gamma }}^{*}} =&\,\{ l_{i,{\varvec{\gamma }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) + l_{i,{\varvec{\gamma }}{\varvec{\omega }}_{i}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) \\&\times [l_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) + D^{-1} ]^{-1} l_{i,{\varvec{\omega }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*}))\}; \end{aligned} \end{aligned}$$and (b): the first, second, and third derivatives of \(\sqrt{k_{i}}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) with respect to \({\varvec{\omega }}_{i}\) are uniformly bounded in \(B_{d_{1}}({\varvec{\gamma }})\).
-
R6.
At the true value \({\varvec{\gamma }}_{0}\), the following hold true:
$$\begin{aligned} \begin{aligned}&\text {E}_{{\varvec{\omega }}}(U_{i}^{T}\varSigma _{i}^{-1}U_{i}) = \varphi _{i}({\varvec{\gamma }}_{0}) \quad \text {exists for all} \quad i=1,\ldots ,n, \\&\lim _{n \rightarrow \infty } n^{-2} \sum _{i=1}^{n}\text {Cov}_{{\varvec{\omega }}}( U_{i}^{T}\varSigma _{i}^{-1}U_{i}) = 0, \quad \lim _{n \rightarrow \infty } n^{-1} \sum _{i=1}^{n} \varphi _{i}({\varvec{\gamma }}_{0}) = \varOmega ({\varvec{\gamma }}_{0})^{-1}, \\ \end{aligned} \end{aligned}$$where \(U_{i}\), \(V_{i}\), and \(\varSigma _{i}\) are evaluated at \({\varvec{\gamma }}_{0}\) and \({\varvec{\omega }}_{i}\), and \(\varOmega ({\varvec{\gamma }}_{0})^{-1}\) is positive definite.
-
R7.
The marginal densities, \(\int \exp \{k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\}d{\varvec{\omega }}_{i}\), satisfy the necessary regularity conditions such that the MLE of \({\varvec{\gamma }}\) exists and satisfies \((\hat{{\varvec{\gamma }}}_{MLE} - {\varvec{\gamma }}_{0}) = O_{p}(n^{-1/2})\).
Let \({\varvec{\gamma }}=({\varvec{\alpha }}^{T}, {\varvec{\beta }}^{T})^{T}\) be the mean parameters in models (4) and (5). Let \({{\varvec{\hat{\gamma }}}}_{\text {MLE}}\) be the (exact) MLE of \({\varvec{\gamma }}\) which maximizes the observed-data log-likelihood \(l_o({\varvec{\theta }})\) in (3), and let \(\hat{{\varvec{\gamma }}}\) be the approximate MLE based on the linearization method described in Sect. 3. Under the regularity conditions R1–R7, we have the following asymptotic results:
Theorem \(\varvec{A}_{\varvec{1}}\) The estimates \({{\varvec{\hat{\gamma }}}}_{\text {MLE}}\) and \({\varvec{\hat{\gamma }}}\) satisfy
where \({\varvec{\gamma }}_{0}\) is the true value of \({\varvec{\gamma }}\).
Theorem \(\varvec{A}_{\varvec{2}}\) The approximate estimate \({\varvec{\hat{\gamma }}}\) asymptotically follows a normal distribution
where \(\varOmega ({\varvec{\gamma }}_{0})\) is given in R5 and R6.
1.2 A.2 Estimating equations
In Sect. 3.1, for fixed \({\tilde{{\varvec{\lambda }}}}\), we obtain an approximate MLE \(\hat{{\varvec{\gamma }}}(\tilde{{\varvec{\lambda }}})\) and \(\hat{{\varvec{\omega }}}_{i}(\tilde{{\varvec{\lambda }}})\) using linearization. The linearization procedure is equivalent to maximizing a “complete-data” likelihood with respect to \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\), as shown below. Note that the maximization is taken over the random effects \({\varvec{\omega }}_{i}\) as well; it therefore does not yield the MLE directly, but is equivalent to the approximation obtained by integrating out the random effects. It is thus a computational device rather than exact maximum likelihood estimation (see, e.g., Vonesh et al. 2002; Lee et al. 2006).
In fact, we can write a complete-data log-likelihood as (suppressing \({\varvec{\lambda }}\))
where \({\varvec{a}}=({\varvec{a}}_{1}^{T},\ldots ,{\varvec{a}}_{n}^{T})^{T}\), \({\varvec{b}}=({\varvec{b}}_{1}^{T},\ldots ,{\varvec{b}}_{n}^{T})^{T}\), \({\varvec{x}}=({\varvec{x}}_{1}^{T},\ldots ,{\varvec{x}}_{n}^{T})^{T}\), \({\varvec{y}}=({\varvec{y}}_{1}^{T},\ldots ,{\varvec{y}}_{n}^{T})^{T}\), and \(l_{i} =\log f({\varvec{x}}_{i}|{\varvec{a}}_{i};{\varvec{\alpha }}) + \log f({\varvec{y}}_{i}|{\varvec{a}}_{i},{\varvec{b}}_{i};{\varvec{\alpha }},{\varvec{\beta }}) \), where the last two density functions are defined in (3).
Note that
where \({\varvec{h}}_{i}(\cdot )\), \({\varvec{g}}_{i}^{-1}(\cdot )\) and \(\varOmega _{i}\) are defined in Sect. 3.1. Recall \({\varvec{z}}_{i}= ({\varvec{x}}_{i}^{T},{\varvec{y}}_{i}^{T})^{T}\) and define \({\varvec{\mu }}_{i}=({\varvec{h}}_{i}^{T},{\varvec{g}}_{i}^{T})^{T}\). By directly applying the results of Prentice and Zhao (1991), we have that maximizing (14) using Fisher’s method of scoring is equivalent to solving the following set of estimating equations:
where \(U_{i}\), \(\varLambda _{i}\), \(V_{i}\), and D are defined in Sect. 3.1 (see also Vonesh et al. 2002).
By writing \(U^{T}=[U_{1}^{T},\ldots ,U_{n}^{T}]\), \(V^{T}=\text {diag}\{V_{1}^{T},\ldots ,V_{n}^{T}\}\), \({\varvec{z}}^{T}=[{\varvec{z}}_{1}^{T},\ldots ,{\varvec{z}}_{n}^{T}],{\varvec{\mu }}^{T}=[{\varvec{\mu }}_{1}^{T},\ldots ,{\varvec{\mu }}_{n}^{T}]\) and \(\varLambda =\text {diag}\{\varLambda _{1},\ldots ,\varLambda _{n}\}\), it can be shown (see Prentice and Zhao 1991) that solving (15) and (16) via Fisher’s method of scoring reduces to iteratively solving the following linear mixed-model equations
where \(\tilde{D} = \text {diag}\{D,\ldots ,D\}\), \({\varvec{z}}^{*}={\varvec{z}}- {\varvec{\mu }}+ U\hat{{\varvec{\gamma }}}^{(u+1)} + V\hat{{\varvec{\omega }}}^{(u+1)}\) and u is the iteration indicator.
The solution to Eq. (17) can be obtained by iteratively solving the following equations
where \({\varvec{z}}_{i}= ({\varvec{x}}_{i}^{T},{\varvec{y}}_{i}^{T})^{T}\) is defined in (7), and where \(U_{i}\), \(\varLambda _{i}\), and \(V_{i}\) are all evaluated at \((\hat{{\varvec{\gamma }}}^{(u)},\hat{{\varvec{\omega }}}_{i}^{(u)})\).
The solution to Eq. (18) is given in (9) and (11). Therefore, we have shown that for fixed \({\varvec{\lambda }}\), the final estimates \(\hat{{\varvec{\gamma }}}\) and \(\hat{{\varvec{\omega }}}_{i} =\hat{{\varvec{\omega }}}_{i}(\hat{{\varvec{\gamma }}})\) satisfy the estimating Eqs. (15) and (16) and maximize the complete-data log-likelihood function (14) with respect to \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\). These facts will be used to show the following asymptotic properties of \({\hat{{\varvec{\gamma }}}}\).
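As a concrete illustration of this iteration, consider a toy linear special case with a scalar fixed effect \(\gamma \) and scalar random effects \(\omega _{i}\): there the linear mixed-model equations reduce to a block Gauss–Seidel update alternating between \(\hat{\gamma }\) and the \(\hat{\omega }_{i}\). The sketch below is our own illustration (not the authors' code); the variance components \(\sigma ^{2}\) and d are treated as known, in line with R2.

```python
import random

random.seed(0)

# Toy linear mixed model: z_ij = gamma * t_ij + omega_i + eps_ij,
# eps_ij ~ N(0, sigma2), omega_i ~ N(0, d).  Here U_i has entries t_ij,
# V_i is a column of ones, Lambda_i = sigma2 * I, and D = d.
gamma_true, sigma2, d = 2.0, 0.25, 1.0
n, ni = 200, 5
t = [[random.uniform(0.0, 1.0) for _ in range(ni)] for _ in range(n)]
omega_true = [random.gauss(0.0, d ** 0.5) for _ in range(n)]
z = [[gamma_true * t[i][j] + omega_true[i] + random.gauss(0.0, sigma2 ** 0.5)
      for j in range(ni)] for i in range(n)]

# Block Gauss-Seidel on the mixed-model equations:
#   gamma   <- [sum U_i^T U_i]^{-1} sum U_i^T (z_i - V_i omega_i)
#   omega_i <- [V_i^T V_i + sigma2/d]^{-1} V_i^T (z_i - U_i gamma)
# (sigma2 cancels in the gamma update since Lambda_i = sigma2 * I).
gamma, omega = 0.0, [0.0] * n
for it in range(100):
    gamma_new = (sum(t[i][j] * (z[i][j] - omega[i])
                     for i in range(n) for j in range(ni))
                 / sum(t[i][j] ** 2 for i in range(n) for j in range(ni)))
    omega = [sum(z[i][j] - gamma_new * t[i][j] for j in range(ni))
             / (ni + sigma2 / d) for i in range(n)]
    if abs(gamma_new - gamma) < 1e-10:
        gamma = gamma_new
        break
    gamma = gamma_new

print(round(gamma, 3))  # close to gamma_true = 2.0
```

Because the toy model is linear, \(z^{*}=z\) and the iteration converges to the exact solution of the mixed-model equations; in the nonlinear case, \(U_{i}\), \(V_{i}\) and \(z^{*}\) would be re-evaluated at each pass.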
1.3 A.3 Consistency
We first note that, for fixed \({\varvec{\lambda }}\), the approximate MLE of \({\varvec{\gamma }}\) will satisfy the set of estimating equations (Vonesh et al. 2002)
Under R4, we have
Now we examine the term \(L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i})\) in the above expression. Recall that \(k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \equiv l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i}) + \log f({\varvec{a}}_{i}) + \log f({\varvec{b}}_{i})\). Since the \(y_{ij}\) are conditionally independent given \({\varvec{\omega }}_{i}\), under R3 it follows that, conditional on \({\varvec{\omega }}_{i}\),
Furthermore, under R3 we have
Combining the results in (19) and (20), we can show that
Then, by direct application of the Laplace approximation to integrals of the form \(\int \exp \{kp(x)\}dx\) and \(\int q(x) \exp \{kp(x)\}dx\), where q(x) and p(x) are smooth functions in x with p(x) having a unique maximum at some point \(\hat{x}\), it can be shown (Barndorff-Nielsen and Cox 1989) that
and
where \(\hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} = L_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} ({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\). Because (19) implies \(\sqrt{k_{i}}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))=O_{p}(1)\), it follows from R1 that
Hence we have
where \(K({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) = \exp \{\sum _{i=1}^{n} k_{i}L_{i}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\} \prod _{i=1}^{n} (2\pi /|k_{i}L_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}|)^{b/2}.\) Since \(K({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) \ne 0\) for all \({\varvec{\gamma }}\in \varGamma \), the MLE \(\hat{{\varvec{\gamma }}}_{MLE}\) of \({\varvec{\gamma }}\) satisfies
where \(J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }})) = \sum _{i=1}^{n}k_{i}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) = \sum _{i=1}^{n} l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\) is the set of estimating equations for \({\varvec{\gamma }}\) conditional on fixed \({\varvec{\lambda }}\), as given in (19). By taking a first-order Taylor series expansion of \(J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }}))\) about \(\hat{{\varvec{\gamma }}}_{MLE}\) and noting that, from (22), \(J_{1}(\hat{{\varvec{\gamma }}}_{MLE},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}}_{MLE})) = O_{p}(nN^{-1/2})\), we have
where
and \({\varvec{\gamma }}^{*}\) is on the line segment joining \({\varvec{\gamma }}\) to \(\hat{{\varvec{\gamma }}}_{MLE}\). By applying the chain rule, we have, for any \({\varvec{\gamma }}\in \varGamma \),
Note that \(\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})\) maximizes \(k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) by definition, which implies
Applying the chain rule once again, we have
Solving the above equation for \([\partial {\hat{{\varvec{\omega }}}}_{i}({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T}]\), we have
Substituting this expression of \([\partial \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T}]\) in (24), it follows from R1, R3 and R5 that as \(n\rightarrow \infty \)
which implies \(-\frac{1}{n} J_{1}^{'}({\varvec{\gamma }}^{*}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) \xrightarrow {\text {p}} \varOmega ({\varvec{\gamma }}^{*})^{-1} \).
It follows from (25) that for sufficiently large n, \(-\frac{1}{n} J_{1}^{'}({\varvec{\gamma }}^{*}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*}))> \frac{1}{2}\varOmega ({\varvec{\gamma }}^{*})^{-1} > \frac{\lambda _{1}}{2}I\) with probability 1, where \(\lambda _{1}\) is defined in R5. This, together with (23) and the fact that for sufficiently large N, \(\parallel O_{p}(N^{-1/2})\parallel \le \lambda _{1} \delta /4 \) for any \(0< \delta < d_{1}\) (\(d_{1}\) as defined in R5), imply that, conditional on \({\varvec{\omega }}= ({\varvec{\omega }}_{1},\ldots , {\varvec{\omega }}_{n})\),
Thus, since \(\hat{{\varvec{\gamma }}}\) satisfies \(J_{1}(\hat{{\varvec{\gamma }}},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}})) = \mathbf{0}\), it follows using standard arguments that
Hence, using the bounded convergence theorem, we have
Since \(J_{1}(\hat{{\varvec{\gamma }}},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}})) = \mathbf{0}\) and \(J_{1}(\hat{{\varvec{\gamma }}}_{MLE}, \hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}}_{MLE}))=O_{p}(nN^{-1/2})\) by (23), it follows from (25) that
where \({\varvec{\gamma }}^{**}\) is on the line segment joining \(\hat{{\varvec{\gamma }}}\) to \(\hat{{\varvec{\gamma }}}_{MLE}\). From (27), \({\varvec{\gamma }}^{**}\) lies in the interior; it then follows from R5 that \(-J_{1}^{'}({\varvec{\gamma }}^{**},\hat{{\varvec{\omega }}}({\varvec{\gamma }}^{**})) \ge \frac{n}{2}\varOmega ({\varvec{\gamma }}^{**})^{-1}\), where \(\varOmega ({\varvec{\gamma }}^{**})^{-1}\) is positive definite. Hence the estimator satisfies \((\hat{{\varvec{\gamma }}} - \hat{{\varvec{\gamma }}}_{MLE}) = O_{p}(N^{-1/2})\), from which it follows (given R7) that
the desired result.
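The order of the Laplace approximation invoked above can be checked numerically in a one-dimensional toy case of our own: for a smooth p(x) with a unique interior maximum \(\hat{x}\), the relative error of \(\int \exp \{kp(x)\}dx \approx \exp \{kp(\hat{x})\}\,(2\pi /(k|p^{''}(\hat{x})|))^{1/2}\) decays like \(O(k^{-1})\), consistent with the \(O_{p}(k_{i}^{-1})\) remainders used in the proof.

```python
import math

# Laplace approximation check for p(x) = -x^2/2 - x^4/8, which has its
# unique maximum at xhat = 0 with p(0) = 0 and p''(0) = -1, so the
# approximation is exp{k p(0)} * sqrt(2*pi/k) = sqrt(2*pi/k).

def p(x):
    return -x * x / 2.0 - x ** 4 / 8.0

def laplace_error(k, grid=200000, half_width=6.0):
    # "Exact" integral by the midpoint rule on [-half_width, half_width].
    h = 2.0 * half_width / grid
    exact = sum(math.exp(k * p(-half_width + (j + 0.5) * h)) * h
                for j in range(grid))
    approx = math.sqrt(2.0 * math.pi / k)
    return abs(approx - exact) / exact

err5, err50 = laplace_error(5.0), laplace_error(50.0)
print(err5, err50)  # the relative error shrinks roughly like 1/k
```

Multiplying k by 10 shrinks the relative error by roughly a factor of 10, which is the \(O(k^{-1})\) behavior the asymptotic argument relies on.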
1.4 A.4 Asymptotic normality of \(\hat{{\varvec{\gamma }}}\)
The asymptotic normality of \(\hat{{\varvec{\gamma }}}\) will be shown based on the estimation equations (15) and (16). Let
where \(U_{i}, \varSigma _{i}\) and \({\varvec{z}}_{i}\) are defined in Sect. 3.1. The estimator \(\hat{{\varvec{\gamma }}}\) satisfies \(\varPhi (\hat{{\varvec{\gamma }}}) = \mathbf{0}\) at convergence. Noting that \(\partial \varPhi ({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T} = -\sum _{i=1}^{n}U_{i}^{T}\varSigma _{i}^{-1}U_{i}\) does not depend on \({\varvec{\gamma }}\), we take a Taylor series expansion of \(\varPhi (\hat{{\varvec{\gamma }}})\) around the true parameter \({\varvec{\gamma }}_{0}\):
which implies
Since \(E[U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0})]=0\) and \(\text {Cov}[U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0})]=U_{i}^{T}\varSigma _{i}^{-1}U_{i}\), by the Lindeberg Central Limit Theorem, we have
Noting that \(\hat{{\varvec{\gamma }}} = {\varvec{\gamma }}_{0} + o_{p}(1_{N,n})\) and \(k_{i}=O(N)\), and using (20), we have
Hence, it follows by the Law of Large Numbers and R6 that
Using Slutsky’s theorem, we can show that
We can extend the foregoing proofs to the case where \({\varvec{\lambda }}\) is unknown by replacing \({\varvec{\lambda }}\) with its consistent estimate \(\hat{{\varvec{\lambda }}}\). Note that at the estimate \(\hat{{\varvec{\gamma }}}\), the estimate \(\hat{{\varvec{\lambda }}}\) given in (12) can be shown to be consistent for \({\varvec{\lambda }}\) as \(n \rightarrow \infty \) and \(N\rightarrow \infty \) (see, e.g., Demidenko 2004).
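The normality result can also be illustrated by simulation in a scalar toy version of the estimating equation \(\varPhi ({\varvec{\gamma }})\): with working model \(z_{i}=U_{i}\gamma _{0}+e_{i}\) and \(\text {Var}(e_{i})=\varSigma _{i}\), the solution is the weighted least-squares estimator, and the standardized estimator should be approximately standard normal. The sketch below is our own example, with scalar \(U_{i}\) and \(\varSigma _{i}\).

```python
import math
import random

random.seed(1)

# Scalar toy version of the estimating equation
#   Phi(gamma) = sum_i U_i Sigma_i^{-1} (z_i - U_i gamma) = 0,
# whose solution is weighted least squares.  With total information
# I_n = sum_i U_i^2 / Sigma_i (the analogue of n * Omega(gamma_0)^{-1}),
# sqrt(I_n) * (gamma_hat - gamma_0) should be approximately N(0, 1).
gamma0, n, reps = 1.5, 50, 2000
U = [random.uniform(0.5, 1.5) for _ in range(n)]
Sig = [random.uniform(0.5, 2.0) for _ in range(n)]
info = sum(u * u / s for u, s in zip(U, Sig))

stats = []
for _ in range(reps):
    z = [U[i] * gamma0 + random.gauss(0.0, math.sqrt(Sig[i])) for i in range(n)]
    ghat = sum(U[i] * z[i] / Sig[i] for i in range(n)) / info
    stats.append(math.sqrt(info) * (ghat - gamma0))

mean = sum(stats) / reps
sd = math.sqrt(sum((s - mean) ** 2 for s in stats) / (reps - 1))
print(round(mean, 2), round(sd, 2))  # approximately 0 and 1
```

The empirical mean and standard deviation of the standardized statistic match the standard normal limit, mirroring Theorem A2 in this linear special case.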
B Monte Carlo (MC) EM algorithm
The EM algorithm is a standard approach for likelihood estimation in the presence of missing data. In our case, by treating the random effects \(({\varvec{a}}_{i}\), \({\varvec{b}}_{i})\) as “missing data”, we can write “complete data” as \(\big \{ ({\varvec{x}}_{i},{\varvec{y}}_{i}, {\varvec{a}}_{i}, {\varvec{b}}_{i}), i=1,\ldots , n \big \}\). Let \({\varvec{\theta }}\) be the collection of all parameters. Then, the “complete-data” log-likelihood function for individual i can be expressed as
The EM algorithm iterates between the E-step and the M-step until convergence. Let \({\varvec{\theta }}^{(t)}\) be the parameter estimate from the tth EM iteration. The E-step for individual i at the \((t+1)\)th EM iteration can be expressed as
The above E-step again involves an intractable integral. However, because expression (29) is an expectation with respect to \(f({\varvec{a}}_{i},{\varvec{b}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i};{\varvec{\theta }}^{(t)})\), it can be evaluated using the MCEM algorithm (Wei and Tanner 1990; Booth and Hobert 1999; Ibrahim et al. 1999). Specifically, we may use the Gibbs sampler to generate many samples from this conditional distribution by iteratively sampling from the full conditionals \([{\varvec{a}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i},{\varvec{b}}_{i};{\varvec{\theta }}^{(t)}]\) and \([{\varvec{b}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i},{\varvec{a}}_{i};{\varvec{\theta }}^{(t)}]\). Monte Carlo samples from each of these full conditionals can be generated using rejection sampling methods.
After generating large random samples from the conditional distribution of \(({\varvec{a}}_{i},{\varvec{b}}_{i})\) given the observed data, we can approximate the expectation \(Q_{i}({\varvec{\theta }}|{\varvec{\theta }}^{(t)})\) in the E-step by its empirical mean, with the “missing data” replaced by simulated values. The M-step, which maximizes \(\sum _{i=1}^{n} Q_{i}({\varvec{\theta }}|{\varvec{\theta }}^{(t)})\), is then a complete-data maximization, so standard complete-data optimization procedures such as the Newton–Raphson method may be used to update the parameter estimates. At convergence, we obtain the MLE of \({\varvec{\theta }}\), or possibly a local maximum; different starting values may be tried to check roughly whether a global maximum has been reached. For the MCEM algorithm, the major computational challenge is therefore the implementation of the E-step. Although the MCEM method is not conceptually new, implementing it for complicated models such as those described above can be challenging and tedious, since it involves non-trivial programming as well as convergence issues such as slow convergence or non-convergence.
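The E-step/M-step cycle can be sketched in a conjugate toy model of our own (not the authors' implementation): subject means \(\bar{y}_{i}\) with within-subject noise of known variance, and a scalar random effect \(b_{i} \sim N(\mu , \tau ^{2})\) playing the role of the "missing data". In this toy case the conditional of \(b_{i}\) given the data is an explicit normal, so direct draws stand in for the Gibbs/rejection samples needed in the general nonlinear models.

```python
import random

random.seed(2)

# Toy MCEM: y_ij = b_i + eps_ij with eps_ij ~ N(0, sigma2) known, and
# b_i ~ N(mu, tau2); theta = (mu, tau2).  We work directly with the
# subject means ybar_i, which are sufficient here (ybar_i | b_i has
# variance sigma2/ni).
sigma2 = 1.0
mu_true, tau2_true = 1.0, 1.0
n, ni, M = 100, 5, 200
b = [random.gauss(mu_true, tau2_true ** 0.5) for _ in range(n)]
ybar = [b[i] + random.gauss(0.0, (sigma2 / ni) ** 0.5) for i in range(n)]

mu, tau2 = 0.0, 2.0  # starting values
for t in range(30):
    # MC E-step: draw M samples of b_i from its conditional given the data
    # (an explicit normal here; a Gibbs/rejection sampler in general).
    draws = []
    for i in range(n):
        prec = ni / sigma2 + 1.0 / tau2
        post_mean = (ni * ybar[i] / sigma2 + mu / tau2) / prec
        draws.append([random.gauss(post_mean, prec ** -0.5) for _ in range(M)])
    # M-step: maximize the empirical mean of the complete-data
    # log-likelihood, i.e. update mu and tau2 from the simulated b_i.
    mu = sum(sum(d) / M for d in draws) / n
    tau2 = sum(sum((x - mu) ** 2 for x in d) / M for d in draws) / n

print(round(mu, 2), round(tau2, 2))  # near (mu_true, tau2_true)
```

Each iteration re-draws the "missing" random effects at the current parameter value and then performs a closed-form complete-data update; the Monte Carlo sample size M controls the simulation noise in the E-step.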
To obtain the variance-covariance matrix of the MLE \(\hat{{\varvec{\theta }}}\), we could consider the following approximate formula (McLachlan and Krishnan 1997). Let \(S_{c}^{(i)} = \partial l_{c}^{(i)}/\partial {\varvec{\theta }}\), where \(l_{c}^{(i)}\) is the complete-data log-likelihood for individual i. Then an approximate formula for the variance-covariance matrix of \({\varvec{\hat{\theta }}}\) is
where the expectation can be approximated by Monte Carlo empirical means, as above.
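Continuing the conjugate toy model above (with \(\tau ^{2}\) and \(\sigma ^{2}\) known and \({\varvec{\theta }}=\mu \)), one standard version of such an approximate formula (McLachlan and Krishnan 1997) inverts the sum of outer products of the conditional scores \(s_{i}=\text {E}(S_{c}^{(i)}|{\varvec{y}}_{i};\hat{{\varvec{\theta }}})\), with the conditional expectation replaced by a Monte Carlo mean. The sketch below is our own hedged illustration of that idea, not the paper's exact displayed formula.

```python
import math
import random

random.seed(3)

# Toy model: ybar_i | b_i ~ N(b_i, sigma2/ni), b_i ~ N(mu, tau2), with
# sigma2 and tau2 known and theta = mu.  The complete-data score is
# S_c^(i) = (b_i - mu) / tau2, so s_i = (E[b_i | y_i] - mu_hat) / tau2,
# approximated below by a Monte Carlo mean, and
#   Var(mu_hat) ~ 1 / sum_i s_i^2.
sigma2, tau2 = 1.0, 1.0
mu_true, n, ni, M = 1.0, 100, 5, 500
ybar = [random.gauss(mu_true, math.sqrt(tau2 + sigma2 / ni)) for _ in range(n)]
mu_hat = sum(ybar) / n  # MLE of mu in this balanced toy model

prec = ni / sigma2 + 1.0 / tau2
s = []
for i in range(n):
    post_mean = (ni * ybar[i] / sigma2 + mu_hat / tau2) / prec
    draws = [random.gauss(post_mean, prec ** -0.5) for _ in range(M)]
    s.append((sum(draws) / M - mu_hat) / tau2)  # MC estimate of s_i

var_mc = 1.0 / sum(si * si for si in s)
var_exact = (tau2 + sigma2 / ni) / n  # Var(mu_hat) from the marginal model
print(var_mc, var_exact)  # the two variances should be close
```

In this toy case the marginal variance of \(\hat{\mu }\) is available in closed form, so the Monte Carlo score-based approximation can be checked directly against it.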
Zhang, H., Wu, L. An approximate method for generalized linear and nonlinear mixed effects models with a mechanistic nonlinear covariate measurement error model. Metrika 82, 471–499 (2019). https://doi.org/10.1007/s00184-018-0690-z