An approximate method for generalized linear and nonlinear mixed effects models with a mechanistic nonlinear covariate measurement error model

Metrika

Abstract

The literature on measurement error for time-dependent covariates has mostly focused on empirical models, such as linear mixed effects models. Motivated by an AIDS study, we propose a joint modeling method in which a mechanistic nonlinear model is used to address the time-varying covariate measurement error for a longitudinal outcome that can be either discrete (such as binary or count) or continuous. We implement an inference procedure that uses a first-order Taylor approximation to linearize both the covariate model and the response model. We study the asymptotic properties of the joint-model-based estimator and prove its consistency and asymptotic normality. We then evaluate the finite-sample performance of the estimator through simulation. Finally, we apply the new method to real data from an HIV/AIDS study.


References

  • Acosta E, Walawander HWA, Eron J, Pettinelli C, Yu S, Neath D (2004) Comparison of two indinavir/ritonavir regimens in treatment-experienced HIV-infected individuals. J Acquir Immune Defic Syndr 37:1358–1366

  • Barndorff-Nielsen O, Cox D (1989) Asymptotic techniques for use in statistics. Chapman and Hall, New York

  • Booth J, Hobert J (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc Ser B 61:265–285

  • Bradley R, Gart J (1962) The asymptotic properties of ML estimators when sampling from associated populations. Biometrika 49:205–214

  • Breslow N, Clayton D (1993) Approximate inference in generalized linear mixed models. J Am Stat Assoc 88:9–25

  • Carroll R, Ruppert D, Stefanski L, Crainiceanu C (2006) Measurement error in nonlinear models: a modern perspective, 2nd edn. Chapman and Hall, London

  • Cruz R, Marshall G, Quintana F (2011) Logistic regression when covariates are random effects from a non-linear mixed model. Biom J 53:735–749

  • Demidenko E (2004) Mixed models: theory and applications. Wiley, New York

  • Fitzmaurice G, Laird N, Ware J (2011) Applied longitudinal analysis, 2nd edn. Wiley, New York

  • Fu L, Lei Y, Sharma R, Tang S (2013) Parameter estimation of nonlinear mixed-effects models using first-order conditional linearization and the EM algorithm. J Appl Stat 40(2):252–265

  • Ibrahim J, Lipsitz S, Chen M (1999) Missing covariates in generalized linear models when the missing data mechanism is nonignorable. J R Stat Soc Ser B 61:173–190

  • Laird N, Ware J (1982) Random-effects models for longitudinal data. Biometrics 38:963–974

  • Lee Y, Nelder J, Pawitan Y (2006) Generalized linear models with random effects: unified analysis via H-likelihood. Chapman and Hall/CRC, London

  • Lindstrom M, Bates D (1990) Nonlinear mixed effects models for repeated measures data. Biometrics 46:673–687

  • Liu W, Wu L (2010) Some asymptotic results for semiparametric nonlinear mixed-effects models with incomplete data. J Stat Plan Inference 140:52–64

  • McLachlan G, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York

  • Prentice R, Zhao L (1991) Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47:825–839

  • Serfling R (1980) Approximation theorems of mathematical statistics. Wiley, New York

  • Vonesh E, Chinchilli V (1997) Linear and nonlinear models for the analysis of repeated measurements. Marcel Dekker, New York

  • Vonesh E, Wang H, Nie L, Majumdar D (2002) Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. J Am Stat Assoc 97:271–283

  • Wei G, Tanner M (1990) A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J Am Stat Assoc 85:699–704

  • Wu L (2010) Mixed effects models for complex data. Chapman and Hall, London

  • Wu H, Ding A (1999) Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55:410–418

  • Zhang H, Wong H, Wu L (2018) A mechanistic nonlinear model for censored and mis-measured covariates in longitudinal models, with application in AIDS studies. Stat Med 37(1):167–178


Acknowledgements

This work is partially supported by the City University of New York High-Performance Computing Center, College of Staten Island, funded in part by the City and State of New York, City University of New York Research Foundation and National Science Foundation grants CNS-0958379, CNS-0855217, and ACI-112611.

Author information

Corresponding author

Correspondence to Hongbin Zhang.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Appendices

A Proofs of asymptotic properties

A.1 Regularity conditions

Consider models (4) and (5). Denote \({\varvec{\gamma }}=({\varvec{\alpha }}^{T},{\varvec{\beta }}^{T})^{T} \in \varGamma \), \({\varvec{\omega }}_{i}=({\varvec{a}}_{i}^{T}, {\varvec{b}}_{i}^{T})^{T}\), and

$$\begin{aligned} \begin{aligned} l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})&=l_{i}({\varvec{\alpha }},{\varvec{\beta }},{\varvec{a}}_{i},{\varvec{b}}_{i};{\varvec{x}}_{i},{\varvec{y}}_{i}) = \log f({\varvec{x}}_{i}|{\varvec{\omega }}_{i};{\varvec{\gamma }}) + \log f({\varvec{y}}_{i}|{\varvec{\omega }}_{i};{\varvec{\gamma }}),\\ k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})&= l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i}) + \log f({\varvec{a}}_{i}) + \log f({\varvec{b}}_{i}),\\ \end{aligned} \end{aligned}$$

where \(f(\cdot )\) is a generic density function, \(f({\varvec{x}}|{\varvec{y}})\) is a conditional distribution of X given Y and \(k_{i}=n_{i}+m_{i}\). Denote the estimates by \(\hat{{\varvec{\gamma }}}\) and \({\varvec{\hat{\omega }_{i}}}\) and

$$\begin{aligned} l_{i,{\varvec{\omega }}_{i}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) = \frac{\partial }{\partial {\varvec{\omega }}_{i}} l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\Big |_{{\varvec{\omega }}_{i}={\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})}, \quad l_{i,{\varvec{\omega }}_{i}{\varvec{\omega }}_{i}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) = \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\omega }}_{i}^{T}} l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\Big |_{{\varvec{\omega }}_{i}={\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})}. \end{aligned}$$

Similarly, we can define \(l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\), \(l_{i,{\varvec{\gamma }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\), \(l_{i,{\varvec{\gamma }}{\varvec{\omega }}_{i}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\) and derivatives for \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\).

We denote convergence in probability as \(k_{i} \rightarrow \infty \) by \(o_{p}(1_{k_{i}})\), convergence in probability as \(n \rightarrow \infty \) by \(o_{p}(1_{n})\), and convergence in probability as both \(k_{i} \rightarrow \infty \) and \(n \rightarrow \infty \) by \(o_{p}(1_{k_{i},n})\). We show consistency and asymptotic normality under the following regularity conditions.

R1.

    \(k_{i} = O(N) \) uniformly for \(i=1,\ldots ,n\), where \(N= \text {min}_{i}\{k_{i}\}\).

R2.

    The variance-covariance parameters \({\varvec{\lambda }}=(\sigma ^{2}, A, B) \) are fixed and known, and the true parameter \({\varvec{\gamma }}_{0}\) is in the interior of \(\varGamma \). Quantities \(U_{i}\), \(\varLambda _{i}\), \(V_{i}\), \(\varSigma _{i}\) and D are evaluated at \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\). When \({\varvec{\lambda }}\) is unknown, we can simply replace it by its consistent estimate (in (12)).

R3.

    The density \(f(x_{ij}|{\varvec{\omega }}_{i};{\varvec{\gamma }})\) and \(f(y_{ij}|{\varvec{\omega }}_{i};{\varvec{\gamma }})\) satisfy the necessary regularity conditions (e.g., Bradley and Gart 1962) such that, for fixed \({\varvec{\gamma }}\), the estimate of \({\varvec{\omega }}_{i}\) is \(\sqrt{k_{i}}\)-consistent for \({\varvec{\omega }}_{i}\) as \(k_{i} \rightarrow \infty \). Also, the necessary regularity conditions are assumed (e.g., Serfling 1980, p. 27, Theorem C) such that, by the Law of Large Numbers, the following hold (as \(k_{i} \rightarrow \infty \)):

    $$\begin{aligned} \begin{aligned}&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i} + o_{p}(1_{k_{i}}), \\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}V_{i} + o_{p}(1_{k_{i}}),\\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}V_{i} + o_{p}(1_{k_{i}}), \\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}U_{i} + o_{p}(1_{k_{i}}), \\ \end{aligned} \end{aligned}$$

    where, under models (4)–(5),

    $$\begin{aligned} \begin{aligned}&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = U_{i}^{T}\varLambda _{i}^{-1}U_{i}, \\&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = V_{i}^{T}\varLambda _{i}^{-1}V_{i},\\&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = U_{i}^{T}\varLambda _{i}^{-1}V_{i}.\\ \end{aligned} \end{aligned}$$

    Finally, the matrices \(k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i}\) and \(k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}V_{i}\) are both assumed to be positive definite with finite determinants such that, for example, the smallest eigenvalue of \(k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i}\) exceeds \(\lambda _{0}\) for some \(\lambda _{0} > 0\).

R4.

    For all \({\varvec{\gamma }}\in \varGamma \) and all the s-dimensional \({\varvec{\omega }}_{i}\in R^{s}\), the function \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) is three times differentiable and continuous in \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\) for all \(x_{ij}\) and \(y_{ij}\), and \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) satisfies the conditions to change the order of integration and differentiation, as indicated in the proof.

R5.

    For any \({\varvec{\gamma }}\in \varGamma \), there exist \(d_{1} > 0\) and \(\lambda _{1} > 0\) such that (a): for all \({\varvec{\gamma }}^{*} \in B_{d_{1}}({\varvec{\gamma }})\), where \(B_{d_{1}}({\varvec{\gamma }})\) is the \(\tau \)-dimensional sphere centered at \({\varvec{\gamma }}\) with radius \(d_{1}\), the following holds:

    $$\begin{aligned} -\frac{1}{n} \sum _{i=1}^{n} \frac{\partial }{\partial {\varvec{\gamma }}^{T}}l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) \Big |_{{\varvec{\gamma }}={\varvec{\gamma }}^{*}} = \varOmega ({\varvec{\gamma }}^{*})^{-1} + o_{p}(1_{n}),\quad as \quad n \rightarrow \infty , \end{aligned}$$

    where \(\varOmega ({\varvec{\gamma }}^{*})^{-1}\) is positive definite with minimum eigenvalue greater than \(\lambda _{1}\) and

    $$\begin{aligned} \begin{aligned} \frac{\partial }{\partial {\varvec{\gamma }}^{T}}l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) \Big |_{{\varvec{\gamma }}={\varvec{\gamma }}^{*}} =&\,\{ l_{i,{\varvec{\gamma }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) + l_{i,{\varvec{\gamma }}{\varvec{\omega }}_{i}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) \\&\times [l_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) + D^{-1} ]^{-1} l_{i,{\varvec{\omega }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*}))\}; \end{aligned} \end{aligned}$$

    and (b): the first, second, and third derivatives of \(\sqrt{k_{i}}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) with respect to \({\varvec{\omega }}_{i}\) are uniformly bounded in \(B_{d_{1}}({\varvec{\gamma }})\).

R6.

    At the true value \({\varvec{\gamma }}_{0}\), the following hold true:

    $$\begin{aligned} \begin{aligned}&\text {E}_{{\varvec{\omega }}}(U_{i}^{T}\varSigma _{i}^{-1}U_{i}) = \varphi _{i}({\varvec{\gamma }}_{0}) \quad \text {exists for all} \quad i=1,\ldots ,n, \\&\lim _{n \rightarrow \infty } n^{-2} \sum _{i=1}^{n}\text {Cov}_{{\varvec{\omega }}}( U_{i}^{T}\varSigma _{i}^{-1}U_{i}) = 0, \quad \lim _{n \rightarrow \infty } n^{-1} \sum _{i=1}^{n} \varphi _{i}({\varvec{\gamma }}_{0}) = \varOmega ({\varvec{\gamma }}_{0})^{-1}, \\ \end{aligned} \end{aligned}$$

    where \(U_{i}\), \(V_{i}\), and \(\varSigma _{i}\) are evaluated at \({\varvec{\gamma }}_{0}\) and \({\varvec{\omega }}_{i}\), and \(\varOmega ({\varvec{\gamma }}_{0})^{-1}\) is positive definite.

R7.

    The marginal densities, \(\int \exp \{k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\}d{\varvec{\omega }}_{i}\), satisfy the necessary regularity conditions such that the MLE of \({\varvec{\gamma }}\) exists and satisfies \((\hat{{\varvec{\gamma }}}_{MLE} - {\varvec{\gamma }}_{0}) = O_{p}(n^{-1/2})\).

Let \({\varvec{\gamma }}=({\varvec{\alpha }}^{T}, {\varvec{\beta }}^{T})^{T}\) be the mean parameters in models (4) and (5). Let \({{\varvec{\hat{\gamma }}}}_{\text {MLE}}\) be the (exact) MLE of \({\varvec{\gamma }}\) which maximizes the observed-data log-likelihood \(l_o({\varvec{\theta }})\) in (3), and let \(\hat{{\varvec{\gamma }}}\) be the approximate MLE based on the linearization method described in Sect. 3. Under the regularity conditions R1–R7, we have the following asymptotic results:

Theorem A1 The estimates \({{\varvec{\hat{\gamma }}}}_{\text {MLE}}\) and \({\varvec{\hat{\gamma }}}\) satisfy

$$\begin{aligned} \begin{aligned} {\varvec{\hat{\gamma }}}&= {\varvec{\hat{\gamma }}}_{\text {MLE}} + O_{p}\{ \text {min}(n_{i}, m_{i})^{-1/2} \} = {\varvec{\gamma }}_{0} + O_{p}\{ \text {max}[n^{-1/2}, \text {min}(n_{i}, m_{i})^{-1/2}]\},\\ \end{aligned} \end{aligned}$$

where \({\varvec{\gamma }}_{0}\) is the true value of \({\varvec{\gamma }}\).

Theorem A2 The approximate estimate \({\varvec{\hat{\gamma }}}\) asymptotically follows a normal distribution:

$$\begin{aligned} \sqrt{n}({\varvec{\hat{\gamma }}}- {\varvec{\gamma }}_{0})\quad \xrightarrow {\text {d}} \quad N(\mathbf{0}, \varOmega ({\varvec{\gamma }}_{0})), \quad \text {as} \quad n \rightarrow \infty , \quad \text {min}(n_{i}, m_{i}) \rightarrow \infty , \end{aligned}$$

where \(\varOmega ({\varvec{\gamma }}_{0})\) is given in R5 and R6.

A.2 Estimating equations

In Sect. 3.1, for fixed \({\tilde{{\varvec{\lambda }}}}\), we obtain an approximate MLE \(\hat{{\varvec{\gamma }}}(\tilde{{\varvec{\lambda }}})\) and \(\hat{{\varvec{\omega }}}_{i}(\tilde{{\varvec{\lambda }}})\) using linearization. The linearization procedure is equivalent to maximizing (with respect to \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\)) a “complete-data” likelihood, as shown below. Note that the maximization is taken with respect to the random effects \({\varvec{\omega }}_{i}\) as well, so it does not yield the MLE directly; rather, it is equivalent to the approximation obtained by integrating out the random effects. It is therefore best viewed as a computational device rather than exact maximum likelihood estimation (see, e.g., Vonesh et al. 2002; Lee et al. 2006).

In fact, we can write a complete-data log-likelihood as (suppressing \({\varvec{\lambda }}\))

$$\begin{aligned} l_{c}({\varvec{\alpha }},{\varvec{\beta }},{\varvec{a}},{\varvec{b}};{\varvec{x}},{\varvec{y}}) = \sum _{i=1}^{n}[l_{i}({\varvec{\alpha }},{\varvec{\beta }},{\varvec{a}}_{i},{\varvec{b}}_{i};{\varvec{x}}_{i},{\varvec{y}}_{i}) + \log f({\varvec{a}}_{i}) + \log f({\varvec{b}}_{i})], \end{aligned}$$
(14)

where \({\varvec{a}}=({\varvec{a}}_{1}^{T},\ldots ,{\varvec{a}}_{n}^{T})^{T}\), \({\varvec{b}}=({\varvec{b}}_{1}^{T},\ldots ,{\varvec{b}}_{n}^{T})^{T}\), \({\varvec{x}}=({\varvec{x}}_{1}^{T},\ldots ,{\varvec{x}}_{n}^{T})^{T}\), \({\varvec{y}}=({\varvec{y}}_{1}^{T},\ldots ,{\varvec{y}}_{n}^{T})^{T}\), and \(l_{i} =\log f({\varvec{x}}_{i}|{\varvec{a}}_{i};{\varvec{\alpha }}) + \log f({\varvec{y}}_{i}|{\varvec{a}}_{i},{\varvec{b}}_{i};{\varvec{\alpha }},{\varvec{\beta }})\), where the two density functions are as given in (3).

Note that

$$\begin{aligned} \begin{aligned}&\text {E}({\varvec{x}}_{i}|{\varvec{a}}_{i}) = {\varvec{h}}_{i}({\varvec{\alpha }},{\varvec{a}}_{i}), \quad \text {Var}({\varvec{x}}_{i}|{\varvec{a}}_{i}) = \sigma ^{2}I, \\&\text {E}({\varvec{y}}_{i}|{\varvec{a}}_{i},{\varvec{b}}_{i}) = {\varvec{g}}_{i}^{-1}({\varvec{\alpha }},{\varvec{\beta }},{\varvec{a}}_{i},{\varvec{b}}_{i}), \quad \text {Var}({\varvec{y}}_{i}|{\varvec{a}}_{i},{\varvec{b}}_{i})=\varOmega _{i},\\ \end{aligned} \end{aligned}$$

where \({\varvec{h}}_{i}(\cdot )\), \({\varvec{g}}_{i}^{-1}(\cdot )\) and \(\varOmega _{i}\) are defined in Sect. 3.1. Recall \({\varvec{z}}_{i}= ({\varvec{x}}_{i}^{T},{\varvec{y}}_{i}^{T})^{T}\) and define \({\varvec{\mu }}_{i}=({\varvec{h}}_{i}^{T},{\varvec{g}}_{i}^{T})^{T}\). By directly applying the results of Prentice and Zhao (1991), maximizing (14) using Fisher’s method of scoring is equivalent to solving the following set of estimating equations:

$$\begin{aligned} \partial l/\partial {\varvec{\gamma }}= & {} \sum _{i=1}^{n}U_{i}^{T}\varLambda _{i}^{-1}({\varvec{z}}_{i}- {\varvec{\mu }}_{i}) = \mathbf{0}, \end{aligned}$$
(15)
$$\begin{aligned} \partial l/\partial {\varvec{\omega }}_{i}= & {} V_{i}^{T}\varLambda _{i}^{-1}({\varvec{z}}_{i}- {\varvec{\mu }}_{i}) - D^{-1}{\varvec{\omega }}_{i}=\mathbf{0},\quad i=1,\ldots ,n, \end{aligned}$$
(16)

where \(U_{i}\), \(\varLambda _{i}\), \(V_{i}\), and D are defined in Sect. 3.1 (see also Vonesh et al. 2002).

By writing \(U^{T}=[U_{1}^{T},\ldots ,U_{n}^{T}]\), \(V^{T}=\text {diag}\{V_{1}^{T},\ldots ,V_{n}^{T}\}\), \({\varvec{z}}^{T}=[{\varvec{z}}_{1}^{T},\ldots ,{\varvec{z}}_{n}^{T}],{\varvec{\mu }}^{T}=[{\varvec{\mu }}_{1}^{T},\ldots ,{\varvec{\mu }}_{n}^{T}]\) and \(\varLambda =\text {diag}\{\varLambda _{1},\ldots ,\varLambda _{n}\}\), it can be shown (see Prentice and Zhao 1991) that solving (15) and (16) via Fisher’s method of scoring reduces to iteratively solving the following linear mixed-model equations

$$\begin{aligned} \begin{pmatrix} U^{T}\varLambda ^{-1}U &{} U^{T}\varLambda ^{-1}V \\ V^{T}\varLambda ^{-1}U &{} V^{T}\varLambda ^{-1}V+\tilde{D}^{-1} \end{pmatrix} \begin{pmatrix} \hat{{\varvec{\gamma }}}^{(u+1)} \\ \hat{{\varvec{\omega }}}^{(u+1)} \end{pmatrix} = \begin{pmatrix} U^{T}\varLambda ^{-1}{\varvec{z}}^{*} \\ V^{T}\varLambda ^{-1}{\varvec{z}}^{*} \end{pmatrix}, \quad u=1,2,\ldots \end{aligned}$$
(17)

where \(\tilde{D} = \text {diag}\{D,\ldots ,D\}\), \({\varvec{z}}^{*}={\varvec{z}}- {\varvec{\mu }}+ U\hat{{\varvec{\gamma }}}^{(u)} + V\hat{{\varvec{\omega }}}^{(u)}\) is the working response evaluated at the current iterates, and u is the iteration index.

The solution to Eq. (17) can be obtained by iteratively solving the following equations

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}\sum _{i=1}^{n}U_{i}^{T}\varLambda _{i}^{-1}U_{i}\hat{{\varvec{\gamma }}}^{(u+1)} + \sum _{i=1}^{n}U_{i}^{T}\varLambda _{i}^{-1}V_{i}\hat{{\varvec{\omega }}}_{i}^{(u+1)} = \sum _{i=1}^{n}U_{i}^{T}\varLambda _{i}^{-1}{\varvec{z}}_{i},\\ &{}V_{i}^{T}\varLambda _{i}^{-1}U_{i}\hat{{\varvec{\gamma }}}^{(u+1)} + (V_{i}^{T}\varLambda _{i}^{-1}V_{i} + D^{-1})\hat{{\varvec{\omega }}}_{i}^{(u+1)}=V_{i}^{T}\varLambda _{i}^{-1}{\varvec{z}}_{i}, \quad i=1,\ldots ,n,\end{array}\right. } \end{aligned}$$
(18)

where \({\varvec{z}}_{i}= ({\varvec{x}}_{i}^{T},{\varvec{y}}_{i}^{T})^{T}\) is defined in (7), and where \(U_{i}\), \(\varLambda _{i}\), and \(V_{i}\) are all evaluated at \((\hat{{\varvec{\gamma }}}^{(u)},\hat{{\varvec{\omega }}}_{i}^{(u)})\).

The solution to Eq. (18) is given in (9) and (11). Therefore, we have shown that, for fixed \({\varvec{\lambda }}\), the final estimates \(\hat{{\varvec{\gamma }}}\) and \(\hat{{\varvec{\omega }}}_{i} =\hat{{\varvec{\omega }}}_{i}(\hat{{\varvec{\gamma }}})\) satisfy the estimating Eqs. (15) and (16) and maximize the complete-data log-likelihood function (14) with respect to \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\). These facts will be used to show the following asymptotic properties of \({\hat{{\varvec{\gamma }}}}\).
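To make the iteration concrete, the following is a minimal NumPy sketch of one Fisher-scoring step of Eq. (17), solved via the blockwise form (18) by profiling each \({\varvec{\omega }}_{i}\) out of its own block and back-substituting after \({\varvec{\gamma }}\) is updated. It is an illustration only, not the authors' implementation: the linearization matrices and working responses are treated as given inputs, whereas in the actual algorithm they are re-computed from the model derivatives at each new iterate.

```python
import numpy as np

def scoring_step(U, V, Lam, D, z):
    """One Fisher-scoring step of the linear mixed-model equations (17),
    solved subject-by-subject via the blockwise form (18).

    U, V, Lam, z are lists over subjects i of the linearization matrices
    U_i (k_i x p), V_i (k_i x s), Lambda_i (k_i x k_i) and working
    responses z_i, all evaluated at the current iterate
    (gamma^(u), omega_i^(u)); D is the (s x s) random-effects covariance.
    Returns (gamma^(u+1), omega^(u+1)). In the full algorithm this step is
    repeated, re-linearizing the models at each iterate until convergence.
    """
    p = U[0].shape[1]
    Dinv = np.linalg.inv(D)
    A = np.zeros((p, p))
    b = np.zeros(p)
    cache = []  # per-subject matrices reused in the back-substitution
    for Ui, Vi, Li, zi in zip(U, V, Lam, z):
        Linv = np.linalg.inv(Li)
        Si = np.linalg.inv(Vi.T @ Linv @ Vi + Dinv)
        # Profile omega_i out of its block: W_i = Linv - Linv V S V' Linv.
        Wi = Linv - Linv @ Vi @ Si @ Vi.T @ Linv
        A += Ui.T @ Wi @ Ui
        b += Ui.T @ Wi @ zi
        cache.append((Si, Linv))
    gamma = np.linalg.solve(A, b)
    # Back-substitute for each omega_i given the updated gamma.
    omega = [Si @ (Vi.T @ Linv @ (zi - Ui @ gamma))
             for (Si, Linv), Ui, Vi, zi in zip(cache, U, V, z)]
    return gamma, omega
```

The block elimination used here is the standard reduction of the mixed-model equations and avoids forming and inverting the full \((p + ns)\)-dimensional coefficient matrix in (17).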

A.3 Consistency

We first note that, for fixed \({\varvec{\lambda }}\), the MLE of \({\varvec{\gamma }}\) will satisfy the set of estimating equations (Vonesh et al. 2002)

$$\begin{aligned} J({\varvec{\gamma }}) = \frac{\partial }{\partial {\varvec{\gamma }}} \prod _{i=1}^{n} f({\varvec{y}}_{i}; {\varvec{\gamma }}) = \frac{\partial }{\partial {\varvec{\gamma }}} \prod _{i=1}^{n} \int \exp \{k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i})\} d{\varvec{\omega }}_{i}= \mathbf{0}. \end{aligned}$$

Under R4, we have

$$\begin{aligned} \begin{aligned} J({\varvec{\gamma }})&= \int \cdots \int \left\{ \sum _{i=1}^{n} k_{i} \frac{\partial }{\partial {\varvec{\gamma }}} L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i})\right\} \exp \left\{ \sum _{j=1}^{n} k_{j}L_{j}({\varvec{\gamma }}, {\varvec{\omega }}_{j}) \right\} d {\varvec{\omega }}_{1}\cdots d{\varvec{\omega }}_{n}\\&= \sum _{i=1}^{n}\sqrt{k_{i}} \int \cdots \int \left( \sqrt{k_{i}} \frac{\partial }{\partial {\varvec{\gamma }}} L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right) \exp \left\{ \sum _{j=1}^{n} k_{j}L_{j}({\varvec{\gamma }}, {\varvec{\omega }}_{j}) \right\} d {\varvec{\omega }}_{1}\cdots d{\varvec{\omega }}_{n}\\&= \sum _{i=1}^{n}\sqrt{k_{i}} \left[ \int \sqrt{k_{i}} L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \exp \{ k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \} d {\varvec{\omega }}_{i}\times \prod _{j\ne i}^{n} \int \exp \{ k_{j}L_{j}({\varvec{\gamma }}, {\varvec{\omega }}_{j}) \} d {\varvec{\omega }}_{j}\right] .\\ \end{aligned} \end{aligned}$$

Now we examine the term \(L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i})\) in the above expression. Recall that \(k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \equiv l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i}) + \log f({\varvec{a}}_{i}) + \log f({\varvec{b}}_{i})\). Since the \(y_{ij}\) are conditionally independent of each other given \({\varvec{\omega }}_{i}\), under R3 it follows that, conditional on \({\varvec{\omega }}_{i}\),

$$\begin{aligned} L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1} l_{i, {\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}({\varvec{z}}_{i}- {\varvec{\mu }}_{i}) = O_{p}(k_{i}^{-1/2}). \end{aligned}$$
(19)

Furthermore, under R3 we have

$$\begin{aligned} {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}) = {\varvec{\omega }}_{i}+ O_{p}(k_{i}^{-1/2}). \end{aligned}$$
(20)

Combining the results in (19) and (20), we can show that

$$\begin{aligned} \begin{aligned} L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))&= L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) + O_{p}(k_{i}^{-1/2})\\&= O_{p}(k_{i}^{-1/2}) + O_{p}(k_{i}^{-1/2}) = O_{p}(k_{i}^{-1/2}). \end{aligned} \end{aligned}$$
(21)

Then, by direct application of the Laplace approximation to integrals of the form \(\int \exp \{kp(x)\}dx\) and \(\int q(x) \exp \{kp(x)\}dx\), where q(x) and p(x) are smooth functions in x with p(x) having a unique maximum at some point \(\hat{x}\), it can be shown (Barndorff-Nielsen and Cox 1989) that

$$\begin{aligned} \int \exp \{k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\} d{\varvec{\omega }}_{i}= \exp \{k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\} \left( \frac{2\pi }{|k_{i} \hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} |} \right) ^{b/2} (1 + O(k_{i}^{-1})) \end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\int \sqrt{k_{i}} L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \exp \{ k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \} d {\varvec{\omega }}_{i}= \exp \{k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\} \\&\quad \times \left( \frac{2\pi }{|k_{i} \hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} |} \right) ^{b/2} (\sqrt{k_{i}}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) + O_{p}(k_{i}^{-1})), \\ \end{aligned} \end{aligned}$$

where \(\hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} = \hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} ({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) \). Because (19) implies \(\sqrt{k_{i}}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))=O_{p}(1)\), it follows from R1 that

$$\begin{aligned} (\sqrt{k_{i}}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) + O_{p}(k_{i}^{-1})) \times \prod _{j \ne i}^{n} (1+O_{p}(k_{j}^{-1})) = \sqrt{k_{i}}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) + O_{p}(N^{-1}). \end{aligned}$$

Hence we have

$$\begin{aligned} \begin{aligned} J({\varvec{\gamma }})&= \sum _{i=1}^{n} \sqrt{k_{i}} \exp \{k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\} \left( \frac{2\pi }{|k_{i} \hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} |} \right) ^{b/2} (\sqrt{k_{i}}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) + O_{p}(k_{i}^{-1})) \\&\qquad \times \prod _{j \ne i}^{n} \exp \{k_{j}L_{j}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{j}({\varvec{\gamma }}))\} \left( \frac{2\pi }{|k_{j} \hat{L}_{j,{\varvec{\omega }}{\varvec{\omega }}}^{''} |} \right) ^{b/2} (1+O_{p}(k_{j}^{-1})) \\&\quad = K({\varvec{\gamma }}, \hat{{\varvec{\omega }}}({\varvec{\gamma }})) \left[ \sum _{i=1}^{n}k_{i}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) + nO_{p}(N^{-1/2}) \right], \\ \end{aligned} \end{aligned}$$

where \(K({\varvec{\gamma }}, \hat{{\varvec{\omega }}}({\varvec{\gamma }})) = \exp \{\sum _{i=1}^{n} k_{i}L_{i}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\} \prod _{i=1}^{n} (2\pi /|k_{i}\hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}|)^{b/2}\). Since \(K({\varvec{\gamma }}, \hat{{\varvec{\omega }}}({\varvec{\gamma }})) \ne 0\) for all \({\varvec{\gamma }}\in \varGamma \), the MLE \(\hat{{\varvec{\gamma }}}_{MLE}\) of \({\varvec{\gamma }}\) satisfies

$$\begin{aligned} J({\varvec{\gamma }})|_{{\varvec{\gamma }}=\hat{{\varvec{\gamma }}}_{MLE}} = \mathbf{0}\Longleftrightarrow J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }}))|_{{\varvec{\gamma }}=\hat{{\varvec{\gamma }}}_{MLE}} + O_{p}(nN^{-1/2}) = \mathbf{0}, \end{aligned}$$
(22)

where \(J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }})) = \sum _{i=1}^{n}k_{i}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) = \sum _{i=1}^{n} l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\) is the set of estimating equations for \({\varvec{\gamma }}\) conditional on fixed \({\varvec{\lambda }}\), as given in (19). By taking a first-order Taylor series expansion of \(J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }}))\) about \(\hat{{\varvec{\gamma }}}_{MLE}\) and noting that, from (22), \(J_{1}(\hat{{\varvec{\gamma }}}_{MLE},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}}_{MLE})) = O_{p}(nN^{-1/2})\), we have

$$\begin{aligned} J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }})) = O_{p}(nN^{-1/2}) + J_{1}^{'}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}({\varvec{\gamma }}^{*}))({\varvec{\gamma }}- \hat{{\varvec{\gamma }}}_{MLE}), \end{aligned}$$
(23)

where

$$\begin{aligned} J_{1}^{'}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}({\varvec{\gamma }}^{*})) = \frac{\partial }{\partial {\varvec{\gamma }}^{T}}J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }}))|_{{\varvec{\gamma }}= {\varvec{\gamma }}^{*}} = \frac{\partial }{\partial {\varvec{\gamma }}^{T}} \sum _{i=1}^{n} k_{i}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))|_{{\varvec{\gamma }}= {\varvec{\gamma }}^{*}} \end{aligned}$$

and \({\varvec{\gamma }}^{*}\) is on the line segment joining \({\varvec{\gamma }}\) to \(\hat{{\varvec{\gamma }}}_{MLE}\). By applying the chain rule, we have, for any \({\varvec{\gamma }}\in \varGamma \),

$$\begin{aligned} J_{1}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }}))&= \sum _{i=1}^{n}\left[ \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^{T}} k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})|_{{\varvec{\omega }}_{i}=\hat{{\varvec{\omega }}_{i}}({\varvec{\gamma }})} +\frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\omega }}_{i}^{T}} k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})|_{{\varvec{\omega }}_{i}=\hat{{\varvec{\omega }}_{i}}({\varvec{\gamma }})} \frac{\partial {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}} \right] \nonumber \\&= \sum _{i=1}^{n}\left[ l_{i,{\varvec{\gamma }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) + l_{i,{\varvec{\gamma }}{\varvec{\omega }}}^{''}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) \frac{\partial {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}} \right] . \end{aligned}$$
(24)

Note that \(\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})\) maximizes \(k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) by definition, which implies

$$\begin{aligned} l_{i,{\varvec{\omega }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) -D^{-1}\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}) = \mathbf{0}\Longleftrightarrow \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}) = D l_{i,{\varvec{\omega }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})). \end{aligned}$$

Applying the chain rule once again, we have

$$\begin{aligned} \begin{aligned} \frac{\partial \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}}&= \frac{\partial }{\partial {\varvec{\gamma }}^{T}}\{ Dl_{i,{\varvec{\omega }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}))\} \\&=D\left[ \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i} \partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\Big |_{{\varvec{\omega }}_{i} = \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})} \right] + D\left[ \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i} \partial {\varvec{\omega }}_{i}^{T}} l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\Big |_{{\varvec{\omega }}_{i} = \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})} \right] \frac{\partial \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}}\\&=Dl_{i,{\varvec{\omega }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) + Dl_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}))\frac{\partial \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}}. \end{aligned} \end{aligned}$$

Solving the above equation for \([\partial {\hat{{\varvec{\omega }}}}_{i}({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T}]\), we have

$$\begin{aligned} \frac{\partial \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}}=\left[ - l_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}({\varvec{\gamma }}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) + D^{-1} \right] ^{-1} l_{i,{\varvec{\omega }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})). \end{aligned}$$

Substituting this expression of \([\partial \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T}]\) in (24), it follows from R1, R3 and R5 that as \(n\rightarrow \infty \)

$$\begin{aligned} -\frac{1}{n} J_{1}^{'}({\varvec{\gamma }}^{*}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*}))&= -\frac{1}{n} \sum _{i=1}^{n} \frac{\partial }{\partial {\varvec{\gamma }}^{T}} l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}))\big |_{{\varvec{\gamma }}={\varvec{\gamma }}^{*}}\nonumber \\&=\varOmega ({\varvec{\gamma }}^{*})^{-1} + o_{p}(1_{n}), \end{aligned}$$
(25)

which implies \(-\frac{1}{n} J_{1}^{'}({\varvec{\gamma }}^{*}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) \xrightarrow {\text {p}} \varOmega ({\varvec{\gamma }}^{*})^{-1} \).

It follows from (25) that, for sufficiently large n, \(-\frac{1}{n} J_{1}^{'}({\varvec{\gamma }}^{*}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*}))> \frac{1}{2}\varOmega ({\varvec{\gamma }}^{*})^{-1} > \frac{\lambda _{1}}{2}I\) with probability tending to 1, where \(\lambda _{1}\) is defined in R5. This, together with (23) and the fact that for sufficiently large N, \(\parallel O_{p}(N^{-1/2})\parallel \le \lambda _{1} \delta /4 \) for any \(0< \delta < d_{1}\) (with \(d_{1}\) as defined in R5), implies that, conditional on \({\varvec{\omega }}= ({\varvec{\omega }}_{1},\ldots , {\varvec{\omega }}_{n})\),

$$\begin{aligned} P_{{\varvec{\omega }}}\left\{ \frac{1}{\delta } ({\varvec{\gamma }}- \hat{{\varvec{\gamma }}}_{MLE})^{T} \left[ \frac{1}{n} J_{1}({\varvec{\gamma }}, \hat{{\varvec{\omega }}}({\varvec{\gamma }})) \right] <0 \right\} \rightarrow 1 \quad as \quad n \rightarrow \infty , N \rightarrow \infty . \end{aligned}$$

Thus, since \(\hat{{\varvec{\gamma }}}\) satisfies \(J_{1}(\hat{{\varvec{\gamma }}},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}})) = \mathbf{0}\), it follows using standard arguments that

$$\begin{aligned} \lim _{n,N\rightarrow \infty } P_{{\varvec{\omega }}}\{ \parallel \hat{{\varvec{\gamma }}} - \hat{{\varvec{\gamma }}}_{MLE} \parallel < \delta \} = 1. \end{aligned}$$

Hence, using the bounded convergence theorem, we have

$$\begin{aligned} \lim _{n,N \rightarrow \infty } P\{ \parallel \hat{{\varvec{\gamma }}} - \hat{{\varvec{\gamma }}}_{MLE} \parallel< \delta \} = \text {E}_{{\varvec{\omega }}} \left\{ \lim _{n,N \rightarrow \infty } P_{{\varvec{\omega }}}\{ \parallel \hat{{\varvec{\gamma }}} - \hat{{\varvec{\gamma }}}_{MLE} \parallel < \delta \} \right\} = 1. \end{aligned}$$
(26)

Since \(J_{1}(\hat{{\varvec{\gamma }}},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}})) = \mathbf{0}\) and \(J_{1}(\hat{{\varvec{\gamma }}}_{MLE}, \hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}}_{MLE}))=O_{p}(nN^{-1/2})\) by (23), it follows from (25) that

$$\begin{aligned} -\frac{1}{n}J_{1}^{'}({\varvec{\gamma }}^{**},\hat{{\varvec{\omega }}}({\varvec{\gamma }}^{**}))(\hat{{\varvec{\gamma }}} - \hat{{\varvec{\gamma }}}_{MLE}) = O_{p}(N^{-1/2}), \end{aligned}$$
(27)

where \({\varvec{\gamma }}^{**}\) is on the line segment joining \(\hat{{\varvec{\gamma }}}\) to \(\hat{{\varvec{\gamma }}}_{MLE}\). Because \({\varvec{\gamma }}^{**}\) lies on this segment, (26) ensures it is in the interior of \(\varGamma \), and it follows from R5 that \(-J_{1}^{'}({\varvec{\gamma }}^{**},\hat{{\varvec{\omega }}}({\varvec{\gamma }}^{**})) \ge \frac{n}{2}\varOmega ({\varvec{\gamma }}^{**})^{-1}\), where \(\varOmega ({\varvec{\gamma }}^{**})^{-1}\) is positive definite. Hence the estimator satisfies \((\hat{{\varvec{\gamma }}} - \hat{{\varvec{\gamma }}}_{MLE}) = O_{p}(N^{-1/2})\), from which it follows (given R7) that

$$\begin{aligned} \begin{aligned} \hat{{\varvec{\gamma }}}&= \hat{{\varvec{\gamma }}}_{MLE} + O_{p}(N^{-1/2}) = {\varvec{\gamma }}_{0} + O_{p}(n^{-1/2}) + O_{p}(N^{-1/2}) \\&= {\varvec{\gamma }}_{0} + O_{p}\left\{ \text {max}\left[ n^{-1/2}, \text {min}(k_{i})^{-1/2}\right] \right\} ,\\ \end{aligned} \end{aligned}$$

the desired result.
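The Laplace approximation invoked in this proof is easy to check numerically in a scalar toy case. The sketch below is an illustration only, not part of the proof; the test function p is arbitrary (chosen to have a unique interior maximum), and it compares \(\int \exp \{kp(x)\}dx\) with adaptive quadrature, showing the relative error shrinking at the \(O(k^{-1})\) rate as k grows.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def laplace_vs_quadrature(k, p):
    """Compare the Laplace approximation of I(k) = int exp{k p(x)} dx
    with adaptive quadrature, for a smooth scalar p with a unique maximum:
    I(k) ~ exp{k p(xhat)} * sqrt(2*pi / (k |p''(xhat)|))."""
    xhat = minimize_scalar(lambda x: -p(x)).x          # maximizer of p
    h = 1e-4                                           # central differences
    p2 = (p(xhat + h) - 2 * p(xhat) + p(xhat - h)) / h**2
    laplace = np.exp(k * p(xhat)) * np.sqrt(2 * np.pi / (k * abs(p2)))
    exact, _ = quad(lambda x: np.exp(k * p(x)),
                    xhat - 5.0, xhat + 5.0, points=[xhat])
    return laplace, exact

p = lambda x: 1.0 - np.cosh(x - 1.0)   # unique maximum at x = 1, p(xhat) = 0
for k in (5, 50, 500):
    lap, ex = laplace_vs_quadrature(k, p)
    print(k, lap, ex, abs(lap - ex) / ex)   # relative error ~ O(1/k)
```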

A.4 Asymptotic normality of \(\hat{{\varvec{\gamma }}}\)

The asymptotic normality of \(\hat{{\varvec{\gamma }}}\) will be shown based on the estimating equations (15) and (16). Let

$$\begin{aligned} \varPhi ({\varvec{\gamma }}) = \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}), \end{aligned}$$

where \(U_{i}, \varSigma _{i}\) and \({\varvec{z}}_{i}\) are defined in Sect. 3.1. The estimator \(\hat{{\varvec{\gamma }}}\) satisfies \(\varPhi (\hat{{\varvec{\gamma }}}) = \mathbf{0}\) at convergence. Noting that \(\partial \varPhi ({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T} = -\sum _{i=1}^{n}U_{i}^{T}\varSigma _{i}^{-1}U_{i}\) is constant in \({\varvec{\gamma }}\), we take a Taylor series expansion of \(\varPhi (\hat{{\varvec{\gamma }}})\) around the true parameter \({\varvec{\gamma }}_{0}\):

$$\begin{aligned} \mathbf{0}= \varPhi (\hat{{\varvec{\gamma }}}) \approx \varPhi ({\varvec{\gamma }}_{0}) + \frac{\partial \varPhi ({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}} (\hat{{\varvec{\gamma }}} - {\varvec{\gamma }}_{0}) , \end{aligned}$$

which implies

$$\begin{aligned} \begin{aligned} \sqrt{n}(\hat{{\varvec{\gamma }}} - {\varvec{\gamma }}_{0})&\approx \left[ -\frac{1}{n} \frac{\partial \varPhi ({\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{T}} \right] ^{-1}\left[ \frac{1}{\sqrt{n}}\varPhi ({\varvec{\gamma }}_{0})\right] \\&= \left[ \frac{1}{n} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}U_{i} \right] ^{-1} \left( \frac{1}{\sqrt{n}} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i} - U_{i}{\varvec{\gamma }}_{0}) \right) . \end{aligned} \end{aligned}$$

Since \(E[U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0})]=0\) and \(\text {Cov}[U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0})]=U_{i}^{T}\varSigma _{i}^{-1}U_{i}\), by the Lindeberg Central Limit Theorem, we have

$$\begin{aligned} \left[ \frac{1}{n} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}U_{i} \right] ^{-1/2} \left( \frac{1}{\sqrt{n}} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0}) \right) \xrightarrow {\text {d}} N(\mathbf{0}, I). \end{aligned}$$

Noting that \(\hat{{\varvec{\gamma }}} = {\varvec{\gamma }}_{0} + o_{p}(1_{N,n})\) and \(k_{i}=O(N)\), and using (20), we have

$$\begin{aligned} \begin{aligned} \hat{{\varvec{\omega }}}_{i}(\hat{{\varvec{\gamma }}})&= \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}_{0}) + o_{p}(1_{N,n}) = {\varvec{\omega }}_{i} + O_{p}(k_{i}^{-1/2}) + o_{p}(1_{N,n}) \\&= {\varvec{\omega }}_{i} + O_{p}(N^{-1/2}) + o_{p}(1_{N,n}) \\&= {\varvec{\omega }}_{i} + o_{p}(1_{N,n}).\\ \end{aligned} \end{aligned}$$

Hence, it follows by the Law of Large Numbers and R6 that

$$\begin{aligned} \begin{aligned} \left[ \frac{1}{n} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}U_{i} \right] ^{-1/2}&\xrightarrow {\text {p}} \left[ \frac{1}{n} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}U_{i} \right] _{{\varvec{\gamma }}={\varvec{\gamma }}_{0}, {\varvec{\omega }} = {\varvec{\omega }}}^{-1/2}, \quad N \rightarrow \infty , n \rightarrow \infty \\&\xrightarrow {\text {p}} [\varOmega ({\varvec{\gamma }}_{0})]^{1/2}, \quad \quad \quad \quad N \rightarrow \infty , n \rightarrow \infty .\\ \end{aligned} \end{aligned}$$

Using Slutsky’s theorem, we can show that

$$\begin{aligned} \begin{aligned} \sqrt{n}(\hat{{\varvec{\gamma }}} - {\varvec{\gamma }}_{0})&= \left[ \frac{1}{n} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}U_{i} \right] ^{-1/2} \left[ \frac{1}{n} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}U_{i} \right] ^{-1/2} \left( \frac{1}{\sqrt{n}} \sum _{i=1}^{n} U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0}) \right) \\&\xrightarrow {\text {d}} N(\mathbf{0}, \varOmega ({\varvec{\gamma }}_{0})), \quad \text {as} \quad N \rightarrow \infty , n \rightarrow \infty .\\ \end{aligned} \end{aligned}$$

We can extend the foregoing proofs to the case where \({\varvec{\lambda }}\) is unknown by replacing \({\varvec{\lambda }}\) with its consistent estimate \(\hat{{\varvec{\lambda }}}\). Note that at the estimate \(\hat{{\varvec{\gamma }}}\), the estimate \(\hat{{\varvec{\lambda }}}\) given in (12) can be shown to be consistent for \({\varvec{\lambda }}\) as \(n \rightarrow \infty \) and \(N\rightarrow \infty \) (see, e.g., Demidenko 2004).
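For practical inference, Theorem A2 suggests estimating the covariance of \(\hat{{\varvec{\gamma }}}\) by the inverse of the accumulated information, with \(U_{i}\) and \(\varSigma _{i}\) evaluated at the converged estimates. A minimal sketch, assuming the per-subject matrices are available from the final linearization step (the function and argument names are illustrative):

```python
import numpy as np

def gamma_covariance(U, Sigma):
    """Model-based covariance of gamma-hat implied by Theorem A2:
    Cov(gamma-hat) ~ [ sum_i U_i' Sigma_i^{-1} U_i ]^{-1},
    with U_i and Sigma_i (lists over subjects) evaluated at the
    converged (gamma-hat, omega-hat_i)."""
    p = U[0].shape[1]
    info = np.zeros((p, p))
    for Ui, Si in zip(U, Sigma):
        info += Ui.T @ np.linalg.solve(Si, Ui)  # U_i' Sigma_i^{-1} U_i
    return np.linalg.inv(info)

# Wald standard errors for the components of gamma-hat would then be
# np.sqrt(np.diag(gamma_covariance(U, Sigma))).
```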

B Monte Carlo (MC) EM algorithm

The EM algorithm is a standard approach for likelihood estimation in the presence of missing data. In our case, by treating the random effects \(({\varvec{a}}_{i}\), \({\varvec{b}}_{i})\) as “missing data”, we can write “complete data” as \(\big \{ ({\varvec{x}}_{i},{\varvec{y}}_{i}, {\varvec{a}}_{i}, {\varvec{b}}_{i}), i=1,\ldots , n \big \}\). Let \({\varvec{\theta }}\) be the collection of all parameters. Then, the “complete-data” log-likelihood function for individual i can be expressed as

$$\begin{aligned} l_{c}^{(i)}({\varvec{\theta }}) = \log f({\varvec{y}}_{i}|{\varvec{x}}_{i}^{*},{\varvec{b}}_{i}; {\varvec{\beta }}) + \log f({\varvec{x}}_{i}|{\varvec{a}}_{i};{\varvec{\alpha }},\sigma ^2) + \log f({\varvec{a}}_{i};A) + \log f({\varvec{b}}_{i};B). \end{aligned}$$
(28)

The EM algorithm iterates between the E-step and the M-step until convergence. Let \({\varvec{\theta }}^{(t)}\) denote the parameter estimate from the tth EM iteration. The E-step for individual i at the \((t+1)\)th EM iteration can be expressed as

$$\begin{aligned} Q_{i}({\varvec{\theta }}|{\varvec{\theta }}^{(t)})= & {} \int \int [\log f({\varvec{y}}_{i}|{\varvec{x}}_{i}^{*},{\varvec{b}}_{i};{\varvec{\beta }}) + \log f({\varvec{x}}_{i}|{\varvec{a}}_{i};{\varvec{\alpha }},\sigma ^2) \nonumber \\&+\log f({\varvec{a}}_{i};A) + \log f({\varvec{b}}_{i};B) ] f({\varvec{a}}_{i},{\varvec{b}}_{i}|{\varvec{y}}_{i};{\varvec{\theta }}^{(t)}) d{\varvec{a}}_{i}d{\varvec{b}}_{i}\end{aligned}$$
(29)

The above E-step again involves an intractable integral. However, because expression (29) is an expectation with respect to \(f({\varvec{a}}_{i},{\varvec{b}}_{i}|{\varvec{y}}_{i};{\varvec{\theta }}^{(t)})\), it can be evaluated using the MCEM algorithm (Wei and Tanner 1990; Booth and Hobert 1999; Ibrahim et al. 1999). Specifically, we may use the Gibbs sampler to generate many samples from \(f({\varvec{a}}_{i}, {\varvec{b}}_{i}|{\varvec{y}}_{i}; {\varvec{\theta }}^{(t)})\) by iteratively sampling from the full conditionals \([{\varvec{a}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i},{\varvec{b}}_{i};{\varvec{\theta }}^{(t)}]\) and \([{\varvec{b}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i},{\varvec{a}}_{i};{\varvec{\theta }}^{(t)}]\). Monte Carlo samples from each of these full conditionals can be generated using rejection sampling methods.

After generating large random samples from the conditional distribution \(f({\varvec{a}}_{i},{\varvec{b}}_{i}|{\varvec{y}}_{i};{\varvec{\theta }}^{(t)})\), we can approximate the expectation \(Q_{i}({\varvec{\theta }}|{\varvec{\theta }}^{(t)})\) in the E-step by its empirical mean, with the “missing data” replaced by simulated values. The M-step, which maximizes \(\sum _{i=1}^{n} Q_{i}({\varvec{\theta }}|{\varvec{\theta }}^{(t)})\), is then a complete-data maximization, so complete-data optimization procedures such as the Newton–Raphson method may be used to update the parameter estimates. At convergence, we obtain the MLE of \({\varvec{\theta }}\) or possibly a local maximum; we may try different starting values to check roughly whether a global maximum (i.e., the MLE) has been found. Thus, for the MCEM algorithm, a major computational challenge is the implementation of the E-step. Although the MCEM method is conceptually not new, implementing it for complicated models such as those described above can be challenging and tedious, since it involves not only non-trivial programming but also various convergence issues, such as very slow convergence or non-convergence. A schematic sketch of the iteration is given below.
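The following is a generic template for the MCEM loop just described, not the authors' implementation: `sample_posterior` stands in for the Gibbs/rejection sampler for \(f({\varvec{a}}_{i},{\varvec{b}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i};{\varvec{\theta }}^{(t)})\), and `complete_data_loglik` for the complete-data log-likelihood (28); both are assumed, problem-specific callables.

```python
import numpy as np
from scipy.optimize import minimize

def mcem(theta0, data, sample_posterior, complete_data_loglik,
         n_mc=500, n_iter=100, tol=1e-5):
    """Monte Carlo EM: at each iteration, approximate the E-step (29) by an
    empirical mean over posterior draws of the random effects, then maximize
    the resulting Q-function (M-step).

    Assumed helpers (not defined in the paper's notation):
      sample_posterior(subject, theta, n_mc) -> (a_draws, b_draws)
      complete_data_loglik(theta, subject, a, b) -> float
    """
    theta = np.asarray(theta0, dtype=float)
    for t in range(n_iter):
        # E-step: draw random effects for each subject given current theta.
        draws = [sample_posterior(subj, theta, n_mc) for subj in data]

        def neg_Q(th):
            # Monte Carlo approximation of -sum_i Q_i(theta | theta^(t)).
            total = 0.0
            for subj, (a_s, b_s) in zip(data, draws):
                lls = [complete_data_loglik(th, subj, a, b)
                       for a, b in zip(a_s, b_s)]
                total += np.mean(lls)
            return -total

        # M-step: complete-data maximization (Newton-type methods also work).
        theta_new = minimize(neg_Q, theta, method="BFGS").x
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```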

To obtain the variance-covariance matrix of the MLE \(\hat{{\varvec{\theta }}}\), we could consider the following approximate formula (McLachlan and Krishnan 1997). Let \(S_{c}^{(i)} = \partial l_{c}^{(i)}/\partial {\varvec{\theta }}\), where \(l_{c}^{(i)}\) is the complete-data log-likelihood for individual i. Then an approximate formula for the variance-covariance matrix of \({\varvec{\hat{\theta }}}\) is

$$\begin{aligned} \text {Cov}({\varvec{\hat{\theta }}}) = \left[ \sum _{i=1}^{n} \text {E}(S_{c}^{(i)}|{\varvec{y}}_{i};{\varvec{\hat{\theta }}})\,\text {E}(S_{c}^{(i)}| {\varvec{y}}_{i};{\varvec{\hat{\theta }}})^{T} \right] ^{-1}, \end{aligned}$$

where the expectation can be approximated by Monte Carlo empirical means, as above.
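With the posterior draws from the final E-step in hand, this formula can be approximated directly. A minimal sketch, assuming a user-supplied `score` function returning \(\partial l_{c}^{(i)}/\partial {\varvec{\theta }}\) for given random effects (an assumed helper, e.g. obtained by numerical differentiation of the complete-data log-likelihood):

```python
import numpy as np

def mle_covariance(theta_hat, data, draws, score):
    """Approximate Cov(theta-hat) = [ sum_i E(S_i) E(S_i)' ]^{-1}, where
    S_i = d l_c^{(i)} / d theta and each conditional expectation is replaced
    by a Monte Carlo mean over posterior draws of (a_i, b_i).
    `score(theta, subject, a, b)` is an assumed user-supplied helper."""
    q = len(theta_hat)
    info = np.zeros((q, q))
    for subj, (a_s, b_s) in zip(data, draws):
        s_bar = np.mean([score(theta_hat, subj, a, b)
                         for a, b in zip(a_s, b_s)], axis=0)
        info += np.outer(s_bar, s_bar)
    return np.linalg.inv(info)
```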

About this article

Cite this article

Zhang, H., Wu, L. An approximate method for generalized linear and nonlinear mixed effects models with a mechanistic nonlinear covariate measurement error model. Metrika 82, 471–499 (2019). https://doi.org/10.1007/s00184-018-0690-z

