Abstract
The literature on measurement error for time-dependent covariates has mostly focused on empirical models, such as linear mixed effects models. Motivated by an AIDS study, we propose a joint modeling method in which a mechanistic nonlinear model addresses the measurement error in a time-varying covariate for a longitudinal outcome that may be either discrete (e.g., binary or count) or continuous. We implement an inference procedure that uses a first-order Taylor approximation to linearize both the covariate model and the response model. We study the asymptotic properties of the joint-model-based estimator and prove its consistency and asymptotic normality. We then evaluate the finite-sample performance of the estimator through simulation. Finally, we apply the new method to real data from an HIV/AIDS study.
References
Acosta E, Walawander HWA, Eron J, Pettinelli C, Yu S, Neath D (2004) Comparison of two indinavir/ritonavir regimens in treatment-experienced HIV-infected individuals. J Acquir Immune Defic Syndr 37:1358–1366
Barndorff-Nielsen O, Cox D (1989) Asymptotic techniques for use in statistics. Chapman and Hall, New York
Booth J, Hobert J (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc Ser B 61:265–285
Bradley R, Gart J (1962) The asymptotic properties of ML estimators when sampling from associated populations. Biometrika 49:205–214
Breslow N, Clayton D (1993) Approximate inference in generalized linear mixed models. J Am Stat Assoc 88:9–25
Carroll R, Ruppert D, Stefanski L, Crainiceanu C (2006) Measurement error in nonlinear models: a modern perspective, 2nd edn. Chapman and Hall, London
Cruz R, Marshall G, Quintana F (2011) Logistic regression when covariates are random effects from a non-linear mixed model. Biom J 53:735–749
Demidenko E (2004) Mixed models: theory and applications. Wiley, New York
Fitzmaurice G, Laird N, Ware J (2011) Applied longitudinal analysis, 2nd edn. Wiley, New York
Fu L, Lei Y, Sharma R, Tang S (2013) Parameter estimation of nonlinear mixed-effects models using first-order conditional linearization and EM algorithm. J Appl Stat 40(2):252–265
Ibrahim J, Lipsitz S, Chen M (1999) Missing covariates in generalized linear models when the missing data mechanism is nonignorable. J R Stat Soc Ser B 61:173–190
Laird N, Ware J (1982) Random-effects models for longitudinal data. Biometrics 38:963–974
Lee Y, Nelder J, Pawitan Y (2006) Generalized linear models with random effects: unified analysis via H-likelihood. Chapman and Hall/CRC, London
Lindstrom M, Bates D (1990) Nonlinear mixed effects models for repeated measures data. Biometrics 46:673–687
Liu W, Wu L (2010) Some asymptotic results for semiparametric nonlinear mixed-effects models with incomplete data. J Stat Plan Inference 140:52–64
McLachlan G, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York
Prentice R, Zhao L (1991) Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47:825–839
Serfling R (1980) Approximation theorems of mathematical statistics. Wiley, New York
Vonesh E, Chinchilli V (1997) Linear and nonlinear models for the analysis of repeated measurements. Marcel Dekker, New York
Vonesh E, Wang H, Nie L, Majumdar D (2002) Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. J Am Stat Assoc 97:271–283
Wei G, Tanner M (1990) A Monte-Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithm. J Am Stat Assoc 85:699–704
Wu L (2010) Mixed effects models for complex data. Chapman and Hall, London
Wu H, Ding A (1999) Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics 55:410–418
Zhang H, Wong H, Wu L (2018) A mechanistic nonlinear model for censored and mis-measured covariates in longitudinal models, with application in AIDS studies. Stat Med 37(1):167–178
Acknowledgements
This work is partially supported by the City University of New York High-Performance Computing Center, College of Staten Island, funded in part by the City and State of New York, City University of New York Research Foundation and National Science Foundation grants CNS-0958379, CNS-0855217, and ACI-112611.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
A Proofs of asymptotic properties
1.1 A.1 Regularity conditions
Consider models (4) and (5). Denote \({\varvec{\gamma }}=({\varvec{\alpha }}^{T},{\varvec{\beta }}^{T})^{T} \in \varGamma \), \({\varvec{\omega }}_{i}=({\varvec{a}}_{i}^{T}, {\varvec{b}}_{i}^{T})^{T}\), and
where \(f(\cdot )\) is a generic density function, \(f({\varvec{x}}|{\varvec{y}})\) is the conditional density of X given Y, and \(k_{i}=n_{i}+m_{i}\). Denote the estimates by \(\hat{{\varvec{\gamma }}}\) and \({\varvec{\hat{\omega }_{i}}}\) and
Similarly, we can define \(l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\), \(l_{i,{\varvec{\gamma }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\), \(l_{i,{\varvec{\gamma }}{\varvec{\omega }}_{i}}^{''}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\) and derivatives for \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\).
We denote convergence in probability as \(k_{i} \rightarrow \infty \) by \(o_{p}(1_{k_{i}})\), convergence in probability as \(n \rightarrow \infty \) by \(o_{p}(1_{n})\), and convergence in probability as both \(k_{i} \rightarrow \infty \) and \(n \rightarrow \infty \) by \(o_{p}(1_{k_{i},n})\). We show consistency and asymptotic normality under the following regularity conditions.
-
R1.
\(k_{i} = O(N) \) uniformly for \(i=1,\ldots ,n\), where \(N= \text {min}_{i}\{k_{i}\}\).
-
R2.
The variance-covariance parameters \({\varvec{\lambda }}=(\sigma ^{2}, A, B) \) are fixed and known, and the true parameter \({\varvec{\gamma }}_{0}\) is in the interior of \(\varGamma \). Quantities \(U_{i}\), \(\varLambda _{i}\), \(V_{i}\), \(\varSigma _{i}\) and D are evaluated at \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\). When \({\varvec{\lambda }}\) is unknown, we can simply replace it by its consistent estimate (in (12)).
-
R3.
The density \(f(x_{ij}|{\varvec{\omega }}_{i};{\varvec{\gamma }})\) and \(f(y_{ij}|{\varvec{\omega }}_{i};{\varvec{\gamma }})\) satisfy the necessary regularity conditions (e.g., Bradley and Gart 1962) such that, for fixed \({\varvec{\gamma }}\), the estimate of \({\varvec{\omega }}_{i}\) is \(\sqrt{k_{i}}\)-consistent for \({\varvec{\omega }}_{i}\) as \(k_{i} \rightarrow \infty \). Also, the necessary regularity conditions are assumed (e.g., Serfling 1980, p. 27, Theorem C) such that, by the Law of Large Numbers, the following hold (as \(k_{i} \rightarrow \infty \)):
$$\begin{aligned} \begin{aligned}&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i} + o_{p}(1_{k_{i}}), \\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}V_{i} + o_{p}(1_{k_{i}}),\\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}V_{i} + o_{p}(1_{k_{i}}), \\&-k_{i}^{-1} \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) = k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}U_{i} + o_{p}(1_{k_{i}}), \\ \end{aligned} \end{aligned}$$$$\begin{aligned} \begin{aligned}&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = U_{i}^{T}\varLambda _{i}^{-1}U_{i}, \\&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\omega }}_{i}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = V_{i}^{T}\varLambda _{i}^{-1}V_{i},\\&\text {E}_{{\varvec{\gamma }}|{\varvec{\omega }}}\left[ - \frac{\partial ^{2}}{\partial {\varvec{\gamma }}\partial {\varvec{\omega }}_{i}^{T}}l_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \right] = U_{i}^{T}\varLambda _{i}^{-1}V_{i}.\\ \end{aligned} \end{aligned}$$Finally, the matrices \(k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i}\) and \(k_{i}^{-1}V_{i}^{T}\varLambda _{i}^{-1}V_{i}\) are both assumed to be positive definite with finite determinants such that, for example, the smallest eigenvalue of \(k_{i}^{-1}U_{i}^{T}\varLambda _{i}^{-1}U_{i}\) 
exceeds \(\lambda _{0}\) for some \(\lambda _{0} > 0\).
-
R4.
For all \({\varvec{\gamma }}\in \varGamma \) and all the s-dimensional \({\varvec{\omega }}_{i}\in R^{s}\), the function \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) is three times continuously differentiable in \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\) for all \(x_{ij}\) and \(y_{ij}\), and \(L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) satisfies the conditions needed to interchange the order of integration and differentiation, as indicated in the proof.
-
R5.
For any \({\varvec{\gamma }}\in \varGamma \), there exist \(d_{1} > 0\) and \(\lambda _{1} > 0\) such that (a): for all \({\varvec{\gamma }}^{*} \in B_{d_{1}}({\varvec{\gamma }})\), where \(B_{d_{1}}({\varvec{\gamma }})\) is the \(\tau \)-dimensional sphere centered at \({\varvec{\gamma }}\) with radius \(d_{1}\), the following holds:
$$\begin{aligned} -\frac{1}{n} \sum _{i=1}^{n} \frac{\partial }{\partial {\varvec{\gamma }}^{T}}l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) \Big |_{{\varvec{\gamma }}={\varvec{\gamma }}^{*}} = \varOmega ({\varvec{\gamma }}^{*})^{-1} + o_{p}(1_{n}),\quad as \quad n \rightarrow \infty , \end{aligned}$$where \(\varOmega ({\varvec{\gamma }}^{*})^{-1}\) is positive definite with minimum eigenvalue greater than \(\lambda _{1}\) and
$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial {\varvec{\gamma }}^{T}}l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})) \Big |_{{\varvec{\gamma }}={\varvec{\gamma }}^{*}} =&\,\{ l_{i,{\varvec{\gamma }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) + l_{i,{\varvec{\gamma }}{\varvec{\omega }}_{i}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) \\&\times [l_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) + D^{-1} ]^{-1} l_{i,{\varvec{\omega }}{\varvec{\gamma }}}^{''}({\varvec{\gamma }}^{*},\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*}))\}; \end{aligned} \end{aligned}$$and (b): the first, second, and third derivatives of \(\sqrt{k_{i}}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) with respect to \({\varvec{\omega }}_{i}\) are uniformly bounded in \(B_{d_{1}}({\varvec{\gamma }})\).
-
R6.
At the true value \({\varvec{\gamma }}_{0}\), the following hold true:
$$\begin{aligned} \begin{aligned}&\text {E}_{{\varvec{\omega }}}(U_{i}^{T}\varSigma _{i}^{-1}U_{i}) = \varphi _{i}({\varvec{\gamma }}_{0}) \quad \text {exists for all} \quad i=1,\ldots ,n, \\&\lim _{n \rightarrow \infty } n^{-2} \sum _{i=1}^{n}\text {Cov}_{{\varvec{\omega }}}( U_{i}^{T}\varSigma _{i}^{-1}U_{i}) = 0, \quad \lim _{n \rightarrow \infty } n^{-1} \sum _{i=1}^{n} \varphi _{i}({\varvec{\gamma }}_{0}) = \varOmega ({\varvec{\gamma }}_{0})^{-1}, \\ \end{aligned} \end{aligned}$$where \(U_{i}\), \(V_{i}\), and \(\varSigma _{i}\) are evaluated at \({\varvec{\gamma }}_{0}\) and \({\varvec{\omega }}_{i}\), and \(\varOmega ({\varvec{\gamma }}_{0})^{-1}\) is positive definite.
-
R7.
The marginal densities, \(\int \exp \{k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\}d{\varvec{\omega }}_{i}\), satisfy the necessary regularity conditions such that the MLE of \({\varvec{\gamma }}\) exists and satisfies \((\hat{{\varvec{\gamma }}}_{MLE} - {\varvec{\gamma }}_{0}) = O_{p}(n^{-1/2})\).
Let \({\varvec{\gamma }}=({\varvec{\alpha }}^{T}, {\varvec{\beta }}^{T})^{T}\) be the mean parameters in models (4) and (5). Let \({{\varvec{\hat{\gamma }}}}_{\text {MLE}}\) be the (exact) MLE of \({\varvec{\gamma }}\) which maximizes the observed-data log-likelihood \(l_o({\varvec{\theta }})\) in (3), and let \(\hat{{\varvec{\gamma }}}\) be the approximate MLE based on the linearization method described in Sect. 3. Under the regularity conditions R1–R7, we have the following asymptotic results:
Theorem \(\varvec{A}_{\varvec{1}}\) The estimates \({{\varvec{\hat{\gamma }}}}_{\text {MLE}}\) and \({\varvec{\hat{\gamma }}}\) satisfy
where \({\varvec{\gamma }}_{0}\) is the true value of \({\varvec{\gamma }}\).
Theorem \(\varvec{A}_{\varvec{2}}\) The approximate estimate \({\varvec{\hat{\gamma }}}\) asymptotically follows a normal distribution
where \(\varOmega ({\varvec{\gamma }}_{0})\) is given in R5 and R6.
1.2 A.2 Estimating equations
In Sect. 3.1, for fixed \({\tilde{{\varvec{\lambda }}}}\), we obtain an approximate MLE \(\hat{{\varvec{\gamma }}}(\tilde{{\varvec{\lambda }}})\) and \(\hat{{\varvec{\omega }}}_{i}(\tilde{{\varvec{\lambda }}})\) using linearization. The linearization procedure is equivalent to maximizing a “complete-data” likelihood with respect to \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\), as shown below. Note that the maximization is taken over the random effects \({\varvec{\omega }}_{i}\) as well; it therefore does not yield the MLE directly, but is equivalent to the approximation obtained by integrating out the random effects. It is thus a computational device rather than exact maximum likelihood estimation (see, e.g., Vonesh et al. 2002; Lee et al. 2006).
In fact, we can write a complete-data log-likelihood as (suppressing \({\varvec{\lambda }}\))
where \({\varvec{a}}=({\varvec{a}}_{1}^{T},\ldots ,{\varvec{a}}_{n}^{T})^{T}\), \({\varvec{b}}=({\varvec{b}}_{1}^{T},\ldots ,{\varvec{b}}_{n}^{T})^{T}\), \({\varvec{x}}=({\varvec{x}}_{1}^{T},\ldots ,{\varvec{x}}_{n}^{T})^{T}\), \({\varvec{y}}=({\varvec{y}}_{1}^{T},\ldots ,{\varvec{y}}_{n}^{T})^{T}\), and \(l_{i} =\log f({\varvec{x}}_{i}|{\varvec{a}}_{i};{\varvec{\alpha }}) + \log f({\varvec{y}}_{i}|{\varvec{a}}_{i},{\varvec{b}}_{i};{\varvec{\alpha }},{\varvec{\beta }}) \), where the last two density functions are defined in (3).
Note that
where \({\varvec{h}}_{i}(\cdot )\), \({\varvec{g}}_{i}^{-1}(\cdot )\) and \(\varOmega _{i}\) are defined in Sect. 3.1. Recall \({\varvec{z}}_{i}= ({\varvec{x}}_{i}^{T},{\varvec{y}}_{i}^{T})^{T}\) and define \({\varvec{\mu }}_{i}=({\varvec{h}}_{i}^{T},{\varvec{g}}_{i}^{T})^{T}\). By directly applying the results of Prentice and Zhao (1991), we have that maximizing (14) using Fisher’s method of scoring is equivalent to solving the following set of estimating equations:
where \(U_{i}\), \(\varLambda _{i}\), \(V_{i}\), and D are defined in Sect. 3.1 (see also Vonesh et al. 2002).
By writing \(U^{T}=[U_{1}^{T},\ldots ,U_{n}^{T}]\), \(V^{T}=\text {diag}\{V_{1}^{T},\ldots ,V_{n}^{T}\}\), \({\varvec{z}}^{T}=[{\varvec{z}}_{1}^{T},\ldots ,{\varvec{z}}_{n}^{T}],{\varvec{\mu }}^{T}=[{\varvec{\mu }}_{1}^{T},\ldots ,{\varvec{\mu }}_{n}^{T}]\) and \(\varLambda =\text {diag}\{\varLambda _{1},\ldots ,\varLambda _{n}\}\), it can be shown (see Prentice and Zhao 1991) that solving (15) and (16) via Fisher’s method of scoring reduces to iteratively solving the following linear mixed-model equations
where \(\tilde{D} = \text {diag}\{D,\ldots ,D\}\), \({\varvec{z}}^{*}={\varvec{z}}- {\varvec{\mu }}+ U\hat{{\varvec{\gamma }}}^{(u+1)} + V\hat{{\varvec{\omega }}}^{(u+1)}\) and u is the iteration indicator.
The solution to Eq. (17) can be obtained by iteratively solving the following equations
where \({\varvec{z}}_{i}= ({\varvec{x}}_{i}^{T},{\varvec{y}}_{i}^{T})^{T}\) is defined in (7), and where \(U_{i}\), \(\varLambda _{i}\), and \(V_{i}\) are all evaluated at \((\hat{{\varvec{\gamma }}}^{(u)},\hat{{\varvec{\omega }}}_{i}^{(u)})\).
The solution to Eq. (18) is given in (9) and (11). Therefore, we have shown that for fixed \({\varvec{\lambda }}\), the final estimates \(\hat{{\varvec{\gamma }}}\) and \(\hat{{\varvec{\omega }}}_{i} =\hat{{\varvec{\omega }}}_{i}(\hat{{\varvec{\gamma }}})\) satisfy the estimating Eqs. (15) and (16) and maximize the complete-data log-likelihood function (14) with respect to \({\varvec{\gamma }}\) and \({\varvec{\omega }}_{i}\). These facts will be used to show the following asymptotic properties of \({\hat{{\varvec{\gamma }}}}\).
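As a concrete illustration of this iteration, consider a toy linear special case with a scalar fixed effect \(\gamma \) and scalar random effects \(\omega _{i}\): there the linear mixed-model equations reduce to a block Gauss–Seidel update alternating between \(\hat{\gamma }\) and the \(\hat{\omega }_{i}\). The sketch below is our own illustration (not the authors' code); the variance components \(\sigma ^{2}\) and d are treated as known, in line with R2.

```python
import random

random.seed(0)

# Toy linear mixed model: z_ij = gamma * t_ij + omega_i + eps_ij,
# eps_ij ~ N(0, sigma2), omega_i ~ N(0, d).  Here U_i has entries t_ij,
# V_i is a column of ones, Lambda_i = sigma2 * I, and D = d.
gamma_true, sigma2, d = 2.0, 0.25, 1.0
n, ni = 200, 5
t = [[random.uniform(0.0, 1.0) for _ in range(ni)] for _ in range(n)]
omega_true = [random.gauss(0.0, d ** 0.5) for _ in range(n)]
z = [[gamma_true * t[i][j] + omega_true[i] + random.gauss(0.0, sigma2 ** 0.5)
      for j in range(ni)] for i in range(n)]

# Block Gauss-Seidel on the mixed-model equations:
#   gamma   <- [sum U_i^T U_i]^{-1} sum U_i^T (z_i - V_i omega_i)
#   omega_i <- [V_i^T V_i + sigma2/d]^{-1} V_i^T (z_i - U_i gamma)
# (sigma2 cancels in the gamma update since Lambda_i = sigma2 * I).
gamma, omega = 0.0, [0.0] * n
for it in range(100):
    gamma_new = (sum(t[i][j] * (z[i][j] - omega[i])
                     for i in range(n) for j in range(ni))
                 / sum(t[i][j] ** 2 for i in range(n) for j in range(ni)))
    omega = [sum(z[i][j] - gamma_new * t[i][j] for j in range(ni))
             / (ni + sigma2 / d) for i in range(n)]
    if abs(gamma_new - gamma) < 1e-10:
        gamma = gamma_new
        break
    gamma = gamma_new

print(round(gamma, 3))  # close to gamma_true = 2.0
```

Because the toy model is linear, \(z^{*}=z\) and the iteration converges to the exact solution of the mixed-model equations; in the nonlinear case, \(U_{i}\), \(V_{i}\) and \(z^{*}\) would be re-evaluated at each pass.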
1.3 A.3 Consistency
We first note that, for fixed \({\varvec{\lambda }}\), the approximate MLE of \({\varvec{\gamma }}\) will satisfy the set of estimating equations (Vonesh et al. 2002)
Under R4, we have
Now we examine the term \(L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\omega }}_{i})\) in the above expression. Recall that \(k_{i}L_{i}({\varvec{\gamma }}, {\varvec{\omega }}_{i}) \equiv l_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i}) + \log f({\varvec{a}}_{i}) + \log f({\varvec{b}}_{i})\). Since the \(y_{ij}\) are conditionally independent given \({\varvec{\omega }}_{i}\), under R3 it follows that, conditional on \({\varvec{\omega }}_{i}\),
Furthermore, under R3 we have
Combining the results in (19) and (20), we can show that
Then, by direct application of the Laplace approximation to integrals of the form \(\int \exp \{kp(x)\}dx\) and \(\int q(x) \exp \{kp(x)\}dx\), where q(x) and p(x) are smooth functions in x with p(x) having a unique maximum at some point \(\hat{x}\), it can be shown (Barndorff-Nielsen and Cox 1989) that
and
where \(\hat{L}_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} = L_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''} ({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\). Because (19) implies \(\sqrt{k_{i}}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))=O_{p}(1)\), it follows from R1 that
Hence we have
where \(K({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) = \exp \{\sum _{i=1}^{n} k_{i}L_{i}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\} \prod _{i=1}^{n} (2\pi /|k_{i}L_{i,{\varvec{\omega }}{\varvec{\omega }}}^{''}|)^{b/2}.\) Since \(K({\varvec{\gamma }}, {\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) \ne 0\) for all \({\varvec{\gamma }}\in \varGamma \), the MLE \(\hat{{\varvec{\gamma }}}_{MLE}\) of \({\varvec{\gamma }}\) satisfies
where \(J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }})) = \sum _{i=1}^{n}k_{i}L_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }})) = \sum _{i=1}^{n} l_{i,{\varvec{\gamma }}}^{'}({\varvec{\gamma }},{\varvec{\hat{\omega }_{i}}}({\varvec{\gamma }}))\) is the set of estimating equations for \({\varvec{\gamma }}\) conditional on fixed \({\varvec{\lambda }}\), as given in (19). By taking a first-order Taylor series expansion of \(J_{1}({\varvec{\gamma }},\hat{{\varvec{\omega }}}({\varvec{\gamma }}))\) about \(\hat{{\varvec{\gamma }}}_{MLE}\) and noting that, from (22), \(J_{1}(\hat{{\varvec{\gamma }}}_{MLE},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}}_{MLE})) = O_{p}(nN^{-1/2})\), we have
where
and \({\varvec{\gamma }}^{*}\) is on the line segment joining \({\varvec{\gamma }}\) to \(\hat{{\varvec{\gamma }}}_{MLE}\). By applying the chain rule, we have, for any \({\varvec{\gamma }}\in \varGamma \),
Note that \(\hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})\) maximizes \(k_{i}L_{i}({\varvec{\gamma }},{\varvec{\omega }}_{i})\) by definition, which implies
Applying the chain rule once again, we have
Solving the above equation for \([\partial {\hat{{\varvec{\omega }}}}_{i}({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T}]\), we have
Substituting this expression of \([\partial \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T}]\) in (24), it follows from R1, R3 and R5 that as \(n\rightarrow \infty \)
which implies \(-\frac{1}{n} J_{1}^{'}({\varvec{\gamma }}^{*}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*})) \xrightarrow {\text {p}} \varOmega ({\varvec{\gamma }}^{*})^{-1} \).
It follows from (25) that for sufficiently large n, \(-\frac{1}{n} J_{1}^{'}({\varvec{\gamma }}^{*}, \hat{{\varvec{\omega }}}_{i}({\varvec{\gamma }}^{*}))> \frac{1}{2}\varOmega ({\varvec{\gamma }}^{*})^{-1} > \frac{\lambda _{1}}{2}I\) with probability 1, where \(\lambda _{1}\) is defined in R5. This, together with (23) and the fact that for sufficiently large N, \(\parallel O_{p}(N^{-1/2})\parallel \le \lambda _{1} \delta /4 \) for any \(0< \delta < d_{1}\) (\(d_{1}\) as defined in R5), imply that, conditional on \({\varvec{\omega }}= ({\varvec{\omega }}_{1},\ldots , {\varvec{\omega }}_{n})\),
Thus, since \(\hat{{\varvec{\gamma }}}\) satisfies \(J_{1}(\hat{{\varvec{\gamma }}},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}})) = \mathbf{0}\), it follows using standard arguments that
Hence, using the bounded convergence theorem, we have
Since \(J_{1}(\hat{{\varvec{\gamma }}},\hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}})) = \mathbf{0}\) and \(J_{1}(\hat{{\varvec{\gamma }}}_{MLE}, \hat{{\varvec{\omega }}}(\hat{{\varvec{\gamma }}}_{MLE}))=O_{p}(nN^{-1/2})\) by (23), it follows from (25) that
where \({\varvec{\gamma }}^{**}\) is on the line segment joining \(\hat{{\varvec{\gamma }}}\) to \(\hat{{\varvec{\gamma }}}_{MLE}\). From (27), \({\varvec{\gamma }}^{**}\) lies in the interior; it then follows from R5 that \(-J_{1}^{'}({\varvec{\gamma }}^{**},\hat{{\varvec{\omega }}}({\varvec{\gamma }}^{**})) \ge \frac{n}{2}\varOmega ({\varvec{\gamma }}^{**})^{-1}\), where \(\varOmega ({\varvec{\gamma }}^{**})^{-1}\) is positive definite. Hence the estimator satisfies \((\hat{{\varvec{\gamma }}} - \hat{{\varvec{\gamma }}}_{MLE}) = O_{p}(N^{-1/2})\), from which it follows (given R7) that
the desired result.
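The order of the Laplace approximation invoked above can be checked numerically in a one-dimensional toy case of our own: for a smooth p(x) with a unique interior maximum \(\hat{x}\), the relative error of \(\int \exp \{kp(x)\}dx \approx \exp \{kp(\hat{x})\}\,(2\pi /(k|p^{''}(\hat{x})|))^{1/2}\) decays like \(O(k^{-1})\), consistent with the \(O_{p}(k_{i}^{-1})\) remainders used in the proof.

```python
import math

# Laplace approximation check for p(x) = -x^2/2 - x^4/8, which has its
# unique maximum at xhat = 0 with p(0) = 0 and p''(0) = -1, so the
# approximation is exp{k p(0)} * sqrt(2*pi/k) = sqrt(2*pi/k).

def p(x):
    return -x * x / 2.0 - x ** 4 / 8.0

def laplace_error(k, grid=200000, half_width=6.0):
    # "Exact" integral by the midpoint rule on [-half_width, half_width].
    h = 2.0 * half_width / grid
    exact = sum(math.exp(k * p(-half_width + (j + 0.5) * h)) * h
                for j in range(grid))
    approx = math.sqrt(2.0 * math.pi / k)
    return abs(approx - exact) / exact

err5, err50 = laplace_error(5.0), laplace_error(50.0)
print(err5, err50)  # the relative error shrinks roughly like 1/k
```

Multiplying k by 10 shrinks the relative error by roughly a factor of 10, which is the \(O(k^{-1})\) behavior the asymptotic argument relies on.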
1.4 A.4 Asymptotic normality of \(\hat{{\varvec{\gamma }}}\)
The asymptotic normality of \(\hat{{\varvec{\gamma }}}\) will be shown based on the estimation equations (15) and (16). Let
where \(U_{i}, \varSigma _{i}\) and \({\varvec{z}}_{i}\) are defined in Sect. 3.1. The estimator \(\hat{{\varvec{\gamma }}}\) satisfies \(\varPhi (\hat{{\varvec{\gamma }}}) = \mathbf{0}\) at convergence. Noting that \(\partial \varPhi ({\varvec{\gamma }})/\partial {\varvec{\gamma }}^{T} = -\sum _{i=1}^{n}U_{i}^{T}\varSigma _{i}^{-1}U_{i}\) does not depend on \({\varvec{\gamma }}\), we take a Taylor series expansion of \(\varPhi (\hat{{\varvec{\gamma }}})\) around the true parameter \({\varvec{\gamma }}_{0}\):
which implies
Since \(E[U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0})]=0\) and \(\text {Cov}[U_{i}^{T}\varSigma _{i}^{-1}({\varvec{z}}_{i}- U_{i}{\varvec{\gamma }}_{0})]=U_{i}^{T}\varSigma _{i}^{-1}U_{i}\), by the Lindeberg Central Limit Theorem, we have
Noting that \(\hat{{\varvec{\gamma }}} = {\varvec{\gamma }}_{0} + o_{p}(1_{N,n})\) and \(k_{i}=O(N)\), and using (20), we have
Hence, it follows by the Law of Large Numbers and R6 that
Using Slutsky’s theorem, we can show that
We can extend the foregoing proofs to the case where \({\varvec{\lambda }}\) is unknown by replacing \({\varvec{\lambda }}\) with its consistent estimate \(\hat{{\varvec{\lambda }}}\). Note that at the estimate \(\hat{{\varvec{\gamma }}}\), the estimate \(\hat{{\varvec{\lambda }}}\) given in (12) can be shown to be consistent for \({\varvec{\lambda }}\) as \(n \rightarrow \infty \) and \(N\rightarrow \infty \) (see, e.g., Demidenko 2004).
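The normality result can also be illustrated by simulation in a scalar toy version of the estimating equation \(\varPhi ({\varvec{\gamma }})\): with working model \(z_{i}=U_{i}\gamma _{0}+e_{i}\) and \(\text {Var}(e_{i})=\varSigma _{i}\), the solution is the weighted least-squares estimator, and the standardized estimator should be approximately standard normal. The sketch below is our own example, with scalar \(U_{i}\) and \(\varSigma _{i}\).

```python
import math
import random

random.seed(1)

# Scalar toy version of the estimating equation
#   Phi(gamma) = sum_i U_i Sigma_i^{-1} (z_i - U_i gamma) = 0,
# whose solution is weighted least squares.  With total information
# I_n = sum_i U_i^2 / Sigma_i (the analogue of n * Omega(gamma_0)^{-1}),
# sqrt(I_n) * (gamma_hat - gamma_0) should be approximately N(0, 1).
gamma0, n, reps = 1.5, 50, 2000
U = [random.uniform(0.5, 1.5) for _ in range(n)]
Sig = [random.uniform(0.5, 2.0) for _ in range(n)]
info = sum(u * u / s for u, s in zip(U, Sig))

stats = []
for _ in range(reps):
    z = [U[i] * gamma0 + random.gauss(0.0, math.sqrt(Sig[i])) for i in range(n)]
    ghat = sum(U[i] * z[i] / Sig[i] for i in range(n)) / info
    stats.append(math.sqrt(info) * (ghat - gamma0))

mean = sum(stats) / reps
sd = math.sqrt(sum((s - mean) ** 2 for s in stats) / (reps - 1))
print(round(mean, 2), round(sd, 2))  # approximately 0 and 1
```

The empirical mean and standard deviation of the standardized statistic match the standard normal limit, mirroring Theorem A2 in this linear special case.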
B Monte Carlo (MC) EM algorithm
The EM algorithm is a standard approach for likelihood estimation in the presence of missing data. In our case, by treating the random effects \(({\varvec{a}}_{i}\), \({\varvec{b}}_{i})\) as “missing data”, we can write “complete data” as \(\big \{ ({\varvec{x}}_{i},{\varvec{y}}_{i}, {\varvec{a}}_{i}, {\varvec{b}}_{i}), i=1,\ldots , n \big \}\). Let \({\varvec{\theta }}\) be the collection of all parameters. Then, the “complete-data” log-likelihood function for individual i can be expressed as
The EM algorithm iterates between the E-step and the M-step until convergence. Let \({\varvec{\theta }}^{(t)}\) be the parameter estimate from the tth EM iteration. The E-step for individual i at the \((t+1)\)th EM iteration can be expressed as
The above E-step again involves an intractable integral. However, because expression (29) is an expectation with respect to \(f({\varvec{a}}_{i},{\varvec{b}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i};{\varvec{\theta }}^{(t)})\), it can be evaluated using the MCEM algorithm (Wei and Tanner 1990; Booth and Hobert 1999; Ibrahim et al. 1999). Specifically, we may use the Gibbs sampler to generate many samples from this conditional distribution by iteratively sampling from the full conditionals \([{\varvec{a}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i},{\varvec{b}}_{i};{\varvec{\theta }}^{(t)}]\) and \([{\varvec{b}}_{i}|{\varvec{x}}_{i},{\varvec{y}}_{i},{\varvec{a}}_{i};{\varvec{\theta }}^{(t)}]\). Monte Carlo samples from each of these full conditionals can be generated using rejection sampling methods.
After generating large random samples from the conditional distribution of \(({\varvec{a}}_{i},{\varvec{b}}_{i})\) given the observed data, we can approximate the expectation \(Q_{i}({\varvec{\theta }}|{\varvec{\theta }}^{(t)})\) in the E-step by its empirical mean, with the “missing data” replaced by simulated values. The M-step, which maximizes \(\sum _{i=1}^{n} Q_{i}({\varvec{\theta }}|{\varvec{\theta }}^{(t)})\), is then a complete-data maximization, so standard complete-data optimization procedures such as the Newton–Raphson method may be used to update the parameter estimates. At convergence, we obtain the MLE of \({\varvec{\theta }}\), or possibly a local maximum; different starting values may be tried to check roughly whether a global maximum has been reached. For the MCEM algorithm, the major computational challenge is therefore the implementation of the E-step. Although the MCEM method is not conceptually new, implementing it for complicated models such as those described above can be challenging and tedious, since it involves non-trivial programming as well as convergence issues such as slow convergence or non-convergence.
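The E-step/M-step cycle can be sketched in a conjugate toy model of our own (not the authors' implementation): subject means \(\bar{y}_{i}\) with within-subject noise of known variance, and a scalar random effect \(b_{i} \sim N(\mu , \tau ^{2})\) playing the role of the "missing data". In this toy case the conditional of \(b_{i}\) given the data is an explicit normal, so direct draws stand in for the Gibbs/rejection samples needed in the general nonlinear models.

```python
import random

random.seed(2)

# Toy MCEM: y_ij = b_i + eps_ij with eps_ij ~ N(0, sigma2) known, and
# b_i ~ N(mu, tau2); theta = (mu, tau2).  We work directly with the
# subject means ybar_i, which are sufficient here (ybar_i | b_i has
# variance sigma2/ni).
sigma2 = 1.0
mu_true, tau2_true = 1.0, 1.0
n, ni, M = 100, 5, 200
b = [random.gauss(mu_true, tau2_true ** 0.5) for _ in range(n)]
ybar = [b[i] + random.gauss(0.0, (sigma2 / ni) ** 0.5) for i in range(n)]

mu, tau2 = 0.0, 2.0  # starting values
for t in range(30):
    # MC E-step: draw M samples of b_i from its conditional given the data
    # (an explicit normal here; a Gibbs/rejection sampler in general).
    draws = []
    for i in range(n):
        prec = ni / sigma2 + 1.0 / tau2
        post_mean = (ni * ybar[i] / sigma2 + mu / tau2) / prec
        draws.append([random.gauss(post_mean, prec ** -0.5) for _ in range(M)])
    # M-step: maximize the empirical mean of the complete-data
    # log-likelihood, i.e. update mu and tau2 from the simulated b_i.
    mu = sum(sum(d) / M for d in draws) / n
    tau2 = sum(sum((x - mu) ** 2 for x in d) / M for d in draws) / n

print(round(mu, 2), round(tau2, 2))  # near (mu_true, tau2_true)
```

Each iteration re-draws the "missing" random effects at the current parameter value and then performs a closed-form complete-data update; the Monte Carlo sample size M controls the simulation noise in the E-step.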
To obtain the variance-covariance matrix of the MLE \(\hat{{\varvec{\theta }}}\), we could consider the following approximate formula (McLachlan and Krishnan 1997). Let \(S_{c}^{(i)} = \partial l_{c}^{(i)}/\partial {\varvec{\theta }}\), where \(l_{c}^{(i)}\) is the complete-data log-likelihood for individual i. Then an approximate formula for the variance-covariance matrix of \({\varvec{\hat{\theta }}}\) is
where the expectation can be approximated by Monte Carlo empirical means, as above.
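Continuing the conjugate toy model above (with \(\tau ^{2}\) and \(\sigma ^{2}\) known and \({\varvec{\theta }}=\mu \)), one standard version of such an approximate formula (McLachlan and Krishnan 1997) inverts the sum of outer products of the conditional scores \(s_{i}=\text {E}(S_{c}^{(i)}|{\varvec{y}}_{i};\hat{{\varvec{\theta }}})\), with the conditional expectation replaced by a Monte Carlo mean. The sketch below is our own hedged illustration of that idea, not the paper's exact displayed formula.

```python
import math
import random

random.seed(3)

# Toy model: ybar_i | b_i ~ N(b_i, sigma2/ni), b_i ~ N(mu, tau2), with
# sigma2 and tau2 known and theta = mu.  The complete-data score is
# S_c^(i) = (b_i - mu) / tau2, so s_i = (E[b_i | y_i] - mu_hat) / tau2,
# approximated below by a Monte Carlo mean, and
#   Var(mu_hat) ~ 1 / sum_i s_i^2.
sigma2, tau2 = 1.0, 1.0
mu_true, n, ni, M = 1.0, 100, 5, 500
ybar = [random.gauss(mu_true, math.sqrt(tau2 + sigma2 / ni)) for _ in range(n)]
mu_hat = sum(ybar) / n  # MLE of mu in this balanced toy model

prec = ni / sigma2 + 1.0 / tau2
s = []
for i in range(n):
    post_mean = (ni * ybar[i] / sigma2 + mu_hat / tau2) / prec
    draws = [random.gauss(post_mean, prec ** -0.5) for _ in range(M)]
    s.append((sum(draws) / M - mu_hat) / tau2)  # MC estimate of s_i

var_mc = 1.0 / sum(si * si for si in s)
var_exact = (tau2 + sigma2 / ni) / n  # Var(mu_hat) from the marginal model
print(var_mc, var_exact)  # the two variances should be close
```

In this toy case the marginal variance of \(\hat{\mu }\) is available in closed form, so the Monte Carlo score-based approximation can be checked directly against it.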
Zhang, H., Wu, L. An approximate method for generalized linear and nonlinear mixed effects models with a mechanistic nonlinear covariate measurement error model. Metrika 82, 471–499 (2019). https://doi.org/10.1007/s00184-018-0690-z