Abstract
Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study comparing the hospital length of stay (LOS) between patients undergoing percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) in several international clinical trials, we propose a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOSs, where each clinical trial is considered a cluster. Because some trials enrolled a large number of patients, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run: it could not allocate enough memory to invert the high-dimensional matrices that arise during the optimization process. We consider ways to circumvent this computational problem in maximum likelihood (ML) inference and restricted maximum likelihood (REML) inference. In particular, we develop an expectation-maximization (EM) algorithm for REML inference and present an ML implementation using existing software. The new REML EM algorithm is easy to implement, computationally stable, and efficient. With this REML EM algorithm, we were able to analyze the LOS data and obtain meaningful results.
Acknowledgments
The work of D. Zhang was supported by NIH Grant R01 CA85848-12. The work of J. L. Sun and K. Pieper was supported through a grant from the Duke Clinical Research Institute. The authors are grateful to Dr. Eric Peterson for his institutional financial support and for providing the LOS data, without which this research would not have been possible. The authors declare no conflict of interest.
Appendix: Properties of the REML EM Algorithm
The following theorem states the general EM algorithm for the REML estimation.
Theorem 1
Suppose \(f(y|b; \beta , \theta )\) is the conditional probability density function of y given random effects b and \(f(b; \theta )\) is the probability density function of the random effects b. Assume the following “REML” likelihood
\[ L_{R}(\theta ; y) = \int \int f(y|b; \beta , \theta ) f(b; \theta )\, db\, d\beta \]
exists, so that \(g(\beta , b|y; \theta ) = f(y|b; \beta , \theta ) f(b; \theta )/L_{R}(\theta ; y)\) is a probability density function. Given estimate \(\theta ^{(t)}\) at the tth iteration, define the Q-function for the REML algorithm as follows:
\[ Q_\mathrm{R}(\theta |\theta ^{(t)}) = {E}_\mathrm{g} \left[ \log f(y|b; \beta , \theta ) + \log f(b; \theta ) \right] , \]
where \({E}_\mathrm{g}\) stands for the expectation taken with respect to \(g(\beta , b|y; \theta ^{(t)})\). Denote by \(\theta ^{(t+1)}\) the update of \(\theta \) obtained by maximizing \(Q_\mathrm{R}(\theta |\theta ^{(t)})\) with respect to \(\theta \); then
\[ \ell _{R}(\theta ^{(t+1)}; y) \ge \ell _{R}(\theta ^{(t)}; y), \]
where \(\ell _{R}(\theta ; y) = \log L_{R}(\theta ; y)\).
Proof
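By the definition of \(L_{R}(\theta ; y)\), we have
\[ \ell _{R}(\theta ; y) = \log f(y|b; \beta , \theta ) + \log f(b; \theta ) - \log g(\beta , b|y; \theta ). \]
Taking expectation with respect to \(g(\beta , b|y; \theta ^{(t)})\) on both sides of the above equation leads to
\[ \ell _{R}(\theta ; y) = Q_\mathrm{R}(\theta |\theta ^{(t)}) - {E}_\mathrm{g} \log g(\beta , b|y; \theta ). \]
Denote \(H(\theta ) = {E}_\mathrm{g} \log g(\beta , b|y; \theta )\). Then
\[ \ell _{R}(\theta ; y) = Q_\mathrm{R}(\theta |\theta ^{(t)}) - H(\theta ). \]
By the definition of \(H(\theta )\), \({E}_\mathrm{g}\) and Jensen’s inequality, we have
\[ H(\theta ) - H(\theta ^{(t)}) = {E}_\mathrm{g} \log \frac{g(\beta , b|y; \theta )}{g(\beta , b|y; \theta ^{(t)})} \le \log {E}_\mathrm{g} \frac{g(\beta , b|y; \theta )}{g(\beta , b|y; \theta ^{(t)})} = \log \int \int g(\beta , b|y; \theta )\, db\, d\beta = 0. \]
Since \(\theta ^{(t+1)}\) maximizes \(Q_\mathrm{R}(\theta |\theta ^{(t)})\), it follows that \(Q_\mathrm{R}(\theta ^{(t+1)}|\theta ^{(t)}) \ge Q_\mathrm{R}(\theta ^{(t)}|\theta ^{(t)})\). Therefore
\[ \ell _{R}(\theta ^{(t+1)}; y) - \ell _{R}(\theta ^{(t)}; y) = \left\{ Q_\mathrm{R}(\theta ^{(t+1)}|\theta ^{(t)}) - Q_\mathrm{R}(\theta ^{(t)}|\theta ^{(t)}) \right\} - \left\{ H(\theta ^{(t+1)}) - H(\theta ^{(t)}) \right\} \ge 0. \]
\(\square \)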
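The monotonicity stated in Theorem 1 can be checked numerically on a small example. The sketch below is not the bivariate model or the implementation used in this article; it assumes a simple univariate random-intercept model \(y = X\beta + Zb + e\) with \(b \sim N(0, \sigma _b^2 I)\) and \(e \sim N(0, \sigma _e^2 I)\), for which \(g(\beta , b|y; \theta )\) is Gaussian with precision given by the coefficient matrix of the mixed model equations. The function names, the simulated data, and the starting values are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_clustered_data(seed=0, n_clusters=8, cluster_size=5):
    """Simulate a toy random-intercept data set (illustrative values only)."""
    rng = np.random.default_rng(seed)
    n = n_clusters * cluster_size
    X = np.column_stack([np.ones(n), rng.normal(size=n)])      # intercept + covariate
    Z = np.kron(np.eye(n_clusters), np.ones((cluster_size, 1)))  # cluster indicators
    b = rng.normal(scale=1.5, size=n_clusters)
    y = X @ np.array([1.0, 0.5]) + Z @ b + rng.normal(size=n)
    return y, X, Z

def reml_loglik(y, X, Z, s2b, s2e):
    """REML log-likelihood up to an additive constant free of (s2b, s2e)."""
    n = X.shape[0]
    V = s2b * Z @ Z.T + s2e * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)
    XtVinvX = X.T @ Vinv_X
    beta_hat = np.linalg.solve(XtVinvX, Vinv_X.T @ y)  # GLS estimate of beta
    r = y - X @ beta_hat
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XtVinvX = np.linalg.slogdet(XtVinvX)
    return -0.5 * (logdet_V + logdet_XtVinvX + r @ np.linalg.solve(V, r))

def reml_em_step(y, X, Z, s2b, s2e):
    """One EM update: expectations over g(beta, b | y; theta_t), then maximize Q_R."""
    n, p = X.shape
    q = Z.shape[1]
    W = np.hstack([X, Z])
    # Precision of (beta, b) under g: the mixed model equations coefficient matrix.
    P = W.T @ W / s2e
    P[p:, p:] += np.eye(q) / s2b
    C = np.linalg.inv(P)             # covariance of (beta, b) under g
    u_hat = C @ (W.T @ y) / s2e      # mean of (beta, b) under g
    b_hat = u_hat[p:]
    resid = y - W @ u_hat
    # E_g ||y - X beta - Z b||^2 = ||resid||^2 + tr(W C W')
    s2e_new = (resid @ resid + np.sum((W @ C) * W)) / n
    # E_g ||b||^2 = ||b_hat||^2 + tr(C_bb)
    s2b_new = (b_hat @ b_hat + np.trace(C[p:, p:])) / q
    return s2b_new, s2e_new

def run_reml_em(y, X, Z, s2b=1.0, s2e=1.0, n_iter=30):
    """Iterate the EM step, recording (s2b, s2e, REML log-likelihood)."""
    history = [(s2b, s2e, reml_loglik(y, X, Z, s2b, s2e))]
    for _ in range(n_iter):
        s2b, s2e = reml_em_step(y, X, Z, s2b, s2e)
        history.append((s2b, s2e, reml_loglik(y, X, Z, s2b, s2e)))
    return history
```

Calling `run_reml_em(*simulate_clustered_data())` returns the sequence of variance estimates together with the REML log-likelihood at each iteration, which should be non-decreasing up to floating-point error. The log-likelihood function here forms the full \(n \times n\) covariance matrix only because the example is small; the EM step itself inverts a matrix whose size is the number of fixed plus random effects, not the number of observations.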
Cite this article
Zhang, D., Sun, J.L. & Pieper, K. Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes. Stat Biosci 8, 220–233 (2016). https://doi.org/10.1007/s12561-015-9140-x
DOI: https://doi.org/10.1007/s12561-015-9140-x