Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes

Statistics in Biosciences

Abstract

Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study comparing the hospital length of stay (LOS) between patients undergoing percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) in several international clinical trials, we propose a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOSs, where each clinical trial is treated as a cluster. Because of the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run: it could not allocate enough memory to invert the high-dimensional matrices required during optimization. We consider ways to circumvent this computational problem in maximum likelihood (ML) and restricted maximum likelihood (REML) inference. In particular, we develop an expectation-maximization (EM) algorithm for REML inference and present an ML implementation using existing software. The new REML EM algorithm is easy to implement, computationally stable, and efficient. With it, we were able to analyze the LOS data and obtain meaningful results.

References

  1. Chan M, Sun J, Newby L, Lokhnygina Y, White HD, Moliterno DJ, Théroux P, Ohman EM, Simoons ML, Mahaffey KW, Pieper KS, Giugliano RG, Armstrong PW, Califf RM, Van de Werf F, Harrington RA (2012) Trends in clinical trials of non-ST-segment elevation acute coronary syndromes over 15 years. Int J Cardiol 167(2):548–554. doi:10.1016/j.ijcard.2012.01.065

  2. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38

  3. Diggle PJ, Heagerty P, Liang K-Y, Zeger SL (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford

  4. Harville DA (1974) Bayesian inference for variance components using only error contrasts. Biometrika 61:383–385

  5. Henderson CR (1984) Applications of linear models in animal breeding. University of Guelph, Guelph

  6. Laird NM, Ware JH (1982) Random effects models for longitudinal data. Biometrics 38:963–974

  7. SAS Institute Inc. (2013) SAS/STAT 9.3 User’s Guide. SAS Institute Inc., Cary

Acknowledgments

The work of D. Zhang was supported by NIH Grant R01 CA85848-12. The work of J. L. Sun and K. Pieper was supported through a grant from the Duke Clinical Research Institute. The authors are grateful to Dr. Eric Peterson for his institutional financial support and for providing the LOS data, without which this research would not have been possible. The authors declare no conflict of interest.

Author information

Corresponding author

Correspondence to Daowen Zhang.

Appendix: Properties of the REML EM Algorithm

The following theorem states the general EM algorithm for REML estimation and shows that no iteration decreases the REML log-likelihood.

Theorem 1

Suppose \(f(y|b; \beta , \theta )\) is the conditional probability density function of y given random effects b and \(f(b; \theta )\) is the probability density function of the random effects b. Assume the following “REML” likelihood

$$\begin{aligned} L_{R}(\theta ; y) = \int f(y|b; \beta , \theta ) f(b; \theta ) \mathrm{d}\beta \mathrm{d}b \end{aligned}$$

exists, so that \(g(\beta , b|y; \theta ) = f(y|b; \beta , \theta ) f(b; \theta )/L_{R}(\theta ; y)\) is a probability density function. Given the estimate \(\theta ^{(t)}\) at the tth iteration, define the Q-function for the REML EM algorithm as follows:

$$\begin{aligned} Q_\mathrm{R}(\theta |\theta ^{(t)}) = {E}_\mathrm{g} [ \log \{f(y|b; \beta , \theta ) f(b; \theta ) \}], \end{aligned}$$

where \({E}_\mathrm{g}\) stands for the expectation taken with respect to \(g(\beta , b|y; \theta ^{(t)})\). Denote by \(\theta ^{(t+1)}\) the update of \(\theta \) obtained by maximizing \(Q_\mathrm{R}(\theta |\theta ^{(t)})\) with respect to \(\theta \). Then

$$\begin{aligned} \ell _{R}(\theta ^{(t+1)}; y) \ge \ell _{R}(\theta ^{(t)}; y), \quad \text{ for } \text{ all } t, \end{aligned}$$

where \(\ell _{R}(\theta ; y) = \log L_{R}(\theta ; y)\).

Proof

By the definition of \(L_{R}(\theta ; y)\), we have

$$\begin{aligned} \log \left\{ f(y|b; \beta , \theta ) f(b; \theta ) \right\} = \log g(\beta , b|y; \theta ) + \ell _{R}(\theta ; y). \end{aligned}$$

Taking the expectation with respect to \(g(\beta , b|y; \theta ^{(t)})\) on both sides of the above equation leads to

$$\begin{aligned} Q_\mathrm{R}(\theta |\theta ^{(t)}) = {E}_\mathrm{g} \log g(\beta , b|y; \theta ) + \ell _{R}(\theta ; y). \end{aligned}$$

Denote \(H(\theta ) = {E}_\mathrm{g} \log g(\beta , b|y; \theta )\). Then

$$\begin{aligned}&\ell _{R}\left( \theta ^{(t+1)}; y\right) - \ell _{R}\left( \theta ^{(t)}; y\right) \\&\quad = Q_\mathrm{R}\left( \theta ^{(t+1)}|\theta ^{(t)}\right) - Q_\mathrm{R}\left( \theta ^{(t)}|\theta ^{(t)}\right) - \left\{ H(\theta ^{(t+1)}) - H(\theta ^{(t)})\right\} . \end{aligned}$$

By the definition of \(H(\theta )\), \({E}_\mathrm{g}\) and Jensen’s inequality, we have

$$\begin{aligned} H(\theta ^{(t+1)}) - H(\theta ^{(t)})= & {} {E}_\mathrm{g} \log \left\{ \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \right\} \\\le & {} \log {E}_\mathrm{g} \left\{ \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \right\} \\= & {} \log (1) =0. \end{aligned}$$
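
The last equality in the display above can be spelled out as follows. Because \({E}_\mathrm{g}\) integrates against \(g(\beta , b|y; \theta ^{(t)})\), the denominator cancels and what remains is the integral of a probability density (assuming, as the ratio implicitly requires, that the two densities share the same support):

$$\begin{aligned} {E}_\mathrm{g} \left\{ \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \right\}= & {} \int \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \, g(\beta , b|y; \theta ^{(t)}) \, \mathrm{d}\beta \, \mathrm{d}b \\= & {} \int g(\beta , b|y; \theta ^{(t+1)}) \, \mathrm{d}\beta \, \mathrm{d}b = 1. \end{aligned}$$

The inequality in the middle line is Jensen’s inequality applied to the concave function \(\log \).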

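Since \(\theta ^{(t+1)}\) maximizes \(Q_\mathrm{R}(\theta |\theta ^{(t)})\), it follows that \(Q_\mathrm{R}(\theta ^{(t+1)}|\theta ^{(t)}) \ge Q_\mathrm{R}(\theta ^{(t)}|\theta ^{(t)})\). Combining this with \(H(\theta ^{(t+1)}) - H(\theta ^{(t)}) \le 0\) in the decomposition of \(\ell _{R}(\theta ^{(t+1)}; y) - \ell _{R}(\theta ^{(t)}; y)\) gives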

$$\begin{aligned} \ell _{R}(\theta ^{(t+1)}; y) \ge \ell _{R}(\theta ^{(t)}; y), \quad \text{ for } \text{ all } t. \end{aligned}$$

\(\square \)
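
The monotonicity in Theorem 1 can be checked numerically. The following is a minimal sketch, not the bivariate LOS model of the paper: it assumes a univariate random-intercept model \(y_{ij} = \beta + b_i + e_{ij}\) with \(b_i \sim N(0, \sigma _b^2)\) and \(e_{ij} \sim N(0, \sigma _e^2)\), fitted to simulated data. Under that assumption \(g(\beta , b|y; \theta ^{(t)})\) is Gaussian, so the E-step and M-step have closed forms. The function names `reml_loglik` and `reml_em_step`, the cluster sizes, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def reml_loglik(y, X, Z, s2b, s2e):
    """REML log-likelihood (up to an additive constant) for y = X beta + Z b + e."""
    n = len(y)
    V = s2b * (Z @ Z.T) + s2e * np.eye(n)        # marginal covariance of y
    Vinv_X = np.linalg.solve(V, X)
    XtVinvX = X.T @ Vinv_X
    beta_hat = np.linalg.solve(XtVinvX, Vinv_X.T @ y)  # generalized least squares
    r = y - X @ beta_hat
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    return -0.5 * (logdet_V + logdet_XVX + r @ np.linalg.solve(V, r))


def reml_em_step(y, X, Z, s2b, s2e):
    """One REML EM update, treating u = (beta, b) jointly as missing data."""
    n, p = X.shape
    m = Z.shape[1]
    W = np.hstack([X, Z])
    WtW = W.T @ W
    # Coefficient matrix of the mixed model equations; g(u | y) is Gaussian
    # with mean C^{-1} W'y and covariance s2e * C^{-1}.
    C = WtW.copy()
    C[p:, p:] += (s2e / s2b) * np.eye(m)
    C_inv = np.linalg.inv(C)
    u_mean = C_inv @ (W.T @ y)
    u_cov = s2e * C_inv
    # M-step: maximize the expected complete-data log-likelihood.
    resid = y - W @ u_mean
    s2e_new = (resid @ resid + np.trace(WtW @ u_cov)) / n
    b_mean = u_mean[p:]
    s2b_new = (b_mean @ b_mean + np.trace(u_cov[p:, p:])) / m
    return s2b_new, s2e_new


# Simulated clustered data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m = 8
sizes = rng.integers(3, 12, size=m)              # unequal cluster sizes
cluster = np.repeat(np.arange(m), sizes)
n = len(cluster)
X = np.ones((n, 1))                              # intercept only
Z = np.zeros((n, m))
Z[np.arange(n), cluster] = 1.0                   # cluster indicator matrix
y = 2.0 + rng.normal(0.0, 1.5, m)[cluster] + rng.normal(0.0, 1.0, n)

s2b, s2e = 1.0, 1.0                              # starting values
history = [reml_loglik(y, X, Z, s2b, s2e)]
for _ in range(50):
    s2b, s2e = reml_em_step(y, X, Z, s2b, s2e)
    history.append(reml_loglik(y, X, Z, s2b, s2e))

increments = np.diff(history)
print("smallest increment in the REML log-likelihood:", increments.min())
print("variance estimates (s2b, s2e):", s2b, s2e)
```

Up to floating-point error, every increment printed by this sketch should be nonnegative, as Theorem 1 asserts. The dense \(n \times n\) matrix in `reml_loglik` is used only to evaluate the objective on a small example; the EM update itself works with a matrix whose dimension is the number of fixed plus random effects.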

Cite this article

Zhang, D., Sun, J.L. & Pieper, K. Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes. Stat Biosci 8, 220–233 (2016). https://doi.org/10.1007/s12561-015-9140-x

  • DOI: https://doi.org/10.1007/s12561-015-9140-x
