Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes

Zhang, Daowen; Sun, Jie Lena; Pieper, Karen

doi:10.1007/s12561-015-9140-x

Published: 22 January 2016

Volume 8, pages 220–233, (2016)

Statistics in Biosciences

Daowen Zhang¹,
Jie Lena Sun² &
Karen Pieper²

Abstract

Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study that examined and compared the hospital length of stay (LOS) of patients undergoing percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) in several international clinical trials, we propose a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOSs, where each clinical trial is treated as a cluster. Because of the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run: it could not allocate enough memory to invert the large matrices required during optimization. We consider ways to circumvent this computational problem in maximum likelihood (ML) and restricted maximum likelihood (REML) inference. In particular, we develop an expectation-maximization (EM) algorithm for REML inference and present an ML implementation using existing software. The new REML EM algorithm is easy to implement, computationally stable, and efficient. With this REML EM algorithm, we were able to analyze the LOS data and obtain meaningful results.

References

Chan M, Sun J, Newby L, Lokhnygina Y, White HD, Moliterno DJ, Théroux P, Ohman EM, Simoons ML, Mahaffey KW, Pieper KS, Giugliano RG, Armstrong PW, Califf RM, Van de Werf F, Harrington RA (2012) Trends in clinical trials of non-ST-segment elevation acute coronary syndromes over 15 years. Int J Cardiol 167(2):548–554. doi:10.1016/j.ijcard.2012.01.065
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Diggle PJ, Heagerty P, Liang K-Y, Zeger SL (2002) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford
Harville DA (1974) Bayesian inference for variance components using only error contrasts. Biometrika 61:383–385
Henderson CR (1984) Applications of linear models in animal breeding. University of Guelph, Guelph
Laird NM, Ware JH (1982) Random effects models for longitudinal data. Biometrics 38:963–974
SAS Institute Inc. (2013) SAS/STAT 9.3 user’s guide. SAS Institute Inc., Cary

Acknowledgments

The work of D. Zhang was supported by NIH Grant R01 CA85848-12. The work of J. L. Sun and K. Pieper was supported through a grant from the Duke Clinical Research Institute. The authors are grateful to Dr. Eric Peterson for his institutional financial support and for providing the LOS data, without which this research would not have been possible. The authors declare no conflict of interest.

Author information

Authors and Affiliations

Department of Statistics, North Carolina State University, Raleigh, NC, 27695, USA
Daowen Zhang
Duke Clinical Research Institute, Duke University, Durham, NC, 27708, USA
Jie Lena Sun & Karen Pieper

Corresponding author

Correspondence to Daowen Zhang.

Appendix: Properties of the REML EM Algorithm

The following theorem states the general EM algorithm for REML estimation and shows that each iteration does not decrease the REML log-likelihood.

Theorem 1

Suppose $f(y|b; \beta , \theta )$ is the conditional probability density function of y given random effects b and $f(b; \theta )$ is the probability density function of the random effects b. Assume the following “REML” likelihood

$$\begin{aligned} L_{R}(\theta ; y) = \int f(y|b; \beta , \theta ) f(b; \theta ) \mathrm{d}\beta \mathrm{d}b \end{aligned}$$

exists, so that $g(\beta , b|y; \theta ) = f(y|b; \beta , \theta ) f(b; \theta )/L_{R}(\theta ; y)$ is a probability density function. Given the estimate $\theta ^{(t)}$ at the tth iteration, define the Q-function for the REML algorithm as follows:

$$\begin{aligned} Q_\mathrm{R}(\theta |\theta ^{(t)}) = {E}_\mathrm{g} [ \log \{f(y|b; \beta , \theta ) f(b; \theta ) \}], \end{aligned}$$

where ${E}_\mathrm{g}$ stands for the expectation taken with respect to $g(\beta , b|y; \theta ^{(t)})$. Let $\theta ^{(t+1)}$ denote the update of $\theta $ obtained by maximizing $Q_\mathrm{R}(\theta |\theta ^{(t)})$ with respect to $\theta $. Then

$$\begin{aligned} \ell _{R}(\theta ^{(t+1)}; y) \ge \ell _{R}(\theta ^{(t)}; y), \quad \text{ for } \text{ all } t, \end{aligned}$$

where $\ell _{R}(\theta ; y) = \log L_{R}(\theta ; y)$.

Proof

By the definition of $L_{R}(\theta ; y)$ and $g(\beta , b|y; \theta )$, we have

$$\begin{aligned} \log \left\{ f(y|b; \beta , \theta ) f(b; \theta ) \right\} = \log g(\beta , b|y; \theta ) + \ell _{R}(\theta ; y). \end{aligned}$$

Taking the expectation with respect to $g(\beta , b|y; \theta ^{(t)})$ on both sides of the above equation leads to

$$\begin{aligned} Q_\mathrm{R}(\theta |\theta ^{(t)}) = {E}_\mathrm{g} \log g(\beta , b|y; \theta ) + \ell _{R}(\theta ; y). \end{aligned}$$

Denote $H(\theta ) = {E}_\mathrm{g} \log g(\beta , b|y; \theta )$. Then

$$\begin{aligned}&\ell _{R}\left( \theta ^{(t+1)}; y\right) - \ell _{R}\left( \theta ^{(t)}; y\right) \\&\quad = Q_\mathrm{R}\left( \theta ^{(t+1)}|\theta ^{(t)}\right) - Q_\mathrm{R}\left( \theta ^{(t)}|\theta ^{(t)}\right) - \left\{ H(\theta ^{(t+1)}) - H(\theta ^{(t)})\right\} . \end{aligned}$$

By the definition of $H(\theta )$, ${E}_\mathrm{g}$ and Jensen’s inequality, we have

$$\begin{aligned} H(\theta ^{(t+1)}) - H(\theta ^{(t)})= & {} \mathrm{E}_{g} \log \left\{ \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \right\} \\\le & {} \log \mathrm{E}_{g} \left\{ \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \right\} \\= & {} \log (1) =0. \end{aligned}$$
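
The last equality in the display above can be spelled out as follows. This is a short worked step that uses only the definition of ${E}_\mathrm{g}$ as the expectation under $g(\beta , b|y; \theta ^{(t)})$ and the fact that $g(\beta , b|y; \theta ^{(t+1)})$ is itself a probability density function:

$$\begin{aligned} \mathrm{E}_{g} \left\{ \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \right\} &= \int \frac{g(\beta , b|y; \theta ^{(t+1)})}{g(\beta , b|y; \theta ^{(t)})} \, g(\beta , b|y; \theta ^{(t)}) \, \mathrm{d}\beta \, \mathrm{d}b \\ &= \int g(\beta , b|y; \theta ^{(t+1)}) \, \mathrm{d}\beta \, \mathrm{d}b = 1. \end{aligned}$$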

Since $\theta ^{(t+1)}$ maximizes $Q_\mathrm{R}(\theta |\theta ^{(t)})$, it follows that $Q_\mathrm{R}(\theta ^{(t+1)}|\theta ^{(t)}) \ge Q_\mathrm{R}(\theta ^{(t)}|\theta ^{(t)})$. Therefore

$$\begin{aligned} \ell _{R}(\theta ^{(t+1)}; y) \ge \ell _{R}(\theta ^{(t)}; y), \quad \text{ for } \text{ all } t. \end{aligned}$$

$\square $
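
The monotonicity stated in Theorem 1 can be checked numerically on a toy problem. The sketch below is not the bivariate algorithm of the article. It assumes a much simpler univariate random-intercept model, $y_{ij} = \mu + b_i + e_{ij}$ with $b_i \sim N(0, \tau ^2)$ and $e_{ij} \sim N(0, \sigma ^2)$, so that $\beta = \mu $ and $\theta = (\tau ^2, \sigma ^2)$. The function name `reml_em_random_intercept`, the starting values, and the simulated data are all hypothetical. Under these assumptions $g(\mu , b|y; \theta )$ is Gaussian with a precision matrix of dimension (number of clusters + 1), so no matrix of the size of the full data set is ever formed or inverted.

```python
import numpy as np


def reml_em_random_intercept(y, cluster, n_iter=50, tau2=1.0, sigma2=1.0):
    """REML EM sketch for y_ij = mu + b_i + e_ij (illustrative assumption only)."""
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(cluster, return_inverse=True)
    m, N = idx.max() + 1, y.size
    n_i = np.bincount(idx, minlength=m).astype(float)   # cluster sizes
    s_i = np.bincount(idx, weights=y, minlength=m)       # cluster sums of y

    # Cross-products for W = [1, Z], built from cluster summaries only.
    WtW = np.zeros((m + 1, m + 1))
    WtW[0, 0] = N
    WtW[0, 1:] = n_i
    WtW[1:, 0] = n_i
    WtW[1:, 1:] = np.diag(n_i)
    Wty = np.concatenate(([y.sum()], s_i))
    yty = y @ y
    # Flat "prior" on mu (zero precision), N(0, tau2) on each b_i.
    prior = np.concatenate(([0.0], np.ones(m)))

    def posterior(tau2, sigma2):
        # g(mu, b | y; theta) is Gaussian with precision C and mean u_hat.
        C = WtW / sigma2 + np.diag(prior / tau2)
        C_inv = np.linalg.inv(C)
        return C, C_inv, C_inv @ (Wty / sigma2)

    def reml_loglik(tau2, sigma2):
        # log of the integral of f(y|b; mu, theta) f(b; theta) over (mu, b).
        C, _, u_hat = posterior(tau2, sigma2)
        _, logdet = np.linalg.slogdet(C)
        quad = yty / sigma2 - (Wty / sigma2) @ u_hat
        return (-0.5 * N * np.log(2 * np.pi * sigma2)
                - 0.5 * m * np.log(2 * np.pi * tau2)
                + 0.5 * (m + 1) * np.log(2 * np.pi)
                - 0.5 * logdet - 0.5 * quad)

    history = [reml_loglik(tau2, sigma2)]
    for _ in range(n_iter):
        _, C_inv, u_hat = posterior(tau2, sigma2)
        # E-step: expected sums of squares under g at the current theta.
        rss = yty - 2 * u_hat @ Wty + u_hat @ WtW @ u_hat
        e_resid = rss + np.trace(C_inv @ WtW)
        e_b2 = u_hat[1:] @ u_hat[1:] + np.trace(C_inv[1:, 1:])
        # M-step: closed-form maximizers of the Q-function.
        sigma2 = e_resid / N
        tau2 = e_b2 / m
        history.append(reml_loglik(tau2, sigma2))
    return tau2, sigma2, np.array(history)


# Simulated example: 8 clusters of 500 observations each (made-up data).
rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(8), 500)
y = 5.0 + rng.normal(0.0, 2.0, 8)[cluster] + rng.normal(0.0, 3.0, cluster.size)

tau2_hat, sigma2_hat, history = reml_em_random_intercept(y, cluster)
print("tau2:", tau2_hat, "sigma2:", sigma2_hat)
print("monotone:", bool(np.all(np.diff(history) >= -1e-6)))
```

The array `history` holds $\ell _{R}(\theta ^{(t)}; y)$ at each iteration. Up to floating-point error its successive differences should be non-negative, as the theorem asserts. Working with cluster sums and counts is a design choice of this sketch that keeps every matrix small regardless of the cluster sizes.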

About this article

Cite this article

Zhang, D., Sun, J.L. & Pieper, K. Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes. Stat Biosci 8, 220–233 (2016). https://doi.org/10.1007/s12561-015-9140-x

Received: 11 June 2014
Accepted: 25 December 2015
Published: 22 January 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s12561-015-9140-x
