Abstract
Generalized linear mixed models are commonly used to model counts or dichotomous observations on subjects within clusters, such as patients in hospitals. When the sample sizes at the cluster level are large, Bayesian inference about the parameters of generalized linear mixed models using Markov chain Monte Carlo sampling can be computationally slow. Standard large-sample approximations can work reasonably well for inference about cluster-level parameters near the “middle,” but not necessarily for parameters far from it. We provide an approach to simulating from the posterior distribution that gives a better approximation when the sample sizes at the cluster level are large and a multivariate normal prior or the default flat prior is used.
Acknowledgements
We would like to thank the anonymous Associate Editor and two Referees for their very useful comments and suggestions. Their comments have helped improve the paper considerably.
Ethics declarations
Conflict of interest statement
The authors declare that they have no conflict of interest.
Additional information
Annette Christianson's work was performed in affiliation with IPEC when the author was an employee there.
Appendices
Appendix A.1: Proof of Lemma
Proof
(a) We recall that k and q are the dimensions of \({\varvec{\alpha }}\) and \({\varvec{\beta }}\), respectively. Here, we will assume that \(k>q\); the proof for the other case is similar. Using (2.11), and writing the normal density for \({\varvec{\theta }}=({\varvec{\alpha }},{\varvec{\beta }})\) as the product of marginal and conditional densities,
where \(N({\varvec{x}}:{\varvec{\mu }},\Sigma )\) is the multivariate normal density with mean \({\varvec{\mu }}\) and covariance matrix \(\Sigma\), \(C_1=S_{ba}S_a^{-1}\), \(W=S_{bb}- S_{ba}S_{a}^{-1}S_{ab}\), \(Q_1= ({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})^\prime S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})\), \(Q_2=(C_1{\varvec{\alpha }}-{\varvec{b}}_1)^\prime (V_0+W)^{-1}(C_1{\varvec{\alpha }}-{\varvec{b}}_1)\), and \({\varvec{b}}_1={\varvec{\beta }}_0-{\hat{{\varvec{\beta }}}}+C_1{\hat{{\varvec{\alpha }}}}\).
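As a sketch of the factorization just described, assembled from the definitions above: assuming that (2.11) is the normal approximation \(N({\varvec{\theta }}:{\hat{{\varvec{\theta }}}},S)\) with \(S\) partitioned into the blocks \(S_a\), \(S_{ab}\), \(S_{ba}\), \(S_{bb}\), and that the prior on \({\varvec{\beta }}\) is \(N({\varvec{\beta }}_0,V_0)\), integrating out \({\varvec{\beta }}\) would give

\[
\begin{aligned}
\int N({\varvec{\theta }}:{\hat{{\varvec{\theta }}}},S)\,N({\varvec{\beta }}:{\varvec{\beta }}_0,V_0)\,d{\varvec{\beta }}
&= N({\varvec{\alpha }}:{\hat{{\varvec{\alpha }}}},S_a)\int N\big({\varvec{\beta }}:{\hat{{\varvec{\beta }}}}+C_1({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}}),W\big)\,N({\varvec{\beta }}:{\varvec{\beta }}_0,V_0)\,d{\varvec{\beta }}\\
&= N({\varvec{\alpha }}:{\hat{{\varvec{\alpha }}}},S_a)\,N\big({\varvec{\beta }}_0:{\hat{{\varvec{\beta }}}}+C_1({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}}),V_0+W\big)\\
&\propto \exp \left\{ -\tfrac{1}{2}(Q_1+Q_2)\right\} .
\end{aligned}
\]

The last line uses \({\varvec{\beta }}_0-{\hat{{\varvec{\beta }}}}-C_1({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})={\varvec{b}}_1-C_1{\varvec{\alpha }}\). The symbol \({\hat{{\varvec{\theta }}}}\) for the joint MLE is notation assumed here for illustration.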
Letting \(C_2=\begin{pmatrix} &{} C_1 &{} \\ {\varvec{0}}&{} &{} I_{k-q} \end{pmatrix}\), a \(k\times k\) matrix, where \({\varvec{0}}\) is a \((k-q)\times q\) matrix of zeroes and \(I_{k-q}\) is the identity matrix of order \((k-q)\); \({\varvec{b}}_2=\begin{pmatrix} {\varvec{b}}_1\\ {\varvec{0}}\end{pmatrix}\), a vector of length k; and \(T_\lambda = \begin{pmatrix} V_0+W &{} 0\\ 0 &{} \lambda I_{k-q} \end{pmatrix}\) for \(\lambda >0\); we can show that
where \(C_3= C_2^\prime T_\lambda ^{-1}C_2\) and \({\varvec{\alpha }}_2= ({\varvec{0}},I_{k-q}){\varvec{\alpha }}\). Thus, using (5.2), (5.3), and some routine algebra, we have
where \({{\tilde{{\varvec{\alpha }}}}}_\lambda = (S_a^{-1}+C_3)^{-1}(S_a^{-1}{\hat{{\varvec{\alpha }}}} + C_3C_2^{-1}{\varvec{b}}_2)\), and \(V_{a,\lambda }= (S_a^{-1}+C_3)^{-1}\).
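The routine algebra can be sketched as follows, taking \(C_2\) to be invertible (an assumption implicit in the appearance of \(C_2^{-1}\) above). Since \(C_2{\varvec{\alpha }}-{\varvec{b}}_2\) stacks \(C_1{\varvec{\alpha }}-{\varvec{b}}_1\) on top of \({\varvec{\alpha }}_2\), one would expect

\[
(C_2{\varvec{\alpha }}-{\varvec{b}}_2)^\prime T_\lambda ^{-1}(C_2{\varvec{\alpha }}-{\varvec{b}}_2)
= Q_2+\tfrac{1}{\lambda }{\varvec{\alpha }}_2^\prime {\varvec{\alpha }}_2
= ({\varvec{\alpha }}-C_2^{-1}{\varvec{b}}_2)^\prime C_3({\varvec{\alpha }}-C_2^{-1}{\varvec{b}}_2),
\]

and completing the square in \({\varvec{\alpha }}\) then gives

\[
Q_1+Q_2=({\varvec{\alpha }}-{{\tilde{{\varvec{\alpha }}}}}_\lambda )^\prime V_{a,\lambda }^{-1}({\varvec{\alpha }}-{{\tilde{{\varvec{\alpha }}}}}_\lambda )-\tfrac{1}{\lambda }{\varvec{\alpha }}_2^\prime {\varvec{\alpha }}_2+\text{const},
\]

where the constant does not depend on \({\varvec{\alpha }}\). This is a sketch derived from the stated definitions rather than a transcription of the authors' displays.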
Now, letting \(\lambda \rightarrow \infty\), \({{\tilde{{\varvec{\alpha }}}}}_\lambda\) converges to
and \(V_{a,\lambda }\) converges to \(V_a= (S_a^{-1}+C)^{-1}\) where
and \(T = \begin{pmatrix} (V_0+W)^{-1} &{} 0\\ 0 &{} 0 \end{pmatrix}\).
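Under the same assumptions, the two limits can be written out explicitly. Since \(T_\lambda ^{-1}\rightarrow T\) as \(\lambda \rightarrow \infty\) and \(C_3C_2^{-1}{\varvec{b}}_2=C_2^\prime T_\lambda ^{-1}{\varvec{b}}_2\), the definitions above would imply

\[
{{\tilde{{\varvec{\alpha }}}}}=(S_a^{-1}+C)^{-1}\big(S_a^{-1}{\hat{{\varvec{\alpha }}}}+C_2^\prime T{\varvec{b}}_2\big),\qquad
C=C_2^\prime TC_2=C_1^\prime (V_0+W)^{-1}C_1,
\]

with \(C_2^\prime T{\varvec{b}}_2=C_1^\prime (V_0+W)^{-1}{\varvec{b}}_1\), because the \(\lambda\)-dependent block of \(T_\lambda ^{-1}\) vanishes in the limit. These expressions are derived here from the definitions of \(T_\lambda\), \(C_2\) and \({\varvec{b}}_2\) and should be read as a sketch.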
(b) For \(p({\varvec{\beta }})=1\), we let \(V_0= \tau ^2I\) and let \(\tau ^2 \rightarrow \infty\) in the result (3.3) for the normal prior. In this limit, clearly, \(C\rightarrow 0\), and hence \({{\tilde{{\varvec{\alpha }}}}}\rightarrow {\hat{{\varvec{\alpha }}}}\) and \(V_a\rightarrow S_a\), giving (3.4). \(\square\)
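Both parts of the lemma can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function names and the random test matrices are illustrative assumptions, and it takes \(C=C_1^\prime (V_0+W)^{-1}C_1\), which is what the definitions of \(C_2\) and \(T\) imply. It compares the closed form for \({{\tilde{{\varvec{\alpha }}}}}\) and \(V_a\) with a direct marginalization of the joint normal in \(({\varvec{\alpha }},{\varvec{\beta }})\).

```python
import numpy as np

def lemma_moments(alpha_hat, beta_hat, S, beta0, V0):
    """alpha_tilde and V_a from the closed form in part (a)."""
    k = len(alpha_hat)
    S_a, S_ab, S_bb = S[:k, :k], S[:k, k:], S[k:, k:]
    C1 = np.linalg.solve(S_a, S_ab).T          # C_1 = S_ba S_a^{-1} (S symmetric)
    W = S_bb - C1 @ S_ab                       # W = S_bb - S_ba S_a^{-1} S_ab
    b1 = beta0 - beta_hat + C1 @ alpha_hat
    G = np.linalg.inv(V0 + W)
    S_a_inv = np.linalg.inv(S_a)
    V_a = np.linalg.inv(S_a_inv + C1.T @ G @ C1)   # assumed C = C_1'(V_0+W)^{-1}C_1
    alpha_tilde = V_a @ (S_a_inv @ alpha_hat + C1.T @ G @ b1)
    return alpha_tilde, V_a

def direct_moments(alpha_hat, beta_hat, S, beta0, V0):
    """Same moments from the joint normal in (alpha, beta), reading off the alpha block."""
    k = len(alpha_hat)
    S_inv = np.linalg.inv(S)
    prior_prec = np.zeros_like(S)
    prior_prec[k:, k:] = np.linalg.inv(V0)     # the normal prior acts on beta only
    cov = np.linalg.inv(S_inv + prior_prec)
    lin = S_inv @ np.concatenate([alpha_hat, beta_hat])
    lin[k:] += np.linalg.solve(V0, beta0)
    mean = cov @ lin
    return mean[:k], cov[:k, :k]

def make_example(k=4, q=2, seed=0):
    """Hypothetical test inputs: random MLEs, a positive definite S, and a prior mean."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k + q, k + q))
    S = A @ A.T + np.eye(k + q)
    return rng.standard_normal(k), rng.standard_normal(q), S, rng.standard_normal(q)

if __name__ == "__main__":
    a_hat, b_hat, S, beta0 = make_example()
    V0 = 2.0 * np.eye(len(b_hat))
    print(np.allclose(lemma_moments(a_hat, b_hat, S, beta0, V0)[1],
                      direct_moments(a_hat, b_hat, S, beta0, V0)[1]))
```

Setting `V0` to a very large multiple of the identity in `lemma_moments` gives a quick check of part (b): the returned moments should approach \({\hat{{\varvec{\alpha }}}}\) and \(S_a\).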
Appendix A.2: Three-Step Algorithm (TSA)
Step 1:
Fit a fixed effects generalized linear regression model (e.g., using PROC GENMOD in SAS with site as a class variable) with no overall intercept, a separate site-specific intercept for each of the k sites, and common fixed-effect coefficients \({\varvec{\beta }}\) for the q covariates. Here, all \(k+q\) parameters are regarded as fixed effects.
Obtain the MLEs \({\hat{{\varvec{\alpha }}}}\) of the k-dimensional site-specific intercept vector and \({\hat{{\varvec{\beta }}}}\) of the q-dimensional covariate coefficient vector \({\varvec{\beta }}\), together with their estimated covariance matrix (see 2.8).
Step 2:
Diagonalize \(S_a\) using an orthogonal matrix P so that \(PS_aP^\prime =diag(s^2_1,\ldots ,s^2_k)\).
Transform \({\hat{{\varvec{\alpha }}}}\) to \({\hat{{\varvec{\eta }}}}=P{\hat{{\varvec{\alpha }}}}\), and let \({\hat{{\varvec{\eta }}}}=({\hat{\eta }}_{1},\ldots ,{\hat{\eta }}_{k})^\prime\).
Fit a normal hierarchical Bayesian model for the \({\hat{\eta }}_i\)’s (\(i=1,\ldots ,k\)), assuming \({\hat{\eta }}_i \sim N(\eta _i,s_i^2)\), and simulate an MCMC sample of size M from the posterior distribution of \({\varvec{\eta }}=(\eta _1,\ldots ,\eta _k)^\prime\), which we denote by \({\varvec{\eta }}_1,\ldots ,{\varvec{\eta }}_M\). Transform the \({\varvec{\eta }}_m\)’s (\(m=1,\ldots ,M\)) back to obtain an MCMC sample of size M from the approximate posterior distribution of \({\varvec{\alpha }}\), using \({\varvec{\alpha }}_m= P^\prime {\varvec{\eta }}_m\).
Step 3:
For each MCMC sample value of \({\varvec{\alpha }}\) in Step 2, generate a sample \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _q)^\prime\) from the multivariate normal distribution with mean \({\varvec{\mu }}_b={\hat{{\varvec{\beta }}}} + S_{ca}S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})\) and variance matrix \(\Sigma _b=S_c- S_{ca}S_a^{-1}S_{ac}\). This yields an MCMC sample of size M from the approximate posterior distribution of \({\varvec{\beta }}\).
Combine the MCMC samples from Steps 2 and 3 to obtain a joint MCMC sample of size M for \(({\varvec{\alpha }},{\varvec{\beta }})\).
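Steps 2 and 3 can be sketched in NumPy as below, taking the Step 1 output (\({\hat{{\varvec{\alpha }}}}\), \({\hat{{\varvec{\beta }}}}\) and the joint covariance matrix) as given. This is a minimal illustration under stated assumptions, not the authors' implementation. The appendix does not spell out the hierarchical model, so the code assumes \(\alpha _i\sim N(\mu ,\tau ^2)\) with a flat prior on \(\mu\) and an inverse-gamma prior on \(\tau ^2\), fitted by a simple Gibbs sampler; the function name `tsa_sample` and the hyperparameters `a0`, `b0` are hypothetical. P is taken as the transposed eigenvector matrix of \(S_a\), so that \({\hat{{\varvec{\eta }}}}=P{\hat{{\varvec{\alpha }}}}\) has diagonal covariance, and Step 3 uses the standard conditional-normal mean \({\hat{{\varvec{\beta }}}}+S_{ca}S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})\).

```python
import numpy as np

def tsa_sample(alpha_hat, beta_hat, S, M=2000, burn=500, a0=1.0, b0=1.0, seed=0):
    """Steps 2-3 of the TSA given Step 1 output; S is the (k+q)x(k+q) covariance."""
    rng = np.random.default_rng(seed)
    k, q = len(alpha_hat), len(beta_hat)
    S_a, S_ab, S_b = S[:k, :k], S[:k, k:], S[k:, k:]   # S_b is S_c in Step 3

    # Step 2: rotate so that eta_hat = P alpha_hat has diagonal covariance.
    s2, U = np.linalg.eigh(S_a)          # S_a = U diag(s2) U'
    P = U.T                              # hence P S_a P' = diag(s2)
    eta_hat = P @ alpha_hat
    c = P @ np.ones(k)                   # assumed prior: eta ~ N(mu * c, tau2 * I)

    mu, tau2 = alpha_hat.mean(), max(alpha_hat.var(), 1e-3)
    alpha_draws = np.empty((M, k))
    for t in range(burn + M):
        # eta | mu, tau2: precision-weighted normal update, componentwise
        prec = 1.0 / s2 + 1.0 / tau2
        mean = (eta_hat / s2 + mu * c / tau2) / prec
        eta = mean + rng.standard_normal(k) / np.sqrt(prec)
        # mu | eta, tau2 under a flat prior (note c'c = k because P is orthogonal)
        mu = rng.normal(c @ eta / k, np.sqrt(tau2 / k))
        # tau2 | eta, mu: inverse gamma with shape a0 + k/2 and the rate below
        rate = b0 + 0.5 * np.sum((eta - mu * c) ** 2)
        tau2 = rate / rng.gamma(a0 + 0.5 * k)
        if t >= burn:
            alpha_draws[t - burn] = P.T @ eta          # back-transform to alpha

    # Step 3: beta | alpha from the conditional normal implied by N(theta_hat, S).
    C1 = np.linalg.solve(S_a, S_ab).T                  # S_ba S_a^{-1}
    Sigma_b = S_b - C1 @ S_ab
    Sigma_b = 0.5 * (Sigma_b + Sigma_b.T)              # remove rounding asymmetry
    means = beta_hat + (alpha_draws - alpha_hat) @ C1.T
    beta_draws = means + rng.multivariate_normal(np.zeros(q), Sigma_b, size=M)
    return alpha_draws, beta_draws
```

Row m of `alpha_draws` and row m of `beta_draws` together form one joint draw of \(({\varvec{\alpha }},{\varvec{\beta }})\). A different hierarchical prior would change only the three updates inside the loop.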
Appendix B: Extension to More General GLMM
For a more general GLMM with more than one site-specific random effect, i.e., for \(r>1\) in (2.2), the approximation presented in Sect. 3 holds likewise. Here, we outline the steps involved in the simulation of an MCMC sample from the marginal distribution of the random effects and fixed effect parameters in the hierarchical model, analogous to the distribution in (3.5).
For the general GLMM with r-dimensional site-specific covariates, let \({\varvec{\alpha }}\) be the \(kr\)-dimensional vector obtained by stacking the site-specific r-dimensional random effects \({\varvec{\alpha }}_i=(\alpha _{i1},\ldots ,\alpha _{ir}), i=1,\ldots ,k\). As before, we assume a flat prior for \({\varvec{\beta }}\), and a normal hierarchical prior for the \({\varvec{\alpha }}_i\)’s given by
\[
{\varvec{\alpha }}_i\mid {\varvec{\mu }},\Sigma \sim N({\varvec{\mu }},\Sigma ),\quad i=1,\ldots ,k,
\]
with \(\Sigma =diag(\tau _1^2,\ldots ,\tau _r^2)\), and
As in Sect. 3, the marginal posterior distribution of \(({\varvec{\alpha }},{\varvec{\mu }},\tau ^2)\) can be approximated by (3.5). We can simulate from this posterior distribution using Gibbs sampling. We now show how we can simulate from the (full) conditional posterior distribution of \({\varvec{\alpha }}_i\) given all other parameters,
where \({\varvec{\alpha }}_{-i}\) is the sub-vector of \({\varvec{\alpha }}\) after removing the components of \({\varvec{\alpha }}_i\). The first term in (5.5) is proportional to the probability density of a multivariate normal distribution, \(MVN({\hat{{\varvec{\alpha }}}},S_a)\) for \({\varvec{\alpha }}\), and hence is proportional to the product of the corresponding conditional distribution for \({\varvec{\alpha }}_{i}\) given \({\varvec{\alpha }}_{-i}\), and the marginal distribution of \({\varvec{\alpha }}_{-i}\), both of which are multivariate normal density functions. We can then drop the marginal density of \({\varvec{\alpha }}_{-i}\) since it does not involve \({\varvec{\alpha }}_i\), and get
where \({{\tilde{{\varvec{\alpha }}}}}_i\) and \(V_i\) are the conditional mean and variance of \({\varvec{\alpha }}_i\) given \({\varvec{\alpha }}_{-i}\), based on the \(MVN({\hat{{\varvec{\alpha }}}},S_a)\) density for \({\varvec{\alpha }}\). We can recognize the conditional density (5.6) as the posterior distribution for \({\varvec{\alpha }}_i\) derived using a normal likelihood \(N({{\tilde{{\varvec{\alpha }}}}}_i, V_i)\), and a normal prior density \(N({\varvec{\mu }}, \Sigma )\) for \({\varvec{\alpha }}_i\), with known variances \(V_i\) and \(\Sigma\). This can be used to simulate from the full conditional posterior distribution of \({\varvec{\alpha }}_i\), based on (5.5). Simulation from the other full conditional posterior distributions is quite straightforward, and we omit the details here.
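The full conditional update for \({\varvec{\alpha }}_i\) described above amounts to a normal-normal combination of \(N({{\tilde{{\varvec{\alpha }}}}}_i,V_i)\) with \(N({\varvec{\mu }},\Sigma )\). A minimal NumPy sketch follows; the function names and the assumption that \({\varvec{\alpha }}\) is stacked site by site (site i occupying positions \(ir,\ldots ,ir+r-1\), counting from zero) are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def alpha_i_conditional(i, r, alpha, alpha_hat, S_a, mu, Sigma):
    """Mean and covariance of alpha_i given alpha_{-i}, mu, Sigma (0-based site i)."""
    idx = np.arange(i * r, (i + 1) * r)                 # positions of site i
    rest = np.setdiff1d(np.arange(len(alpha)), idx)     # positions of the other sites
    S_ii = S_a[np.ix_(idx, idx)]
    S_ir = S_a[np.ix_(idx, rest)]
    S_rr = S_a[np.ix_(rest, rest)]

    # Conditional moments of alpha_i given alpha_{-i} under MVN(alpha_hat, S_a).
    B = np.linalg.solve(S_rr, S_ir.T).T                 # S_ir S_rr^{-1}
    alpha_tilde_i = alpha_hat[idx] + B @ (alpha[rest] - alpha_hat[rest])
    V_i = S_ii - B @ S_ir.T

    # Combine the "likelihood" N(alpha_tilde_i, V_i) with the prior N(mu, Sigma).
    V_inv, Sig_inv = np.linalg.inv(V_i), np.linalg.inv(Sigma)
    cov = np.linalg.inv(V_inv + Sig_inv)
    mean = cov @ (V_inv @ alpha_tilde_i + Sig_inv @ mu)
    return mean, 0.5 * (cov + cov.T)

def draw_alpha_i(i, r, alpha, alpha_hat, S_a, mu, Sigma, rng):
    """One Gibbs draw of alpha_i from its approximate full conditional."""
    mean, cov = alpha_i_conditional(i, r, alpha, alpha_hat, S_a, mu, Sigma)
    return rng.multivariate_normal(mean, cov)
```

Within a Gibbs sweep, `draw_alpha_i` would be called for each site in turn, overwriting the corresponding block of `alpha` before moving to the next site and then to the updates for \({\varvec{\mu }}\) and \(\Sigma\).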
Cite this article
Christianson, A., Sivaganesan, S. An Approximate Posterior Simulation for GLMM with Large Samples. J Stat Theory Pract 13, 45 (2019). https://doi.org/10.1007/s42519-019-0045-8