
An Approximate Posterior Simulation for GLMM with Large Samples

  • Original Article
  • Journal of Statistical Theory and Practice

Abstract

Generalized linear mixed models are commonly used for modeling counts or dichotomous observations on subjects within clusters, such as patients in hospitals. When the sample sizes at the cluster level are large, Bayesian inference about the parameters of generalized linear mixed models using Markov chain Monte Carlo sampling can be computationally slow. Standard large-sample approximations can provide a reasonable approximation for inference about cluster-level parameters near the “middle,” but not necessarily for parameters away from the middle. We provide an approach to simulating from the posterior distribution that gives a better approximation when the cluster-level sample sizes are large and a multivariate normal prior or the default flat prior is used.

Fig. 1



Acknowledgements

We would like to thank the anonymous Associate Editor and two Referees for their very useful comments and suggestions. Their comments have helped improve the paper considerably.

Author information

Corresponding author

Correspondence to Siva Sivaganesan.

Ethics declarations

Conflict of interest statement

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Annette Christianson's work was performed in affiliation with IPEC when the author was an employee there.

Appendices

Appendix A.1: Proof of Lemma

Proof

(a) We recall that k and q are the dimensions of \({\varvec{\alpha }}\) and \({\varvec{\beta }}\), respectively. Here, we will assume that \(k>q\); the proof for the other case is similar. Using (2.11), and writing the normal density for \({\varvec{\theta }}=({\varvec{\alpha }},{\varvec{\beta }})\) as the product of marginal and conditional densities,

$$\begin{aligned} \int h({\varvec{\theta }})p({\varvec{\beta }})\mathrm{d}{\varvec{\beta }}&\propto N({\varvec{\alpha }}:{\hat{{\varvec{\alpha }}}}, S_a)\int N({\varvec{\beta }}: {\hat{{\varvec{\beta }}}}+C_1({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}}), W)\cdot N({\varvec{\beta }}: {\varvec{\beta }}_0,V_0)\mathrm{d}{\varvec{\beta }}\nonumber \\&= N({\varvec{\alpha }}:{\hat{{\varvec{\alpha }}}}, S_a)\cdot N({\varvec{\beta }}_0: {\hat{{\varvec{\beta }}}}+C_1({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}}), V_0+W) \end{aligned}$$
(5.1)
$$\begin{aligned}&\propto \exp \{-0.5(Q_1+Q_2)\} \end{aligned}$$
(5.2)


where \(N({\varvec{x}}:{\varvec{\mu }},\Sigma )\) is the multivariate normal density with mean \({\varvec{\mu }}\) and variance \(\Sigma\), \(C_1=S_{ba}S_a^{-1}\), \(W=S_{bb}- S_{ba}S_{a}^{-1}S_{ab}\), \(Q_1= ({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})^\prime S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})\), \(Q_2=(C_1{\varvec{\alpha }}-{\varvec{b}}_1)^\prime (V_0+W)^{-1}(C_1{\varvec{\alpha }}-{\varvec{b}}_1)\), and \({\varvec{b}}_1={\varvec{\beta }}_0-{\hat{{\varvec{\beta }}}}+C_1{\hat{{\varvec{\alpha }}}}\).

Letting \(C_2=\begin{pmatrix} C_1 \\ ({\varvec{0}}\;\; I_{k-q}) \end{pmatrix}\), a \(k\times k\) matrix, where \({\varvec{0}}\) is a \((k-q)\times q\) matrix of zeroes and \(I_{k-q}\) is the identity matrix of order \(k-q\); \({\varvec{b}}_2=\begin{pmatrix} {\varvec{b}}_1\\ {\varvec{0}}\end{pmatrix}\), a vector of length k; and \(T_\lambda = \begin{pmatrix} V_0+W & 0\\ 0 & \lambda I_{k-q} \end{pmatrix}\) for \(\lambda >0\), we can show that

$$\begin{aligned} Q_2= ({\varvec{\alpha }}-C_2^{-1}{\varvec{b}}_2)^\prime C_3({\varvec{\alpha }}-C_2^{-1}{\varvec{b}}_2) -{\varvec{\alpha }}_2^\prime {\varvec{\alpha }}_2/\lambda , \end{aligned}$$
(5.3)

where \(C_3= C_2^\prime T_\lambda ^{-1}C_2\) and \({\varvec{\alpha }}_2= ({\varvec{0}},I_{k-q}){\varvec{\alpha }}\). Thus, using (5.2), (5.3), and some routine algebra, we have

$$\begin{aligned} \int h({\varvec{\theta }})p({\varvec{\beta }})\mathrm{d}{\varvec{\beta }}&\propto \exp \{-0.5 ({\varvec{\alpha }}- {{\tilde{{\varvec{\alpha }}}}}_\lambda )^\prime V_{a,\lambda }^{-1}({\varvec{\alpha }}-{{\tilde{{\varvec{\alpha }}}}}_\lambda ) +0.5 {\varvec{\alpha }}_2^\prime {\varvec{\alpha }}_2/\lambda \}, \end{aligned}$$

where \({{\tilde{{\varvec{\alpha }}}}}_\lambda = (S_a^{-1}+C_3)^{-1}(S_a^{-1}{\hat{{\varvec{\alpha }}}} + C_3C_2^{-1}{\varvec{b}}_2)\), and \(V_{a,\lambda }= (S_a^{-1}+C_3)^{-1}\).
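The identity (5.3) is stated without derivation; it can be checked numerically. Below is a minimal check in Python with NumPy, where the dimensions (\(k=5\), \(q=2\)), \(\lambda\), and all matrices are arbitrary test values standing in for the quantities defined above, not data from the paper:

```python
import numpy as np

# Numerical check of identity (5.3): the quadratic Q2 in alpha can be
# rewritten as a quadratic form minus an alpha2'alpha2/lambda correction.
rng = np.random.default_rng(0)
k, q, lam = 5, 2, 3.7                       # arbitrary dimensions, lambda > 0
C1 = rng.standard_normal((q, k))            # stands in for S_ba S_a^{-1}
A = rng.standard_normal((q, q))
VW = A @ A.T + q * np.eye(q)                # stands in for V0 + W (SPD)
alpha = rng.standard_normal(k)
b1 = rng.standard_normal(q)

# C2 stacks C1 over (0, I_{k-q}); b2 pads b1 with zeros; T_lam is block diagonal
C2 = np.vstack([C1, np.hstack([np.zeros((k - q, q)), np.eye(k - q)])])
b2 = np.concatenate([b1, np.zeros(k - q)])
T_lam = np.block([[VW, np.zeros((q, k - q))],
                  [np.zeros((k - q, q)), lam * np.eye(k - q)]])
C3 = C2.T @ np.linalg.inv(T_lam) @ C2
alpha2 = alpha[q:]                          # alpha2 = (0, I_{k-q}) alpha

# Q2 as defined after (5.2), and as rewritten in (5.3)
r = C1 @ alpha - b1
Q2_direct = r @ np.linalg.inv(VW) @ r
d = alpha - np.linalg.solve(C2, b2)
Q2_identity = d @ C3 @ d - alpha2 @ alpha2 / lam
```

The check works because \(C_2{\varvec{\alpha }}-{\varvec{b}}_2\) stacks \(C_1{\varvec{\alpha }}-{\varvec{b}}_1\) over \({\varvec{\alpha }}_2\), so the \(T_\lambda^{-1}\)-weighted quadratic form splits into the two pieces on either side of (5.3).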

Now, letting \(\lambda \rightarrow \infty\), \({{\tilde{{\varvec{\alpha }}}}}_\lambda\) converges to

$$\begin{aligned} {{\tilde{{\varvec{\alpha }}}}} = (S_a^{-1}+C)^{-1}(S_a^{-1}{\hat{{\varvec{\alpha }}}} + C{\varvec{b}}) \end{aligned}$$

and \(V_{a,\lambda }\) converges to \(V_a= (S_a^{-1}+C)^{-1}\) where

$$\begin{aligned} C= C_2^\prime T C_2, \; {\varvec{b}}= C_2^{-1}{\varvec{b}}_2 \end{aligned}$$
(5.4)

and \(T = \begin{pmatrix} (V_0+W)^{-1} & 0\\ 0 & 0 \end{pmatrix}\).

(b) For \(p({\varvec{\beta }})=1\), we let \(V_0= \tau ^2I\) and let \(\tau ^2 \rightarrow \infty\) in the result (3.3) for the normal prior. In this limit, clearly, \(C\rightarrow 0\), and hence \({{\tilde{{\varvec{\alpha }}}}}\rightarrow {\hat{{\varvec{\alpha }}}}\) and \(V_a\rightarrow S_a\), giving (3.4). \(\square\)

Appendix A.2: Three-Step Algorithm, TSA

Step 1:

Fit a fixed-effects generalized linear regression model (e.g., using PROC GENMOD in SAS with site as a class variable) with no overall intercept, a separate intercept for each of the s sites, and common fixed-effect coefficients \({\varvec{\beta }}\) for the q covariates. Here, all \(s+q\) parameters are regarded as fixed effects.

Obtain the MLE \({\hat{{\varvec{\alpha }}}}\) for the s-dimensional site-specific intercept vector, \({\hat{{\varvec{\beta }}}}\) for the q-dimensional covariate coefficient vector \({\varvec{\beta }}\), and their associated estimated covariance matrix (see 2.8).
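Step 1 is ordinarily done with standard software (the text mentions PROC GENMOD in SAS). Purely as an illustration, here is a self-contained Python/NumPy sketch for the logistic (binary) case, fitting the fixed-effects model by Newton–Raphson and returning \({\hat{{\varvec{\alpha }}}}\), \({\hat{{\varvec{\beta }}}}\), and the estimated covariance matrix (the inverse estimated Fisher information); the function name and iteration default are our own, not from the paper:

```python
import numpy as np

def fit_fixed_effects_logistic(site, X, y, iters=25):
    """Step 1 sketch: ML fit of a logistic model with one intercept per site
    (no overall intercept) plus common coefficients beta for the covariates.
    Returns (alpha_hat, beta_hat, cov), where cov is the inverse of the
    estimated Fisher information for all s + q parameters."""
    s = site.max() + 1
    D = np.eye(s)[site]                  # site indicator (dummy) columns
    Z = np.hstack([D, X])                # design: s intercepts + q covariates
    theta = np.zeros(Z.shape[1])
    for _ in range(iters):               # Newton-Raphson (equivalently IRLS)
        p = 1.0 / (1.0 + np.exp(-Z @ theta))
        grad = Z.T @ (y - p)
        info = Z.T @ (Z * (p * (1.0 - p))[:, None])
        theta = theta + np.linalg.solve(info, grad)
    p = 1.0 / (1.0 + np.exp(-Z @ theta))
    cov = np.linalg.inv(Z.T @ (Z * (p * (1.0 - p))[:, None]))
    return theta[:s], theta[s:], cov
```

From `cov`, the blocks \(S_a\), \(S_{ca}\), and \(S_c\) used in the later steps are the intercept block, the covariate-by-intercept block, and the covariate block, respectively.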

Step 2:

Diagonalize \(S_a\) using an orthogonal matrix P so that \(P^\prime S_aP =\mathrm{diag}(s^2_1,\ldots ,s^2_k)\).

Transform \({\hat{{\varvec{\alpha }}}}\) to \({\hat{{\varvec{\eta }}}}=P{\hat{{\varvec{\alpha }}}}\), and let \({\hat{{\varvec{\eta }}}}=({\hat{\eta }}_{1},\ldots ,{\hat{\eta }}_{k})^\prime\).

Fit a normal hierarchical Bayesian model for the \({\hat{\eta }}_i\)’s (\(i=1,\ldots ,k\)), assuming \({\hat{\eta }}_i \sim N(\eta _i,s_i^2)\), and simulate an MCMC sample of size M from the posterior distribution of \({\varvec{\eta }}=(\eta _1,\ldots ,\eta _k)^\prime\), which we denote by \({\varvec{\eta }}_1,\ldots ,{\varvec{\eta }}_M\). Transform the \({\varvec{\eta }}_i\)’s (\(i=1,\ldots ,M\)) back to obtain an MCMC sample of size M from the approximate posterior distribution of \({\varvec{\alpha }}\), using \({\varvec{\alpha }}_i= P^\prime {\varvec{\eta }}_i\).
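A minimal Python/NumPy sketch of Step 2, with two simplifications flagged as such: NumPy's `eigh` returns \(S_a=P\,\mathrm{diag}(s_1^2,\ldots ,s_k^2)\,P^\prime\), so in this convention the transforms are \({\hat{{\varvec{\eta }}}}=P^\prime {\hat{{\varvec{\alpha }}}}\) and \({\varvec{\alpha }}=P{\varvec{\eta }}\), and \(\tau ^2\) is held fixed rather than sampled along with the other hyperparameters as in the paper's full model:

```python
import numpy as np

def step2_alpha_draws(alpha_hat, S_a, M=2000, rng=None):
    """Step 2 sketch: diagonalize S_a, run a Gibbs sampler for the normal
    hierarchical model eta_hat_i ~ N(eta_i, s_i^2), eta_i ~ N(mu, tau2),
    with a flat prior on mu and tau2 held fixed (a simplification), then
    map the eta draws back to alpha draws."""
    if rng is None:
        rng = np.random.default_rng()
    k = len(alpha_hat)
    s2, P = np.linalg.eigh(S_a)          # S_a = P diag(s2) P'
    eta_hat = P.T @ alpha_hat            # independent coordinates
    tau2 = eta_hat.var() + 1e-6          # fixed here; the paper samples it
    mu = eta_hat.mean()
    prec = 1.0 / s2 + 1.0 / tau2         # full-conditional precisions
    draws = np.empty((M, k))
    for m in range(M):
        mean = (eta_hat / s2 + mu / tau2) / prec
        eta = mean + rng.standard_normal(k) / np.sqrt(prec)
        mu = eta.mean() + rng.standard_normal() * np.sqrt(tau2 / k)
        draws[m] = P @ eta               # back-transform to alpha
    return draws
```

Each inner update uses the usual precision-weighted normal full conditional for \(\eta _i\), and the flat prior on \(\mu\) gives \(\mu \mid {\varvec{\eta }} \sim N({\bar{\eta }},\tau ^2/k)\).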

Step 3:

For each MCMC sample value of \({\varvec{\alpha }}\) in Step 2, generate a sample \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _q)^\prime\) from the multivariate normal distribution with mean \({\varvec{\mu }}_b={\hat{{\varvec{\beta }}}} - S_{ca}S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})\) and variance matrix \(\Sigma _b=S_c- S_{ca}S_a^{-1}S_{ac}\). This yields an MCMC sample of size M from the approximate posterior distribution of \({\varvec{\beta }}\).

Combine the MCMC samples from Steps 2 and 3 to obtain a joint MCMC sample of size M for \(({\varvec{\alpha }},{\varvec{\beta }})\).
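Step 3 can be sketched in Python/NumPy as follows, implementing the mean and variance formulas above verbatim (the function name is ours; \(\Sigma _b\) does not depend on \({\varvec{\alpha }}\), so its Cholesky factor is computed once):

```python
import numpy as np

def step3_beta_draws(alpha_draws, alpha_hat, beta_hat, S_a, S_ca, S_c, rng=None):
    """Step 3 sketch: for each posterior draw of alpha, sample beta from
    N(mu_b, Sigma_b) with mu_b = beta_hat - S_ca S_a^{-1} (alpha - alpha_hat)
    and Sigma_b = S_c - S_ca S_a^{-1} S_ca'."""
    if rng is None:
        rng = np.random.default_rng()
    S_a_inv = np.linalg.inv(S_a)
    Sigma_b = S_c - S_ca @ S_a_inv @ S_ca.T
    L = np.linalg.cholesky(Sigma_b)      # Sigma_b is the same for every draw
    q = len(beta_hat)
    out = np.empty((len(alpha_draws), q))
    for m, a in enumerate(alpha_draws):
        mu_b = beta_hat - S_ca @ S_a_inv @ (a - alpha_hat)
        out[m] = mu_b + L @ rng.standard_normal(q)
    return out
```

Pairing row m of the output with the m-th \({\varvec{\alpha }}\) draw gives the combined joint sample for \(({\varvec{\alpha }},{\varvec{\beta }})\).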

Appendix B: Extension to More General GLMM

For the more general GLMM with more than one site-specific random effect, i.e., for \(r>1\) in (2.2), the approximation presented in Sect. 3 holds likewise. Here, we outline the steps involved in simulating an MCMC sample from the marginal distribution of the random-effect and fixed-effect parameters in the hierarchical model, analogous to the distribution in (3.5).

For the general GLMM with \(r\)-dimensional site-specific covariates, let \({\varvec{\alpha }}\) be the \(kr\)-dimensional vector obtained by stacking the site-specific \(r\)-dimensional random effects \({\varvec{\alpha }}_i=(\alpha _{i1},\ldots ,\alpha _{ir}), i=1,\ldots ,k\). As before, we assume a flat prior for \({\varvec{\beta }}\), and a normal hierarchical prior for the \({\varvec{\alpha }}_i\)’s given by

$$\begin{aligned} {\varvec{\alpha }}_i \overset{\mathrm{iid}}{\sim } N({\varvec{\mu }},\Sigma ) \end{aligned}$$

with \(\Sigma =diag(\tau _1^2,\ldots ,\tau _r^2)\), and

$$\begin{aligned} p({\varvec{\mu }},\tau _1^2,\ldots ,\tau _r^2)=1. \end{aligned}$$

As in Sect. 3, the marginal posterior distribution of \(({\varvec{\alpha }},{\varvec{\mu }},\tau ^2)\) can be approximated by (3.5). We can simulate from this posterior distribution using Gibbs sampling. We now show how we can simulate from the (full) conditional posterior distribution of \({\varvec{\alpha }}_i\) given all other parameters,

$$\begin{aligned} p({\varvec{\alpha }}_i|{\varvec{\alpha }}_{-i},\mu ,\Sigma ,{\varvec{y}}) \propto \frac{1}{(2\pi )^{s/2}\sqrt{|S_a|}}\exp \{-(1/2)({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})^\prime S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})\}p({\varvec{\alpha }}_i|{\varvec{\mu }},\Sigma ), \end{aligned}$$
(5.5)

where \({\varvec{\alpha }}_{-i}\) is the sub-vector of \({\varvec{\alpha }}\) after removing the components of \({\varvec{\alpha }}_i\). The first term in (5.5) is proportional to the probability density of a multivariate normal distribution, \(MVN({\hat{{\varvec{\alpha }}}},S_a)\) for \({\varvec{\alpha }}\), and hence is proportional to the product of the corresponding conditional distribution for \({\varvec{\alpha }}_{i}\) given \({\varvec{\alpha }}_{-i}\), and the marginal distribution of \({\varvec{\alpha }}_{-i}\), both of which are multivariate normal density functions. We can then drop the marginal density of \({\varvec{\alpha }}_{-i}\) since it does not involve \({\varvec{\alpha }}_i\), and get

$$\begin{aligned} p({\varvec{\alpha }}_i|{\varvec{\alpha }}_{-i},\mu ,\Sigma ,{\varvec{y}})\propto \exp \{-(1/2)({\varvec{\alpha }}_i-{{\tilde{{\varvec{\alpha }}}}}_i)^\prime V_i^{-1}({\varvec{\alpha }}_i-{{\tilde{{\varvec{\alpha }}}}}_i)\}p({\varvec{\alpha }}_i|{\varvec{\mu }},\Sigma ) \end{aligned}$$
(5.6)

where \({{\tilde{{\varvec{\alpha }}}}}_i\) and \(V_i\) are the conditional mean and variance of \({\varvec{\alpha }}_i\) given \({\varvec{\alpha }}_{-i}\), based on the \(MVN({\hat{{\varvec{\alpha }}}},S_a)\) density for \({\varvec{\alpha }}\). We can recognize the conditional density (5.6) as the posterior distribution for \({\varvec{\alpha }}_i\) derived using a normal likelihood \(N({{\tilde{{\varvec{\alpha }}}}}_i, V_i)\), and a normal prior density \(N({\varvec{\mu }}, \Sigma )\) for \({\varvec{\alpha }}_i\), with known variances \(V_i\) and \(\Sigma\). This can be used to simulate from the full conditional posterior distribution of \({\varvec{\alpha }}_i\), based on (5.5). Simulation from the other full conditional posterior distributions is quite straightforward, and we omit the details here.
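The normal-likelihood/normal-prior combination in (5.6) has the standard closed form: the posterior precision is the sum of the two precisions. A minimal Python/NumPy sketch of one such full-conditional draw, with illustrative argument names (the inputs \({{\tilde{{\varvec{\alpha }}}}}_i\) and \(V_i\) would come from the conditional-normal calculation described above):

```python
import numpy as np

def draw_alpha_i(alpha_tilde_i, V_i, mu, Sigma, rng=None):
    """One draw from the full conditional (5.6): a normal 'likelihood'
    N(alpha_tilde_i, V_i) combined with a normal prior N(mu, Sigma),
    both variances known, gives a closed-form normal posterior."""
    if rng is None:
        rng = np.random.default_rng()
    V_inv = np.linalg.inv(V_i)
    S_inv = np.linalg.inv(Sigma)
    post_cov = np.linalg.inv(V_inv + S_inv)          # combined precision
    post_mean = post_cov @ (V_inv @ alpha_tilde_i + S_inv @ mu)
    return post_mean + np.linalg.cholesky(post_cov) @ rng.standard_normal(len(mu))
```

Cycling this update over \(i=1,\ldots ,k\), together with the updates for the other parameters, gives the Gibbs sampler described in the text.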


About this article


Cite this article

Christianson, A., Sivaganesan, S. An Approximate Posterior Simulation for GLMM with Large Samples. J Stat Theory Pract 13, 45 (2019). https://doi.org/10.1007/s42519-019-0045-8
