An Approximate Posterior Simulation for GLMM with Large Samples

Christianson, Annette; Sivaganesan, Siva

doi:10.1007/s42519-019-0045-8

Original Article
Published: 21 June 2019

Volume 13, article number 45, (2019)
Journal of Statistical Theory and Practice

Abstract

Generalized linear mixed models are commonly used when modeling counts or dichotomous observations on subjects within clusters such as patients in hospitals. When the sample sizes at the cluster levels are large, Bayesian inference about parameters of generalized linear mixed models using Markov chain Monte Carlo sampling can be computationally slow. Standard large-sample approximations can provide a reasonable approximation for inference about cluster-level parameters which are near the “middle” but not necessarily for those parameters away from the middle. We provide an approach to simulating from the posterior distribution that gives a better approximation when the sample sizes at the cluster levels are large and a multivariate normal prior or the default flat prior is used.

Acknowledgements

We would like to thank the anonymous Associate Editor and two Referees for their very useful comments and suggestions. Their comments have helped improve the paper considerably.

Author information

Authors and Affiliations

Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH, USA
Annette Christianson
Division of Statistics and Data Science, University of Cincinnati, Cincinnati, OH, USA
Siva Sivaganesan

Corresponding author

Correspondence to Siva Sivaganesan.

Ethics declarations

Conflict of interest statement

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Annette Christianson's work was performed in affiliation with IPEC when the author was an employee there.

Appendices

Appendix A.1: Proof of Lemma

Proof

(a) We recall that k and q are the dimensions of ${\varvec{\alpha }}$ and ${\varvec{\beta }}$, respectively. Here, we will assume that $k>q$; the proof for the other case is similar. Using (2.11), and writing the normal density for ${\varvec{\theta }}=({\varvec{\alpha }},{\varvec{\beta }})$ as the product of marginal and conditional densities,

$$\begin{aligned} \int h({\varvec{\theta }})p({\varvec{\beta }})\mathrm{d}{\varvec{\beta }}\propto &N({\varvec{\alpha }}:{\hat{{\varvec{\alpha }}}}, S_a)\int N({\varvec{\beta }}: {\hat{{\varvec{\beta }}}}+C_1({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}}), W)\cdot N({\varvec{\beta }}: {\varvec{\beta }}_0,V_0)\mathrm{d}{\varvec{\beta }}\nonumber \\&= {} N({\varvec{\alpha }}:{\hat{{\varvec{\alpha }}}}, S_a)\cdot N({\varvec{\beta }}_0: {\hat{{\varvec{\beta }}}}+C_1({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}}), V_0+W) \end{aligned}$$

(5.1)

$$\begin{aligned}\propto & {} \exp \{-0.5(Q_1+Q_2)\} \end{aligned}$$

(5.2)

$\square$

where $N({\varvec{x}}:{\varvec{\mu }},\Sigma )$ is the multivariate normal density with mean ${\varvec{\mu }}$ and covariance matrix $\Sigma$, $C_1=S_{ba}S_a^{-1}$, $W=S_{bb}- S_{ba}S_{a}^{-1}S_{ab}$, $Q_1= ({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})^\prime S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})$, $Q_2=(C_1{\varvec{\alpha }}-{\varvec{b}}_1)^\prime (V_0+W)^{-1}(C_1{\varvec{\alpha }}-{\varvec{b}}_1)$, and ${\varvec{b}}_1={\varvec{\beta }}_0-{\hat{{\varvec{\beta }}}}+C_1{\hat{{\varvec{\alpha }}}}$.

Letting $C_2=\begin{pmatrix} C_1 \\ ({\varvec{0}},I_{k-q}) \end{pmatrix}$, a $k\times k$ matrix, where ${\varvec{0}}$ is a $(k-q)\times q$ matrix of zeroes and $I_{k-q}$ is the identity matrix of order $(k-q)$; ${\varvec{b}}_2=\begin{pmatrix} {\varvec{b}}_1\\ {\varvec{0}}\end{pmatrix}$, a vector of length k; and $T_\lambda = \begin{pmatrix} V_0+W &{} 0\\ 0 &{} \lambda I_{k-q} \end{pmatrix}$ for $\lambda >0$; we can show that

$$\begin{aligned} Q_2= ({\varvec{\alpha }}-C_2^{-1}{\varvec{b}}_2)^\prime C_3({\varvec{\alpha }}-C_2^{-1}{\varvec{b}}_2) -{\varvec{\alpha }}_2^\prime {\varvec{\alpha }}_2/\lambda , \end{aligned}$$

(5.3)

where $C_3= C_2^\prime T_\lambda ^{-1}C_2$ and ${\varvec{\alpha }}_2= ({\varvec{0}},I_{k-q}){\varvec{\alpha }}$. Thus, using (5.2), (5.3), and some routine algebra, we have

$$\begin{aligned} \int h({\varvec{\theta }})p({\varvec{\beta }})\mathrm{d}{\varvec{\beta }}&\propto {} \exp \{-0.5 ({\varvec{\alpha }}- {{\tilde{{\varvec{\alpha }}}}}_\lambda )^\prime V_{a,\lambda }^{-1}({\varvec{\alpha }}-{{\tilde{{\varvec{\alpha }}}}}_\lambda ) +0.5 {\varvec{\alpha }}_2^\prime {\varvec{\alpha }}_2/\lambda \}, \end{aligned}$$

where ${{\tilde{{\varvec{\alpha }}}}}_\lambda = (S_a^{-1}+C_3)^{-1}(S_a^{-1}{\hat{{\varvec{\alpha }}}} + C_3C_2^{-1}{\varvec{b}}_2)$, and $V_{a,\lambda }= (S_a^{-1}+C_3)^{-1}$.

Now, letting $\lambda \rightarrow \infty$, ${{\tilde{{\varvec{\alpha }}}}}_\lambda$ converges to

$$\begin{aligned} {{\tilde{{\varvec{\alpha }}}}} = (S_a^{-1}+C)^{-1}(S_a^{-1}{\hat{{\varvec{\alpha }}}} + C{\varvec{b}}) \end{aligned}$$

and $V_{a,\lambda }$ converges to $V_a= (S_a^{-1}+C)^{-1}$ where

$$\begin{aligned} C= C_2^\prime T C_2, \; {\varvec{b}}= C_2^{-1}{\varvec{b}}_2 \end{aligned}$$

(5.4)

and $T = \begin{pmatrix} (V_0+W)^{-1} &{} 0\\ 0 &{} 0 \end{pmatrix}$.

(b) For $p({\varvec{\beta }})=1$, we let $V_0= \tau ^2I$ and let $\tau ^2 \rightarrow \infty$ in the result (3.3) for the normal prior. In this limit, $(V_0+W)^{-1}\rightarrow 0$, so $C\rightarrow 0$, and hence ${{\tilde{{\varvec{\alpha }}}}}\rightarrow {\hat{{\varvec{\alpha }}}}$ and $V_a\rightarrow S_a$, giving (3.4).
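As a numerical sanity check on part (a), the sketch below builds $C_2$, ${\varvec{b}}_2$ and $T$ for arbitrary inputs, evaluates ${{\tilde{{\varvec{\alpha }}}}}$ and $V_a$ from (5.4), and compares them with the ${\varvec{\alpha }}$-marginal obtained by multiplying the two Gaussian factors in $({\varvec{\alpha }},{\varvec{\beta }})$ directly. The dimensions, the random covariance `S`, and the prior parameters `beta_0` and `V_0` are made up for illustration; none of them come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k, q = 4, 2                      # dimensions of alpha and beta (k > q); arbitrary choice

# Hypothetical inputs: a random positive-definite covariance S for theta = (alpha, beta).
A = rng.normal(size=(k + q, k + q))
S = A @ A.T + (k + q) * np.eye(k + q)
S_a, S_ab = S[:k, :k], S[:k, k:]
S_ba, S_bb = S[k:, :k], S[k:, k:]
alpha_hat, beta_hat = rng.normal(size=k), rng.normal(size=q)
beta_0, V_0 = rng.normal(size=q), 2.0 * np.eye(q)    # normal prior N(beta_0, V_0)

# Quantities defined in the proof.
S_a_inv = np.linalg.inv(S_a)
C_1 = S_ba @ S_a_inv
W = S_bb - S_ba @ S_a_inv @ S_ab
b_1 = beta_0 - beta_hat + C_1 @ alpha_hat
C_2 = np.vstack([C_1, np.hstack([np.zeros((k - q, q)), np.eye(k - q)])])
b_2 = np.concatenate([b_1, np.zeros(k - q)])
T = np.zeros((k, k))
T[:q, :q] = np.linalg.inv(V_0 + W)

# Equation (5.4) and the limiting mean and covariance.
C = C_2.T @ T @ C_2
b = np.linalg.solve(C_2, b_2)
V_a = np.linalg.inv(S_a_inv + C)
alpha_tilde = V_a @ (S_a_inv @ alpha_hat + C @ b)

# Independent check: product of N(theta: theta_hat, S) and N(beta: beta_0, V_0),
# followed by reading off the alpha block of the resulting Gaussian.
S_inv, V_0_inv = np.linalg.inv(S), np.linalg.inv(V_0)
joint_prec = S_inv.copy()
joint_prec[k:, k:] += V_0_inv
rhs = S_inv @ np.concatenate([alpha_hat, beta_hat])
rhs[k:] += V_0_inv @ beta_0
joint_cov = np.linalg.inv(joint_prec)
joint_mean = joint_cov @ rhs

print(np.allclose(V_a, joint_cov[:k, :k]), np.allclose(alpha_tilde, joint_mean[:k]))
```

Agreement of the two routes is what the Lemma asserts; the check relies on $C_2$ being invertible, which holds for generic inputs but is not guaranteed for every $C_1$.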

Appendix A.2: Three-Step Algorithm, TSA

Step 1:

Fit a fixed effects generalized linear regression model (e.g., using PROC GENMOD in SAS with site as a class variable) with no overall intercept, a separate site-specific intercept for each of the s sites, and common fixed-effect coefficients ${\varvec{\beta }}$ for the q covariates. Here, all $s+q$ parameters are regarded as fixed effects.

Obtain the MLE ${\hat{{\varvec{\alpha }}}}$ of the s-dimensional vector of unit-specific intercepts, the MLE ${\hat{{\varvec{\beta }}}}$ of the q-dimensional covariate coefficients ${\varvec{\beta }}$, and their associated estimated covariance matrix (see (2.8)).
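Step 1 can be sketched without SAS. The example below is a minimal illustration, assuming a logistic link and simulated data: it builds a design matrix with one indicator column per site and no overall intercept, finds the MLE by Newton-Raphson, and takes the inverse Fisher information as the estimated covariance matrix. The sample sizes, the true parameter values, and the block names `S_a`, `S_ca`, `S_c` (chosen to match the notation of Step 3) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
s, q, n_per_site = 3, 2, 400            # hypothetical numbers of sites, covariates, subjects

# Simulated data: site labels, covariates, and Bernoulli outcomes.
site = np.repeat(np.arange(s), n_per_site)
X = rng.normal(size=(s * n_per_site, q))
true_alpha, true_beta = np.array([-1.0, 0.0, 1.0]), np.array([0.5, -0.5])
p = 1.0 / (1.0 + np.exp(-(true_alpha[site] + X @ true_beta)))
y = rng.binomial(1, p)

# Design matrix: one indicator column per site (no overall intercept), then covariates.
Z = np.hstack([np.eye(s)[site], X])

def mean_and_information(theta):
    """Fitted probabilities and Fisher information of the logistic model at theta."""
    mu = 1.0 / (1.0 + np.exp(-(Z @ theta)))
    return mu, Z.T @ (Z * (mu * (1.0 - mu))[:, None])

# Newton-Raphson iterations for the MLE of all s + q fixed effects.
theta = np.zeros(s + q)
for _ in range(50):
    mu, info = mean_and_information(theta)
    step = np.linalg.solve(info, Z.T @ (y - mu))
    theta += step
    if np.max(np.abs(step)) < 1e-10:
        break

mu, info = mean_and_information(theta)
S = np.linalg.inv(info)                  # estimated covariance matrix of the MLE
alpha_hat, beta_hat = theta[:s], theta[s:]
S_a, S_ca, S_c = S[:s, :s], S[s:, :s], S[s:, s:]
```

The outputs `alpha_hat`, `beta_hat` and the blocks of `S` are the only quantities the later steps need, so the raw data can be set aside after this step.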

Step 2:

Diagonalize $S_a$ using an orthogonal matrix P so that $P S_a P^\prime =\mathrm{diag}(s^2_1,\ldots ,s^2_k)$.

Transform ${\hat{{\varvec{\alpha }}}}$ to ${\hat{{\varvec{\eta }}}}=P{\hat{{\varvec{\alpha }}}}$, and let ${\hat{{\varvec{\eta }}}}=({\hat{\eta }}_{1},\ldots ,{\hat{\eta }}_{k})^\prime$.

Fit a normal hierarchical Bayesian model for the ${\hat{\eta }}_i$’s ($i=1,\ldots ,k$), assuming ${\hat{\eta }}_i \sim N(\eta _i,s_i^2)$, and simulate an MCMC sample of size M from the posterior distribution of ${\varvec{\eta }}=(\eta _1,\ldots ,\eta _k)^\prime$, which we denote by ${\varvec{\eta }}_1,\ldots ,{\varvec{\eta }}_M$. Transform the ${\varvec{\eta }}_i$’s ($i=1,\ldots ,M$) back to obtain an MCMC sample of size M from the approximate posterior distribution of ${\varvec{\alpha }}$, using ${\varvec{\alpha }}_i= P^\prime {\varvec{\eta }}_i$.
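A minimal sketch of Step 2 follows. Here P is taken to be the transpose of the eigenvector matrix returned by `numpy.linalg.eigh`, so that ${\hat{{\varvec{\eta }}}}=P{\hat{{\varvec{\alpha }}}}$ has a diagonal covariance matrix. The second-stage prior used in the Gibbs sampler ($\eta _i \sim N(\mu ,\tau ^2)$ with a flat prior on $(\mu ,\tau ^2)$) is an assumption made for illustration and may differ from the hierarchical model specified in Sect. 3; the function name `gibbs_normal_hierarchical`, the burn-in length, and the inputs `alpha_hat` and `S_a` are likewise hypothetical.

```python
import numpy as np

def gibbs_normal_hierarchical(eta_hat, s2, M, rng, burn_in=500):
    """Gibbs sampler for eta_hat_i ~ N(eta_i, s2_i), eta_i ~ N(mu, tau2),
    with an (assumed) flat prior on (mu, tau2). Requires len(eta_hat) > 2."""
    k = len(eta_hat)
    mu, tau2 = eta_hat.mean(), eta_hat.var() + 1e-6
    draws = np.empty((M, k))
    for it in range(burn_in + M):
        # eta_i | rest: precision-weighted combination of eta_hat_i and mu.
        prec = 1.0 / s2 + 1.0 / tau2
        eta = rng.normal((eta_hat / s2 + mu / tau2) / prec, np.sqrt(1.0 / prec))
        # mu | rest ~ N(mean(eta), tau2 / k).
        mu = rng.normal(eta.mean(), np.sqrt(tau2 / k))
        # tau2 | rest ~ Inverse-Gamma(k/2 - 1, sum((eta - mu)^2) / 2).
        tau2 = (np.sum((eta - mu) ** 2) / 2.0) / rng.gamma(k / 2.0 - 1.0)
        if it >= burn_in:
            draws[it - burn_in] = eta
    return draws

rng = np.random.default_rng(2)
k, M = 6, 2000
alpha_hat = rng.normal(size=k)                 # hypothetical Step 1 output
B = rng.normal(size=(k, k))
S_a = 0.01 * (B @ B.T + k * np.eye(k))         # hypothetical covariance of alpha_hat

# eigh returns S_a = V diag(w) V', so P = V' satisfies P S_a P' = diag(w).
w, V = np.linalg.eigh(S_a)
P = V.T
eta_hat = P @ alpha_hat

eta_draws = gibbs_normal_hierarchical(eta_hat, w, M, rng)
alpha_draws = eta_draws @ P                    # row i equals (P' eta_i)'
```

Working on the rotated scale keeps every full conditional univariate, which is the practical reason for diagonalizing $S_a$ first.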

Step 3:

For each MCMC sample value of ${\varvec{\alpha }}$ in Step 2, generate a sample ${\varvec{\beta }}=(\beta _1,\ldots ,\beta _q)^\prime$ from the multivariate normal distribution with mean ${\varvec{\mu }}_b={\hat{{\varvec{\beta }}}} + S_{ca}S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})$ and covariance matrix $\Sigma _b=S_c- S_{ca}S_a^{-1}S_{ac}$, i.e., the conditional distribution of ${\varvec{\beta }}$ given ${\varvec{\alpha }}$ under the joint normal approximation, as in (5.1). This yields an MCMC sample of size M from the approximate posterior distribution of ${\varvec{\beta }}$.

Combine the MCMC samples from Steps 2 and 3 to obtain a joint MCMC sample of size M for $({\varvec{\alpha }},{\varvec{\beta }})$.
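Step 3 and the final combination can be sketched as below. The conditional mean is written as ${\hat{{\varvec{\beta }}}} + S_{ca}S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})$, the standard conditional-normal formula and the same form as the conditional density used in (5.1). The function name `draw_beta_given_alpha` and all numerical inputs are hypothetical, and the ${\varvec{\alpha }}$ draws are simply generated from $N({\hat{{\varvec{\alpha }}}},S_a)$ as a stand-in for the output of Step 2.

```python
import numpy as np

def draw_beta_given_alpha(alpha_draws, alpha_hat, beta_hat, S_a, S_ca, S_c, rng):
    """For each row of alpha_draws, draw beta from N(mu_b, Sigma_b)."""
    C = S_ca @ np.linalg.inv(S_a)              # q x k matrix S_ca S_a^{-1}
    Sigma_b = S_c - C @ S_ca.T                 # S_c - S_ca S_a^{-1} S_ac
    Sigma_b = (Sigma_b + Sigma_b.T) / 2.0      # symmetrize against round-off
    means = beta_hat + (alpha_draws - alpha_hat) @ C.T
    L = np.linalg.cholesky(Sigma_b)
    return means + rng.standard_normal(size=means.shape) @ L.T

rng = np.random.default_rng(3)
k, q, M = 4, 2, 20000

# Hypothetical Step 1 output: MLEs and their joint covariance matrix S.
A = rng.normal(size=(k + q, k + q))
S = 0.01 * (A @ A.T + (k + q) * np.eye(k + q))
S_a, S_ca, S_c = S[:k, :k], S[k:, :k], S[k:, k:]
alpha_hat, beta_hat = rng.normal(size=k), rng.normal(size=q)

# Stand-in for the Step 2 output (a real run would use the hierarchical-model draws).
alpha_draws = rng.multivariate_normal(alpha_hat, S_a, size=M)

beta_draws = draw_beta_given_alpha(alpha_draws, alpha_hat, beta_hat, S_a, S_ca, S_c, rng)
joint = np.hstack([alpha_draws, beta_draws])   # joint sample of size M for (alpha, beta)
```

With this particular stand-in for ${\varvec{\alpha }}$, the sample covariance of `joint` should be close to `S`, which gives a quick check that the conditional draw has the right sign and scale.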

Appendix B: Extension to More General GLMM

For a more general GLMM in which there is more than one site-specific random effect, i.e., for $r>1$ in (2.2), the approximation presented in Sect. 3 holds likewise. Here, we outline the steps involved in the simulation of an MCMC sample from the marginal distribution of the random effects and fixed effect parameters in the hierarchical model, analogous to the distribution in (3.5).

For the general GLMM with r-dimensional site-specific covariates, let ${\varvec{\alpha }}$ be the $kr$-dimensional vector obtained by stacking the site-specific r-dimensional random effects ${\varvec{\alpha }}_i=(\alpha _{i1},\ldots ,\alpha _{ir}), i=1,\ldots ,k$. As before, we assume a flat prior for ${\varvec{\beta }}$, and a normal hierarchical prior for the ${\varvec{\alpha }}_i$’s given by

$$\begin{aligned} {\varvec{\alpha }}_i \overset{\text{iid}}{\sim } N({\varvec{\mu }},\Sigma ) \end{aligned}$$

with $\Sigma =diag(\tau _1^2,\ldots ,\tau _r^2)$, and

$$\begin{aligned} p({\varvec{\mu }},\tau _1^2,\ldots ,\tau _r^2)=1. \end{aligned}$$

As in Sect. 3, the marginal posterior distribution of $({\varvec{\alpha }},{\varvec{\mu }},\tau ^2)$ can be approximated by (3.5). We can simulate from this posterior distribution using Gibbs sampling. We now show how we can simulate from the (full) conditional posterior distribution of ${\varvec{\alpha }}_i$ given all other parameters,

$$\begin{aligned} p({\varvec{\alpha }}_i|{\varvec{\alpha }}_{-i},{\varvec{\mu }},\Sigma ,{\varvec{y}}) \propto \frac{1}{(2\pi )^{s/2}\sqrt{|S_a|}}\exp \{-(1/2)({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})^\prime S_a^{-1}({\varvec{\alpha }}-{\hat{{\varvec{\alpha }}}})\}p({\varvec{\alpha }}_i|{\varvec{\mu }},\Sigma ), \end{aligned}$$

(5.5)

where ${\varvec{\alpha }}_{-i}$ is the sub-vector of ${\varvec{\alpha }}$ after removing the components of ${\varvec{\alpha }}_i$. The first term in (5.5) is proportional to the probability density of a multivariate normal distribution, $MVN({\hat{{\varvec{\alpha }}}},S_a)$ for ${\varvec{\alpha }}$, and hence is proportional to the product of the corresponding conditional distribution for ${\varvec{\alpha }}_{i}$ given ${\varvec{\alpha }}_{-i}$, and the marginal distribution of ${\varvec{\alpha }}_{-i}$, both of which are multivariate normal density functions. We can then drop the marginal density of ${\varvec{\alpha }}_{-i}$ since it does not involve ${\varvec{\alpha }}_i$, and get

$$\begin{aligned} p({\varvec{\alpha }}_i|{\varvec{\alpha }}_{-i},{\varvec{\mu }},\Sigma ,{\varvec{y}})\propto \exp \{-(1/2)({\varvec{\alpha }}_i-{{\tilde{{\varvec{\alpha }}}}}_i)^\prime V_i^{-1}({\varvec{\alpha }}_i-{{\tilde{{\varvec{\alpha }}}}}_i)\}p({\varvec{\alpha }}_i|{\varvec{\mu }},\Sigma ) \end{aligned}$$

(5.6)

where ${{\tilde{{\varvec{\alpha }}}}}_i$ and $V_i$ are the conditional mean and variance of ${\varvec{\alpha }}_i$ given ${\varvec{\alpha }}_{-i}$, based on the $MVN({\hat{{\varvec{\alpha }}}},S_a)$ density for ${\varvec{\alpha }}$. We can recognize the conditional density (5.6) as the posterior distribution for ${\varvec{\alpha }}_i$ derived using a normal likelihood $N({{\tilde{{\varvec{\alpha }}}}}_i, V_i)$, and a normal prior density $N({\varvec{\mu }}, \Sigma )$ for ${\varvec{\alpha }}_i$, with known variances $V_i$ and $\Sigma$. This can be used to simulate from the full conditional posterior distribution of ${\varvec{\alpha }}_i$, based on (5.5). Simulation from the other full conditional posterior distributions is quite straightforward, and we omit the details here.
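The full conditional draw described above can be sketched as follows. The code first computes the conditional mean ${{\tilde{{\varvec{\alpha }}}}}_i$ and variance $V_i$ of ${\varvec{\alpha }}_i$ given ${\varvec{\alpha }}_{-i}$ under $MVN({\hat{{\varvec{\alpha }}}},S_a)$, then combines the normal "likelihood" $N({{\tilde{{\varvec{\alpha }}}}}_i,V_i)$ with the prior $N({\varvec{\mu }},\Sigma )$ by adding precisions. It assumes that ${\varvec{\alpha }}$ is stacked site by site, so that ${\varvec{\alpha }}_i$ occupies r consecutive positions; the function name `draw_alpha_i` and every numerical input are hypothetical.

```python
import numpy as np

def draw_alpha_i(i, alpha, alpha_hat, S_a, mu, Sigma, r, rng):
    """Draw alpha_i (0-based site index i) from its full conditional based on (5.6)."""
    idx = np.arange(i * r, (i + 1) * r)                  # positions of alpha_i
    rest = np.setdiff1d(np.arange(len(alpha)), idx)      # positions of alpha_{-i}
    # Conditional mean and variance of alpha_i given alpha_{-i} under MVN(alpha_hat, S_a).
    G = S_a[np.ix_(idx, rest)] @ np.linalg.inv(S_a[np.ix_(rest, rest)])
    alpha_tilde_i = alpha_hat[idx] + G @ (alpha[rest] - alpha_hat[rest])
    V_i = S_a[np.ix_(idx, idx)] - G @ S_a[np.ix_(rest, idx)]
    # Normal likelihood N(alpha_tilde_i, V_i) combined with normal prior N(mu, Sigma).
    V_i_inv, Sigma_inv = np.linalg.inv(V_i), np.linalg.inv(Sigma)
    post_cov = np.linalg.inv(V_i_inv + Sigma_inv)
    post_cov = (post_cov + post_cov.T) / 2.0             # symmetrize against round-off
    post_mean = post_cov @ (V_i_inv @ alpha_tilde_i + Sigma_inv @ mu)
    return rng.multivariate_normal(post_mean, post_cov), post_mean, post_cov

rng = np.random.default_rng(4)
k, r = 3, 2                                              # hypothetical sites and effects per site
B = rng.normal(size=(k * r, k * r))
S_a = 0.05 * (B @ B.T + k * r * np.eye(k * r))
alpha_hat = rng.normal(size=k * r)
mu, Sigma = np.zeros(r), np.diag([1.0, 0.5])             # current values of mu and Sigma

# One Gibbs sweep over the site-specific blocks, holding mu and Sigma fixed.
alpha = alpha_hat.copy()
for i in range(k):
    draw, post_mean, post_cov = draw_alpha_i(i, alpha, alpha_hat, S_a, mu, Sigma, r, rng)
    alpha[i * r:(i + 1) * r] = draw
```

In a complete sampler this sweep would alternate with updates of ${\varvec{\mu }}$ and $\Sigma$ from their own full conditionals, which are not shown here.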

Cite this article

Christianson, A., Sivaganesan, S. An Approximate Posterior Simulation for GLMM with Large Samples. J Stat Theory Pract 13, 45 (2019). https://doi.org/10.1007/s42519-019-0045-8
