Fixed versus Mixed Effects Based Marginal Models for Clustered Correlated Binary Data: an Overview on Advances and Challenges

Sankhya B

Abstract

In a cross-sectional cluster setup, the binary responses from the individuals in a cluster become correlated because they share a common cluster effect, whereas in a longitudinal setup the repeated responses from an individual, which themselves form a cluster, become correlated because the present and past responses are likely to maintain a suitable dynamic relationship. In both cluster and longitudinal setups, the marginal means may or may not be specifiable as a function of the regression effects/parameters only. In a cluster setup, this depends on the distributional assumption for the random cluster effects, and in a longitudinal setup it depends on the form, such as linear or non-linear, of the dynamic relationship used to construct a conditional model. However, over the last four decades, many studies arbitrarily pre-specified the marginal means as a function of the regression effects only under both cluster and longitudinal setups, and accommodated correlations using arbitrarily selected ‘working’ correlation structures. This paper provides a thorough review of these decades-long binary correlation models for consistent and efficient estimation of the regression effects. Both the progress and the drawbacks of these works are presented, showing clearly how inconsistency can arise if a pre-specified marginal fixed effects model is used when in fact no such marginal fixed effects model exists. This is because some of the conditional random effects models in a cluster setup produce mixed effects models for the marginal means, and conditional non-linear dynamic models in a longitudinal setup produce history-based marginal recursive/dynamic models. As practitioners in both cluster and longitudinal setups deal with large data sets, it is demonstrated for their benefit how one can use the GQL (generalized quasi-likelihood) estimation approach in both setups.
Furthermore, there exist many studies using the Bayesian approach in which, unlike the aforementioned parametric correlation structure based inferences, marginal mixed effects models have been used for inference for correlated binary data without specifying their correlation structures, under both cluster and longitudinal setups. We also provide a brief review of this alternative approach.

Acknowledgments

The author would like to thank a referee and the Associate Editor for their valuable comments and suggestions that led to the improvement of the paper.

Author information

Corresponding author

Correspondence to Brajendra C. Sutradhar.

Appendix: Higher Order Moments (up to order 4) for Clustered Binary Responses

To compute \({\Omega }_{i}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })\) in Eq. 80, in addition to \(\text {var}(Y_{ij})\) and \(\text {cov}(Y_{ij},Y_{ik})\), we need formulas for certain third and fourth order moments, as follows.

Computation of \(\text {var}(Y_{ij}Y_{ik})\)

Because \(Y^{2}_{ij}=Y_{ij}\) for a binary response, this variance is computed as

$$ \begin{array}{@{}rcl@{}} \text{var}(Y_{ij}Y_{ik})&=&E[Y^{2}_{ij}Y^{2}_{ik}]-[E(Y_{ij}Y_{ik})]^{2}\\ &=&E(Y_{ij}Y_{ik})-[E(Y_{ij}Y_{ik})]^{2}=\lambda^{BA}_{i,jk}[1-\lambda^{BA}_{i,jk}] =\omega^{BA}_{i,jjkk}, \end{array} $$
(98)

where \(\lambda ^{BA}_{i,jk}\) is computed by Eq. 46.
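The identity in Eq. 98 relies only on the responses being binary, so it can be checked by direct enumeration. The following is a minimal sketch, assuming a made-up joint distribution for one pair \((Y_{ij},Y_{ik})\); the probabilities and the function name `var_of_product` are illustrative assumptions and do not come from the paper.

```python
def var_of_product(joint):
    """Return (var(Y_j * Y_k), lambda_jk) for a joint pmf on {0,1}^2.

    `joint` maps outcome pairs (y_j, y_k) to probabilities that sum to 1.
    """
    # First moment of the product: lambda_jk = E[Y_j Y_k] = P(Y_j = 1, Y_k = 1).
    lam = sum(p * (yj * yk) for (yj, yk), p in joint.items())
    # Second moment of the product; equals lam because (Y_j Y_k)^2 = Y_j Y_k.
    second = sum(p * (yj * yk) ** 2 for (yj, yk), p in joint.items())
    return second - lam ** 2, lam


# Hypothetical joint pmf for one pair of correlated binary responses.
joint = {(0, 0): 0.40, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.25}

variance, lam = var_of_product(joint)
print("lambda_jk        :", lam)
print("var(Y_j Y_k)     :", variance)
print("lambda(1-lambda) :", lam * (1.0 - lam))
```

In practice \(\lambda ^{BA}_{i,jk}\) would come from the model rather than from a tabulated joint distribution; the enumeration here only illustrates why the variance collapses to \(\lambda (1-\lambda )\).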

Computation of \(\text {cov}(Y_{ij},Y_{ik}Y_{i\ell })=\phi _{i,jk \ell } (\boldsymbol {\beta },\sigma ^{2}_{\gamma })\)

Because

$$ \begin{array}{@{}rcl@{}} \text{cov}(Y_{ij},Y_{ik}Y_{i\ell})&=&E[Y_{ij}Y_{ik}Y_{i\ell}]-\mu^{BA}_{ij}\lambda^{BA}_{i,k \ell}, \end{array} $$
(99)

we need the formula for the third order moments, namely

$$ \begin{array}{@{}rcl@{}} E[Y_{ij}Y_{ik}Y_{i\ell}]&=&E_{\gamma_{i}}E[\{Y_{ij}Y_{ik}Y_{i\ell}\}|\gamma_{i}] \\ &=&E_{\gamma_{i}}\left[E(Y_{ij}|\gamma_{i})E(Y_{ik}|\gamma_{i})E(Y_{i\ell}|\gamma_{i})\right] \\ &=&\int p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i}) p^{*}_{ik}(\boldsymbol{\beta},\gamma_{i})p^{*}_{i\ell}(\boldsymbol{\beta},\gamma_{i})g_{N}(\gamma_{i})d\gamma_{i}, \end{array} $$
(100)

where the second equality uses the independence of the responses within a cluster conditional on \(\gamma _{i}\), and where, for example, \(p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})=\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta } +\gamma _{i})/[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i})],\) and \(g_{N}(\gamma _{i}) \equiv [\gamma _{i} \sim N(0,\sigma ^{2}_{\gamma })].\) Similar to Eq. 46, the normal integral in Eq. 100 may be approximated by the binomial sum

$$ \begin{array}{@{}rcl@{}} &&\lambda^{BA}_{i,jk\ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&{\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) \\ & \times &p^{*}_{i\ell}(\boldsymbol{x}_{i\ell};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$
(101)

yielding

$$ \begin{array}{@{}rcl@{}} &&\phi^{BA}_{i,jk \ell} (\boldsymbol{\beta},\sigma^{2}_{\gamma})=\lambda^{BA}_{i,jk\ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) -\mu^{BA}_{ij}\lambda^{BA}_{i,k \ell}. \end{array} $$
(102)
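The binomial sum in Eq. 101 and the covariance in Eq. 102 can be sketched numerically as follows. This is an illustration under stated assumptions, not the paper's implementation: the function \(h(\cdot )\) is defined with Eq. 46, outside this appendix, and is assumed here to be the standardized binomial variate \(h(v)=(v-V/2)/\sqrt {V/4}\); the linear predictors, \(\sigma _{\gamma }\), \(V=10\), and all function names are hypothetical. A trapezoidal-rule evaluation of the integral in Eq. 100 is included only as a reference value.

```python
import math


def expit(eta):
    """Logistic function exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def h(v, V):
    """Standardized binomial(V, 1/2) value (assumed form of h)."""
    return (v - V / 2.0) / math.sqrt(V / 4.0)


def moment_ba(etas, sigma, V=10):
    """Binomial approximation to E[prod_j p*_ij] with gamma_i ~ N(0, sigma^2).

    `etas` holds the linear predictors x'_ij beta of distinct cluster members.
    """
    total = 0.0
    for v in range(V + 1):
        weight = math.comb(V, v) * 0.5 ** V        # binomial(V, 1/2) pmf
        gamma = sigma * h(v, V)                    # sigma_gamma * h(v_i)
        total += weight * math.prod(expit(eta + gamma) for eta in etas)
    return total


def moment_quad(etas, sigma, n=2001, width=8.0):
    """Reference value: trapezoidal rule over z in [-width, width], gamma = sigma*z."""
    step = 2.0 * width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -width + i * step
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        value = density * math.prod(expit(eta + sigma * z) for eta in etas)
        total += value * (0.5 if i in (0, n - 1) else 1.0)
    return total * step


# Hypothetical linear predictors for members j, k, l of one cluster.
eta_j, eta_k, eta_l = 0.2, -0.5, 1.0
sigma_gamma = 0.8

lam_jkl = moment_ba([eta_j, eta_k, eta_l], sigma_gamma)      # Eq. 101
mu_j = moment_ba([eta_j], sigma_gamma)                       # marginal mean
lam_kl = moment_ba([eta_k, eta_l], sigma_gamma)              # pairwise moment
phi_jkl = lam_jkl - mu_j * lam_kl                            # Eq. 102
lam_jkl_ref = moment_quad([eta_j, eta_k, eta_l], sigma_gamma)

print("third-order moment (binomial sum):", lam_jkl)
print("third-order moment (trapezoid)   :", lam_jkl_ref)
print("cov(Y_j, Y_k Y_l)                :", phi_jkl)
```

When \(\sigma _{\gamma }=0\) the sum reduces to the product of the individual probabilities, so the covariance in Eq. 102 vanishes, as expected for independent responses.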

Computation of \(\text {cov}(Y_{ij}Y_{ik},Y_{i\ell }Y_{im})=\omega _{i,jk\ell m} (\boldsymbol {\beta },\sigma ^{2}_{\gamma })\)

By calculations similar to those in Eq. 101, one obtains

$$ \omega^{BA}_{i,jk\ell m} (\boldsymbol{\beta},\sigma^{2}_{\gamma})=\lambda^{BA}_{i,jk\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) -\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})\lambda^{BA}_{i,\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), $$
(103)

where

$$ \begin{array}{@{}rcl@{}} &&\lambda^{BA}_{i,jk\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&{\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{i\ell}(\boldsymbol{x}_{i\ell};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) \\ & \times &p^{*}_{im}(\boldsymbol{x}_{im};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}. \end{array} $$
(104)
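A sketch of Eqs. 103 and 104 follows, again under assumptions that are not taken from the paper: \(h(v)=(v-V/2)/\sqrt {V/4}\) is an assumed form of the function defined with Eq. 46, and the linear predictors, \(\sigma _{\gamma }\), \(V\), and function names are hypothetical. One design choice deserves a note: Eq. 104 multiplies four probabilities and therefore presumes four distinct members. When indices coincide, \(Y^{2}_{ij}=Y_{ij}\) implies that each member should enter the product only once, so the sketch removes duplicate indices before applying the binomial sum. With that convention, \(\text {cov}(Y_{ij}Y_{ik},Y_{ij}Y_{ik})\) reproduces the variance of Eq. 98.

```python
import math


def expit(eta):
    """Logistic function exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def h(v, V):
    """Standardized binomial(V, 1/2) value (assumed form of h)."""
    return (v - V / 2.0) / math.sqrt(V / 4.0)


def product_moment_ba(indices, eta, sigma, V=10):
    """Binomial approximation to E[prod of Y_ij over `indices`].

    Repeated indices are counted once, because Y^2 = Y for binary Y.
    `eta` maps a member index j to its linear predictor x'_ij beta.
    """
    distinct = sorted(set(indices))
    total = 0.0
    for v in range(V + 1):
        weight = math.comb(V, v) * 0.5 ** V        # binomial(V, 1/2) pmf
        gamma = sigma * h(v, V)                    # sigma_gamma * h(v_i)
        total += weight * math.prod(expit(eta[j] + gamma) for j in distinct)
    return total


def omega_ba(j, k, l, m, eta, sigma, V=10):
    """cov(Y_ij Y_ik, Y_il Y_im) as in Eq. 103."""
    lam_jklm = product_moment_ba((j, k, l, m), eta, sigma, V)
    lam_jk = product_moment_ba((j, k), eta, sigma, V)
    lam_lm = product_moment_ba((l, m), eta, sigma, V)
    return lam_jklm - lam_jk * lam_lm


# Hypothetical linear predictors for four members of one cluster.
eta = {1: 0.2, 2: -0.5, 3: 1.0, 4: 0.4}
sigma_gamma = 0.8

omega_1234 = omega_ba(1, 2, 3, 4, eta, sigma_gamma)   # four distinct members
omega_1212 = omega_ba(1, 2, 1, 2, eta, sigma_gamma)   # same pair twice
lam_12 = product_moment_ba((1, 2), eta, sigma_gamma)

print("cov(Y_1 Y_2, Y_3 Y_4):", omega_1234)
print("cov(Y_1 Y_2, Y_1 Y_2):", omega_1212)
print("lambda_12 (1 - lambda_12):", lam_12 * (1.0 - lam_12))
```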

Cite this article

Sutradhar, B.C. Fixed versus Mixed Effects Based Marginal Models for Clustered Correlated Binary Data: an Overview on Advances and Challenges. Sankhya B 84, 259–302 (2022). https://doi.org/10.1007/s13571-021-00260-3

  • DOI: https://doi.org/10.1007/s13571-021-00260-3
