Fixed versus Mixed Effects Based Marginal Models for Clustered Correlated Binary Data: an Overview on Advances and Challenges

Sutradhar, Brajendra C.

doi:10.1007/s13571-021-00260-3

Published: 09 July 2021

Volume 84, pages 259–302, (2022)

Sankhya B

Brajendra C. Sutradhar¹

Abstract

In a cross-sectional cluster setup, the binary responses from the individuals in a cluster become correlated as they share a common cluster effect, whereas the longitudinal responses from an individual, which form a cluster, become correlated as the present and past responses are likely to maintain a suitable dynamic relationship. In both cluster and longitudinal setups, the marginal means may or may not be specified as a function of regression effects/parameters only. In a cluster setup, this depends on the distributional assumption of the random cluster effects, and in a longitudinal setup it depends on the form, such as linear or non-linear, of the dynamic relationship used to construct a conditional model. However, over the last four decades, many studies arbitrarily pre-specified the marginal means as a function of regression effects only under both cluster and longitudinal setups, and accommodated correlations using arbitrarily selected ‘working’ correlation structures. This paper provides a thorough review of these decades-long binary correlation models for consistent and efficient estimation of the regression effects. Both progress and drawbacks of these works are presented, clearly showing how inconsistency can arise if the pre-specified marginal fixed effects model is used when in fact no such marginal fixed effects model exists. This is because some of the conditional random effects models in a cluster setup produce mixed effects models for the marginal means, and conditional non-linear dynamic models in a longitudinal setup produce history-based marginal recursive/dynamic models. As practitioners in both cluster and longitudinal setups deal with large data sets, it is demonstrated for their benefit how one can use the GQL (generalized quasi-likelihood) estimation approach in both setups.
Furthermore, there exist many studies using the Bayesian approach where, unlike the aforementioned parametric correlation structure based inferences, marginal mixed effects models have been used for inference for correlated binary data without specifying their correlation structures, under both cluster and longitudinal setups. We also provide a brief review of this alternative approach.

Acknowledgments

The author would like to thank a referee and the Associate Editor for their valuable comments and suggestions that led to the improvement of the paper.

Author information

Authors and Affiliations

Memorial University, St. John’s, NL, Canada
Brajendra C. Sutradhar

Corresponding author

Correspondence to Brajendra C. Sutradhar.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Higher Order Moments (up to order 4) for Clustered Binary Responses

To compute ${\Omega }_{i}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ in Eq. 80, in addition to $\text {var}(Y_{ij})$ and $\text {cov}(Y_{ij},Y_{ik})$, we need formulas for certain third and fourth order moments, as follows.

Computation of $\text {var}(Y_{ij}Y_{ik})$

This variance is computed as

$$ \begin{array}{@{}rcl@{}} \text{var}(Y_{ij}Y_{ik})&=&E[Y^{2}_{ij}Y^{2}_{ik}]-[E(Y_{ij}Y_{ik})]^{2}\\ &=&E(Y_{ij}Y_{ik})-[E(Y_{ij}Y_{ik})]^{2}=\lambda^{BA}_{i,jk}[1-\lambda^{BA}_{i,jk}] =\omega^{BA}_{i,jjkk}, \end{array} $$

(98)

where $\lambda ^{BA}_{i,jk}$ is computed by Eq. 46.
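
The step from the first to the second line of Eq. 98 relies only on the fact that a product of binary responses is itself binary, so its square equals itself. A minimal sketch of that identity follows. The joint probabilities are hypothetical illustration values, not taken from the paper, and the function name `product_moments` is introduced here only for the example.

```python
def product_moments(joint):
    """Return (E[Y_j Y_k], var(Y_j Y_k)) for a joint pmf on {0,1}^2.

    `joint` maps (y_j, y_k) to a probability. The values used below are
    hypothetical and serve only to illustrate the identity in Eq. 98.
    """
    first = sum(p * (yj * yk) for (yj, yk), p in joint.items())
    second = sum(p * (yj * yk) ** 2 for (yj, yk), p in joint.items())
    return first, second - first ** 2


# Hypothetical joint distribution of two correlated binary responses.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
lam, var_prod = product_moments(joint)

# Because (Y_j Y_k)^2 = Y_j Y_k, the variance collapses to lam * (1 - lam).
assert abs(var_prod - lam * (1.0 - lam)) < 1e-12
```

Here `lam` plays the role of $\lambda_{i,jk}$; in the paper it is obtained from Eq. 46 rather than from an explicit joint table.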

Computation of $\text {cov}(Y_{ij},Y_{ik}Y_{i\ell })=\phi _{i,jk \ell } (\boldsymbol {\beta },\sigma ^{2}_{\gamma })$

Because

$$ \begin{array}{@{}rcl@{}} \text{cov}(Y_{ij},Y_{ik}Y_{i\ell})&=&E[Y_{ij}Y_{ik}Y_{i\ell}]-\mu^{BA}_{ij}\lambda^{BA}_{i,k \ell}, \end{array} $$

(99)

we need the formula for the third order moments, namely

$$ \begin{array}{@{}rcl@{}} E[Y_{ij}Y_{ik}Y_{i\ell}]&=&E_{\gamma_{i}}E[\{Y_{ij}Y_{ik}Y_{i\ell}\}|\gamma_{i}] \\ &=&E_{\gamma_{i}}\left[E(Y_{ij}|\gamma_{i})E(Y_{ik}|\gamma_{i})E(Y_{i\ell}|\gamma_{i})\right] \\ &=&\int p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i}) p^{*}_{ik}(\boldsymbol{\beta},\gamma_{i})p^{*}_{i\ell}(\boldsymbol{\beta},\gamma_{i})g_{N}(\gamma_{i})d\gamma_{i}, \end{array} $$

(100)

where, for example, $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})=\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta } +\gamma _{i})/[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i})],$ and $g_{N}(\gamma _{i})$ denotes the $N(0,\sigma ^{2}_{\gamma })$ density of $\gamma _{i}$. As in Eq. 46, the normal integral in Eq. 100 may be approximated by

$$ \begin{array}{@{}rcl@{}} &&\lambda^{BA}_{i,jk\ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&{\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) \\ & \times &p^{*}_{i\ell}(\boldsymbol{x}_{i\ell};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(101)

yielding

$$ \begin{array}{@{}rcl@{}} &&\phi^{BA}_{i,jk \ell} (\boldsymbol{\beta},\sigma^{2}_{\gamma})=\lambda^{BA}_{i,jk\ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) -\mu^{BA}_{ij}\lambda^{BA}_{i,k \ell}. \end{array} $$

(102)
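
The binomial approximation in Eqs. 101 and 102 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation. The function $h(\cdot)$ is not defined in this excerpt, so the sketch assumes the standardized binomial form $h(v)=(v-V/2)/\sqrt{V/4}$. It also assumes that $\mu^{BA}_{ij}$ and $\lambda^{BA}_{i,k\ell}$ are the first and second order versions of the same sum. The names `p_star`, `binomial_nodes`, `lam_ba` and `phi_ba`, and all numerical inputs, are hypothetical.

```python
import math


def p_star(eta, gamma):
    """Conditional success probability, logistic in eta + gamma (eta = x'beta)."""
    return 1.0 / (1.0 + math.exp(-(eta + gamma)))


def binomial_nodes(V):
    """Nodes h(v) and weights C(V, v) (1/2)^v (1/2)^(V - v) for v = 0..V.

    ASSUMPTION: h(v) = (v - V/2) / sqrt(V/4), i.e. a standardized
    Binomial(V, 1/2) variable. The excerpt does not define h.
    """
    scale = math.sqrt(V / 4.0)
    return [((v - V / 2.0) / scale, math.comb(V, v) * 0.5 ** V)
            for v in range(V + 1)]


def lam_ba(etas, sigma, V=10):
    """Binomial approximation to E[prod_j p*_j(gamma)], gamma ~ N(0, sigma^2)."""
    total = 0.0
    for h, weight in binomial_nodes(V):
        prod = 1.0
        for eta in etas:
            prod *= p_star(eta, sigma * h)
        total += prod * weight
    return total


def phi_ba(eta_j, eta_k, eta_l, sigma, V=10):
    """cov(Y_j, Y_k Y_l) in the form of Eq. 102: lambda_jkl - mu_j * lambda_kl."""
    lam_jkl = lam_ba([eta_j, eta_k, eta_l], sigma, V)
    mu_j = lam_ba([eta_j], sigma, V)
    lam_kl = lam_ba([eta_k, eta_l], sigma, V)
    return lam_jkl - mu_j * lam_kl


# Hypothetical linear predictors x'beta for three members of one cluster.
etas = (0.2, -0.5, 1.0)
sigma = 0.8
lam3 = lam_ba(etas, sigma)
phi = phi_ba(*etas, sigma)
```

With `sigma = 0` the random effect is degenerate, the three responses are independent, and `phi_ba` returns zero up to rounding, which is a quick sanity check on any implementation of Eq. 102.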

Computation of $\text {cov}(Y_{ij}Y_{ik},Y_{i\ell }Y_{im})=\omega _{i,jk\ell m} (\boldsymbol {\beta },\sigma ^{2}_{\gamma })$

By calculations similar to those in Eq. 101, one obtains

$$ \omega^{BA}_{i,jk\ell m} (\boldsymbol{\beta},\sigma^{2}_{\gamma})=\lambda^{BA}_{i,jk\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) -\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})\lambda^{BA}_{i,\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), $$

(103)

where

$$ \begin{array}{@{}rcl@{}} &&\lambda^{BA}_{i,jk\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&{\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{i\ell}(\boldsymbol{x}_{i\ell};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) \\ & \times &p^{*}_{im}(\boldsymbol{x}_{im};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}. \end{array} $$

(104)
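
The fourth order quantities in Eqs. 103 and 104 can be sketched the same way. The block below is an illustration only and rests on the same assumption as before, namely $h(v)=(v-V/2)/\sqrt{V/4}$, which this excerpt does not state. One design choice goes beyond the displayed formulas: repeated indices are collapsed before the product is formed, because $Y^{2}_{ij}=Y_{ij}$ for binary responses. Eq. 104 as written appears to be intended for distinct $j,k,\ell,m$, and with the collapsing step the case $(j,k,j,k)$ reproduces the form of Eq. 98. All names and numbers are hypothetical.

```python
import math


def p_star(eta, gamma):
    """Conditional success probability, logistic in eta + gamma (eta = x'beta)."""
    return 1.0 / (1.0 + math.exp(-(eta + gamma)))


def lam_ba(eta, indices, sigma, V=10):
    """Binomial approximation to E[product of Y over `indices`] in one cluster.

    Repeated indices are collapsed first, since Y^2 = Y for binary Y.
    ASSUMPTION: h(v) = (v - V/2) / sqrt(V/4); the excerpt does not define h.
    """
    distinct = sorted(set(indices))
    scale = math.sqrt(V / 4.0)
    total = 0.0
    for v in range(V + 1):
        gamma = sigma * (v - V / 2.0) / scale
        weight = math.comb(V, v) * 0.5 ** V
        prod = 1.0
        for j in distinct:
            prod *= p_star(eta[j], gamma)
        total += prod * weight
    return total


def omega_ba(eta, j, k, l, m, sigma, V=10):
    """cov(Y_j Y_k, Y_l Y_m) in the form of Eq. 103."""
    return (lam_ba(eta, (j, k, l, m), sigma, V)
            - lam_ba(eta, (j, k), sigma, V) * lam_ba(eta, (l, m), sigma, V))


# Hypothetical linear predictors x'beta for four members of one cluster.
eta = [0.2, -0.5, 1.0, 0.4]
sigma = 0.8
omega_distinct = omega_ba(eta, 0, 1, 2, 3, sigma)  # all indices distinct
omega_same = omega_ba(eta, 0, 1, 0, 1, sigma)      # var(Y_0 Y_1), cf. Eq. 98
lam_01 = lam_ba(eta, (0, 1), sigma)
```

In this sketch `omega_same` equals `lam_01 * (1 - lam_01)`, matching Eq. 98, while `omega_distinct` uses the full four-term product of Eq. 104.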

About this article

Cite this article

Sutradhar, B.C. Fixed versus Mixed Effects Based Marginal Models for Clustered Correlated Binary Data: an Overview on Advances and Challenges. Sankhya B 84, 259–302 (2022). https://doi.org/10.1007/s13571-021-00260-3

Received: 31 December 2020
Accepted: 05 June 2021
Published: 09 July 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s13571-021-00260-3
