Cluster Correlations and Complexity in Binary Regression Analysis Using Two-stage Cluster Samples

Sutradhar, Brajendra C.

doi:10.1007/s13171-022-00281-8


Published: 02 May 2022

Volume 85, pages 829–884, (2023)

Sankhya A


Abstract

In a two-stage cluster sampling setup for binary data, a sample of clusters such as hospitals is chosen at the first stage from a large number of clusters belonging to a finite population, and at the second stage a random sample of individuals such as nurses is chosen from each selected cluster, and the binary responses along with covariates are collected from the selected individuals. Because the hypothetical binary responses from the individuals in a given cluster/hospital under the first stage sample are correlated (as they share a common cluster effect), this correlation plays a complex role in developing the second stage sample based estimating equations for the underlying regression parameters. Moreover, the correlation parameters also have to be consistently estimated. In this paper, unlike the existing studies, we demonstrate how to accommodate (1) the so-called inverse correlation weights arising from a finite population based generalized quasi-likelihood (GQL) estimating function, on top of (2) the sampling weights, to develop a survey sample based doubly weighted (SSDW) estimation approach for consistent estimation of both regression and correlation parameters. For simplicity, we refer to this GQL cum SSDW approach as the SSDW approach only. A method of moments (MM) cum SSDW approach would be simpler but less efficient, and it is not included in the paper. The estimating function involved in the proposed SSDW estimating equation has the form of a sample total, which unbiasedly estimates the corresponding finite population total that arises from the aforementioned generalized quasi-likelihood function for the targeted finite population parameter. The resulting SSDW estimators thus become consistent for the respective parameters. This consistency property of the SSDW estimators for both regression and cluster correlation parameters is studied in detail.



Acknowledgments

The author would like to thank two reviewers and the Associate Editor for their valuable comments and suggestions leading to the improvement of the paper.

Funding

No funding was used to complete this research.

Author information

Authors and Affiliations

Memorial University, St. John’s, Canada
Brajendra C. Sutradhar


Corresponding author

Correspondence to Brajendra C. Sutradhar.

Ethics declarations

Conflict of Interests

There is no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Computation of the Mixed Effects Based Marginal Mean (1.7), and Covariance Matrix (1.8)

Computation of unconditional marginal mean

Use $\gamma ^{*}_{c}=\gamma _{c}/\sigma _{\gamma }$ in (1.2), and re-express the conditional mean as

$$ \begin{array}{@{}rcl@{}} \pi^{*}_{ci}(\beta,\sigma^{2}_{\gamma}, \gamma^{*}_{c})&=&\frac{\exp(x^{\prime}_{ci}\beta+\sigma_{\gamma}\gamma^{*}_{c})}{[1+\exp(x^{\prime}_{ci}\beta+ \sigma_{\gamma}\gamma^{*}_{c})]}, \end{array} $$

(a.1)

where $\gamma ^{*}_{c} {\stackrel {iid}{\sim }} N(0,1)$. One may then compute the unconditional mean as

$$ \begin{array}{@{}rcl@{}} E_{M}[Y_{ci}]&=&\pi_{ci}(\beta,\sigma^{2}_{\gamma})=\int \pi^{*}_{ci}(\beta,\sigma^{2}_{\gamma}, \gamma^{*}_{c})g_{N}(\gamma^{*}_{c})d\gamma^{*}_{c}, \end{array} $$

(a.2)

where $g_{N}(\gamma ^{*}_{c})$ denotes the standard normal density.
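The integral in (a.2) has no closed form, so in practice it has to be approximated numerically. The following is a minimal sketch, assuming Gauss-Hermite quadrature is an acceptable approximation; the paper does not state which numerical method it uses, and the function name `marginal_mean` and the default of 40 nodes are illustrative choices only.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def marginal_mean(x, beta, sigma_gamma, n_nodes=40):
    """Approximate pi_ci in (a.2) by Gauss-Hermite quadrature (an assumption)."""
    # Nodes and weights for the weight function exp(-g^2 / 2).
    nodes, weights = hermegauss(n_nodes)
    # Rescale so the weights integrate against the N(0, 1) density g_N.
    weights = weights / np.sqrt(2.0 * np.pi)
    # Linear predictor x'beta + sigma_gamma * gamma* at every node.
    eta = np.dot(x, beta) + sigma_gamma * nodes
    # Conditional mean pi*_ci of (a.1), evaluated at every node.
    cond_mean = 1.0 / (1.0 + np.exp(-eta))
    return float(np.sum(weights * cond_mean))
```

With $\sigma_{\gamma}=0$ the result reduces to the ordinary logistic mean, and for $\sigma_{\gamma}>0$ the marginal mean is pulled towards 1/2 relative to the conditional mean at $\gamma^{*}_{c}=0$.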

Computation of unconditional covariance matrix

First, because $y_{ci}$ is a binary response, the formula for its variance is written as

$$ \text{var}_{M}[Y_{ci}|x_{ci}]=\sigma_{c,ii}(\beta,\sigma^{2}_{\gamma}) =\pi_{ci}(\beta,\sigma^{2}_{\gamma})(1-\pi_{ci}(\beta,\sigma^{2}_{\gamma})), $$

(a.3)

where $\pi _{ci}(\beta ,\sigma ^{2}_{\gamma })$ is the unconditional mean, given by (a.2).

Next, because the responses of the individuals within a cluster are pairwise independent conditional on the cluster effect, we write

$$ \begin{array}{@{}rcl@{}} &&\text{cov}_{M}[\{Y_{ci},Y_{cj}\}|x_{ci},x_{cj},\gamma_{c}]=0, \end{array} $$

(a.4)

implying that

$$ \begin{array}{@{}rcl@{}} E_{M}[\{Y_{ci}Y_{cj}\}|\gamma_{c}]&=&E_{M}[Y_{ci}|\gamma_{c}]E_{M}[Y_{cj}|\gamma_{c}] \\ &=&\pi^{*}_{ci}(\beta,\sigma^{2}_{\gamma}, \gamma^{*}_{c})\pi^{*}_{cj}(\beta,\sigma^{2}_{\gamma}, \gamma^{*}_{c}). \end{array} $$

(a.5)

Hence, the unconditional covariance between $y_{ci}$ and $y_{cj}$ is given by

$$ \begin{array}{@{}rcl@{}} \text{cov}_{M}[\{Y_{ci},Y_{cj}\}|x_{ci},x_{cj}]&=&\sigma_{c,ij}(\beta, \sigma^{2}_{\gamma}) \\ &=&\lambda_{c,ij}(\beta,\sigma^{2}_{\gamma})-\pi_{ci}(\beta,\sigma^{2}_{\gamma}) \pi_{cj}(\beta,\sigma^{2}_{\gamma}), \end{array} $$

(a.6)

where

$$ \begin{array}{@{}rcl@{}} \lambda_{c,ij}(\beta,\sigma^{2}_{\gamma}) &=&E_{M}[Y_{ci}Y_{cj}]=E_{\gamma_{c}}E[\{Y_{ci}Y_{cj}\}|\gamma_{c}] \\ &=& E_{\gamma^{*}}[\pi^{*}_{ci}(\beta,\sigma^{2}_{\gamma}, \gamma^{*}_{c})\pi^{*}_{cj}(\beta,\sigma^{2}_{\gamma}, \gamma^{*}_{c})] \\ &=&\int \frac{\exp[(x_{ci}+x_{cj})'\beta+2\sigma_{\gamma} \gamma^{*}_{c}]}{[1+\exp(x^{\prime}_{ci}\beta+\sigma_{\gamma}\gamma^{*}_{c})] [1+\exp(x^{\prime}_{cj}\beta+\sigma_{\gamma}\gamma^{*}_{c})]}g_{N}(\gamma^{*}_{c})d\gamma^{*}_{c} \\ &=&\int \pi^{*}_{ci}(\beta,\sigma^{2}_{\gamma},\gamma^{*}_{c})\pi^{*}_{cj}(\beta,\sigma^{2}_{\gamma}, \gamma^{*}_{c}) g_{N}(\gamma^{*}_{c})d\gamma^{*}_{c}. \end{array} $$
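The quantities in (a.2), (a.3) and (a.6) can be assembled into the mean vector and covariance matrix of one cluster. The sketch below assumes the same Gauss-Hermite approximation to the normal integrals; this is not stated in the paper, and the function name `cluster_moments` and its arguments are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def cluster_moments(X, beta, sigma_gamma, n_nodes=40):
    """Return (pi, Sigma) for one cluster whose covariate rows are X (N_c x p)."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)      # N(0, 1) quadrature weights
    # eta[i, q] = x_ci' beta + sigma_gamma * node_q
    eta = (X @ beta)[:, None] + sigma_gamma * nodes[None, :]
    p_star = 1.0 / (1.0 + np.exp(-eta))           # pi*_ci at each node, (a.1)
    pi = p_star @ weights                         # marginal means, (a.2)
    lam = (p_star * weights) @ p_star.T           # lambda_{c,ij} = E[pi*_ci pi*_cj]
    Sigma = lam - np.outer(pi, pi)                # off-diagonal covariances, (a.6)
    np.fill_diagonal(Sigma, pi * (1.0 - pi))      # binary variances, (a.3)
    return pi, Sigma
```

The diagonal is overwritten with $\pi_{ci}(1-\pi_{ci})$ because (a.6) applies to pairs $i \neq j$ only. When $\sigma_{\gamma}=0$ the off-diagonal entries vanish, which matches the fact that the cluster correlation is induced solely by the shared random effect.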

Appendix B: Computation of the Covariance Matrix $V^{*}_{n}(\beta ,\sigma ^{2}_{\gamma })$ in (3.33)

By applying the indicator variables from (3.21)-(3.22), we express the formula of this matrix from (3.33), as

$$ \begin{array}{@{}rcl@{}} &&V^{*}_{n}(\beta,\sigma^{2}_{\gamma}) =\text{cov}_{p_{1}}\left[\frac{K}{k}{\sum}^{K}_{c=1}\frac{N_{c}}{n_{c}} \delta_{1,c}E_{p_{2c}}\left\{{\sum}^{N_{c}}_{i=1} \delta_{2,i|c}z_{ci}|p_{1}\right\}\right] \\ &+&E_{p_{1}}\left[(K^{2}/k^{2}){\sum}^{K}_{c=1}\frac{{N^{2}_{c}}}{{n^{2}_{c}}}\delta_{1,c} \text{cov}_{p_{2c}}\left\{{\sum}^{N_{c}}_{i=1}\delta_{2,i|c}z_{ci}|p_{1}\right\} \right]. \end{array} $$

(b.1)

Computational formula for the first term in (b.1)

Notice that for hypothetically known $z_{ci}$ under the FP, the expectation with respect to the sampling design $p_{2c}$ in the first term may be computed as

$$ \begin{array}{@{}rcl@{}} E_{p_{2c}}\left\{{\sum}^{N_{c}}_{i=1} \delta_{2,i|c}z_{ci}|p_{1}\right\} &=&{\sum}^{N_{c}}_{i=1} E_{p_{2c}}[\delta_{2,i|c}]z_{ci}|p_{1} \\ &=&\frac{n_{c}}{N_{c}}{\sum}^{N_{c}}_{i=1}z_{ci}=\frac{n_{c}}{N_{c}}Z_{c}, \text{(say).} \end{array} $$

(b.2)

Because the first stage sample of clusters is chosen by SRS without replacement, by substituting (b.2) in the first term in (b.1), we can compute the covariance over the sampling design $p_{1}$ as

$$ \begin{array}{@{}rcl@{}} &&\text{cov}_{p_{1}}\!\left[\frac{K}{k}{\sum}^{K}_{c=1} \delta_{1,c}Z_{c}\right] = \frac{K^{2}}{k^{2}}\!\left[{\sum}^{K}_{c=1}Z_{c}Z^{\prime}_{c}\text{var}[\delta_{1,c}] + {\sum}^{K}_{c \neq d}Z_{c}Z^{\prime}_{d} \text{cov}[\delta_{1,c},\delta_{1,d}]\right]\!. \end{array} $$

(b.3)

Now, because $\delta_{1,c}$ is the indicator variable defined by (3.21) under the sampling design $p_{1}$ (SRS without replacement), we have

$$ \begin{array}{@{}rcl@{}} \text{var}(\delta_{1,c})&=&\frac{k}{K}(1-\frac{k}{K})\\ \text{cov}(\delta_{1,c},\delta_{1,d})&=&E(\delta_{1,c}\delta_{1,d}) -E(\delta_{1,c})E(\delta_{1,d}) \\ &=&\frac{k(k-1)}{K(K-1)}-\left( \frac{k}{K}\right)^{2} =-\frac{k}{K(K-1)}(1-\frac{k}{K}). \end{array} $$

(b.4)
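The moments in (b.4) can be checked by brute force for a small population. The following sketch enumerates every SRS-without-replacement sample of size k from K units and computes the variance and covariance of the inclusion indicators in exact rational arithmetic; the function name is hypothetical and the enumeration is only feasible for small K.

```python
from fractions import Fraction
from itertools import combinations


def srswor_indicator_moments(K, k):
    """Exact var(delta_1) and cov(delta_1, delta_2) under SRS without replacement."""
    samples = list(combinations(range(K), k))
    prob = Fraction(1, len(samples))          # every sample is equally likely
    e1 = sum(prob for s in samples if 0 in s)                # E[delta_1]
    e2 = sum(prob for s in samples if 1 in s)                # E[delta_2]
    e12 = sum(prob for s in samples if 0 in s and 1 in s)    # E[delta_1 delta_2]
    var = e1 * (1 - e1)       # delta is 0/1, so E[delta^2] = E[delta]
    cov = e12 - e1 * e2
    return var, cov
```

For example, `srswor_indicator_moments(5, 2)` can be compared with $\frac{k}{K}(1-\frac{k}{K})$ and $-\frac{k}{K(K-1)}(1-\frac{k}{K})$ evaluated at $K=5$, $k=2$.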

Substitute (b.4) in (b.3), and write

$$ \begin{array}{@{}rcl@{}} &&\text{cov}_{p_{1}}\left[\frac{K}{k}{\sum}^{K}_{c=1} \delta_{1,c}Z_{c}\right]=\frac{K^{2}}{k^{2}}\frac{k}{K}(1-\frac{k}{K})\left[ {\sum}^{K}_{c=1}Z_{c}Z^{\prime}_{c}-\frac{1}{K-1}{\sum}^{K}_{c \neq d}Z_{c}Z^{\prime}_{d}\right] \\ &=&\frac{K}{K-1}\frac{K}{k}(1-\frac{k}{K})\left[\frac{(K-1)}{K} {\sum}^{K}_{c=1}Z_{c}Z^{\prime}_{c}-\frac{1}{K}{\sum}^{K}_{c \neq d}Z_{c}Z^{\prime}_{d}\right] \\ &=&\frac{1}{k}\frac{K^{2}}{K-1}(1-\frac{k}{K})\left[{\sum}^{K}_{c=1}Z_{c}Z^{\prime}_{c} -\frac{1}{K}\left\{{\sum}^{K}_{c=1}Z_{c}Z^{\prime}_{c}+{\sum}^{K}_{c \neq d}Z_{c}Z^{\prime}_{d}\right\}\right] \\ &=&\frac{1}{k}\frac{K^{2}}{K-1}(1-\frac{k}{K})\left[{\sum}^{K}_{c=1}Z_{c}Z^{\prime}_{c} -\frac{1}{K}\left\{{\sum}^{K}_{c=1}Z_{c} {\sum}^{K}_{c=1}Z^{\prime}_{c}\right\}\right] \\ &=&\frac{1}{k}\frac{K^{2}}{K-1}(1-\frac{k}{K})\left[{\sum}^{K}_{c=1}(Z_{c}-\bar{Z}) (Z_{c}-\bar{Z})'\right] \\ &=&K^{2}\left( \frac{K-k}{K}\right)\frac{1}{k}V_{1\cdot}(\beta,\sigma^{2}_{\gamma}), \end{array} $$

(b.5)

where we have used

$$\bar{Z}=\frac{1}{K}{\sum}^{K}_{c=1}Z_{c}, \quad \text{and} \quad V_{1\cdot}(\beta,\sigma^{2}_{\gamma})=\frac{1}{K-1} {\sum}^{K}_{c=1}(Z_{c}-\bar{Z})(Z_{c}-\bar{Z})'.$$
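As a numerical sanity check on (b.5), the sketch below compares the closed form $K^{2}\left(\frac{K-k}{K}\right)\frac{1}{k}V_{1\cdot}$ with the exact covariance obtained by enumerating all first stage samples. The cluster totals `Z` are arbitrary illustrative numbers rather than data from the paper, and both function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def first_stage_cov_enumerated(Z, k):
    """Exact covariance of (K/k) * sum_{c in s} Z_c over all SRSWOR samples s."""
    K = Z.shape[0]
    est = np.array([(K / k) * Z[list(s)].sum(axis=0)
                    for s in combinations(range(K), k)])
    centred = est - est.mean(axis=0)
    # Population covariance over the design: divide by the number of samples.
    return centred.T @ centred / est.shape[0]


def first_stage_cov_formula(Z, k):
    """Closed form K^2 * ((K - k) / K) * (1 / k) * V_1 from (b.5)."""
    K = Z.shape[0]
    dev = Z - Z.mean(axis=0)
    V1 = dev.T @ dev / (K - 1)
    return K**2 * ((K - k) / K) / k * V1
```

The two functions should agree up to floating point error for any matrix `Z` of cluster totals with one row per cluster and any $1 \le k \le K$.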

Computational formula for the second term in (b.1)

First, we obtain the covariance matrix over the second stage sampling design $p_{2c}$ as

$$ \begin{array}{@{}rcl@{}} &&\text{cov}_{p_{2c}}\left\{{\sum}^{N_{c}}_{i=1}\delta_{2,i|c}z_{ci}|p_{1}\right\} \\ &=&{\sum}^{N_{c}}_{i=1}\text{var}[\delta_{2,i|c}]z_{ci}z^{\prime}_{ci} + {\sum}^{N_{c}}_{i \neq j}\text{cov}[\delta_{2,i|c},\delta_{2,j|c}]z_{ci}z^{\prime}_{cj} \\ &=&\frac{n_{c}}{N_{c}}(1-\frac{n_{c}}{N_{c}}){\sum}^{N_{c}}_{i=1}z_{ci}z^{\prime}_{ci} -\frac{n_{c}}{N_{c}(N_{c}-1)}(1-\frac{n_{c}}{N_{c}}) {\sum}^{N_{c}}_{i \neq j}z_{ci}z^{\prime}_{cj}, \end{array} $$

(b.6)

by using formulas similar to those in (b.4). Furthermore, by algebra similar to that in (b.5), (b.6) reduces to

$$ \begin{array}{@{}rcl@{}} &&\text{cov}_{p_{2c}}\left\{{\sum}^{N_{c}}_{i=1}\delta_{2,i|c}z_{ci}|p_{1}\right\} \\ &=&\frac{n_{c}}{N_{c}}(1-\frac{n_{c}}{N_{c}})\frac{N_{c}}{N_{c}-1} \left[\frac{(N_{c}-1)}{N_{c}}{\sum}^{N_{c}}_{i=1}z_{ci}z^{\prime}_{ci}-\frac{1}{N_{c}} {\sum}^{N_{c}}_{i \neq j}z_{ci}z^{\prime}_{cj}\right] \\ &=&(1-\frac{n_{c}}{N_{c}})\frac{n_{c}}{N_{c}-1}{\sum}^{N_{c}}_{i=1}(z_{ci} - \bar{Z}_{c}) (z_{ci} - \bar{Z}_{c})' = n_{c}(1 - \frac{n_{c}}{N_{c}})V^{*}_{c}(\beta,\sigma^{2}_{\gamma}), \text{(say),} \end{array} $$

(b.7)

where we have used $\bar {Z}_{c}=\frac {Z_{c}}{N_{c}}=\frac {1}{N_{c}}{\sum }^{N_{c}}_{i=1}z_{ci}$. After putting (b.7) in the second term in (b.1), we take the desired expectation over the first stage sampling design $p_{1}$, which yields the formula for the second term as

$$ \begin{array}{@{}rcl@{}} &&E_{p_{1}}\left[(K^{2}/k^{2}){\sum}^{K}_{c=1}\frac{{N^{2}_{c}}}{{n^{2}_{c}}}\delta_{1,c} \text{cov}_{p_{2c}}\left\{{\sum}^{N_{c}}_{i=1}\delta_{2,i|c}z_{ci}|p_{1}\right\} \right] \\ &=&\left[(K^{2}/k^{2}){\sum}^{K}_{c=1}\frac{{N^{2}_{c}}}{{n^{2}_{c}}}E_{p_{1}}[\delta_{1,c}] n_{c}(1-\frac{n_{c}}{N_{c}})V^{*}_{c}(\beta,\sigma^{2}_{\gamma}) \right] \\ &=&\frac{K}{k}{\sum}^{K}_{c=1}{N^{2}_{c}}\frac{N_{c}-n_{c}}{N_{c}}\frac{1}{n_{c}} V^{*}_{c}(\beta,\sigma^{2}_{\gamma}). \end{array} $$

(b.8)

Finally, by combining (b.5) and (b.8), we obtain the covariance matrix $V^{*}_{n}(\beta ,\sigma ^{2}_{\gamma })$ in (b.1), which is reported in (3.34) under Section 3.2.2.
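The sum of (b.5) and (b.8) can be verified against a full enumeration of the two-stage design on a toy finite population. The sketch below assumes SRS without replacement at both stages, as in the derivation above; the function names, the cluster sizes and the values of $z_{ci}$ are illustrative assumptions, and the enumeration is only practical for very small populations.

```python
import numpy as np
from itertools import combinations, product


def v_star_formula(clusters, k, n):
    """Sum of (b.5) and (b.8); clusters[c] is the N_c x p array of z_ci."""
    K = len(clusters)
    Z = np.array([z.sum(axis=0) for z in clusters])        # cluster totals Z_c
    dev = Z - Z.mean(axis=0)
    V1 = dev.T @ dev / (K - 1)
    total = K**2 * ((K - k) / K) / k * V1                   # first term, (b.5)
    for z, n_c in zip(clusters, n):
        N_c = z.shape[0]
        d = z - z.mean(axis=0)
        Vc = d.T @ d / (N_c - 1)                            # V*_c in (b.7)
        total = total + (K / k) * N_c**2 * ((N_c - n_c) / N_c) / n_c * Vc
    return total


def v_star_enumerated(clusters, k, n):
    """Exact design covariance of the doubly expanded sample total."""
    K = len(clusters)
    p = clusters[0].shape[1]
    first_stage = list(combinations(range(K), k))
    mean = np.zeros(p)
    second_moment = np.zeros((p, p))
    for s1 in first_stage:
        # All second stage samples within each selected cluster.
        sub = [list(combinations(range(clusters[c].shape[0]), n[c])) for c in s1]
        n_sub = int(np.prod([len(u) for u in sub]))
        w = 1.0 / (len(first_stage) * n_sub)                # probability of (s1, s2)
        for s2 in product(*sub):
            t = np.zeros(p)
            for c, idx in zip(s1, s2):
                N_c = clusters[c].shape[0]
                t += (K / k) * (N_c / n[c]) * clusters[c][list(idx)].sum(axis=0)
            mean += w * t
            second_moment += w * np.outer(t, t)
    return second_moment - np.outer(mean, mean)
```

For instance, with three clusters of sizes 3, 4 and 3, $k=2$ and $n_{c}=2$ in every cluster, the two functions can be compared with `np.allclose`; agreement supports the two-term decomposition in (b.1) for that configuration, though it is not a substitute for the algebraic derivation.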


About this article

Cite this article

Sutradhar, B.C. Cluster Correlations and Complexity in Binary Regression Analysis Using Two-stage Cluster Samples. Sankhya A 85, 829–884 (2023). https://doi.org/10.1007/s13171-022-00281-8


Received: 18 April 2021
Accepted: 23 February 2022
Published: 02 May 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s13171-022-00281-8
