Prediction Theory for Multinomial Proportions Using Two-stage Cluster Samples

Sankhya A, volume 85, pages 1452–1488 (2023)

Abstract

In a two-stage cluster sampling setup for categorical data, it is well known that the so-called best prediction of the category-based proportions requires the conditional means of the non-sampled multinomial variables given the sampled multinomial responses. Computing these conditional means is difficult, mainly because of the complex correlations among multinomial responses within a cluster. The independence-assumption-based and linear-model-based approaches used so far in existing studies of cluster-correlated data are not valid for computing such conditional means in the prediction function for multinomial data. In contrast to these ‘working’ independence or linear model approaches, this paper first develops a cluster correlation structure for multinomial data and exploits this structure to derive theoretically valid formulas for the conditional means of the non-sampled hypothetical responses. Next, because these conditional means, or equivalently the prediction function, contain the regression and cluster variance/correlation parameters, these parameters are estimated using a conditional likelihood approach based on the survey sampling weights; existing studies mostly use independence-assumption-based likelihood or moment approaches, which are invalid or inadequate in a correlation setup. The proposed conditional likelihood estimators are shown to be consistent for their respective parameters, which leads to consistent estimation of the prediction function for the multinomial proportions.
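To make the role of the conditional means concrete, the decomposition below is a minimal sketch in generic prediction-theory notation. Every symbol is an assumption introduced here for illustration and is not taken from the paper: the finite population has $K$ clusters with $N_i$ units in cluster $i$ and $N=\sum_{i=1}^{K}N_i$; $y_{ijc}$ equals 1 when unit $j$ of cluster $i$ falls in category $c$ and 0 otherwise; $s$ is the set of sampled clusters, $s_i$ the set of sampled units within a sampled cluster $i$, and $y_s$ the collection of all sampled responses.

```latex
% Hypothetical notation; a sketch of the general idea, not the paper's own formulas.
\[
\begin{aligned}
P_c &= \frac{1}{N}\sum_{i=1}^{K}\sum_{j=1}^{N_i} y_{ijc}
     = \frac{1}{N}\Bigl[\,
        \sum_{i\in s}\sum_{j\in s_i} y_{ijc}
      + \sum_{i\in s}\sum_{j\notin s_i} y_{ijc}
      + \sum_{i\notin s}\sum_{j=1}^{N_i} y_{ijc}
       \Bigr],\\[4pt]
\hat{P}_c &= \frac{1}{N}\Bigl[\,
        \sum_{i\in s}\sum_{j\in s_i} y_{ijc}
      + \sum_{i\in s}\sum_{j\notin s_i} \mathrm{E}\bigl(y_{ijc}\mid y_s\bigr)
      + \sum_{i\notin s}\sum_{j=1}^{N_i} \mathrm{E}\bigl(y_{ijc}\mid y_s\bigr)
       \Bigr].
\end{aligned}
\]
```

The first line splits the population proportion into an observed part and two unobserved parts; the second replaces each unobserved response by its conditional mean given the sample. If responses within a cluster were treated as independent, each $\mathrm{E}(y_{ijc}\mid y_s)$ would collapse to a marginal category probability, which is the simplification the abstract argues is not valid when responses in the same cluster are correlated.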

Acknowledgements

The author would like to thank the reviewer for comments and suggestions leading to the improvement of the paper. Thanks are also due to the Editor in Chief, the Editor and an Associate Editor for their suggestions during the review process.

Funding

No funding was used to complete this research.

Author information

Corresponding author

Correspondence to Brajendra C. Sutradhar.

Ethics declarations

Conflict of Interests

There is no conflict of interest.

About this article

Cite this article

Sutradhar, B.C. Prediction Theory for Multinomial Proportions Using Two-stage Cluster Samples. Sankhya A 85, 1452–1488 (2023). https://doi.org/10.1007/s13171-022-00297-0

  • DOI: https://doi.org/10.1007/s13171-022-00297-0
