Abstract
In a two-stage cluster sampling setup for categorical data, it is well known that the so-called best prediction of the category-based proportions involves computing the conditional means of the non-sampled multinomial variables given the sampled multinomial responses. This computation is, however, not easy, mainly because of the complex correlations among the multinomial responses within a cluster. The approaches used so far in existing studies, whether based on an independence assumption or on linear models for cluster-correlated data, are not valid for computing such conditional means in the prediction function for multinomial data. As opposed to these ‘working’ independence or linear-model-based approaches, in this paper we first develop a cluster correlation structure for multinomial data and exploit it to derive theoretically valid formulas for the conditional means of the non-sampled hypothetical responses. Next, because these conditional means, or equivalently the prediction function, contain the regression and cluster variance/correlation parameters, we estimate these parameters by a survey-sampling-weights-based conditional likelihood approach, whereas existing studies mostly use independence-assumption-based likelihood or moment approaches, which are invalid or inadequate in a correlated setup. The proposed conditional likelihood estimators are shown to be consistent for their respective parameters, which leads to consistent estimation of the prediction function for the multinomial proportions.
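The prediction step can be illustrated numerically. The following is a minimal sketch, not the paper's formulas: it assumes a multinomial logistic model in which a single standard normal random cluster effect γ, scaled by σ, enters the linear predictor of every non-reference category, so that responses within a cluster are conditionally independent given γ but correlated marginally. Under that assumption (and non-informative sampling), the conditional mean of a non-sampled response given the sampled category counts in its cluster is the posterior mean E[π(γ) | sampled counts], computed here by Gauss-Hermite quadrature, while units in clusters not selected at the first stage receive the marginal mean E[π(γ)]. The parameters β and σ are treated as known (the abstract describes estimating them by a weighted conditional likelihood approach), and all function names and parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def category_log_probs(beta, sigma, gamma):
    """Log-probabilities of the J categories for one unit, given cluster effect gamma.

    ASSUMPTION (illustrative model, not the paper's exact parametrization):
    category c = 1..J-1 has linear predictor beta[c] + sigma * gamma and
    category J is the reference category with linear predictor 0.
    """
    eta = np.append(np.asarray(beta, dtype=float) + sigma * gamma, 0.0)
    m = eta.max()
    return eta - (m + np.log(np.exp(eta - m).sum()))  # log-softmax


def conditional_mean_nonsampled(counts, beta, sigma, n_nodes=40):
    """E[pi(gamma) | sampled counts] for a non-sampled unit in the same cluster.

    Given gamma, units in a cluster are (by assumption) independent multinomial
    draws, so the sampled counts update the N(0, 1) prior on gamma; the
    conditional mean of a non-sampled response is the posterior mean of the
    probability vector. With all counts zero this is the marginal mean E[pi(gamma)].
    """
    x, w = hermgauss(n_nodes)          # nodes/weights for the weight exp(-x^2)
    gammas = np.sqrt(2.0) * x          # change of variable to a N(0, 1) effect
    log_probs = np.array([category_log_probs(beta, sigma, g) for g in gammas])
    log_post = np.log(w) + log_probs @ np.asarray(counts, dtype=float)
    log_post -= log_post.max()         # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum()                 # normalized posterior quadrature weights
    return post @ np.exp(log_probs)    # length-J vector summing to 1


def predict_proportions(sampled, unsampled_sizes, beta, sigma):
    """Predict the finite-population category proportions.

    sampled: list of (counts, M) pairs; counts are the observed category counts
             of the m sampled units in a sampled cluster of size M.
    unsampled_sizes: sizes of the clusters not selected at the first stage.
    """
    J = len(beta) + 1
    totals = np.zeros(J)
    N = 0
    for counts, M in sampled:
        counts = np.asarray(counts, dtype=float)
        m = counts.sum()
        # observed responses plus predicted responses of the M - m other units
        totals += counts + (M - m) * conditional_mean_nonsampled(counts, beta, sigma)
        N += M
    marginal = conditional_mean_nonsampled(np.zeros(J), beta, sigma)
    for M in unsampled_sizes:
        totals += M * marginal         # independent clusters: unconditional mean
        N += M
    return totals / N


if __name__ == "__main__":
    beta = np.array([0.2, -0.4])       # hypothetical parameter values
    sampled = [(np.array([5, 2, 3]), 40), (np.array([1, 6, 3]), 55)]
    print(predict_proportions(sampled, unsampled_sizes=[50, 35], beta=beta, sigma=0.8))
```

Setting σ = 0 removes the cluster effect, and every conditional mean then collapses to the fixed-effect probabilities. This is the "working independence" predictor that the abstract argues is inadequate when within-cluster correlation is present.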
Acknowledgements
The author would like to thank the reviewer for comments and suggestions that led to an improved version of the paper. Thanks are also due to the Editor-in-Chief, the Editor and an Associate Editor for their suggestions during the review process.
Funding
No funding was received for this research.
Ethics declarations
Conflict of Interest
The author declares no conflict of interest.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sutradhar, B.C. Prediction Theory for Multinomial Proportions Using Two-stage Cluster Samples. Sankhya A 85, 1452–1488 (2023). https://doi.org/10.1007/s13171-022-00297-0
DOI: https://doi.org/10.1007/s13171-022-00297-0
Keywords
- conditional means for non-sampled units
- cluster correlation effect
- consistency
- double weights composed of sampling and correlation weights
- doubly weighted conditional likelihood estimating equations
- multinomial proportions
- random cluster effects
- two-stage cluster sample