Inference on finite population categorical response: nonparametric regression-based predictive approach

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

Suppose that a finite population consists of $N$ distinct units. Associated with the $i$th unit are a polychotomous response vector, $d_i$, and a vector of auxiliary variables, $x_i$. The values $x_i$ are known for the entire population, but the $d_i$ are known only for the units selected in the sample. The problem is to estimate the finite population proportion vector $P$. One of the fundamental questions in finite population sampling is how to use the complete auxiliary information effectively at the estimation stage. In this article, a predictive estimator is proposed that incorporates the auxiliary information at the estimation stage by invoking a superpopulation model. However, the use of such estimators is often criticized because the working superpopulation model may not be correct. To protect the predictive estimator from possible model failure, a nonparametric regression model is considered for the superpopulation. The asymptotic properties of the proposed estimator are derived, and a bootstrap-based hybrid re-sampling method for estimating its variance is developed. Results of a simulation study are reported on the performance of the predictive estimator and its re-sampling-based variance estimator from the model-based viewpoint. Finally, survey data on the opinions of 686 individuals about the cause of addiction are used in an empirical study to investigate the performance of the nonparametric predictive estimator from the design-based viewpoint.
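
As a rough illustration of the predictive idea described in the abstract, the following Python sketch keeps the observed response indicators for sampled units and replaces the unobserved ones with nonparametric predictions based on the auxiliary variable. It is not the authors' estimator: it substitutes a Nadaraya-Watson smoother with a Gaussian kernel for the nonparametric superpopulation model, assumes a scalar auxiliary variable, and uses hypothetical names (`kernel_probs`, `predictive_proportions`, `bandwidth`) chosen only for this example.

```python
import math


def kernel_probs(x0, xs, ds, bandwidth):
    """Nadaraya-Watson estimate of the category probabilities at x0.

    xs: auxiliary values of the sampled units (scalars, an assumption here)
    ds: one-hot response vectors of the sampled units
    """
    k = len(ds[0])
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed: fall back to the sample proportions.
        return [sum(d[c] for d in ds) / len(ds) for c in range(k)]
    return [sum(w * d[c] for w, d in zip(weights, ds)) / total for c in range(k)]


def predictive_proportions(x_all, d_sample, bandwidth=0.2):
    """Predictive estimate of the population proportion vector.

    x_all:    auxiliary values for all N population units
    d_sample: dict mapping a sampled unit's index to its one-hot response
    """
    n_pop = len(x_all)
    idx = sorted(d_sample)
    xs = [x_all[i] for i in idx]
    ds = [d_sample[i] for i in idx]
    k = len(ds[0])
    totals = [0.0] * k
    for i in range(n_pop):
        if i in d_sample:
            contrib = d_sample[i]  # observed response
        else:
            contrib = kernel_probs(x_all[i], xs, ds, bandwidth)  # predicted
        for c in range(k):
            totals[c] += contrib[c]
    return [t / n_pop for t in totals]


if __name__ == "__main__":
    # Toy population of 10 units with 3 response categories; 4 units sampled.
    x_all = [i / 9 for i in range(10)]
    d_sample = {0: [1, 0, 0], 3: [0, 1, 0], 6: [0, 1, 0], 9: [0, 0, 1]}
    print(predictive_proportions(x_all, d_sample, bandwidth=0.2))
```

Because each contribution is either a one-hot vector or a vector of kernel weights normalized to one, the estimated proportions sum to one. The bandwidth here is fixed by hand; in practice it would have to be chosen by a data-driven rule, which this sketch does not attempt.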

Author information

Corresponding author

Correspondence to Tathagata Banerjee.

About this article

Cite this article

Adhya, S., Banerjee, T. & Chattopadhyay, G. Inference on finite population categorical response: nonparametric regression-based predictive approach. AStA Adv Stat Anal 96, 69–98 (2012). https://doi.org/10.1007/s10182-011-0159-0

  • DOI: https://doi.org/10.1007/s10182-011-0159-0
