AStA Advances in Statistical Analysis

Volume 96, Issue 1, pp 69–98

Inference on finite population categorical response: nonparametric regression-based predictive approach

  • Sumanta Adhya
  • Tathagata Banerjee
  • Gaurangadeb Chattopadhyay
Original Paper


Suppose that a finite population consists of N distinct units. Associated with the ith unit are a polychotomous response vector, d_i, and a vector of auxiliary variables, x_i. The x_i are known for the entire population, but the d_i are known only for the units selected in the sample. The problem is to estimate the finite population proportion vector P. One of the fundamental questions in finite population sampling is how to use the complete auxiliary information effectively at the estimation stage. In this article a predictive estimator is proposed that incorporates the auxiliary information at the estimation stage by invoking a superpopulation model. The use of such estimators is often criticized, however, because the working superpopulation model may not be correct. To protect the predictive estimator against possible model failure, a nonparametric regression model is assumed for the superpopulation. The asymptotic properties of the proposed estimator are derived, and a bootstrap-based hybrid re-sampling method for estimating its variance is developed. Results of a simulation study are reported on the performance of the predictive estimator and of its re-sampling-based variance estimator from the model-based viewpoint. Finally, survey data on the opinions of 686 individuals about the cause of addiction are used in an empirical study to investigate the performance of the nonparametric predictive estimator from the design-based viewpoint.
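As a rough illustration of the predictive idea described above, the sketch below keeps the observed d_i for sampled units and replaces each unobserved d_i by estimated category probabilities at its x_i, then averages over all N units. It is only a sketch under stated assumptions: a simple Gaussian-kernel (Nadaraya-Watson) smoother stands in for the nonparametric regression model, which is not the article's own fitting method; x_i is taken to be scalar; and the function names, the bandwidth, and the simulated population are all hypothetical.

```python
import numpy as np


def nw_category_probs(x_train, d_train, x_new, bandwidth):
    """Kernel-smoothed estimate of P(category | x) at each point of x_new.

    x_train: (n,) auxiliary values of the sampled units
    d_train: (n, K) one-hot response vectors of the sampled units
    x_new:   (m,) auxiliary values of the non-sampled units
    Assumes the bandwidth is wide enough that every x_new has nonzero weight.
    """
    u = (x_new[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)             # Gaussian kernel weights, shape (m, n)
    w /= w.sum(axis=1, keepdims=True)   # normalise so each row sums to 1
    return w @ d_train                  # (m, K); each row sums to 1


def predictive_proportions(x_pop, d_sample, sample_idx, bandwidth=0.5):
    """Predictive estimate of the population proportion vector P.

    d_sample[j] must be the response of unit sample_idx[j].
    """
    N = x_pop.shape[0]
    in_sample = np.zeros(N, dtype=bool)
    in_sample[sample_idx] = True
    # Predicted category probabilities for units outside the sample
    p_hat = nw_category_probs(
        x_pop[sample_idx], d_sample, x_pop[~in_sample], bandwidth
    )
    # Observed responses in the sample plus predictions elsewhere, over N
    return (d_sample.sum(axis=0) + p_hat.sum(axis=0)) / N


# Hypothetical simulated population (not the article's data)
rng = np.random.default_rng(0)
N, n, K = 2000, 200, 3
x_pop = rng.uniform(0.0, 1.0, N)
eta = np.column_stack([np.zeros(N), np.sin(2 * np.pi * x_pop), 2 * x_pop - 1])
probs = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
cats = (rng.uniform(size=(N, 1)) > probs.cumsum(axis=1)).sum(axis=1)
cats = np.minimum(cats, K - 1)          # guard against floating-point overshoot
d_pop = np.eye(K)[cats]                 # one-hot responses for every unit

sample_idx = rng.choice(N, size=n, replace=False)   # simple random sample
estimate = predictive_proportions(x_pop, d_pop[sample_idx], sample_idx, 0.1)
print("predictive estimate:", estimate)
print("true proportions:   ", d_pop.mean(axis=0))
```

Because every observed d_i and every row of predicted probabilities sums to one, the estimated proportion vector also sums to one by construction. With a full census the predictions drop out and the estimator reduces to the ordinary vector of population proportions.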


Predictive approach · Random coefficients splines model · Laplace approximation · EM algorithm





Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Sumanta Adhya (1)
  • Tathagata Banerjee (2, corresponding author)
  • Gaurangadeb Chattopadhyay (3)
  1. Department of Statistics, West Bengal State University, Barasat, India
  2. Production and Quantitative Methods, Indian Institute of Management, Ahmedabad, India
  3. Department of Statistics, University of Calcutta, Kolkata, India
