Robust semiparametric inference for polytomous logistic regression with complex survey design

Abstract

Analyzing polytomous responses from a complex survey scheme, such as stratified or cluster sampling, is crucial in several socio-economic applications. We present a class of minimum quasi weighted density power divergence estimators for the polytomous logistic regression model under such a complex survey design. This family of semiparametric estimators is a robust generalization of the maximum quasi weighted likelihood estimator that exploits the advantages of the popular density power divergence measure. Accordingly, robust estimators for the design effects are also derived. Using the new estimators, robust tests of general linear hypotheses on the regression coefficients are proposed. Their asymptotic distributions and robustness properties are studied theoretically and validated empirically through a numerical example and an extensive Monte Carlo study.
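
To make the idea of a weighted density power divergence (DPD) objective concrete, the following is a minimal sketch and not the paper's exact estimating equations. It assumes a baseline-category logit parameterization with the last category as baseline, one observed category per unit, and a known survey weight per unit; cluster structure and design effects are ignored. The names `category_probs`, `weighted_dpd_loss`, `weighted_neg_loglik` and the tuning parameter `lam` are hypothetical and chosen only for illustration.

```python
import math


def category_probs(x, betas):
    """Polytomous logistic probabilities; the last category is the baseline.

    x: list of covariates (include 1.0 for an intercept).
    betas: one coefficient list per non-baseline category (assumed layout).
    """
    etas = [sum(b * v for b, v in zip(beta, x)) for beta in betas] + [0.0]
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_dpd_loss(betas, X, y, w, lam):
    """Weighted DPD between one-hot responses and model probabilities (lam > 0).

    y[i] is the observed category index and w[i] an assumed survey weight.
    """
    loss = 0.0
    for x_i, y_i, w_i in zip(X, y, w):
        p = category_probs(x_i, betas)
        # DPD of Basu et al. (1998) with a one-hot empirical distribution
        d = (sum(pj ** (1 + lam) for pj in p)
             - (1 + 1 / lam) * p[y_i] ** lam
             + 1 / lam)
        loss += w_i * d
    return loss


def weighted_neg_loglik(betas, X, y, w):
    """Weighted negative log-likelihood, the limit of the DPD loss as lam -> 0."""
    return -sum(w_i * math.log(category_probs(x_i, betas)[y_i])
                for x_i, y_i, w_i in zip(X, y, w))


# Toy data (made up for illustration): 3 units, 3 categories, intercept + 1 covariate
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [0, 2, 1]
w = [1.5, 0.8, 2.0]
betas = [[0.2, -0.4], [-0.1, 0.3]]

print(weighted_dpd_loss(betas, X, y, w, lam=0.5))
print(weighted_dpd_loss(betas, X, y, w, lam=1e-6))
print(weighted_neg_loglik(betas, X, y, w))
```

In this sketch, minimizing `weighted_dpd_loss` over `betas` for a fixed `lam > 0` would give a robust alternative to the weighted likelihood fit. As `lam` approaches 0 the loss approaches the weighted negative log-likelihood, which mirrors the abstract's description of the family as a generalization of the maximum quasi weighted likelihood estimator.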

Acknowledgements

The authors would like to thank the reviewer for his/her helpful comments and suggestions. This research is partially supported by Grant PGC2018-005194-B-100 and Grant FPU16/0314 from Ministerio de Ciencia, Innovación y Universidades (Spain).

Author information

Corresponding author

Correspondence to Nirian Martin.

Additional information

Electronic supplementary material

Supplementary material 1 (pdf 278 KB)

About this article

Cite this article

Castilla, E., Ghosh, A., Martin, N. et al. Robust semiparametric inference for polytomous logistic regression with complex survey design. Adv Data Anal Classif (2020). https://doi.org/10.1007/s11634-020-00430-7

Keywords

  • Cluster sampling
  • Design effect
  • Minimum quasi weighted DPD estimator
  • Polytomous logistic regression model
  • Pseudo minimum phi-divergence estimator
  • Quasi-likelihood
  • Robustness

Mathematics Subject Classification

  • 62J05
  • 62F12
  • 62F35
  • 62H15
  • 62F10