Abstract
This article develops the theoretical framework needed to study the multinomial regression model for complex sample designs with pseudo-minimum phi-divergence estimators. Through a numerical example and a simulation study, new estimators are proposed for the parameter of the logistic regression model with overdispersed multinomial distributions for the response variables, namely the pseudo-minimum Cressie–Read divergence estimators, as well as new estimators for the intra-cluster correlation coefficient. The simulation study shows that Binder’s method for the intra-cluster correlation coefficient performs excellently when the pseudo-minimum Cressie–Read divergence estimator with \(\lambda =\frac{2}{3}\) is plugged in.
References
Agresti, A.: Categorical Data Analysis, 2nd edn. Wiley, Hoboken (2002)
Alonso-Revenga, J.M., Martín, N., Pardo, L.: New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes. Stat. Comput. 27, 193–217 (2017)
Amemiya, T.: Qualitative response models: a survey. J. Econ. Lit. 19, 1483–1536 (1981)
An, A.B.: Performing logistic regression on survey data with the new SURVEYLOGISTIC procedure. In: Proceedings of the 27th Annual SAS Users Group International Conference, CD-Rom Version, Paper 258-27 (2002)
Anderson, J.A.: Separate sample logistic discrimination. Biometrika 59, 19–35 (1972)
Anderson, J.A.: Logistic discrimination. In: Krishnaiah, R., Kanal, L.N. (eds.) Handbook of Statistics, pp. 169–191. North-Holland Publishing Company, Amsterdam (1982)
Anderson, J.A.: Regression and ordered categorical variables. J. R. Stat. Soc. Ser. B 46, 1–30 (1984)
Binder, D.A.: On the variance of asymptotically normal estimators from complex surveys. Int. Stat. Rev. 51, 279–292 (1983)
Engel, J.: Polytomous logistic regression. Stat. Neerl. 42, 233–252 (1988)
Ghosh, A., Basu, A.: Robust estimation for independent but non-homogeneous observations using density power divergence with application to linear regression. Electron. J. Stat. 7, 2420–2456 (2013)
Ghosh, A., Basu, A.: Robust estimation for non-homogeneous data and the selection of the optimal tuning parameter: the density power divergence approach. J. Appl. Stat. 42(9), 2056–2072 (2015)
Ghosh, A., Harris, I.R., Maji, A., Basu, A., Pardo, L.: A generalized divergence for statistical inference. Bernoulli 23(4A), 2746–2783 (2017a)
Ghosh, A., Martín, N., Basu, A., Pardo, L.: A new class of robust two-sample Wald-type tests (2017b). arXiv:1702.04552
Gupta, A.K., Kasturiratna, D., Nguyen, T., Pardo, L.: A new family of BAN estimators for polytomous logistic regression models based on phi-divergence measures. Stat. Methods Appl. 15, 159–176 (2006a)
Gupta, A.K., Nguyen, T., Pardo, L.: Inference procedures for polytomous logistic regression models based on phi-divergence measures. Math. Methods Stat. 15, 269–288 (2006b)
Gupta, A.K., Pardo, L.: Phi-divergences and polytomous logistic regression models: an overview. J. Stat. Plan. Inference 137, 3513–3524 (2007)
Gupta, A.K., Nguyen, T., Pardo, L.: Residuals for polytomous logistic regression models based on phi-divergences test statistics. Statistics 42, 495–514 (2008)
Hong, C., Kim, Y.: Automatic selection of the tuning parameter in the minimum density power divergence estimation. J. Korean Stat. Soc. 30, 453–465 (2001)
Lehtonen, R., Pahkinen, E.: Practical Methods for Design and Analysis of Complex Surveys. Wiley, Chichester (1995)
Lesaffre, E.: Logistic discrimination analysis with application in electrocardiography. Doctoral thesis, University of Leuven (1986)
Lesaffre, E., Albert, A.: Multiple-group logistic regression diagnostic. Appl. Stat. 38, 425–440 (1989)
Liu, I., Agresti, A.: The analysis of ordered categorical data: an overview and a survey of recent developments. With discussion and a rejoinder by the authors. Test 14, 1–73 (2005)
Mantel, N.: Models for complex contingency tables and polychotomous dosage response curves. Biometrics 22, 83–95 (1966)
McCullagh, P.: Regression models for ordinal data. J. R. Stat. Soc. Ser. B 42, 109–142 (1980)
Molina, E.A., Skinner, C.J.: Pseudo-likelihood and quasi-likelihood estimation for complex sampling schemes. Comput. Stat. Data Anal. 13, 395–405 (1992)
Morel, G.: Logistic regression under complex survey designs. Surv. Methodol. 15, 203–223 (1989)
Morel, G., Neerchal, N.K.: Overdispersion Models in SAS. SAS Institute, Cary (2012)
Pardo, L.: Statistical Inference Based on Divergence Measures. Statistics: Textbooks and Monographs. Chapman & Hall/CRC, New York (2005)
Rao, J.N.K., Scott, A.J.: On Chi-squared tests for multinomial contingency tables with cell proportions estimated from survey data. Ann. Stat. 6, 461–464 (1984)
Rao, J.N.K., Thomas, D.R.: Chi-squared tests for contingency tables. In: Skinner, C.J., Holt, D., Smith, T.M.F. (eds.) Analysis of Complex Surveys, pp. 89–114. Wiley, New York (1989)
Roberts, G., Rao, J.N.K., Kumar, S.: Logistic regression analysis of sample survey data. Biometrika 74, 1–12 (1987)
SAS Institute Inc.: SAS/STAT®13.1 User’s Guide. Cary, NC (2013)
Skinner, C.J., Holt, D., Smith, T.M.F.: Analysis of Complex Surveys. Wiley, New York (1989)
Theil, H.: A multinomial extension of the linear logit model. Int. Econ. Rev. 10, 251–259 (1969)
Warwick, J., Jones, M.C.: Choosing a robustness tuning parameter. J. Stat. Comput. Simul. 75, 581–588 (2005)
Acknowledgements
We would like to thank the referees for their helpful comments and suggestions. Their comments have improved the paper. This research is partially supported by Grants MTM2012-33740, MTM2015-67057-P and ECO2015-66593-P from Ministerio de Economia y Competitividad (Spain).
A Proof of results
A.1 Proof of Theorem 1
Proof
The pseudo-minimum phi-divergence estimator of \(\varvec{\beta }\), \( \widehat{\varvec{\beta }}_{\phi ,P}\), is obtained by solving the system of equations \(\frac{\partial }{\partial \varvec{\beta }}d_{\phi }\left( \widehat{\varvec{p}},\varvec{\pi }\left( \varvec{\beta }\right) \right) =\varvec{0}_{dk}\), and then, it is also obtained from \( \varvec{u}_{\phi }\left( \varvec{\beta }\right) =\varvec{0}_{dk}\), where
with
and
Since
the expression of \(\varvec{u}_{\phi ,hi}\left( \varvec{\beta }\right) \) is rewritten as (16)–(17). \(\square \)
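As a rough illustration of the divergence being minimized, the following Python sketch evaluates a Cressie–Read divergence between two probability vectors. It assumes the standard generator \(\phi _{\lambda }(x)=\frac{x^{\lambda +1}-x-\lambda (x-1)}{\lambda (\lambda +1)}\) for \(\lambda \notin \{0,-1\}\) and the plain form \(d_{\phi }(\varvec{p},\varvec{q})=\sum _{i}q_{i}\phi (p_{i}/q_{i})\); the design weights and cluster structure of the pseudo-minimum estimator are not modelled, and the function names and example vectors are hypothetical.

```python
def cressie_read_phi(x, lam):
    """Cressie-Read generator phi_lambda(x); assumes lam is not 0 or -1."""
    return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))


def phi_divergence(p, q, lam=2 / 3):
    """Plain phi-divergence sum_i q_i * phi(p_i / q_i); assumes every q_i > 0.

    This ignores sampling weights and clusters, so it is only a sketch of the
    divergence itself, not of the pseudo-minimum estimator.
    """
    return sum(qi * cressie_read_phi(pi / qi, lam) for pi, qi in zip(p, q))


# Hypothetical observed proportions and model probabilities.
p_hat = [0.2, 0.3, 0.5]
pi_model = [0.25, 0.25, 0.5]
print(phi_divergence(p_hat, pi_model))           # lambda = 2/3
print(phi_divergence(p_hat, pi_model, lam=1.0))  # half of Pearson's chi-square distance
```

With \(\lambda =1\) the generator reduces to \((x-1)^{2}/2\), so the divergence equals half of Pearson's chi-square distance, and for every admissible \(\lambda \) the generator satisfies \(\phi (1)=0\) and \(\phi ^{\prime \prime }(1)=1\).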
A.2 Proof of Theorem 2
Proof
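From Theorem 1, and following the same steps as the linearization method of Binder (1983),
where \(\varvec{U}_{\phi }\left( \varvec{\beta }\right) \), the random variable generating \(\varvec{u}_{\phi }\left( \varvec{\beta }\right) \), is given by (15). Taking into account that \(f_{\phi ,his}(\pi _{his}(\varvec{\beta }),\varvec{\beta })=0\) and \(f_{\phi ,his}^{\prime }(\pi _{his}(\varvec{\beta }),\varvec{\beta })=\frac{1}{\pi _{his}( \varvec{\beta })}\phi ^{\prime \prime }\left( 1\right) \), a first-order Taylor expansion of \(f_{\phi ,his}(\tfrac{\widehat{Y}_{his}}{m_{hi}},\varvec{ \beta })\), given in (18), around \(\pi _{his}(\varvec{\beta })\) is
\[ f_{\phi ,his}\left( \tfrac{\widehat{Y}_{his}}{m_{hi}},\varvec{\beta }\right) =\frac{\phi ^{\prime \prime }\left( 1\right) }{\pi _{his}(\varvec{\beta })}\left( \tfrac{\widehat{Y}_{his}}{m_{hi}}-\pi _{his}(\varvec{\beta })\right) +o\left( \left| \tfrac{\widehat{Y}_{his}}{m_{hi}}-\pi _{his}(\varvec{\beta })\right| \right) , \]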
i.e.,
and hence,
From the Central Limit Theorem
then
and thus,
Since
it follows that
Then, \(\mathbf {H}\left( \varvec{\beta }_{0}\right) \) is the limit of
as \(n\) increases, and hence, \(\mathbf {H}\left( \varvec{\beta }\right) =\lim _{n\rightarrow \infty }\mathbf {H}_{n}\left( \varvec{\beta }\right) \). On the other hand, from (29) it follows that
and this justifies that \(\mathbf {G}\left( \varvec{\beta }\right) =\lim _{n\rightarrow \infty }\mathbf {G}_{n}\left( \varvec{\beta }\right) \). \(\square \)
A.3 Proof of Theorem 3
Proof
If \(\varvec{V}[\widehat{\varvec{Y}}_{hi}]=\nu _{m_{h}}m_{h} \varvec{\Delta }(\varvec{\pi }_{hi}\left( \varvec{\beta } _{0}\right) )\), then from the expression of \(\mathbf {G}_{n_{h}}^{(h)}\left( \varvec{\beta }_{0}\right) \) given in Theorem 2,
Hence, from
and the consistency of \(\mathbf {H}_{n_{h}}^{(h)}(\widehat{\varvec{\beta }} _{\phi ,P})\) and \(\widehat{\mathbf {G}}_{n_{h}}^{(h)}(\widehat{\varvec{ \beta }}_{\phi ,P})\),
is proved with
which is equivalent to (25). \(\square \)
A.4 Proof of Theorem 4
Proof
The mean vector and variance-covariance matrix of
are, respectively,
for \(h=1,\ldots ,H\). An unbiased estimator of \(\varvec{V}[\varvec{Z} _{hi}^{*}(\varvec{\beta }_{0})]\) is
from which is derived
This expression suggests using
Finally, since \(\varvec{\Delta }^{-}(\varvec{\pi }_{hi}(\widehat{ \varvec{\beta }}_{\phi ,P}))=\mathrm {diag}^{-1}(\varvec{\pi }_{hi}( \widehat{\varvec{\beta }}_{\phi ,P}))\) is a possible expression for the generalized inverse, the desired result for \(\widetilde{\nu }_{m_{h}}( \widehat{\varvec{\beta }}_{\phi ,P})\) is obtained. \(\square \)
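The generalized-inverse claim in the last step can be checked numerically. The sketch below assumes the usual multinomial definition \(\varvec{\Delta }(\varvec{\pi })=\mathrm {diag}(\varvec{\pi })-\varvec{\pi }\varvec{\pi }^{T}\) with \(\varvec{\pi }\) a full probability vector summing to one (this definition is not restated in this appendix), and verifies the defining property \(\varvec{\Delta }\,\mathrm {diag}^{-1}(\varvec{\pi })\,\varvec{\Delta }=\varvec{\Delta }\). The helper names and the example vector are hypothetical.

```python
def multinomial_delta(pi):
    """Delta(pi) = diag(pi) - pi pi^T (assumed definition)."""
    n = len(pi)
    return [[(pi[r] if r == c else 0.0) - pi[r] * pi[c] for c in range(n)]
            for r in range(n)]


def diag_inverse(pi):
    """diag^{-1}(pi): diagonal matrix with entries 1 / pi_r."""
    n = len(pi)
    return [[(1.0 / pi[r] if r == c else 0.0) for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def is_generalized_inverse(pi, tol=1e-12):
    """Check Delta * diag^{-1}(pi) * Delta == Delta entrywise, up to tol."""
    delta = multinomial_delta(pi)
    product = matmul(matmul(delta, diag_inverse(pi)), delta)
    n = len(pi)
    return all(abs(product[r][c] - delta[r][c]) < tol
               for r in range(n) for c in range(n))


pi_example = [0.2, 0.3, 0.5]  # hypothetical probabilities, summing to one
print(is_generalized_inverse(pi_example))
```

The identity relies on the components of \(\varvec{\pi }\) summing to one; for a vector that does not sum to one the check generally fails.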
Cite this article
Castilla, E., Martín, N. & Pardo, L. Minimum phi-divergence estimators for multinomial logistic regression with complex sample design. AStA Adv Stat Anal 102, 381–411 (2018). https://doi.org/10.1007/s10182-017-0311-6
DOI: https://doi.org/10.1007/s10182-017-0311-6