Abstract
Tests of goodness of fit of sparse multinomial models with non-canonical links are proposed, based on approximations to the first three moments of the conditional distribution of a modified Pearson Chi-square statistic. The modified Pearson statistic is obtained using a supplementary estimating equation approach, and approximations to its first three conditional moments are derived. A simulation study compares, in terms of empirical size and power, four methods: the usual Pearson Chi-square statistic; the standardized modified Pearson Chi-square statistic, which uses the first two conditional moments; an Edgeworth approximation of the p-values based on the first three conditional moments; and a score test statistic. There does not seem to be any qualitative difference in size among the four methods. However, the standardized modified Pearson Chi-square statistic and the Edgeworth approximation method show power advantages over the usual Pearson Chi-square statistic and the score test statistic. In some situations, for example at small nominal levels, the standardized modified Pearson Chi-square statistic shows some power advantage over the Edgeworth approximation method. The former is also easier to use and is therefore preferable. Two data sets are analyzed and a discussion is given.
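As a rough illustration of how the p-value calculations mentioned in the abstract fit together, the following minimal sketch computes the usual Pearson Chi-square statistic for product multinomial counts, then converts a statistic into a p-value either by standardizing with two moments or by a one-term Edgeworth correction with a third. The abstract does not give the modified statistic or the conditional moment approximations, so `mean`, `variance` and `skewness` are treated here as inputs supplied by the user, and all function names are hypothetical. The one-term Edgeworth form is the textbook expansion and may differ from the expansion used in the article.

```python
import math


def pearson_chi_square(counts, probs):
    """Usual Pearson statistic for product multinomial data.

    counts[i][j] is the observed count in category j of group i;
    probs[i][j] is the fitted probability of that category.
    """
    total = 0.0
    for y_row, p_row in zip(counts, probs):
        n_i = sum(y_row)  # multinomial index of group i
        for y, p in zip(y_row, p_row):
            expected = n_i * p
            total += (y - expected) ** 2 / expected
    return total


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def standardized_p_value(stat, mean, variance):
    """Upper-tail normal p-value after centring and scaling.

    mean and variance are assumed to come from moment approximations
    that are not reproduced here.
    """
    z = (stat - mean) / math.sqrt(variance)
    return 1.0 - normal_cdf(z)


def edgeworth_p_value(stat, mean, variance, skewness):
    """Upper-tail p-value from a one-term Edgeworth expansion.

    Uses P(Z <= z) ~ Phi(z) - phi(z) * skewness * (z^2 - 1) / 6,
    where skewness is the standardized third cumulant (assumed input).
    """
    z = (stat - mean) / math.sqrt(variance)
    cdf = normal_cdf(z) - normal_pdf(z) * skewness * (z * z - 1.0) / 6.0
    # The truncated expansion can leave [0, 1]; clip the result.
    return min(1.0, max(0.0, 1.0 - cdf))
```

With zero skewness the Edgeworth p-value reduces to the standardized one, so the third moment acts only as a correction for asymmetry.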
Cite this article
Deng, D., Paul, S.R. Goodness of Fit of Product Multinomial Regression Models to Sparse Data. Sankhya B 78, 78–95 (2016). https://doi.org/10.1007/s13571-015-0109-z
DOI: https://doi.org/10.1007/s13571-015-0109-z