Abstract
For the goodness-of-fit test for a multinomial distribution and for tests of several kinds of independence in a multi-way contingency table, we consider test statistics based on the \(\phi \)-divergence family. All members of the \(\phi \)-divergence family of statistics have the same chi-square limiting distribution under the null hypothesis. We use a second-order correction term as an index of how close the distribution of a statistic is to the chi-square limiting distribution. We derive properties of the second-order correction term that guide the selection of a \(\phi \)-divergence statistic for an asymptotic test when the data are sparse. We propose a selection of statistics from the power divergence family and from the family of Rukhin’s statistics, both of which are special cases of \(\phi \)-divergence statistics.
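To make the power divergence family concrete, the following is a minimal sketch of the statistic for a multinomial goodness-of-fit test. It assumes the usual Cressie and Read parametrization with index \(a > -1\); the function name `power_divergence` and the example counts are illustrative and are not taken from the article.

```python
import math

def power_divergence(observed, probs, a):
    """Power-divergence statistic for counts against null cell probabilities.

    Assumes the Cressie-Read parametrization with index a > -1.
    a = 1 gives Pearson's X^2; a = 0 is taken as the limit, the
    log-likelihood ratio statistic G^2.
    """
    if a <= -1:
        raise ValueError("this sketch only handles a > -1")
    n = sum(observed)
    expected = [n * p for p in probs]
    # Empty cells contribute zero for every a > -1, so they are skipped.
    cells = [(o, e) for o, e in zip(observed, expected) if o > 0]
    if a == 0:
        return 2.0 * sum(o * math.log(o / e) for o, e in cells)
    return 2.0 / (a * (a + 1)) * sum(o * ((o / e) ** a - 1.0) for o, e in cells)

# A small sparse example: 10 observations spread over 4 equiprobable cells.
counts = [3, 0, 5, 2]
null_probs = [0.25, 0.25, 0.25, 0.25]
pearson = power_divergence(counts, null_probs, 1)
two_thirds = power_divergence(counts, null_probs, 2 / 3)
```

All members computed this way share the same chi-square limit under the null hypothesis, but with sparse counts such as these their values, and hence their finite-sample behavior, can differ noticeably.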
Data Availability
All data generated or analyzed during this study are included in this published article.
Acknowledgements
The authors are very grateful to reviewers for their valuable comments and suggestions.
Appendices
Appendix: Proof of Theorems 1 and 2
If we assume that statistic \(G_{\phi }\) has a continuous distribution, the characteristic function of \(G_{\phi }\) is evaluated as
where
and S is given by (7). Here, \(r_j^\phi \; (j=0, 1, 2, 3)\) satisfy
For an arbitrary natural number \(s\), let \(\psi _{\phi }^{G(s)}(t)\) be the \(s\)th derivative of \(\psi _{\phi }^G(t)\). Then
Therefore, the sth moment about the origin of the statistic \(G_{\phi }\) is evaluated as
where
If we put
from (A.1) and (A.2), coefficients \(b_{s,s},\;b_{s,s-1}\), and \(b_{s,s-2}\) are calculated as follows:
and
From the assumption that \(q_j = O(k^{-1}) \;(j=1, \ldots , k)\) and (7), we find that \(S=O(k^2)\). Therefore
holds. Furthermore,
Therefore, from (A.3)–(A.9), we obtain the following expression:
The second-order correction term \(m^G_{\phi }(s)\; (s=1,2,\ldots )\) is evaluated as
From (A.11), if we put \(m^G_{\phi }(s) = 0\), then \( 4\phi ^{'''}(1)+3\phi ^{(4)}(1) = o(1), \) which implies \( 4\phi ^{'''}(1)+3\phi ^{(4)}(1) \rightarrow 0 \) as \(k \rightarrow \infty \). We have completed the proof of Theorem 1. On the other hand, by substituting (8) in (A.10), we obtain the result of Theorem 2.
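As an illustration of the condition \(4\phi '''(1)+3\phi ^{(4)}(1)=0\), consider the power divergence family under the usual Cressie and Read parametrization, \(\phi _a(x)=\{x^{a+1}-x+a(1-x)\}/\{a(a+1)\}\). This normalization is assumed here and may differ from the one used in the main text. Differentiating gives
\[
\phi _a''(x)=x^{a-1}, \qquad \phi _a'''(1)=a-1, \qquad \phi _a^{(4)}(1)=(a-1)(a-2),
\]
so that
\[
4\phi _a'''(1)+3\phi _a^{(4)}(1)=(a-1)\{4+3(a-2)\}=(a-1)(3a-2),
\]
which vanishes at \(a=1\) (Pearson's \(X^2\)) and at \(a=2/3\). Similarly, if Rukhin's family is parametrized as \(\phi _a(x)=(1-x)^2/[2\{a+(1-a)x\}]\) with \(0 \le a \le 1\) (again an assumed form), expanding in \(u=x-1\) gives
\[
\phi _a(x)=\frac{u^2}{2}-\frac{(1-a)u^3}{2}+\frac{(1-a)^2u^4}{2}-\cdots ,
\]
hence \(\phi _a'''(1)=-3(1-a)\), \(\phi _a^{(4)}(1)=12(1-a)^2\), and
\[
4\phi _a'''(1)+3\phi _a^{(4)}(1)=12(1-a)(2-3a),
\]
which again vanishes at \(a=1\) and \(a=2/3\).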
Proof of Corollary 3
Since the former half of Corollary 3 follows immediately from Theorem 2, we prove the latter half. The proof of \(\vert A_1 \vert < \vert B_1 \vert \) is straightforward. The relation \(\vert A_s \vert > \vert B_s \vert \) is equivalent to \((A_s - B_s)(A_s + B_s) > 0\). Since
by noting \(S/k^2 \ge 1\), \(A_s-B_s > 0\) when \(s \ge 2\). Therefore, \(\vert A_s \vert > \vert B_s \vert \) is equivalent to \(A_s+B_s > 0\), when \(s \ge 2\). Since
\(S/k^2 > C(s)\) implies \(\vert A_s \vert > \vert B_s \vert \), when \(s \ge 2\). Similarly, \(1 \le S/k^2 < C(s) \) implies \(\vert A_s \vert < \vert B_s \vert \), when \(s \ge 2\). We obtain the results of the latter half.
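The bound \(S/k^2 \ge 1\) used in this proof can be checked on small examples. The sketch below assumes that \(S\) in (7) equals \(\sum _{j=1}^k q_j^{-1}\); that definition is not reproduced in this appendix, so it is an assumption here, and the function name `s_over_k2` is illustrative. Under that assumption the bound follows from the Cauchy-Schwarz inequality \((\sum _j q_j)(\sum _j q_j^{-1}) \ge k^2\), with equality for equiprobable cells.

```python
from fractions import Fraction

def s_over_k2(q):
    """Return S / k^2, assuming S is the sum of reciprocal cell probabilities."""
    k = len(q)
    # Exact rational arithmetic avoids rounding noise at the boundary S / k^2 = 1.
    s = sum(1 / Fraction(p) for p in q)
    return s / k**2

# Equiprobable null cells attain the lower bound.
uniform = s_over_k2([Fraction(1, 4)] * 4)
# Unequal null cells push the ratio above 1.
skewed = s_over_k2([Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)])
```

The ratio is what the proof compares with \(C(s)\): it equals 1 for the equiprobable null hypothesis and grows as the null probabilities become more unequal.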
Proof of Theorems 3 and 4
Let the characteristic function of \(M_\phi ^*\) be \(\psi _\phi ^M(t)\). By assuming that the distribution of \(M_\phi ^*\) is continuous, \(\psi _\phi ^M(t)\) is evaluated as follows:
where
\(S_m \; (m=1, \ldots , M)\) is given by (18), and \(\mu \) is given by (15). Here, \(d_j^\phi \; (j=0, 1, 2, 3)\) satisfy
By (A.12), the \(s\)th moment about the origin of \(M_\phi ^*\) under the null hypothesis \(H_0^M\) given by (12) is expressed as follows:
where \(m_\phi ^M(s)\) is the second-order correction term of the sth moment about the origin of \(M_\phi ^*\) and is given by
In (A.14), let the coefficient of \(\mu ^\ell \) be \(c_{s,\ell }\) for \(\ell =0,1,\ldots ,s\), then \(m^M_\phi (s)\) can be written as
since \(c_{s,s}=0\), which is shown by (A.13). Here, \(m_\phi ^M(s)\) is a polynomial in \(\mu \). From (A.14), \(c_{s,s-1}\), the coefficient of the highest-degree term in \(\mu \), is given as follows:
Therefore, by substituting the expression of \(d_j^\phi \; (j=1,2,3)\) in (A.16), we obtain
By the assumption that \( p_{j_1 \ldots j_M} =O(K^{-1}) \; (j_m=1, \ldots , J_m; m=1, \ldots , M), \) we have \( p_{\cdot (m, j_m)}=O(J_m^{-1}) \; (j_m=1, \ldots , J_m; m=1, \ldots , M), \) and hence \( S_m=O(J_m^2) \; (m =1, \ldots , M). \) We therefore evaluate \(m_\phi ^M(s)\) given by (A.15) in terms of the orders of \(J_1, \ldots , J_M\). We obtain the following relations:
From the above discussion, under the assumption \(O(J_1) = \cdots = O(J_M)\), we find that \(\prod _{m=1}^M S_m\) is of the highest order among the terms in (A.18). On the other hand, the following evaluation holds:
and
Therefore, in the case of the statistic for testing complete independence in a multi-way contingency table,
since \( c_{s,\ell }=O(K^2), \)
and
From (A.19), we note that, unlike in the case of the multinomial goodness-of-fit test statistic, it is not necessary to substitute the expression for the coefficient \(c_{s,s-2}\) in (A.19) to evaluate \(m_\phi ^M(s)\). By substituting (A.17) for \(c_{s, s-1}\) in (A.19), we obtain the evaluation of \(m_\phi ^M(s)\) as follows:
From (A.21), if we put \(m_\phi ^M(s)=0\), then \(4\phi '''(1)+3\phi ^{(4)}(1) = o(1)\), which implies \(4\phi '''(1)+3\phi ^{(4)}(1) \rightarrow 0\) as \( K \rightarrow \infty \). We have completed the proof of Theorem 3. On the other hand, by substituting (8) in (A.20), we obtain the results of Theorem 4.
Proof of Theorems 5 and 6
Let the characteristic function of \(C_\phi ^{*}\) be \(\psi _\phi ^C(t)\). By assuming that the distribution of \(C_\phi ^{*}\) is continuous, \(\psi _\phi ^{C}(t)\) is evaluated as follows:
where
and \(\nu \) is defined in (21). Here, \(v_j^{\phi } \; (j=0, 1, 2, 3)\) satisfy
As in the proof of Theorems 3 and 4, we calculate \(m_\phi ^C(s)\). Then, since \(\Gamma _1=O(J_3^2)\), \(\Gamma _2=O(K^2)\), \(\Gamma _3=O((J_1J_3)^2)\), and \(\Gamma _4=O((J_2J_3)^2)\) under the assumption that \(p_{jk\ell }=O(K^{-1})\), we can evaluate \(m^C_\phi (s)\) as follows:
From (A.23), if we put \(m_\phi ^C(s) = 0\), then \(4\phi '''(1)+3\phi ^{(4)}(1)=o(1)\), which implies \(4\phi '''(1)+3\phi ^{(4)}(1) \rightarrow 0\) as \(K \rightarrow \infty \). This completes the proof of Theorem 5. On the other hand, by substituting (8) in (A.22), we obtain the result of Theorem 6.
About this article
Cite this article
Taneichi, N., Sekiya, Y. Selection of statistics for a multinomial goodness-of-fit test and a test of independence for a multi-way contingency table when data are sparse. Jpn J Stat Data Sci (2024). https://doi.org/10.1007/s42081-023-00233-y
DOI: https://doi.org/10.1007/s42081-023-00233-y