
Testing homogeneity of proportions from sparse binomial data with a large number of groups

Annals of the Institute of Statistical Mathematics

Abstract

In this paper, we consider testing the homogeneity of proportions in independent binomial distributions, especially when the data are sparse and the number of groups is large. We provide a broad treatment of our proposed tests, including theoretical studies, simulations, and a real data application. We present the asymptotic null distributions and asymptotic powers of the proposed tests and compare their performance with that of existing tests. Our simulation studies show that no test dominates the others; however, our proposed test and a few others are expected to control the given sizes and attain significant power. We also present a real example regarding safety concerns associated with Avandia (rosiglitazone) in Nissen and Wolski (New Engl J Med 356:2457–2471, 2007).


References

  • Bathke, A. C., Harrar, S. W. (2008). Nonparametric methods in multivariate factorial designs for large number of factor levels. Journal of Statistical Planning and Inference, 138(3), 588–610.

  • Bathke, A., Lankowski, D. (2005). Rank procedures for a large number of treatments. Journal of Statistical Planning and Inference, 133(2), 223–238.

  • Billingsley, P. (1995). Probability and Measure (3rd ed.). Hoboken: Wiley.

  • Boos, D. D., Brownie, C. (1995). ANOVA and rank tests when the number of treatments is large. Statistics and Probability Letters, 23, 183–191.

  • Cai, T., Parast, L., Ryan, L. (2010). Meta-analysis for rare events. Statistics in Medicine, 29(20), 2078–2089.

  • Cochran, W. G. (1954). Some methods for strengthening the common \(\chi ^2\) tests. Biometrics, 10, 417–451.

  • Greenshtein, E., Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10, 971–988.

  • Nissen, S. E., Wolski, K. (2007). Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. New England Journal of Medicine, 356(24), 2457–2471.

  • Park, J. (2009). Independent rule in classification of multivariate binary data. Journal of Multivariate Analysis, 100, 2270–2286.

  • Park, J., Ghosh, J. K. (2007). Persistence of the plug-in rule in classification of high dimensional multivariate binary data. Journal of Statistical Planning and Inference, 137, 3687–3705.

  • Potthoff, R. F., Whittinghill, M. (1966). Testing for homogeneity: I. The binomial and multinomial distributions. Biometrika, 53, 167–182.

  • Shuster, J. J. (2010). Empirical versus natural weighting in random effects meta analysis. Statistics in Medicine, 29, 1259–1265.

  • Shuster, J. J., Jones, L. S., Salmon, D. A. (2007). Fixed vs random effects meta-analysis in rare event studies: The rosiglitazone link with myocardial infarction and cardiac death. Statistics in Medicine, 26, 4375–4385.

  • Stijnen, T., Hamza, T. H., Özdemir, P. (2010). Random effects meta-analysis of event outcome in the framework of the generalized linear mixed model with applications in sparse data. Statistics in Medicine, 29, 3046–3067.

  • Tian, L., Cai, T., Pfeffer, M. A., Piankov, N., Cremieux, P. Y., Wei, L. J. (2009). Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent \(2\times 2\) tables with all available data but without artificial continuity correction. Biostatistics, 10, 275–281.


Author information


Correspondence to Junyong Park.

Appendices

Appendix

Proof of Theorem 1

We use Lyapunov's condition for the asymptotic normality of \(\frac{\mathcal{T}_S- E(\mathcal{T}_S)}{\sqrt{\mathcal{B}_k}}\). Let \(\mathcal{T}_{Si} = \frac{(X_i - n_i \bar{\pi })^2}{n_i \bar{\pi }(1-\bar{\pi })}\) and define \(\mathcal{D}_i = \mathcal{T}_{Si} - E(\mathcal{T}_{Si}) = \frac{(X_i - n_i \bar{\pi })^2}{n_i \bar{\pi }(1-\bar{\pi })} - \frac{n_i^2(\pi _i -\bar{\pi })^2}{n_i \bar{\pi }(1-\bar{\pi })} - \frac{\pi _i (1-\pi _i)}{n_i \bar{\pi }(1-\bar{\pi })} = \frac{1}{n_i \bar{\pi }(1-\bar{\pi }) } ((X_i - n_i \pi _i)^2 + 2n_i (X_i - n_i \pi _i) (\pi _i -\bar{\pi }) - n_i \pi _i (1-\pi _i))\). We show that Lyapunov's condition, \( \frac{\sum _{i=1}^kE(\mathcal{D}_i^4) }{\mathcal{B}_k^2} \rightarrow 0,\) is satisfied. We see that

$$\begin{aligned} \frac{\sum _{i=1}^kE(\mathcal{D}_i^4)}{\mathcal{B}_k^2}\le & {} \frac{1}{\mathcal{B}_k^2}\sum _{i=1}^k\frac{ n_i^8 E(\hat{\pi }_i - \pi _i)^8 + 2^4 n_i^4 (\pi _i -\bar{\pi })^4 n_i^4 E(\hat{\pi }_i - \pi _i)^4 + n_i^4 \pi _i^4 (1-\pi _i)^4 }{n_i^4(\bar{\pi }(1-\bar{\pi }))^4} \\\le & {} \frac{C}{ (\bar{\pi }(1-\bar{\pi }))^4 \mathcal{B}_k^2 } \sum _{i=1}^k\left[ \left( \theta _i^4 + \frac{\theta _i}{n_i} \right) + n_i^2 (\pi _i -\bar{\pi })^4 \left( 3\theta _i^2 + \frac{(1-6\theta _i) \theta _i}{n_i}\right) + \theta _i^4\right] \\\le & {} C\, \frac{\sum _{i=1}^k\left( 2\theta _i^4 + \frac{\theta _i}{n_i} \right) }{(\bar{\pi }(1-\bar{\pi }))^4 \mathcal{B}_k^2 } + C\, \frac{ 3\sum _{i=1}^kn_i^2 \theta _i (\pi _i -\bar{\pi })^4 (\theta _i + \frac{1}{n_i}) }{(\bar{\pi }(1-\bar{\pi }))^4\mathcal{B}_k^2} \\\rightarrow & {} 0 \end{aligned}$$

from the given conditions. Therefore, we have the asymptotic normality of \(\frac{\mathcal{T}_S-E(\mathcal{T}_S)}{\sqrt{\mathcal{B}_k}} \rightarrow N(0,1) \) in distribution. Furthermore, we also have the asymptotic normality of

$$\begin{aligned} T_0 = \frac{T_S-k}{\sqrt{\mathcal{B}_{0k}}} = \sqrt{\frac{\mathcal{B}_k}{\mathcal{B}_{0k}}} \frac{T_S-E(T_S)}{\sqrt{\mathcal{B}_k}} + \frac{E(T_S)-k}{\sqrt{\mathcal{B}_{0k}}} = \sigma _k \left( \frac{T_S-E(T_S)}{\sqrt{\mathcal{B}_{k}}} + \mu _k \right) \end{aligned}$$

which leads to \( P( T_0 \ge z_{1-\alpha }) = P(\sigma _k ( \frac{T_S-E(T_S)}{\sqrt{\mathcal{B}_{k}}} + \mu _k ) \ge z_{1-\alpha } ) = P(\frac{T_S -E(T_S)}{\sqrt{\mathcal{B}_k}} \ge \frac{z_{1-\alpha }}{\sigma _k} -\mu _k)\). Using \(\frac{T_S -E(T_S)}{\sqrt{\mathcal{B}_k}} \rightarrow N(0,1)\) in distribution, we have \( P(T_0 \ge z_{1-\alpha }) - \bar{\varPhi }( \frac{z_{1-\alpha }}{\sigma _k} -\mu _k) \rightarrow 0.\)\(\square \)
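The role of \(\sigma _k\) is easy to see in a minimal Monte Carlo sketch. The construction below is ours and only illustrative: we take \(\mathcal{T}_{Si} = (X_i - n_i \pi )^2/(n_i \pi (1-\pi ))\) with the true common \(\pi \) plugged in for simplicity (the statistic in the paper uses the pooled estimate) and \(\mathcal{B}_{0k}=2k\). Standardizing by the true \(\mathcal{B}_k\) gives unit variance, while the naive chi-square scaling does not under sparsity.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, pi = 2000, 5, 0.05          # sparse: n_i * pi is small in every group
n_i = np.full(k, n)
theta = pi * (1 - pi)
# per-group variance of (X_i - n_i pi)^2/(n_i theta) is 2 + (1 - 6 theta)/(n_i theta)
Bk = np.sum(2 + (1 - 6 * theta) / (n_i * theta))

z_Bk, z_B0k = np.empty(2000), np.empty(2000)
for r in range(2000):
    x = rng.binomial(n_i, pi)
    ts = np.sum((x - n_i * pi) ** 2 / (n_i * theta))
    z_Bk[r] = (ts - k) / np.sqrt(Bk)        # correct scaling by B_k
    z_B0k[r] = (ts - k) / np.sqrt(2 * k)    # naive chi-square scaling B_0k = 2k
print("sd with B_k  (should be ~ 1):", z_Bk.std().round(2))
print("sd with B_0k (sigma_k =", np.sqrt(Bk / (2 * k)).round(2), "):", z_B0k.std().round(2))
```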

Proof of Lemma 2

Since \(T_{1i}\) and \(T_{1j}\) for \(i\ne j\) are independent, we have \(\mathcal{V}_1 \equiv \hbox {Var}(T_1) = \sum _{i=1}^k\hbox {Var}(T_{1i})\) where

$$\begin{aligned} \hbox {Var}(T_{1i})= & {} n_i^2 \hbox {Var}[(\hat{\pi }_i - \pi _i)^2] + d_i^2 \hbox {Var}[\hat{\pi }_i (1-\hat{\pi }_i)] + 4 n_i^2 (\pi _i-\bar{\pi })^2 \hbox {Var}[ (\hat{\pi }_i-\pi _i)] \\&-\,2n_i d_i \hbox {Cov}((\hat{\pi }_i - \pi _i)^2, \hat{\pi }_i (1-\hat{\pi }_i)) \\&+\, 2 \hbox {Cov}(n_i(\hat{\pi }_i-\pi _i)^2, 2n_i (\hat{\pi }_i -\pi _i)(\pi _i-\bar{\pi }))\\&-\,2 \hbox {Cov}( 2n_i(\hat{\pi }_i - \pi _i)(\pi _i-\bar{\pi }), d_i\hat{\pi }_i (1-\hat{\pi }_i)). \end{aligned}$$

Using the following results

$$\begin{aligned} \hbox {Var}[(\hat{\pi }_i - \pi _i)^2]= & {} E[(\hat{\pi }_i - \pi _i)^4] - (E[(\hat{\pi }_i - \pi _i)^2])^2 \\= & {} \frac{2\theta _i^2}{n_i^2} + \frac{(1-6\theta _i)\theta _i}{n_i^3} \\ \hbox {Var}[\hat{\pi }_i (1-\hat{\pi }_i)]= & {} \frac{(1-4\theta _i)\theta _i}{n_i} - \frac{2\theta _i(1-5\theta _i)}{n_i^2} + \frac{(1-6\theta _i)\theta _i}{n_i^3} \\ \hbox {Cov}( (\hat{\pi }_i - \pi _i)^2, \hat{\pi }_i (1-\hat{\pi }_i))= & {} \frac{n_i-1}{n_i^3} \theta _i(1-6\theta _i)\\ \hbox {Cov}((\hat{\pi }_i- \pi _i)^2, \hat{\pi }_i-\pi _i)= & {} E(\hat{\pi }_i-\pi _i)^3 = \frac{(1-2\pi _i)\theta _i}{n_i^2} \\ \hbox {Cov}( (\hat{\pi }_i -\pi _i), \hat{\pi }_i (1-\hat{\pi }_i))= & {} \frac{(1-2\pi _i) \theta _i}{n_i}\left( 1 -\frac{1}{n_i} \right) , \end{aligned}$$

we derive

$$\begin{aligned}&\hbox {Var}(T_1) = \sum _{i=1}^k \left\{ \theta _i^2 \left( 2-\frac{6}{n_i} -\, \frac{4d_i^2}{n_i} + \frac{10d_i^2}{n_i^2}-\frac{6d_i^2}{n_i^3} +\, 12 d_i \frac{n_i-1}{n_i^2} \right) \right. \\&\qquad \qquad \quad \left. + \theta _i \left( \frac{1}{n_i} +\,\frac{d_i^2}{n_i} - \frac{2d_i^2}{n_i^2} + \frac{d_i^2}{n_i^3}- 2d_i \frac{n_i-1}{n_i^2} \right) \right\} \\&\qquad \qquad \quad +\, 4 \sum _{i=1}^k n_i(\pi _i - \bar{\pi })^2 \theta _i + \frac{4}{N} \sum _{i=1}^kn_i(\pi _i - \bar{\pi })(1-2\pi _i)\theta _i \\&\qquad \qquad = \sum _{i=1}^k\mathcal{A}_{1i} \theta _i^2 + \sum _{i=1}^k\mathcal{A}_{2i} \theta _i +\, 4 \sum _{i=1}^k n_i(\pi _i - \bar{\pi })^2 \theta _i \\&\qquad \qquad \quad +\frac{4}{N}\sum _{i=1}^kn_i(\pi _i - \bar{\pi })(1-2\pi _i)\theta _i \end{aligned}$$

where \( \mathcal{A}_{1i}= \left( 2-\frac{6}{n_i} - \frac{4d_i^2}{n_i} + \frac{10d_i^2}{n_i^2}-\frac{6d_i^2}{n_i^3} + 12 d_i \frac{n_i-1}{n_i^2} \right) \) and \(\mathcal{A}_{2i} =\Big (\frac{1}{n_i} +\frac{d_i^2}{n_i} - \frac{2d_i^2}{n_i^2} + \frac{d_i^2}{n_i^3}- 2d_i\frac{n_i-1}{n_i^2} \Big ) = \frac{n_i}{N^2}\) from \(d_i = \frac{n_i}{n_i-1} \left( 1 -\frac{n_i}{N} \right) \). \(\square \)
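Since each \(X_i\) has finite support, the moment identities above can be verified exactly against the binomial pmf rather than by simulation. A small sketch in Python (the parameter choices are ours; scipy is assumed available):

```python
import numpy as np
from scipy.stats import binom

n, pi = 7, 0.13
theta = pi * (1 - pi)
x = np.arange(n + 1)
p = binom.pmf(x, n, pi)
ph = x / n  # pihat

def E(f):
    """Exact expectation of f(pihat) under Binomial(n, pi)."""
    return np.sum(f(ph) * p)

# Var[(pihat - pi)^2] = 2 theta^2/n^2 + (1 - 6 theta) theta/n^3
lhs = E(lambda t: (t - pi)**4) - E(lambda t: (t - pi)**2)**2
assert np.isclose(lhs, 2*theta**2/n**2 + (1 - 6*theta)*theta/n**3)

# Var[pihat(1 - pihat)] = (1-4 theta) theta/n - 2 theta(1-5 theta)/n^2 + (1-6 theta) theta/n^3
lhs = E(lambda t: (t*(1 - t))**2) - E(lambda t: t*(1 - t))**2
assert np.isclose(lhs, (1-4*theta)*theta/n - 2*theta*(1-5*theta)/n**2 + (1-6*theta)*theta/n**3)

# Cov((pihat - pi)^2, pihat(1 - pihat)) = (n-1) theta (1 - 6 theta)/n^3
lhs = E(lambda t: (t - pi)**2 * t*(1 - t)) - E(lambda t: (t - pi)**2) * E(lambda t: t*(1 - t))
assert np.isclose(lhs, (n - 1)*theta*(1 - 6*theta)/n**3)

# E(pihat - pi)^3 = (1 - 2 pi) theta/n^2, and the last covariance (E(pihat - pi) = 0)
assert np.isclose(E(lambda t: (t - pi)**3), (1 - 2*pi)*theta/n**2)
assert np.isclose(E(lambda t: (t - pi)*t*(1 - t)), (1 - 2*pi)*theta/n * (1 - 1/n))
print("moment identities verified")
```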

Proof of Lemma 3

  1.

    Using \(d_i^2 = \left( \frac{n_i}{n_i-1}\right) ^2 (1-\frac{n_i}{N})^2 \le \left( \frac{n_i}{n_i-1}\right) ^2\), we can show that \(\mathcal{A}_{1i}\) is uniformly bounded. Since \(12 d_i \frac{n_i-1}{n_i^2} = \frac{12}{n_i}(1-\frac{n_i}{N})\), we have \(\mathcal{A}_{1i} = 2 + \frac{6}{n_i} -\frac{12}{N} + \frac{d_i^2}{n_i}\left( -4 + \frac{10}{n_i} -\frac{6}{n_i^2}\right) \). Let \(x = \frac{1}{n_i} \le \frac{1}{2}\); then \( f(x) = -4 + 10x - 6x^2 = -6(x-\frac{5}{6})^2 + \frac{1}{6}\), which satisfies \( -4< f(x) \le -\frac{1}{2}\) on this range; in particular \(f(x)<0\). Therefore \(\mathcal{A}_{1i} \le 2 + \frac{6}{n_i} -\frac{12}{N} \le 5\), while, using the identity \(\frac{6}{n_i} + \left( \frac{n_i}{n_i-1}\right) ^2 \frac{f(1/n_i)}{n_i} = \frac{2}{n_i-1}\), we obtain \(\mathcal{A}_{1i} \ge 2 - \frac{12}{N} + \frac{2}{n_i-1} > 2 - \frac{12}{N}\). Using \(n_i\ge 2\) and \(N \rightarrow \infty \) as \(k \rightarrow \infty \), the lower and upper bounds are uniformly bounded away from 0 and \(\infty \) for all i. Therefore, we have \(\mathcal{A}_{1i} \asymp 1\) and \(\mathcal{A}_{2i} = \frac{n_i}{N^2}\), leading to \(\mathcal{V}_1 = \sum _{i=1}^k\mathcal{A}_{1i} \theta _i^2 + \sum _{i=1}^k\mathcal{A}_{2i} \theta _i \asymp \sum _{i=1}^k\theta _i^2 + \frac{1}{N^2} \sum _{i=1}^kn_i \theta _i\).

  2.

    Let \(\mathcal{G}_n = 4 \sum _{i=1}^kn_i (\pi _i -\bar{\pi })^2 \theta _i + 4 \frac{1}{N} \sum _{i=1}^kn_i (\pi _i-\bar{\pi })(1-2\pi _i)\theta _i = 4 \sum _{i=1}^k\theta _i G_{i} \) where \(G_i = n_i(\pi _i -\bar{\pi })^2 + \frac{n_i}{N}(\pi _i-\bar{\pi })(1-2\pi _i)\). If we define \(\mathcal{B} = \{ i : |\pi _i - \bar{\pi }| \ge \frac{(1+\epsilon )}{N} \}\) for some \(\epsilon >0\), then we decompose

    $$\begin{aligned} \mathcal{V}_1= & {} \underbrace{ \sum _{i \in \mathcal{B}} ( \mathcal{A}_{1i} \theta _i^2 + \mathcal{A}_{2i} \theta _i)}_{\mathcal{F}_1} + \underbrace{\sum _{i \in \mathcal{B}^c} ( \mathcal{A}_{1i} \theta _i^2 + \mathcal{A}_{2i} \theta _i)}_{\mathcal{F}_2} \end{aligned}$$
    (19)
    $$\begin{aligned} \mathcal{G}_n= & {} 4 \underbrace{\sum _{i \in \mathcal{B}} \theta _i G_i}_{\mathcal{G}_{n1}} + 4 \underbrace{\sum _{i \in \mathcal{B}^c}\theta _i G_i}_{\mathcal{G}_{n2}} \equiv 4 \mathcal{G}_{n1} + 4 \mathcal{G}_{n2}. \end{aligned}$$
    (20)

    For \(i \in \mathcal{B}\), we have \( \frac{n_i}{N}| (1-2\pi _i) (\pi _i - \bar{\pi }) \theta _i| \le \frac{n_i}{(1+\epsilon )} (\pi _i - \bar{\pi })^2 \theta _i\) which implies

    $$\begin{aligned} \frac{4\epsilon }{1+\epsilon } \sum _{i \in \mathcal{B}} n_i ( \pi _i -\bar{\pi })^2 \theta _i \le 4 \mathcal{G}_{n1} \le \frac{4 (2+\epsilon )}{1+\epsilon } \sum _{i \in \mathcal{B}} n_i ( \pi _i -\bar{\pi })^2 \theta _i. \end{aligned}$$

    This leads to \(4 \mathcal{G}_{n1} \asymp \sum _{i \in \mathcal{B}} n_i (\pi _i- \bar{\pi })^2 \theta _i\) and

    $$\begin{aligned} \mathcal{F}_1+ 4 \mathcal{G}_{n1} \asymp \mathcal{F}_1+ \sum _{i \in \mathcal{B}} n_i ( \pi _i -\bar{\pi })^2 \theta _i. \end{aligned}$$
    (21)

    For \(\mathcal{B}^c = \{ i | |\pi _i -\bar{\pi }| < \frac{(1+\epsilon )}{N}\}\), we first show \( \mathcal{F}_2 + 4\mathcal{G}_{n2} \ge \sum _{i \in \mathcal{B}^c} \mathcal{A}_{1i} \theta _i^2\). For \(i \in \mathcal{B}^c\) and \(x = \pi _i-\bar{\pi }\), we have \( G_i = n_i( x+ \frac{1}{2N} (1-2\pi _i))^2 - \frac{(1-2\pi _i)^2 n_i }{4N^2} \ge -\frac{ (1-2\pi _i)^2 n_i}{4N^2}\) leading to

    $$\begin{aligned} \mathcal{F}_2 + 4 \mathcal{G}_{n2}\ge & {} \sum _{i \in \mathcal{B}^c} \mathcal{A}_{1i}\theta _i^2 + \frac{1}{N^2} \sum _{i \in \mathcal{B}^c} {n_i\theta _i} \left( 1-(1-2\pi _i)^2 \right) \nonumber \\= & {} \sum _{i\in \mathcal{B}^c} \mathcal{A}_{1i} \theta _i^2 + \frac{4}{N^2} \sum _{i \in \mathcal{B}^c} {n_i \theta _i^2} = \sum _{i\in \mathcal{B}^c} \mathcal{A}_{1i} \theta _i^2 + 4 \sum _{i \in \mathcal{B}^c} \mathcal{A}_{2i}\theta _i^2 \nonumber \\> & {} \sum _{i\in \mathcal{B}^c} \mathcal{A}_{1i} \theta _i^2. \end{aligned}$$
    (22)

    The upper bound of \(4\mathcal{G}_{n2}\) is

    $$\begin{aligned} 4 \mathcal{G}_{n2}\le & {} 4 \sum _{i \in \mathcal{B}^c} n_i (\pi _i - \bar{\pi })^2 \theta _i + \frac{4(1+\epsilon )}{N^2} \sum _{i \in \mathcal{B}^c} n_i \theta _i = 4 \sum _{i \in \mathcal{B}^c} n_i (\pi _i - \bar{\pi })^2 \theta _i + 4(1+\epsilon ) \sum _{i \in \mathcal{B}^c} \mathcal{A}_{2i} \theta _i \end{aligned}$$

    resulting in

    $$\begin{aligned} \mathcal{F}_2 + 4 \mathcal{G}_{n2}\le & {} \sum _{i\in \mathcal{B}^c} \mathcal{A}_{1i} \theta _i^2 + (5+4\epsilon ) \sum _{i \in \mathcal{B}^c} \mathcal{A}_{2i} \theta _i + 4 \sum _{i \in \mathcal{B}^c} n_i (\pi _i - \bar{\pi })^2 \theta _i \nonumber \\< & {} 8(1+\epsilon ) \left( \mathcal{F}_2 + \sum _{i \in \mathcal{B}^c} n_i (\pi _i - \bar{\pi })^2 \theta _i\right) . \end{aligned}$$
    (23)

    Combining (22) and (23), we have

    $$\begin{aligned} \sum _{i \in \mathcal{B}^c} \mathcal{A}_{1i} \theta _i^2< \mathcal{F}_2 + 4\mathcal{G}_{n2} < 8(1+\epsilon )\left( \mathcal{F}_2 + \sum _{i \in \mathcal{B}^c} n_i (\pi _i - \bar{\pi })^2 \theta _i\right) . \end{aligned}$$
    (24)

    From (21) and (24), we conclude, for \(K=8(1+\epsilon )\),

    $$\begin{aligned} \sum _{i=1}^k \mathcal{A}_{1i} \theta _i^2 < \hbox {Var}(T_1) \le K (\nu _1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||_{\mathbf{n} \theta }^2). \end{aligned}$$

    In particular, if \( \mathcal{B}^c\) is empty, then we have \(\hbox {Var}(T_1) = \mathcal{F}_{1} + 4 \mathcal{G}_{n1}\); therefore (21) implies (15). \(\square \)
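The formula \(\hbox {Var}(T_{1i}) = \mathcal{A}_{1i}\theta _i^2 + \mathcal{A}_{2i}\theta _i\) in the case \(\pi _i = \bar{\pi }\), and the bounds on \(\mathcal{A}_{1i}\) from part 1, can also be checked exactly against the binomial pmf. A small sketch (the grid is our choice):

```python
import numpy as np
from scipy.stats import binom

def A1(n, N):
    """A_{1i} from Lemma 2, with d_i = n/(n-1) * (1 - n/N)."""
    d = n / (n - 1) * (1 - n / N)
    return 2 - 6/n - 4*d**2/n + 10*d**2/n**2 - 6*d**2/n**3 + 12*d*(n - 1)/n**2

def var_T1i(n, N, pi):
    """Exact Var(T_1i) for one group when pi_i = pibar, via the binomial pmf."""
    d = n / (n - 1) * (1 - n / N)
    x = np.arange(n + 1); p = binom.pmf(x, n, pi); ph = x / n
    t = n * (ph - pi)**2 - d * ph * (1 - ph)
    return np.sum(t**2 * p) - np.sum(t * p)**2

for N in (100, 10_000):
    for n in (2, 3, 5, 20, 50):
        for pi in (0.05, 0.3, 0.5):
            th = pi * (1 - pi)
            assert np.isclose(var_T1i(n, N, pi), A1(n, N)*th**2 + (n/N**2)*th)
        assert 2 - 12/N < A1(n, N) <= 5 - 12/N   # bounds from part 1
print("A_1i formula and bounds check out on the grid")
```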

Proof of Lemma 4

Let \(X = \sum _{i=1}^n X_i\) where the \(X_i\)s are iid Bernoulli(\(\pi \)). In the expansion of \((X-n\pi )^8\), each term has the form \((X_{i_1}-\pi )^{m_1}(X_{i_2}-\pi )^{m_2}\cdots (X_{i_r}-\pi )^{m_r}\) for \(1\le i_1,\ldots , i_r \le n\) and \( m_1 + \cdots + m_r=8\), so if at least one \(m_{j}=1\), then the expectation of the term is zero. We only need to consider the terms with no first-power factor \((X_{i_j}-\pi )\), so we finally have

$$\begin{aligned} E(X-n\pi )^8= & {} E\left( \sum _{i=1}^n (X_i -\pi )\right) ^8 \\= & {} {n \atopwithdelims ()1} E(X_1-\pi )^8\\&+\, 2 {8 \atopwithdelims ()6,2} {n \atopwithdelims ()2} E(X_1-\pi )^6 E(X_1-\pi )^2 \\&+\, 2 {n \atopwithdelims ()2} {8 \atopwithdelims ()5,3} E(X_1-\pi )^5 E(X_1-\pi )^3 \\&+ \, {n\atopwithdelims ()2} {8 \atopwithdelims ()4,4} [E(X_1-\pi )^4]^2 \\&+\, \frac{3!}{2!} {n \atopwithdelims ()3} {8 \atopwithdelims ()4,2,2} E(X_1-\pi )^4 [E(X_1-\pi )^2]^2 \\&+\, \frac{3!}{2!} {n \atopwithdelims ()3} {8 \atopwithdelims ()3,3,2} [E(X_1-\pi )^3]^2 E(X_1-\pi )^2 \\&+\, {n \atopwithdelims ()4} {8 \atopwithdelims ()2,2,2,2} [E(X_1-\pi )^2]^4. \end{aligned}$$

We have \(E(X_1-\pi )^m = \sum _{i=0}^m {m \atopwithdelims ()i} E(X_1^i) (-\pi )^{m-i} = (-\pi )^m + \sum _{i=1}^m {m \atopwithdelims ()i} E(X_1^i) (-\pi )^{m-i}\) and, using \(E(X_1^i) = E(X_1) =\pi \) for \(i\ge 1\), we obtain \(E(X_1-\pi )^m = (-\pi )^m + \pi \sum _{i=1}^m {m \atopwithdelims ()i} (-\pi )^{m-i} = (-\pi )^m-\pi (-\pi )^m + \pi \sum _{i=0}^m {m \atopwithdelims ()i} (-\pi )^{m-i} = (1-\pi )(-\pi )^m + \pi (1-\pi )^m = \pi (1-\pi )( (-1)^m \pi ^{m-1}+(1-\pi )^{m-1})\le \pi (1-\pi )\) for \(m \ge 2\). Since all coefficients in the expansion of \(E(\sum _{i=1}^n (X_i - \pi ))^8\) are fixed constants, for some universal constant \(C>0\), we have

$$\begin{aligned} E(X-n\pi )^8\le & {} C \max (n\pi (1-\pi ), (n\pi (1-\pi ))^2, (n\pi (1-\pi ))^3, (n\pi (1-\pi ))^4 ) \\= & {} C \max \{ n\pi (1-\pi ), (n\pi (1-\pi ))^4 \}. \end{aligned}$$

since the maximum is attained at either \(n\pi (1-\pi )\) or \((n\pi (1-\pi ))^4\), depending on whether \(n\pi (1-\pi ) \le 1 \) or \(n\pi (1-\pi )>1\).
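Because X has finite support, this bound can also be checked exactly. The lemma does not make C explicit, so the sketch below simply reports the largest observed ratio over a small grid of our choosing:

```python
import numpy as np
from scipy.stats import binom

worst = 0.0
for n in (2, 5, 20, 100, 1000):
    for pi in (0.001, 0.01, 0.1, 0.3, 0.5):
        x = np.arange(n + 1)
        m8 = np.sum((x - n * pi) ** 8 * binom.pmf(x, n, pi))  # E(X - n pi)^8
        t = n * pi * (1 - pi)                                  # n * theta
        worst = max(worst, m8 / max(t, t ** 4))
print("largest ratio E(X - n pi)^8 / max(n theta, (n theta)^4):", round(worst, 1))
```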

For the second inequality, we first consider the moments \(E(\hat{\pi }^4)\) and \(E((1-\hat{\pi })^4)\). The latter is easily obtained from the former by changing the distribution from \(B(n,\pi )\) to \(B(n,1-\pi )\). We first obtain

$$\begin{aligned} E \hat{\pi }^4= & {} \pi ^4 + \frac{6\pi ^2 \theta }{n} + \frac{4\pi (1-2\pi ) \theta }{n^2} + \frac{3\theta ^2}{n^2} + \frac{(1-6\theta )\theta }{n^3} \\\le & {} \pi ^4 + \frac{6\pi ^3}{n} + \frac{7\pi ^2}{n^2} + \frac{\pi }{n^3} \\\le & {} 7 \left( \pi ^4 + \frac{\pi ^3}{n} + \frac{\pi ^2}{ n^2} + \frac{\pi }{n^3} \right) \\\le & {} 28 \max \left( \pi ^4, \frac{\pi }{n^3}\right) \end{aligned}$$

where the last inequality holds since each term is bounded by either \(\pi ^4\) or \(\frac{\pi }{n^3}\), depending on whether \(\pi \ge \frac{1}{n}\) or \(\pi < \frac{1}{n}\). Similarly, we obtain

$$\begin{aligned} E (1-\hat{\pi })^4\le & {} 28 \max \left( (1-\pi )^4, \frac{1-\pi }{n^3} \right) . \end{aligned}$$

Using \(E \hat{\pi }^4 (1-\hat{\pi })^4 \le \min (E\hat{\pi }^4, E (1-\hat{\pi })^4 )\), we have

$$\begin{aligned} E \hat{\pi }^4 (1-\hat{\pi })^4\le & {} \min (E\hat{\pi }^4, E (1-\hat{\pi })^4 ) \\\le & {} 28 \min \left\{ \max \left( \pi ^4, \frac{\pi }{n^3}\right) , \max \left( (1-\pi )^4, \frac{1-\pi }{n^3} \right) \right\} \\= & {} {\left\{ \begin{array}{ll} 28 \max \left( \pi ^4, \frac{\pi }{n^3}\right) &{} \text{ if } \pi \le \frac{1}{2} \\ 28 \max \left( (1-\pi )^4, \frac{1-\pi }{n^3} \right) &{} \text{ if } \pi > \frac{1}{2}. \end{array}\right. } \end{aligned}$$

If \(\pi \le \frac{1}{2}\), then \(\pi \le 2\pi (1-\pi ) = 2\theta \); if \(\pi > \frac{1}{2}\), then \(1-\pi \le 2\theta \). Hence the last bound gives

$$\begin{aligned} E \hat{\pi }^4 (1-\hat{\pi })^4 \le C' \max \left( \theta ^4, \frac{\theta }{n^3} \right) \end{aligned}$$

for some universal constant \(C'\).
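For instance, in the case \(\pi \le \frac{1}{2}\), combining the previous display with \(\pi \le 2\theta \) gives an explicit constant:

$$\begin{aligned} E \hat{\pi }^4 (1-\hat{\pi })^4 \le 28 \max \left( \pi ^4, \frac{\pi }{n^3}\right) \le 28 \max \left( (2\theta )^4, \frac{2\theta }{n^3}\right) \le 28 \cdot 16 \max \left( \theta ^4, \frac{\theta }{n^3} \right) , \end{aligned}$$

so \(C' = 448\) suffices; the case \(\pi > \frac{1}{2}\) is symmetric.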

We use the following relationship: for some constants \(b_m\), \(m=1,\ldots ,l-1\)

$$\begin{aligned} X^l = \prod _{j=1}^{l} (X-j+1) + \sum _{m=1}^{l-1} b_m \prod _{j=1}^{m} (X-j+1). \end{aligned}$$

For example we have \(x^3 = x(x-1)(x-2) + 3x(x-1) +x\). Using \(E\prod _{j=1}^{l} (X-j+1) = \prod _{j=1}^{l} (n-j+1) \pi ^l\),

$$\begin{aligned} E\hat{\pi }^l= & {} \frac{1}{n^l} E\prod _{j=1}^{l} (X-j+1) + \frac{1}{n^l} \sum _{m=1}^{l-1} b_m E \left( \prod _{j=1}^{m} (X-j+1) \right) \\= & {} \pi ^l + O\left( \frac{\pi ^l}{n}\right) + O \left( \sum _{m=1}^{l-1} \frac{\pi ^m}{n^{l-m}} \right) \\= & {} \pi ^l + O\left( \frac{\pi ^{l-1}}{n} + \frac{\pi }{n^{l-1}} \right) . \end{aligned}$$

Using this, we can derive

$$\begin{aligned} E(\hat{\pi }^l -\pi ^l)^2= & {} E \hat{\pi }^{2l} - 2\pi ^l E \hat{\pi }^l + \pi ^{2l} = O\left( \frac{\pi ^{2l-1}}{n} + \frac{\pi }{n^{2l-1}} \right) .\\ E(\hat{\eta }_l - \pi ^l)^2= & {} E(\hat{\eta }_l -\hat{\pi }^l +\hat{\pi }^l - \pi ^l)^2 \le 2^2 E(\hat{\eta }_l -\hat{\pi }^l)^2 + 2^2 E(\hat{\pi }^l - \pi ^l)^2. \end{aligned}$$

Since \(\hat{\eta }_l -\hat{\pi }^l = {\hat{\pi }^l}O(\frac{1}{n}) + \sum _{i=1}^{l-1} {\hat{\pi }^{l-i}} O(\frac{1}{n^i})\), we have \(E(\hat{\eta }_l -\hat{\pi }^l)^2 \le \left\{ E(\hat{\pi }^{2l})O\left( \frac{1}{n^2}\right) + \sum _{i=1}^{l-1} E({\hat{\pi }^{2l-2i}}) O\left( \frac{1}{n^{2i}}\right) \right\} .\) Using \(E \hat{\pi }^{2l}=\pi ^{2l}+ O \left( \frac{\pi }{n^{2l-1}} + \frac{\pi ^{2l-1}}{n} \right) \) from (4), we obtain

$$\begin{aligned} E(\hat{\eta }_l -\hat{\pi }^l)^2= & {} O\left( \frac{1}{n^2} \right) \left( \pi ^{2l} + O \left( \frac{\pi }{n^{2l-1}} + \frac{\pi ^{2l-1}}{n} \right) \right) \nonumber \\&+\sum _{i=1}^{l-1} \left( \pi ^{2l-2i} + O\left( \frac{\pi }{n^{2l-2i-1}} + \frac{\pi ^{2l-2i-1}}{n} \right) \right) O\left( \frac{1}{n^{2i}} \right) \nonumber \\= & {} O\left( \sum _{i=1}^{l-1} \frac{\pi ^{2(l-i)}}{n^{2i}} + \frac{\pi }{n^{2l-1}} \right) . \end{aligned}$$
(25)

We can show \(\frac{\pi ^{2(l-i)}}{n^{2i}} \le \frac{\pi }{n^{2l-1}} + \frac{\pi ^{2l-1}}{n}\) for \(1\le i \le l-1\) since \(\frac{\pi ^{2(l-i)}}{n^{2i}} \le \frac{\pi ^{2l-1}}{n}\) for \(\pi \ge \frac{1}{n}\) and \(\frac{\pi ^{2(l-i)}}{n^{2i}} \le \frac{\pi }{n^{2l-1}}\) for \(\pi <\frac{1}{n}\). Using this, we have (25) \(\le O\left( \frac{\pi }{n^{2l-1}} + \frac{\pi ^{2l-1}}{n} \right) \), which proves \(E(\hat{\eta }_l -\pi ^l)^2 = O\left( \frac{\pi }{n^{2l-1}} + \frac{\pi ^{2l-1}}{n} \right) \). \(\square \)
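The two identities driving this part of the proof are easy to confirm exactly; a short sketch (parameter choices ours):

```python
import numpy as np
from scipy.stats import binom

# polynomial identity x^3 = x(x-1)(x-2) + 3 x(x-1) + x, checked on an integer grid
x = np.arange(10)
assert np.all(x**3 == x*(x - 1)*(x - 2) + 3*x*(x - 1) + x)

# factorial moment E[X(X-1)(X-2)] = n(n-1)(n-2) pi^3 for X ~ Binomial(n, pi)
n, pi = 9, 0.2
xs = np.arange(n + 1)
lhs = np.sum(xs*(xs - 1)*(xs - 2) * binom.pmf(xs, n, pi))
print(np.isclose(lhs, n*(n - 1)*(n - 2) * pi**3))
```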

Proof of Lemma 5

For the ratio consistency of \({\hat{\mathcal{V}}}_{1}\), it is enough to show \( \frac{E[( {\hat{\mathcal{V}}}_1 - \mathcal{V}_1 )^2]}{(\mathcal{V}_1)^2} \rightarrow 0\) as \(k \rightarrow \infty \). Since \(\hat{\mathcal{V}}_1\) is an unbiased estimator of \(\mathcal{V}_1\),

$$\begin{aligned} \hbox {Var}(\hat{\mathcal{V}}_1)= & {} E[(\hat{\mathcal{V}}_1 - \mathcal{V}_1)^2]\\= & {} \sum _{i=1}^k\sum _{l=1}^4 a^2_{li} E [( \hat{\eta }_{li}- \eta _{li})^2] + \sum _{i \ne i'} \sum _{l \ne l'} a_{li} a_{l'i'} E[ (\hat{\eta }_{li} - \eta _{li})(\hat{\eta }_{l'i'}-\eta _{l'i'})] \\= & {} \sum _{i=1}^k\sum _{l=1}^4 a^2_{li} E [( \hat{\eta }_{li}- \eta _{li})^2] \end{aligned}$$

where the last equality follows since \(E[(\hat{\eta }_{li}- \eta _{li})(\hat{\eta }_{l'i'} - \eta _{l'i'})] = E[(\hat{\eta }_{li}-\eta _{li})] E[(\hat{\eta }_{l'i'}-\eta _{l'i'})]=0\) because \(\hat{\eta }_{li}\) and \(\hat{\eta }_{l'i'}\) are independent for \(i \ne i'\) and both are unbiased estimators. Since \(\mathcal{V}_1\) depends on \(\theta _i=\pi _i(1-\pi _i)\), we have the same result if we change \(\pi _i\) to \(1-\pi _i\); in other words, \(\hbox {Var}(\hat{\mathcal{V}}_1) =\sum _{i=1}^k \sum _{l=1}^4 a_{li}^2 E(\hat{\eta }_{li} -\eta _{li})^2 = \sum _{i=1}^k \sum _{l=1}^4 a^2_{li} E(\hat{\eta }^*_{li} -\eta ^*_{li})^2\) where \(\eta ^*_{li} = (1-\pi _i)^l\) and \(\hat{\eta }^*_{li}\) is the corresponding unbiased estimator. For \(\pi _i \le 1/2\), we use \(\mathcal{V}_1 =\sum _{i=1}^k \sum _{l=1}^4 a_{li} \pi _i^l\) and obtain \(\hbox {Var}(\hat{\mathcal{V}}_1) = O(\sum _{i=1}^k (\frac{\pi _i^3}{n_i} + \frac{\pi _i}{n_i^3}))\) from Lemma 4. Since \( \pi _i \le \delta <1\), we have \(\hbox {Var}(\hat{\mathcal{V}}_1) = O(\sum _{i=1}^k (\frac{\pi _i^3}{n_i} + \frac{\pi _i}{n_i^3})) = O(\sum _{i=1}^k (\frac{\theta _i^3}{n_i} + \frac{\theta _i}{n_i^3})).\) From Lemma 3 and the given condition, we obtain

$$\begin{aligned} \frac{\hbox {Var}(\hat{\mathcal{V}}_1)}{ \mathcal{V}_1^2 }= & {} O\left( \frac{\sum _{i=1}^k \left( \frac{\theta _i^3}{n_i}+ \frac{\theta _i}{n_i^3}\right) }{\left( \sum _{i=1}^k \left( \theta _i^2 + \frac{1}{N^2} \frac{\theta _i}{n_i}\right) \right) ^2} \right) =o(1). \end{aligned}$$

Similarly, we can show, for some constant \(C'\),

$$\begin{aligned} \frac{\hbox {Var}(\hat{\mathcal{V}}_{1*})}{ (\mathcal{V}_{1*})^2 } = O\left( \frac{ (\tilde{\theta })^3 \sum _{i=1}^k \frac{1}{n_i} + \tilde{\theta }\sum _{i=1}^k\frac{1}{n_i^3} }{ \left( k (\tilde{\theta })^2 + \frac{\tilde{\theta }}{N^2} \sum _{i=1}^k \frac{1}{n_i}\right) ^2 } \right) =o(1). \end{aligned}$$

\(\square \)

Proof of Theorem 2

Since the condition in Lemma 5 holds, \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_{1*}\) are ratio-consistent estimators of \( \mathcal{V}_1 = \mathcal{V}_{1*}\) under \(H_0\). From \( \frac{T}{\sqrt{\mathcal{V}_1}} = \frac{T_1 - T_2}{\sqrt{\mathcal{V}_1}}\), we only need to show (i) \(\frac{T_1}{\sqrt{\mathcal{V}_1}} \rightarrow N(0,1)\) in distribution and (ii) \(\frac{T_2}{\sqrt{\mathcal{V}_1}} \rightarrow 0\) in probability. To prove (i), we show that Lyapunov's condition (see Billingsley 1995) for asymptotic normality is satisfied. Under \(H_0\), we have \(T_{1i} = n_i(\hat{\pi }_i -\pi _i)^2 - d_i \hat{\pi }_i (1-\hat{\pi }_i)\) with \(E(T_{1i})=0\); therefore, Lyapunov's condition reads \( \sum _{i=1}^kE(T_{1i}^4)/ \hbox {Var}(T_1)^2 \rightarrow 0\). Using Lemma 4, we have \(\sum _{i=1}^kE(T_{1i}^4) \le 2^4\sum _{i=1}^k\left( n_i^4 E(\hat{\pi }_i -\pi _i )^8 + d_i^4 E(\hat{\pi }_i (1-\hat{\pi }_i))^4 \right) = O\left( \sum _{i=1}^k\left( \theta _i^4 + \frac{\theta _i}{n_i}\right) \right) + O\left( \sum _{i=1}^k\left( \theta _i^4 + \frac{\theta _i}{n_i^3}\right) \right) = O\left( k \theta ^4 + \theta \sum _{i=1}^k\frac{1}{n_i}\right) \) since all \(\theta _i =\theta \) under \(H_0\). Combining this with result 1 in Lemma 3, we have \(\frac{\sum _{i=1}^k E(T_{1i}^4)}{ \hbox {Var}(T_1)^2} = \frac{ O(k\theta ^4 + \theta \sum _{i=1}^k\frac{1}{n_i}) }{ (k\theta ^2 + \frac{\theta }{N^2} \sum _{i=1}^k\frac{1}{n_i})^2} \le \frac{ O(k \theta ^4 + \theta \sum _{i=1}^k\frac{1}{n_i})}{ k^2 \theta ^4} = O\left( \frac{1}{k} + \frac{\sum _{i=1}^k\frac{1}{n_i}}{ k^2 \theta ^3}\right) \rightarrow 0\) as \(k \rightarrow \infty \) from the given condition \(\frac{\sum _{i=1}^k\frac{1}{n_i}}{ k \theta ^3} \rightarrow 0\), which shows \(\frac{T_1}{\sqrt{\mathcal{V}_1}} \rightarrow N(0,1)\) in distribution.

Furthermore, from Lemma 3 under \(H_0\), we have \(\mathcal{V}_1 \asymp k\theta ^2 + \frac{\theta }{N^2}\sum _{i=1}^k\frac{1}{n_i} \); therefore, we obtain \(E\left( \frac{T_2}{\sqrt{\mathcal{V}_1}} \right) = \frac{E( N(\hat{\bar{\pi }} - \bar{\pi })^2 )}{\sqrt{\mathcal{V}_1}} \asymp \frac{ \theta }{\sqrt{k \theta ^2 + \frac{\theta }{N^2} \sum _{i=1}^k\frac{1}{n_i} }} \le \frac{1}{\sqrt{k}} \rightarrow 0\), which leads to \(\frac{T_2}{\sqrt{\mathcal{V}_1}} \rightarrow 0\) in probability. Combining the asymptotic normality of \(\frac{T}{\sqrt{\mathcal{V}_1}}\) with the ratio consistency of \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_{1*}\), we have the asymptotic normality of \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) under \(H_0\). \(\square \)
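A minimal simulation sketch of the null normality follows. The construction is ours and simplified: the true \(\pi \) is plugged into \(T_{1i}\) (its per-group mean \(n_i\theta /N\) is negligible here), and the exact \(\mathcal{V}_1\) from Lemma 2 is used in place of the estimators \(\hat{\mathcal{V}}_1\), \(\hat{\mathcal{V}}_{1*}\) handled by Lemma 5:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, pi = 5000, 4, 0.1          # many small groups, common null pi
n_i = np.full(k, n); N = n_i.sum()
theta = pi * (1 - pi)
d = n_i / (n_i - 1) * (1 - n_i / N)

# Var(T_1) under H_0 from Lemma 2 (the terms in pi_i - pibar vanish)
A1 = 2 - 6/n_i - 4*d**2/n_i + 10*d**2/n_i**2 - 6*d**2/n_i**3 + 12*d*(n_i - 1)/n_i**2
A2 = n_i / N**2
V1 = np.sum(A1 * theta**2 + A2 * theta)

z = np.empty(2000)
for r in range(z.size):
    pihat = rng.binomial(n_i, pi) / n_i
    T1 = np.sum(n_i * (pihat - pi)**2 - d * pihat * (1 - pihat))
    z[r] = T1 / np.sqrt(V1)
print("mean ~ 0:", z.mean().round(2), " sd ~ 1:", z.std().round(2))
# normality degrades as pi -> 0, consistent with the condition
# sum(1/n_i) / (k theta^3) -> 0
```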

Proof of Theorem 3

Since \(T=T_1 -T_2\) from (8), we only need to show the following:

  1. (I)

    \(\frac{T_1 -\sum _{i=1}^k n_i (\pi _i - \bar{\pi })^2 }{\sqrt{{\hbox {Var}(T_1)}}} \rightarrow N(0,1)\) in distribution

  2. (II)

    \(\frac{T_2}{\sqrt{{\hbox {Var}(T_1)}}} \rightarrow 0\) in probability.

For (I), we use Lyapunov's condition for the asymptotic normality of \(T_1\). We show \(\frac{\sum _{i=1}^k E(T_{1i}- n_i(\pi _i -\bar{\pi })^2)^4}{\hbox {Var}(T_1)^2 } \rightarrow 0\) where \( G_i = T_{1i}- n_i(\pi _i -\bar{\pi })^2 = n_i (\hat{\pi }_i -\pi _i)^2 -d_i \hat{\pi }_i (1-\hat{\pi }_i) + 2n_i (\hat{\pi }_i -\pi _i)(\pi _i-\bar{\pi })\). Up to a universal constant, we have \(\sum _{i=1}^k E(G_i^4) \le \sum _{i=1}^k \left( n_i^4E((\hat{\pi }_i - \pi _i)^8) + d_i^4 E( (\hat{\pi }_i (1-\hat{\pi }_i))^4) + 2^4 n_i^4 (\pi _i-\bar{\pi })^4 E(\hat{\pi }_i -\pi _i)^4 \right) \). From Lemma 4, we have \(n_i^4E((\hat{\pi }_i - \pi _i)^8) \le O\left( \theta _i^4 + \frac{\theta _i}{n_i} \right) \) and \(d_i^4 E( (\hat{\pi }_i (1-\hat{\pi }_i))^4) \le 2^4 C' \max \left( \theta _i^4, \frac{\theta _i}{n_i^3}\right) = O\left( \theta _i^4 + \frac{\theta _i}{n_i^3} \right) \), where \(O(\cdot )\) is uniform in \(1\le i \le k\). Using the result in Lemma 1, we have \(2^4 \sum _{i=1}^k n_i^4 (\pi _i-\bar{\pi })^4 E(\hat{\pi }_i -\pi _i)^4 \le 2^4 \sum _{i=1}^k n_i^4 (\pi _i-\bar{\pi })^4 \left( \frac{3\theta _i^2}{n_i^2}+ \frac{(1-6\theta _i)\theta _i}{n_i^3} \right) \le C \max _{1\le i \le k} \left\{ n_i (\pi _i-\bar{\pi })^2 \left( \theta _i + \frac{1}{n_i}\right) \right\} \sum _{i=1}^k n_i (\pi _i-\bar{\pi })^2 \theta _i = C \max _{1\le i \le k} \left\{ n_i (\pi _i-\bar{\pi })^2 \left( \theta _i + \frac{1}{n_i}\right) \right\} ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}}\) for some constant \(C\). Therefore, we have

$$\begin{aligned} \frac{\sum _{i=1}^k E(G_i^4)}{\hbox {Var}(T_1)^2}\le & {} \frac{ \sum _{i=1}^k \left( \theta _i^4 + \frac{\theta _i}{n_i} \right) + \max _{1\le i \le k} \left\{ n_i (\pi _i-\bar{\pi })^2 \left( \theta _i + \frac{1}{n_i}\right) \right\} ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}} }{ \left( \mathcal{\nu }_1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}}\right) ^2 }\end{aligned}$$
(26)
$$\begin{aligned}\le & {} \frac{ \sum _{i=1}^k \left( \theta _i^4 + \frac{\theta _i}{n_i}\right) }{\left( \sum _{i=1}^k \left( \theta _i^2 + \frac{\theta _i}{n_i}\right) \right) ^2} + \frac{ \max _{1\le i \le k} \left\{ n_i (\pi _i-\bar{\pi })^2 \left( \theta _i + \frac{1}{n_i}\right) \right\} }{\mathcal{\nu }_1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}} }\rightarrow 0\nonumber \\ \end{aligned}$$
(27)

from the given conditions.

The negligibility of \(T_2 = N(\hat{\bar{\pi }} - \bar{\pi })^2\) can be proven by noting that \(\frac{N E(\hat{\bar{\pi }} - \bar{\pi })^2}{\sqrt{\hbox {Var}(T_1)}} = \frac{ \bar{\theta }}{\sqrt{\hbox {Var}(T_1)}} = \frac{1}{N} \frac{\sum _{i=1}^kn_i \theta _i}{\sqrt{\hbox {Var}(T_1)}} \asymp \frac{ \frac{1}{N}\sum _{i=1}^k n_i \theta _i}{ \sqrt{ \mathcal{V}_1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}} } } \le \left( \frac{\max _i \theta _i^2 }{{ \mathcal{V}_1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}} } } \right) ^{1/2}\) by (15) under condition (i), and the last bound \(\rightarrow 0\) by condition (ii); hence \(\frac{N(\hat{\bar{\pi }} - \bar{\pi })^2}{\sqrt{\hbox {Var}(T_1)}} \rightarrow 0\) in probability. Combining (I) and (II), we conclude \(\frac{T - \sum _{i=1}^k n_i (\pi _i - \bar{\pi })^2}{\sqrt{{\hbox {Var}(T_1)}}} \rightarrow N(0,1)\) in distribution. \(\square \)

Proof of Theorem 4

  1.

    Proof of 1: We prove \(\beta (T_\mathrm{new2}) \ge \beta (T_\mathrm{new1})\). For this, we only need to show that \(\mathcal{V}_{1} \ge \mathcal{V}_{1*}\) from Corollary 2. Let \(f(x)= 2 x^2(1-x)^2 + \frac{x(1-x)}{n}\); then f(x) is convex for \(0< x < \frac{1}{2} -\frac{1}{2\sqrt{3}} \sqrt{1+\frac{1}{n}}\) since \(f''(x) >0 \) there. Furthermore, \(\mathcal{V}_1 = \sum _{i=1}^kf(\pi _i)\) and \(\mathcal{V}_{1*} = k f(\bar{\pi })\) for \(\bar{\pi }= \frac{1}{N} \sum _{i=1}^kn_i \pi _i\). From the convexity of f, if \(n_i=n\) for all \(1\le i\le k\), we have \(\frac{1}{k} \mathcal{V}_1 = \frac{1}{k} \sum _{i=1}^kf(\pi _i) \ge f(\bar{\pi }) = \frac{1}{k} \mathcal{V}_{1*}\). Therefore, \(\mathcal{V}_1 \ge \mathcal{V}_{1*}\), which leads to \( \lim _{k\rightarrow \infty }(\beta (T_\mathrm{new2}) - \beta (T_\mathrm{new1})) \ge 0\) for the given \(0< \pi _i < \frac{1}{2} -\frac{1}{2\sqrt{3}}\sqrt{1+\frac{1}{n}}\) for all i.

    Under the given condition, \(\hat{\mathcal{B}}_{0k} = 2k (1+o_p(1))\) and

    $$\begin{aligned} T_\mathrm{new2}= & {} \frac{ \sum _{i=1}^k n_i (\hat{\pi }_i -\hat{\bar{\pi }})^2 - \sum _{i=1}^k \hat{\pi }_{i}(1-\hat{\pi }_i) }{\sqrt{2k \hat{\bar{\pi }} (1-\hat{\bar{\pi }}) }} (1+o_p(1))\\ T_{\chi }= & {} \frac{ \sum _{i=1}^k n_i (\hat{\pi }_i -\hat{\bar{\pi }})^2 - k \hat{\bar{\pi }}(1-\hat{\bar{\pi }}) }{\sqrt{2k \hat{\bar{\pi }} (1-\hat{\bar{\pi }}) }} (1+o_p(1)) \end{aligned}$$

    which leads to

    $$\begin{aligned} T_\mathrm{new2} - T_{\chi } = \frac{k \hat{\bar{\pi }}(1-\hat{\bar{\pi }}) - \sum _{i=1}^k \hat{\pi }_{i}(1-\hat{\pi }_i)}{\sqrt{2k \hat{\bar{\pi }} (1-\hat{\bar{\pi }})}} (1+o_p(1)). \end{aligned}$$

    Using \(k\hat{\bar{\pi }}(1-\hat{\bar{\pi }}) \ge \sum _{i=1}^k \hat{\pi }_i(1-\hat{\pi }_i)\) (verified in the display below), we have \( P( T_\mathrm{new2} - T_{\chi } \ge 0) \rightarrow 1\) as \(k \rightarrow \infty \), which leads to \(\lim _{k \rightarrow \infty } (\beta (T_\mathrm{new2}) - \beta (T_{\chi })) \ge 0\).
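    The inequality follows from an exact identity when \(n_i = n\) for all \(i\) (as in this part), so that \(\hat{\bar{\pi }} = \frac{1}{k}\sum _{i=1}^k \hat{\pi }_i\):

    $$\begin{aligned} k \hat{\bar{\pi }}(1-\hat{\bar{\pi }}) - \sum _{i=1}^k \hat{\pi }_i (1-\hat{\pi }_i) = \sum _{i=1}^k \hat{\pi }_i^2 - k \hat{\bar{\pi }}^2 = \sum _{i=1}^k (\hat{\pi }_i - \hat{\bar{\pi }})^2 \ge 0. \end{aligned}$$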

  2.

    Proof of 2: Note that \(\mathcal{A}_{1i} = 2(1+o(1))\) and \(\mathcal{A}_{2i} = 4(1+o(1))\) where o(1) is uniform in i. Using \(\bar{\pi }= (k^{-\gamma } + \delta k^{\alpha -1}) (1+O(k^{-1}))\) and \(\tilde{\theta }= \bar{\pi }(1+ o(1))\), we obtain

    $$\begin{aligned} \mathcal{V}_1= & {} \left( 2 \sum _{i=1}^k\theta _i^2 + 4 \sum _{i=1}^k\frac{\theta _i}{n_i} \right) (1+o(1)) \\= & {} \left( 2 (k-1) k^{-2\gamma } + 2 (k^{-\gamma }+\delta )^2+ \frac{4(k-1)k^{-\gamma }}{n} + \frac{4(k^{-\gamma } + \delta) }{ n k^{\alpha }} \right) (1+o(1)) \\= & {} \left( 2k^{1-2\gamma } + 2 \delta ^2 + \frac{4k^{1-\gamma }}{n}\right) (1+o(1))\\ \mathcal{V}_{1*}= & {} 2 k(k^{-\gamma } + \delta k^{\alpha -1})^2 (1+O(k^{-1})) + 4 \tilde{\theta }\sum _{i=1}^k\frac{1}{n_i} \\= & {} 2 k^{1-2\gamma } + 4\delta k^{\alpha -\gamma } +2 \delta ^2 k^{2\alpha -1}\\&+ 4(k^{-\gamma } + \delta k^{\alpha -1}) \left( \frac{k-1}{n} + \frac{1}{ nk^{\alpha }} \right) (1+o(1)) \\= & {} 2 k^{1-2\gamma } + 4 \delta k^{\alpha -\gamma } +2 \delta ^2 k^{2\alpha -1} + 4 \frac{k^{1-\gamma } + \delta k^{\alpha }}{n} (1+o(1)) \end{aligned}$$

    so

    $$\begin{aligned} \frac{\mathcal{V}_{1*} - \mathcal{V}_1}{\mathcal{V}_1} = \frac{ (2\delta k^{\alpha -\gamma } + \delta ^2 (k^{2\alpha -1}-1)) + 2\frac{\delta k^{\alpha }}{n} (1+o(1))}{k^{1-2\gamma } + \delta ^2 + 2 \frac{k^{1-\gamma }}{n} (1+o(1))}. \end{aligned}$$
    (28)
    1. (a)

      if \(\alpha + \gamma <1\) and \(\alpha \ge \frac{1}{2}\), then \(k^{2\alpha -1} = O(k^{\alpha -\gamma })\), so (28) \(= \frac{2\delta k^{\alpha -\gamma } + \delta ^2 k^{2\alpha -1} I(\alpha \ne \frac{1}{2}) + 2\frac{\delta k^{\alpha }}{n}}{ k^{1-2\gamma } + \delta ^2 + 2\frac{k^{1-\gamma }}{n}} (1+o(1)) \rightarrow 0\) since \(\alpha +\gamma <1\) implies \(k^{\alpha -\gamma } = o(k^{1-2\gamma })\) and \(k^{\alpha } = o(k^{1-\gamma })\), where \(I(\cdot )\) is an indicator function.

    2. (b)

      if \(\alpha +\gamma <1\), \(\alpha < \frac{1}{2}\) and \(\alpha \ge \gamma \), then (28) \(= \frac{2 \delta k^{\alpha -\gamma }-\delta ^2 + 2\frac{k^{\alpha }}{n}}{k^{1-2\gamma } + \delta ^2 + 2\frac{k^{1-\gamma }}{n}} \rightarrow 0\).

    3. (c)

      if \(\alpha +\gamma <1\), \(\alpha < \frac{1}{2}\), \(\gamma \le \frac{1}{2}\) and \(\alpha <\gamma \), then (28)\(= \frac{-\delta ^2 + 2\frac{k^{\alpha }}{n}}{ k^{1-2\gamma } + \delta ^2 + 2\frac{k^{1-\gamma }}{n}} \rightarrow 0\).

    4. (d)

      if \(\alpha +\gamma <1\), \(\alpha < \frac{1}{2}\) and \(\gamma > \frac{1}{2}\), then there are two cases depending on the behavior of n. When \(\frac{k^{1-\gamma }}{ n} \rightarrow 0\), then \((28) \rightarrow \frac{-\delta ^2}{\delta ^2} = -1\). When \(\frac{k^{1-\gamma }}{n} \rightarrow \infty \), (28) \(= \frac{\frac{k^{\alpha }}{n}}{\frac{k^{1-\gamma }}{n}} (1+o(1)) = k^{\alpha +\gamma -1} \rightarrow 0\).

    5. (e)

      if \(\alpha +\gamma >1\), \(\alpha >\frac{1}{2}\) and \(\gamma < \frac{1}{2}\), then (28) \(= \frac{\delta ^2 k^{2\alpha -1} + 2\frac{k^{\alpha }}{n}}{ k^{1-2\gamma } + 2\frac{k^{1-\gamma }}{n}} (1+o(1)) \rightarrow \infty \).

    6. (f)

      if \(\alpha +\gamma >1\), \(\alpha >\frac{1}{2}\) and \(\gamma \ge \frac{1}{2}\), then (28) \(= \frac{ k^{\alpha -\gamma } + \delta ^2 k^{2\alpha -1} + 2\frac{k^{\alpha }}{n} }{ I(\gamma =\frac{1}{2}) +\delta ^2 + 2\frac{k^{1-\gamma }}{n}} (1+o(1)) \rightarrow \infty \).

    7. (g)

      if \(\alpha +\gamma >1\), \(\alpha <\frac{1}{2}\) and \(\gamma >\frac{1}{2}\), then \(\alpha <\gamma \) and (28) \(= \frac{ - \delta ^2 + 2\frac{\delta k^{\alpha }}{n} }{ \delta ^2 + 2\frac{k^{1-\gamma }}{n}} (1+o(1))\). There are two situations depending on n. When \(\frac{k^{\alpha }}{n} \rightarrow \infty \), (28) \(\rightarrow \infty \) since \(\frac{k^{\alpha }}{n} \big / \frac{k^{1-\gamma }}{n} = k^{\alpha +\gamma -1} \rightarrow \infty \). When \(\frac{k^{\alpha }}{n} \rightarrow 0\), we also have \(\frac{k^{1-\gamma }}{n} \rightarrow 0\), so we derive (28) \(= \frac{-\delta ^2}{ \delta ^2 } (1+o(1)) \rightarrow -1\).

    In \( (a) \cup (b) \cup (c) = \{ (\alpha ,\gamma ) : 0<\alpha<1, 0<\gamma<1, 0< \alpha + \gamma<1, 0< \gamma \le \frac{1}{2} \}\), we have \(\lim _n \frac{\mathcal{V}_{1*}}{\mathcal{V}_1} =1\), leading to \(\lim _n (\beta (T_\mathrm{new2}) - \beta (T_\mathrm{new1}))=0\). In \((e)\cup (f) = \{(\alpha , \gamma ) : 0<\alpha<1, 0<\gamma <1, \alpha + \gamma> 1, 1>\alpha >\frac{1}{2} \}\), we have \( \lim \frac{\mathcal{V}_{1*}}{\mathcal{V}_1} >1\), which leads to \(\lim _n (\beta (T_\mathrm{new1}) - \beta (T_\mathrm{new2})) >0\).

    In (d) and (g), the performances differ depending on the sample sizes; see the numerical sketch following this proof.

  3.

    We first have

    $$\begin{aligned} \mathcal{V}_1= & {} 2 (k^{-\gamma } + \delta )^2 + 2 (k -1) k^{-2\gamma } + \frac{4(\delta + k^{-\gamma })}{n} + 4 (k-1)\frac{k^{-\gamma }}{ nk^{\alpha }} \\= & {} \left( 2 \delta ^2 + 2 k^{1-2\gamma } + \frac{4 k^{1-\gamma -\alpha }}{n} \right) (1+o(1)). \end{aligned}$$

    Since \(\tilde{\theta }= \bar{\pi }(1-\bar{\pi }) = \frac{\delta + k^{\alpha -\gamma +1}}{k^{\alpha +1}} (1+o(1)) = k^{-\gamma }(1+o(1))\) from \( 0< \alpha <1\) and \(0<\gamma <1\),

    $$\begin{aligned} \mathcal{V}_{1*}= & {} 2 k^{1-2\gamma } + \frac{4k^{-\gamma }}{n} + \frac{ 4(k-1)k^{-\gamma } }{nk^{\alpha }} \\= & {} \left( 2 k^{1-2\gamma } + \frac{4 k^{1-\gamma -\alpha }}{n} \right) (1+o(1)). \end{aligned}$$

    If \(1-2\gamma <0\) and \( k^{1-\gamma -\alpha } =o(n)\), then \(\mathcal{V}_1 = 2\delta ^2 (1+o(1))\) and \(\mathcal{V}_{1*} =o(1)\); hence \(\frac{\mathcal{V}_1}{ \mathcal{V}_{1*} } \rightarrow \infty \), which leads to \( \beta (T_\mathrm{new2}) - \beta (T_\mathrm{new1}) >0\). \(\square \)
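As a numerical illustration of the regimes in part 2 above, the following sketch evaluates the leading-order ratio (28) at a large but finite \(k\). The particular \(k\), \(n\), and \(\delta \) values are our choices, for illustration only; the divergence in case (e) is slow:

```python
import numpy as np

def ratio28(k, n, alpha, gamma, delta=0.3):
    """Leading-order (V_{1*} - V_1)/V_1 from (28)."""
    num = 2*delta*k**(alpha - gamma) + delta**2*(k**(2*alpha - 1) - 1) \
          + 2*delta*k**alpha/n
    den = k**(1 - 2*gamma) + delta**2 + 2*k**(1 - gamma)/n
    return num / den

k = 10**8
print(ratio28(k, n=5,      alpha=0.3, gamma=0.3))  # case (b): -> 0
print(ratio28(k, n=10**12, alpha=0.2, gamma=0.7))  # case (d), k^(1-gamma)/n -> 0: -> -1
print(ratio28(k, n=5,      alpha=0.8, gamma=0.4))  # case (e): -> infinity (slowly)
```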

About this article


Cite this article

Park, J. Testing homogeneity of proportions from sparse binomial data with a large number of groups. Ann Inst Stat Math 71, 505–535 (2019). https://doi.org/10.1007/s10463-018-0652-2

Keywords

Navigation