
Two-sample test of stochastic block models via the maximum sampling entry-wise deviation

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

This paper addresses the problem of testing for differences between two networks with community structure. Existing methods lose effectiveness when the networks become sparse. We propose a test statistic that combines the method of Wu and Hu (2024) with a resampling procedure. The proposed test remains effective when the community-wise edge probability matrices have entries of order \(\Omega (\log n/n)\), where n denotes the network size. We derive the asymptotic null distribution of the test statistic and establish its asymptotic power against the alternative hypothesis. We evaluate the proposed test through simulations and real data examples; the results indicate that it performs well for both dense and sparse networks.


Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5 (images not included)


Data availability

The dataset used in this paper is publicly available, with references provided in the text.

References

  • Abbe, E. (2018). Community Detection and Stochastic Block Models: Recent Developments. Journal of Machine Learning Research, 18(177), 1–86.

  • Amini, A. A., Chen, A., Bickel, P. J., & Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41(4), 2097–2122.

  • Bassett, D. S., Bullmore, E., Verchinski, B. A., Mattay, V. S., Weinberger, D. R., & Meyer-Lindenberg, A. (2008). Hierarchical Organization of Human Cortical Networks in Health and Schizophrenia. Journal of Neuroscience, 28(37), 9239–9248.

  • Bickel, P., Choi, D., Chang, X., & Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. The Annals of Statistics, 41(4), 1922–1943.

  • Bickel, P. J., & Sarkar, P. (2016). Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(1), 253–273.

  • Chen, K., & Lei, J. (2018). Network Cross-Validation for Determining the Number of Communities in Network Data. Journal of the American Statistical Association, 113(521), 241–251.

  • Chen, J., & Yuan, B. (2006). Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics (Oxford, England), 22(18), 2283–2290.

  • Chen, L., Zhou, J., & Lin, L. (2021). Hypothesis testing for populations of networks. Communications in Statistics - Theory and Methods, 0(0), 1–24.

  • Chernozhukov, V., Chetverikov, D., & Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6), 2786–2819.

  • Dong, Z., Wang, S., & Liu, Q. (2020). Spectral based hypothesis testing for community detection in complex networks. Information Sciences, 512, 1360–1371.

  • Fan, J., & Jiang, T. (2019). Largest entries of sample correlation matrices from equi-correlated normal populations. The Annals of Probability, 47(5), 3321–3374.

  • Gangrade, A., Venkatesh, P., Nazer, B., & Saligrama, V. (2019). Efficient near-optimal testing of community changes in balanced stochastic block models. Advances in Neural Information Processing Systems, 32.

  • Gao, C., Ma, Z., Zhang, A. Y., & Zhou, H. H. (2017). Achieving Optimal Misclassification Proportion in Stochastic Block Models. Journal of Machine Learning Research, 18(60), 1–45.

  • Ghoshdastidar, D., Gutzeit, M., Carpentier, A., & von Luxburg, U. (2020). Two-sample hypothesis testing for inhomogeneous random graphs. Annals of Statistics, 48(4), 2208–2229.

  • Ghoshdastidar, D., & von Luxburg, U. (2018). Practical Methods for Graph Two-Sample Testing. Advances in Neural Information Processing Systems, 31.

  • Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137.

  • Hu, J., Qin, H., Yan, T., & Zhao, Y. (2020). Corrected Bayesian Information Criterion for Stochastic Block Models. Journal of the American Statistical Association, 115(532), 1771–1783.

  • Hu, J., Zhang, J., Qin, H., Yan, T., & Zhu, J. (2020). Using Maximum Entry-Wise Deviation to Test the Goodness of Fit for Stochastic Block Models. Journal of the American Statistical Association, 0(0), 1–10.

  • Ji, P., & Jin, J. (2016). Coauthorship and citation networks for statisticians. The Annals of Applied Statistics, 10(4), 1779–1812.

  • Ji, P., Jin, J., Ke, Z. T., & Li, W. (2021). Co-citation and Co-authorship Networks of Statisticians. Journal of Business & Economic Statistics, 0(0), 1–17.

  • Jin, J. (2015). Fast community detection by SCORE. The Annals of Statistics, 43(1), 57–89.

  • Jing, B.-Y., Li, T., Ying, N., & Yu, X. (2022). Community detection in sparse networks using the symmetrized laplacian inverse matrix (slim). Statistica Sinica, 32, 1–22.

  • Krishna Reddy, P., Kitsuregawa, M., Sreekanth, P., & Srinivasa Rao, S. (2002). A graph based approach to extract a neighborhood customer community for collaborative filtering. In S. Bhalla (Ed.), Databases in Networked Information Systems (pp. 188–200). Berlin, Heidelberg: Springer.

  • Le, C. M., & Levina, E. (2022). Estimating the number of communities by spectral methods. Electronic Journal of Statistics, 16(1), 3315–3342.

  • Le, C. M., Levina, E., & Vershynin, R. (2017). Concentration and regularization of random graphs. Random Structures & Algorithms, 51(3), 538–561.

  • Leadbetter, M. R., Lindgren, G., & Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. New York, NY: Springer Series in Statistics. Springer.

  • Lei, J. (2016). A goodness-of-fit test for stochastic block models. Annals of Statistics, 44(1), 401–424.

  • Lei, J., & Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Annals of Statistics, 43(1), 215–237.

  • Li, T., Levina, E., & Zhu, J. (2020). Network cross-validation by edge sampling. Biometrika, 107(2), 257–276.

  • Ma, X., Wang, B., & Yu, L. (2018). Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods. Physica A: Statistical Mechanics and its Applications, 490, 786–802.

  • Newman, M. E. J., & Leicht, E. A. (2007). Mixture models and exploratory analysis in networks. Proceedings of the National Academy of Sciences, 104(23), 9564.

  • Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 74(3), 036104–19.

  • Pal, S., & Zhu, Y. (2021). Community detection in the sparse hypergraph stochastic block model. Random Structures & Algorithms, 59(3), 407–463.

  • Pontes, B., Giráldez, R., & Aguilar-Ruiz, J. S. (2015). Biclustering on expression data: A review. Journal of Biomedical Informatics, 57, 163–180.

  • Rohe, K., Chatterjee, S., & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4), 1878–1915.

  • Rossi, L., & Magnani, M. (2015). Towards effective visual analytics on multiplex and multilayer networks. Chaos, Solitons & Fractals, 72, 68–76.

  • Saldaña, D. F., Yu, Y., & Feng, Y. (2017). How Many Communities Are There? Journal of Computational and Graphical Statistics, 26(1), 171–181.

  • Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., Park, Y., & Priebe, C. E. (2017). A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs. Journal of Computational and Graphical Statistics, 26(2), 344–354.

  • Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., & Priebe, C. E. (2017). A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli, 23(3), 1599–1630.

  • Wang, Y. X. R., & Bickel, P. J. (2017). Likelihood-based model selection for stochastic block models. Annals of Statistics, 45(2), 500–528.

  • Westveld, A. H., & Hoff, P. D. (2011). A mixed effects model for longitudinal relational and network data, with applications to international trade and conflict. The Annals of Applied Statistics, 5(2A).

  • Wu, Q., & Hu, J. (2024). Two-sample test of stochastic block models. Computational Statistics & Data Analysis, 192, 107903.

  • Wu, Y., Lan, W., Feng, L., & Tsai, C.-L. (2022). Testing stochastic block models via the maximum sampling entry-wise deviation. Manuscript.

  • Zhang, B., Li, H., Riggins, R. B., Zhan, M., Xuan, J., Zhang, Z., Hoffman, E. P., Clarke, R., & Wang, Y. (2009). Differential dependency network analysis to identify condition-specific topological changes in biological networks. Bioinformatics, 25(4), 526–532.

Acknowledgements

The authors would like to thank the Editor, Associate Editor, and the two referees for their insightful comments.

Funding

Jiang Hu was partially supported by the National Natural Science Foundation of China (Grant Nos. 12171078, 12292980 and 12292982), the National Key R&D Program of China (No. 2020YFA0714102) and the Fundamental Research Funds for the Central Universities (No. 2412023YQ003).

Author information

Corresponding author

Correspondence to Jiang Hu.


Appendices

Detailed proofs of Theorems 1 and 2

We first state a lemma that is used in the main proofs.

Lemma 1

(Theorem 1.5.1 in Leadbetter et al. (1983)) Let \(Z_1, Z_2, \ldots , Z_{n^{*}}\) be independent and identically distributed random variables with common distribution function F(x), \(-\infty< x < \infty\). Let \(\{z_{n^{*}}\}\) be a sequence of real numbers, let \(0 \le \tau < \infty\), and define \(M_{n^{*}} = \max \{Z_1, Z_2, \ldots , Z_{n^{*}}\}\). Then

$$\begin{aligned} \lim _{n^{*} \rightarrow \infty } P(M_{n^{*}} \le z_{n^{*}}) = e^{-\tau } \end{aligned}$$

holds if and only if

$$\begin{aligned} \lim _{n^{*} \rightarrow \infty } n^{*}\left( 1 - F(z_{n^{*}})\right) = \tau . \end{aligned}$$
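A quick numerical illustration of Lemma 1 (a sketch, not part of the original proof): take F to be the Exp(1) distribution, which is our own illustrative choice, and choose \(z_{n^{*}}\) so that \(n^{*}(1 - F(z_{n^{*}})) = \tau\) holds exactly. Then \(P(M_{n^{*}} \le z_{n^{*}}) = (1 - \tau /n^{*})^{n^{*}}\), which should approach \(e^{-\tau }\). The helper name `max_cdf_exponential` is hypothetical.

```python
import math

def max_cdf_exponential(n_star: int, tau: float) -> float:
    """P(M <= z) for the max of n_star iid Exp(1) variables,
    with z chosen so that n_star * (1 - F(z)) = tau exactly (needs n_star > tau)."""
    z = math.log(n_star / tau)      # Exp(1): 1 - F(z) = exp(-z) = tau / n_star
    tail = math.exp(-z)             # equals tau / n_star
    # P(M <= z) = F(z) ** n_star, evaluated on the log scale for stability
    return math.exp(n_star * math.log1p(-tail))

tau = 0.7
for n_star in (10, 1_000, 100_000):
    print(n_star, max_cdf_exponential(n_star, tau), math.exp(-tau))
```

The gap to \(e^{-\tau }\) shrinks as \(n^{*}\) grows, in line with the "if" direction of the lemma.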

Proof of Theorem 1

In this section, we present the proof of Theorem 1.

Proof of Theorem 1

First, we bound the relative error incurred by replacing the true standard deviations with their estimates:

$$\begin{aligned} \begin{aligned}&\max \limits _{1 \le u \le K,1 \le v \le K} \vert \frac{\sqrt{B_{1,uv}(1-B_{1,uv})+B_{2,uv}(1-B_{2,uv})}}{\sqrt{\hat{B}_{1,uv}(1-\hat{B}_{1,uv}) +\hat{B}_{2,uv}(1-\hat{B}_{2,uv})}}-1\vert \\&\quad = \max \limits _{1 \le u \le K,1 \le v \le K} \Bigg \vert \frac{\sqrt{B_{1,uv}(1-B_{1,uv})+B_{2,uv}(1-B_{2,uv})}}{\sqrt{\hat{B}_{1,uv} (1-\hat{B}_{1,uv})+\hat{B}_{2,uv}(1-\hat{B}_{2,uv})}}\\&\quad -\frac{\sqrt{\hat{B}_{1,uv}(1-\hat{B}_{1,uv}) +\hat{B}_{2,uv}(1-\hat{B}_{2,uv})}}{\sqrt{\hat{B}_{1,uv}(1-\hat{B}_{1,uv}) +\hat{B}_{2,uv}(1-\hat{B}_{2,uv})}}\Bigg \vert \\&\quad = O_p( \frac{K}{\log n}). \end{aligned} \end{aligned}$$
(A1)

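Next, define the version of the statistic computed with the true labels g and the true matrices \(B_1\) and \(B_2\) (in each summand, i stands for the sampled node \(i_{m_s}\)):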

$$\begin{aligned} \begin{aligned}&F_{n,0} \triangleq \max \limits _{1 \le m \le M,1 \le k \le K} \vert \hat{\gamma }_{mk,0} \vert \\&\quad =\max \limits _{1 \le m \le M,1 \le k \le K} \vert \frac{\hat{\rho }_{m_1k,0}+\hat{\rho }_{m_2k,0}+\cdots +\hat{\rho }_{m_Sk,0}}{\sqrt{S}}\vert \\&\quad =\max \limits _{1 \le m \le M,1 \le k \le K} \Bigg \vert \frac{1}{\sqrt{S}}\sum _{s=1}^{S} \frac{1}{\sqrt{ \#( \ g^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}} \\&\quad \times \sum _{j\in g^{-1}(k)\backslash \left\{ i_{m_s}\right\} }\frac{A_{1,ij}-A_{2,ij}}{\sqrt{ B_{1, g_{i} g_{j}}(1- B_{1, g_{i} g_{j}})+ B_{2,{g}_{i} g_{j}}(1- B_{2, g_{i} g_{j}})}}\Bigg \vert . \end{aligned} \end{aligned}$$
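To make the definition of \(F_{n,0}\) concrete, here is a minimal pure-Python sketch on a toy two-block SBM. The helper names (`sample_sbm`, `rho_matrix`, `resampled_max`), the toy parameters, and the resampling scheme (each of the M resamples draws S distinct nodes uniformly at random) are illustrative assumptions, not the paper's implementation; the exact sampling scheme is fixed in the main text. The true labels and block matrices are plugged in, and \(A_1\), \(A_2\) are drawn under the null \(B_1 = B_2\).

```python
import math
import random

def sample_sbm(labels, B, rng):
    """Symmetric 0/1 adjacency matrix without self-loops for an SBM."""
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1
    return A

def rho_matrix(A1, A2, labels, B1, B2, K):
    """rho[i][k]: standardized sum of A1_ij - A2_ij over nodes j != i in block k.

    Assumes every block has at least one node other than i and 0 < B < 1.
    """
    n = len(labels)
    rho = [[0.0] * K for _ in range(n)]
    for i in range(n):
        u = labels[i]
        for k in range(K):
            members = [j for j in range(n) if labels[j] == k and j != i]
            sd = math.sqrt(B1[u][k] * (1 - B1[u][k]) + B2[u][k] * (1 - B2[u][k]))
            total = sum(A1[i][j] - A2[i][j] for j in members)
            rho[i][k] = total / (sd * math.sqrt(len(members)))
    return rho

def resampled_max(rho, M, S, rng):
    """max over (m, k) of |gamma_mk|, gamma_mk = (sum of rho[i][k] over S sampled i) / sqrt(S)."""
    n, K = len(rho), len(rho[0])
    best = 0.0
    for _ in range(M):
        nodes = rng.sample(range(n), S)  # assumed scheme: S distinct nodes per resample
        for k in range(K):
            gamma = sum(rho[i][k] for i in nodes) / math.sqrt(S)
            best = max(best, abs(gamma))
    return best

# Toy example under the null B1 = B2 (all values illustrative).
rng = random.Random(0)
n, K, M, S = 60, 2, 50, 10
labels = [i % K for i in range(n)]
B = [[0.3, 0.1], [0.1, 0.3]]
A1 = sample_sbm(labels, B, rng)
A2 = sample_sbm(labels, B, rng)
rho = rho_matrix(A1, A2, labels, B, B, K)
F = resampled_max(rho, M, S, rng)
stat = F ** 2 - 2 * math.log(M * K) + math.log(math.log(M * K))
print(F, stat)
```

The last line applies the centering \(F^2 - 2\log (MK) + \log \log (MK)\) that appears in Theorem 1.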

By (A1), and taking the estimated labels \(\hat{g}\) to coincide with the true labels g (as in the third equality below), we obtain

$$\begin{aligned} \begin{aligned} F_{n}&\triangleq \max \limits _{1 \le m \le M,1 \le k \le K} \vert \hat{\gamma }_{mk} \vert \\&=\max \limits _{1 \le m \le M,1 \le k \le K} \vert \frac{\hat{\rho }_{m_1k}+\hat{\rho }_{m_2k} +\cdots +\hat{\rho }_{m_Sk}}{\sqrt{S}}\vert \\&=\max \limits _{1 \le m \le M,1 \le k \le K} \Bigg \vert \frac{1}{\sqrt{S}}\sum _{s=1}^{S} \frac{1}{\sqrt{ \#( \ \hat{g}^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}} \\&\quad \times \sum _{j\in \hat{g}^{-1}(k)\backslash \left\{ i_{m_s}\right\} }\frac{A_{1,ij} -A_{2,ij}}{\sqrt{ \hat{B}_{1, \hat{g}_{i} \hat{g}_{j}}(1- \hat{B}_{1, \hat{g}_{i} \hat{g}_{j}}) + \hat{B}_{2,{\hat{g}}_{i} \hat{g}_{j}}(1- \hat{B}_{2, \hat{g}_{i} \hat{g}_{j}})}}\Bigg \vert \\&=\max \limits _{1 \le m \le M,1 \le k \le K} \Bigg \vert \frac{1}{\sqrt{S}}\sum _{s=1}^{S} \frac{\sum _{j\in g^{-1}(k)\backslash \left\{ i_{m_s}\right\} }\frac{A_{1,ij}-A_{2,ij}}{\sqrt{ B_{1, g_{i} g_{j}}(1- B_{1, g_{i} g_{j}})+ B_{2,{g}_{i} g_{j}}(1- B_{2, g_{i} g_{j}})}}}{\sqrt{ \#( \ g^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}} \\&\quad \quad \times \frac{\sqrt{B_{1,g_{i}g_{j}}(1-B_{1,g_{i}g_{j}})+B_{2,g_{i}g_{j}} (1-B_{2,g_{i}g_{j}})}}{\sqrt{\hat{B}_{1,g_{i}g_{j}}(1-\hat{B}_{1,g_{i}g_{j}})+\hat{B}_{2,g_{i}g_{j}} (1-\hat{B}_{2,g_{i}g_{j}})}}\Bigg \vert \\&\quad = F_{n,0}(1+O_p( \frac{K}{\log n})). \end{aligned} \end{aligned}$$

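Under Assumption 3, \(K = o(\sqrt{\log n})\) and \(MK=o(n)\), so \(\log (MK) \le \log n\) for large n. Hence, if \(F_{n,0} = O_p(\sqrt{\log (MK)})\), then \(F_n - F_{n,0} = F_{n,0}\, O_p(K/\log n) = O_p(K/\sqrt{\log n}) = o_p(1)\), that is,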

$$\begin{aligned} F_n = F_{n,0}+o_p(1). \end{aligned}$$

Thus, to prove Theorem 1, it suffices to show that

$$\begin{aligned} P\left( F^{2}_{n,0} - 2\log (MK) + \log \log (MK) \le x \right) \rightarrow \exp \left( -\frac{1}{\sqrt{\pi }}e^{-x/2}\right) . \end{aligned}$$
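In practice, the limit above yields an asymptotic critical value and p-value. Solving \(\exp (-e^{-q/2}/\sqrt{\pi }) = 1-\alpha\) gives \(q_\alpha = -\log \pi - 2\log \log \{1/(1-\alpha )\}\). The sketch below (helper names are hypothetical; it assumes \(MK > 1\) so that \(\log \log (MK)\) is defined) computes both quantities for an observed value of the maximum statistic.

```python
import math

def limiting_cdf(x: float) -> float:
    """Limit in Theorem 1: exp(-exp(-x/2) / sqrt(pi))."""
    return math.exp(-math.exp(-x / 2) / math.sqrt(math.pi))

def critical_value(alpha: float) -> float:
    """q such that limiting_cdf(q) = 1 - alpha."""
    return -math.log(math.pi) - 2 * math.log(math.log(1 / (1 - alpha)))

def p_value(F: float, M: int, K: int) -> float:
    """Asymptotic p-value for an observed maximum F, using the centering of Theorem 1."""
    t = F ** 2 - 2 * math.log(M * K) + math.log(math.log(M * K))
    return 1 - limiting_cdf(t)

q = critical_value(0.05)
print(q, limiting_cdf(q))
print(p_value(3.0, 50, 2))
```

The null is rejected at level \(\alpha\) when the centered statistic exceeds \(q_\alpha\), equivalently when the p-value falls below \(\alpha\).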

To derive the asymptotic null distribution of \(F_{n}\), define the \(\sigma\)-field \({\mathcal {G}} =\sigma \{ \hat{\rho }_{ik,0}: 1 \le i \le n; 1 \le k \le K \}\) generated by the auxiliary quantities \(\hat{\rho }_{ik,0}\) in \(F_{n,0}\), and let \(\tilde{\rho }_{ik}\) denote the observed value of \(\hat{\rho }_{ik,0}\). Conditional on \({\mathcal {G}}\), the \(\hat{\rho }_{ik,0}\)'s are independent and identically distributed with \(P(\hat{\rho }_{ik,0} = \tilde{\rho }_{ik}) = \frac{1}{nK}\) for any \(1 \le i \le n\) and \(1 \le k \le K\). Since \(\hat{\gamma }_{mk,0}\) is computed from the \(\hat{\rho }_{ik,0}\)'s in \(F_{n,0}\), we denote by \(\tilde{\gamma }_{mk}\) the observed value of \(\hat{\gamma }_{mk,0}\) computed from the \(\tilde{\rho }_{ik}\)'s. Let \(\bar{\rho } = \frac{1}{nK} \sum _{i=1}^n \sum _{k=1}^K \tilde{\rho }_{ik}\) and \(\bar{\gamma } = \frac{1}{MK} \sum _{m=1}^M\sum _{k=1}^K \tilde{\gamma }_{mk}\).

Consequently, \(E(\hat{\rho }_{ik,0} - \bar{\rho } \mid {\mathcal {G}}) = 0\) and \(E(\hat{\gamma }_{mk,0} - \bar{\gamma } \mid {\mathcal {G}}) = 0\). Applying Corollary 2.1 of Chernozhukov et al. (2013) conditionally on \({\mathcal {G}}\), as \(\min \{n, M, S\} \rightarrow \infty\), we obtain

$$\begin{aligned}{} & {} \sup _{x \in {\mathbb {R}}} \left| P\left( \max _{1 \le m \le M, 1 \le k \le K} \frac{1}{\sqrt{S}} \sum _{s=1}^{S}\hat{\rho }_{m_sk,0}- \sqrt{S}\bar{\rho } \le x\right) - P\left( \max _{1 \le l \le MK} U_l \le x\right) \right| \\{} & {} \quad \le CS^{-c} \rightarrow 0, \end{aligned}$$

where \(U = (U_1, \ldots , U_{MK})^\top \in {\mathbb {R}}^{MK}\) is a Gaussian random vector with mean 0 and covariance matrix \(\text {cov}(\hat{\gamma }_{mk,0})\), and c and C are some finite positive constants. Define \(\check{\gamma } = \text {vec}(\hat{\gamma }_{mk,0}) = (\check{\gamma }_1, \ldots , \check{\gamma }_{MK})^\top \in {\mathbb {R}}^{MK}\) for \(1 \le m \le M\) and \(1 \le k \le K\). As a result, the above inequality can be rewritten as

$$\begin{aligned} \sup _{x \in {\mathbb {R}}} \left| P\left( \max _{1 \le l \le MK} \check{\gamma }_l -\bar{\gamma } \le x\right) - P\left( \max _{1 \le l \le MK} U_l\le x\right) \right| \le CS^{-c} \rightarrow 0 \end{aligned}$$
(A2)

as \(\min \{n, M, S\} \rightarrow \infty\).

Next, we calculate \(\text {cov}(\check{\gamma })\). Since the diagonal elements of \(\text {cov}(\check{\gamma })\) equal 1 by construction, it suffices to compute \(\text {corr}(\check{\gamma })\). Denote by \(\hat{\rho }_{.k,0}\) the vector containing all elements \(\hat{\rho }_{ik,0}\) in block k. In addition, let \(\Lambda _{m_s} = (\lambda _{m_s,1}, \lambda _{m_s,2}, \ldots , \lambda _{m_s,n})^T\) be random vectors with independent \(\text {Bernoulli}(\frac{S}{n})\) entries, generated independently for \(1 \le m_s \le M\) and independent of \(\hat{\rho }_{.k,0}\). Thus, for \(i = 1, 2, \ldots , n\), \(\lambda _{m_s,i}\) follows the Bernoulli distribution with success probability \(\frac{S}{n}\), which implies \(E(\lambda _{m_s,i}) = E(\lambda ^2_{m_s,i}) = \frac{S}{n}\); moreover, for \(s \ne l\), the independence of \(\Lambda _{m_s}\) and \(\Lambda _{m_l}\) gives \(E(\lambda _{m_s,i}\lambda _{m_l,i}) = (\frac{S}{n})^2\). As a result, \(\hat{\gamma }_{m_sk,0} = \Lambda ^T_{m_s}\hat{\rho }_{.k,0}/ \sqrt{S}\). Then, for any \(1 \le m_s, m_l \le M\) with \(s \ne l\), it can be shown that

$$\begin{aligned} \delta _{\gamma } \triangleq \text {corr}(\hat{\gamma }_{m_sk,0}, \hat{\gamma }_{m_lk,0})&=\frac{\text {cov}(\hat{\gamma }_{m_sk,0},\hat{\gamma }_{m_lk,0})}{\sqrt{\text {var}(\hat{\gamma }_{m_sk,0})}\sqrt{\text {var}(\hat{\gamma }_{m_lk,0})}} =\frac{E(\Lambda ^T_{m_s}\hat{\rho }_{.k,0}\Lambda ^T_{m_l}\hat{\rho }_{.k,0})}{E(\Lambda ^T_{m_s} \hat{\rho }_{.k,0}\Lambda ^T_{m_s}\hat{\rho }_{.k,0})}\\&=\frac{\sum _{i=1}^{n}E(\lambda _{m_s,i}\lambda _{m_l,i}\hat{\rho }^{2}_{ik,0})}{\sum _{i=1}^{n}E(\lambda ^2_{m_s,i}\hat{\rho }^{2}_{ik,0})}=\frac{\sum _{i=1}^{n}E(\lambda _{m_s,i} \lambda _{m_l,i})E(\hat{\rho }^{2}_{ik,0})}{\sum _{i=1}^{n}E(\lambda ^2_{m_s,i})E(\hat{\rho }^{2}_{ik,0})}\\&=\frac{(S/n)^2\sum _{i=1}^{n}E(\hat{\rho }^{2}_{ik,0})}{(S/n)\sum _{i=1}^{n}E(\hat{\rho }^{2}_{ik,0})}=\frac{S}{n}. \end{aligned}$$
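The identity \(\delta _{\gamma } = S/n\) can be checked by Monte Carlo under simplifying assumptions of our own: the entries of \(\hat{\rho }_{.k,0}\) are replaced by independent N(0, 1) stand-ins, and the inclusion vectors \(\Lambda _{m_s}\), \(\Lambda _{m_l}\) have independent Bernoulli(S/n) entries as above. The factor \(1/\sqrt{S}\) is omitted because correlation is scale invariant. The function name is hypothetical.

```python
import random

def simulated_correlation(n, S, reps=20000, seed=0):
    """Monte Carlo corr(Lambda_s^T rho, Lambda_l^T rho) with iid Bernoulli(S/n) inclusions."""
    rng = random.Random(seed)
    p = S / n
    xs, ys = [], []
    for _ in range(reps):
        rho = [rng.gauss(0.0, 1.0) for _ in range(n)]   # stand-in for rho_{.k,0}
        lam_s = [rng.random() < p for _ in range(n)]    # Lambda_{m_s}
        lam_l = [rng.random() < p for _ in range(n)]    # Lambda_{m_l}, independent of Lambda_{m_s}
        xs.append(sum(r for r, keep in zip(rho, lam_s) if keep))
        ys.append(sum(r for r, keep in zip(rho, lam_l) if keep))
    mx, my = sum(xs) / reps, sum(ys) / reps
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / reps
    vx = sum((x - mx) ** 2 for x in xs) / reps
    vy = sum((y - my) ** 2 for y in ys) / reps
    return cov / (vx * vy) ** 0.5

print(simulated_correlation(50, 10), 10 / 50)
```

With n = 50 and S = 10 the estimate should lie close to S/n = 0.2, up to Monte Carlo error.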

Hence, the correlations between \(\check{\gamma }_{s}\) and \(\check{\gamma }_{l}\) are all equal for \(s \ne l\). Following Fan and Jiang (2019), we can represent \(\max _{1 \le l \le MK} U_l\) in distribution as follows:

$$\begin{aligned} \max _{1 \le l \le MK} U_l = \sqrt{\delta _{\gamma }}\tilde{U}_0 + \sqrt{1-\delta _{\gamma }} \max _{1 \le l \le MK} \tilde{U}_l, \end{aligned}$$

where \(\tilde{U}_0, \tilde{U}_1, \ldots , \tilde{U}_{MK}\) are independent and identically distributed standard normal variables. Now we can write (A2) as

$$\begin{aligned}&\sup _{x \in {\mathbb {R}}} \left| P\left( \max _{1 \le l \le MK} \check{\gamma }_l -\bar{\gamma } \le x\right) \right. \nonumber \\&\quad \left. -P\left( \sqrt{\delta _{\gamma }}\tilde{U}_0 + \sqrt{1-\delta _{\gamma }} \max _{1 \le l \le MK} \tilde{U}_l\le x\right) \right| \le CS^{-c} \rightarrow 0 \end{aligned}$$
(A3)
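The representation used in (A3) is exact for an equicorrelated Gaussian vector with common correlation \(\delta \ge 0\): building each coordinate from a shared \(\tilde{U}_0\) plus an independent \(\tilde{U}_l\) reproduces unit variances and pairwise correlation \(\delta\), and since \(\sqrt{1-\delta } > 0\), the maximum passes through to the independent part. A small sketch (helper names hypothetical) checks both facts numerically.

```python
import math
import random

def equicorrelated_draw(d, delta, rng):
    """One draw of U_l = sqrt(delta)*V0 + sqrt(1-delta)*V_l, l = 1..d, with V's iid N(0,1)."""
    v0 = rng.gauss(0.0, 1.0)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    u = [math.sqrt(delta) * v0 + math.sqrt(1.0 - delta) * vl for vl in v]
    return u, v0, v

def pairwise_corr(delta, reps=20000, seed=1):
    """Monte Carlo correlation between U_1 and U_2."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(reps):
        u, _, _ = equicorrelated_draw(2, delta, rng)
        xs.append(u[0])
        ys.append(u[1])
    mx, my = sum(xs) / reps, sum(ys) / reps
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / reps
    vx = sum((x - mx) ** 2 for x in xs) / reps
    vy = sum((y - my) ** 2 for y in ys) / reps
    return cov / math.sqrt(vx * vy)

delta = 0.1
rng = random.Random(0)
u, v0, v = equicorrelated_draw(1000, delta, rng)
lhs = max(u)
rhs = math.sqrt(delta) * v0 + math.sqrt(1.0 - delta) * max(v)
print(lhs, rhs, pairwise_corr(0.3))
```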

Note that under the null hypothesis, \(E(\bar{\gamma }) = 0\). We next derive \(\text {var}(\bar{\gamma })\).

$$\begin{aligned} \text {var}(\bar{\gamma })&= E(\bar{\gamma }^2) = E \left[ \left( \frac{1}{MK} \sum _{m=1}^{M} \sum _{k=1}^{K} \tilde{\gamma }_{mk}\right) ^2 \right] \\&= \frac{1}{M^2K^2} \sum _{m_1=1}^{M} \sum _{m_2=1}^{M} \sum _{k_1=1}^{K} \sum _{k_2=1}^{K} E\left( \tilde{\gamma }_{m_1k_1} \tilde{\gamma }_{m_2k_2}\right) \\&= \frac{1}{M^2K^2} \left[ \sum _{m_1 \ne m_2} \sum _{k=1}^{K} E\left( \tilde{\gamma }_{m_1k}\tilde{\gamma }_{m_2k}\right) + \sum _{m=1}^{M} \sum _{k=1}^{K} E\left( \tilde{\gamma }^2_{mk}\right) \right] \\&= \frac{1}{M^2K^2} \left[ M(M-1)K\delta _{\gamma } + MK\right] \\&= \frac{S}{n}\frac{M-1}{MK} + \frac{1}{MK} \le \frac{S}{n}\frac{M(M-1)}{M^2} + \frac{1}{MK}. \end{aligned}$$

Under Assumption 3, \(\text {var}(\bar{\gamma })\rightarrow 0\) as \(\min \{n, M, S\} \rightarrow \infty\).

By Chebyshev's inequality, this implies \(\bar{\gamma } = o_p(1)\). Moreover, under Assumption 3, \(\delta _{\gamma }=o(1)\).

As a result, we can rewrite (A3) as

$$\begin{aligned} \sup _{x \in {\mathbb {R}}} \left| P(\max _{1 \le l \le MK} \check{\gamma }_l\le x)- P( \max _{1 \le l \le MK} \tilde{U}_l\le x)\right| \rightarrow 0, \end{aligned}$$
(A4)

as \(\min \{n, M, S\} \rightarrow \infty\).

For any \(x \in {\mathbb {R}}\), let \(u = \{2 \log (MK) - \log \log (MK) + x\}^{1/2}\), which is well defined once MK is large enough. Using the standard normal tail approximation, we then have

$$\begin{aligned} P \left( |N(0, 1) |\ge u \right)&\sim \frac{2}{\sqrt{2 \pi }\,u}\exp \left( -\frac{u^2}{2}\right) \\&\sim \frac{1}{\sqrt{\pi }}\frac{1}{\sqrt{\log MK}} \exp \left( -\frac{1}{2}\{2\log MK-\log \log MK+x\} \right) \\&\sim \frac{1}{\sqrt{\pi }}\frac{\exp \left( -\frac{x}{2} \right) }{MK}. \end{aligned}$$

Subsequently, applying Lemma 1 with \(n^{*} = MK\), \(Z_l = |\tilde{U}_l|\) and \(\tau = \frac{1}{\sqrt{\pi }}e^{-x/2}\), as \(\min \{n, M, S\} \rightarrow \infty\), we have

$$\begin{aligned} P \left( \max _{1 \le l \le MK} \tilde{U}_l^2 - 2 \log (MK) + \log \log (MK)\le x \right) \rightarrow \exp \left( -\frac{1}{\sqrt{\pi }}\exp \left( -\frac{x}{2} \right) \right) , \end{aligned}$$

for any \(x \in {\mathbb {R}}\). Combining with (A4), we obtain

$$\begin{aligned} P \left( \max _{1 \le l \le MK} \check{\gamma }_l^2 - 2 \log (MK) + \log \log (MK)\le x \right) \rightarrow \exp \left( -\frac{1}{\sqrt{\pi }}\exp \left( -\frac{x}{2} \right) \right) . \end{aligned}$$
(A5)
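For iid standard normals, the finite-sample probability in this extreme-value limit can be computed exactly, since \(P(\max _{l \le N} Z_l^2 \le u^2) = \{1 - P(|Z| > u)\}^N\) with \(P(|Z| > u) = \text {erfc}(u/\sqrt{2})\). The sketch below (helper names hypothetical; N plays the role of MK) compares this exact value with the limit \(\exp (-e^{-x/2}/\sqrt{\pi })\), which gives a feel for how fast the approximation kicks in.

```python
import math

def finite_max_cdf(N: int, x: float) -> float:
    """P(max_{l<=N} Z_l^2 - 2 log N + log log N <= x) for iid standard normal Z_l."""
    u2 = 2 * math.log(N) - math.log(math.log(N)) + x
    if u2 <= 0:
        return 0.0                      # max Z_l^2 <= u2 <= 0 has probability zero
    tail = math.erfc(math.sqrt(u2) / math.sqrt(2))   # P(|Z| > u)
    return math.exp(N * math.log1p(-tail))           # (1 - tail) ** N, computed stably

def limit_cdf(x: float) -> float:
    return math.exp(-math.exp(-x / 2) / math.sqrt(math.pi))

for x in (-2.0, 0.0, 2.0, 4.0):
    print(x, finite_max_cdf(10**6, x), limit_cdf(x))
```

Convergence is slow (it is governed by logarithmic terms), so even at \(N = 10^6\) a small gap remains, largest in the left tail.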

Accordingly,

$$\begin{aligned} P \left( F^2_{n,0} - 2 \log (MK) + \log \log (MK)\le x \right) \rightarrow \exp \left( -\frac{1}{\sqrt{\pi }}\exp \left( -\frac{x}{2} \right) \right) . \end{aligned}$$

Since

$$\begin{aligned} F_n = F_{n,0}+o_p(1), \end{aligned}$$

we have

$$\begin{aligned} P \left( F^2_{n} - 2 \log (MK) + \log \log (MK)\le x \right) \rightarrow \exp \left( -\frac{1}{\sqrt{\pi }}\exp \left( -\frac{x}{2} \right) \right) . \end{aligned}$$

This completes the proof of Theorem 1.

\(\square\)

Proof of Theorem 2

In this section, we present the proof of Theorem 2.

Proof of Theorem 2

Note that

$$\begin{aligned} \begin{aligned} \hat{\gamma }_{mk}&= \frac{\hat{\rho }_{m_1k}+\hat{\rho }_{m_2k}+\cdots +\hat{\rho }_{m_Sk}}{\sqrt{S}}\\&= \frac{1}{\sqrt{S}}\sum _{s=1}^{S} \frac{1}{\sqrt{ \#( \ \hat{g}^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}} \\&\quad \times \sum _{j\in \hat{g}^{-1}(k)\backslash \left\{ i_{m_s}\right\} }\frac{A_{1,ij}-A_{2,ij}}{\sqrt{ \hat{B}_{1, \hat{g}_{i} \hat{g}_{j}}(1- \hat{B}_{1, \hat{g}_{i} \hat{g}_{j}})+ \hat{B}_{2,\hat{g}_{i} \hat{g}_{j}}(1- \hat{B}_{2, \hat{g}_{i} \hat{g}_{j}})}} \\&= \frac{1}{\sqrt{S}}\sum _{s=1}^{S} \frac{1}{\sqrt{ \#( \ g^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}} \\&\quad \times \sum _{j\in g^{-1}(k)\backslash \left\{ i_{m_s}\right\} }\left( \frac{A_{1,ij}-B_{1,g_{i}g_{j}}-(A_{2,ij}-B_{2,g_{i}g_{j}})}{\sqrt{B_{1,g_{i}g_{j}} (1-B_{1,g_{i}g_{j}})+B_{2,g_{i}g_{j}}(1-B_{2,g_{i}g_{j}})}}\right. \\&\quad +\left. \frac{B_{1,g_{i}g_{j}}-B_{2,g_{i}g_{j}}}{\sqrt{B_{1,g_{i}g_{j}}(1-B_{1,g_{i}g_{j}}) +B_{2,g_{i}g_{j}}(1-B_{2,g_{i}g_{j}})}}\right) \\&\quad \quad \quad \times \frac{\sqrt{B_{1,g_{i}g_{j}}(1-B_{1,g_{i}g_{j}})+B_{2,g_{i}g_{j}} (1-B_{2,g_{i}g_{j}})}}{\sqrt{\hat{B}_{1,g_{i}g_{j}}(1-\hat{B}_{1,g_{i}g_{j}}) +\hat{B}_{2,g_{i}g_{j}}(1-\hat{B}_{2,g_{i}g_{j}})}}. \end{aligned} \end{aligned}$$

Arguing as in the proof of Theorem 1, we obtain

$$\begin{aligned} \begin{aligned} F_{n}&=\max \limits _{1 \le m \le M,1 \le k \le K} \vert \hat{\gamma }_{mk} \vert \\&=\max \limits _{1 \le m \le M,1 \le k \le K} \vert \frac{1}{\sqrt{S}} \sum _{s=1}^{S} \frac{1}{\sqrt{ \#( \ g^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}}\\&\times \sum _{j\in g^{-1}(k)\backslash \left\{ i_{m_s}\right\} } \left( \frac{A_{1,ij}-B_{1,g_{i}g_{j}}-(A_{2,ij}-B_{2,g_{i}g_{j}})}{\sqrt{B_{1,g_{i}g_{j}}(1-B_{1,g_{i}g_{j}})+B_{2,g_{i}g_{j}}(1-B_{2,g_{i}g_{j}})}}\right. \\&\quad \quad +\left. \frac{B_{1,g_{i}g_{j}}-B_{2,g_{i}g_{j}}}{\sqrt{B_{1,g_{i}g_{j}} (1-B_{1,g_{i}g_{j}})+B_{2,g_{i}g_{j}}(1-B_{2,g_{i}g_{j}})}}\right) \vert (1+ o_p(1))\\&\quad \ge (l_1-l_2)(1+o_p(1)), \end{aligned} \end{aligned}$$

where

$$\begin{aligned} l_1= & {} \max \limits _{1 \le m \le M,1 \le k \le K} \vert \frac{1}{\sqrt{S}}\sum _{s=1}^{S} \frac{1}{\sqrt{ \#( \ g^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}} \\{} & {} \times \sum _{j\in g^{-1}(k)\backslash \left\{ i_{m_s}\right\} }\frac{B_{1,g_{i}g_{j}}-B_{2,g_{i}g_{j}}}{\sqrt{B_{1,g_{i}g_{j}} (1-B_{1,g_{i}g_{j}})+B_{2,g_{i}g_{j}}(1-B_{2,g_{i}g_{j}})}} \vert ,\\ l_2= & {} \max \limits _{1 \le m \le M,1 \le k \le K} \vert \frac{1}{\sqrt{S}} \sum _{s=1}^{S} \frac{1}{\sqrt{ \#( \ g^{-1}(k)\backslash \left\{ i_{m_s}\right\} )}}\\{} & {} \times \sum _{j\in g^{-1}(k)\backslash \left\{ i_{m_s}\right\} }\frac{A_{1,ij}-B_{1,g_{i}g_{j}} -(A_{2,ij}-B_{2,g_{i}g_{j}})}{\sqrt{B_{1,g_{i}g_{j}}(1-B_{1,g_{i}g_{j}})+B_{2,g_{i}g_{j}}(1-B_{2,g_{i}g_{j}})}} \vert . \end{aligned}$$

By (A5), we have

$$\begin{aligned} l_2=O_p(\sqrt{\log (MK)}). \end{aligned}$$

Moreover, by Assumptions 1, 2 and 3 together with the condition \(\max \limits _{1 \le i \le n,1 \le j \le n}\vert B_{1,g_{i}g_{j}}-B_{2,g_{i}g_{j}} \vert = \Omega (\frac{\log n}{n}\sqrt{\frac{K}{S}})\), we have

$$\begin{aligned} \frac{l_1}{\sqrt{\log MK}}\ge & {} \sqrt{\frac{Sn}{K}}\sqrt{\frac{n}{\log n}}\Omega \left( \frac{\log n}{n}\sqrt{\frac{K}{S}}\right) /\sqrt{\log MK}\\= & {} \frac{\sqrt{\log n}}{\sqrt{\log MK}}\rightarrow \infty , \quad \text {as } n\rightarrow \infty . \end{aligned}$$
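For completeness, the rate in the display above simplifies factor by factor as follows, with the \(\Omega (\cdot )\) term replaced by its rate \(\frac{\log n}{n}\sqrt{\frac{K}{S}}\):

$$\begin{aligned} \sqrt{\frac{Sn}{K}}\sqrt{\frac{n}{\log n}}\cdot \frac{\log n}{n}\sqrt{\frac{K}{S}}&=\sqrt{\frac{Sn}{K}\cdot \frac{K}{S}}\cdot \sqrt{\frac{n}{\log n}}\cdot \frac{\log n}{n}\\&=\sqrt{n}\cdot \frac{\sqrt{n}}{\sqrt{\log n}}\cdot \frac{\log n}{n}=\sqrt{\log n}. \end{aligned}$$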

Thus, we finally obtain that

$$\begin{aligned} P(T\ge c\log (MK))\rightarrow 1, \end{aligned}$$

for any positive constant c.

This completes the proof of Theorem 2. \(\square\)


Cite this article

Wu, Q., Hu, J. Two-sample test of stochastic block models via the maximum sampling entry-wise deviation. J. Korean Stat. Soc. (2024). https://doi.org/10.1007/s42952-024-00260-9
