Improvements on SCORE, Especially for Weak Signals

Abstract

A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves perfect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advances on entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in the weak signal settings, but it does not rule out the possibility that SCORE may be further improved to have better performance in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and find that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ performs similarly to SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ makes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.

Notes

  1. We model \(\mathbb {E}[A]\) by Ω − diag(Ω) instead of Ω because the diagonal entries of \(\mathbb {E}[A]\) are all 0. Here, “main signal”, “secondary signal”, and “noise” refer to Ω, −diag(Ω), and W, respectively.

  2. For SBM, the diagonal entries of P can be unequal. DCBM has more free parameters, so we have to assume that P has unit diagonal entries to maintain identifiability.

  3. A multi-\(\log (n)\) term is a term \(L_n > 0\) that satisfies \(L_n n^{-\delta }\to 0\) and \(L_n n^{\delta }\to \infty \) for any fixed constant \(\delta > 0\).

  4. For example, \(\frac {\hat {\xi }_{2}}{\hat {\xi }_{1}}\) is the n-dimensional vector \((\frac {\hat {\xi }_{2}(1)}{\hat {\xi }_{1}(1)}, \frac {\hat {\xi }_{2}(2)}{\hat {\xi }_{1}(2)}, \ldots , \frac {\hat {\xi }_{2}(n)}{\hat {\xi }_{1}(n)})^{\prime }\). Note that we may choose to threshold all entries of the n × (K − 1) matrix at \(\pm \log (n)\) from top and bottom (Jin, 2015), but this is not always necessary. For all data sets in this paper, whether or not we threshold makes only a negligible difference.

  5. When translating the bound in Gao et al. (2018), we notice that the \(\theta _{i}\) there have been normalized, so that their \(\theta _{i}\) corresponds to our \((\theta _{i}/\bar {\theta })\).

  6. This is analogous to Student’s t-test, where for n samples from an unknown distribution, the t-test uses a normalization for the mean and a normalization for the variance.
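The entry-wise ratios described in Note 4 can be sketched in a few lines of Python. This is only an illustration of the ratio step under stated assumptions, not the authors' implementation and not SCORE+: the function name `score_ratio_matrix` and the toy graph are hypothetical, the "leading" eigenvectors are taken to be those whose eigenvalues are largest in magnitude, and the network is assumed connected so that the first eigenvector has no zero entries.

```python
import numpy as np

def score_ratio_matrix(A, K, threshold=True):
    """Return the n x (K-1) matrix of entry-wise eigenvector ratios.

    Column k holds xi_{k+2}(i) / xi_1(i), where xi_1, ..., xi_K are taken
    (as an assumption of this sketch) to be the eigenvectors of the symmetric
    adjacency matrix A with the K largest eigenvalues in magnitude.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(A)        # A is assumed symmetric
    order = np.argsort(-np.abs(eigvals))[:K]    # indices of the K leading eigenvalues
    xi = eigvecs[:, order]
    # Divide each of the 2nd..K-th eigenvectors entry-wise by the first one.
    # Assumes a connected network, so the first eigenvector has no zero entry.
    R = xi[:, 1:] / xi[:, [0]]
    if threshold:
        # Optional truncation of every entry at +/- log(n), as in Note 4.
        t = np.log(n)
        R = np.clip(R, -t, t)
    return R

# Hypothetical toy network: two 4-node cliques joined by one edge (nodes 3 and 4).
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

R = score_ratio_matrix(A, K=2)
# With K = 2 the ratio matrix has one column; splitting it by sign is a
# simple stand-in for the clustering step applied to the rows of R.
labels = (R[:, 0] > 0).astype(int)
```

In this toy example the two cliques receive different labels. Because the ratio divides out the first eigenvector entry by entry, a sign flip of either eigenvector only swaps the two label values and does not change the partition.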

References

  1. Abbe, E., Fan, J., Wang, K. and Zhong, Y. (2019). Entrywise eigenvector analysis of random matrices with low expected rank. Ann. Statist. (to appear).

  2. Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on link discovery, pp. 36–43.

  3. Airoldi, E., Blei, D., Fienberg, S. and Xing, E. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014.

  4. Bickel, P. J. and Chen, A. (2009).

  5. Chaudhuri, K., Chung, F. and Tsiatas, A. (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Proceedings of the 25th annual conference on learning theory, JMLR workshop and conference proceedings, vol. 23, pp. 1–35.

  6. Chen, Y., Li, X. and Xu, J. (2018). Convexified modularity maximization for degree-corrected stochastic block models. Ann. Statist. 46, 1573–1602.

  7. Duan, Y., Ke, Z. T. and Wang, M. (2018). State aggregation learning from Markov transition data. In NIPS workshop on probabilistic reinforcement learning and structured control.

  8. Fan, J., Fan, Y., Han, X. and Lv, J. (2019). SIMPLE: statistical inference on membership profiles in large networks. arXiv:1910.01734.

  9. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188.

  10. Gao, C., Ma, Z., Zhang, A.Y. and Zhou, H.H. (2018). Community detection in degree-corrected block models. Ann. Statist. 46, 2153–2185.

  11. Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99, 12, 7821–7826.

  12. Hastie, T., Tibshirani, R. and Friedman, J. (2009). The elements of statistical learning, 2nd edn. Springer, Berlin.

  13. Ji, P. and Jin, J. (2016). Coauthorship and citation networks for statisticians (with discussion). Ann. Appl. Statist. 10, 4, 1779–1812.

  14. Jin, J. (2015). Fast community detection by SCORE. Ann. Statist. 43, 57–89.

  15. Jin, J. and Ke, Z. T. (2018). Optimal membership estimation, especially for networks with severe degree heterogeneity. Manuscript.

  16. Jin, J., Ke, Z. T. and Luo, S. (2017). Estimating network memberships by simplex vertex hunting. arXiv:1708.07852.

  17. Jin, J., Ke, Z. T. and Luo, S. (2019). Optimal adaptivity of signed-polygon statistics for network testing. arXiv:1904.09532.

  18. Jin, J., Ke, Z. T., Luo, S. and Wang, M. (2020). Optimal approach to estimating K in social networks. Manuscript.

  19. Karrer, B. and Newman, M. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107.

  20. Ke, Z. T. and Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv:1704.07016.

  21. Ke, Z. T., Shi, F. and Xia, D. (2020). Community detection for hypergraph networks via regularized tensor power iteration. arXiv:1909.06503.

  22. Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E. and Dawson, S. M. (2003). The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54, 4, 396–405.

  23. Liu, Y., Hou, Z., Yao, Z., Bai, Z., Hu, J. and Zheng, S. (2019). Community detection based on the \(\ell _{\infty }\) convergence of eigenvectors in DCBM. arXiv:1906.06713.

  24. Ma, Z., Ma, Z. and Yuan, H. (2020). Universal latent space model fitting for large networks with edge covariates. J. Mach. Learn. Res. 21, 1–67.

  25. Mao, X., Sarkar, P. and Chakrabarti, D. (2020). Estimating mixed memberships with sharp eigenvector deviations. J. Amer. Statist. Assoc. (to appear), 147.

  26. Mihail, M. and Papadimitriou, C. H. (2002). On the eigenvalue power law. In International workshop on randomization and approximation techniques in computer science, pp. 254–262. Springer, Berlin.

  27. Nepusz, T., Petróczi, A., Négyessy, L. and Bazsó, F. (2008). Fuzzy communities and the concept of bridgeness in complex networks. Phys. Rev. E 77, 1, 016107.

  28. Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 3120–3128.

  29. Su, L., Wang, W. and Zhang, Y. (2019). Strong consistency of spectral clustering for stochastic block models. IEEE Trans. Inform. Theory 66, 324–338.

  30. Traud, A. L., Kelsic, E. D., Mucha, P. J. and Porter, M. A. (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53, 526–543.

  31. Traud, A. L., Mucha, P. J. and Porter, M. A. (2012). Social structure of facebook networks. Physica A 391, 4165–4180.

  32. Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 4, 452–473.

  33. Zhang, Y., Levina, E. and Zhu, J. (2020). Detecting overlapping communities in networks using spectral methods. SIAM J. Math. Data Sci. 2, 265–283.

  34. Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40, 2266–2292.

Author information

Corresponding author

Correspondence to Jiashun Jin.

About this article

Cite this article

Jin, J., Ke, Z.T. & Luo, S. Improvements on SCORE, Especially for Weak Signals. Sankhya A (2021). https://doi.org/10.1007/s13171-020-00240-1

AMS (2000) subject classification

  • Primary: 62H30, 91C20
  • Secondary: 62P25