Abstract
A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves perfect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advances in entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in weak-signal settings, but it does not rule out the possibility that SCORE may be further improved to perform better in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and find that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ performs similarly to SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ makes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.
Notes
We model \(\mathbb {E}[A]\) by Ω − diag(Ω) instead of Ω because the diagonals of \(\mathbb {E}[A]\) are all 0. Here, “main signal”, “secondary signal”, and “noise” refer to Ω, −diag(Ω), and W, respectively.
For SBM, the diagonal entries of P can be unequal. DCBM has more free parameters, so we have to assume that P has unit diagonal entries to maintain identifiability.
A multi-\(\log (n)\) term is a term \(L_n > 0\) that satisfies \(L_n n^{-\delta }\to 0\) and \(L_n n^{\delta }\to \infty \) for any fixed constant \(\delta > 0\).
For example, \(\frac {\hat {\xi }_{2}}{\hat {\xi }_{1}}\) is the n-dimensional vector \((\frac {\hat {\xi }_{2}(1)}{\hat {\xi }_{1}(1)}, \frac {\hat {\xi }_{2}(2)}{\hat {\xi }_{1}(2)}, \ldots , \frac {\hat {\xi }_{2}(n)}{\hat {\xi }_{1}(n)})^{\prime }\). Note that we may choose to threshold all entries of the n × (K − 1) matrix at \(\pm \log (n)\) from top and bottom (Jin, 2015), but this is not always necessary. For all data sets in this paper, thresholding makes only a negligible difference.
When translating the bound in Gao et al. (2018), we note that the \(\theta _{i}\) there have been normalized, so that their \(\theta _{i}\) corresponds to our \((\theta _{i}/\bar {\theta })\).
This is analogous to Student’s t-test, where for n samples from an unknown distribution, the t-test uses a normalization for the mean and a normalization for the variance.
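The entry-wise ratio construction in the note on \(\hat {\xi }_{2}/\hat {\xi }_{1}\), with optional thresholding at \(\pm \log (n)\), can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `score_ratio_matrix` and the `threshold` flag are hypothetical, the eigenvectors are ordered by eigenvalue magnitude (an assumption about how "leading" is meant), and the input is assumed to be a symmetric matrix whose first eigenvector has no zero entries (for example, a connected network).

```python
import numpy as np

def score_ratio_matrix(A, K, threshold=True):
    """Return the n x (K-1) matrix of entry-wise eigenvector ratios.

    Hypothetical helper: column k holds xi_{k+2}(i) / xi_1(i), where
    xi_1, ..., xi_K are the K leading eigenvectors of the symmetric matrix A.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(A)
    # Assumption: "leading" means largest eigenvalue in magnitude.
    order = np.argsort(-np.abs(eigvals))[:K]
    xi = eigvecs[:, order]
    # Divide the 2nd..K-th eigenvectors by the first, entry by entry.
    # Assumes no entry of the first eigenvector is zero.
    R = xi[:, 1:] / xi[:, [0]]
    if threshold:
        # Optional truncation at +/- log(n), as described in the note.
        T = np.log(n)
        R = np.clip(R, -T, T)
    return R

# Illustration on a noiseless rank-2 matrix with heterogeneous degrees
# (all numbers below are made up for the example).
theta = np.array([1.0, 2.0, 0.5, 1.5, 0.8, 3.0])  # degree parameters
labels = np.array([0, 0, 0, 1, 1, 1])             # community labels
P = np.array([[1.0, 0.3], [0.3, 1.0]])            # unit diagonal
Omega = np.outer(theta, theta) * P[np.ix_(labels, labels)]
print(score_ratio_matrix(Omega, K=2, threshold=False).ravel())
```

In this noiseless example the ratios are constant within each community even though the degree parameters vary widely, which is why the ratios, rather than the raw eigenvectors, would be passed to a clustering step such as k-means.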
References
Abbe, E., Fan, J., Wang, K. and Zhong, Y. (2019). Entrywise eigenvector analysis of random matrices with low expected rank. Ann. Statist. (to appear).
Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on link discovery, pp. 36–43.
Airoldi, E., Blei, D., Fienberg, S. and Xing, E. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014.
Bickel, P. J. and Chen, A (2009).
Chaudhuri, K., Chung, F. and Tsiatas, A. (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Proceedings of the 25th annual conference on learning theory, JMLR workshop and conference proceedings, vol. 23, pp. 1–35.
Chen, Y., Li, X. and Xu, J. (2018). Convexified modularity maximization for degree-corrected stochastic block models. Ann. Statist. 46, 1573–1602.
Duan, Y., Ke, Z. T. and Wang, M. (2018). State aggregation learning from Markov transition data. In NIPS workshop on probabilistic reinforcement learning and structured control.
Fan, J., Fan, Y., Han, X. and Lv, J. (2019). SIMPLE: statistical inference on membership profiles in large networks. arXiv:1910.01734.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188.
Gao, C., Ma, Z., Zhang, A.Y. and Zhou, H.H. (2018). Community detection in degree-corrected block models. Ann. Statist. 46, 2153–2185.
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99, 12, 7821–7826.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The elements of statistical learning, 2nd edn. Springer, Berlin.
Ji, P. and Jin, J. (2016). Coauthorship and citation networks for statisticians (with discussion). Ann. Appl. Statist. 10, 4, 1779–1812.
Jin, J. (2015). Fast community detection by SCORE. Ann. Statist. 43, 57–89.
Jin, J. and Ke, Z. T. (2018). Optimal membership estimation, especially for networks with severe degree heterogeneity. Manuscript.
Jin, J., Ke, Z. T. and Luo, S. (2017). Estimating network memberships by simplex vertex hunting. arXiv:1708.07852.
Jin, J., Ke, Z. T. and Luo, S. (2019). Optimal adaptivity of signed-polygon statistics for network testing. arXiv:1904.09532.
Jin, J., Ke, Z. T., Luo, S. and Wang, M. (2020). Optimal approach to estimating K in social networks. Manuscript.
Karrer, B. and Newman, M. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107.
Ke, Z. T. and Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv:1704.07016.
Ke, Z. T., Shi, F. and Xia, D. (2020). Community detection for hypergraph networks via regularized tensor power iteration. arXiv:1909.06503.
Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E. and Dawson, S. M. (2003). The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54, 4, 396–405.
Liu, Y., Hou, Z., Yao, Z., Bai, Z., Hu, J. and Zheng, S. (2019). Community detection based on the \(\ell _{\infty }\) convergence of eigenvectors in DCBM. arXiv:1906.06713.
Ma, Z., Ma, Z. and Yuan, H. (2020). Universal latent space model fitting for large networks with edge covariates. J. Mach. Learn. Res. 21, 1–67.
Mao, X., Sarkar, P. and Chakrabarti, D. (2020). Estimating mixed memberships with sharp eigenvector deviations. J. Amer. Statist. Assoc. (to appear), 147.
Mihail, M. and Papadimitriou, C. H. (2002). On the eigenvalue power law. In International workshop on randomization and approximation techniques in computer science, pp. 254–262. Springer, Berlin.
Nepusz, T., Petróczi, A., Négyessy, L. and Bazsó, F. (2008). Fuzzy communities and the concept of bridgeness in complex networks. Phys. Rev. E 77, 1, 016107.
Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 3120–3128.
Su, L., Wang, W. and Zhang, Y. (2019). Strong consistency of spectral clustering for stochastic block models. IEEE Trans. Inform. Theory 66, 324–338.
Traud, A. L., Kelsic, E. D., Mucha, P. J. and Porter, M. A. (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53, 526–543.
Traud, A. L., Mucha, P. J. and Porter, M. A. (2012). Social structure of Facebook networks. Physica A 391, 4165–4180.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 4, 452–473.
Zhang, Y., Levina, E. and Zhu, J. (2020). Detecting overlapping communities in networks using spectral methods. SIAM J. Math. Anal. 2, 265–283.
Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40, 2266–2292.
Cite this article
Jin, J., Ke, Z.T. & Luo, S. Improvements on SCORE, Especially for Weak Signals. Sankhya A 84, 127–162 (2022). https://doi.org/10.1007/s13171-020-00240-1
DOI: https://doi.org/10.1007/s13171-020-00240-1