Improvements on SCORE, Especially for Weak Signals

Jin, Jiashun; Ke, Zheng Tracy; Luo, Shengming

doi:10.1007/s13171-020-00240-1

Published: 02 March 2021

Sankhya A, Volume 84, pages 127–162 (2022)

Jiashun Jin¹,
Zheng Tracy Ke² &
Shengming Luo¹

Abstract

A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves perfect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advancement on entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in the weak signal settings, but it does not rule out the possibility that SCORE may be further improved to have better performance in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and find that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ performs similarly to SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ makes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.

Notes

We model \(\mathbb {E}[A]\) by Ω − diag(Ω) instead of Ω because the diagonals of \(\mathbb {E}[A]\) are all 0. Here, “main signal”, “secondary signal”, and “noise” refer to Ω, −diag(Ω), and W, respectively.
For SBM, the diagonal entries of P can be unequal. DCBM has more free parameters, so we have to assume that P has unit diagonal entries to maintain identifiability.
A multi-\(\log (n)\) term is a term \(L_n > 0\) that satisfies \(L_n n^{-\delta }\to 0\) and \(L_n n^{\delta }\to \infty \) for any fixed constant δ > 0.
For example, \(\frac {\hat {\xi }_{2}}{\hat {\xi }_{1}}\) is the n-dimensional vector \((\frac {\hat {\xi }_{2}(1)}{\hat {\xi }_{1}(1)}, \frac {\hat {\xi }_{2}(2)}{\hat {\xi }_{1}(2)}, \ldots , \frac {\hat {\xi }_{2}(n)}{\hat {\xi }_{1}(n)})^{\prime }\). Note that we may choose to threshold all entries of the n × (K − 1) matrix by \(\pm \log (n)\) from top and bottom (Jin, 2015), but this is not always necessary. For all data sets in this paper, thresholding or not makes only a negligible difference.
When translating the bound in Gao et al. (2018), we note that the 𝜃_i there have been normalized, so that their 𝜃_i corresponds to our \((\theta _{i}/\bar {\theta })\).
This is analogous to Student’s t-test, where for n samples from an unknown distribution, the t-test uses a normalization for the mean and a normalization for the variance.
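
The entrywise eigenvector ratios described in the notes above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `score_ratio_matrix`, the toy two-community DCBM, and all parameter values (`n`, `theta`, `P`, the random seed) are hypothetical choices made for this example. For simplicity the noiseless check uses Ω itself rather than Ω − diag(Ω); in that case the ratios are constant within each community regardless of the degree parameters.

```python
import numpy as np

def score_ratio_matrix(A, K, threshold=True):
    """Return the n x (K-1) matrix of entrywise ratios xi_k / xi_1, k = 2..K.

    Hypothetical helper for illustration; assumes A is symmetric and that
    the leading eigenvector has no zero entries (e.g. a connected network).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    vals, vecs = np.linalg.eigh(A)             # eigenvalues in ascending order
    order = np.argsort(-np.abs(vals))[:K]      # K largest eigenvalues in magnitude
    xi = vecs[:, order]
    R = xi[:, 1:] / xi[:, [0]]                 # divide each column by xi_1 entrywise
    if threshold:
        t = np.log(n)
        R = np.clip(R, -t, t)                  # threshold at +/- log(n)
    return R

# Toy DCBM with K = 2 communities (all values below are assumptions).
rng = np.random.default_rng(0)
n, K = 200, 2
labels = np.repeat([0, 1], n // 2)
theta = rng.uniform(0.2, 0.9, size=n)          # degree heterogeneity parameters
P = np.array([[1.0, 0.3], [0.3, 1.0]])         # unit diagonal, as in the DCBM note
omega = np.outer(theta, theta) * P[labels][:, labels]

# Symmetric 0/1 adjacency matrix with zero diagonal, drawn from omega.
upper = np.triu(rng.random((n, n)) < omega, k=1)
A = (upper + upper.T).astype(float)

R_clean = score_ratio_matrix(omega, K)         # noiseless: constant per community
R_noisy = score_ratio_matrix(A, K)             # noisy: scattered around those values
```

In this sketch the rows of `R_noisy` would then be passed to a clustering step such as k-means; that step is omitted here.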

References

Abbe, E., Fan, J., Wang, K. and Zhong, Y. (2019). Entrywise eigenvector analysis of random matrices with low expected rank. Ann. Statist. (to appear).
Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery, pp. 36–43.
Airoldi, E., Blei, D., Fienberg, S. and Xing, E. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014.
Bickel, P. J. and Chen, A (2009).
Chaudhuri, K., Chung, F. and Tsiatas, A. (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Proceedings of the 25th annual conference on learning theory, JMLR workshop and conference proceedings, vol. 23, pp. 1–35.
Chen, Y., Li, X. and Xu, J. (2018). Convexified modularity maximization for degree-corrected stochastic block models. Ann. Statist. 46, 1573–1602.
Duan, Y., Ke, Z. T. and Wang, M. (2018). State aggregation learning from Markov transition data. In NIPS workshop on probabilistic reinforcement learning and structured control.
Fan, J., Fan, Y., Han, X. and Lv, J. (2019). SIMPLE: statistical inference on membership profiles in large networks. arXiv:1910.01734.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188.
Gao, C., Ma, Z., Zhang, A.Y. and Zhou, H.H. (2018). Community detection in degree-corrected block models. Ann. Statist. 46, 2153–2185.
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99, 12, 7821–7826.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The elements of statistical learning, 2nd edn. Springer, Berlin.
Ji, P. and Jin, J. (2016). Coauthorship and citation networks for statisticians (with discussion). Ann. Appl. Statist. 10, 4, 1779–1812.
Jin, J. (2015). Fast community detection by SCORE. Ann. Statist. 43, 57–89.
Jin, J. and Ke, Z. T. (2018). Optimal membership estimation, especially for networks with severe degree heterogeneity. Manuscript.
Jin, J., Ke, Z. T. and Luo, S. (2017). Estimating network memberships by simplex vertex hunting. arXiv:1708.07852.
Jin, J., Ke, Z. T. and Luo, S. (2019). Optimal adaptivity of signed-polygon statistics for network testing. arXiv:1904.09532.
Jin, J., Ke, Z. T., Luo, S. and Wang, M. (2020). Optimal approach to estimating K in social networks. Manuscript.
Karrer, B. and Newman, M. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107.
Ke, Z. T. and Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv:1704.07016.
Ke, Z. T., Shi, F. and Xia, D. (2020). Community detection for hypergraph networks via regularized tensor power iteration. arXiv:1909.06503.
Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E. and Dawson, S. M. (2003). The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54, 4, 396–405.
Liu, Y., Hou, Z., Yao, Z., Bai, Z., Hu, J. and Zheng, S. (2019). Community detection based on the \(\ell _{\infty }\) convergence of eigenvectors in DCBM. arXiv:1906.06713.
Ma, Z., Ma, Z. and Yuan, H. (2020). Universal latent space model fitting for large networks with edge covariates. J. Mach. Learn. Res. 21, 1–67.
Mao, X., Sarkar, P. and Chakrabarti, D. (2020). Estimating mixed memberships with sharp eigenvector deviations. J. Amer. Statist. Assoc. (to appear), 147.
Mihail, M. and Papadimitriou, C. H. (2002). On the eigenvalue power law. In International workshop on randomization and approximation techniques in computer science, pp. 254–262. Springer, Berlin.
Nepusz, T., Petróczi, A., Négyessy, L. and Bazsó, F. (2008). Fuzzy communities and the concept of bridgeness in complex networks. Phys. Rev. E 77, 1, 016107.
Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 3120–3128.
Su, L., Wang, W. and Zhang, Y. (2019). Strong consistency of spectral clustering for stochastic block models. IEEE Trans. Inform. Theory 66, 324–338.
Traud, A. L., Kelsic, E. D., Mucha, P. J. and Porter, M. A. (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53, 526–543.
Traud, A. L., Mucha, P. J. and Porter, M. A. (2012). Social structure of facebook networks. Physica A 391, 4165–4180.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 4, 452–473.
Zhang, Y., Levina, E. and Zhu, J. (2020). Detecting overlapping communities in networks using spectral methods. SIAM J. Math. Data Sci. 2, 265–283.
Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40, 2266–2292.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, USA
Jiashun Jin & Shengming Luo
Harvard University, Cambridge, MA, USA
Zheng Tracy Ke

Corresponding author

Correspondence to Jiashun Jin.

About this article

Cite this article

Jin, J., Ke, Z.T. & Luo, S. Improvements on SCORE, Especially for Weak Signals. Sankhya A 84, 127–162 (2022). https://doi.org/10.1007/s13171-020-00240-1

Received: 07 June 2020
Accepted: 24 December 2020
Published: 02 March 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s13171-020-00240-1
