Abstract
We consider the problem of estimating overlapping community memberships in a network, where each node can belong to multiple communities. More than a few communities per node are difficult to both estimate and interpret, so we focus on sparse node membership vectors. Our algorithm is based on sparse principal subspace estimation with iterative thresholding. The method is computationally efficient, with computational cost equivalent to estimating the leading eigenvectors of the adjacency matrix, and does not require an additional clustering step, unlike spectral clustering methods. We show that a fixed point of the algorithm corresponds to correct node memberships under a version of the stochastic block model. The methods are evaluated empirically on simulated and real-world networks, showing good statistical performance and computational efficiency.
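To make the procedure concrete, here is a minimal sketch in Python of the multiply–threshold–normalize iteration described above. The thresholding rule used here (zeroing entries smaller than a fraction λ of their row maximum) and the column normalization are illustrative assumptions, not the paper's exact operator.

```python
import numpy as np

def threshold_rows(T, lam):
    """Zero entries smaller in magnitude than lam times their row maximum
    (an illustrative thresholding rule, assumed for this sketch)."""
    row_max = np.abs(T).max(axis=1, keepdims=True)
    return np.where(np.abs(T) >= lam * row_max, T, 0.0)

def sparse_spectral(A, V0, lam=0.5, n_iter=20):
    """Sketch of the iteration: multiply by the adjacency matrix,
    threshold row-wise, renormalize columns."""
    V = V0.astype(float)
    for _ in range(n_iter):
        U = threshold_rows(A @ V, lam)
        norms = np.linalg.norm(U, axis=0)
        norms[norms == 0] = 1.0  # avoid dividing by all-zero columns
        V = U / norms
    return V

# toy example: expected adjacency of a two-block model, initialized at
# the true memberships; the support of V_hat should match that of Z
Z = np.zeros((20, 2)); Z[:10, 0] = 1.0; Z[10:, 1] = 1.0
B = np.array([[0.6, 0.1], [0.1, 0.6]])
V_hat = sparse_spectral(Z @ B @ Z.T, Z, lam=0.5, n_iter=5)
```

Each iteration costs one multiplication by the (typically sparse) adjacency matrix, which is why the overall cost is comparable to computing leading eigenvectors.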
Acknowledgements
This research was supported in part by NSF grants DMS-1521551 and DMS-1916222. The authors would like to thank Yuan Zhang for helpful discussions, and Advanced Research Computing at the University of Michigan for computational resources and services.
Appendix
Proof of Proposition 1.
Because V and \(\widetilde{\mathbf{V}}\) are two bases of the column space of P, and rank(P) = K, we have \(\mathbf{P}=\mathbf{V}\mathbf{U}^{\top}=\widetilde{\mathbf{V}}\widetilde{\mathbf{U}}^{\top}\) for some full-rank matrices \(\mathbf{U},\widetilde{\mathbf{U}}\in \mathbb{R}^{n\times K}\), and therefore
Let \((\widetilde {\textup {\textbf {U}}}^{\top }\textup {\textbf {U}})({\textup {\textbf {U}}}^{\top }\textup {\textbf {U}})^{-1}=\boldsymbol {\Lambda }\). We will show that Λ = QD for a permutation matrix Q ∈{0,1}K×K and a diagonal matrix \(\textup {\textbf {D}}\in \mathbb {R}^{K\times K}\), or in other words, this is a generalized permutation matrix.
Let \(\boldsymbol{\theta},\widetilde{\boldsymbol{\theta}}\in \mathbb{R}^{n}\) and \(\mathbf{Z},\widetilde{\mathbf{Z}}\in \mathbb{R}^{n\times K}\) be such that \(\boldsymbol{\theta}_{i} = \left({\sum}_{k=1}^{K}\mathbf{V}_{ik}^{2}\right)^{1/2}\), \(\widetilde{\boldsymbol{\theta}}_{i} = \left({\sum}_{k=1}^{K}\widetilde{\mathbf{V}}_{ik}^{2}\right)^{1/2}\), and Zik = Vik/𝜃i if 𝜃i > 0, and Zik = 0 otherwise (and similarly for \(\widetilde{\mathbf{Z}}\)). Denote by \(\mathcal{S}_{1}=(i_{1},\ldots,i_{K})\) the vector of row indices satisfying \(\mathbf{V}_{i_{j}j}> 0\) and \(\mathbf{V}_{i_{j}j'}=0\) for j′≠j, for j = 1,…,K (these indices exist by assumption). In the same way, define \(\mathcal{S}_{2}=(i'_{1},\ldots,i'_{K})\) such that \(\widetilde{\mathbf{V}}_{i'_{j}j}> 0\) and \(\widetilde{\mathbf{V}}_{i'_{j}j'}=0\) for j′≠j, j = 1,…,K. Denote by \(\mathbf{Z}_{\mathcal{S}}\) the K × K matrix formed by the rows indexed by \(\mathcal{S}\). Therefore
Write \(\boldsymbol{\Theta} = \text{diag}(\boldsymbol{\theta})\in \mathbb{R}^{n\times n}\) and \(\widetilde{\boldsymbol{\Theta}} = \text{diag}(\widetilde{\boldsymbol{\theta}})\in \mathbb{R}^{n\times n}\). From Eq. A.1 we have
where \({\Theta }_{\mathcal {S}, \mathcal {S}}\) is the submatrix of Θ formed by the rows and columns indexed by \(\mathcal {S}\). Thus,
which implies that Λ is a non-negative matrix. Applying the same to the equation \((\boldsymbol {\Theta }\textbf {Z})_{\mathcal {S}_{1}}\boldsymbol {\Lambda }^{-1}= (\widetilde {\boldsymbol {\Theta }}\widetilde {\textup {\textbf {Z}}})_{\mathcal {S}_{1}}\), we have
Hence, both Λ and Λ− 1 are non-negative matrices, which implies that Λ is a positive generalized permutation matrix, so Λ = QD for some permutation matrix Q and a diagonal matrix D with diag(D) > 0. □
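The key fact used in this last step can be checked numerically: a generalized permutation matrix and its inverse are both entrywise non-negative, whereas the inverse of a generic non-negative matrix typically has negative entries. The matrices below are illustrative.

```python
import numpy as np

# Lambda = Q D: a permutation matrix times a positive diagonal matrix
Q = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])  # permutation
D = np.diag([2.0, 0.5, 3.0])                              # positive diagonal
Lam = Q @ D
Lam_inv = np.linalg.inv(Lam)                              # equals D^{-1} Q^T
both_nonneg = bool((Lam >= 0).all() and (Lam_inv >= -1e-12).all())

# a generic non-negative (non-monomial) matrix: its inverse has
# negative entries, so it cannot play the role of Lambda in the proof
M = np.array([[2.0, 1.0], [1.0, 2.0]])
generic_has_negative = bool((np.linalg.inv(M) < 0).any())
```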
Proof of Proposition 2.
Let \(\boldsymbol{\theta}\in \mathbb{R}^{n}\) be a vector such that \({\boldsymbol{\theta}_{i}^{2}}={\sum}_{k=1}^{K}\mathbf{V}_{ik}^{2}\), and define \(\mathbf{Z}\in \mathbb{R}^{n\times K}\) such that \(\mathbf{Z}_{ik}=\frac{1}{\theta_{i}}\mathbf{V}_{ik}\) for each i ∈ [n], k ∈ [K]. Let B = (V⊤V)−1V⊤U. To show that B is symmetric, observe that VU⊤ = P = P⊤ = UV⊤. Multiplying both sides by V and V⊤,
and observing that (V⊤V)− 1 exists since V is full rank, we have
which implies that B⊤ = B. To obtain the equivalent representation for P, form a diagonal matrix Θ = diag(𝜃). Then ΘZ = V, and
Finally, under the conditions of Proposition 1, V uniquely determines the pattern of zeros of any non-negative eigenbasis of P, and therefore supp(V) = supp(ΘZQ) = supp(ZQ) for some permutation Q. □
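The symmetry of B = (V⊤V)−1V⊤U when P = VU⊤ is symmetric can be verified numerically; the construction of U below (from an assumed symmetric core matrix S) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.abs(rng.standard_normal((8, 3)))   # non-negative, full-rank basis
S = rng.standard_normal((3, 3))
S = (S + S.T) / 2                         # symmetric core matrix
U = V @ S                                 # then P = V U^T = V S V^T is symmetric
P = V @ U.T

# B = (V^T V)^{-1} V^T U, computed via a linear solve
B = np.linalg.solve(V.T @ V, V.T @ U)
```

Here B recovers the symmetric core S, and P = VBV⊤, matching the equivalent representation in the proposition.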
Proof of Proposition 3.
Suppose that P = VU⊤ for some non-negative matrix V that satisfies the assumptions of Proposition 1. Let \(\mathbf{d}\in \mathbb{R}^{K}\) be such that \(\mathbf{d}_{k} = \|\mathbf{V}_{\cdot k}\|_{2}\), and set D = diag(d). Then \(\mathbf{P} = \widetilde{\mathbf{V}}\mathbf{D}\mathbf{U}^{\top}\). Let \(\mathbf{V}^{(0)} = \widetilde{\mathbf{V}}\) be the initial value of Algorithm 1. Then, observe that
Suppose that λ ∈ [0,v∗). Then \(\lambda \max_{j\in [K]}|\widetilde{\mathbf{V}}_{ij}| <\widetilde{\mathbf{V}}_{ik}\) for all i ∈ [n], k ∈ [K] such that Vik > 0, and hence \(\mathbf{U}^{(1)}=\mathcal{S}(\widetilde{\mathbf{V}}, \lambda) = \widetilde{\mathbf{V}}\). Finally, since \(\|\widetilde{\mathbf{V}}_{\cdot,k}\|_{2}=1\) for all k ∈ [K], we have \(\mathbf{V}^{(1)}=\widetilde{\mathbf{V}}\). □
Proof of Theorem 1.
The proof consists of a one-step fixed-point analysis of Algorithm 2. We will show that if Z(t) = Z, then Z(t+ 1) = Z with high probability. Let T = T(t+ 1) = AZ be the value after the multiplication step. Define \(\mathbf{C}\in \mathbb{R}^{K\times K}\) to be the diagonal matrix with community sizes on the diagonal, Ckk = nk = ∥Z⋅,k∥1. Then \(\widetilde{\mathbf{T}}=\widetilde{\mathbf{T}}^{(t+1)}= \mathbf{T}\mathbf{C}^{-1}\). In order for the threshold to set the correct set of entries to zero, it suffices that in each row i the largest element of \(\widetilde{\mathbf{T}}_{i,\cdot}\) corresponds to the correct community. Define \(\mathcal{C}_{k}\subset [n]\) as the set of nodes in community k. Then,
Therefore \(\widetilde{\mathbf{T}}_{ik}\) is a scaled sum of independent and identically distributed Bernoulli random variables. Moreover, for each k1 and k2 in [K], \(\widetilde{\mathbf{T}}_{ik_{1}}\) and \(\widetilde{\mathbf{T}}_{ik_{2}}\) are independent of each other, since they depend on disjoint sets of edges.
Given a value of λ ∈ (0,1), let
be the event that the largest entry of \(\widetilde{\mathbf{T}}_{i\cdot}\) is the one in position ki, corresponding to the community of node i, and that all other entries in that row are smaller in magnitude than \(\lambda |\widetilde{\mathbf{T}}_{ik_{i}}|\). Let \(\mathbf{U} = \mathbf{U}^{(t+1)}=\mathcal{S}(\widetilde{\mathbf{T}}^{(t+1)}, \lambda)\) be the matrix obtained after the thresholding step. Under the event \(\mathcal{E}(\lambda)=\bigcap_{i=1}^{n} \mathcal{E}_{i}(\lambda)\), we have \(\|\mathbf{U}_{i,\cdot}\|_{\infty} = \mathbf{U}_{ik_{i}}\) for each i ∈ [n], and hence
Therefore, under the event \(\mathcal {E}(\lambda )\), the thresholding step recovers the correct support, so Z(t+ 1) = Z.
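This one-step fixed-point argument can be illustrated numerically: starting from the true membership matrix Z of a two-block stochastic block model, a single multiplication, normalization, and thresholding step returns the same support. The parameters below are illustrative, chosen so that the event \(\mathcal{E}(\lambda)\) holds with overwhelming probability.

```python
import numpy as np

rng = np.random.default_rng(42)
n_k, K, p, q, lam = 100, 2, 0.8, 0.05, 0.5
n = n_k * K

# true membership matrix Z: first block in column 0, second in column 1
Z = np.zeros((n, K))
Z[:n_k, 0] = 1.0
Z[n_k:, 1] = 1.0

# two-block SBM: within-community probability p, between-community q
P = np.where(Z @ Z.T > 0, p, q)
upper = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = upper + upper.T                       # symmetric adjacency, no self-loops

# one iteration: multiply, normalize by community sizes, threshold
C_inv = np.diag(1.0 / Z.sum(axis=0))      # C^{-1}, inverse community sizes
T_tilde = A @ Z @ C_inv                   # average connectivity to each community
row_max = np.abs(T_tilde).max(axis=1, keepdims=True)
Z_new = (np.abs(T_tilde) >= lam * row_max).astype(float)
```

Each row of T_tilde concentrates around (p, q) or (q, p), so thresholding at λ times the row maximum zeroes exactly the entries outside the true community.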
Now we verify that under the conditions of Theorem 1, the event \(\mathcal{E}(\lambda)\) occurs with high probability. By a union bound,
For j≠ki, \(\widetilde {\textup {\textbf {T}}}_{ij}-\lambda \widetilde {\textup {\textbf {T}}}_{ik_{i}}\) is a sum of independent random variables with expectation
By Hoeffding’s inequality, we have that for any \(\tau \in \mathbb {R}\),
where \(n_{\min \limits } = \min \limits _{k\in [K]}n_{k}\). Setting
and using Eqs. A.3 and 3.6, we obtain that for n sufficiently large,
Combining with the bound (A.2), the probability of event \(\mathcal {E}(\lambda )\) (which implies that Z(t+ 1) = Z) is bounded from below as
Therefore, with high probability Z is a fixed point of Algorithm 2 for any λ ∈ (λ∗,1). □
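For reference, the concentration step in the proof uses Hoeffding's inequality, whose standard form for independent bounded random variables is:

```latex
% Hoeffding's inequality: for independent X_1,\dots,X_m with X_i \in [a_i, b_i],
\mathbb{P}\left( \sum_{i=1}^{m} \left( X_i - \mathbb{E}[X_i] \right) \geq \tau \right)
\;\leq\; \exp\left( -\frac{2\tau^2}{\sum_{i=1}^{m} (b_i - a_i)^2} \right),
\qquad \tau > 0.
```

Applied to the Bernoulli sums above, with $m$ of order $n_{\min}$, this yields the exponential bound in the proof.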
Proof of Proposition 4.
Observe that
where C is a constant that does not depend on B. Therefore \(\widehat {\textup {\textbf {B}}}\)
Suppose that \(\widehat{\boldsymbol{V}} = \widehat{\boldsymbol{Q}}\widehat{\boldsymbol{R}}\) for some matrix \(\widehat{\boldsymbol{Q}}\) with orthonormal columns of size n × K. Then \(\widehat{\boldsymbol{R}}\) is a full-rank matrix, and therefore
Using this equation, we obtain the desired result. □
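The identity used here, that for a full-rank \(\widehat{\mathbf{V}} = \widehat{\mathbf{Q}}\widehat{\mathbf{R}}\) the matrix \((\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}\widehat{\mathbf{V}}^{\top}\) equals \(\widehat{\mathbf{R}}^{-1}\widehat{\mathbf{Q}}^{\top}\), can be verified directly; the random matrix below is a stand-in for the estimated basis.

```python
import numpy as np

rng = np.random.default_rng(1)
V_hat = rng.standard_normal((10, 3))      # stand-in for the estimated basis
Q_hat, R_hat = np.linalg.qr(V_hat)        # reduced QR: V_hat = Q_hat R_hat

# (V^T V)^{-1} V^T computed two equivalent ways
pinv_direct = np.linalg.solve(V_hat.T @ V_hat, V_hat.T)
pinv_qr = np.linalg.solve(R_hat, Q_hat.T) # equals R^{-1} Q^T
```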
Cite this article
Arroyo, J., Levina, E. Overlapping Community Detection in Networks via Sparse Spectral Decomposition. Sankhya A 84, 1–35 (2022). https://doi.org/10.1007/s13171-021-00245-4