Overlapping Community Detection in Networks via Sparse Spectral Decomposition

Abstract

We consider the problem of estimating overlapping community memberships in a network, where each node can belong to multiple communities. Memberships in more than a few communities per node are difficult both to estimate and to interpret, so we focus on sparse node membership vectors. Our algorithm is based on sparse principal subspace estimation with iterative thresholding. The method is computationally efficient, with computational cost equivalent to that of estimating the leading eigenvectors of the adjacency matrix, and, unlike spectral clustering methods, it does not require an additional clustering step. We show that a fixed point of the algorithm corresponds to correct node memberships under a version of the stochastic block model. The method is evaluated empirically on simulated and real-world networks, showing good statistical performance and computational efficiency.
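
The algorithm itself is specified in the body of the paper, which this page does not reproduce. As a rough illustration of the ingredients referenced in the Appendix below (a multiplication step, a rescaling step as in the proof of Proposition 3, entrywise thresholding, and column normalization), here is a minimal numpy sketch; the function names, the initialization, and the exact form of the thresholding operator \(\mathcal{S}(\cdot,\lambda)\) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def threshold_rows(T, lam):
    # Zero out entries smaller in magnitude than lam times the row maximum
    # (one plausible form of the thresholding operator S(., lam)).
    cutoff = lam * np.abs(T).max(axis=1, keepdims=True)
    return np.where(np.abs(T) > cutoff, T, 0.0)

def sparse_subspace_iteration(A, K, lam=0.5, iters=50, seed=0):
    # Alternate a multiplication step, a rescaling step, row-wise
    # thresholding, and column normalization. Assumes V.T @ T stays
    # invertible along the iterations (true generically, not guaranteed).
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    V = np.abs(rng.standard_normal((n, K)))
    V /= np.linalg.norm(V, axis=0)
    for _ in range(iters):
        T = A @ V                                    # multiplication step
        T = T @ np.linalg.inv(V.T @ T) @ (V.T @ V)   # rescaling step
        U = threshold_rows(T, lam)                   # thresholding step
        norms = np.linalg.norm(U, axis=0)
        norms[norms == 0] = 1.0                      # guard empty columns
        V = U / norms                                # column normalization
    return V
```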



Acknowledgements

This research was supported in part by NSF grants DMS-1521551 and DMS-1916222. The authors would like to thank Yuan Zhang for helpful discussions, and Advanced Research Computing at the University of Michigan for computational resources and services.

Author information

Corresponding author

Correspondence to Jesús Arroyo.


Appendix

Proof of Proposition 1.

Since \(\mathbf{V}\) and \(\widetilde{\mathbf{V}}\) are two bases of the column space of \(\mathbf{P}\) and \(\operatorname{rank}(\mathbf{P})=K\), we have \(\mathbf{P}=\mathbf{V}\mathbf{U}^{\top}=\widetilde{\mathbf{V}}\widetilde{\mathbf{U}}^{\top}\) for some full-rank matrices \(\mathbf{U},\widetilde{\mathbf{U}}\in\mathbb{R}^{n\times K}\), and therefore

$$ \mathbf{V}=\widetilde{\mathbf{V}}(\widetilde{\mathbf{U}}^{\top}\mathbf{U})(\mathbf{U}^{\top}\mathbf{U})^{-1}. $$
(A.1)

Let \(\boldsymbol{\Lambda}=(\widetilde{\mathbf{U}}^{\top}\mathbf{U})(\mathbf{U}^{\top}\mathbf{U})^{-1}\). We will show that \(\boldsymbol{\Lambda}=\mathbf{Q}\mathbf{D}\) for a permutation matrix \(\mathbf{Q}\in\{0,1\}^{K\times K}\) and a diagonal matrix \(\mathbf{D}\in\mathbb{R}^{K\times K}\); in other words, \(\boldsymbol{\Lambda}\) is a generalized permutation matrix.

Let \(\boldsymbol{\theta},\widetilde{\boldsymbol{\theta}}\in\mathbb{R}^{n}\) and \(\mathbf{Z},\widetilde{\mathbf{Z}}\in\mathbb{R}^{n\times K}\) be such that \(\boldsymbol{\theta}_{i}=(\sum_{k=1}^{K}\mathbf{V}_{ik}^{2})^{1/2}\), \(\widetilde{\boldsymbol{\theta}}_{i}=(\sum_{k=1}^{K}\widetilde{\mathbf{V}}_{ik}^{2})^{1/2}\), and \(\mathbf{Z}_{ik}=\mathbf{V}_{ik}/\boldsymbol{\theta}_{i}\) if \(\boldsymbol{\theta}_{i}>0\) and \(\mathbf{Z}_{ik}=0\) otherwise (similarly for \(\widetilde{\mathbf{Z}}\)). Denote by \(\mathcal{S}_{1}=(i_{1},\ldots,i_{K})\) the vector of row indexes satisfying \(\mathbf{V}_{i_{j}j}>0\) and \(\mathbf{V}_{i_{j}j'}=0\) for all \(j'\neq j\), for \(j=1,\ldots,K\) (these indexes exist by assumption). In the same way, define \(\mathcal{S}_{2}=(i'_{1},\ldots,i'_{K})\) such that \(\widetilde{\mathbf{V}}_{i'_{j}j}>0\) and \(\widetilde{\mathbf{V}}_{i'_{j}j'}=0\) for all \(j'\neq j\), for \(j=1,\ldots,K\). Denote by \(\mathbf{Z}_{\mathcal{S}}\) the \(K\times K\) matrix formed by the rows of \(\mathbf{Z}\) indexed by \(\mathcal{S}\). Therefore

$$ \mathbf{Z}_{\mathcal{S}_{1}}=\mathbf{I}_{K}=\widetilde{\mathbf{Z}}_{\mathcal{S}_{2}}. $$

Write \(\boldsymbol{\Theta}=\operatorname{diag}(\boldsymbol{\theta})\in\mathbb{R}^{n\times n}\) and \(\widetilde{\boldsymbol{\Theta}}=\operatorname{diag}(\widetilde{\boldsymbol{\theta}})\in\mathbb{R}^{n\times n}\). From Eq. A.1 we have

$$ (\boldsymbol{\Theta}\mathbf{Z})_{\mathcal{S}_{2}}=(\widetilde{\boldsymbol{\Theta}}\widetilde{\mathbf{Z}})_{\mathcal{S}_{2}}\boldsymbol{\Lambda}=\widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{2},\mathcal{S}_{2}}\widetilde{\mathbf{Z}}_{\mathcal{S}_{2}}\boldsymbol{\Lambda}=\widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{2},\mathcal{S}_{2}}\boldsymbol{\Lambda}, $$

where \(\boldsymbol{\Theta}_{\mathcal{S},\mathcal{S}}\) is the submatrix of \(\boldsymbol{\Theta}\) formed by the rows and columns indexed by \(\mathcal{S}\). Thus,

$$ \boldsymbol{\Lambda}=(\widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{2},\mathcal{S}_{2}}^{-1}\boldsymbol{\Theta}_{\mathcal{S}_{2},\mathcal{S}_{2}})\mathbf{Z}_{\mathcal{S}_{2}}, $$

which implies that \(\boldsymbol{\Lambda}\) is a non-negative matrix. Applying the same argument to the equation \((\boldsymbol{\Theta}\mathbf{Z})_{\mathcal{S}_{1}}\boldsymbol{\Lambda}^{-1}=(\widetilde{\boldsymbol{\Theta}}\widetilde{\mathbf{Z}})_{\mathcal{S}_{1}}\), we have

$$ \boldsymbol{\Lambda}^{-1}=(\boldsymbol{\Theta}_{\mathcal{S}_{1},\mathcal{S}_{1}}^{-1}\widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{1},\mathcal{S}_{1}})\widetilde{\mathbf{Z}}_{\mathcal{S}_{1}}. $$

Hence, both \(\boldsymbol{\Lambda}\) and \(\boldsymbol{\Lambda}^{-1}\) are non-negative matrices, which implies that \(\boldsymbol{\Lambda}\) is a positive generalized permutation matrix, so \(\boldsymbol{\Lambda}=\mathbf{Q}\mathbf{D}\) for some permutation matrix \(\mathbf{Q}\) and a diagonal matrix \(\mathbf{D}\) with \(\operatorname{diag}(\mathbf{D})>0\). □
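
As a quick numerical illustration of this conclusion (not part of the paper), the following sketch builds two non-negative bases of the same column space that differ by a permutation and a positive column scaling, and checks that the matrix \(\boldsymbol{\Lambda}\) relating them is a generalized permutation matrix; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 8, 3
# A non-negative basis with one "pure" row per community (rows 0..K-1 here).
V = np.vstack([np.eye(K), rng.uniform(0.1, 1.0, size=(n - K, K))])
Q = np.eye(K)[:, [2, 0, 1]]        # a permutation matrix
D = np.diag([0.5, 2.0, 1.5])       # a positive diagonal scaling
V_tilde = V @ Q @ D                # a second non-negative basis of col(V)
# Lambda in the proof satisfies V = V_tilde @ Lambda; recover it by least squares.
Lam = np.linalg.lstsq(V_tilde, V, rcond=None)[0]
# Both Lambda and its inverse are entrywise non-negative, i.e. Lambda is a
# generalized permutation matrix (here Lambda = D^{-1} Q^T).
print(np.round(Lam, 6))
print(np.round(np.linalg.inv(Lam), 6))
```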

Proof of Proposition 2.

Let \(\boldsymbol{\theta}\in\mathbb{R}^{n}\) be a vector such that \(\boldsymbol{\theta}_{i}^{2}=\sum_{k=1}^{K}\mathbf{V}_{ik}^{2}\), and define \(\mathbf{Z}\in\mathbb{R}^{n\times K}\) by \(\mathbf{Z}_{ik}=\mathbf{V}_{ik}/\boldsymbol{\theta}_{i}\) for each \(i\in[n]\), \(k\in[K]\). Let \(\mathbf{B}=(\mathbf{V}^{\top}\mathbf{V})^{-1}\mathbf{V}^{\top}\mathbf{U}\). To show that \(\mathbf{B}\) is symmetric, observe that \(\mathbf{V}\mathbf{U}^{\top}=\mathbf{P}=\mathbf{P}^{\top}=\mathbf{U}\mathbf{V}^{\top}\). Multiplying both sides by \(\mathbf{V}^{\top}\) on the left and \(\mathbf{V}\) on the right,

$$ \mathbf{V}^{\top}\mathbf{V}\mathbf{U}^{\top}\mathbf{V}=\mathbf{V}^{\top}\mathbf{U}\mathbf{V}^{\top}\mathbf{V}, $$

and observing that \((\mathbf{V}^{\top}\mathbf{V})^{-1}\) exists since \(\mathbf{V}\) is full rank, we have

$$ \mathbf{U}^{\top}\mathbf{V}(\mathbf{V}^{\top}\mathbf{V})^{-1}=(\mathbf{V}^{\top}\mathbf{V})^{-1}\mathbf{V}^{\top}\mathbf{U}, $$

which implies that \(\mathbf{B}=\mathbf{B}^{\top}\). To obtain the equivalent representation for \(\mathbf{P}\), form the diagonal matrix \(\boldsymbol{\Theta}=\operatorname{diag}(\boldsymbol{\theta})\). Then \(\boldsymbol{\Theta}\mathbf{Z}=\mathbf{V}\), and

$$ \boldsymbol{\Theta}\mathbf{Z}\mathbf{B}\mathbf{Z}^{\top}\boldsymbol{\Theta}=\mathbf{V}\left[(\mathbf{V}^{\top}\mathbf{V})^{-1}\mathbf{V}^{\top}\mathbf{U}\right]\mathbf{V}^{\top}=\mathbf{V}(\mathbf{V}^{\top}\mathbf{V})^{-1}\mathbf{V}^{\top}\mathbf{V}\mathbf{U}^{\top}=\mathbf{V}\mathbf{U}^{\top}=\mathbf{P}, $$

using \(\mathbf{U}\mathbf{V}^{\top}=\mathbf{V}\mathbf{U}^{\top}\) in the second equality.

Finally, under the conditions of Proposition 1, \(\mathbf{V}\) uniquely determines the pattern of zeros of any non-negative eigenbasis of \(\mathbf{P}\), and therefore \(\operatorname{supp}(\mathbf{V})=\operatorname{supp}(\boldsymbol{\Theta}\mathbf{Z}\mathbf{Q})=\operatorname{supp}(\mathbf{Z}\mathbf{Q})\) for some permutation matrix \(\mathbf{Q}\). □
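
A small numerical check of this representation (again, not from the paper) can be written as follows, assuming for illustration a symmetric mixing matrix M used to construct a rank-K symmetric P:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 10, 3
V = np.abs(rng.standard_normal((n, K)))        # non-negative, full column rank
M = np.eye(K) + 0.1 * np.ones((K, K))          # symmetric mixing matrix (assumed)
P = V @ M @ V.T                                # rank-K symmetric P = V U^T
U = V @ M
theta = np.linalg.norm(V, axis=1)              # theta_i = ||V_{i,.}||_2
Z = V / theta[:, None]                         # rows rescaled to unit norm
B = np.linalg.solve(V.T @ V, V.T @ U)          # B = (V^T V)^{-1} V^T U
Theta = np.diag(theta)
print(np.allclose(B, B.T))                     # True: B is symmetric
print(np.allclose(Theta @ Z @ B @ Z.T @ Theta, P))  # True: P = Theta Z B Z^T Theta
```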

Proof of Proposition 3.

Suppose that \(\mathbf{P}=\mathbf{V}\mathbf{U}^{\top}\) for some non-negative matrix \(\mathbf{V}\) that satisfies the assumptions of Proposition 1. Let \(\mathbf{D}\in\mathbb{R}^{K\times K}\) be the diagonal matrix with \(\mathbf{D}_{kk}=\|\mathbf{V}_{\cdot,k}\|_{2}\), and let \(\widetilde{\mathbf{V}}=\mathbf{V}\mathbf{D}^{-1}\) be the version of \(\mathbf{V}\) with unit-norm columns, so that \(\mathbf{P}=\widetilde{\mathbf{V}}\mathbf{D}\mathbf{U}^{\top}\). Let \(\mathbf{V}^{(0)}=\widetilde{\mathbf{V}}\) be the initial value of Algorithm 1. Then, observe that

$$ \mathbf{T}^{(1)}=\mathbf{P}\widetilde{\mathbf{V}}=\widetilde{\mathbf{V}}\mathbf{D}\mathbf{U}^{\top}\widetilde{\mathbf{V}}, $$
$$ \widetilde{\mathbf{T}}^{(1)}=\mathbf{T}^{(1)}\left[\widetilde{\mathbf{V}}^{\top}\mathbf{T}^{(1)}\right]^{-1}(\widetilde{\mathbf{V}}^{\top}\widetilde{\mathbf{V}})=\widetilde{\mathbf{V}}\mathbf{D}(\mathbf{U}^{\top}\widetilde{\mathbf{V}})(\mathbf{U}^{\top}\widetilde{\mathbf{V}})^{-1}\mathbf{D}^{-1}(\widetilde{\mathbf{V}}^{\top}\widetilde{\mathbf{V}})^{-1}(\widetilde{\mathbf{V}}^{\top}\widetilde{\mathbf{V}})=\widetilde{\mathbf{V}}. $$

Suppose that \(\lambda\in[0,v)\). Then \(\lambda\max_{j\in[K]}|\widetilde{\mathbf{V}}_{ij}|<\widetilde{\mathbf{V}}_{ik}\) for all \(i\in[n]\), \(k\in[K]\) such that \(\mathbf{V}_{ik}>0\), and hence \(\mathbf{U}^{(1)}=\mathcal{S}(\widetilde{\mathbf{V}},\lambda)=\widetilde{\mathbf{V}}\). Finally, since \(\|\widetilde{\mathbf{V}}_{\cdot,k}\|_{2}=1\) for all \(k\in[K]\), the normalization step leaves \(\widetilde{\mathbf{V}}\) unchanged, so \(\mathbf{V}^{(1)}=\widetilde{\mathbf{V}}\). □
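
The fixed-point property on the population matrix P can be verified numerically; the sketch below, an illustration rather than the authors' code, carries out the multiplication and rescaling steps of one iteration on a block-structured P built from an assumed connectivity matrix M:

```python
import numpy as np

n, K = 9, 3
Z = np.kron(np.eye(K), np.ones((n // K, 1)))   # block membership matrix
V_tilde = Z / np.linalg.norm(Z, axis=0)        # unit-norm columns
M = 0.4 * np.eye(K) + 0.1 * np.ones((K, K))    # invertible connectivity matrix
P = Z @ M @ Z.T                                # population matrix of rank K
T = P @ V_tilde                                              # multiplication step
T = T @ np.linalg.inv(V_tilde.T @ T) @ (V_tilde.T @ V_tilde) # rescaling step
print(np.allclose(T, V_tilde))                 # True: V_tilde is a fixed point
```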

Proof of Theorem 1.

The proof consists of a one-step fixed-point analysis of Algorithm 2. We will show that if \(\mathbf{Z}^{(t)}=\mathbf{Z}\), then \(\mathbf{Z}^{(t+1)}=\mathbf{Z}\) with high probability. Let \(\mathbf{T}=\mathbf{T}^{(t+1)}=\mathbf{A}\mathbf{Z}\) be the value after the multiplication step. Define \(\mathbf{C}\in\mathbb{R}^{K\times K}\) to be the diagonal matrix with the community sizes on the diagonal, \(\mathbf{C}_{kk}=n_{k}=\|\mathbf{Z}_{\cdot,k}\|_{1}\). Then \(\widetilde{\mathbf{T}}=\widetilde{\mathbf{T}}^{(t+1)}=\mathbf{T}\mathbf{C}^{-1}\). In order for the threshold to set the correct set of entries to zero, it is sufficient that in each row \(i\) the largest element of \(\widetilde{\mathbf{T}}_{i,\cdot}\) corresponds to the correct community. Define \(\mathcal{C}_{k}\subset[n]\) as the subset of nodes in community \(k\). Then,

$$ \widetilde{\mathbf{T}}_{ik}=\frac{1}{n_{k}}\mathbf{A}_{i,\cdot}\mathbf{Z}_{\cdot,k}=\frac{1}{n_{k}}\sum_{j\in\mathcal{C}_{k}}\mathbf{A}_{ij}. $$

Therefore \(n_{k}\widetilde{\mathbf{T}}_{ik}\) is a sum of independent and identically distributed Bernoulli random variables. Moreover, for \(k_{1}\neq k_{2}\) in \([K]\), \(\widetilde{\mathbf{T}}_{ik_{1}}\) and \(\widetilde{\mathbf{T}}_{ik_{2}}\) are independent of each other, since they involve disjoint sets of entries of \(\mathbf{A}\).

Given a value of \(\lambda\in(0,1)\), let

$$ \mathcal{E}_{i}(\lambda)=\left\{\lambda|\widetilde{\mathbf{T}}_{ik_{i}}|>|\widetilde{\mathbf{T}}_{ik_{j}}| \text{ for all } k_{j}\neq k_{i}\right\}, $$

where \(k_{i}\) denotes the community of node \(i\) (so \(i\in\mathcal{C}_{k_{i}}\)), be the event that the largest entry of \(\widetilde{\mathbf{T}}_{i,\cdot}\) corresponds to \(k_{i}\), and all the other entries in that row are smaller in magnitude than \(\lambda|\widetilde{\mathbf{T}}_{ik_{i}}|\). Let \(\mathbf{U}=\mathbf{U}^{(t+1)}=\mathcal{S}(\widetilde{\mathbf{T}}^{(t+1)},\lambda)\) be the matrix obtained after the thresholding step. Under the event \(\mathcal{E}(\lambda)=\bigcap_{i=1}^{n}\mathcal{E}_{i}(\lambda)\), we have \(\|\mathbf{U}_{i,\cdot}\|_{\infty}=\mathbf{U}_{ik_{i}}\) for each \(i\in[n]\), and hence

$$ \mathbf{U}_{ik}=\begin{cases}\mathbf{U}_{ik_{i}} & \text{if } k=k_{i},\\ 0 & \text{otherwise.}\end{cases} $$

Therefore, under the event \(\mathcal {E}(\lambda )\), the thresholding step recovers the correct support, so Z(t+ 1) = Z.

Now we verify that under the conditions of Theorem 1, the event \(\mathcal{E}(\lambda)\) occurs with high probability. By a union bound,

$$ \mathbb{P}(\mathcal{E}(\lambda))\geq 1-\sum_{i=1}^{n}\mathbb{P}\left(\mathcal{E}_{i}(\lambda)^{C}\right)\geq 1-\sum_{i=1}^{n}\sum_{j\neq k_{i}}\mathbb{P}\left(\widetilde{\mathbf{T}}_{ij}>\lambda\widetilde{\mathbf{T}}_{ik_{i}}\right). $$
(A.2)

For \(j\neq k_{i}\), \(\widetilde{\mathbf{T}}_{ij}-\lambda\widetilde{\mathbf{T}}_{ik_{i}}\) is a sum of independent random variables with expectation

$$ \mathbb{E}\left[\widetilde{\mathbf{T}}_{ij}-\lambda\widetilde{\mathbf{T}}_{ik_{i}}\right]=\frac{1}{n_{j}}\sum_{l\in\mathcal{C}_{j}}\mathbb{E}[\mathbf{A}_{il}]-\frac{\lambda}{n_{k_{i}}}\sum_{l\in\mathcal{C}_{k_{i}}}\mathbb{E}[\mathbf{A}_{il}]=q-\lambda\frac{n_{k_{i}}-1}{n_{k_{i}}}p. $$
(A.3)

By Hoeffding's inequality, we have that for any \(\tau\geq 0\),

$$ \mathbb{P}\left(\widetilde{\mathbf{T}}_{ij}-\lambda\widetilde{\mathbf{T}}_{ik_{i}}\geq\tau+\mathbb{E}\left[\widetilde{\mathbf{T}}_{ij}-\lambda\widetilde{\mathbf{T}}_{ik_{i}}\right]\right)\leq 2\exp\left(\frac{-2\tau^{2}}{\tfrac{1}{n_{j}}+\tfrac{\lambda^{2}}{n_{k_{i}}}}\right)\leq 2\exp\left(-\frac{2n_{\min}\tau^{2}}{1+\lambda^{2}}\right)\leq 2\exp\left(-n_{\min}\tau^{2}\right), $$

where \(n_{\min}=\min_{k\in[K]}n_{k}\), and the last inequality uses \(\lambda<1\). Setting

$$ \tau=-\mathbb{E}\left[\widetilde{\mathbf{T}}_{ij}-\lambda\widetilde{\mathbf{T}}_{ik_{i}}\right]=\lambda\frac{n_{k_{i}}-1}{n_{k_{i}}}p-q\geq\lambda^{\ast}p-q-\frac{p}{n_{k_{i}}}, $$

and using Eq. A.3 and condition (3.6), we obtain that for \(n\) sufficiently large,

$$ \mathbb{P}\left(\widetilde{\mathbf{T}}_{ij}>\lambda\widetilde{\mathbf{T}}_{ik_{i}}\right)\leq 2\exp\left(-n_{\min}\left(c_{1}\sqrt{\frac{\log(Kn)}{n_{\min}}}-\frac{p}{n_{k_{i}}}\right)^{2}\right)\leq 2\exp\left(-(c_{1}-1)\log(Kn)\right)=\frac{2}{(Kn)^{c_{1}-1}}. $$

Combining with the bound (A.2), the probability of the event \(\mathcal{E}(\lambda)\) (which implies that \(\mathbf{Z}^{(t+1)}=\mathbf{Z}\)) is bounded from below as

$$ \mathbb{P}(\mathcal{E}(\lambda))\geq 1-n(K-1)\max_{i\in[n],\,j\neq k_{i}}\mathbb{P}\left(\widetilde{\mathbf{T}}_{ij}>\lambda\widetilde{\mathbf{T}}_{ik_{i}}\right)\geq 1-\frac{2(K-1)n}{(Kn)^{c_{1}-1}}\geq 1-\frac{2}{Kn^{c_{1}-2}}. $$

Therefore, with high probability \(\mathbf{Z}\) is a fixed point of Algorithm 2 for any \(\lambda\in(\lambda^{\ast},1)\). □
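
To see the one-step argument in action, the following simulation sketch (illustrative, with hypothetical parameter choices) generates a planted-partition graph with within-community probability p = 0.5 and between-community probability q = 0.1, applies one multiplication-plus-threshold step to the true membership matrix Z, and checks that the support is unchanged:

```python
import numpy as np

rng = np.random.default_rng(4)
K, nk, p, q, lam = 3, 100, 0.5, 0.1, 0.7       # hypothetical parameter choices
n = K * nk
Z = np.kron(np.eye(K), np.ones((nk, 1)))       # true non-overlapping memberships
probs = np.where(Z @ Z.T > 0, p, q)            # planted-partition edge probabilities
A = (rng.uniform(size=(n, n)) < probs).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # symmetric adjacency, no self-loops
T_tilde = (A @ Z) / Z.sum(axis=0)              # T C^{-1}: within-community averages
# Keep only entries above lam times the row maximum, then binarize the support.
U = np.where(T_tilde > lam * T_tilde.max(axis=1, keepdims=True), 1.0, 0.0)
print(np.allclose(U, Z))                       # True w.h.p.: the support of Z is kept
```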

Proof of Proposition 4.

Observe that

$$ \begin{aligned} \|\mathbf{A}-\widehat{\mathbf{V}}\mathbf{B}\widehat{\mathbf{V}}^{\top}\|_{F}^{2} &= \operatorname{Tr}(\mathbf{A}^{\top}\mathbf{A})-2\operatorname{Tr}(\widehat{\mathbf{V}}^{\top}\mathbf{A}^{\top}\widehat{\mathbf{V}}\mathbf{B})+\operatorname{Tr}(\mathbf{B}^{\top}\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}}\mathbf{B}\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}}) \\ &= \|\mathbf{B}-(\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}\widehat{\mathbf{V}}^{\top}\mathbf{A}\widehat{\mathbf{V}}(\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}\|_{F}^{2}+C, \end{aligned} $$

where \(C\) is a constant that does not depend on \(\mathbf{B}\). Therefore the minimizer is \(\widehat{\mathbf{B}}=(\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}\widehat{\mathbf{V}}^{\top}\mathbf{A}\widehat{\mathbf{V}}(\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}\), and the corresponding fitted matrix is

$$ \widehat{\mathbf{P}}=\widehat{\mathbf{V}}\widehat{\mathbf{B}}\widehat{\mathbf{V}}^{\top}=\widehat{\mathbf{V}}(\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}\widehat{\mathbf{V}}^{\top}\mathbf{A}\widehat{\mathbf{V}}(\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}\widehat{\mathbf{V}}^{\top}. $$

Suppose that \(\widehat{\mathbf{V}}=\widehat{\mathbf{Q}}\widehat{\mathbf{R}}\) for some matrix \(\widehat{\mathbf{Q}}\in\mathbb{R}^{n\times K}\) with orthonormal columns. Then \(\widehat{\mathbf{R}}\) is a full-rank matrix, and since \(\widehat{\mathbf{Q}}^{\top}\widehat{\mathbf{Q}}=\mathbf{I}_{K}\),

$$ (\widehat{\mathbf{V}}^{\top}\widehat{\mathbf{V}})^{-1}=\widehat{\mathbf{R}}^{-1}(\widehat{\mathbf{Q}}^{\top}\widehat{\mathbf{Q}})^{-1}(\widehat{\mathbf{R}}^{\top})^{-1}=\widehat{\mathbf{R}}^{-1}(\widehat{\mathbf{R}}^{\top})^{-1}. $$

Using this equation, we obtain the desired result. □
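
Numerically, the two expressions for \(\widehat{\mathbf{B}}\), the least-squares form and the QR-based form, can be checked to agree; a minimal sketch with an arbitrary symmetric A and a stand-in for the estimated basis:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 50, 3
V_hat = np.abs(rng.standard_normal((n, K)))    # a stand-in for the estimated basis
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                              # an arbitrary symmetric matrix
G = V_hat.T @ V_hat
# Least-squares form: B = (V^T V)^{-1} V^T A V (V^T V)^{-1}.
B_ls = np.linalg.solve(G, V_hat.T @ A @ V_hat) @ np.linalg.inv(G)
# QR-based form: with V = Q R, (V^T V)^{-1} = R^{-1} (R^T)^{-1}.
Q, R = np.linalg.qr(V_hat)
B_qr = np.linalg.inv(R) @ (Q.T @ A @ Q) @ np.linalg.inv(R.T)
print(np.allclose(B_ls, B_qr))                 # True: the two expressions agree
```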


Cite this article

Arroyo, J., Levina, E. Overlapping Community Detection in Networks via Sparse Spectral Decomposition. Sankhya A (2021). https://doi.org/10.1007/s13171-021-00245-4


Keywords

  • Sparse principal component analysis
  • Stochastic blockmodel
  • Mixed memberships

AMS (2000) subject classification

  • Primary: 62H30
  • Secondary: 91C20
  • 68T10