Appendix
Proof of Proposition 1.
Because V and \(\widetilde {\textup {\textbf {V}}}\) are two bases of the column space of P and rank(P) = K, we have \(\textbf {P}=\textbf {V}\textbf {U}^{\top }=\widetilde {\textbf {V}}\widetilde {\textbf {U}}^{\top }\) for some full-rank matrices \(\textbf {U},\widetilde {\textbf {U}}\in \mathbb {R}^{n\times K}\), and therefore
$$ \textup{\textbf{V}}=\widetilde{\textup{\textbf{V}}}(\widetilde{\textup{\textbf{U}}}^{\top}\textup{\textbf{U}})({\textup{\textbf{U}}}^{\top}\textup{\textbf{U}})^{-1}. $$
(A.1)
Let \((\widetilde {\textup {\textbf {U}}}^{\top }\textup {\textbf {U}})({\textup {\textbf {U}}}^{\top }\textup {\textbf {U}})^{-1}=\boldsymbol {\Lambda }\). We will show that Λ = QD for a permutation matrix Q ∈{0,1}K×K and a diagonal matrix \(\textup {\textbf {D}}\in \mathbb {R}^{K\times K}\) with positive diagonal entries; in other words, Λ is a generalized permutation matrix.
Let \(\boldsymbol {\theta },\widetilde {\boldsymbol {\theta }}\in \mathbb {R}^{n}\) and \(\textup {\textbf {Z}},\widetilde {\textup {\textbf {Z}}}\in \mathbb {R}^{n\times K}\) be such that \(\boldsymbol {\theta }_{i} = \left ({\sum }_{k=1}^{K}\textup {\textbf {V}}_{ik}^{2}\right )^{1/2}\), \(\widetilde {\boldsymbol {\theta }}_{i} = \left ({\sum }_{k=1}^{K}\widetilde {\textup {\textbf {V}}}_{ik}^{2}\right )^{1/2}\), and Zik = Vik/𝜃i if 𝜃i > 0, and Zik = 0 otherwise (and similarly for \(\widetilde {\textup {\textbf {Z}}}\)). Denote by \(\mathcal {S}_{1}=(i_{1},\ldots ,i_{K})\) the vector of row indexes that satisfy \(\textup {\textbf {V}}_{i_{j}j}> 0\) and \(\textup {\textbf {V}}_{i_{j}j'}=0\) for j′≠j, j = 1,…,K (these indexes exist by assumption). In the same way, define \(\mathcal {S}_{2}=(i'_{1},\ldots ,i'_{K})\) such that \(\widetilde {\textup {\textbf {V}}}_{i^{\prime }_{j}j}> 0\) and \(\widetilde {\textup {\textbf {V}}}_{i^{\prime }_{j}j'}=0\) for j′≠j, j = 1,…,K. Denote by \(\textup {\textbf {Z}}_{\mathcal {S}}\) the K × K matrix formed by the rows indexed by \(\mathcal {S}\). Therefore
$$\textup{\textbf{Z}}_{\mathcal{S}_{1}}= \textup{\textbf{I}}_{K}=\widetilde{\textup{\textbf{Z}}}_{\mathcal{S}_{2}}.$$
Write \(\boldsymbol {\Theta } = \text {diag}(\boldsymbol {\theta })\in \mathbb {R}^{n\times n}\) and \(\widetilde {\boldsymbol {\Theta }} = \text {diag}(\widetilde {\boldsymbol {\theta }})\in \mathbb {R}^{n\times n}\). From Eq. A.1 we have
$$ \begin{array}{@{}rcl@{}} (\boldsymbol{\Theta}\textup{\textbf{Z}})_{\mathcal{S}_{2}}= & (\widetilde{\boldsymbol{\Theta}}\widetilde{\textup{\textbf{Z}}})_{\mathcal{S}_{2}}\boldsymbol{\Lambda} =\widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{2}, \mathcal{S}_{2}} \widetilde{\textup{\textbf{Z}}}_{\mathcal{S}_{2}}\boldsymbol{\Lambda} = \widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{2}, \mathcal{S}_{2}}\boldsymbol{\Lambda}\ , \end{array} $$
where \({\Theta }_{\mathcal {S}, \mathcal {S}}\) is the submatrix of Θ formed by the rows and columns indexed by \(\mathcal {S}\). Thus,
$$\boldsymbol{\Lambda} = (\widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{2}, \mathcal{S}_{2}}^{-1}{\boldsymbol{\Theta}}_{\mathcal{S}_{2}, \mathcal{S}_{2}})\textup{\textbf{Z}}_{\mathcal{S}_{2}},$$
which implies that Λ is a non-negative matrix, since \(\widetilde {\boldsymbol {\Theta }}\), Θ, and Z are entrywise non-negative. Applying the same argument to the equation \((\boldsymbol {\Theta }\textbf {Z})_{\mathcal {S}_{1}}\boldsymbol {\Lambda }^{-1}= (\widetilde {\boldsymbol {\Theta }}\widetilde {\textup {\textbf {Z}}})_{\mathcal {S}_{1}}\), we have
$$\boldsymbol{\Lambda}^{-1} = ({\boldsymbol{\Theta}}_{\mathcal{S}_{1}, \mathcal{S}_{1}}^{-1}\widetilde{\boldsymbol{\Theta}}_{\mathcal{S}_{1}, \mathcal{S}_{1}})\widetilde{\textup{\textbf{Z}}}_{\mathcal{S}_{1}}.$$
Hence, both Λ and \(\boldsymbol {\Lambda }^{-1}\) are non-negative matrices, which implies that Λ is a positive generalized permutation matrix, so Λ = QD for some permutation matrix Q and a diagonal matrix D with diag(D) > 0. □
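The identifiability argument of Proposition 1 can be checked numerically. The following NumPy sketch (matrix sizes, seed, and the particular permutation are illustrative choices, not from the paper) builds a non-negative basis V with one pure row per column, forms a second basis by permuting and positively rescaling the columns, and verifies that the matrix Λ of Eq. A.1 is a generalized permutation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 8, 3

# Non-negative basis V with one "pure" row per column (rows 0..K-1),
# as required by the assumptions of Proposition 1.
V = np.zeros((n, K))
V[:K, :K] = np.diag(rng.uniform(1.0, 2.0, K))
V[K:] = rng.uniform(0.1, 1.0, (n - K, K))
U = rng.uniform(0.1, 1.0, (n, K))        # generic full-rank factor
P = V @ U.T                              # rank-K matrix with column space span(V)

# A second non-negative basis: permute and positively rescale the columns of V.
Q = np.eye(K)[:, [2, 0, 1]]              # permutation matrix
D = np.diag(rng.uniform(0.5, 2.0, K))    # positive diagonal scaling
V_tilde = V @ Q @ D

# Lambda from Eq. (A.1): V = V_tilde @ Lambda.
Lam, *_ = np.linalg.lstsq(V_tilde, V, rcond=None)

# Lambda and its inverse are entrywise non-negative: a generalized permutation.
assert np.all(Lam >= -1e-8)
assert np.all(np.linalg.inv(Lam) >= -1e-8)
assert np.sum(Lam > 1e-8) == K           # one positive entry per row and column
```

Here Λ equals D⁻¹Q⊤ by construction, which is exactly the QD form asserted in the proposition (with the roles of the factors transposed and inverted).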
Proof of Proposition 2.
Let \(\boldsymbol {\theta }\in \mathbb {R}^{n}\) be a vector such that \({\boldsymbol {\theta }_{i}^{2}}={\sum }_{k=1}^{K}\boldsymbol {V}_{ik}^{2}\), and define \(\textup {\textbf {Z}}\in \mathbb {R}^{n\times K}\) such that \(\textbf {Z}_{ik}=\frac {1}{\theta _{i}}\textbf {V}_{ik}\) for each i ∈ [n], k ∈ [K]. Let \(\textup {\textbf {B}} = (\textup {\textbf {V}}^{\top }\textup {\textbf {V}})^{-1}\textup {\textbf {V}}^{\top }\textup {\textbf {U}}\). To show that B is symmetric, observe that VU⊤ = P = P⊤ = UV⊤. Multiplying both sides by V⊤ on the left and by V on the right,
$$\textup{\textbf{V}}^{\top} \textup{\textbf{V}} \textup{\textbf{U}}^{\top} \textup{\textbf{V}} = \textup{\textbf{V}}^{\top} \textup{\textbf{U}}\textup{\textbf{V}}^{\top} \textup{\textbf{V}},$$
and observing that (V⊤V)− 1 exists since V is full rank, we have
$$ \textup{\textbf{U}}^{\top} \textup{\textbf{V}}(\textup{\textbf{V}}^{\top} \textup{\textbf{V}})^{-1} = (\textup{\textbf{V}}^{\top} \textup{\textbf{V}})^{-1}\textup{\textbf{V}}^{\top} \textup{\textbf{U}},$$
which implies that B⊤ = B. To obtain the equivalent representation for P, form a diagonal matrix Θ = diag(𝜃). Then ΘZ = V, and
$$ \begin{array}{@{}rcl@{}} \boldsymbol{\Theta}\textup{\textbf{Z}}\textup{\textbf{B}}\textup{\textbf{Z}}^{\top} \boldsymbol{\Theta} \! =\! \textup{\textbf{V}}[(\textup{\textbf{V}}^{\top}\textup{\textbf{V}})^{-1}\textup{\textbf{V}}^{\top}\textup{\textbf{U}}]\textup{\textbf{V}}^{\top} = \textup{\textbf{V}}(\textup{\textbf{V}}^{\top}\textup{\textbf{V}})^{-1}\textup{\textbf{V}}^{\top}\textup{\textbf{V}}\textup{\textbf{U}}^{\top} = \textup{\textbf{V}}\textup{\textbf{U}}^{\top} = \textup{\textbf{P}}, \end{array} $$
where the second equality uses \(\textup {\textbf {U}}\textup {\textbf {V}}^{\top } = \textup {\textbf {V}}\textup {\textbf {U}}^{\top }\).
Finally, under the conditions of Proposition 1, V uniquely determines the pattern of zeros of any non-negative eigenbasis of P, and therefore supp(V) = supp(ΘZQ) = supp(ZQ) for some permutation Q. □
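The two claims of Proposition 2 (symmetry of B and the factorization P = ΘZBZ⊤Θ) admit a direct numerical check. In this NumPy sketch the sizes and the construction of a symmetric rank-K matrix P via a symmetric core M are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 8, 3

# A symmetric P = V U^T with non-negative V: take U = V M for a symmetric core M.
V = rng.uniform(0.1, 1.0, (n, K))
M = rng.uniform(0.1, 1.0, (K, K)); M = M + M.T
U = V @ M
P = V @ U.T                              # equals V M V^T, hence symmetric

theta = np.linalg.norm(V, axis=1)        # row norms theta_i
Z = V / theta[:, None]                   # rows of Z have unit norm
Theta = np.diag(theta)

B = np.linalg.solve(V.T @ V, V.T @ U)    # B = (V^T V)^{-1} V^T U
assert np.allclose(B, B.T)               # B is symmetric
assert np.allclose(Theta @ Z @ B @ Z.T @ Theta, P)   # P = Theta Z B Z^T Theta
```

In this construction B recovers the core M exactly, and ΘZ = V, so the factorization identity reduces to VMV⊤ = P.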
Proof of Proposition 3.
Suppose that P = VU⊤ for some non-negative matrix V that satisfies the assumptions of Proposition 1. Let \(\textup {\textbf {d}}\in \mathbb {R}^{K}\) be such that \(\textup {\textbf {d}}_{k} = \|\textup {\textbf {V}}_{\cdot k}\|_{2}\), and set \(\textup {\textbf {D}} = \text {diag}(\textup {\textbf {d}})\). Then \(\textbf {P} = \widetilde {\textbf {V}}\textbf {D}\textbf {U}^{\top }\), where \(\widetilde {\textup {\textbf {V}}} = \textup {\textbf {V}}\textup {\textbf {D}}^{-1}\) has unit-norm columns. Let \(\textup {\textbf {V}}^{(0)} = \widetilde {\textup {\textbf {V}}}\) be the initial value of Algorithm 1. Then, observe that
$$\textup{\textbf{T}}^{(1)} = \textup{\textbf{P}}\widetilde{\textup{\textbf{V}}} = \widetilde{\textup{\textbf{V}}}\textup{\textbf{D}}\textup{\textbf{U}}^{\top}\widetilde{\textup{\textbf{V}}},$$
$$ \begin{array}{@{}rcl@{}} \widetilde{\textup{\textbf{T}}}^{(1)} & = &\textup{\textbf{T}}^{(1)}\left[\widetilde{\textup{\textbf{V}}}^{\top}\textup{\textbf{T}}^{(1)}\right]^{-1}(\widetilde{\textup{\textbf{V}}}^{\top}\widetilde{\textup{\textbf{V}}})\\ & =& \widetilde{\textup{\textbf{V}}}\textup{\textbf{D}} (\textup{\textbf{U}}^{\top}\widetilde{\textup{\textbf{V}}})\left( \textup{\textbf{U}}^{\top}\widetilde{\textup{\textbf{V}}}\right)^{-1}\textup{\textbf{D}}^{-1}(\widetilde{\textup{\textbf{V}}}^{\top}\widetilde{\textup{\textbf{V}}})^{-1}(\widetilde{\textup{\textbf{V}}}^{\top}\widetilde{\textup{\textbf{V}}})\\ & = &\widetilde{\textup{\textbf{V}}}. \end{array} $$
Suppose that λ ∈ [0,v∗). Then, \(\lambda \max \limits _{j\in [K]}|\widetilde {\textup {\textbf {V}}}_{ij}| <\widetilde {\textup {\textbf {V}}}_{ik}\) for all i ∈ [n],k ∈ [K] such that Vik > 0, and hence \(\textup {\textbf {U}}^{(1)}=\mathcal {S}(\widetilde {\textup {\textbf {V}}}, \lambda ) = \widetilde {\textup {\textbf {V}}}\). Finally, since \(\|\widetilde {\textup {\textbf {V}}}_{\cdot ,k}\|_{2}=1\) for all k ∈ [K], then \(\textup {\textbf {V}}^{(1)}=\widetilde {\textup {\textbf {V}}}\). □
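The one-step fixed-point computation in the proof of Proposition 3 can be verified numerically. This NumPy sketch uses illustrative sizes and a random full-rank factor U; it carries out the multiplication and normalization steps as displayed above and omits the thresholding step, which (for λ below the threshold) leaves the iterate unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 10, 3

V = np.zeros((n, K))
V[:K] = np.eye(K)                          # pure rows, per Proposition 1
V[K:] = rng.uniform(0.0, 1.0, (n - K, K))
V_tilde = V / np.linalg.norm(V, axis=0)    # unit-norm columns
U = rng.uniform(0.1, 1.0, (n, K))
P = V @ U.T

# One iteration started at V_tilde: multiplication step, then the
# T_tilde = T [V_tilde^T T]^{-1} (V_tilde^T V_tilde) update from the proof.
T = P @ V_tilde
T_tilde = T @ np.linalg.inv(V_tilde.T @ T) @ (V_tilde.T @ V_tilde)
assert np.allclose(T_tilde, V_tilde)       # V_tilde is a fixed point
```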
Proof of Theorem 1.
The proof consists of a one-step fixed point
analysis of Algorithm 2. We will show that if Z(t) = Z, then Z(t+ 1) = Z with high probability. Let T = T(t+ 1) = AZ be the value after the multiplication step. Define \(\textup {\textbf {C}}\in \mathbb {R}^{K\times K}\) to be the diagonal matrix with community sizes on the diagonal, Ckk = nk = ∥Z⋅,k∥1. Then \(\widetilde {\textup {\textbf {T}}}=\widetilde {\textup {\textbf {T}}}^{(t+1)}= \textup {\textbf {T}}\textup {\textbf {C}}^{-1}\). In order for the threshold to set the correct set of entries to zero, it is sufficient that in each row i the largest element of \(\widetilde {\boldsymbol {T}}_{i,\cdot }\) corresponds to the correct community. Define \(\mathcal {C}_{k}\subset [n]\) as the subset of nodes in community k. Then,
$$ \widetilde{\textup{\textbf{T}}}_{ik} = \frac{1}{{n_{k}}} \textup{\textbf{A}}_{i,\cdot} \boldsymbol{Z}_{\cdot, k} = \frac{1}{{n_{k}}}\sum\limits_{j\in\mathcal{C}_{k}}\textup{\textbf{A}}_{ij}. $$
Therefore \(\widetilde {\textup {\textbf {T}}}_{ik}\) is an average of independent Bernoulli random variables. Moreover, for each pair k1≠k2 in [K], \(\widetilde {\textup {\textbf {T}}}_{ik_{1}}\) and \(\widetilde {\textup {\textbf {T}}}_{ik_{2}}\) are independent of each other.
Given a value of λ ∈ (0,1), let
$$\mathcal{E}_{i}(\lambda)=\left\{ \lambda |\widetilde{\boldsymbol{T}}_{ik_{i}}|> |\widetilde{\boldsymbol{T}}_{ik_{j}}| \text{ for all } k_{j}\neq k_{i}\right\},$$
where ki denotes the community of node i (that is, \(i\in \mathcal {C}_{k_{i}}\)), be the event that the largest entry of \(\widetilde {\textbf {T}}_{i\cdot }\) corresponds to ki, and all the other entries in that row are smaller in magnitude than \(\lambda |\widetilde {\textbf {T}}_{ik_{i}}|\). Let \(\textbf {U} = \textbf {U}^{(t+1)}=\mathcal {S}(\widetilde {\textbf {T}}^{(t+1)}, \lambda )\) be the matrix obtained after the thresholding step. Under the event \(\mathcal {E}(\lambda )=\bigcap _{i=1}^{n} \mathcal {E}_{i}(\lambda )\), we have that \(\|\textbf {U}_{i,\cdot }\|_{\infty } = \textbf {U}_{ik_{i}}\) for each i ∈ [n], and hence
$$\textup{\textbf{U}}_{ik}= \left\{\begin{array}{cl} \textup{\textbf{U}}_{ik_{i}} & \text{if }k=k_{i},\\ 0 & \text{otherwise.} \end{array}\right. $$
Therefore, under the event \(\mathcal {E}(\lambda )\), the thresholding step recovers the correct support, so Z(t+ 1) = Z.
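The fixed-point step can be illustrated by simulation. In this sketch the SBM parameters, seed, and equal community sizes are illustrative assumptions, and a hard threshold relative to the row maximum stands in for the operator \(\mathcal {S}\); we check only that the thresholding recovers the correct support Z.

```python
import numpy as np

rng = np.random.default_rng(3)
K, m = 3, 80                               # K communities, m nodes each
n = K * m
p, q, lam = 0.7, 0.05, 0.3                 # within/between edge probabilities

labels = np.repeat(np.arange(K), m)
Z = np.eye(K)[labels]                      # true membership matrix

# Symmetric SBM adjacency matrix without self-loops.
probs = np.where(labels[:, None] == labels[None, :], p, q)
A = (rng.uniform(size=(n, n)) < probs).astype(float)
A = np.triu(A, 1); A = A + A.T

# Multiplication and normalization steps: T_tilde = A Z C^{-1}.
sizes = Z.sum(axis=0)
T_tilde = (A @ Z) / sizes

# Thresholding: keep entries larger than lam times the row maximum.
row_max = T_tilde.max(axis=1, keepdims=True)
Z_new = (T_tilde > lam * row_max).astype(float)
assert np.array_equal(Z_new, Z)            # the true support is a fixed point
```

With these parameters the within-community averages concentrate near p and the between-community averages near q, so the event \(\mathcal {E}(\lambda )\) holds for every row by a wide margin.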
Now we verify that under the conditions of Theorem 1, the event \(\mathcal {E}(\lambda )\) happens with high probability. By a union bound,
$$ \begin{array}{@{}rcl@{}} \mathbb{P}(\mathcal{E}(\lambda)) \geq 1-\sum\limits_{i=1}^{n} \mathbb{P}(\mathcal{E}_{i}(\lambda)^{C}) \geq 1-\sum\limits_{i=1}^{n}\sum\limits_{j\neq k_{i}}\mathbb{P}(\widetilde{\textup{\textbf{T}}}_{ij}>\lambda \widetilde{\textup{\textbf{T}}}_{ik_{i}}). \end{array} $$
(A.2)
For j≠ki, \(\widetilde {\textup {\textbf {T}}}_{ij}-\lambda \widetilde {\textup {\textbf {T}}}_{ik_{i}}\) is a sum of independent random variables with expectation
$$ \begin{array}{@{}rcl@{}} \mathbb{E}\left[\widetilde{\textup{\textbf{T}}}_{ij}-\lambda \widetilde{\textup{\textbf{T}}}_{ik_{i}}\right] & = & \frac{1}{n_{j}}\sum\limits_{l\in\mathcal{C}_{j}}\mathbb{E}[\textup{\textbf{A}}_{il}] - \frac{\lambda}{n_{k_{i}}}\sum\limits_{l\in\mathcal{C}_{k_{i}}}\mathbb{E}[\textup{\textbf{A}}_{il}]\\ & = & q - \lambda \frac{n_{k_{i}}-1}{n_{k_{i}}}p. \end{array} $$
(A.3)
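The expectation in Eq. A.3 can be checked exactly (no sampling involved) from the population adjacency matrix. In this sketch the parameter values and equal community sizes are illustrative choices.

```python
import numpy as np

K, m = 3, 50
n = K * m
p, q, lam = 0.6, 0.1, 0.4
labels = np.repeat(np.arange(K), m)

# Expected adjacency: p within communities, q between, zero diagonal (no self-loops).
EA = np.where(labels[:, None] == labels[None, :], p, q)
np.fill_diagonal(EA, 0.0)

i, k_i, j = 0, 0, 1                        # node 0 belongs to community 0; compare column j = 1
ET_ij = EA[i, labels == j].sum() / m       # E[T_tilde_{ij}]
ET_iki = EA[i, labels == k_i].sum() / m    # E[T_tilde_{i k_i}]

# Matches Eq. (A.3) with n_j = n_{k_i} = m.
assert np.isclose(ET_ij - lam * ET_iki, q - lam * (m - 1) / m * p)
```

The (m − 1)/m factor comes from the missing self-loop A_ii = 0 inside node i's own community.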
By Hoeffding’s inequality, we have that for any \(\tau \in \mathbb {R}\),
$$ \begin{array}{@{}rcl@{}} \mathbb{P}\left( \widetilde{\textup{\textbf{T}}}_{ij}-\lambda \widetilde{\textup{\textbf{T}}}_{ik_{i}} \geq \tau + \mathbb{E}\left[\widetilde{\textup{\textbf{T}}}_{ij} - \lambda \widetilde{\textup{\textbf{T}}}_{ik_{i}}\right] \right) & \leq& 2\exp\left( \frac{-2\tau^{2}}{\frac{1}{n_{j}} + \frac{\lambda^{2}}{n_{k_{i}}}}\right)\\ & \leq& 2\exp\left( - \frac{2n_{\min} \tau^{2}}{1+\lambda^{2}}\right) \\&\leq& 2\exp\left( -n_{\min} \tau^{2}\right), \end{array} $$
where \(n_{\min \limits } = \min \limits _{k\in [K]}n_{k}\). Setting
$$ \begin{array}{@{}rcl@{}} \tau = -\mathbb{E}\left[\widetilde{\textup{\textbf{T}}}_{ij}- \lambda \widetilde{\textup{\textbf{T}}}_{ik_{i}}\right] \geq \lambda^{\ast} p - q - \frac{1}{n_{k_{i}}}p \end{array} $$
and using Eqs. A.3 and (3.6), we obtain that for n sufficiently large,
$$ \begin{array}{@{}rcl@{}} \mathbb{P}\left( \widetilde{\textup{\textbf{T}}}_{ij}>\lambda \widetilde{\textup{\textbf{T}}}_{ik_{i}} \right) & \leq& 2\exp\left( -n_{\min} \left( c_{1}\sqrt{\frac{\log(Kn)}{n_{\min}}} -\frac{p}{n_{k_{i}}} \right)^{2} \right)\\ & \leq &2\exp\left( -n_{\min} \left( (c_{1}-1)\frac{\log(Kn)}{n_{\min}} \right) \right)= \frac{2}{(Kn)^{c_{1}-1}}. \end{array} $$
Combining with the bound (A.2), the probability of event \(\mathcal {E}(\lambda )\) (which implies that Z(t+ 1) = Z) is bounded from below as
$$ \begin{array}{@{}rcl@{}} \mathbb{P}(\mathcal{E}(\lambda)) & \geq& 1- n(K-1)\max_{i\in[n],\, j\neq k_{i}}\mathbb{P}\left( \widetilde{\textup{\textbf{T}}}_{ij}>\lambda \widetilde{\textup{\textbf{T}}}_{ik_{i}} \right)\\ & \geq& 1-\frac{2(K-1)n}{(Kn)^{c_{1}-1}}\\ &\geq& 1- \frac{2}{Kn^{(c_{1} -2)}}. \end{array} $$
Therefore, with high probability Z is a fixed point of Algorithm 2 for any λ ∈ (λ∗,1). □
Proof of Proposition 4.
Observe that
$$ \begin{array}{@{}rcl@{}} \|\textup{\textbf{A}} -\widehat{\textup{\textbf{V}}}\textup{\textbf{B}}\widehat{\textup{\textbf{V}}}^{\top}\|_{F}^{2} & = &\text{Tr}(\textup{\textbf{A}}^{\top}\textup{\textbf{A}}) - 2\text{Tr}(\widehat{\textup{\textbf{V}}}^{\top}\textup{\textbf{A}}^{\top}\widehat{\textup{\textbf{V}}}\textup{\textbf{B}}) + \text{Tr}(\textup{\textbf{B}}^{\top}\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}}\textup{\textbf{B}}\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})\\ & =& \left\|(\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{1/2}\left(\textup{\textbf{B}} - (\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{-1}\widehat{\textup{\textbf{V}}}^{\top}\textup{\textbf{A}}\widehat{\textup{\textbf{V}}}(\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{-1}\right)(\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{1/2}\right\|_{F}^{2} + C, \end{array} $$
where C is a constant that does not depend on B. Therefore, the minimizer is
$$ \begin{array}{@{}rcl@{}} \widehat{\textup{\textbf{B}}} & = &\underset{\textup{\textbf{B}}\in\mathbb{R}^{K\times K}}{\arg\min}\|\textup{\textbf{A}} -\widehat{\textup{\textbf{V}}}\textup{\textbf{B}}\widehat{\textup{\textbf{V}}}^{\top}\|_{F}^{2} = (\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{-1}\widehat{\textup{\textbf{V}}}^{\top}\textup{\textbf{A}}\widehat{\textup{\textbf{V}}}(\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{-1}, \end{array} $$
and hence
$$ \begin{array}{@{}rcl@{}} \widehat{\textup{\textbf{P}}} & = & \widehat{\textup{\textbf{V}}}\widehat{\textup{\textbf{B}}}\widehat{\textup{\textbf{V}}}^{\top} = \widehat{\textup{\textbf{V}}}(\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{-1}\widehat{\textup{\textbf{V}}}^{\top}\textup{\textbf{A}}\widehat{\textup{\textbf{V}}}(\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{-1}\widehat{\textup{\textbf{V}}}^{\top}. \end{array} $$
Suppose that \(\widehat {\boldsymbol {V}} = \widehat {\boldsymbol {Q}}\widehat {\boldsymbol {R}}\) for some n × K matrix \(\widehat {\boldsymbol {Q}}\) with orthonormal columns. Then, \(\widehat {\boldsymbol {R}}\) is a full rank matrix, and therefore
$$(\widehat{\textup{\textbf{V}}}^{\top}\widehat{\textup{\textbf{V}}})^{-1} = \widehat{\textup{\textbf{R}}}^{-1}(\widehat{\textup{\textbf{Q}}}^{\top}\widehat{\textup{\textbf{Q}}})^{-1}(\widehat{\textup{\textbf{R}}}^{\top})^{-1} = \widehat{\textup{\textbf{R}}}^{-1}(\widehat{\textup{\textbf{R}}}^{\top})^{-1},$$
since \(\widehat {\textup {\textbf {Q}}}^{\top }\widehat {\textup {\textbf {Q}}} = \textup {\textbf {I}}_{K}\).
Using this equation, we obtain the desired result. □
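The closed form for \(\widehat {\textup {\textbf {P}}}\) and the QR simplification can be verified numerically. In this NumPy sketch the symmetric matrix standing in for an adjacency and the sizes are illustrative assumptions; the perturbation check confirms that \(\widehat {\textup {\textbf {B}}}\) minimizes the least-squares objective.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 12, 3

A = rng.uniform(size=(n, n)); A = (A + A.T) / 2    # symmetric stand-in for an adjacency matrix
V_hat = rng.uniform(0.1, 1.0, (n, K))              # full-column-rank estimate

G = V_hat.T @ V_hat
B_hat = np.linalg.solve(G, V_hat.T @ A @ V_hat) @ np.linalg.inv(G)
P_hat = V_hat @ B_hat @ V_hat.T

# Via the QR factorization V_hat = Q R, P_hat is the two-sided projection Q Q^T A Q Q^T.
Q, R = np.linalg.qr(V_hat)
assert np.allclose(P_hat, Q @ Q.T @ A @ Q @ Q.T)

# B_hat minimizes the least-squares objective: nearby perturbations do no better.
obj = lambda B: np.linalg.norm(A - V_hat @ B @ V_hat.T) ** 2
for _ in range(5):
    E = rng.normal(scale=0.01, size=(K, K))
    assert obj(B_hat) <= obj(B_hat + E)
```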