Mixed membership distribution-free model

Qing, Huan; Wang, Jingli

doi:10.1007/s10115-023-02021-2

Regular Paper
Published: 27 November 2023

Volume 66, pages 879–904, (2024)

Knowledge and Information Systems

Huan Qing¹ &
Jingli Wang²

Abstract

We consider the problem of community detection in overlapping weighted networks, where nodes can belong to multiple communities and edge weights can be any finite real numbers. To model such complex networks, we propose a general framework, the mixed membership distribution-free (MMDF) model. MMDF places no distributional constraints on edge weights and can be viewed as a generalization of some previous models, including the well-known mixed membership stochastic blockmodels. In particular, overlapping signed networks with latent community structures can also be generated from our model. We use an efficient spectral algorithm with a theoretical guarantee on its convergence rate to estimate community memberships under the model. We also propose the fuzzy weighted modularity to evaluate the quality of community detection for overlapping weighted networks with positive and negative edge weights. We then provide a method to determine the number of communities for weighted networks by taking advantage of our fuzzy weighted modularity. Numerical simulations and real data applications are carried out to demonstrate the usefulness of our mixed membership distribution-free model and our fuzzy weighted modularity.

References

1. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
2. Fortunato S, Hric D (2016) Community detection in networks: a user guide. Phys Rep 659:1–44
3. Papadopoulos S, Kompatsiaris Y, Vakali A, Spyridonos P (2012) Community detection in social media. Data Min Knowl Disc 24(3):515–554
4. Newman MEJ (2004) Analysis of weighted networks. Phys Rev E 70(5):056131
5. Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends® Mach Learn Arch 2(2):129–233
6. Holland PW, Laskey KB, Leinhardt S (1983) Stochastic blockmodels: first steps. Soc Netw 5(2):109–137
7. Abbe E (2017) Community detection and stochastic block models: recent developments. J Mach Learn Res 18(1):6446–6531
8. Xie J, Kelley S, Szymanski BK (2013) Overlapping community detection in networks: the state-of-the-art and comparative study. ACM Comput Surv (CSUR) 45(4):1–35
9. Airoldi EM, Blei DM, Fienberg SE, Xing EP (2008) Mixed membership stochastic blockmodels. J Mach Learn Res 9:1981–2014
10. Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83(1):016107
11. Zhang Y, Levina E, Zhu J (2020) Detecting overlapping communities in networks using spectral methods. SIAM J Math Data Sci 2(2):265–283
12. Jin J, Ke ZT, Luo S (2023) Mixed membership estimation for social networks. J Econom. https://doi.org/10.1016/j.jeconom.2022.12.003
13. Rohe K, Chatterjee S, Yu B (2011) Spectral clustering and the high-dimensional stochastic blockmodel. Ann Stat 39(4):1878–1915
14. Choi DS, Wolfe PJ, Airoldi EM (2011) Stochastic blockmodels with a growing number of classes. Biometrika 99(2):273–284
15. Lei J, Rinaldo A (2015) Consistency of spectral clustering in stochastic block models. Ann Stat 43(1):215–237
16. Abbe E, Sandon C (2015) Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery. In: 2015 IEEE 56th annual symposium on foundations of computer science, pp 670–688
17. Jin J (2015) Fast community detection by SCORE. Ann Stat 43(1):57–89
18. Joseph A, Yu B (2016) Impact of regularization on spectral clustering. Ann Stat 44(4):1765–1791
19. Abbe E, Bandeira AS, Hall G (2016) Exact recovery in the stochastic block model. IEEE Trans Inf Theory 62(1):471–487
20. Chen Y, Li X, Xu J (2018) Convexified modularity maximization for degree-corrected stochastic block models. Ann Stat 46(4):1573–1602
21. Mao X, Sarkar P, Chakrabarti D (2020) Estimating mixed memberships with sharp eigenvector deviations. J Am Stat Assoc 116(536):1928–1940
22. Qing H, Wang J (2023) Regularized spectral clustering under the mixed membership stochastic block model. Neurocomputing 550:126490
23. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
24. Opsahl T, Panzarasa P (2009) Clustering in weighted networks. Soc Netw 31(2):155–163
25. Colizza V, Pastor-Satorras R, Vespignani A (2007) Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nat Phys 3(4):276–282
26. Opsahl T, Colizza V, Panzarasa P, Ramasco JJ (2008) Prominence and control: the weighted rich-club effect. Phys Rev Lett 101(16):168702
27. Liu X, Bollen J, Nelson ML, Sompel H (2005) Co-authorship networks in the digital library research community. Inf Process Manag 41(6):1462–1480
28. Read KE (1954) Cultures of the central highlands, New Guinea. Southwest J Anthropol 10(1):1–43
29. Yang B, Cheung W, Liu J (2007) Community mining from signed social networks. IEEE Trans Knowl Data Eng 19(10):1333–1348
30. Kunegis J, Lommatzsch A, Bauckhage C (2009) The slashdot zoo: mining a social network with negative edges. In: Proceedings of the 18th international conference on World Wide Web, pp 741–750
31. Tang J, Chang Y, Aggarwal C, Liu H (2016) A survey of signed network mining in social media. ACM Comput Surv (CSUR) 49(3):1–37
32. Brandes U, Kenis P, Lerner J, Van Raaij D (2009) Network analysis of collaboration structure in Wikipedia. In: Proceedings of the 18th international conference on World Wide Web, pp 731–740
33. Kunegis J (2013) Konect: the Koblenz network collection. In: Proceedings of the 22nd international conference on World Wide Web, pp 1343–1350
34. Aicher C, Jacobs AZ, Clauset A (2015) Learning latent block structure in weighted networks. J Complex Netw 3(2):221–248
35. Palowitch J, Bhamidi S, Nobel AB (2018) Significance-based community detection in weighted networks. J Mach Learn Res 18(188):1–48
36. Xu M, Jog V, Loh P-L (2020) Optimal rates for community estimation in the weighted stochastic block model. Ann Stat 48(1):183–204
37. Ng TLJ, Murphy TB (2021) Weighted stochastic block model. Statist Methods Appl 30:1365–1398
38. Qing H (2023) Distribution-free model for community detection. Prog Theor Exp Phys 2023(3):033A01
39. Qing H, Wang J (2023) Community detection for weighted bipartite networks. Knowl-Based Syst 274:110643
40. Airoldi EM, Wang X, Lin X (2013) Multi-way blockmodels for analyzing coordinated high-dimensional responses. Ann Appl Stat 7(4):2431–2457
41. Mao X, Sarkar P, Chakrabarti D (2018) Overlapping clustering models, and one (class) svm to bind them all. Adv Neural Inf Process Syst 31:2126–2136
42. Dulac A, Gaussier E, Largeron C (2020) Mixed-membership stochastic block models for weighted networks. In: Conference on uncertainty in artificial intelligence (UAI), vol. 124, pp 679–688
43. Erdos P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5(1):17–60
44. Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):046110
45. Lancichinetti A, Fortunato S (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):016118
46. Gillis N, Vavasis SA (2015) Semidefinite programming based preconditioning for more robust near-separable nonnegative matrix factorization. SIAM J Optim 25(1):677–698
47. Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
48. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582
49. Gómez S, Jensen P, Arenas A (2009) Analysis of community structure in networks of correlated data. Phys Rev E 80(1):016114
50. Nepusz T, Petróczi A, Négyessy L, Bazsó F (2008) Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E 77(1):016107
51. Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3(Dec):583–617
52. Danon L, Diaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech: Theory Exp 2005(09):09008
53. Bagrow JP (2008) Evaluating local community methods in networks. J Stat Mech: Theory Exp 2008(05):05001
54. Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
55. Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th annual international conference on machine learning, pp 1073–1080
56. Mao X, Sarkar P, Chakrabarti D (2017) On mixed memberships and symmetric nonnegative matrix factorizations, pp 2324–2333
57. Le CM, Levina E (2022) Estimating the number of communities by spectral methods. Electron J Stat 16(1):3315–3342
58. Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New J Phys 11(3):033015
59. Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
60. Ferligoj A, Kramberger A (1996) An analysis of the Slovene parliamentary parties network. Dev Stati Method 12:209–216
61. Hayes B (2006) Connecting the dots. Am Sci 94(5):400–404
62. Knuth DE (1993) The Stanford GraphBase: a platform for combinatorial computing, vol 37. Addison-Wesley Reading, New York
63. Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog, pp 36–43
64. Opsahl T (2011) Why anchorage is not (that) important: binary ties and sample selection. [online] http://toreopsahl.com
65. Newman ME (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci 98(2):404–409
66. Zhang H, Guo X, Chang X (2022) Randomized spectral clustering in large-scale stochastic block models. J Comput Graph Stat 31(3):887–906
67. Tropp JA (2012) User-friendly tail bounds for sums of random matrices. Found Comput Math 12(4):389–434
68. Cape J, Tang M, Priebe CE (2019) The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics. Ann Stat 47(5):2405–2439
69. Chen Y, Chi Y, Fan J, Ma C (2021) Spectral methods for data science: a statistical perspective. Found Trends® Mach Learn 14(5):566–806

Acknowledgements

Wang’s work was supported by the Fundamental Research Funds for the Central Universities, Nankai University, 63231186 and the National Natural Science Foundation of China (Grant 12001295, 12271272).

Author information

Authors and Affiliations

School of Economics and Finance, Chongqing University of Technology, Chongqing, 400054, Chongqing, China
Huan Qing
School of Statistics and Data Science, KLMDASR, LEBPS, and LPMC, Nankai University, Tianjin, 300071, Tianjin, China
Jingli Wang

Contributions

HQ was involved in conceptualization, methodology, investigation, software, formal analysis, data curation, writing—original draft, writing—reviewing and editing. JW helped in writing—reviewing and editing, funding acquisition.

Corresponding author

Correspondence to Huan Qing.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Vertex hunting algorithm

Algorithm 2 is the SP algorithm.
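
Since the algorithm listing itself is not reproduced here, the following is a minimal Python sketch of the successive projection (SP) idea as it is commonly described: repeatedly pick the row with the largest Euclidean norm, then project all rows onto the orthogonal complement of that row. The function name `successive_projection` and the toy data are illustrative assumptions, not the authors' code, and the sketch assumes the input has rank at least K so that the selected row is never zero.

```python
import numpy as np

def successive_projection(Y, K):
    """Return K row indices of Y chosen by successive projection (SP)."""
    R = np.array(Y, dtype=float)               # residual matrix (copy, Y is untouched)
    selected = []
    for _ in range(K):
        norms = np.einsum("ij,ij->i", R, R)    # squared Euclidean norm of each row
        k = int(np.argmax(norms))              # row farthest from the origin
        selected.append(k)
        u = R[k] / np.sqrt(norms[k])           # unit vector along the chosen row
        R = R - np.outer(R @ u, u)             # remove the component along u from every row
    return selected

# Toy data (hypothetical): rows are convex combinations of three corner points.
corners = np.array([[2.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.0]])
weights = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                    [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.4, 0.4, 0.2]])
Y = weights @ corners
idx = successive_projection(Y, 3)
print(idx)  # indices of the rows that coincide with the corners
```

In this toy example the first three rows are the corners, so the sketch should return their indices; in the setting of this paper the input would be the eigenvector matrix and the returned indices would estimate the pure-node index set.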

Appendix B Proofs under MMDF

B.1 Proof of Proposition 1

Proof

This proposition follows immediately from the first statement of Theorem 2.1 of [21]: P is assumed to be a full rank matrix, and Theorem 2.1 of [21] is a distribution-free result, so it holds without any constraint on the distribution of A. $\square $

B.2 Proof of Lemma 1

Proof

Since $\Omega =\Pi \rho P\Pi '=U\Lambda U'$ and $U'U=I_{K}$, we have $U=\Pi \rho P\Pi 'U\Lambda ^{-1}$, i.e., $U=\Pi B$ with $B=\rho P\Pi ' U\Lambda ^{-1}$. So B is unique. Since $U=\Pi B$ and $\Pi (\mathcal {I},:)=I_{K}$, we have $U(\mathcal {I},:)=\Pi (\mathcal {I},:)B=B$ and the lemma follows. $\square $
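
The identity in this proof can be checked numerically. The sketch below is an illustration under assumed values: the matrix `P`, the sizes `n`, `K`, the scaling `rho`, and the choice of the first K nodes as the pure-node set are all hypothetical. It builds $\Omega$, takes its K leading eigenpairs, forms $B=\rho P\Pi 'U\Lambda ^{-1}$, and compares $U$ with $\Pi B$ and $U(\mathcal {I},:)$ with $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, rho = 30, 3, 0.5
# Hypothetical full-rank connectivity matrix with a negative entry.
P = np.array([[1.0, 0.2, -0.1], [0.2, 0.9, 0.3], [-0.1, 0.3, 0.8]])

Pi = rng.dirichlet(np.ones(K), size=n)     # mixed rows, each sums to 1
pure = [0, 1, 2]                           # index set I: one pure node per community
Pi[pure] = np.eye(K)                       # so that Pi(I,:) = I_K

Omega = rho * Pi @ P @ Pi.T
vals, vecs = np.linalg.eigh(Omega)
top = np.argsort(-np.abs(vals))[:K]        # K leading eigenvalues by magnitude
U, Lam = vecs[:, top], np.diag(vals[top])

B = rho * P @ Pi.T @ U @ np.linalg.inv(Lam)
print(np.allclose(U, Pi @ B), np.allclose(U[pure], B))
```

Both comparisons hold for any orthonormal basis of the leading eigenspace, which is why the sign ambiguity of the eigenvectors returned by `eigh` does not matter here.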

B.3 Proof of Theorem 1

Proof

First, we prove the following lemma, which provides an upper bound on the row-wise eigenspace error $\Vert \hat{U}\hat{U}'-UU'\Vert _{2\rightarrow \infty }$.

Lemma 2

(Row-wise eigenspace error) Under $MMDF_{n}(K,P,\Pi ,\rho ,\mathcal {F})$, suppose Assumption 1 holds and $\sigma _{K}(\Omega )\ge C\sqrt{\gamma \rho n\textrm{log}(n)}$ for some $C>0$. Then, with probability at least $1-o(n^{-3})$, we have

$$\begin{aligned} \Vert \hat{U}\hat{U}'-UU'\Vert _{2\rightarrow \infty }=O(\frac{\sqrt{\gamma n\textrm{log}(n)}}{\sigma _{K}(P)\rho ^{0.5} \lambda ^{1.5}_{K}(\Pi '\Pi )}). \end{aligned}$$

Proof

First, we use Theorem 1.4 (the matrix Bernstein inequality) of [67] to build an upper bound on $\Vert A-\Omega \Vert _{\infty }$. This theorem is stated below.

Theorem 2

Consider a finite sequence $\{X_{k}\}$ of independent, random, self-adjoint matrices with dimension d. Assume that each random matrix satisfies

$$\begin{aligned} \mathbb {E}[X_{k}]=0, \mathrm {and~}\Vert X_{k}\Vert \le R~\mathrm {almost~surely}. \end{aligned}$$

Then, for all $t\ge 0$,

$$\begin{aligned} \mathbb {P}(\Vert \sum _{k}X_{k}\Vert \ge t)\le d\cdot \textrm{exp}(\frac{-t^{2}/2}{\sigma ^{2}+Rt/3}), \end{aligned}$$

where $\sigma ^{2}:=\Vert \sum _{k}\mathbb {E}(X^{2}_{k})\Vert $.

Let $x=(x_{1},x_{2},\ldots , x_{n})'$ be any $n\times 1$ vector. For any $i,j\in [n]$, we have $\mathbb {E}[(A(i,j)-\Omega (i,j))x(j)]=0$ and $\Vert (A(i,j)-\Omega (i,j))x(j)\Vert \le \tau \Vert x\Vert _{\infty }$. Set $R=\tau \Vert x\Vert _{\infty }$. Since $\Vert \sum _{j=1}^{n}\mathbb {E}[(A(i,j)-\Omega (i,j))^{2}x^{2}(j)]\Vert =\Vert \sum _{j=1}^{n}x^{2}(j)\mathbb {E}[(A(i,j)-\Omega (i,j))^{2}]\Vert =\Vert \sum _{j=1}^{n}x^{2}(j)\textrm{Var}(A(i,j))\Vert \le \gamma \rho \sum _{j=1}^{n}x^{2}(j)$, by Theorem 2, for any $t\ge 0$ and $i\in [n]$, we have

$$\begin{aligned} \mathbb {P}(|\sum _{j=1}^{n}(A(i,j)-\Omega (i,j))x(j)|>t)\le 2\textrm{exp}(-\frac{t^{2}/2}{\gamma \rho \sum _{j=1}^{n}x^{2}(j)+\frac{Rt}{3}}). \end{aligned}$$

Setting $x(j)$ to 1 or $-1$ such that $(A(i,j)-\Omega (i,j))x(j)=|A(i,j)-\Omega (i,j)|$, so that $\sum _{j=1}^{n}x^{2}(j)=n$, we have

$$\begin{aligned} \mathbb {P}(\Vert A-\Omega \Vert _{\infty }>t)\le 2\textrm{exp}(-\frac{t^{2}/2}{\gamma \rho n+\frac{Rt}{3}}). \end{aligned}$$

Set $t=\frac{\alpha +1+\sqrt{(\alpha +1)(\alpha +19)}}{3}\sqrt{\gamma \rho n\textrm{log}(n)}$ for any $\alpha >0$. By Assumption 1, we have

$$\begin{aligned} \mathbb {P}(\Vert A-\Omega \Vert _{\infty }>t)\le 2\textrm{exp}(-\frac{t^{2}/2}{\gamma \rho n+\frac{Rt}{3}})\le n^{-\alpha }. \end{aligned}$$
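
The constant in the choice of t can be motivated with a short calculation. The sketch below assumes that Assumption 1 has the form $\gamma \rho n\ge \tau ^{2}\textrm{log}(n)$ (it is not restated in this appendix, so this form is an assumption) and evaluates the Bernstein exponent at the boundary case $R=\tau =\sqrt{\gamma \rho n/\textrm{log}(n)}$. The helper names `c_alpha` and `bernstein_exponent` and the numeric inputs are hypothetical.

```python
import math

def c_alpha(alpha):
    """Constant multiplying sqrt(gamma*rho*n*log n) in the choice of t."""
    return (alpha + 1 + math.sqrt((alpha + 1) * (alpha + 19))) / 3

def bernstein_exponent(alpha, gamma_rho_n, log_n):
    """(t^2/2) / (gamma*rho*n + R*t/3) at the assumed boundary R = sqrt(gamma*rho*n / log n)."""
    t = c_alpha(alpha) * math.sqrt(gamma_rho_n * log_n)
    R = math.sqrt(gamma_rho_n / log_n)     # assumed worst case allowed by Assumption 1
    return (t * t / 2) / (gamma_rho_n + R * t / 3)

alpha, gamma_rho_n, log_n = 3.0, 500.0, math.log(1000)
ratio = bernstein_exponent(alpha, gamma_rho_n, log_n) / log_n
print(round(ratio, 6))  # exponent divided by log n
```

Under that assumption the constant solves $c^{2}=(\alpha +1)(2+2c/3)$, so the exponent equals $(\alpha +1)\textrm{log}(n)$ regardless of the value of $\gamma \rho n$. This is consistent with a rate of order $n^{-\alpha }$ once one factor of n is absorbed, for example by a union bound over the n rows.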

By Theorem 4.2 of [68], when $\sigma _{K}(\Omega )\ge 4\Vert A-\Omega \Vert _{\infty }$, we have

$$\begin{aligned} \Vert \hat{U}-U\mathcal {O}\Vert _{2\rightarrow \infty }\le 14\frac{\Vert A-\Omega \Vert _{\infty }}{\sigma _{K}(\Omega )}\Vert U\Vert _{2\rightarrow \infty }, \end{aligned}$$

where $\mathcal {O}$ is a $K\times K$ orthogonal matrix. With probability at least $1-o(n^{-\alpha })$, we have

$$\begin{aligned} \Vert \hat{U}-U\mathcal {O}\Vert _{2\rightarrow \infty }=O(\frac{\Vert U\Vert _{2\rightarrow \infty }\sqrt{\gamma \rho n\textrm{log}(n)}}{\sigma _{K}(\Omega )}). \end{aligned}$$

Since $\hat{U}'\hat{U}=I_{K},U'U=I_{K}$, by basic algebra, we have $\Vert \hat{U}\hat{U}'-UU'\Vert _{2\rightarrow \infty }\le 2\Vert \hat{U}-U\mathcal {O}\Vert _{2\rightarrow \infty }$, which gives

$$\begin{aligned} \Vert \hat{U}\hat{U}'-UU'\Vert _{2\rightarrow \infty }=O(\frac{\Vert U\Vert _{2\rightarrow \infty }\sqrt{\gamma \rho n\textrm{log}(n)}}{\sigma _{K}(\Omega )}). \end{aligned}$$

Since $\sigma _{K}(\Omega )\ge \sigma _{K}(P)\rho \lambda _{K}(\Pi '\Pi )$ by Lemma II.4 of [21] and $\Vert U\Vert ^{2}_{2\rightarrow \infty }\le \frac{1}{\lambda _{K}(\Pi '\Pi )}$ by Lemma 3.1 of [21], where these two lemmas are distribution-free and always hold as long as Eqs. (2), (4), and (5) hold, we have

$$\begin{aligned} \Vert \hat{U}\hat{U}'-UU'\Vert _{2\rightarrow \infty }=O(\frac{\sqrt{\gamma n\textrm{log}(n)}}{\sigma _{K}(P)\rho ^{0.5} \lambda ^{1.5}_{K}(\Pi '\Pi )}). \end{aligned}$$

Set $\alpha =3$, and this claim follows.
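
The two deterministic inequalities used in the last step, $\sigma _{K}(\Omega )\ge \sigma _{K}(P)\rho \lambda _{K}(\Pi '\Pi )$ and $\Vert U\Vert ^{2}_{2\rightarrow \infty }\le 1/\lambda _{K}(\Pi '\Pi )$, can be illustrated numerically. The sketch below uses a hypothetical `P`, a randomly drawn membership matrix, and arbitrary values of `n`, `K`, and `rho`; it is a sanity check on one instance, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, rho = 200, 3, 0.3
# Hypothetical full-rank connectivity matrix.
P = np.array([[1.0, 0.2, -0.1], [0.2, 0.9, 0.3], [-0.1, 0.3, 0.8]])
Pi = rng.dirichlet(np.ones(K), size=n)
Pi[:K] = np.eye(K)                                    # at least one pure node per community

Omega = rho * Pi @ P @ Pi.T
vals, vecs = np.linalg.eigh(Omega)
top = np.argsort(-np.abs(vals))[:K]
U = vecs[:, top]                                      # K leading eigenvectors of Omega

lam_K = np.linalg.eigvalsh(Pi.T @ Pi)[0]              # smallest eigenvalue of Pi'Pi
sigma_K_Omega = np.sort(np.abs(vals))[::-1][K - 1]    # K-th singular value of symmetric Omega
sigma_K_P = np.linalg.svd(P, compute_uv=False)[-1]    # smallest singular value of P
two_to_inf_sq = np.max(np.sum(U**2, axis=1))          # squared largest row norm of U

print(sigma_K_Omega, sigma_K_P * rho * lam_K)         # left value should not be smaller
print(two_to_inf_sq, 1 / lam_K)                       # left value should not be larger
```

Substituting these two bounds into the previous display is what turns the factor $\Vert U\Vert _{2\rightarrow \infty }\sqrt{\rho }/\sigma _{K}(\Omega )$ into $1/(\sigma _{K}(P)\rho ^{0.5}\lambda ^{1.5}_{K}(\Pi '\Pi ))$.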

Remark 6

Alternatively, Theorem 4.2 of [69] can also be applied to obtain an upper bound on $\Vert \hat{U}\hat{U}'-UU'\Vert _{2\rightarrow \infty }$, and this bound is similar to the one in Lemma 2.

For convenience, set $\varpi =\Vert \hat{U}\hat{U}'-UU'\Vert _{2\rightarrow \infty }$. Since DFSP is the SPACL algorithm of [21] without the prune step, the proof of DFSP’s consistency is the same as that of SPACL except for the row-wise eigenspace error step, where we need to account for $\gamma $, which is directly related to the distribution $\mathcal {F}$. By Lemma 2 and Equation (3) in Theorem 3.2 of [21], whose proof is distribution-free, there exists a $K\times K$ permutation matrix $\mathcal {P}$ such that

$$\begin{aligned} \textrm{max}_{i\in [n]}\Vert e'_{i}({\hat{\Pi }}-\Pi \mathcal {P})\Vert _{1}=O(\varpi \kappa (\Pi '\Pi )\sqrt{\lambda _{1}(\Pi '\Pi )})=O(\frac{\kappa ^{1.5}(\Pi '\Pi )\sqrt{\gamma n\textrm{log}(n)}}{\sigma _{K}(P)\rho ^{0.5} \lambda _{K}(\Pi '\Pi )}). \end{aligned}$$

B.4 Proof of Corollary 1

Proof

When $\lambda _{K}(\Pi '\Pi )=O(\frac{n}{K})$ and $K=O(1)$, we have $\kappa (\Pi '\Pi )=O(1)$ and $\lambda _{K}(\Pi '\Pi )=O(n/K)=O(n)$. Then, the corollary follows immediately by Theorem 1. $\square $

Appendix C Extra simulation results

In this part, we consider two extra simulations: imbalanced networks and running time. For imbalanced networks, we study the stability of DFSP and its competitors when there are small-size communities. For running time, we compare the running time of each method as the network size n increases. For simplicity, we only consider the case when $\mathcal {F}$ is the Normal distribution here. When $A(i,j)\sim \textrm{Normal}(\Omega (i,j),\sigma ^{2}_{A})$, let all nodes be pure, $K=2, \rho =1$, and $\sigma ^{2}_{A}=1$. Set P as

$$\begin{aligned}P=\begin{bmatrix} 1&-0.2\\ -0.2&0.9 \end{bmatrix}.\end{aligned}$$

Let the first community have $\delta n$ nodes, so that the second community has $(1-\delta )n$ nodes. Based on the above settings, we consider the following two simulations.
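
The setup above can be sketched in a few lines of Python. The function name `simulate_imbalanced_normal`, the seed handling, the symmetrization of the noise, and the presence of noise on the diagonal are assumptions made for illustration; the text only fixes P, $K=2$, $\rho =1$, $\sigma ^{2}_{A}=1$, pure nodes, and the community sizes.

```python
import numpy as np

def simulate_imbalanced_normal(n, delta, rho=1.0, sigma=1.0, seed=0):
    """Draw a symmetric A with A(i,j) ~ Normal(Omega(i,j), sigma^2) for two pure communities."""
    rng = np.random.default_rng(seed)
    P = np.array([[1.0, -0.2], [-0.2, 0.9]])
    n1 = int(round(delta * n))                    # size of the first (minority) community
    labels = np.r_[np.zeros(n1, dtype=int), np.ones(n - n1, dtype=int)]
    Pi = np.eye(2)[labels]                        # all nodes pure: one-hot membership rows
    Omega = rho * Pi @ P @ Pi.T                   # expectation of A
    noise = rng.normal(0.0, sigma, size=(n, n))
    # Assumption: keep the upper triangle (with diagonal) and mirror it for symmetry.
    A = Omega + np.triu(noise) + np.triu(noise, 1).T
    return A, Pi, Omega

A, Pi, Omega = simulate_imbalanced_normal(n=200, delta=0.075)
print(A.shape, Pi.sum(axis=0))                    # matrix shape and the two community sizes
```

Because P has a negative off-diagonal entry and the noise is Gaussian, the generated A contains both positive and negative weights, which is the kind of network the model is meant to cover.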

Changing $\delta $: Let $n=200$ or $n=1000$. Let $\delta $ range in $\{0.025, 0.05, 0.075, \ldots , 0.5\}$. For this case, the two evaluation metrics Hamming error and Relative error are not suitable for imbalanced networks. To prioritize the ability of DFSP and its competitors to detect the minority communities, we consider the following two metrics, for both of which smaller values are better.

$$\begin{aligned}&\textrm{Clustering}~l_{1}~\textrm{error}=\textrm{min}_{\mathcal {P}\in \{ K\times K\mathrm {~permutation~matrix}\}}\textrm{max}_{k\in [K]}\frac{\Vert {\hat{\Pi }}(:,k)-(\Pi \mathcal {P})(:,k)\Vert _{1}}{\Vert (\Pi \mathcal {P})(:,k)\Vert _{1}},\\&\textrm{Clustering}~l_{2}~\textrm{error}=\textrm{min}_{\mathcal {P}\in \{K\times K\mathrm {~permutation~matrix}\}}\textrm{max}_{k\in [K]}\frac{\Vert {\hat{\Pi }}(:,k)-(\Pi \mathcal {P})(:,k)\Vert _{F}}{\Vert (\Pi \mathcal {P})(:,k)\Vert _{F}}. \end{aligned}$$

Unlike Hamming error, which measures the $l_{1}$ difference between $\Pi $ and ${\hat{\Pi }}$ up to a permutation of community labels, Clustering $l_{1}$ error measures the maximum relative $l_{1}$ difference between the true k-th community and the estimated k-th community up to a permutation of community labels among all K communities. Therefore, Clustering $l_{1}$ error can evaluate the ability of a community detection method to detect the minority communities. Similar arguments hold for the Clustering $l_{2}$ error.
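
The two metrics can be computed by brute force over all K! column permutations, which is feasible for small K. The sketch below follows the displayed definitions; the function name `clustering_errors` and the ten-node example are hypothetical.

```python
import itertools
import numpy as np

def clustering_errors(Pi_hat, Pi):
    """Return (Clustering l1 error, Clustering l2 error), each minimised over column permutations."""
    K = Pi.shape[1]
    best_l1 = best_l2 = np.inf
    for perm in itertools.permutations(range(K)):
        Pi_perm = Pi[:, list(perm)]                     # Pi times a permutation matrix
        diff = Pi_hat - Pi_perm
        # Column-wise relative errors, then the worst community.
        l1 = np.max(np.abs(diff).sum(axis=0) / np.abs(Pi_perm).sum(axis=0))
        l2 = np.max(np.linalg.norm(diff, axis=0) / np.linalg.norm(Pi_perm, axis=0))
        best_l1, best_l2 = min(best_l1, l1), min(best_l2, l2)
    return best_l1, best_l2

# Hypothetical example: 2 minority nodes, 8 majority nodes, one minority node misassigned.
Pi = np.eye(2)[[0, 0, 1, 1, 1, 1, 1, 1, 1, 1]]
Pi_hat = np.eye(2)[[0, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
l1_err, l2_err = clustering_errors(Pi_hat, Pi)
print(l1_err, l2_err)
```

In this example only one node out of ten is wrong, yet the Clustering $l_{1}$ error is 0.5 because half of the minority community is lost, which illustrates why the metric is sensitive to small communities.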

Panels (a–f) of Fig. 5 display numerical results for changing $\delta $. For the case when $n=200$, we find that DFSP and its competitors perform similarly and all of them can successfully detect the minority community when $\delta \in [0.125,0.5]$, i.e., when the ratio of the largest community size to the smallest community size lies in [1, 7]. KDFSP successfully estimates the number of communities K when $\delta \in [0.15,0.5]$ while NB and BHac fail to infer K in all cases. For the case when $n=1000$, DFSP and its competitors successfully detect all communities when $\delta \in [0.1, 0.5]$, i.e., when the ratio of the largest community size to the smallest community size lies in [1, 9]. KDFSP correctly determines K when $\delta \in [0.05,0.5]$ while its competitors fail to find K.

Changing n: Let $\delta =0.075$ or $\delta =0.1$, i.e., let the ratio of the largest community size to the smallest community size be $\frac{37}{3}$ or 9. Let n range in $\{2000,4000,6000,\ldots ,12000\}$. For simplicity, we only report the averaged Clustering $l_{1}$ error, averaged Clustering $l_{2}$ error, and averaged running time over 100 repetitions for DFSP and its competitors. Figure 6 displays the numerical results. We see that DFSP is better than GeoNMF, SVM-cD, and OCCAM in both estimation accuracy and running time. In particular, DFSP runs much faster than OCCAM. Meanwhile, DFSP performs satisfactorily, with small clustering errors in both cases $\delta =0.075$ and $\delta =0.1$. By comparing panel (a) with panel (e) (and panel (b) with panel (f)), we see that all methods perform worse on a more imbalanced network, and this result is consistent with that of changing $\delta $. By comparing panel (c) with panel (g) (and panel (d) with panel (h)), we see that each method takes more time on a more imbalanced network.
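
The size ratios quoted in this appendix follow from $(1-\delta )/\delta $ for $\delta \le 0.5$. A small exact-arithmetic check, with the helper name `size_ratio` chosen here for illustration:

```python
from fractions import Fraction

def size_ratio(delta):
    """Largest-to-smallest community size ratio (1 - delta) / delta for delta <= 1/2."""
    d = Fraction(delta)            # pass delta as a string so the decimal is parsed exactly
    return (1 - d) / d

for delta in ("0.125", "0.1", "0.075"):
    print(delta, size_ratio(delta))
```

Passing the decimals as strings avoids binary floating-point rounding, so the third value comes out as the exact fraction rather than an approximation of 12.33.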

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qing, H., Wang, J. Mixed membership distribution-free model. Knowl Inf Syst 66, 879–904 (2024). https://doi.org/10.1007/s10115-023-02021-2

Received: 11 June 2023
Revised: 10 October 2023
Accepted: 31 October 2023
Published: 27 November 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s10115-023-02021-2
