Fuzzy community detection on the basis of similarities in structural/attribute in large-scale social networks

Artificial Intelligence Review

Abstract

Community detection aims to partition a set of nodes so that nodes are more similar within a community than outside it, based on criteria such as neighborhood similarity or vertex connectivity. Most present-day community detection methods concentrate principally on topological structure and largely ignore the heterogeneous attributes of the vertices. This paper proposes a new community detection model, based on the possibilistic c-means model, that uses both structural and attribute similarities in large-scale social networks. In most real social networks, different clusters share nodes, resulting in overlapping communities. The proposed possibilistic c-means model based on structural and attribute similarity (PCMSA) is a fuzzy community detection model that addresses the overlapping community detection problem and detects communities such that each community forms a densely connected sub-graph with homogeneous attribute values. The objective of the proposed model is assessed by a trade-off between intra-cluster and inter-cluster density and homogeneity. To validate the proposed community detection algorithm (PCMSA) and its results, a validity index compatible with the proposed model is defined, and the efficiency of the proposed fuzzy community detection is evaluated on real social networks ranging from very small to very large. The results are compared with other community detection models, including FCAN, CODICIL, SA-cluster, K-SNAP, and PCM. The experimental findings show the superiority of the proposed model and its promising scalability and computational complexity.

References

  • Adamic LA, Glance N (2004) The political blogosphere and the 2004 US election. In: Proceedings of the WWW-2005 workshop on the weblogging ecosystem

  • Andersen R, Chung F, Lang K (2006) Local graph partitioning using PageRank vectors. In: Proceedings of the annual IEEE symposium on foundations of computer science (FOCS), pp 475–483

  • Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York

  • Bezdek JC, Keller J, Krisnapuram R, Pal NR (1999) Fuzzy models and algorithms for pattern recognition and image processing. Kluwer Academic Publishers, Boston

  • Bu Z et al (2019) Graph K-means based on leader identification, dynamic game and opinion dynamics. IEEE Trans Knowl Data Eng 32(7):1348–1361

  • Cao J, Bu Z, Wang Y, Yang H, Jiang J, Li H (2019) Detecting prosumer-community groups in smart grids from the multiagent perspective. IEEE Trans Syst Man Cybern Syst 49(8):1652–1664

  • Dunn JC (1974) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57

  • Flake GW, Lawrence S, Giles CL (2000) Efficient identification of Web communities. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD '00), pp 150–160

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  • Fu X, Liu L, Wang C (2013) Detection of community overlap according to belief propagation and conflict. Phys A Stat Mech Appl 392(4):941–952

  • Höppner F (ed) (1999) Fuzzy cluster analysis: methods for classification, data analysis and image recognition. Wiley

  • Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826

  • Golsefid SMM, Fazel Zarandi MH, Bastani S (2015) Fuzzy duocentric community detection model in social networks. Soc Networks 43:177–189

  • Granovetter MS (1977) The strength of weak ties. Academic Press

  • Gustafson DE, Kessel WC (1978) Fuzzy clustering with a fuzzy covariance matrix. In: Proceedings of the IEEE conference on decision and control

  • Hu L, Chan KCC (2016) Fuzzy clustering in a complex network based on content relevance and link structures. IEEE Trans Fuzzy Syst 24(2):456–470

  • Kadushin C (2004) Understanding social networks

  • Kelley CT (1999) Iterative methods for optimization. SIAM, Philadelphia

  • Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110

  • Mendel JM (2017) Uncertain rule-based fuzzy systems: introduction and new directions. Springer

  • Malek Mohamadi Golsefid S, Fazel Zarandi MH, Bastani S (2015) Fuzzy community detection model in social networks. Int J Intell Syst 30:1227–1244

  • Pathak N, Delong C, Banerjee A (2008) Social topic models for community extraction. In: Proceedings of the 2nd SNA-KDD workshop, pp 565–574

  • Ruan Y, Fuhry D, Parthasarathy S (2013) Efficient community detection in large networks using content and links. In: Proceedings of the 22nd international conference on World Wide Web (WWW '13), pp 1089–1098

  • Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64

  • Sun Y, Han J, Gao J, Yu Y (2009) iTopicModel: information network-integrated topic modeling. In: Proceedings of the ninth IEEE international conference on data mining

  • Tan WW, Chua TW (2007) Book review

  • Tian Y, Hankins RA, Patel JM (2008) Efficient aggregation for graph summarization. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, pp 567–580

  • Traud AL, Mucha PJ, Porter MA (2011) Social structure of Facebook networks. arXiv:1102.2166; Phys A 391(16):4165–4180

  • Valente de Oliveira J, Pedrycz W (2007) Advances in fuzzy clustering and its applications. Wiley, Chichester

  • Wang W, Liu D, Liu X, Pan L (2013) Fuzzy overlapping community detection based on local random walk and multidimensional scaling. Phys A 392(24):6578–6586

  • Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press

  • Yang MS (1993) A survey of fuzzy clustering. Math Comput Model 18(11):1–16

  • Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 927–936

  • Zarandi MHF, Razaee ZS (2010) A fuzzy clustering model for fuzzy data with outliers. Int J Fuzzy Syst Appl 1(2):1–18

  • Zarandi MHF, Faraji MR, Karbasian M (2010) An exponential cluster validity index for fuzzy clustering with crisp and fuzzy data. Sci Iran 17(2):95–110

  • Zarinbal M, Fazel Zarandi MH, Turksen IB (2014) Relative entropy fuzzy c-means clustering. Inf Sci 260:74–97

  • Zhang S, Wang R-S, Zhang X-S (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A Stat Mech Appl 374:483–490

  • Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. Proc VLDB Endow 2(1):718–729

  • Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 689–698

Author information

Corresponding author

Correspondence to Mohammad Hossein Fazel Zarandi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Proofs of Theorems

Theorem 1

All \(u_{ik}\) in \(U\), \(\forall i,k\), are independent. Hence, minimizing \(J(U,V)\) with respect to U reduces to minimizing \(J(u_{ik}, v_{i})\) with respect to each \(u_{ik}\). Setting the gradient of \(J(u_{ik}, v_{i})\) with respect to \(u_{ik}\) to zero yields the first-order necessary condition for optimality:

$$\frac{\partial J}{{\partial u_{ik} }} = m{\text{ u}}_{ik}^{m - 1} D_{ik} - m \, \Delta_{i} (1 - u_{ik} )^{m - 1} = 0 \Rightarrow u_{ik} = \left( {1 + \left( {\frac{{D_{ik} }}{{\Delta_{i} }}} \right)^{1/(m - 1)} } \right)^{ - 1}$$
(30)

To find the most favorable node to serve as the cluster center, the node that is structurally closest to the other members of the cluster, weighted by their membership values \(u_{ik}\), should be selected. As a result, the center of cluster i is defined as follows:

$$v_{i}^{ * } = \mathop {\arg \min }\limits_{{v_{i} \in [1,n]}} (\sum\limits_{k = 1}^{n} {\sum\limits_{j = 1}^{n} {u_{ik}^{m} D_{ik} } } )$$
(31)
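As an illustration, the following minimal NumPy sketch applies the membership update of Eq. (30) and a membership-weighted medoid reading of the center selection in Eq. (31). The function names, the pre-computed distance inputs, and the medoid interpretation of Eq. (31) are assumptions for illustration, not part of the published algorithm.

```python
import numpy as np

def membership_update(D, delta, m=2.0):
    """Possibilistic membership update in the form of Eq. (30).

    D     : (c, n) array of structural distances D_ik (assumed pre-computed)
    delta : (c,) array of the reference terms Delta_i
    m     : fuzzifier, m > 1
    """
    D = np.asarray(D, dtype=float)
    delta = np.asarray(delta, dtype=float)
    ratio = (D / delta[:, None]) ** (1.0 / (m - 1.0))
    return 1.0 / (1.0 + ratio)

def medoid_centers(U, D_pair, m=2.0):
    """Medoid-style center selection in the spirit of Eq. (31).

    U      : (c, n) membership matrix
    D_pair : (n, n) pairwise structural distance matrix between nodes
    Returns, for each cluster, the index of the node minimizing the
    membership-weighted total distance to all nodes.
    """
    U = np.asarray(U, dtype=float)
    D_pair = np.asarray(D_pair, dtype=float)
    cost = (U ** m) @ D_pair   # cost[i, j] = sum_k u_ik^m * D_pair[k, j]
    return cost.argmin(axis=1)
```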

Theorem 2

From (18), setting the partial derivatives to zero gives the following necessary conditions:

$$\begin{gathered} \frac{\partial J}{{\partial u_{ik} }} = m(u_{ik} )^{m - 1} d_{ik} (\Omega_{i} ) + m\delta_{i} (1 - u_{ik} )^{m - 1} = 0 \hfill \\ u_{ik}^{*} = \left( {1 + \left( {\frac{{d_{ik} (\Omega_{i} )}}{{ - \delta_{i} }}} \right)^{1/(m - 1)} } \right)^{ - 1} \hfill \\ \end{gathered}$$
(32)
$$\begin{gathered} \left. {\frac{\partial J}{{\partial v_{i} }}} \right|_{*} = - 2\sum\limits_{k = 1}^{n} {(u_{ik} )^{m} H_{i} (x_{k} - v_{i}^{*} )} = 0 \, ; \, i = 1,2,...,c \hfill \\ v_{i}^{*} = \frac{{\sum\limits_{k = 1}^{n} {u_{ik}^{m} } x_{k} }}{{\sum\limits_{k = 1}^{n} {u_{ik}^{m} } }} \hfill \\ \end{gathered}$$
(33)

Note that \(\frac{\partial }{\partial x}(x^{T} Hx) = 2Hx\) in which H is symmetric and is not a function of x.

and finally,

$$\begin{gathered} \left. {\frac{\partial J}{{\partial H_{i} }}} \right|_{*} = \sum\limits_{k = 1}^{n} {u_{ik}^{m} (x_{k} - v_{i} )(x_{k} - v_{i} )^{T} + \lambda_{i} \left| {H_{i}^{*} } \right|} \, H_{i}^{{*^{ - 1} }} = 0 \hfill \\ H_{i}^{{*^{ - 1} }} = \frac{1}{{\lambda_{i} \left| {H_{i}^{*} } \right|}}\sum\limits_{k = 1}^{n} {u_{ik}^{m} } (x_{k} - v_{i}^{*} )(x_{k} - v_{i}^{*} )^{T} \hfill \\ \end{gathered}$$
(34)

The identities \(\frac{\partial }{\partial H}(x^{T} Hx) = xx^{T} { , }\frac{\partial }{\partial H}\left| H \right| = \left| H \right|H^{ - 1}\) are used for a non-singular matrix H and any compatible vector x (Gustafson and Kessel 1978).
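The sketch below illustrates the attribute-side updates of Theorem 2 under the standard Gustafson–Kessel scheme cited above: the weighted centers of Eq. (33) and a norm-inducing matrix derived from the weighted scatter in Eq. (34). The function name, the volume constant rho (standing in for the \(\lambda_i |H_i|\) factor), and the assumption of non-singular scatter matrices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attribute_updates(X, U, m=2.0, rho=1.0):
    """Attribute-side updates sketched after Theorem 2 (Gustafson-Kessel style).

    X   : (n, p) matrix of attribute vectors x_k
    U   : (c, n) membership matrix
    rho : assumed volume constant playing the role of lambda_i |H_i| in Eq. (34)
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    Um = U ** m

    # Eq. (33): membership-weighted cluster centers
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)            # shape (c, p)

    p = X.shape[1]
    H = []
    for i in range(U.shape[0]):
        diff = X - V[i]                                      # (n, p)
        S = (Um[i][:, None] * diff).T @ diff                 # weighted scatter of Eq. (34)
        # Gustafson-Kessel normalization: the resulting H_i has determinant rho
        H.append((rho * np.linalg.det(S)) ** (1.0 / p) * np.linalg.inv(S))
    return V, np.array(H)
```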

Appendix 2

The required condition for convergence of the algorithm proposed in Fig. 2 is met when:

$$\lim_{t \to \infty } \left\| {U^{(t)} - U^{(t - 1)} } \right\| = 0$$
(35)

The iterative formula for \(u_{ik}\) originates from the classical Newton–Raphson gradient-based update (Kelley 1999), with \(J_{m}\) as the error function to be minimized, namely:

$$u_{ik}^{(t)} = u_{ik}^{(t - 1)} - \tau^{(t)} (J_{m} (u_{ik} ,\Omega_{i} ,\delta_{i} )^{(t - 1)} )\left( {\frac{{\partial J_{m} (u_{ik} ,\Omega_{i} ,\delta_{i} )^{(t - 1)} }}{{\partial u_{ik} }}} \right)^{ - 1}$$
(36)

where \(\tau^{(t)}\) denotes a positive learning-rate parameter and \(\frac{\partial J_{m} (u_{ik} ,\Omega_{i} ,\delta_{i} )^{(t - 1)} }{\partial u_{ik} }\) is the gradient of \(J_{m}\) with respect to \(u_{ik}\) at iteration (t−1). Rewriting (36) for U gives:

$$U^{(t)} - U^{(t - 1)} = - \tau^{(t)} (J_{m} (U,\Omega ,\delta )^{(t - 1)} )\left( {\frac{{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)} }}{\partial U}} \right)^{ - 1}$$
(37)

Now, taking norms in (37) and considering the limit in (35):

$$\lim_{t \to \infty } \left\| {U^{(t)} - U^{(t - 1)} } \right\| = \lim_{t \to \infty } \left( {\left\| {\tau^{(t)} } \right\|\left\| {J_{m} (U,\Omega ,\delta )^{(t - 1)} } \right\|\left\| {\frac{{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)} }}{\partial U}} \right\|^{ - 1} } \right)$$
(38)

By considering \(\tau^{(t)} = \frac{{\psi^{(t)} }}{{\left\| {J_{m} (U,\Omega ,\delta )^{(t - 1)} } \right\|\left\| {\frac{{\partial J_{m} (U,\Omega ,\delta )^{(t - 1)} }}{\partial U}} \right\|^{ - 1} }}\), (38) becomes:

$$\lim_{t \to \infty } \left\| {U^{(t)} - U^{(t - 1)} } \right\| = \lim_{t \to \infty } \left\| {\psi^{(t)} } \right\| = 0$$
(39)

where \(\psi^{(t)} = \psi_{0} /t\), in which \(\psi_{0}\) is a constant, so that \(\psi^{(t)} \to 0\) as \(t \to \infty\). Hence, condition (35) is satisfied and, as a result, the proposed algorithm is convergent.
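In practice, the limit condition (35) translates into a stopping rule on the change in U. The loop below is a minimal sketch of such a scheme built around an assumed update_step callable (for instance, the closed-form membership update of Eq. (32)); it is an illustration of the convergence criterion, not the paper's implementation.

```python
import numpy as np

def iterate_memberships(update_step, U0, eps=1e-6, max_iter=500):
    """Iterate the membership update until the change in U is negligible.

    update_step : assumed callable returning U^(t) from U^(t-1)
    U0          : initial membership matrix
    Stops once ||U^(t) - U^(t-1)|| < eps, the practical counterpart of (35).
    """
    U_prev = np.asarray(U0, dtype=float)
    for t in range(1, max_iter + 1):
        U = update_step(U_prev)
        if np.linalg.norm(U - U_prev) < eps:   # condition (35)
            return U, t
        U_prev = U
    return U_prev, max_iter
```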

Appendix 3

Normalized mutual information (NMI) measures the degree of agreement between the communities identified by an algorithm and the expected (ground-truth) communities. Let \(F = \{ F_{k} \}\ (1 \le k \le c)\) denote the expected communities. NMI is defined as follows (Hu and Chan 2016):

$$NMI = \frac{{\sum\limits_{{k_{1} = 1}}^{c} {\sum\limits_{{k_{2} = 1}}^{c} {n_{{C_{{k_{1} }} ,F_{{k_{2} }} }} \log \left(\frac{{n \, *n_{{C_{{k_{1} }} ,F_{{k_{2} }} }} }}{{n_{{C_{{k_{1} }} }} *n_{{F_{{k_{2} }} }} }}\right)} } }}{{\sqrt {\left(\sum\limits_{k = 1}^{c} {n_{{C_{k} }} \log \frac{{n_{{C_{k} }} }}{n}} \right)\left(\sum\limits_{k = 1}^{c} {n_{{F_{k} }} \log \frac{{n_{{F_{k} }} }}{n}} \right)} }}$$
(40)

where \(n_{{C_{k} }}\) is the number of nodes in \(C_{k}\), \(n_{{F_{k} }}\) is the number of nodes in \(F_{k}\), and \(n_{{C_{{k_{1} }} ,F_{{k_{2} }} }}\) is the number of nodes discovered in both \(C_{{k_{1} }}\) and \(F_{{k_{2} }}\).
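A minimal sketch of Eq. (40) follows, assuming the overlaps are supplied as a contingency matrix and that all detected and expected communities are non-empty; the function and argument names are illustrative.

```python
import numpy as np

def nmi(contingency):
    """NMI of Eq. (40) from a contingency matrix, where contingency[k1, k2]
    is the number of nodes found in both the detected community C_{k1}
    and the expected community F_{k2}."""
    N = np.asarray(contingency, dtype=float)
    n = N.sum()
    nC = N.sum(axis=1)                  # sizes of detected communities
    nF = N.sum(axis=0)                  # sizes of expected communities

    num = 0.0
    for k1, k2 in zip(*np.nonzero(N)):  # skip empty cells: 0 * log(0) := 0
        num += N[k1, k2] * np.log(n * N[k1, k2] / (nC[k1] * nF[k2]))

    den = np.sqrt((nC * np.log(nC / n)).sum() * (nF * np.log(nF / n)).sum())
    return num / den
```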

For the Accuracy measure (Hu and Chan 2016), a mapping function \(Z:C_{k_{1}} \to F_{k_{2}}\) is needed. To find Z, \(n_{C_{k_{1}},F_{k_{2}}}\) is determined for all combinations of \(C_{k_{1}}\) and \(F_{k_{2}}\) in C and F, respectively. This is an iterative process: in each iteration, starting from the largest \(n_{C_{k_{1}},F_{k_{2}}}\), the \(C_{k_{1}}\) that matches \(F_{k_{2}}\) is determined; the mapping \(C_{k_{1}} \to F_{k_{2}}\) is then added to Z, and \(C_{k_{1}}\) and \(F_{k_{2}}\) are excluded from future iterations. The process ends when every \(C_{k_{1}}\) in C has found a match in F. The measure is defined as follows:

$$Accuracy = \frac{1}{n}\sum\limits_{k = 1}^{c} {n_{{C_{k} ,Z(C_{k} )}} }$$
(41)

Therefore, the values of NMI and Accuracy are larger when the detected communities \(C_{k}\) match the expected result F more closely.
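The sketch below implements Eq. (41) with the greedy matching procedure described above, again assuming a contingency matrix of detected-versus-expected overlaps; the names are illustrative. It consumes the same input as the NMI sketch, so both measures can be computed from a single tabulation of the community assignments.

```python
import numpy as np

def accuracy(contingency):
    """Accuracy of Eq. (41) via greedy matching: repeatedly take the largest
    remaining overlap, record the match C_{k1} -> F_{k2}, and exclude both
    communities from later iterations."""
    N = np.asarray(contingency, dtype=float)
    n = N.sum()
    free_C = set(range(N.shape[0]))
    free_F = set(range(N.shape[1]))
    matched = 0.0
    while free_C and free_F:
        k1, k2 = max(((i, j) for i in free_C for j in free_F),
                     key=lambda ij: N[ij])
        matched += N[k1, k2]
        free_C.remove(k1)
        free_F.remove(k2)
    return matched / n
```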

About this article

Cite this article

Naderipour, M., Fazel Zarandi, M.H. & Bastani, S. Fuzzy community detection on the basis of similarities in structural/attribute in large-scale social networks. Artif Intell Rev 55, 1373–1407 (2022). https://doi.org/10.1007/s10462-021-09987-x
