An approach based on mixed hierarchical clustering and optimization for graph analysis in social media network: toward globally hierarchical community structure
Abstract
As the massive size of contemporary social networks poses a serious challenge to the scalability of traditional graph clustering algorithms and to the evaluation of discovered communities, we develop in this manuscript an approach for discovering hierarchical community structure in large networks. The introduced hybrid technique combines the strengths of bottom-up hierarchical clustering with those of top-down hierarchical clustering: the first is efficient at identifying small clusters, while the second is good at determining large ones. Our mixed hierarchical clustering technique assumes that an initial solution composed of k classes exists and combines the two previously mentioned methods; it does not change the number of partitions but modifies the distribution of the initial classes. At the end of the clustering process, a fixed point is obtained, representing a local optimum of the cost function that measures the degree of importance between two partitions. Consequently, the combined model leads to the emergence of a local community structure. To avoid this local optimum and detect a community structure that converges to the global optimum of the cost function, community detection is treated in this study not only as a clustering problem but also as an optimization problem. To this end, a novel mixed hierarchical clustering algorithm based on swarm intelligence is suggested for identifying community structures in social networks. To validate the proposed method, its performance is evaluated on both real and artificial networks using internal and external clustering evaluation criteria.
Keywords
Social networks, Social network analysis, Graph analysis, Community detection, Hierarchical clustering, Bottom-up clustering, Top-down clustering, Mixed clustering, Swarms intelligence, Ant colony optimization, Bee colony optimization
Abbreviations
- SHC
Similarity-based hierarchical community
- HAMUHI-CODE
Heuristic algorithm for multi-scale hierarchical community detection
- PMAC
Partial matrix approximation convergence
- SN
Social network
- JS
Jaccard similarity measure
- AgA
Agglomerative algorithm
- DST
Dependence similarity table
- AHL
Ascendant hierarchical level
- DivA
Divisive algorithm
- DHL
Descendant hierarchical level
- MHA
Mixed hierarchical algorithm
- T-D-H-L
Top-down hierarchical level
- B-U-H-L
Bottom-up hierarchical level
- MHAS
Mixed hierarchical algorithm-based swarms
- AntCDivA
Ant colony-based divisive algorithm
- BeeCAgA
Bee colony-based agglomerative algorithm
- LFR benchmark
Lancichinetti Fortunato Radicchi benchmark
- CEC
Cross-entropy clustering
- NMI
Normalized mutual information
- DBI
Davies–Bouldin index
- PGP
Pretty good privacy
- SI
Swarm intelligence
List of symbols
- \(Q_\mathrm{comb}\)
Combined modularity function
- \(Q_\mathrm{comb}\)
Separated modularity function
- \(\mathrm{SN} = (V; E; \mu )\)
Graph modeling SN
- V
Nodes representing social network members
- E
Edges modeling the relationship between social network members
- \(\mu \)
Weight of edges
- n
Number of nodes
- \(\ell \)
Hierarchical level
- k
Number of sub-detected partitions at each hierarchical level
- \(P=\{p_{1},p_{2},\ldots ,p_{s}\}\), \(G=\{g_{1},g_{2},\ldots ,g_{r}\}\), \(C=\{c_{1},c_{2},\ldots ,c_{s}\}\)
SN detected partitions
- \(p_{1},p_{2},\ldots ,p_{s}\), \(g_{1},g_{2},\ldots ,g_{r}\), \(c_{1},c_{2},\ldots ,c_{s}\)
Sub-partitions
- m
Social network members
- D
Any element contained in SN partitions
- A[i, j]
The adjacency matrix of SN
- \(\overline{A}{[}i{]}\)
Average of the vector A[i]
- cov(\(E_{i,j}\))
Covariance function
- Op(\(V_{i}\))
Extracted opinions from the node \(V_{i}\)
- Op(\(V_{j}\))
Extracted opinions from the node \(V_{j}\)
- \(N_{i}\)
Neighbor of node i
- \(N_{j}\)
Neighbor of node j
- \(Score_{importantOp}\)
Function measuring the degree of importance of nodes
- \(GScore_{importantOp}\)
General \(Score_{importantOp}\)
- \(MoyScore_{importantOp}\)
Average of \(Score_{importantOp}\) of sub-partitions
- Initpart
Initial partition
- cordMin
Function returning m having the least \(Score_{importantOp}\) value
- cordMax
Function returning m having the highest \(Score_{importantOp}\) value
- \(Q_{DS}\)
Dependence similarity-based modularity
- \(AgQ_{DS}\)
\(Q_{DS}\) function for BeeCAgA
- \(DivQ_{DS}\)
\(Q_{DS}\) function for AntCDivA
- \(MixQ_{DS}\)
\(Q_{DS}\) function for MHAS
- E
Energy function
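As an illustration of how two of the entries above fit together, the following minimal sketch computes the Jaccard similarity (JS) of two nodes from their neighbor sets \(N_{i}\) and \(N_{j}\). It is not taken from the article: the function name `jaccard_similarity`, the dictionary-of-sets graph representation, the toy graph, and the choice to apply JS to neighbor sets are all assumptions made for this example.

```python
def jaccard_similarity(neighbors, i, j):
    """Jaccard similarity |N_i & N_j| / |N_i | N_j| of two nodes.

    `neighbors` maps each node to the set of its neighbors
    (a hypothetical representation chosen for this sketch).
    """
    n_i, n_j = neighbors[i], neighbors[j]
    union = n_i | n_j
    if not union:
        # Convention assumed here: two isolated nodes have similarity 0.
        return 0.0
    return len(n_i & n_j) / len(union)


# Toy undirected graph (hypothetical data): node -> set of neighbors.
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(jaccard_similarity(neighbors, "a", "b"))  # one shared neighbor out of three
print(jaccard_similarity(neighbors, "a", "d"))  # one shared neighbor out of two
```

Under this convention, nodes that share most of their neighbors score close to 1, which is the kind of pairwise signal a similarity-based agglomerative step could use when deciding which nodes to merge first.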