An approach based on mixed hierarchical clustering and optimization for graph analysis in social media network: toward globally hierarchical community structure

  • Radhia Toujani
  • Jalel Akaichi
Regular Paper

Abstract

The massive size of contemporary social networks poses a serious challenge to the scalability of traditional graph clustering algorithms and to the evaluation of the discovered communities. In this manuscript, we develop an approach for discovering hierarchical community structure in large networks. The introduced hybrid technique combines the strengths of bottom-up hierarchical clustering with those of top-down hierarchical clustering: the former is efficient at identifying small clusters, while the latter is good at determining large ones. Our mixed hierarchical clustering technique assumes that an initial solution composed of k classes exists and combines the two methods mentioned above; it does not change the number of partitions but modifies the distribution of elements among the initial classes. At the end of the clustering process, a fixed point is obtained, representing a local optimum of the cost function that measures the degree of importance between two partitions. Consequently, the combined model leads to the emergence of a local community structure. To escape this local optimum and detect a community structure that converges to the global optimum of the cost function, community detection is treated in this study not only as a clustering problem but also as an optimization problem. To this end, a novel mixed hierarchical clustering algorithm based on swarm intelligence is proposed for identifying community structures in social networks. The performance of the proposed method is evaluated on both real and artificial networks, using internal and external clustering evaluation criteria.
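
To make the notion of a cost function over partitions concrete, the following minimal sketch computes the standard Newman-Girvan modularity of a partition of an undirected, unweighted graph. It is a stand-in chosen for illustration only: the abstract does not define the paper's own cost function, and the names `modularity`, `edges`, and `communities` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity Q of a partition (illustrative stand-in).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    communities: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge share - expected share)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

Under this measure, splitting the two triangles into separate communities scores higher than merging all nodes into one, which is the kind of comparison between partitions that a clustering or optimization procedure can exploit.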

Keywords

Social networks, Social network analysis, Graph analysis, Community detection, Hierarchical clustering, Bottom-up clustering, Top-down clustering, Mixed clustering, Swarm intelligence, Ant colony optimization, Bee colony optimization

Abbreviations

SHC

Similarity-based hierarchical community

HAMUHI-CODE

Heuristic algorithm for multi-scale hierarchical community detection

PMAC

Partial matrix approximation convergence

SN

Social network

JS

Jaccard similarity measure

AgA

Agglomerative algorithm

DST

Dependence similarity table

AHL

Ascendant hierarchical level

DivA

Divisive algorithm

DHL

Descendant hierarchical level

MHA

Mixed hierarchical algorithm

T-D-H-L

Top-down hierarchical level

B-U-H-L

Bottom-up hierarchical level

MHAS

Mixed hierarchical algorithm-based swarms

AntCDivA

Ant colony-based divisive algorithm

BeeCAgA

Bee colony-based agglomerative algorithm

LFR benchmark

Lancichinetti-Fortunato-Radicchi benchmark

CEC

Cross-entropy clustering

NMI

Normalized mutual information

DBI

Davies–Bouldin index

PGP

Pretty good privacy

SI

Swarm intelligence

List of symbols

\(Q_\mathrm{comb}\)

Combined modularity function

\(Q_\mathrm{comb}\)

Separated modularity function

\(\mathrm{SN} = (V; E; \mu )\)

Graph modeling SN

V

Nodes representing social network members

E

Edges modeling the relationship between social network members

\(\mu \)

Weight of edges

n

Number of nodes

\(\ell \)

Hierarchical level

k

Number of sub-detected partitions at each hierarchical level

\(P=\{p_{1},p_{2},\ldots ,p_{s}\}\), \(G=\{g_{1},g_{2},\ldots ,g_{r}\}\), \(C=\{c_{1},c_{2},\ldots ,c_{s}\}\)

SN detected partitions

\(p_{1},p_{2},\ldots ,p_{s}\), \(g_{1},g_{2},\ldots ,g_{r}\), \(c_{1},c_{2},\ldots ,c_{s}\)

Sub-partitions

m

Social network member

D

Any element contained in SN partitions

A[ij]

The adjacency matrix of SN

\(\overline{A}{[}i{]}\)

Average of the vector A[i]

cov(\(E_{i,j}\))

Covariance function

Op(\(V_{i}\))

Opinions extracted from node \(V_{i}\)

Op(\(V_{j}\))

Opinions extracted from node \(V_{j}\)

\(N_{i}\)

Neighbor of node i

\(N_{j}\)

Neighbor of node j

\(Score_{importantOp}\)

Function measuring the degree of importance of nodes

\(GScore_{importantOp}\)

General \(Score_{importantOp}\)

\(MoyScore_{importantOp}\)

Average of \(Score_{importantOp}\) of sub-partitions

Initpart

Initial partition

cordMin

Function returning m having the least \(Score_{importantOp}\) value

cordMax

Function returning m having the highest \(Score_{importantOp}\) value

\(Q_{DS}\)

Dependence similarity-based modularity

\(AgQ_{DS}\)

\(Q_{DS}\) function for BeeCAgA

\(DivQ_{DS}\)

\(Q_{DS}\) function for AntCDivA

\(MixQ_{DS}\)

\(Q_{DS}\) function for MHAS

E

Energy function
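
As a small illustration of how some of the listed symbols fit together, the sketch below computes a Jaccard similarity between two nodes from their neighbor sets \(N_{i}\) and \(N_{j}\). It assumes the standard definition \(|N_{i} \cap N_{j}| / |N_{i} \cup N_{j}|\); this page does not give the paper's exact formula, and the helper names `build_neighbors` and `jaccard` are hypothetical.

```python
def build_neighbors(edges):
    """Map each node to its set of neighbors in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, i, j):
    """Standard Jaccard similarity of the neighbor sets of nodes i and j.

    Assumed definition: |N_i & N_j| / |N_i | N_j|, and 0.0 when both
    neighbor sets are empty.
    """
    n_i = adj.get(i, set())
    n_j = adj.get(j, set())
    union = n_i | n_j
    return len(n_i & n_j) / len(union) if union else 0.0


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = build_neighbors([(0, 1), (0, 2), (1, 2), (2, 3)])
print(jaccard(adj, 0, 1), jaccard(adj, 0, 3))
```

Node pairs that share most of their neighbors score close to 1 and are natural candidates for merging in a bottom-up step, while pairs with disjoint neighborhoods score 0.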

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. BESTMOD Department, Higher Institute of Management, University of Tunis, Tunis, Tunisia
  2. College of Computer Science, University of Bisha, Bisha, Saudi Arabia
