Abstract
One challenge for social network researchers is to evaluate balance in a social network. The degree of balance in a social group can be used as a tool to study whether and how this group evolves toward a balanced state. Clustering problems defined on signed graphs provide a criterion for measuring this degree of balance: the measure can be obtained from the optimal solution of the correlation clustering (CC) problem or of a variation of it, the relaxed correlation clustering problem. However, solving these problems is no easy task, especially when large network instances need to be analyzed. In this work, we contribute to the efficient solution of both problems by developing sequential and parallel iterated local search (ILS) metaheuristics. Using these algorithms, we then measure the structural balance of large real-world social networks.
Change history
19 July 2017
An erratum to this article has been published.
Notes
We consider small networks to be those comprising dozens of nodes; medium-sized networks contain hundreds of nodes, and large-scale ones can have more than a hundred thousand nodes.
It is possible to solve the CC problem on directed graphs. One must first convert the directed graph to an undirected one: Opposite arcs with the same sign are converted to a single edge whose weight is equal to the sum of the arcs’ weights; opposite arcs with different signs become two parallel edges, each one with the original sign and weight of each arc.
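The conversion rule in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `to_undirected` and the arc representation `(u, v, sign, weight)` with a non-negative weight are assumptions made here. The note does not say how an arc without an opposite arc is treated; the sketch assumes it simply becomes one edge with the same sign and weight.

```python
from collections import defaultdict


def to_undirected(arcs):
    """Convert signed arcs (u, v, sign, weight) into undirected signed edges.

    sign is +1 or -1 and weight is non-negative (assumed representation).
    Opposite arcs with the same sign collapse into one edge whose weight is
    the sum of the arc weights. Opposite arcs with different signs stay as
    two parallel edges, each keeping its own sign and weight.
    """
    grouped = defaultdict(float)
    for u, v, sign, weight in arcs:
        # The unordered endpoint pair plus the sign identifies one edge.
        key = (min(u, v), max(u, v), sign)
        grouped[key] += weight
    return sorted((a, b, s, w) for (a, b, s), w in grouped.items())


arcs = [
    (1, 2, +1, 2.0), (2, 1, +1, 3.0),  # same sign: merged, weight 5.0
    (2, 3, +1, 1.0), (3, 2, -1, 4.0),  # different signs: two parallel edges
    (3, 4, -1, 1.5),                   # no opposite arc: kept as is (assumption)
]
edges = to_undirected(arcs)
```

Grouping by unordered endpoint pair and sign handles both cases of the rule with a single pass over the arcs.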
The original dataset from the UCI Machine Learning Repository (Bache and Lichman 2013) consists of 20,000 messages taken from 20 newsgroups. However, since the authors’ bounding technique did not scale to the full dataset, they restricted their testbed to a subsample of 100 messages from each newsgroup, for a total of 2000 messages.
The Wikipedia adminship election data has 7,000 vertices and 100,000 edges, the Epinions signed social network has 131,828 vertices and 841,372 edges, and the Slashdot Zoo signed social network (from February 21, 2009) comprises 82,144 vertices and 549,202 edges.
Pajek is a program for the analysis and visualization of large networks. It provides features such as cluster identification, decomposition of large networks, and visualization tools, as well as implementations of several efficient algorithms for analyzing networks with thousands or even millions of vertices.
All instances are available at http://www.ic.uff.br/~yuri/files/CCinst.zip.
We have extracted subsets of Slashdot Zoo signed social network from February 21, 2009 (Leskovec and Krevl 2014), containing the first n vertices, where \(n \in \{600, 1000, 2000, 4000, 8000, 10{,}000\}\). Since the original Slashdot network is a signed digraph, before extracting the subsets, we have converted it into an undirected graph.
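As a rough sketch of this extraction step, the snippet below keeps only the edges whose endpoints both lie among the first n vertices. The function name `first_n_subgraph` and the edge representation `(u, v, sign)` are hypothetical, and the sketch assumes that vertices carry integer identifiers starting at 0, so that "the first n vertices" means identifiers 0 to n-1. The note does not state the ordering actually used.

```python
def first_n_subgraph(edges, n):
    """Return the edges induced by vertices 0..n-1 (assumed vertex ordering).

    edges: iterable of (u, v, sign) tuples from an undirected signed graph.
    """
    return [(u, v, sign) for (u, v, sign) in edges if u < n and v < n]


# Toy undirected signed graph, for illustration only.
edges = [(0, 1, +1), (1, 2, -1), (2, 5, +1), (4, 3, -1)]
subsets = {n: first_n_subgraph(edges, n) for n in (3, 5)}
```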
United Nations General Assembly Voting Data, by Anton Strezhnev and Erik Voeten, http://hdl.handle.net/1902.1/12379. Accessed in Apr 2016.
Instances solved in less than 100 seconds were omitted from the graph.
The target solution value is calculated as the best solution value that can be obtained by all heuristic methods when solving a specific instance.
The full tables containing the comparison between SeqGRASP and SeqILS are available at http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.
Instances solved in less than 5 seconds were omitted from the graph.
The full tables containing the comparison between SeqGRASP and SeqILS are available at http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.
To solve the CC problem with the Doreian Mrvar method inside Pajek, we followed these steps. After loading the network file, we created a random initial partition using 1-mode. At this point, we defined the number of clusters k of the solution; we used the same number of clusters as in the solution returned by ILS. Finally, we called the resolution method from the menu (Network \(\rightarrow \) Signed Network \(\rightarrow \) Create Partition \(\rightarrow \) Doreian Mrvar Method), specifying the number of repetitions (set to 100), the alpha parameter (set to 0.5 to match the CC problem objective function) and the minimum number of vertices in clusters (set to 1).
We conducted several experiments to find the optimal number of processes to be used in the parallel algorithms. This setting is closely related to the hardware configuration of the computer cluster used in the experiments. Since each machine in the computer cluster has two quad-core CPUs (8 processor cores), it can host eight processes running in parallel. In ParILS / ParVND, we chose to group each ILS master process together with its corresponding VND search slaves, in order to maximize the performance of message exchange between related processes.
Two metrics are used to assess the performance of parallel algorithms: speedup \(Su(p)\), which measures the acceleration of the parallel algorithm relative to its sequential version, and efficiency \(E(p)\), which measures the average fraction of time during which each process is effectively used. Thus, \(Su(p)=T(seq)/T(p)\), where \(T(seq)\) is the time required by the sequential algorithm and \(T(p)\) is the time required by the parallel algorithm run on p processors, and \(E(p)=Su(p)/p\).
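The two metrics can be computed directly from measured run times. The following sketch uses hypothetical function names and made-up timings (120 s sequential, 20 s on 8 processes) purely for illustration; they are not results from the paper.

```python
def speedup(t_seq, t_par):
    """Su(p) = T(seq) / T(p)."""
    return t_seq / t_par


def efficiency(t_seq, t_par, p):
    """E(p) = Su(p) / p, where p is the number of processes."""
    return speedup(t_seq, t_par) / p


# Hypothetical timings in seconds, not taken from the experiments.
su = speedup(120.0, 20.0)         # 6.0
e = efficiency(120.0, 20.0, 8)    # 0.75
```

With these numbers, each of the eight processes is effectively used 75% of the time on average.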
The table containing the performance comparison between Parallel GRASP and Parallel ILS is available at http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.
For a full report of these results, including the groups of countries in each solution, separated by year, see the complementary material at http://www.ic.uff.br/~yuri/files/CCcomp.zip.
Slashdot friends or foes network from Feb 21 2009: 82,144 vertices and 549,202 edges.
Wikipedia adminship election data: 7,000 vertices and 100,000 edges.
Epinions who-vote-whom network: 131,828 vertices and 841,372 edges.
References
Abell P, Ludwig M (2009) Structural balance: a dynamic perspective. J Math Sociol 33:129–155
Aiex RM, Resende MGC, Ribeiro CC (2007) TTT plots: a Perl program to create time-to-target plots. Optim Lett 1(4):355–366
Ailon N, Charikar M, Newman A (2008) Aggregating inconsistent information: ranking and clustering. J ACM 55(5):23
Alba E (2005) Parallel metaheuristics: a new class of algorithms, vol 47. Wiley, New York
Allison GT (1969) Conceptual models and the Cuban missile crisis. Am Polit Sci Rev 63(03):689–718
Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Bansal N, Blum A, Chawla S (2002) Correlation clustering. In: Proceedings of the 43rd annual IEEE symposium of foundations of computer science. Vancouver, Canada, pp 238–250
Bhattacharya A, De RK (2008) Divisive correlation clustering algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles. Bioinformatics 24(11):1359–1366
Bonchi F, Gionis A, Ukkonen A (2011) Overlapping correlation clustering. In: 2011 IEEE 11th international conference on data mining (ICDM). IEEE, pp 51–60
Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172–188
Brusco M (2003) An enhanced branch-and-bound algorithm for a partitioning problem. Br J Math Stat Psychol 56:83–92
Brusco M, Doreian P, Mrvar A, Steinley D (2011) Two algorithms for relaxed structural balance partitioning: linking theory, models and data to understand social network phenomena. Sociol Methods Res 40:57–87
Brusco MJ, Köhn H-F (2009) Clustering qualitative data based on binary equivalence relations: neighborhood search heuristics for the clique partitioning problem. Psychometrika 74(4):685–703
Cartwright D, Harary F (1956) Structural balance: a generalization of Heider’s theory. Psychol Rev 63:277–293
Charikar M, Guruswami V, Wirth A (2005) Clustering with qualitative information. J Comput Syst Sci 71:360–383
Chiang K-Y, Hsieh C-J, Natarajan N, Tewari A, Dhillon IS (2013) Prediction and clustering in signed networks: a local to global perspective. arXiv:1302.5145
Crainic TG, Toulouse M (2010) Parallel meta-heuristics. In: Handbook of metaheuristics. Springer, US, pp 497–541
DasGupta B, Enciso GA, Sontag E, Zhang Y (2007) Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. BioSystems 90:161–178
Davis JA (1967) Clustering and structural balance in graphs. Hum Relat 20:181–187
De Nooy W, Mrvar A, Batagelj V (2011) Exploratory social network analysis with Pajek: revised and expanded, vol 27, 2nd edn. Cambridge University Press, Cambridge
Demaine ED, Emanuel D, Fiat A, Immorlica N (2006) Correlation clustering in general weighted graphs. Theoret Comput Sci 361:172–187
Den Besten M, Stützle T, Dorigo M (2001) Design of iterated local search algorithms. In: Workshops on applications of evolutionary computation. Springer, Berlin, Heidelberg, pp 441–451
Doreian P, Mrvar A (1996a) A partitioning approach to structural balance. Soc Netw 18:149–168
Doreian P, Mrvar A (2009) Partitioning signed social networks. Soc Netw 31:1–11
Doreian P, Krackhardt D (2001) Pre-transitive balance mechanisms for signed networks. J Math Sociol 25(1):43–67
Doreian P, Mrvar A (1996b) Structural balance and partitioning signed graphs. Developments in data analysis, pp 195–208
Dowdall AT (2009) The birth and death of a tar baby: Henry Kissinger and Southern Africa. Ph.D. thesis, University of Missouri–Columbia
Drummond L, Figueiredo R, Frota Y, Levorato M (2013) Efficient solution of the correlation clustering problem: an application to structural balance. In: Demey YT, Panetto H (eds) OTM 2013 Workshops, LNCS, vol 8186. Springer, Berlin, pp 674–683
Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72(2):027104
Ekşioglu SD, Pardalos PM, Resende MGC (2002) Parallel metaheuristics for combinatorial optimization. In: Models for parallel and distributed computation. Springer, pp 179–206
Elsner M, Schudy W (2009) Bounding and comparing methods for correlation clustering beyond ILP. In: ILP’09 proceedings of the workshop on integer linear programming for natural language processing, pp 19–27
Epinions (1999) Website. http://www.epinions.com. Accessed on March 2015
Esmailian P, Abtahi SE, Jalili M (2014) Mesoscopic analysis of online social networks: the role of negative ties. Phys Rev E 90(4):042817
Facchetti G, Iacono G, Altafini C (2011) Computing global structural balance in large-scale signed social networks. Proc Natl Acad Sci USA 108:20953–20958
Feo TA, Resende MGC (1995) Greedy randomized adaptive search procedures. J Glob Optim 6(2):109–133
Figueiredo R, Frota Y (2014) The maximum balanced subgraph of a signed graph: applications and solution approaches. Eur J Oper Res 236(2):473–487
Figueiredo R, Moura G (2013) Mixed integer programming formulations for clustering problems related to structural balance. Soc Netw 35(4):639–651
Gendreau M, Potvin JY (2010) Handbook of metaheuristics. International series in operations research and management science. Springer, Berlin
Giotis I, Guruswami V (2006) Correlation clustering with a fixed number of clusters. In: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm. ACM, pp 1167–1176
Golan G (2010) Yom Kippur and after: The Soviet Union and the Middle East Crisis, vol 19. Cambridge University Press, Cambridge
Golani M (1995) The historical place of the Czech-Egyptian arms deal, fall 1955. Middle Eastern Stud 31(4):803–827
Gülpinar N, Gutin G, Mitra G, Zverovitch A (2004) Extracting pure network submatrices in linear programs using signed graphs. Discrete Appl Math 137:359–372
Harary F, Lim M, Wunsch DC (2003) Signed graphs for portfolio analysis in risk management. IMA J Manag Math 13:1–10
Heider F (1946) Attitudes and cognitive organization. J Psychol 21:107–112
Hüffner F, Betzler N, Niedermeier R (2010) Separator-based data reduction for signed graph balancing. J Combin Optim 20:335–360
Inohara T (1998) On conditions for a meeting not to reach a deadlock. Appl Math Comput 90:1–9
Kim S, Yoo CD, Nowozin S, Kohli P (2014) Image segmentation using higher-order correlation clustering. IEEE Trans Pattern Anal Mach Intell 36(9):1761–1774
Kreps SE (2007) The 2006 Lebanon war: lessons learned. Parameters 37(1):72
Kunegis J, Lommatzsch A, Bauckhage C (2009) The slashdot zoo: mining a social network with negative edges. In: WWW’09 Proceedings of the 18th international conference on World wide web, pp 741–750
Kunegis J, Schmidt S, Lommatzsch A, Lerner J, De Luca EW, Albayrak S (2010) Spectral analysis of signed graphs for clustering, prediction and visualization. SDM, vol 10. SIAM, pp 559–559
Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: CHI’10 Proceedings of the SIGCHI conference on human factors in computing systems, pp 1361–1370
Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data
Levorato M, Drummond L, Frota Y, Figueiredo R (2015) An ILS algorithm to evaluate structural balance in signed social networks. In: Symposium on applied computing, SAC 2015, Salamanca, Spain—April 13–17, pp 1117–1122
Lourenço HR, Martin OC, Stützle T (2003) Iterated local search. Springer, Berlin
Macon KT, Mucha PJ, Porter MA (2012) Community structure in the United Nations General Assembly. Phys A 391:343–361
McGreal C (2006) Brothers in arms-Israel’s secret pact with Pretoria. Guardian 7. https://www.theguardian.com/world/2006/feb/07/southafrica.israel. Accessed 23 Jan 2017
Mearsheimer JJ, Walt SM (2006) The Israel lobby and US foreign policy. Middle East Policy 13(3):29–87
Mehrotra A, Trick MA (1998) Cliques and clustering: a combinatorial approach. Oper Res Lett 22(1):1–12
Mladenović N, Hansen P (1997) Variable neighborhood search. Comput Oper Res 24(11):1097–1100
Munem BA (2008) Canada and peace in the Middle East. http://www.palestine1.net/can&p-e.htm. Accessed on Jan 2015
Nascimento MC, Pitsoulis L (2013) Community detection by modularity maximization using GRASP with path relinking. Comput Oper Res 40(12):3121–3131
Nesbitt FN (2004) Race for sanctions: African Americans against apartheid, 1946–1994. Indiana University Press, Bloomington
Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582
Pérez-Stable M (1993) The Cuban revolution: origins, course, and legacy. Oxford University Press, New York
Ruiz R, Stützle T (2007) A simple and effective iterated greedy algorithm for the permutation flowshop scheduling problem. Eur J Oper Res 177(3):2033–2049
Slashdot Website (1997) http://slashdot.org. Accessed on March 2015
Smith CD (2010) Palestine and the Arab-Israeli conflict: a history with documents. Bedford/St. Martin’s
Srinivasan A (2011) Local balancing influences global structure in social networks. Proc Natl Acad Sci USA 108:1751–1752
Stinnett DM, Tir J, Diehl PF, Schafer P, Gochman C (2002) The Correlates of War (COW) project direct contiguity data, version 3.0. Confl Manag Peace Sci 19:59–67
Swamy C (2004) Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, pp 526–527
Traag VA, Bruggeman J (2009) Community detection in networks with positive and negative links. Phys Rev E 80:036115
Wang N, Li J (2013) Restoring: a greedy heuristic approach based on neighborhood for correlation clustering. In: Advanced data mining and applications. Springer, pp 348–359
Yang B, Cheung WK, Liu J (2007) Community mining from signed social networks. IEEE Trans Knowl Data Eng 19:1333–1348
Zhang S, Wang R-S, Zhang X-S (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A 374(1):483–490
Zhang Z, Cheng H, Chen W, Zhang S, Fang Q (2008) Correlation clustering based on genetic algorithm for documents clustering. In: IEEE congress on evolutionary computation, pp 3193–3198
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.
Additional information
An erratum to this article is available at https://doi.org/10.1007/s13675-017-0087-1.
Cite this article
Levorato, M., Figueiredo, R., Frota, Y. et al. Evaluating balancing on social networks through the efficient solution of correlation clustering problems. EURO J Comput Optim 5, 467–498 (2017). https://doi.org/10.1007/s13675-017-0082-6
DOI: https://doi.org/10.1007/s13675-017-0082-6