Evaluating balancing on social networks through the efficient solution of correlation clustering problems

Levorato, Mario; Figueiredo, Rosa; Frota, Yuri; Drummond, Lúcia

doi:10.1007/s13675-017-0082-6

Original Paper
Published: 31 January 2017

Volume 5, pages 467–498, (2017)
EURO Journal on Computational Optimization

Mario Levorato¹,
Rosa Figueiredo²,
Yuri Frota¹ &
Lúcia Drummond¹

An Erratum to this article was published on 19 July 2017

This article has been updated

Abstract

One challenge for social network researchers is to evaluate balance in a social network. The degree of balance in a social group can be used as a tool to study whether and how this group evolves to a possible balanced state. The solution of clustering problems defined on signed graphs can be used as a criterion to measure the degree of balance in social networks and this measure can be obtained with the optimal solution of the correlation clustering problem, as well as a variation of it, the relaxed correlation clustering problem. However, solving these problems is no easy task, especially when large network instances need to be analyzed. In this work, we contribute to the efficient solution of both problems by developing sequential and parallel ILS metaheuristics. Then, by using our algorithms, we solve the problem of measuring the structural balance on large real-world social networks.

Change history

19 July 2017
An erratum to this article has been published.

Notes

We consider networks with dozens of nodes to be small and networks with hundreds of nodes to be medium-sized, while large-scale networks can have more than a hundred thousand nodes.
It is possible to solve the CC problem on directed graphs. One must first convert the directed graph to an undirected one: Opposite arcs with the same sign are converted to a single edge whose weight is equal to the sum of the arcs’ weights; opposite arcs with different signs become two parallel edges, each one with the original sign and weight of each arc.
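The conversion rule in the preceding note can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each arc is a hypothetical `(u, v, weight)` tuple whose sign is carried by the weight, and that self-loops can be ignored.

```python
from collections import defaultdict


def to_undirected(arcs):
    """Convert signed arcs (u, v, weight) into undirected signed edges.

    Assumption: the sign of an arc is the sign of its weight.
    Opposite arcs with the same sign collapse into one edge whose weight is
    the sum of the arc weights; opposite arcs with different signs remain as
    two parallel edges, each keeping its own sign and weight.
    """
    by_pair = defaultdict(list)
    for u, v, w in arcs:
        if u == v:
            continue  # assumption: self-loops are dropped
        by_pair[(min(u, v), max(u, v))].append(w)

    edges = []
    for (a, b), weights in sorted(by_pair.items()):
        positive = sum(w for w in weights if w > 0)
        negative = sum(w for w in weights if w < 0)
        if positive:
            edges.append((a, b, positive))
        if negative:
            edges.append((a, b, negative))
    return edges


arcs = [(1, 2, 1.0), (2, 1, 2.0), (3, 4, 1.0), (4, 3, -1.0), (5, 6, -2.0)]
print(to_undirected(arcs))
# [(1, 2, 3.0), (3, 4, 1.0), (3, 4, -1.0), (5, 6, -2.0)]
```

An arc with no opposite counterpart simply becomes a single edge with the same sign and weight.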
The original dataset from the UCI Machine Learning Repository (Bache and Lichman 2013) consists of 20,000 messages taken from 20 newsgroups. However, since the authors’ bounding technique did not scale to the full dataset, they restricted their testbed to a subsample of 100 messages from each newsgroup, for a total of 2000 messages.
The Wikipedia adminship election data has 7,000 vertices and 100,000 edges; the Epinions signed social network has 131,828 vertices and 841,372 edges; and the Slashdot Zoo signed social network (from February 21, 2009) comprises 82,144 vertices and 549,202 edges.
In simple terms, GRASP (Feo and Resende 1995) is a multistart metaheuristic in which each iteration consists basically of two phases: construction (Algorithm 2) and local search (Algorithm 3).
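The two-phase structure described in the preceding note can be illustrated with a small multistart loop for correlation clustering. This is only a sketch under stated assumptions: Algorithms 2 and 3 of the paper are not reproduced here, so the construction rule (a restricted candidate list of size `rcl_size`), the single-vertex move neighborhood, and all function names are hypothetical choices made for illustration. Edges are assumed to be `(u, v, weight)` tuples with the sign carried by the weight.

```python
import random


def imbalance(edges, cluster_of):
    """Weight of negative edges inside clusters plus positive edges between clusters."""
    total = 0.0
    for u, v, w in edges:
        same = cluster_of[u] == cluster_of[v]
        if (w < 0 and same) or (w > 0 and not same):
            total += abs(w)
    return total


def construct(vertices, edges, rng, rcl_size=2):
    """Construction phase: insert vertices one at a time, picking at random
    among the rcl_size cheapest clusters (a restricted candidate list)."""
    cluster_of, next_id = {}, 0
    order = list(vertices)
    rng.shuffle(order)
    for v in order:
        scored = []
        for c in range(next_id + 1):  # existing clusters plus one new cluster
            cluster_of[v] = c
            placed = [e for e in edges if e[0] in cluster_of and e[1] in cluster_of]
            scored.append((imbalance(placed, cluster_of), c))
        scored.sort()
        _, choice = rng.choice(scored[:rcl_size])
        cluster_of[v] = choice
        if choice == next_id:
            next_id += 1
    return cluster_of


def local_search(edges, cluster_of):
    """Local search phase: accept single-vertex moves while they reduce the imbalance."""
    cluster_of = dict(cluster_of)
    best = imbalance(edges, cluster_of)
    improved = True
    while improved:
        improved = False
        for v in list(cluster_of):
            labels = set(cluster_of.values())
            for c in labels | {max(labels) + 1}:  # any cluster or a fresh one
                old = cluster_of[v]
                if c == old:
                    continue
                cluster_of[v] = c
                value = imbalance(edges, cluster_of)
                if value < best:
                    best, improved = value, True
                else:
                    cluster_of[v] = old  # undo a non-improving move
    return cluster_of


def grasp(vertices, edges, iterations=10, seed=0):
    """Multistart loop: construction followed by local search, keeping the best solution."""
    rng = random.Random(seed)
    best, best_value = None, float("inf")
    for _ in range(iterations):
        solution = local_search(edges, construct(vertices, edges, rng))
        value = imbalance(edges, solution)
        if value < best_value:
            best, best_value = solution, value
    return best, best_value


# Toy balanced graph: {0, 1} and {2, 3} are friendly inside, hostile across.
toy_edges = [(0, 1, 1.0), (2, 3, 1.0), (0, 2, -1.0), (0, 3, -1.0), (1, 2, -1.0), (1, 3, -1.0)]
best, value = grasp([0, 1, 2, 3], toy_edges, iterations=5, seed=1)
print(value)  # 0.0
```

Recomputing the full imbalance for every candidate move keeps the sketch short; an implementation meant for large networks would update the objective incrementally.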
Pajek is a program for the analysis and visualization of large networks. It provides features such as cluster identification, decomposition of large networks, and visualization tools, as well as implementations of several efficient algorithms for analyzing large networks with thousands or even millions of vertices.
All instances are available at http://www.ic.uff.br/~yuri/files/CCinst.zip.
We extracted subsets of the Slashdot Zoo signed social network from February 21, 2009 (Leskovec and Krevl 2014), containing the first n vertices, where \(n \in \{600, 1000, 2000, 4000, 8000, 10{,}000\}\). Since the original Slashdot network is a signed digraph, we converted it into an undirected graph before extracting the subsets.
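A minimal sketch of this subset extraction is given below. It assumes that vertices are identified by 0-based integers, so that "the first n vertices" means ids 0 to n-1, and that the (already undirected) network is a list of hypothetical `(u, v, weight)` tuples; neither detail is stated in the note.

```python
SIZES = [600, 1000, 2000, 4000, 8000, 10000]  # the values of n listed in the note


def first_n_subgraph(edges, n):
    """Keep only the edges whose two endpoints both have an id below n.

    Assumption: vertex ids are 0-based integers in file order.
    """
    return [(u, v, w) for (u, v, w) in edges if u < n and v < n]


# Toy usage on a hypothetical edge list (not real Slashdot data).
toy_edges = [(0, 1, 1), (1, 5, -1), (2, 3, 1)]
print(first_n_subgraph(toy_edges, 4))  # [(0, 1, 1), (2, 3, 1)]
```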
United Nations General Assembly Voting Data, by Anton Strezhnev and Erik Voeten, http://hdl.handle.net/1902.1/12379. Accessed in Apr 2016.
Instances solved in less than 100 seconds were omitted from the graph.
The target solution value is calculated as the best solution value that can be obtained by all heuristic methods when solving a specific instance.
The full tables containing the comparison between SeqGRASP and SeqILS are available at http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.
Instances solved in less than 5 seconds were omitted from the graph.
The full tables containing the comparison between SeqGRASP and SeqILS are available at http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.
To solve the CC problem with the Doreian-Mrvar method inside Pajek, we followed these steps. After loading the network file, we created a random initial partition using 1-Mode. At this point, we defined the number of clusters k of the solution, using the same number of clusters as in the solution returned by ILS. Finally, we called the resolution method from the menu (Network \(\rightarrow \) Signed Network \(\rightarrow \) Create Partition \(\rightarrow \) Doreian Mrvar Method), specifying the number of repetitions (set to 100), the alpha parameter (set to 0.5 to match the CC problem objective function) and the minimum number of vertices in clusters (set to 1).
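The role of the alpha parameter can be illustrated with a short sketch. It assumes the usual form of the Doreian-Mrvar criterion, alpha times the weight of negative edges inside clusters plus (1 - alpha) times the weight of positive edges between clusters; this form is not spelled out in the note, and the function name and data layout are hypothetical. Under that assumption, alpha = 0.5 gives exactly half of the unweighted sum of the two error terms, so both rank partitions identically.

```python
def partition_criterion(edges, cluster_of, alpha=0.5):
    """Weighted sum of the two kinds of inconsistency in a signed partition.

    Assumption: edges are (u, v, weight) tuples, sign carried by the weight.
    """
    negative_within = 0.0
    positive_between = 0.0
    for u, v, w in edges:
        same = cluster_of[u] == cluster_of[v]
        if w < 0 and same:
            negative_within += -w
        elif w > 0 and not same:
            positive_between += w
    return alpha * negative_within + (1 - alpha) * positive_between


edges = [("a", "b", 1.0), ("b", "c", -1.0), ("a", "c", -1.0), ("c", "d", 1.0)]
clusters = {"a": 0, "b": 0, "c": 0, "d": 1}
print(partition_criterion(edges, clusters, alpha=0.5))  # 1.5, half of the total error 3.0
```

Other values of alpha penalize one kind of error more than the other, which is why 0.5 is the setting that corresponds to a symmetric objective.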
We conducted several experiments to find the optimal number of processes to be used in the parallel algorithms. This setting is closely related to the hardware configuration of the computer cluster used in the experiments. Since each machine in the computer cluster has two quad-core CPUs (8 processor cores), it can host eight processes running in parallel. In ParILS / ParVND, we chose to group each ILS master process together with its corresponding VND search slaves, in order to maximize the performance of message exchange between related processes.
Two metrics are used to assess the performance of parallel algorithms: speedup \(Su(p)\) measures the acceleration of the parallel algorithm relative to its sequential version, and efficiency \(E(p)\) measures the average fraction of time during which each process is effectively used. Thus, \(Su(p)=T(seq)/T(p)\), where \(T(seq)\) is the time required by the sequential algorithm and \(T(p)\) is the time required by the parallel algorithm run on p processors, and \(E(p)=Su(p)/p\).
The table containing the performance comparison between Parallel GRASP and Parallel ILS is available at http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.
For a full report of these results, including the groups of countries in each solution, separated by year, see the complementary material at http://www.ic.uff.br/~yuri/files/CCcomp.zip.
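As a worked illustration of these two formulas, the snippet below computes both metrics. The timings are hypothetical values chosen for easy arithmetic, not results from the paper; p = 8 mirrors the eight cores per machine mentioned in an earlier note.

```python
def speedup(t_seq, t_par):
    """Su(p) = T(seq) / T(p)."""
    return t_seq / t_par


def efficiency(t_seq, t_par, p):
    """E(p) = Su(p) / p."""
    return speedup(t_seq, t_par) / p


# Hypothetical timings in seconds (illustrative only).
t_seq, t_par, p = 120.0, 20.0, 8
print(speedup(t_seq, t_par))        # 6.0
print(efficiency(t_seq, t_par, p))  # 0.75
```

An efficiency of 1 would mean perfectly linear speedup; values below 1 reflect communication and idle time.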
Slashdot friends or foes network from Feb 21 2009: 82,144 vertices and 549,202 edges.
Wikipedia adminship election data: 7,000 vertices and 100,000 edges.
Epinions who-vote-whom network: 131,828 vertices and 841,372 edges.

References

Abell P, Ludwig M (2009) Structural balance: a dynamic perspective. J Math Sociol 33:129–155
Aiex RM, Resende MGC, Ribeiro CC (2007) TTT plots: a perl program to create time-to-target plots. Optim Lett 1(4):355–366
Ailon N, Charikar M, Newman A (2008) Aggregating inconsistent information: ranking and clustering. J ACM 55(5):23
Alba E (2005) Parallel metaheuristics: a new class of algorithms, vol 47. Wiley, New York
Allison GT (1969) Conceptual models and the cuban missile crisis. Am Polit Sci Rev 63(03):689–718
Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Bansal N, Blum A, Chawla S (2002) Correlation clustering. In: Proceedings of the 43rd annual IEEE symposium of foundations of computer science. Vancouver, Canada, pp 238–250
Bhattacharya A, De RK (2008) Divisive correlation clustering algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles. Bioinformatics 24(11):1359–1366
Bonchi F, Gionis A, Ukkonen A (2011) Overlapping correlation clustering. 2011 IEEE 11th international conference on data mining (ICDM). IEEE, pp 51–60
Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172–188
Brusco M (2003) An enhanced branch-and-bound algorithm for a partitioning problem. Br J Math Stat Psychol 56:83–92
Brusco M, Doreian P, Mrvar A, Steinley D (2011) Two algorithms for relaxed structural balance partitioning: linking theory, models and data to understand social network phenomena. Sociol Methods Res 40:57–87
Brusco MJ, Köhn H-F (2009) Clustering qualitative data based on binary equivalence relations: neighborhood search heuristics for the clique partitioning problem. Psychometrika 74(4):685–703
Cartwright D, Harary F (1956) Structural balance: a generalization of Heider’s theory. Psychol Rev 63:277–293
Charikar M, Guruswami V, Wirth A (2005) Clustering with qualitative information. J Comput Syst Sci 71:360–383
Chiang K-Y, Hsieh C-J, Natarajan N, Tewari A, Dhillon IS (2013) Prediction and clustering in signed networks: a local to global perspective. arXiv:1302.5145
Crainic TG, Toulouse M (2010) Parallel meta-heuristics. In: Handbook of metaheuristics. Springer, US, pp 497–541
DasGupta B, Enciso GA, Sontag E, Zhang Y (2007) Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. BioSystems 90:161–178
Davis JA (1967) Clustering and structural balance in graphs. Hum Relat 20:181–187
De Nooy W, Mrvar A, Batagelj V (2011) Exploratory social network analysis with Pajek: revised and expanded, vol 27, 2nd edn. Cambridge University Press, Cambridge
Demaine ED, Emanuel D, Fiat A, Immorlica N (2006) Correlation clustering in general weighted graphs. Theoret Comput Sci 361:172–187
Den Besten M, Stützle T, Dorigo M (2001) Design of iterated local search algorithms. In: Workshops on applications of evolutionary computation. Springer Berlin, Heidelberg, pp 441–451
Doreian P, Mrvar A (1996a) A partitioning approach to structural balance. Soc Netw 18:149–168
Doreian P, Mrvar A (2009) Partitioning signed social networks. Soc Netw 31:1–11
Doreian P, Krackhardt D (2001) Pre-transitive balance mechanisms for signed networks. J Math Sociol 25(1):43–67
Doreian P, Mrvar A (1996b) Structural balance and partitioning signed graphs. Developments in data analysis, pp 195–208
Dowdall AT (2009) The birth and death of a tar baby: Henry kissinger and southern africa. Ph.D. thesis, University of Missouri–Columbia
Drummond L, Figueiredo R, Frota Y, Levorato M (2013) Efficient solution of the correlation clustering problem: an application to structural balance. In: Demey YT, Panetto H (eds) OTM 2013 Workshops, LNCS, vol 8186. Springer, Berlin, pp 674–683
Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72(2):027104
Ekşioğlu SD, Pardalos PM, Resende MGC (2002) Parallel metaheuristics for combinatorial optimization. In: Models for parallel and distributed computation. Springer, pp 179–206
Elsner M, Schudy W (2009) Bounding and comparing methods for correlation clustering beyond ILP. In: ILP’09 proceedings of the workshop on integer linear programming for natural language processing, pp 19–27
Epinions (1999) Website. http://www.epinions.com. Accessed on March 2015
Esmailian P, Abtahi SE, Jalili M (2014) Mesoscopic analysis of online social networks: the role of negative ties. Phys Rev E 90(4):042817
Facchetti G, Iacono G, Altafini C (2011) Computing global structural balance in large-scale signed social networks. Proc Natl Acad Sci USA 108:20953–20958
Feo TA, Resende MGC (1995) Greedy randomized adaptive search procedures. J Glob Optim 6(2):109–133
Figueiredo R, Frota Y (2014) The maximum balanced subgraph of a signed graph: applications and solution approaches. Eur J Oper Res 236(2):473–487
Figueiredo R, Moura G (2013) Mixed integer programming formulations for clustering problems related to structural balance. Soc Netw 35(4):639–651
Gendreau M, Potvin JY (2010) Handbook of metaheuristics. International series in operations research and management science. Springer, Berlin
Giotis I, Guruswami V (2006) Correlation clustering with a fixed number of clusters. In: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm. ACM, pp 1167–1176
Golan G (2010) Yom Kippur and after: The Soviet Union and the Middle East Crisis, vol 19. Cambridge University Press, Cambridge
Golani M (1995) The historical place of the czech-egyptian arms deal, fall 1955. Middle Eastern Stud 31(4):803–827
Gülpinar N, Gutin G, Mitra G, Zverovitch A (2004) Extracting pure network submatrices in linear programs using signed graphs. Discrete Appl Math 137:359–372
Harary F, Lim M, Wunsch DC (2003) Signed graphs for portfolio analysis in risk management. IMA J Manag Math 13:1–10
Heider F (1946) Attitudes and cognitive organization. J Psychol 21:107–112
Hüffner F, Betzler N, Niedermeier R (2010) Separator-based data reduction for signed graph balancing. J Combin Optim 20:335–360
Inohara T (1998) On conditions for a meeting not to reach a deadlock. Appl Math Comput 90:1–9
Kim S, Yoo CD, Nowozin S, Kohli P (2014) Image segmentation using higher-order correlation clustering. IEEE Trans Pattern Anal Mach Intell 36(9):1761–1774
Kreps SE (2007) The 2006 Lebanon war: lessons learned. Parameters 37(1):72
Kunegis J, Lommatzsch A, Bauckhage C (2009) The slashdot zoo: mining a social network with negative edges. In: WWW’09 Proceedings of the 18th international conference on World wide web, pp 741–750
Kunegis J, Schmidt S, Lommatzsch A, Lerner J, De Luca EW, Albayrak S (2010) Spectral analysis of signed graphs for clustering, prediction and visualization. SDM, vol 10. SIAM, pp 559–559
Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: CHI’10 Proceedings of the SIGCHI conference on human factors in computing systems, pp 1361–1370
Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data
Levorato M, Drummond L, Frota Y, Figueiredo R (2015) An ILS algorithm to evaluate structural balance in signed social networks. In: Symposium on applied computing, SAC 2015, Salamanca, Spain—April 13–17, pp 1117–1122
Lourenço HR, Martin OC, Stützle T (2003) Iterated local search. Springer, Berlin
Macon KT, Mucha PJ, Porter MA (2012) Community structure in the united nations general assembly. Phys A 391:343–361
McGreal C (2006) Brothers in arms-Israel’s secret pact with pretoria. Guardian 7. https://www.theguardian.com/world/2006/feb/07/southafrica.israel. Accessed 23 Jan 2017
Mearsheimer JJ, Walt SM (2006) The Israel lobby and us foreign policy. Middle East Policy 13(3):29–87
Mehrotra A, Trick MA (1998) Cliques and clustering: a combinatorial approach. Oper Res Lett 22(1):1–12
Mladenović N, Hansen P (1997) Variable neighborhood search. Comput Oper Res 24(11):1097–1100
Munem BA (2008) Canada and peace in the middle east. http://www.palestine1.net/can&p-e.htm. Accessed on Jan 2015
Nascimento MC, Pitsoulis L (2013) Community detection by modularity maximization using GRASP with path relinking. Comput Oper Res 40(12):3121–3131
Nesbitt FN (2004) Race for sanctions: African Americans against apartheid, 1946–1994. Indiana University Press, Bloomington
Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582
Pérez-Stable M (1993) The Cuban revolution: origins, course, and legacy. Oxford University Press, New York
Ruiz R, Stützle T (2007) A simple and effective iterated greedy algorithm for the permutation flowshop scheduling problem. Eur J Oper Res 177(3):2033–2049
Slashdot Website (1997) http://slashdot.org. Accessed on March 2015
Smith CD (2010) Palestine and the Arab-Israeli conflict: a history with documents. Bedford/St. Martin’s
Srinivasan A (2011) Local balancing influences global structure in social networks. Proc Natl Acad Sci USA 108:1751–1752
Stinnett DM, Tir J, Diehl PF, Schafer P, Gochman C (2002) The correlates of war (cow) project direct contiguity data, version 3.0. Confl Manag Peace Sci 19:59–67
Swamy C (2004) Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, pp 526–527
Traag VA, Bruggeman J (2009) Community detection in networks with positive and negative links. Phys Rev E 80:036115
Wang N, Li J (2013) Restoring: A greedy heuristic approach based on neighborhood for correlation clustering. In: Advanced data mining and applications. Springer, pp 348–359
Yang B, Cheung WK, Liu J (2007) Community mining from signed social networks. IEEE Trans Knowl Data Eng 19:1333–1348
Zhang S, Wang R-S, Zhang X-S (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A 374(1):483–490
Zhang Z, Cheng H, Chen W, Zhang S, Fang Q (2008) Correlation clustering based on genetic algorithm for documents clustering. IEEE congress on evolutionary computation, pp 3193–3198

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

Author information

Authors and Affiliations

Department of Computer Science, Fluminense Federal University, Niterói, RJ, 24210-240, Brazil
Mario Levorato, Yuri Frota & Lúcia Drummond
Laboratoire d’Informatique d’Avignon, University of Avignon, 84911, Avignon, France
Rosa Figueiredo

Corresponding author

Correspondence to Rosa Figueiredo.

Additional information

An erratum to this article is available at https://doi.org/10.1007/s13675-017-0087-1.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 218 KB)

About this article

Cite this article

Levorato, M., Figueiredo, R., Frota, Y. et al. Evaluating balancing on social networks through the efficient solution of correlation clustering problems. EURO J Comput Optim 5, 467–498 (2017). https://doi.org/10.1007/s13675-017-0082-6

Received: 26 July 2016
Accepted: 15 January 2017
Published: 31 January 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s13675-017-0082-6
