Evaluating balancing on social networks through the efficient solution of correlation clustering problems

  • Original Paper
EURO Journal on Computational Optimization

An Erratum to this article was published on 19 July 2017

This article has been updated

Abstract

One challenge for social network researchers is to evaluate balance in a social network. The degree of balance in a social group can be used to study whether and how the group evolves toward a balanced state. Clustering problems defined on signed graphs provide a criterion for measuring this degree of balance: the measure can be obtained from the optimal solution of the correlation clustering (CC) problem or of a variant of it, the relaxed correlation clustering problem. However, solving these problems is no easy task, especially when large network instances need to be analyzed. In this work, we contribute to the efficient solution of both problems by developing sequential and parallel iterated local search (ILS) metaheuristics. Using these algorithms, we then measure the structural balance of large real-world social networks.
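
To make the balance measure concrete, the sketch below computes the imbalance of a given partition of a signed graph. It assumes the usual correlation clustering notion of disagreement (negative edges inside a cluster plus positive edges between clusters, weighted by absolute value); the formulation used in the full article is not shown in this excerpt. The function name `imbalance` and the toy network are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, signed weight)


def imbalance(edges: Iterable[Edge], cluster_of: Dict[Hashable, int]) -> float:
    """Total |weight| of edges that disagree with the partition.

    Assumed definition: a negative edge disagrees when both endpoints share
    a cluster; a positive edge disagrees when its endpoints are separated.
    """
    total = 0.0
    for u, v, w in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        if (w < 0 and same_cluster) or (w > 0 and not same_cluster):
            total += abs(w)
    return total


# Hypothetical toy network: a and b are friends, both are hostile to c.
toy_edges = [("a", "b", 1.0), ("a", "c", -1.0), ("b", "c", -1.0)]
split = {"a": 0, "b": 0, "c": 1}   # c isolated from its opponents
lumped = {"a": 0, "b": 0, "c": 0}  # everyone in one cluster

print(imbalance(toy_edges, split))   # no edge disagrees
print(imbalance(toy_edges, lumped))  # both negative edges disagree
```

Under this definition a partition with imbalance zero corresponds to a perfectly balanced network, and the CC problem asks for the partition that minimizes the value.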

Change history

  • 19 July 2017

    An erratum to this article has been published.

Notes

  1. We consider networks small when they comprise dozens of nodes; medium-sized networks contain hundreds of nodes, and large-scale ones can have more than a hundred thousand nodes.

  2. It is possible to solve the CC problem on directed graphs. One must first convert the directed graph to an undirected one: Opposite arcs with the same sign are converted to a single edge whose weight is equal to the sum of the arcs’ weights; opposite arcs with different signs become two parallel edges, each one with the original sign and weight of each arc.

  3. The original dataset from the UCI Machine Learning Repository (Bache and Lichman 2013) consists of 20,000 messages taken from 20 newsgroups. However, since the authors’ bounding technique did not scale to the full dataset, they restricted their testbed to a subsample of 100 messages from each newsgroup, for a total of 2000 messages.

  4. The Wikipedia adminship election data has 7,000 vertices and 100,000 edges; the Epinions signed social network has 131,828 vertices and 841,372 edges; and the Slashdot Zoo signed social network (from February 21, 2009) comprises 82,144 vertices and 549,202 edges.

  5. In simple terms, GRASP (Feo and Resende 1995) is a multistart metaheuristic in which each iteration consists of two phases: construction (Algorithm 2) and local search (Algorithm 3).

  6. Pajek is a program for the analysis and visualization of large networks with thousands or even millions of vertices. It provides cluster identification, decomposition of large networks, visualization tools and implementations of several efficient network analysis algorithms.

  7. All instances are available in http://www.ic.uff.br/~yuri/files/CCinst.zip.

  8. We have extracted subsets of Slashdot Zoo signed social network from February 21, 2009 (Leskovec and Krevl 2014), containing the first n vertices, where \(n \in \{600, 1000, 2000, 4000, 8000, 10{,}000\}\). Since the original Slashdot network is a signed digraph, before extracting the subsets, we have converted it into an undirected graph.

  9. United Nations General Assembly Voting Data, by Anton Strezhnev and Erik Voeten, http://hdl.handle.net/1902.1/12379. Accessed in Apr 2016.

  10. Instances solved in less than 100 seconds were omitted from the graph.

  11. The target solution value is calculated as the best solution value that can be obtained by all heuristic methods when solving a specific instance.

  12. The full tables containing the comparison between SeqGRASP and SeqILS are available in http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.

  13. Instances solved in less than 5 seconds were omitted from the graph.

  14. The full tables containing the comparison between SeqGRASP and SeqILS are available in http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.

  15. To solve the CC Problem with the Doreian–Mrvar method inside Pajek, we followed these steps. After loading the network file, we created a random initial partition using 1-mode. At this point, we defined the number of clusters k of the solution, using the same number of clusters as the solution returned by ILS. Finally, we called the resolution method from the menu (Network \(\rightarrow \) Signed Network \(\rightarrow \) Create Partition \(\rightarrow \) Doreian Mrvar Method), specifying the number of repetitions (set to 100), the alpha parameter (set to 0.5 to match the CC Problem objective function) and the minimum number of vertices in clusters (set to 1).

  16. We conducted several experiments to find the optimal number of processes to be used in the parallel algorithms. This setting is closely related to the hardware configuration of the computer cluster used in the experiments. Since each machine in the computer cluster has two quad-core CPUs (8 processor cores), it can host eight processes running in parallel. In ParILS / ParVND, we chose to group each ILS master process together with its corresponding VND search slaves, in order to maximize the performance of message exchange between related processes.

  17. Two metrics are used to assess the performance of parallel algorithms. Speedup \(Su(p)\) measures the acceleration of the parallel algorithm relative to its sequential version, and efficiency \(E(p)\) measures the average fraction of time during which each process is effectively used. Thus, \(Su(p)=T(seq)/T(p)\), where \(T(seq)\) is the time required by the sequential algorithm and \(T(p)\) is the time required by the parallel algorithm run on p processors, and \(E(p)=Su(p)/p\).

  18. The table containing the performance comparison between Parallel GRASP and Parallel ILS is available in http://www.ic.uff.br/~yuri/files/CC-ADDcomp.zip.

  19. For a full report of these results, including the groups of countries in each solution, separated by year, see the complementary material in http://www.ic.uff.br/~yuri/files/CCcomp.zip.

  20. Slashdot friends or foes network from Feb 21 2009: 82,144 vertices and 549,202 edges.

  21. Wikipedia adminship election data: 7,000 vertices and 100,000 edges.

  22. Epinions who-vote-whom network: 131,828 vertices and 841,372 edges.
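
The digraph conversion described in Note 2 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `to_undirected` is hypothetical, node identifiers are assumed to be mutually comparable, weights are assumed nonzero, and at most one arc per direction is assumed between any pair of nodes. An arc with no opposite arc is assumed to become a single edge with the same weight, a case the note does not spell out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Arc = Tuple[int, int, float]  # (tail, head, signed weight)


def to_undirected(arcs: List[Arc]) -> List[Arc]:
    """Convert signed arcs to signed undirected edges (possibly parallel)."""
    # Group arc weights by unordered endpoint pair.
    by_pair: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for u, v, w in arcs:
        by_pair[(min(u, v), max(u, v))].append(w)

    edges: List[Arc] = []
    for (u, v), weights in by_pair.items():
        if len(weights) == 2 and (weights[0] > 0) == (weights[1] > 0):
            # Opposite arcs with the same sign: one edge, weights summed.
            edges.append((u, v, weights[0] + weights[1]))
        else:
            # Lone arc (assumption) or opposite arcs with different signs:
            # keep each arc as its own edge with its original sign and weight.
            edges.extend((u, v, w) for w in weights)
    return edges


# Hypothetical example: 1<->2 agree in sign, 2<->3 disagree, 3->4 is a lone arc.
arcs = [(1, 2, 1.0), (2, 1, 2.0), (2, 3, 1.0), (3, 2, -1.0), (3, 4, -1.0)]
print(to_undirected(arcs))
```

Keeping parallel edges for conflicting arcs means that exactly one of the two contributes to the imbalance whichever way the endpoints are clustered.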

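The speedup and efficiency metrics defined in Note 17 reduce to two divisions. The sketch below simply evaluates \(Su(p)=T(seq)/T(p)\) and \(E(p)=Su(p)/p\); the function names and the timings are hypothetical and are not taken from the article's experiments.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """Su(p) = T(seq) / T(p)."""
    if t_seq <= 0 or t_par <= 0:
        raise ValueError("running times must be positive")
    return t_seq / t_par


def efficiency(t_seq: float, t_par: float, p: int) -> float:
    """E(p) = Su(p) / p, for p processes."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t_seq, t_par) / p


# Hypothetical timings: 120 s sequentially, 20 s with 8 processes.
t_seq, t_par, p = 120.0, 20.0, 8
print(speedup(t_seq, t_par))        # 6.0
print(efficiency(t_seq, t_par, p))  # 0.75
```

An efficiency of 1 would mean every process is fully used; values below 1 reflect communication and idle time.
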
References

  • Abell P, Ludwig M (2009) Structural balance: a dynamic perspective. J Math Sociol 33:129–155

  • Aiex RM, Resende MGC, Ribeiro CC (2007) TTT plots: a perl program to create time-to-target plots. Optim Lett 1(4):355–366

  • Ailon N, Charikar M, Newman A (2008) Aggregating inconsistent information: ranking and clustering. J ACM 55(5):23

  • Alba E (2005) Parallel metaheuristics: a new class of algorithms, vol 47. Wiley, New York

  • Allison GT (1969) Conceptual models and the Cuban missile crisis. Am Polit Sci Rev 63(03):689–718

  • Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Bansal N, Blum A, Chawla S (2002) Correlation clustering. In: Proceedings of the 43rd annual IEEE symposium of foundations of computer science. Vancouver, Canada, pp 238–250

  • Bhattacharya A, De RK (2008) Divisive correlation clustering algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles. Bioinformatics 24(11):1359–1366

  • Bonchi F, Gionis A, Ukkonen A (2011) Overlapping correlation clustering. 2011 IEEE 11th international conference on data mining (ICDM). IEEE, pp 51–60

  • Brandes U, Delling D, Gaertler M, Gorke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172–188

  • Brusco M (2003) An enhanced branch-and-bound algorithm for a partitioning problem. Br J Math Stat Psychol 56:83–92

  • Brusco M, Doreian P, Mrvar A, Steinley D (2011) Two algorithms for relaxed structural balance partitioning: linking theory, models and data to understand social network phenomena. Sociol Methods Res 40:57–87

  • Brusco MJ, Köhn H-F (2009) Clustering qualitative data based on binary equivalence relations: neighborhood search heuristics for the clique partitioning problem. Psychometrika 74(4):685–703

  • Cartwright D, Harary F (1956) Structural balance: a generalization of Heider’s theory. Psychol Rev 63:277–293

  • Charikar M, Guruswami V, Wirth A (2005) Clustering with qualitative information. J Comput Syst Sci 71:360–383

  • Chiang K-Y, Hsieh C-J, Natarajan N, Tewari A, Dhillon IS (2013) Prediction and clustering in signed networks: a local to global perspective. arXiv:1302.5145

  • Crainic TG, Toulouse M (2010) Parallel meta-heuristics. In: Handbook of metaheuristics. Springer, US, pp 497–541

  • DasGupta B, Enciso GA, Sontag E, Zhang Y (2007) Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. BioSystems 90:161–178

  • Davis JA (1967) Clustering and structural balance in graphs. Hum Relat 20:181–187

  • De Nooy W, Mrvar A, Batagelj V (2011) Exploratory social network analysis with Pajek: revised and expanded, vol 27, 2nd edn. Cambridge University Press, Cambridge

  • Demaine ED, Emanuel D, Fiat A, Immorlica N (2006) Correlation clustering in general weighted graphs. Theoret Comput Sci 361:172–187

  • Den Besten M, Stützle T, Dorigo M (2001) Design of iterated local search algorithms. In: Workshops on applications of evolutionary computation. Springer Berlin, Heidelberg, pp 441–451

  • Doreian P, Mrvar A (1996a) A partitioning approach to structural balance. Soc Netw 18:149–168

  • Doreian P, Mrvar A (2009) Partitioning signed social networks. Soc Netw 31:1–11

  • Doreian P, Krackhardt D (2001) Pre-transitive balance mechanisms for signed networks. J Math Sociol 25(1):43–67

  • Doreian P, Mrvar A (1996b) Structural balance and partitioning signed graphs. Developments in data analysis, pp 195–208

  • Dowdall AT (2009) The birth and death of a tar baby: Henry Kissinger and Southern Africa. Ph.D. thesis, University of Missouri–Columbia

  • Drummond L, Figueiredo R, Frota Y, Levorato M (2013) Efficient solution of the correlation clustering problem: an application to structural balance. In: YanTang D, Herv P (eds) OTM 2013 Workshops, LNCS, vol 8186. Springer, Berlin, pp 674–683

  • Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72(2):027104

  • Ekşioglu SD, Pardalos PM, Resende MGC (2002) Parallel metaheuristics for combinatorial optimization. In: Models for parallel and distributed computation. Springer, pp 179–206

  • Elsner M, Schudy W (2009) Bounding and comparing methods for correlation clustering beyond ILP. In: ILP’09 proceedings of the workshop on integer linear programming for natural language processing, pp 19–27

  • Epinions (1999) Website. http://www.epinions.com. Accessed on March 2015

  • Esmailian P, Abtahi SE, Jalili M (2014) Mesoscopic analysis of online social networks: the role of negative ties. Phys Rev E 90(4):042817

  • Facchetti G, Iacono G, Altafini C (2011) Computing global structural balance in large-scale signed social networks. Proc Natl Acad Sci USA 108:20953–20958

  • Feo TA, Resende MGC (1995) Greedy randomized adaptive search procedures. J Glob Optim 6(2):109–133

  • Figueiredo R, Frota Y (2014) The maximum balanced subgraph of a signed graph: applications and solution approaches. Eur J Oper Res 236(2):473–487

  • Figueiredo R, Moura G (2013) Mixed integer programming formulations for clustering problems related to structural balance. Soc Netw 35(4):639–651

  • Gendreau M, Potvin JY (2010) Handbook of metaheuristics. International series in operations research and management science. Springer, Berlin

  • Giotis I, Guruswami V (2006) Correlation clustering with a fixed number of clusters. In: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm. ACM, pp 1167–1176

  • Golan G (2010) Yom Kippur and after: The Soviet Union and the Middle East Crisis, vol 19. Cambridge University Press, Cambridge

  • Golani M (1995) The historical place of the Czech-Egyptian arms deal, fall 1955. Middle Eastern Stud 31(4):803–827

  • Gülpinar N, Gutin G, Mitra G, Zverovitch A (2004) Extracting pure network submatrices in linear programs using signed graphs. Discrete Appl Math 137:359–372

  • Harary F, Lim M, Wunsch DC (2003) Signed graphs for portfolio analysis in risk management. IMA J Manag Math 13:1–10

  • Heider F (1946) Attitudes and cognitive organization. J Psychol 21:107–112

  • Huffner F, Betzler N, Niedermeier R (2010) Separator-based data reduction for signed graph balancing. J Combin Optim 20:335–360

  • Inohara T (1998) On conditions for a meeting not to reach a deadlock. Appl Math Comput 90:1–9

  • Kim S, Yoo CD, Nowozin S, Kohli P (2014) Image segmentation using higher-order correlation clustering. IEEE Trans Pattern Anal Mach Intell 36(9):1761–1774

  • Kreps SE (2007) The 2006 Lebanon war: lessons learned. Parameters 37(1):72

  • Kunegis J, Lommatzsch A, Bauckhage C (2009) The slashdot zoo: mining a social network with negative edges. In: WWW’09 Proceedings of the 18th international conference on World wide web, pp 741–750

  • Kunegis J, Schmidt S, Lommatzsch A, Lerner J, De Luca EW, Albayrak S (2010) Spectral analysis of signed graphs for clustering, prediction and visualization. SDM, vol 10. SIAM, pp 559–559

  • Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: CHI’10 Proceedings of the SIGCHI conference on human factors in computing systems, pp 1361–1370

  • Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

  • Levorato M, Drummond L, Frota Y, Figueiredo R (2015) An ILS algorithm to evaluate structural balance in signed social networks. In: Symposium on applied computing, SAC 2015, Salamanca, Spain—April 13–17, pp 1117–1122

  • Lourenço HR, Martin OC, Stützle T (2003) Iterated local search. Springer, Berlin

  • Macon KT, Mucha PJ, Porter MA (2012) Community structure in the United Nations General Assembly. Phys A 391:343–361

  • McGreal C (2006) Brothers in arms: Israel’s secret pact with Pretoria. Guardian 7. https://www.theguardian.com/world/2006/feb/07/southafrica.israel. Accessed 23 Jan 2017

  • Mearsheimer JJ, Walt SM (2006) The Israel lobby and US foreign policy. Middle East Policy 13(3):29–87

  • Mehrotra A, Trick MA (1998) Cliques and clustering: a combinatorial approach. Oper Res Lett 22(1):1–12

  • Mladenović N, Hansen P (1997) Variable neighborhood search. Comput Oper Res 24(11):1097–1100

  • Munem BA (2008) Canada and peace in the Middle East. http://www.palestine1.net/can&p-e.htm. Accessed on Jan 2015

  • Nascimento MC, Pitsoulis L (2013) Community detection by modularity maximization using GRASP with path relinking. Comput Oper Res 40(12):3121–3131

  • Nesbitt FN (2004) Race for sanctions: African Americans against apartheid, 1946–1994. Indiana University Press, Bloomington

  • Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582

  • Pérez-Stable M (1993) The Cuban revolution: origins, course, and legacy. Oxford University Press, New York

  • Ruiz R, Stützle T (2007) A simple and effective iterated greedy algorithm for the permutation flowshop scheduling problem. Eur J Oper Res 177(3):2033–2049

  • Slashdot Website (1997) http://slashdot.org. Accessed on March 2015

  • Smith CD (2010) Palestine and the Arab-Israeli conflict: a history with documents. Bedford/St. Martin’s

  • Srinivasan A (2011) Local balancing influences global structure in social networks. Proc Natl Acad Sci USA 108:1751–1752

  • Stinnett DM, Tir J, Diehl PF, Schafer P, Gochman C (2002) The Correlates of War (COW) project direct contiguity data, version 3.0. Confl Manag Peace Sci 19:59–67

  • Swamy C (2004) Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, pp 526–527

  • Traag VA, Bruggeman J (2009) Community detection in networks with positive and negative links. Phys Rev E 80:036115

  • Wang N, Li J (2013) Restoring: A greedy heuristic approach based on neighborhood for correlation clustering. In: Advanced data mining and applications. Springer, pp 348–359

  • Yang B, Cheung WK, Liu J (2007) Community mining from signed social networks. IEEE Trans Knowl Data Eng 19:1333–1348

  • Zhang S, Wang R-S, Zhang X-S (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A 374(1):483–490

  • Zhang Z, Cheng H, Chen W, Zhang S, Fang Q (2008) Correlation clustering based on genetic algorithm for documents clustering. IEEE congress on evolutionary computation, pp 3193–3198

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

Author information

Corresponding author

Correspondence to Rosa Figueiredo.

Additional information

An erratum to this article is available at https://doi.org/10.1007/s13675-017-0087-1.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 218 KB)

About this article

Cite this article

Levorato, M., Figueiredo, R., Frota, Y. et al. Evaluating balancing on social networks through the efficient solution of correlation clustering problems. EURO J Comput Optim 5, 467–498 (2017). https://doi.org/10.1007/s13675-017-0082-6

  • DOI: https://doi.org/10.1007/s13675-017-0082-6
