Structure-preserving sparsification methods for social networks

  • Michael Hamann
  • Gerd Lindner
  • Henning Meyerhenke
  • Christian L. Staudt
  • Dorothea Wagner
Original Article

Abstract

Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsification algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. We show that these methods can be understood as rating edges by importance and then filtering globally or locally by these scores, and that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsification method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. To assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20% of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of the methods and show which method suits which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure.
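
The abstract does not spell out how Local Degree selects edges, so the following is only a minimal sketch under a stated assumption: each node keeps the edges to its floor(deg^alpha) highest-degree neighbors, and an edge survives if either endpoint selects it. The function name `local_degree_sparsify`, the parameter `alpha` and the toy graph are illustrative and are not taken from the article or from NetworKit.

```python
import math
from collections import defaultdict


def local_degree_sparsify(edges, alpha=0.5):
    """Return the subset of undirected edges kept by a Local-Degree-style filter.

    Assumption (not stated in the abstract): every node u keeps the edges to
    its floor(deg(u) ** alpha) highest-degree neighbors, with 0 <= alpha <= 1.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    kept = set()
    for u, neighbors in adj.items():
        budget = math.floor(len(neighbors) ** alpha)
        # Rank neighbors by degree (descending); break ties by node id.
        ranked = sorted(neighbors, key=lambda v: (-len(adj[v]), v))
        for v in ranked[:budget]:
            # Store each undirected edge once, as (smaller id, larger id).
            kept.add((min(u, v), max(u, v)))
    return kept


# Toy graph: hub 0 joined to nodes 1-4, plus two edges between the leaves.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]
sparse_edges = local_degree_sparsify(toy_edges, alpha=0.5)
print(sorted(sparse_edges))
```

On this toy graph, every leaf keeps only its edge to the hub, so the two leaf-to-leaf edges are dropped while the graph stays connected. With `alpha=1.0` every node keeps all of its neighbors and no edge is removed.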

Keywords

Complex networks, Sparsification, Backbones, Network reduction, Edge sampling

Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  • Michael Hamann (1)
  • Gerd Lindner (1)
  • Henning Meyerhenke (1)
  • Christian L. Staudt (1)
  • Dorothea Wagner (1)

  1. Karlsruhe Institute of Technology (KIT), Institute of Theoretical Informatics, Karlsruhe, Germany
