Data Mining and Knowledge Discovery

, Volume 30, Issue 2, pp 438–475 | Cite as

Fast approximation of betweenness centrality through sampling

Article

Abstract

Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices (or edges) in a network in terms of the fraction of shortest paths that pass through them. Since exact computation in large networks is prohibitively expensive, we present two efficient randomized algorithms for betweenness estimation. The algorithms are based on random sampling of shortest paths and offer probabilistic guarantees on the quality of the approximation. The first algorithm estimates the betweenness of all vertices (or edges): all approximate values are within an additive factor \(\varepsilon \in (0,1)\) from the real values, with probability at least \(1-\delta \). The second algorithm focuses on the top-K vertices (or edges) with highest betweenness and estimate their betweenness value to within a multiplicative factor \(\varepsilon \), with probability at least \(1-\delta \). This is the first algorithm that can compute such approximation for the top-K vertices (or edges). By proving upper and lower bounds to the VC-dimension of a range set associated with the problem at hand, we can bound the sample size needed to achieve the desired approximations. We obtain sample sizes that are independent from the number of vertices in the network and only depend on a characteristic quantity that we call the vertex-diameter, that is the maximum number of vertices in a shortest path. In some cases, the sample size is completely independent from any quantitative property of the graph. An extensive experimental evaluation on real and artificial networks shows that our algorithms are significantly faster and much more scalable as the number of vertices grows than other algorithms with similar approximation guarantees.

Keywords

Social network analysis Betweenness centrality VC-dimension Sampling Approximation algorithms 

References

  1. Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2011) VC-dimension and shortest path algorithms. Automata, languages and programming., Lecture notes in computer scienceSpringer, Berlin, pp 690–699. doi:10.1007/978-3-642-22006-7_58 CrossRefGoogle Scholar
  2. Aingworth D, Chekuri C, Indyk P, Motwani R (1999) Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM J Comput 28(4):1167–1181. doi:10.1137/S0097539796303421 CrossRefMathSciNetMATHGoogle Scholar
  3. Anthonisse JM (1971) The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, AmsterdamGoogle Scholar
  4. Anthony M, Bartlett PL (1999) Neural network learning—theoretical foundations. Cambridge University Press, New YorkCrossRefMATHGoogle Scholar
  5. Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Bonato A, Chung F (eds) Algorithms and models for the web-graph. Lecture notes in computer science, vol 4863. Springer, Berlin. doi:10.1007/978-3-540-77004-6_10 Google Scholar
  6. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512CrossRefMathSciNetGoogle Scholar
  7. Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. CoRR abs/1504.0709 (to appear in ESA’15)Google Scholar
  8. Bergamini E, Meyerhenke H, Staudt CL (2015) Approximating betweenness centrality in large evolving networks. In: 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, SIAM, pp 133–146Google Scholar
  9. Boitmanis K, Freivalds K, Lediņš P, Opmanis R (2006) Fast and simple approximation of the diameter and radius of a graph. In: Alvarez C, Serna M (eds) Experimental algorithms. Lecture notes in computer science. Springer, Berlin, pp 98–108. doi:10.1007/11764298_9 Google Scholar
  10. Borgatti SP, Everett MG (2006) A graph-theoretic perspective on centrality. Soc Netw 28(4):466–484. doi:10.1016/j.socnet.2005.11.005 CrossRefGoogle Scholar
  11. Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177. doi:10.1080/0022250X.2001.9990249 CrossRefMATHGoogle Scholar
  12. Brandes U (2008) On variants of shortest-path betweenness centrality and their generic computation. Soc Netw 30(2):136–145. doi:10.1016/j.socnet.2007.11.001 CrossRefMathSciNetGoogle Scholar
  13. Brandes U, Pich C (2007) Centrality estimation in large networks. Int J Bifurc Chaos 17(7):2303–2318. doi:10.1142/S0218127407018403 CrossRefMathSciNetMATHGoogle Scholar
  14. Cormode G, Duffield N (2014) Sampling for big data: a tutorial. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1975–1975. ACM, New York. doi:10.1145/2623330.2630811. http://doi.acm.org/10.1145/2623330.2630811
  15. Csárdi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9Google Scholar
  16. Dolev S, Elovici Y, Puzis R (2010) Routing betweenness centrality. J ACM 57(4):25:1–25:27. doi:10.1145/1734213.1734219 CrossRefMathSciNetGoogle Scholar
  17. Eppstein D, Wang J (2004) Fast approximation of centrality. J Graph Algorithms Appl 8(1):39–45CrossRefMathSciNetMATHGoogle Scholar
  18. Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40:35–41CrossRefGoogle Scholar
  19. Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality. In: Munro JI, Wagner D (eds) Algorithm engineering & experiments (ALENEX’08), SIAM, pp 90–100Google Scholar
  20. Har-Peled S, Sharir M (2011) Relative \((p,\varepsilon )\)-approximations in geometry. Discret Comput Geom 45(3):462–496. doi:10.1007/s00454-010-9248-1 CrossRefMathSciNetMATHGoogle Scholar
  21. Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30CrossRefMathSciNetMATHGoogle Scholar
  22. Jacob R, Koschützki D, Lehmann K, Peeters L, Tenfelde-Podehl D (2005) Algorithms for centrality indices. In: Brandes U, Erlebach T (eds) Network analysis. Lecture notes in computer science, vol 3418. Springer, Berlin, pp 62–82. doi:10.1007/978-3-540-31955-9_4 Google Scholar
  23. Kaindl H, Kainz G (1997) Bidirectional heuristic search reconsidered. J Artif Intell Res 7:283–317. doi:10.1613/jair.460 MathSciNetMATHGoogle Scholar
  24. Kourtellis N, Alahakoon T, Simha R, Iamnitchi A, Tripathi R (2012) Identifying high betweenness centrality nodes in large social networks. Soc Netw Anal Mining 3:899–914. doi:10.1007/s13278-012-0076-6 CrossRefGoogle Scholar
  25. Kourtellis N, Morales GDF, Bonchi F (2014) Scalable online betweenness centrality in evolving graphs. CoRR abs/1401.6981Google Scholar
  26. Kranakis E, Krizanc D, Ruf B, Urrutia J, Woeginger G (1997) The VC-dimension of set systems defined by graphs. Discret Appl Math 77(3):237–257. doi:10.1016/S0166-218X(96)00137-0 CrossRefMathSciNetMATHGoogle Scholar
  27. Li Y, Long PM, Srinivasan A (2001) Improved bounds on the sample complexity of learning. J Comput Syst Sci 62(3):516–527. doi:10.1006/jcss.2000.1741 CrossRefMathSciNetMATHGoogle Scholar
  28. Lim YS, Menasche DS, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. In: IEEE network science workshop, NSW’11, pp 118–122. doi:10.1109/NSW.2011.6004633
  29. Löffler M, Phillips JM (2009) Shape fitting on point sets with probability distributions. In: Fiat A, Sanders P (eds) Algorithms—ESA 2009. Lecture notes in computer science, vol 5757. Springer, Berlin, pp 313–324. doi:10.1007/978-3-642-04128-0_29 Google Scholar
  30. Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: Zaki M, Yu J, Ravindran B, Pudi V (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 6118. Springer, Berlin, pp 91–98. doi:10.1007/978-3-642-13657-3_12 CrossRefGoogle Scholar
  31. Matoušek J (2002) Lectures on discrete geometry. Springer, SecaucusCrossRefMATHGoogle Scholar
  32. Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New YorkCrossRefGoogle Scholar
  33. Newman MEJ (2010) Networks—an introduction. Oxford University Press, OxfordCrossRefMATHGoogle Scholar
  34. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(026):113. doi:10.1103/PhysRevE.69.026113 Google Scholar
  35. Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 32(3):245–251. doi:10.1016/j.socnet.2010.03.006 CrossRefGoogle Scholar
  36. Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25(3):662–676CrossRefGoogle Scholar
  37. Pfeffer J, Carley KM (2012) k-centralities: local approximations of global measures based on shortest paths. In: Proceedings of the 21st international conference on companion on world wide web, WWW ’12 Companion, pp 1043–1050. ACM, New York. doi:10.1145/2187980.2188239
  38. Pohl I (1969) Bidirectional heuristic search in path problems. Ph.D. Thesis, Stanford UniversityGoogle Scholar
  39. Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Castillo C, Metzler D (eds) Proceedings of 7th ACM conference on web search data mining, WSDM’14. ACM, New YorkGoogle Scholar
  40. Riondato M, Upfal E (2014) Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Disc Data 8(2)Google Scholar
  41. Roditty L, Williams VV (2012) Approximating the diameter of a graph. CoRR abs/1207.3622Google Scholar
  42. Sarıyüce AE, Saule E, Kaya K, Çatalyürek UV (2013) Shattering and compressing networks for betweenness centrality. In: SIAM data mining conferenceGoogle Scholar
  43. Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New YorkCrossRefGoogle Scholar
  44. Staudt C, Sazonovs A, Meyerhenke H (2014) Networkit: an interactive tool suite for high-performance network analysis. CoRR abs/1403.3005Google Scholar
  45. Tang J, Zhang C, Cai K, Zhang L, Su Z (2015) Sampling representative users from large social networks. In: AAAIGoogle Scholar
  46. Vapnik VN, Chervonenkis AJ (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280. doi:10.1137/1116025 CrossRefMathSciNetMATHGoogle Scholar
  47. Yoshida Y (2014) Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1416–1425. ACM, New York. doi:10.1145/2623330.2623626

Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. 1.Department of Computer ScienceBrown UniversityProvidenceUSA

Personalised recommendations