Skip to main content

Fast approximation of betweenness centrality through sampling

Abstract

Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices (or edges) in a network in terms of the fraction of shortest paths that pass through them. Since exact computation in large networks is prohibitively expensive, we present two efficient randomized algorithms for betweenness estimation. The algorithms are based on random sampling of shortest paths and offer probabilistic guarantees on the quality of the approximation. The first algorithm estimates the betweenness of all vertices (or edges): all approximate values are within an additive factor \(\varepsilon \in (0,1)\) from the real values, with probability at least \(1-\delta \). The second algorithm focuses on the top-K vertices (or edges) with highest betweenness and estimate their betweenness value to within a multiplicative factor \(\varepsilon \), with probability at least \(1-\delta \). This is the first algorithm that can compute such approximation for the top-K vertices (or edges). By proving upper and lower bounds to the VC-dimension of a range set associated with the problem at hand, we can bound the sample size needed to achieve the desired approximations. We obtain sample sizes that are independent from the number of vertices in the network and only depend on a characteristic quantity that we call the vertex-diameter, that is the maximum number of vertices in a shortest path. In some cases, the sample size is completely independent from any quantitative property of the graph. An extensive experimental evaluation on real and artificial networks shows that our algorithms are significantly faster and much more scalable as the number of vertices grows than other algorithms with similar approximation guarantees.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Notes

  1. Note that even if \(p_\emptyset =\emptyset \), the set \(\{p_\emptyset \}\) is not empty. It contains one element.

  2. We use the normalized version of betweenness as we believe it to be more suitable for presenting approximation results.

  3. After this work was accepted for publication, Bergamini and Meyerhenke (2015) presented an improved upper bound for VD(G) for undirected weighted graphs.

  4. It is a well-known open problem whether there is an algorithm to perform a single st-shortest path computation between a pair of vertices with smaller worst-case time complexity than the Single Source Shortest Path computation.

  5. Bounded-distance betweenness is also known as k-betweenness. We prefer the former denomination to avoid confusion with k-path betweenness as defined by Kourtellis et al. (2012).

  6. We take the average, rather than the sum, as defined in the original work by Kourtellis et al. (2012), to normalize the value so that it belongs to the interval [0, 1]. This has no consequences on the results.

  7. The implementations are available at http://cs.brown.edu/~matteo/centrsampl.tar.bz2. An independent implementation of our algorithms is available in NetworKit Staudt et al. (2014).

  8. http://snap.stanford.edu/data/index.html

References

  • Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2011) VC-dimension and shortest path algorithms. Automata, languages and programming., Lecture notes in computer scienceSpringer, Berlin, pp 690–699. doi:10.1007/978-3-642-22006-7_58

    Book  Google Scholar 

  • Aingworth D, Chekuri C, Indyk P, Motwani R (1999) Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM J Comput 28(4):1167–1181. doi:10.1137/S0097539796303421

    Article  MathSciNet  MATH  Google Scholar 

  • Anthonisse JM (1971) The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, Amsterdam

  • Anthony M, Bartlett PL (1999) Neural network learning—theoretical foundations. Cambridge University Press, New York

    Book  MATH  Google Scholar 

  • Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Bonato A, Chung F (eds) Algorithms and models for the web-graph. Lecture notes in computer science, vol 4863. Springer, Berlin. doi:10.1007/978-3-540-77004-6_10

    Google Scholar 

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

    Article  MathSciNet  Google Scholar 

  • Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. CoRR abs/1504.0709 (to appear in ESA’15)

  • Bergamini E, Meyerhenke H, Staudt CL (2015) Approximating betweenness centrality in large evolving networks. In: 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, SIAM, pp 133–146

  • Boitmanis K, Freivalds K, Lediņš P, Opmanis R (2006) Fast and simple approximation of the diameter and radius of a graph. In: Alvarez C, Serna M (eds) Experimental algorithms. Lecture notes in computer science. Springer, Berlin, pp 98–108. doi:10.1007/11764298_9

    Google Scholar 

  • Borgatti SP, Everett MG (2006) A graph-theoretic perspective on centrality. Soc Netw 28(4):466–484. doi:10.1016/j.socnet.2005.11.005

    Article  Google Scholar 

  • Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177. doi:10.1080/0022250X.2001.9990249

    Article  MATH  Google Scholar 

  • Brandes U (2008) On variants of shortest-path betweenness centrality and their generic computation. Soc Netw 30(2):136–145. doi:10.1016/j.socnet.2007.11.001

    Article  MathSciNet  Google Scholar 

  • Brandes U, Pich C (2007) Centrality estimation in large networks. Int J Bifurc Chaos 17(7):2303–2318. doi:10.1142/S0218127407018403

    Article  MathSciNet  MATH  Google Scholar 

  • Cormode G, Duffield N (2014) Sampling for big data: a tutorial. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1975–1975. ACM, New York. doi:10.1145/2623330.2630811. http://doi.acm.org/10.1145/2623330.2630811

  • Csárdi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9

    Google Scholar 

  • Dolev S, Elovici Y, Puzis R (2010) Routing betweenness centrality. J ACM 57(4):25:1–25:27. doi:10.1145/1734213.1734219

    Article  MathSciNet  Google Scholar 

  • Eppstein D, Wang J (2004) Fast approximation of centrality. J Graph Algorithms Appl 8(1):39–45

    Article  MathSciNet  MATH  Google Scholar 

  • Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40:35–41

    Article  Google Scholar 

  • Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality. In: Munro JI, Wagner D (eds) Algorithm engineering & experiments (ALENEX’08), SIAM, pp 90–100

  • Har-Peled S, Sharir M (2011) Relative \((p,\varepsilon )\)-approximations in geometry. Discret Comput Geom 45(3):462–496. doi:10.1007/s00454-010-9248-1

    Article  MathSciNet  MATH  Google Scholar 

  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30

    Article  MathSciNet  MATH  Google Scholar 

  • Jacob R, Koschützki D, Lehmann K, Peeters L, Tenfelde-Podehl D (2005) Algorithms for centrality indices. In: Brandes U, Erlebach T (eds) Network analysis. Lecture notes in computer science, vol 3418. Springer, Berlin, pp 62–82. doi:10.1007/978-3-540-31955-9_4

    Google Scholar 

  • Kaindl H, Kainz G (1997) Bidirectional heuristic search reconsidered. J Artif Intell Res 7:283–317. doi:10.1613/jair.460

    MathSciNet  MATH  Google Scholar 

  • Kourtellis N, Alahakoon T, Simha R, Iamnitchi A, Tripathi R (2012) Identifying high betweenness centrality nodes in large social networks. Soc Netw Anal Mining 3:899–914. doi:10.1007/s13278-012-0076-6

    Article  Google Scholar 

  • Kourtellis N, Morales GDF, Bonchi F (2014) Scalable online betweenness centrality in evolving graphs. CoRR abs/1401.6981

  • Kranakis E, Krizanc D, Ruf B, Urrutia J, Woeginger G (1997) The VC-dimension of set systems defined by graphs. Discret Appl Math 77(3):237–257. doi:10.1016/S0166-218X(96)00137-0

    Article  MathSciNet  MATH  Google Scholar 

  • Li Y, Long PM, Srinivasan A (2001) Improved bounds on the sample complexity of learning. J Comput Syst Sci 62(3):516–527. doi:10.1006/jcss.2000.1741

    Article  MathSciNet  MATH  Google Scholar 

  • Lim YS, Menasche DS, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. In: IEEE network science workshop, NSW’11, pp 118–122. doi:10.1109/NSW.2011.6004633

  • Löffler M, Phillips JM (2009) Shape fitting on point sets with probability distributions. In: Fiat A, Sanders P (eds) Algorithms—ESA 2009. Lecture notes in computer science, vol 5757. Springer, Berlin, pp 313–324. doi:10.1007/978-3-642-04128-0_29

    Google Scholar 

  • Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: Zaki M, Yu J, Ravindran B, Pudi V (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 6118. Springer, Berlin, pp 91–98. doi:10.1007/978-3-642-13657-3_12

    Chapter  Google Scholar 

  • Matoušek J (2002) Lectures on discrete geometry. Springer, Secaucus

    Book  MATH  Google Scholar 

  • Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New York

    Book  Google Scholar 

  • Newman MEJ (2010) Networks—an introduction. Oxford University Press, Oxford

    Book  MATH  Google Scholar 

  • Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(026):113. doi:10.1103/PhysRevE.69.026113

    Google Scholar 

  • Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 32(3):245–251. doi:10.1016/j.socnet.2010.03.006

    Article  Google Scholar 

  • Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25(3):662–676

    Article  Google Scholar 

  • Pfeffer J, Carley KM (2012) k-centralities: local approximations of global measures based on shortest paths. In: Proceedings of the 21st international conference on companion on world wide web, WWW ’12 Companion, pp 1043–1050. ACM, New York. doi:10.1145/2187980.2188239

  • Pohl I (1969) Bidirectional heuristic search in path problems. Ph.D. Thesis, Stanford University

  • Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Castillo C, Metzler D (eds) Proceedings of 7th ACM conference on web search data mining, WSDM’14. ACM, New York

  • Riondato M, Upfal E (2014) Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Disc Data 8(2)

  • Roditty L, Williams VV (2012) Approximating the diameter of a graph. CoRR abs/1207.3622

  • Sarıyüce AE, Saule E, Kaya K, Çatalyürek UV (2013) Shattering and compressing networks for betweenness centrality. In: SIAM data mining conference

  • Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New York

    Book  Google Scholar 

  • Staudt C, Sazonovs A, Meyerhenke H (2014) Networkit: an interactive tool suite for high-performance network analysis. CoRR abs/1403.3005

  • Tang J, Zhang C, Cai K, Zhang L, Su Z (2015) Sampling representative users from large social networks. In: AAAI

  • Vapnik VN, Chervonenkis AJ (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280. doi:10.1137/1116025

    Article  MathSciNet  MATH  Google Scholar 

  • Yoshida Y (2014) Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1416–1425. ACM, New York. doi:10.1145/2623330.2623626

Download references

Acknowledgments

This project was supported, in part, by the National Science Foundation under award IIS-1247581. E. M. Kornaropoulos was supported, in part by the Kanellakis Fellowship at Brown University. We are thankful to Eli Upfal for his guidance and advice and to the anonymous reviewers of WSDM’14 and DMKD whose comments helped us improving this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Matteo Riondato.

Additional information

Responsible editor: Tina Eliassi-Rad.

A preliminary version of this work appeared in the proceedings of ACM WSDM’14 (Riondato and Kornaropoulos 2014).

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Riondato, M., Kornaropoulos, E.M. Fast approximation of betweenness centrality through sampling. Data Min Knowl Disc 30, 438–475 (2016). https://doi.org/10.1007/s10618-015-0423-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-015-0423-0

Keywords

  • Social network analysis
  • Betweenness centrality
  • VC-dimension
  • Sampling
  • Approximation algorithms