Fast approximation of betweenness centrality through sampling

Riondato, Matteo; Kornaropoulos, Evgenios M.

doi:10.1007/s10618-015-0423-0

Fast approximation of betweenness centrality through sampling

Published: 30 June 2015

Volume 30, pages 438–475, (2016)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Matteo Riondato¹ &
Evgenios M. Kornaropoulos¹

2729 Accesses
103 Citations
2 Altmetric
Explore all metrics

Abstract

Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices (or edges) in a network in terms of the fraction of shortest paths that pass through them. Since exact computation in large networks is prohibitively expensive, we present two efficient randomized algorithms for betweenness estimation. The algorithms are based on random sampling of shortest paths and offer probabilistic guarantees on the quality of the approximation. The first algorithm estimates the betweenness of all vertices (or edges): all approximate values are within an additive factor \(\varepsilon \in (0,1)\) from the real values, with probability at least \(1-\delta \). The second algorithm focuses on the top-K vertices (or edges) with highest betweenness and estimate their betweenness value to within a multiplicative factor \(\varepsilon \), with probability at least \(1-\delta \). This is the first algorithm that can compute such approximation for the top-K vertices (or edges). By proving upper and lower bounds to the VC-dimension of a range set associated with the problem at hand, we can bound the sample size needed to achieve the desired approximations. We obtain sample sizes that are independent from the number of vertices in the network and only depend on a characteristic quantity that we call the vertex-diameter, that is the maximum number of vertices in a shortest path. In some cases, the sample size is completely independent from any quantitative property of the graph. An extensive experimental evaluation on real and artificial networks shows that our algorithms are significantly faster and much more scalable as the number of vertices grows than other algorithms with similar approximation guarantees.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

A new semi-local centrality for identifying influential nodes based on local average shortest path with extended neighborhood

Article Open access 13 April 2024

Yi Xiao, Yuan Chen, … Xiaoping Zhu

Algorithms for generating all possible spanning trees of a simple undirected connected graph: an extensive review

Article Open access 13 August 2018

Maumita Chakraborty, Sumon Chowdhury, … Rajat Kumar Pal

Approximate Bayesian computation using asymptotically normal point estimates

Article 27 April 2022

George Karabatsos

Notes

Note that even if \(p_\emptyset =\emptyset \), the set \(\{p_\emptyset \}\) is not empty. It contains one element.
We use the normalized version of betweenness as we believe it to be more suitable for presenting approximation results.
After this work was accepted for publication, Bergamini and Meyerhenke (2015) presented an improved upper bound for VD(G) for undirected weighted graphs.
It is a well-known open problem whether there is an algorithm to perform a single s, t-shortest path computation between a pair of vertices with smaller worst-case time complexity than the Single Source Shortest Path computation.
Bounded-distance betweenness is also known as k-betweenness. We prefer the former denomination to avoid confusion with k-path betweenness as defined by Kourtellis et al. (2012).
We take the average, rather than the sum, as defined in the original work by Kourtellis et al. (2012), to normalize the value so that it belongs to the interval [0, 1]. This has no consequences on the results.
The implementations are available at http://cs.brown.edu/~matteo/centrsampl.tar.bz2. An independent implementation of our algorithms is available in NetworKit Staudt et al. (2014).
http://snap.stanford.edu/data/index.html

References

Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2011) VC-dimension and shortest path algorithms. Automata, languages and programming., Lecture notes in computer scienceSpringer, Berlin, pp 690–699. doi:10.1007/978-3-642-22006-7_58
Book Google Scholar
Aingworth D, Chekuri C, Indyk P, Motwani R (1999) Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM J Comput 28(4):1167–1181. doi:10.1137/S0097539796303421
Article MathSciNet MATH Google Scholar
Anthonisse JM (1971) The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, Amsterdam
Anthony M, Bartlett PL (1999) Neural network learning—theoretical foundations. Cambridge University Press, New York
Book MATH Google Scholar
Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Bonato A, Chung F (eds) Algorithms and models for the web-graph. Lecture notes in computer science, vol 4863. Springer, Berlin. doi:10.1007/978-3-540-77004-6_10
Google Scholar
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Article MathSciNet Google Scholar
Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. CoRR abs/1504.0709 (to appear in ESA’15)
Bergamini E, Meyerhenke H, Staudt CL (2015) Approximating betweenness centrality in large evolving networks. In: 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, SIAM, pp 133–146
Boitmanis K, Freivalds K, Lediņš P, Opmanis R (2006) Fast and simple approximation of the diameter and radius of a graph. In: Alvarez C, Serna M (eds) Experimental algorithms. Lecture notes in computer science. Springer, Berlin, pp 98–108. doi:10.1007/11764298_9
Google Scholar
Borgatti SP, Everett MG (2006) A graph-theoretic perspective on centrality. Soc Netw 28(4):466–484. doi:10.1016/j.socnet.2005.11.005
Article Google Scholar
Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177. doi:10.1080/0022250X.2001.9990249
Article MATH Google Scholar
Brandes U (2008) On variants of shortest-path betweenness centrality and their generic computation. Soc Netw 30(2):136–145. doi:10.1016/j.socnet.2007.11.001
Article MathSciNet Google Scholar
Brandes U, Pich C (2007) Centrality estimation in large networks. Int J Bifurc Chaos 17(7):2303–2318. doi:10.1142/S0218127407018403
Article MathSciNet MATH Google Scholar
Cormode G, Duffield N (2014) Sampling for big data: a tutorial. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1975–1975. ACM, New York. doi:10.1145/2623330.2630811. http://doi.acm.org/10.1145/2623330.2630811
Csárdi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9
Google Scholar
Dolev S, Elovici Y, Puzis R (2010) Routing betweenness centrality. J ACM 57(4):25:1–25:27. doi:10.1145/1734213.1734219
Article MathSciNet Google Scholar
Eppstein D, Wang J (2004) Fast approximation of centrality. J Graph Algorithms Appl 8(1):39–45
Article MathSciNet MATH Google Scholar
Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40:35–41
Article Google Scholar
Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality. In: Munro JI, Wagner D (eds) Algorithm engineering & experiments (ALENEX’08), SIAM, pp 90–100
Har-Peled S, Sharir M (2011) Relative \((p,\varepsilon )\)-approximations in geometry. Discret Comput Geom 45(3):462–496. doi:10.1007/s00454-010-9248-1
Article MathSciNet MATH Google Scholar
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30
Article MathSciNet MATH Google Scholar
Jacob R, Koschützki D, Lehmann K, Peeters L, Tenfelde-Podehl D (2005) Algorithms for centrality indices. In: Brandes U, Erlebach T (eds) Network analysis. Lecture notes in computer science, vol 3418. Springer, Berlin, pp 62–82. doi:10.1007/978-3-540-31955-9_4
Google Scholar
Kaindl H, Kainz G (1997) Bidirectional heuristic search reconsidered. J Artif Intell Res 7:283–317. doi:10.1613/jair.460
MathSciNet MATH Google Scholar
Kourtellis N, Alahakoon T, Simha R, Iamnitchi A, Tripathi R (2012) Identifying high betweenness centrality nodes in large social networks. Soc Netw Anal Mining 3:899–914. doi:10.1007/s13278-012-0076-6
Article Google Scholar
Kourtellis N, Morales GDF, Bonchi F (2014) Scalable online betweenness centrality in evolving graphs. CoRR abs/1401.6981
Kranakis E, Krizanc D, Ruf B, Urrutia J, Woeginger G (1997) The VC-dimension of set systems defined by graphs. Discret Appl Math 77(3):237–257. doi:10.1016/S0166-218X(96)00137-0
Article MathSciNet MATH Google Scholar
Li Y, Long PM, Srinivasan A (2001) Improved bounds on the sample complexity of learning. J Comput Syst Sci 62(3):516–527. doi:10.1006/jcss.2000.1741
Article MathSciNet MATH Google Scholar
Lim YS, Menasche DS, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. In: IEEE network science workshop, NSW’11, pp 118–122. doi:10.1109/NSW.2011.6004633
Löffler M, Phillips JM (2009) Shape fitting on point sets with probability distributions. In: Fiat A, Sanders P (eds) Algorithms—ESA 2009. Lecture notes in computer science, vol 5757. Springer, Berlin, pp 313–324. doi:10.1007/978-3-642-04128-0_29
Google Scholar
Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: Zaki M, Yu J, Ravindran B, Pudi V (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 6118. Springer, Berlin, pp 91–98. doi:10.1007/978-3-642-13657-3_12
Chapter Google Scholar
Matoušek J (2002) Lectures on discrete geometry. Springer, Secaucus
Book MATH Google Scholar
Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New York
Book Google Scholar
Newman MEJ (2010) Networks—an introduction. Oxford University Press, Oxford
Book MATH Google Scholar
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(026):113. doi:10.1103/PhysRevE.69.026113
Google Scholar
Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 32(3):245–251. doi:10.1016/j.socnet.2010.03.006
Article Google Scholar
Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25(3):662–676
Article Google Scholar
Pfeffer J, Carley KM (2012) k-centralities: local approximations of global measures based on shortest paths. In: Proceedings of the 21st international conference on companion on world wide web, WWW ’12 Companion, pp 1043–1050. ACM, New York. doi:10.1145/2187980.2188239
Pohl I (1969) Bidirectional heuristic search in path problems. Ph.D. Thesis, Stanford University
Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Castillo C, Metzler D (eds) Proceedings of 7th ACM conference on web search data mining, WSDM’14. ACM, New York
Riondato M, Upfal E (2014) Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Disc Data 8(2)
Roditty L, Williams VV (2012) Approximating the diameter of a graph. CoRR abs/1207.3622
Sarıyüce AE, Saule E, Kaya K, Çatalyürek UV (2013) Shattering and compressing networks for betweenness centrality. In: SIAM data mining conference
Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New York
Book Google Scholar
Staudt C, Sazonovs A, Meyerhenke H (2014) Networkit: an interactive tool suite for high-performance network analysis. CoRR abs/1403.3005
Tang J, Zhang C, Cai K, Zhang L, Su Z (2015) Sampling representative users from large social networks. In: AAAI
Vapnik VN, Chervonenkis AJ (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280. doi:10.1137/1116025
Article MathSciNet MATH Google Scholar
Yoshida Y (2014) Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1416–1425. ACM, New York. doi:10.1145/2623330.2623626

Download references

Acknowledgments

This project was supported, in part, by the National Science Foundation under award IIS-1247581. E. M. Kornaropoulos was supported, in part by the Kanellakis Fellowship at Brown University. We are thankful to Eli Upfal for his guidance and advice and to the anonymous reviewers of WSDM’14 and DMKD whose comments helped us improving this work.

Author information

Authors and Affiliations

Department of Computer Science, Brown University, Providence, RI, USA
Matteo Riondato & Evgenios M. Kornaropoulos

Authors

Matteo Riondato
View author publications
You can also search for this author in PubMed Google Scholar
Evgenios M. Kornaropoulos
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Matteo Riondato.

Additional information

Responsible editor: Tina Eliassi-Rad.

A preliminary version of this work appeared in the proceedings of ACM WSDM’14 (Riondato and Kornaropoulos 2014).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Riondato, M., Kornaropoulos, E.M. Fast approximation of betweenness centrality through sampling. Data Min Knowl Disc 30, 438–475 (2016). https://doi.org/10.1007/s10618-015-0423-0

Download citation

Received: 01 October 2014
Accepted: 15 June 2015
Published: 30 June 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10618-015-0423-0

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fast approximation of betweenness centrality through sampling

Abstract

Access this article

Similar content being viewed by others

A new semi-local centrality for identifying influential nodes based on local average shortest path with extended neighborhood

Algorithms for generating all possible spanning trees of a simple undirected connected graph: an extensive review

Approximate Bayesian computation using asymptotically normal point estimates

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Fast approximation of betweenness centrality through sampling

Abstract

Access this article

Similar content being viewed by others

A new semi-local centrality for identifying influential nodes based on local average shortest path with extended neighborhood

Algorithms for generating all possible spanning trees of a simple undirected connected graph: an extensive review

Approximate Bayesian computation using asymptotically normal point estimates

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation