## Abstract

Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices (or edges) in a network in terms of the fraction of shortest paths that pass through them. Since exact computation in large networks is prohibitively expensive, we present two efficient randomized algorithms for betweenness estimation. The algorithms are based on random sampling of shortest paths and offer probabilistic guarantees on the quality of the approximation. The first algorithm estimates the betweenness of all vertices (or edges): all approximate values are within an additive factor \(\varepsilon \in (0,1)\) from the real values, with probability at least \(1-\delta \). The second algorithm focuses on the top-K vertices (or edges) with highest betweenness and estimate their betweenness value to within a multiplicative factor \(\varepsilon \), with probability at least \(1-\delta \). This is the first algorithm that can compute such approximation for the top-K vertices (or edges). By proving upper and lower bounds to the VC-dimension of a range set associated with the problem at hand, we can bound the sample size needed to achieve the desired approximations. We obtain sample sizes that are independent from the number of vertices in the network and only depend on a characteristic quantity that we call the vertex-diameter, that is the maximum number of vertices in a shortest path. In some cases, the sample size is completely independent from any quantitative property of the graph. An extensive experimental evaluation on real and artificial networks shows that our algorithms are significantly faster and much more scalable as the number of vertices grows than other algorithms with similar approximation guarantees.

This is a preview of subscription content, access via your institution.

## Notes

Note that even if \(p_\emptyset =\emptyset \), the set \(\{p_\emptyset \}\) is not empty. It contains one element.

We use the normalized version of betweenness as we believe it to be more suitable for presenting approximation results.

After this work was accepted for publication, Bergamini and Meyerhenke (2015) presented an improved upper bound for VD(G) for undirected weighted graphs.

It is a well-known open problem whether there is an algorithm to perform a single

*s*,*t*-shortest path computation between a pair of vertices with smaller worst-case time complexity than the Single Source Shortest Path computation.Bounded-distance betweenness is also known as

*k*-betweenness. We prefer the former denomination to avoid confusion with*k*-path betweenness as defined by Kourtellis et al. (2012).We take the average, rather than the sum, as defined in the original work by Kourtellis et al. (2012), to normalize the value so that it belongs to the interval [0, 1]. This has no consequences on the results.

The implementations are available at http://cs.brown.edu/~matteo/centrsampl.tar.bz2. An independent implementation of our algorithms is available in NetworKit Staudt et al. (2014).

## References

Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2011) VC-dimension and shortest path algorithms. Automata, languages and programming., Lecture notes in computer scienceSpringer, Berlin, pp 690–699. doi:10.1007/978-3-642-22006-7_58

Aingworth D, Chekuri C, Indyk P, Motwani R (1999) Fast estimation of diameter and shortest paths (without matrix multiplication). SIAM J Comput 28(4):1167–1181. doi:10.1137/S0097539796303421

Anthonisse JM (1971) The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, Amsterdam

Anthony M, Bartlett PL (1999) Neural network learning—theoretical foundations. Cambridge University Press, New York

Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Bonato A, Chung F (eds) Algorithms and models for the web-graph. Lecture notes in computer science, vol 4863. Springer, Berlin. doi:10.1007/978-3-540-77004-6_10

Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. CoRR abs/1504.0709 (to appear in ESA’15)

Bergamini E, Meyerhenke H, Staudt CL (2015) Approximating betweenness centrality in large evolving networks. In: 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, SIAM, pp 133–146

Boitmanis K, Freivalds K, Lediņš P, Opmanis R (2006) Fast and simple approximation of the diameter and radius of a graph. In: Alvarez C, Serna M (eds) Experimental algorithms. Lecture notes in computer science. Springer, Berlin, pp 98–108. doi:10.1007/11764298_9

Borgatti SP, Everett MG (2006) A graph-theoretic perspective on centrality. Soc Netw 28(4):466–484. doi:10.1016/j.socnet.2005.11.005

Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177. doi:10.1080/0022250X.2001.9990249

Brandes U (2008) On variants of shortest-path betweenness centrality and their generic computation. Soc Netw 30(2):136–145. doi:10.1016/j.socnet.2007.11.001

Brandes U, Pich C (2007) Centrality estimation in large networks. Int J Bifurc Chaos 17(7):2303–2318. doi:10.1142/S0218127407018403

Cormode G, Duffield N (2014) Sampling for big data: a tutorial. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1975–1975. ACM, New York. doi:10.1145/2623330.2630811. http://doi.acm.org/10.1145/2623330.2630811

Csárdi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9

Dolev S, Elovici Y, Puzis R (2010) Routing betweenness centrality. J ACM 57(4):25:1–25:27. doi:10.1145/1734213.1734219

Eppstein D, Wang J (2004) Fast approximation of centrality. J Graph Algorithms Appl 8(1):39–45

Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40:35–41

Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality. In: Munro JI, Wagner D (eds) Algorithm engineering & experiments (ALENEX’08), SIAM, pp 90–100

Har-Peled S, Sharir M (2011) Relative \((p,\varepsilon )\)-approximations in geometry. Discret Comput Geom 45(3):462–496. doi:10.1007/s00454-010-9248-1

Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30

Jacob R, Koschützki D, Lehmann K, Peeters L, Tenfelde-Podehl D (2005) Algorithms for centrality indices. In: Brandes U, Erlebach T (eds) Network analysis. Lecture notes in computer science, vol 3418. Springer, Berlin, pp 62–82. doi:10.1007/978-3-540-31955-9_4

Kaindl H, Kainz G (1997) Bidirectional heuristic search reconsidered. J Artif Intell Res 7:283–317. doi:10.1613/jair.460

Kourtellis N, Alahakoon T, Simha R, Iamnitchi A, Tripathi R (2012) Identifying high betweenness centrality nodes in large social networks. Soc Netw Anal Mining 3:899–914. doi:10.1007/s13278-012-0076-6

Kourtellis N, Morales GDF, Bonchi F (2014) Scalable online betweenness centrality in evolving graphs. CoRR abs/1401.6981

Kranakis E, Krizanc D, Ruf B, Urrutia J, Woeginger G (1997) The VC-dimension of set systems defined by graphs. Discret Appl Math 77(3):237–257. doi:10.1016/S0166-218X(96)00137-0

Li Y, Long PM, Srinivasan A (2001) Improved bounds on the sample complexity of learning. J Comput Syst Sci 62(3):516–527. doi:10.1006/jcss.2000.1741

Lim YS, Menasche DS, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. In: IEEE network science workshop, NSW’11, pp 118–122. doi:10.1109/NSW.2011.6004633

Löffler M, Phillips JM (2009) Shape fitting on point sets with probability distributions. In: Fiat A, Sanders P (eds) Algorithms—ESA 2009. Lecture notes in computer science, vol 5757. Springer, Berlin, pp 313–324. doi:10.1007/978-3-642-04128-0_29

Maiya AS, Berger-Wolf TY (2010) Online sampling of high centrality individuals in social networks. In: Zaki M, Yu J, Ravindran B, Pudi V (eds) Advances in knowledge discovery and data mining. Lecture notes in computer science, vol 6118. Springer, Berlin, pp 91–98. doi:10.1007/978-3-642-13657-3_12

Matoušek J (2002) Lectures on discrete geometry. Springer, Secaucus

Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New York

Newman MEJ (2010) Networks—an introduction. Oxford University Press, Oxford

Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(026):113. doi:10.1103/PhysRevE.69.026113

Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 32(3):245–251. doi:10.1016/j.socnet.2010.03.006

Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25(3):662–676

Pfeffer J, Carley KM (2012) k-centralities: local approximations of global measures based on shortest paths. In: Proceedings of the 21st international conference on companion on world wide web, WWW ’12 Companion, pp 1043–1050. ACM, New York. doi:10.1145/2187980.2188239

Pohl I (1969) Bidirectional heuristic search in path problems. Ph.D. Thesis, Stanford University

Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Castillo C, Metzler D (eds) Proceedings of 7th ACM conference on web search data mining, WSDM’14. ACM, New York

Riondato M, Upfal E (2014) Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees. ACM Trans Knowl Disc Data 8(2)

Roditty L, Williams VV (2012) Approximating the diameter of a graph. CoRR abs/1207.3622

Sarıyüce AE, Saule E, Kaya K, Çatalyürek UV (2013) Shattering and compressing networks for betweenness centrality. In: SIAM data mining conference

Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New York

Staudt C, Sazonovs A, Meyerhenke H (2014) Networkit: an interactive tool suite for high-performance network analysis. CoRR abs/1403.3005

Tang J, Zhang C, Cai K, Zhang L, Su Z (2015) Sampling representative users from large social networks. In: AAAI

Vapnik VN, Chervonenkis AJ (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280. doi:10.1137/1116025

Yoshida Y (2014) Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’14, pp 1416–1425. ACM, New York. doi:10.1145/2623330.2623626

## Acknowledgments

This project was supported, in part, by the National Science Foundation under award IIS-1247581. E. M. Kornaropoulos was supported, in part by the Kanellakis Fellowship at Brown University. We are thankful to Eli Upfal for his guidance and advice and to the anonymous reviewers of WSDM’14 and DMKD whose comments helped us improving this work.

## Author information

### Authors and Affiliations

### Corresponding author

## Additional information

Responsible editor: Tina Eliassi-Rad.

A preliminary version of this work appeared in the proceedings of ACM WSDM’14 (Riondato and Kornaropoulos 2014).

## Rights and permissions

## About this article

### Cite this article

Riondato, M., Kornaropoulos, E.M. Fast approximation of betweenness centrality through sampling.
*Data Min Knowl Disc* **30**, 438–475 (2016). https://doi.org/10.1007/s10618-015-0423-0

Received:

Accepted:

Published:

Issue Date:

DOI: https://doi.org/10.1007/s10618-015-0423-0

### Keywords

- Social network analysis
- Betweenness centrality
- VC-dimension
- Sampling
- Approximation algorithms