## Abstract

Centrality measures, erstwhile popular amongst the sociologists and psychologists, have seen broad and increasing applications across several disciplines of late. Amongst a plethora of application-specific definitions available in the literature to rank the vertices, closeness centrality, betweenness centrality and eigenvector centrality (page-rank) have been the widely applied ones. We are surrounded by networks where information, signal or commodities are flowing through the edges. Betweenness centrality comes as a handy tool to analyze such systems, but computation of these scores is a daunting task in large-size networks. Since computing the betweenness centrality of one node is conjectured to be same as time taken for computing the betweenness centrality of all the nodes, a fast algorithm was required that can efficiently estimate a node’s betweenness score. In this paper, we propose a heuristic that efficiently estimates the betweenness score of a given node. The algorithm incorporates a non-uniform node-sampling model which is developed based on the analysis of random Erdős-Rényi graphs. We apply the heuristic to estimate the ranking of an arbitrary *k* vertices, called betweenness-ordering problem, where *k* is much less than the total number of vertices. The proposed heuristic produces very efficient results even when runs for a linear time in the number of edges. An extensive experimental evidence is presented to demonstrate the performance of the proposed heuristic for betweenness estimation and ordering on several synthetic and real-world graphs.

This is a preview of subscription content, access via your institution.

## References

Agarwal M, Singh RR, Chaudhary S, Iyengar S (2015) An efficient estimation of a node’s betweenness. Complex networks VI. Springer, Berlin, pp 111–121

Anthonisse J.M (1971) The rush in a directed graph. Stichting Mathematisch Centrum. Mathematische Besliskunde (BN 9/71), 1–10

Bader DA, Kintali S, Madduri K, Mihail M (2007) Approximating betweenness centrality. In: Proceedings of the 5th International Conference on Algorithms and Models for the Web-graph, WAW’07, Springer, Berlin, pp 124–137

Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

Batagelj V, Mrvar A (2006) Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data

Bauckhage C, Kersting K, Rastegarpanah B (2013) The Weibull as a model of shortest path distributions in random networks. In: Proc. Int. Workshop on Mining and Learning with Graphs, Chicago, IL, USA

Bergamini E, Meyerhenke H (2015) Fully-dynamic approximation of betweenness centrality. arXiv preprint arXiv:1504.07091

Bergamini E, Meyerhenke H, Staudt CL (2014) Approximating betweenness centrality in large evolving networks. arXiv preprint arXiv:1409.6241

Borgatti SP, Li X (2009) On social network analysis in a supply chain context*. J Supply Chain Manag 45(2):5–22

Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177. https://doi.org/10.1080/0022250X.2001.9990249

Brandes U, Erlebach T (2005) Network analysis: methodological foundations, vol 3418. Springer, Berlin

Brandes U, Pich C (2007) Centrality estimation in large networks. Int J Bifurcation Chaos 17(07):2303–2318. https://doi.org/10.1142/S0218127407018403

Carvalho R, Buzna L, Bono F, Gutiérrez E, Just W, Arrowsmith D (2009) Robustness of trans-european gas networks. Phys Rev E 80(1):016,106

Chehreghani MH (2014) An efficient algorithm for approximate betweenness centrality computation. Comput J

Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw (TOMS) 38(1):1

Derrible S (2012) Network centrality of metro systems. PloS One 7(7):e40,575

Dijkstra EW (1959) A note on two problems in connexion with graphs. Numerische Mathematik 1(1):269–271

Eppstein D, Wang J (2004) Fast approximation of centrality. J Graph Algorithms Appl 8:39–45

Ercsey-Ravasz M, Toroczkai Z (2010) Centrality scaling in large networks. Phys Rev Lett 105(3):038701

Ercsey-Ravasz M, Lichtenwalter RN, Chawla NV, Toroczkai Z (2012) Range-limited centrality measures in complex networks. Phys Rev E 85(6):066103

Erdos D, Ishakian V, Bestavros A, Terzi E (2014) A divide-and-conquer algorithm for betweenness centrality. arXiv preprint arXiv:1406.4173

Erdos P, Renyi A (1959) On random graphs I. Publ Math Debrecen 6:290–297

Floyd RW (1962) Algorithm 97: shortest path. Commun ACM 5(6):345

Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41

Geisberger R, Sanders P, Schultes D (2008) Better approximation of betweenness centrality, chap. 8, pp 90–100. https://doi.org/10.1137/1.9781611972887.9

Gkorou D, Pouwelse J, Epema D, Kielmann T, van Kreveld M, Niessen W (2010) Efficient approximate computation of betweenness centrality. In: 16th annual conf. of the Advanced School for Computing and Imaging (ASCI 2010)

Goel K, Singh RR, Iyengar S (2013) A faster algorithm to update betweenness centrality after node alteration. Algorithms and models for the web graph. Springer, Berlin, pp 170–184

Green O, McColl R, Bader D (2012) A fast algorithm for streaming betweenness centrality. In: Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Confernece on Social Computing (SocialCom), pp 11–20. https://doi.org/10.1109/SocialCom-PASSAT.2012.37

Hage P, Harary F (1995) Eccentricity and centrality in networks. Soc Netw 17(1):57–63. https://doi.org/10.1016/0378-8733(94)00248-9

Jackson MO (2008) Social and economic networks. Princeton University Press, Princeton

Joy MP, Brock A, Ingber DE, Huang S (2005) High-betweenness proteins in the yeast protein interaction network. BioMed Res Int 2005(2):96–103

Kas M, Wachs M, Carley KM, Carley LR (2013) Incremental algorithm for updating betweenness centrality in dynamically growing networks. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’13, ACM, New York, pp 33–40. https://doi.org/10.1145/2492517.2492533

Kintali S (2008) Betweenness centrality: algorithms and lower bounds. arXiv preprint arXiv:0809.1906

Knuth DE (1998) The art of computer programming: sorting and searching, vol 3. Pearson Education, London

Lee C-Y (2006) Correlations among centrality measures in complex networks. arXiv preprint arXiv:physics/0605220

Le Merrer E, Le Scouarnec N, Trédan G (2014) Heuristical top-k: fast estimation of centralities in complex networks. Inf Process Lett 114(8):432–436

Lee MJ, Choi S, Chung CW (2016) Efficient algorithms for updating betweenness centrality in fully dynamic graphs. Inf Sci 326:278–296

Lee M.J, Lee J, Park JY, Choi RH, Chung CW (2012) Qube: a quick algorithm for updating betweenness centrality. In: Proceedings of the 21st International Conference on World Wide Web, WWW ’12, ACM, New York, pp 351–360. https://doi.org/10.1145/2187836.2187884

Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

Lim YS, Menasche D.S, Ribeiro B, Towsley D, Basu P (2011) Online estimating the k central nodes of a network. IEEE Network Science Workshop, pp 118–122. http://doi.ieeecomputersociety.org/10.1109/NSW.2011.6004633

Lu TC, Zhang Y, Allen DL, Salman MA (2011) Design for fault analysis using multi-partite, multi-attribute betweenness centrality measures

Mizgier KJ, Jüttner MP, Wagner SM (2013) Bottleneck identification in supply chain networks. Int J Prod Res 51(5):1477–1490

Narayanan S (2005) The betweenness centrality of biological networks. Ph.D. thesis, Virginia Polytechnic Institute and State University

Nasre M, Pontecorvi M, Ramachandran V (2013) Betweenness centality–incremental and faster. arXiv preprint arXiv:1311.2147

Newman M (2010) Networks: an introduction. Oxford University Press Inc, New York

Riondato M, Kornaropoulos EM (2014) Fast approximation of betweenness centrality through sampling. In: Proceedings of the 7th ACM international conference on Web search and data mining, pp 413–422

Sariyüce AE, Saule E, Kaya K, Çatalyürek Ü.V (2013) Shattering and compressing networks for betweenness centrality. In: SIAM Data Mining Conference (SDM)

Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 990–998

Tizghadam A, Leon-Garcia A (2010) Betweenness centrality and resistance distance in communication networks. Network 24(6):10–16

Valente TW, Coronges K, Lakon C, Costenbader E (2008) How correlated are network centrality measures? Connections (Toronto, Ont.) 28(1):16

Van Der Hofstad R (2009) Random graphs and complex networks. http://www.win.tue.nl/rhofstad/NotesRGCN.pdf

Wang X (2011) Deciding on the type of the degree distribution of a graph (network) from traceroute-like measurements

Warshall S (1962) A theorem on boolean matrices. J ACM (JACM) 9(1):11–12

Welzl E (1991) Smallest enclosing disks (balls and ellipsoids). Springer, Berlin

## Acknowledgements

The authors would like to thank the IIT Ropar HPC committee for providing the resources for performing experiments. They also would like to thank the Malgudi team at IIT Ropar for their comments to improve the presentation of the paper. The authors would like to thank the referees for helpful comments.

## Author information

### Authors and Affiliations

### Corresponding author

## Additional information

This is an expanded and extended version of the results appeared in CompleNet 2015 Agarwal et al. (2015).

## Appendices

### Appendix 1: Average error and efficiency vs number of nodes in graphs (*n*)

In this section, we present the experimental results exhibiting the scalability of the proposed heuristic. The plots in this section compare the average error in the estimation of betweenness score and the average efficiency in ordering the nodes based on the betweenness scores with respect to the number of nodes in graphs. We generated graphs with \(n=100\) to \(n=1000\) with a step of 100 while keeping the average degree constant. For each *n*, we generated 5 graphs and took mean of the average error and efficiency over all the 5 graphs. Fig. 5 contains the plots. The plot in Fig. 5a depicts the change in the average error and plot in Fig. 5b represents the change in the average efficiency by changing the number of nodes in graphs while keeping the average degree of the graphs as a constant. We plotted the results for the Erdős-Rényi random (*G*(*n*, *p*)) graphs with \(\hbox {average degree} = \{3, 5, 10\}\) and for Barabasi-Albert random scale-free (*H*(*n*, *k*)) graphs with \(\hbox {average degree} = \{4, 6, 10\}\). These graphs have been discussed in Sect. 6.1.1. From these plots, we infer that the average error in the computation of betweenness score decreases and the average efficiency in ordering the nodes based on betweenness score increases with an increase in the number of nodes in graphs.

### Appendix 2: Average efficiency in *k*-betweenness-ordering vs *k*

In this section, we plot the average efficiency of the proposed heuristic for ordering *k* nodes in respect of the considered synthetic and real-world graphs. We calculate the average efficiency for *k*-betweenness-ordering as following. In a given graph, we randomly pick a set of *k* nodes and calculate the efficiency in ordering these *k* nodes based on the formula given in Sect. 6.2.2. We do this for 1000 iterations and take the mean of the efficiency in these 1000 iterations.

The plots in Fig. 6 show the average efficiency of the extension of Algorithm 2 for *k*-betweenness-ordering on various synthetic networks (Fig. 6a) and real-world networks (Fig. 6b). Label BO denotes the efficiency of betweenness-ordering. Labels 3-BO, 5-BO, and 10-BO denotes the average efficiency of the proposed heuristic for *k*-betweenness-ordering when \(k=3,5,10\) respectively. The plots show that the performance of the extended Algorithm 2 for *k*-betweenness-ordering is nearly same as the performance of Algorithm 2 for the betweenness-ordering of two nodes.

### Appendix 3: Average relaxed efficiency (\(\xi ^t\)) vs *t*

In this section, we inspect the plot between the relaxed efficiency \(\xi ^t\) and the threshold parameter *t* (as discussed in Sect. 6.2.2) on some synthetic networks with 1000 nodes. Similar results are achieved on all considered networks. The relaxed efficiency is computed by the formula given in Sect. 6.2.2. We vary *t* for \(t=2, 3, 5, 10\). The plots are compiled in Fig. 7. For each *t*, we generate 5 synthetic networks and then calculate the relaxed efficiency on each graph and average them to get average relaxed efficiency. We plotted average relaxed efficiency for both type of considered synthetic graphs (ER and BA). BO, 2-E, 3-E, 5-E, 10-E are the labels assumed for average relaxed efficiency when \(t=0, 2, 3, 5, 10\) respectively. The plots demonstrate that the efficiency increases with an increase in *t*, i.e., in a real-world situation where relaxation is allowed in ordering nodes, the proposed heuristic will perform much better than its usual performance. The relaxation in ordering means that the error in the betweenness-ordering of the nodes with approximately same betweenness rank is ignored. It considers an error only if the nodes with the difference in their betweenness ranks greater than a threshold value (*t*) are ordered incorrectly.

### 3.1 Correlation in ordering/average efficiency

In this section, we compare the performance of ranking all the nodes based on their estimated betweenness scores using the proposed heuristic and considered competitive estimation algorithms. The performance is evaluated in terms of the average *Spearman rank correlation* (\(\rho\)) with the exact ranking produced by Brandes’s algorithm (Brandes 2001). Spearman rank correlation \(\rho\) is a standard ranking correlation that measures the similarity between the ranking generated by two approaches. We plot the average Spearman rank correlation between the results produced by Algorithm 2 and the other competitive approaches on the considered synthetic and real-world networks. The plots are in Fig. 8.

Correlation of the proposed heuristic is very high (very close to 1) for almost all the considered networks. It outperforms all node-sampling based algorithms. In very dense networks, 2-BC sometimes produces a better result than our approach, but it should be noted that the difference in correlations are minute even when 2-BC takes a lot more time than our approach. In sparse networks, the proposed heuristic produces much better results than 2-BC in relatively smaller amount of time.

### Appendix 4: Corroborating experimental evidence of linear time running for efficient ordering

In this section, we show experimentally that the expected time for betweenness-ordering heuristic is linear with the number of edges (*m*). If the average degree is constant, then the average ordering time is also shown linear in the number of nodes (*n*). We have picked Gnutella-family of networks from Leskovec and Krevl (2014). This family comprises of 9 different snapshots of Gnutella peer-to-peer file sharing network from August 2002. The details of the networks and betweenness-ordering results on the networks are summarized in table 8. The columns in the table contain name of the network, number of nodes (*n*), average degree of nodes in the network(AVG.D.), average ordering time over 500 random pairs of nodes (Avg. time) in seconds, and average efficiency in ordering the 500 randomly picked pairs of nodes based on their betweenness score (Avg. Efficiency) respectively. The average betweenness-ordering for all the networks in this family of networks is above 97% which is surely a high accuracy in very less ordering time.

The plots of the results are shown in Fig. 9. The *x* axis represents either the number of nodes or the number of edges and* y* axis denotes the average time taken for ordering two nodes based on their betweenness score. It is clear from the plots that the average betweenness-ordering time is linear with the number of edges. Here ordering time is also linear with the number of edges. It is due to the constant and small average degree of nodes in all of the networks in the considered network-family which made the number of edges as linear to the number of nodes. These plots are the corroborating experimental evidence of the proposed heuristic’s linear running time on real-world graphs. This result also concludes that scaling of graphs in the terms of the number of edges will linearly affect the ordering time and thus our ordering algorithm will be very quick and accurate than any other possible algorithms.

### 4.1 Eccentricity ordering

We illustrate the centrality-ordering problem in the context of a centrality measure called the *eccentricity* measure and give a simple approximation approach for eccentricity ordering (Fig. 10).

The eccentricity of a node *v* (Hage and Harary 1995) in a connected graph *G* is defined as the shortest distance to the farthest node from *v* in *G*. Center of a graph which is a solution to the *facility location problems*, is calculated by picking the nodes with least eccentricity. Finding eccentricity of all nodes is as expensive as finding closeness, betweenness or stress centrality for all nodes in respect of time. Given a graph in the two-dimensional Euclidean space, if we were to solve the *eccentricity ordering problem* of two nodes in that graph without computing the eccentricities, we would go about the following way: drawing a minimum circle (Disk) covering all the nodes is a very well known problem called *smallest-circle problem* or *minimum covering circle problem*. A linear (*O*(*n*)) time randomized algorithm by Welzl (1991) can find the smallest circle covering *n* points on a 2-D Euclidean plane. Once we find the smallest circle, an approximate solution to the eccentricity comparison problem is to compare the distance from the center of the smallest circle to the nodes. If the nodes are evenly distributed in the smallest circle, then the node closer to the center of that smallest circle is likely to have smaller eccentricity and vice versa. Therefore, the eccentricity ordering problem can be estimated in linear time as opposed to finding it the conventional way by considering all possible distances from the given vertex to all other vertices. A theoretical bound on the ordering efficiency of the above scheme is still open.

## Rights and permissions

## About this article

### Cite this article

Singh, R.R., Iyengar, S.R.S., Chaudhary, S. *et al.* An efficient heuristic for betweenness estimation and ordering.
*Soc. Netw. Anal. Min.* **8**, 66 (2018). https://doi.org/10.1007/s13278-018-0542-x

Received:

Revised:

Accepted:

Published:

DOI: https://doi.org/10.1007/s13278-018-0542-x