1 Introduction

Centrality is one of the most important concepts in the analysis of social networks. Among centrality measures, one of the most popular is betweenness centrality [1, 6]. The betweenness of a node is a measure of the control this node has on the communication paths in the network. Therefore, it can be used to rank nodes according to their relative importance in a graph. Betweenness has been used effectively in a variety of applications, such as: design and control of communications networks [15], traffic monitoring [13], identifying key actors in terrorist networks [11], finding essential proteins [8], and many others.

Computing the betweenness of all nodes in a network has a high computational cost, so efficiency is the target of much related research. Nowadays, most graphs are inherently dynamic. When a graph undergoes small changes, recomputing betweenness from scratch would be very inefficient. Therefore, dynamic algorithms capable of computing betweenness faster by reusing previous computations have been proposed [10, 12]. None of these is better than Brandes [3] (brandes) in the worst case, and there is evidence that this is likely very hard to overcome [16]. Despite that, good speedups on typical instances have been achieved [2, 7].

In this work, we focus on the exact computation of betweenness centrality in incremental graphs. While they do not allow edges to be deleted, incremental graphs cover some important applications, as several authors have pointed out before [2, 9, 12]. Two recently proposed algorithms deal with the same problem, obtaining better performance than previous work, so we compare against them:

  1. icentral [7] works on undirected connected graphs and allows edges to be both deleted and inserted. It stores only the betweenness of all nodes of the graph, so its memory requirement is linear. It first decomposes the graph into biconnected components and then updates the betweenness of the nodes in the component affected by the update. The article proves that, for undirected graphs, betweenness can change only for nodes in the affected component. Its time complexity depends heavily on the size of the affected biconnected component.

  2. ibet [2] works on directed graphs and allows edges to be inserted. It stores all pairwise distances, so its memory requirement is quadratic. It first identifies efficiently all pairs of nodes whose distance or number of shortest paths is affected by the update, and then applies an optimized procedure to compute the betweenness changes for the affected nodes. Experiments showed it outperforms previous approaches that require quadratic memory.

In this paper we present a space-efficient algorithm to compute the betweenness centrality of all nodes in a directed incremental network. Its space complexity is linear in the size of the input graph and its time complexity is similar to that of icentral. In the worst case, it is equivalent to recalculating betweenness in the biconnected component where the added edge resides, plus some linear overhead. To the best of our knowledge, it is the first algorithm that computes betweenness centrality in incremental directed graphs, performs better than recomputation, and at the same time has less than quadratic space complexity. Moreover, it works on disconnected graphs, a detail usually left out by previous approaches but important in real-world applications.

In the next section we define betweenness, biconnected components, and incremental algorithms. In Sect. 3 we present the proposed algorithm, prove its correctness, and analyze its space and time complexity. In Sect. 4 we report the experimental validation of our algorithm. We close with the conclusions and references.

2 Preliminaries

For simplicity, we consider directed, simple, unweighted graphs. In the following, \(G = (V, E)\) denotes a graph with n nodes and m edges.

2.1 Betweenness

Betweenness centrality of a node is formally defined by the following formula:

$$\begin{aligned} C_B(v) = \sum _{\begin{array}{c} s \ne v, t \ne v\\ s,t \in V \end{array}} {\frac{\sigma _{st}(v)}{\sigma _{st}}} \end{aligned}$$
(1)

where \(\sigma _{st}(v)\) is the number of shortest paths from s to t passing through v and \(\sigma _{st}\) is the number of shortest paths from s to t. A naive algorithm using this formula has \(\mathcal {O}(n^3)\) complexity.
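For concreteness, formula (1) can be evaluated directly by counting shortest paths with one BFS per node and using the identity \(\sigma _{st}(v) = \sigma _{sv} \cdot \sigma _{vt}\) whenever \(d(s,v) + d(v,t) = d(s,t)\). The following pure-Python sketch (function names are ours) is meant only to illustrate the definition, not to be efficient:

```python
from collections import deque

def bfs_counts(adj, s):
    """Distances and shortest-path counts from s in an unweighted digraph."""
    dist = {s: 0}
    sigma = {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness_naive(adj):
    """Direct evaluation of formula (1): sigma_st(v) = sigma_sv * sigma_vt
    whenever d(s,v) + d(v,t) = d(s,t); unreachable pairs contribute 0."""
    nodes = set(adj) | {w for vs in adj.values() for w in vs}
    info = {s: bfs_counts(adj, s) for s in nodes}
    cb = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist_s, sig_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in nodes:
                if v in (s, t):
                    continue
                dist_v, sig_v = info[v]
                if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    cb[v] += sig_s[v] * sig_v[t] / sig_s[t]
    return cb
```

After the n BFS runs, the triple loop dominates, giving the cubic behavior mentioned above.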

In [3] Brandes showed a more efficient way to calculate betweenness values:

$$\begin{aligned} C_B(v) = \displaystyle \sum _{\begin{array}{c} s \ne v, s \in V \end{array}} \delta _{s\cdot }(v) \end{aligned}$$
(2)

where \(\delta _{s\cdot }(v) = \sum _{\begin{array}{c} t \ne v, t \in V \end{array}} \frac{\sigma _{st}(v)}{\sigma _{st}}\). Using this formula, betweenness values can be computed in time \(\mathcal {O}(n \cdot m)\), by running a BFS (Breadth-First Search [4]) from each node and computing the required values (distances, \(\sigma \), \(\delta \)). For a complete explanation see [3].
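The two phases of Brandes' algorithm (one BFS per source, then a reverse accumulation of the dependencies \(\delta _{s\cdot }\)) can be sketched in pure Python as follows. This is a didactic sketch with names of our choosing, not the implementation used in our experiments:

```python
from collections import deque

def brandes(adj):
    """Brandes' algorithm for a directed, unweighted graph given as an
    adjacency dict: one BFS per source, then dependencies delta_s.(v)
    from formula (2) are accumulated in reverse BFS order."""
    nodes = set(adj) | {w for vs in adj.values() for w in vs}
    cb = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS phase: distances, path counts, predecessors, visit order.
        dist = {s: 0}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        pred = {v: [] for v in nodes}
        order = []
        q = deque([s])
        while q:
            u = q.popleft()
            order.append(u)
            for w in adj.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    pred[w].append(u)
        # Accumulation phase: process nodes farthest from s first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for u in pred[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb
```

Each source costs one BFS plus a linear accumulation, which is where the \(\mathcal {O}(n \cdot m)\) bound comes from.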

2.2 Biconnected Components

Biconnected components were first proposed as a good heuristic for speeding up betweenness computations in [14], and more recently in the context of dynamic graphs in [7]. We will make use of the following definitions:

Definition 1

Let G be an undirected graph. A biconnected component of G is a maximal connected induced subgraph A such that the removal of any single node does not disconnect A.

Definition 2

Any node belonging to more than one biconnected component is called an articulation point.
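Both notions can be computed together with the classic Tarjan DFS. The following pure-Python sketch (names are ours; recursive, so suitable for small graphs only) returns the components as sets of edges along with the articulation points:

```python
def biconnected_components(adj):
    """Tarjan-style DFS over an undirected graph given as an adjacency
    dict. Returns the biconnected components (as sets of frozenset edges)
    and the set of articulation points."""
    disc, low = {}, {}
    comps, cuts = [], set()
    edge_stack = []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for w in adj[u]:
            if w == parent:
                continue
            if w not in disc:            # tree edge
                edge_stack.append((u, w))
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= disc[u]:
                    # u separates w's subtree: close one component.
                    if parent is not None or children > 1:
                        cuts.add(u)
                    comp = set()
                    while True:
                        e = edge_stack.pop()
                        comp.add(frozenset(e))
                        if e == (u, w):
                            break
                    comps.append(comp)
            elif disc[w] < disc[u]:      # back edge
                edge_stack.append((u, w))
                low[u] = min(low[u], disc[w])

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return comps, cuts
```

Returning the components as edge sets makes it easy to later locate the component that contains a given edge.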

2.3 Incremental Graphs

We call a dynamic graph incremental if edges can be inserted but not deleted. As previously mentioned, in this work we focus on incremental graphs. Computing betweenness in this setting is usually done in two steps. In the first step some preprocessing is done and the initial betweenness is computed. Then, after each edge insertion, the betweenness is updated. The two steps can have different time complexities, and both characterize the time complexity of an incremental algorithm. All algorithms mentioned here have the same complexity in the first step (the same as brandes), so in comparisons we take only the update step into account.

3 Algorithm

The proposed algorithm is a generalization of icentral to deal with directed graphs.

Definition 3

Let G be a graph, and let \(G^*\) be the graph G after inserting a new edge (u, v). We define the affected component as the biconnected component of the undirected version of \(G^*\) to which the newly inserted edge (u, v) belongs.

The main obstacle in generalizing icentral is that, in directed graphs, when an edge is inserted, the betweenness values of nodes outside the affected component can change as well. In the next theorem we prove a formula that allows those changes to be computed efficiently.

Theorem 1

Let x be a node outside the affected component A, and let s be the articulation point of A whose removal disconnects x from A. Then, after the update, the betweenness of x changes by

$$\begin{aligned} \delta _s(x) \cdot (\text {reach}^*(s) - \text {reach}(s)) + \delta ^r_s(x) \cdot (\text {reach}^{*r}(s) - \text {reach}^r(s)) \end{aligned}$$
(3)

where reach(s) is the number of nodes z such that some shortest path from z to x passes through A (and hence through s), the superscript r means the function is applied to the reversed graph, and the superscript \(*\) means the function is applied to the updated graph.

Proof

For the sake of clarity, let us rename the variables in the definition of betweenness (1):

$$\begin{aligned} C_B(x) = \sum _{\begin{array}{c} a \ne x, b \ne x\\ a,b \in V \end{array}} {\frac{\sigma _{ab}(x)}{\sigma _{ab}}} \end{aligned}$$
(4)

In the sum on the right, the only terms that can change after an update are those in which a and b lie in different biconnected components and all shortest paths from a to b pass through A. Therefore, all these paths must pass through s. Two cases may then occur, according to the relative order of s and x on the paths from a to b that go through x:

  1. \(a, s, x, b \implies \frac{\sigma _{ab}(x)}{\sigma _{ab}} = \frac{\sigma _{as}\sigma _{sx}\sigma _{xb}}{\sigma _{as}\sigma _{sb}} = \frac{\sigma _{sx}\sigma _{xb}}{\sigma _{sb}} = \frac{\sigma _{sb}(x)}{\sigma _{sb}} \)

  2. \(a, x, s, b \implies \frac{\sigma _{ab}(x)}{\sigma _{ab}} = \frac{\sigma _{ax}\sigma _{xs}\sigma _{sb}}{\sigma _{as}\sigma _{sb}} = \frac{\sigma _{ax}\sigma _{xs}}{\sigma _{as}} = \frac{\sigma _{as}(x)}{\sigma _{as}}\)

Therefore, the terms that can change equal:

$$\begin{aligned} \sum _{a, b \text { in case 1}} {\frac{\sigma _{sb}(x)}{\sigma _{sb}}} + \sum _{a, b \text { in case 2}} {\frac{\sigma _{as}(x)}{\sigma _{as}}} = \text {reach}(s)\cdot \delta _s(x) + \text {reach}^r(s)\cdot \delta ^r_s(x) \end{aligned}$$
(5)

Since x lies outside A and the inserted edge lies inside A, the dependencies \(\delta _s(x)\) and \(\delta ^r_s(x)\) depend only on shortest paths lying outside A (apart from their endpoint s), so they are unchanged by the update. Subtracting (5) evaluated before the update from its value after the update yields (3), and the theorem follows.

Following Theorem 1, pseudocode for the function that updates betweenness outside A is shown in Algorithm 3. It then remains to update betweenness inside the component; this can be done as in icentral, and is shown in Algorithm 2. The Brandes-like function in lines 8 and 9 computes the delta values in the affected component, as in icentral, using the reach\(_o^r\) values to add the contribution of nodes outside the affected component. r and \(*\) have the same meaning as in Theorem 1. The pseudocode of the complete proposed algorithm is shown in Algorithm 1.

[figures a, b: algorithm pseudocode]
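To make the role of formula (3) concrete, the outside-of-A update can be sketched in Python as follows. This is a sketch only: every name is ours, not taken from the pseudocode, and all inputs (the map from each outside node to its articulation point, the \(\delta \) dependencies, and the reach counts before and after the insertion) are assumed to have been computed by the earlier steps of the algorithm.

```python
def apply_theorem1(cb, art_of, delta, delta_r,
                   reach_old, reach_new, reach_r_old, reach_r_new):
    """Add to cb[x], for every node x outside the affected component,
    the change given by formula (3):
        delta[s][x] * (reach*(s) - reach(s))
          + delta_r[s][x] * (reach*r(s) - reach_r(s))
    where s = art_of[x] is the articulation point separating x from A."""
    for x, s in art_of.items():
        cb[x] += (delta[s][x] * (reach_new[s] - reach_old[s])
                  + delta_r[s][x] * (reach_r_new[s] - reach_r_old[s]))
    return cb
```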

3.1 Complexity

The overall space complexity is linear (in the size of the graph) as the algorithm only uses a constant number of arrays with linear size (\(C_B\), the different variants of reach, A, \(A^*\), and the different variants of \(\delta \)). Only \(C_B\) and the graph itself persist across updates.

The time complexity of the proposed algorithm (Algorithm 1) equals the complexity of finding biconnected components (linear), plus the complexity of Algorithm 2, plus that of Algorithm 3. Let \(n_A\) and \(m_A\) be the number of nodes and edges, respectively, in the affected component. Algorithm 2 has exactly the same complexity as icentral, namely \(\mathcal {O}(n + m + |Sr| \cdot (n_A + m_A))\), where Sr is the set of affected sources (as defined in [7]).

In Algorithm 3, for a given s, all variants of reach (lines 3 and 7) can be computed with a BFS in time \(\mathcal {O}(n_A + m_A)\), as no computation outside A is needed at this point. Since any node outside A has at most one corresponding articulation point s, in lines 4, 5, 6, 8, 9, and 10 each node and edge of the graph appears at most once, so the total cost of these lines is \(\mathcal {O}(n + m)\). Summing up, the complexity of Algorithm 3 is \(\mathcal {O}(n + m + |\text {articulation-points}(A)| \cdot (n_A + m_A))\).

Overall, using that A has at most \(n_A\) articulation points and at most \(n_A\) affected sources, the complexity of the proposed algorithm is \(\mathcal {O}(n + m + n_A \cdot (n_A + m_A))\), matching that of icentral.

[figure c: algorithm pseudocode]

3.2 Notes

It is possible to modify the proposed algorithm slightly to work on graphs with arbitrary positive weights, by using Dijkstra's algorithm [5] instead of BFS. In graphs with multiple edges, each set of parallel edges can be replaced by the one with the smallest weight, yielding a simple graph with the same betweenness.
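The parallel-edge reduction can be sketched as follows (a minimal sketch; the function name and edge-list representation are ours):

```python
def collapse_parallel_edges(edges):
    """For a weighted multigraph given as (u, v, w) triples, keep only the
    lightest edge between each ordered pair of nodes. Shortest-path
    distances, and hence betweenness, are unchanged."""
    best = {}
    for u, v, w in edges:
        if (u, v) not in best or w < best[(u, v)]:
            best[(u, v)] = w
    return [(u, v, w) for (u, v), w in best.items()]
```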

On the other hand, it is straightforward to parallelize the most time-consuming part of the algorithm: the computation of the betweenness changes inside the affected component. Since the \(\delta \) values with respect to the affected sources are computed independently, these computations could be carried out by different nodes in a parallel environment, where good speedups, similar to those in [7], are expected.

4 Experiments

We experimentally evaluate the proposed algorithm by measuring time and memory, and comparing it with icentral, ibet and brandes. All algorithms were implemented in pure Python, and graphs were stored and manipulated using the Python library NetworkX. The algorithms were run on a 64-bit GNU/Linux machine with an Intel(R) Core(TM) i3-4160 CPU @ 3.60 GHz and 5 GB of main memory.

The datasets used for experimentation were taken from online sources, some of them already referenced in [2] or [7]; p2p-Gnutella08, Wiki-Vote, and CollegeMsg were taken from the SNAP graph collection. The description of the data is shown in Table 1.

Table 1. Statistics of graph datasets (lbc refers to largest biconnected component)

For each graph, we randomly selected 100 edges not already contained in the graph, and measured the average time and maximum memory used by each algorithm to update the betweenness of all nodes as each edge is inserted. For algorithms that work on directed graphs, when testing on an undirected one, each edge was transformed into two edges, one in each direction. Results are shown in Table 2. Note that icentral cannot be tested on directed graphs.
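The edge-sampling step of this setup can be sketched as follows (the function name and seed handling are ours; the adjacency dict is assumed to have a key for every node):

```python
import random

def sample_absent_edges(adj, k, seed=0):
    """Sample k distinct directed pairs (u, v), u != v, that are not
    already edges of the graph, mirroring the experimental setup of
    inserting edges absent from the original graph."""
    rng = random.Random(seed)
    nodes = list(adj)
    present = {(u, v) for u in adj for v in adj[u]}
    out = set()
    while len(out) < k:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and (u, v) not in present:
            out.add((u, v))
    return list(out)
```

Rejection sampling is adequate here because the test graphs are sparse, so almost every random pair is absent.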

Table 2. Results, time given in seconds and memory in MBytes.

As expected, our algorithm and icentral perform very similarly, in both time and memory, and are consistently faster than brandes. The speedup depends heavily on the size of the affected component. The best performance relative to brandes was obtained on the Eva dataset, where the number of nodes in the largest biconnected component is relatively small. On average, our algorithm is between 2 and 3 times faster than brandes.

On the other hand, ibet is the fastest on most datasets, but its memory usage is very high, making it very expensive for graphs with tens of thousands of nodes. Also note that, on datasets like Eva, ibet is outperformed by icentral and our proposal, stressing the relevance of algorithms that exploit the biconnected component decomposition.

5 Conclusions

In this work, an algorithm for computing betweenness in incremental directed graphs has been proposed. Its memory usage is linear, allowing it to scale to large graphs. Its time complexity is similar to that of the algorithm proposed in [7], despite handling the more general case of directed graphs. Experiments show it can be a practical replacement for brandes on directed and undirected graphs, especially when quadratic memory usage is not feasible due to large inputs.

As future work we plan to conduct experiments with a distributed and parallel implementation of the proposed algorithm. Also, we will extend the proposed algorithm to work with edge deletions. Moreover, it seems possible to apply some of the optimizations proposed in [2] to update betweenness values inside the affected biconnected component.