Efficient algorithms for incremental all pairs shortest paths, closeness and betweenness in social network analysis

Khopkar, Sushant S.; Nagi, Rakesh; Nikolaev, Alexander G.; Bhembre, Vaibhav

doi:10.1007/s13278-014-0220-6

Original Article
Published: 03 August 2014

Volume 4, article number 220, (2014)

Social Network Analysis and Mining

Abstract

One of the biggest challenges in today’s social network analysis (SNA) is handling dynamic data. Real-world social networks evolve over time, so their graph representations must be updated dynamically as edges and nodes are added or deleted. Consequently, a researcher is often interested in fast recomputation of important SNA metrics for a network. Recomputing these metrics from scratch is expensive, and dynamic algorithms have been proposed as a solution to this problem. Calculating the closeness and betweenness centrality metrics requires all pairs shortest paths (APSP); thus, to compute these metrics dynamically, APSP must also be computed dynamically. This paper presents fast incremental updating algorithms, along with time complexity results, for APSP, closeness centrality and betweenness centrality, considering two distinct cases: edge addition and node addition. The following time complexity results are presented: (1) the incremental APSP algorithm runs in $O(n^2)$ time ($\Omega (n^2)$ is the theoretical lower bound of the APSP problem), (2) the incremental closeness algorithm runs in $O(n^2)$ time, and (3) the incremental betweenness algorithm runs in $O(nm + n^2 \log n)$ time. Here, $m$ is the number of edges and $n$ is the number of nodes in the network. Though the time complexity of the presented incremental betweenness algorithm is no better than that of its static counterpart (Brandes, J Math Sociol 25(2):163–177, 2001), experimental comparisons demonstrate that the former performs better than the latter. All the presented methods are applicable to weighted, directed or undirected graphs with non-negative real-valued edge weights. An alternate version of the incremental APSP algorithm is presented in the Appendix. It is demonstrated that this version works better on large graphs.
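To make the $O(n^2)$ bound for edge addition concrete, the following minimal Python sketch updates an existing distance matrix after one edge is inserted. It uses the standard relaxation $d_{ij} \leftarrow \min(d_{ij},\, d_{iu} + w + d_{vj})$ and is not necessarily the algorithm of the paper. The names `add_edge_update` and `closeness`, the list-of-lists matrix layout, and the closeness definition (reciprocal of the sum of distances to all other nodes) are assumptions made for illustration.

```python
import math


def add_edge_update(dist, u, v, w):
    """Update the distance matrix `dist` in place after inserting the
    directed edge u -> v with weight w >= 0.

    Returns the number of entries that improved. Runs in O(n^2) time.
    """
    if w >= dist[u][v]:
        return 0  # the new edge cannot shorten any path
    n = len(dist)
    to_u = [dist[i][u] for i in range(n)]  # snapshot of column u
    from_v = list(dist[v])                 # snapshot of row v
    improved = 0
    for i in range(n):
        if math.isinf(to_u[i]):
            continue  # i cannot reach u, so it cannot use the new edge
        for j in range(n):
            candidate = to_u[i] + w + from_v[j]
            if candidate < dist[i][j]:
                dist[i][j] = candidate
                improved += 1
    return improved


def closeness(dist, i):
    """Closeness of node i, assumed here to be 1 / (sum of distances to all
    other nodes); returns 0.0 if some node is unreachable from i."""
    total = sum(d for j, d in enumerate(dist[i]) if j != i)
    if total == 0 or math.isinf(total):
        return 0.0
    return 1.0 / total
```

For an undirected graph, the same update can be applied twice, once for `(u, v)` and once for `(v, u)`. Closeness values can then be read off the updated rows without recomputing any shortest paths.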

References

Akoglu L, Faloutsos C (2009) RTG: a recursive realistic graph generator using random typing. Mach Learn Knowl Discov Databases 13–28
Ausiello G, Italiano G, Nanni U (1991) Incremental algorithms for minimal length paths. J Algorithms 12(4):615–638
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Bavelas A (1948) A mathematical model for group structures. Hum Organ 7(3):16–30
Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177
Burt R (1995) Structural holes: the social structure of competition. Harvard University Press, USA
Demetrescu C, Italiano G (2001) Fully dynamic all pairs shortest paths with real edge weights. In: Proceedings of the 42nd IEEE symposium on foundations of computer science, IEEE, New York, pp 260–267
Demetrescu C, Italiano G (2004) A new approach to dynamic all pairs shortest paths. J ACM (JACM) 51(6):968–992
Dutot A, Guinand F, Olivier D, Pigné Y et al (2007) Graphstream: a tool for bridging the gap between complex systems and dynamic graphs. In: Emergent properties in natural and artificial complex systems. Satellite conference within the 4th European conference on complex systems (ECCS’2007)
Eppstein D, Galil Z, Italiano G, Nissenzweig A (1997) Sparsification—a technique for speeding up dynamic graph algorithms. J ACM (JACM) 44(5):669–696
Freeman L (1977) A set of measures of centrality based on betweenness. Sociometry 35–41
Freeman L (1979) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239
Green O, McColl R, Bader DA (2012) A fast algorithm for streaming betweenness centrality. In: Proceedings of the 2012 international conference on privacy, security, risk and trust (PASSAT), 2012 international conference on social computing (SocialCom), IEEE, New York, pp 11–20
Henzinger M, Klein P, Rao S, Subramanian S (1997) Faster shortest-path algorithms for planar graphs. J Comput Syst Sci 55(1):3–23
King V (2002) Fully dynamic algorithms for maintaining all-pairs shortest paths and transitive closure in digraphs. In: Proceedings of the 40th annual symposium on foundations of computer science, 1999, IEEE, New York, pp 81–89
Kleinberg J (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632
Knoke D, Yang S (2008) Social network analysis. Sage Publications Inc., New York
Laumann E, Pappi F (1976) Networks of collective action: a perspective on community influence systems. Academic Press, New York
Lee MJ, Lee J, Park JY, Choi RH, Chung CW (2012) Qube: a quick algorithm for updating betweenness centrality. In: Proceedings of the 21st international conference on World Wide Web. ACM, New York, pp 351–360
Lin C, Chang R (1991) On the dynamic shortest path problem. J Inform Process 13(4):470–476
Loubal P (1967) A network evaluation procedure. Highw Res Board 205:96–109
Granovetter MS (1973) The strength of weak ties. Am J Sociol 78(6):1360–1380
Marlow C (2009) Maintained relationships on Facebook. Retrieved 15 Feb 2010
Murchland J (1967) The effect of increasing or decreasing the length of a single arc on all shortest distances in a graph. Technical report. London Business School, Transport Network Theory Unit
Newman M (2005) A measure of betweenness centrality based on random walks. Soc Netw 27(1):39–54
Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 32(3):245–251
Ramalingam G, Reps T (1996a) An incremental algorithm for a generalization of the shortest-path problem. J Algorithms 21(2):267–305
Ramalingam G, Reps T (1996b) On the computational complexity of dynamic graph problems. Theor Comput Sci 158(1–2):233–277
Ripley RM, Snijders TA, Preciado P (2011) Manual for RSiena. University of Oxford, Department of Statistics, Nuffield College
Rodionov V (1968) The parametric problem of shortest distances. USSR Comput Math Math Phys 8(5):336–343
Snijders TA, Van de Bunt GG, Steglich CE (2010) Introduction to stochastic actor-based models for network dynamics. Soc Netw 32(1):44–60
Thorup M (2004) Fully-dynamic all-pairs shortest paths: faster and allowing negative cycles. Algorithm theory-SWAT 2004, pp 384–396
Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440–442
Westbrook J, Tarjan R (1992) Maintaining bridge-connected and biconnected components on-line. Algorithmica 7(1):433–464

Acknowledgments

This work was supported by the National Science Foundation via grant number ICES-1216082. This support is gratefully acknowledged.

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, State University of New York at Buffalo, Buffalo, NY, 14260, USA
Sushant S. Khopkar, Alexander G. Nikolaev & Vaibhav Bhembre
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Rakesh Nagi

Corresponding author

Correspondence to Sushant S. Khopkar.

Appendix

1.1 The modified incremental APSP algorithm

This section presents a modified version of the incremental APSP algorithm. In the last step of the incremental algorithm for node addition, the shortest paths passing through the new node $z$ are compared with all $n^2$ shortest paths in the original network. The modified version tries to avoid some of these comparisons, with the aim of improving the empirical performance of the algorithm.

A new shortest path distance between a pair of nodes $(i,j)$ that passes through the newly added node $z$ has two parts: $\min\nolimits_{k^{\text{in}} \in T_1}\{d^{k^{\text{in}}}_{iz}\}$ and $\min\nolimits_{k^{\text{out}} \in T_2}\{d^{k^{\text{out}}}_{zj}\}$. If either of these two parts is greater than the original shortest path distance $d_{ij}$, i.e., if $\min\nolimits_{k^{\text{in}} \in T_1}\{d^{k^{\text{in}}}_{iz}\} > d_{ij}$ or $\min\nolimits_{k^{\text{out}} \in T_2}\{d^{k^{\text{out}}}_{zj}\} > d_{ij}$, then, because edge weights are non-negative, the path through $z$ cannot be shorter than $d_{ij}$, and further computation for this pair can be avoided. This idea is the foundation of the modified version of the incremental algorithm. The algorithm sorts all rows and columns of the original distance matrix, which takes $O(n^2 \log n)$ time. However, this is a one-time investment, made on the original distance matrix when the first new node is added. For each subsequent node addition, inserting a new shortest path distance into the distance matrix takes only $O(\log n)$ time, so the method saves computational effort in the long run.

1. Sort shortest path distances starting from each node: Let $d_{i\cdot }$ be the set of shortest path distances from node $i$ to every other node; $d_{i\cdot }$ is a single row of the distance matrix of shortest paths. Sort this row in descending order of path length. Starting from the element with the longest path length, remove every element of the row that satisfies the condition $d_{i\cdot } > \min\nolimits_{k^{\text{in}} \in T_1}\{d^{k^{\text{in}}}_{iz}\}$ and store it in a new matrix. Repeat this procedure for all rows $i$.
2. Sort shortest path distances ending at each node: Similarly, let $d_{\cdot j}$ be the set of shortest path distances ending at node $j$; $d_{\cdot j}$ is a single column of the distance matrix of shortest paths. Working on a copy of the distance matrix, sort this column in descending order of path length. Starting from the element with the longest path length, remove every element of the column that satisfies the condition $d_{\cdot j} > \min\nolimits_{k^{\text{out}} \in T_2}\{d^{k^{\text{out}}}_{zj}\}$ and store it in the same new matrix. Repeat this procedure for all columns $j$.
3. Update shortest path distances: In the new matrix, an old shortest path distance selected by both the row and the column operations lands on the same element of the matrix, so the two selections overlap. Only these overlapping elements are used in the final comparison, which checks whether $\min\nolimits_{k^{\text{in}} \in T_1}\{d^{k^{\text{in}}}_{iz}\} + \min\nolimits_{k^{\text{out}} \in T_2}\{d^{k^{\text{out}}}_{zj}\} < d_{ij}$. Non-overlapping elements do not qualify for this comparison. In this way, empirical performance can be further improved.

In the worst case, when all $d_{ij}$ are greater than both $\min\nolimits_{k^{\text{in}} \in T_1}\{d^{k^{\text{in}}}_{iz}\}$ and $\min\nolimits_{k^{\text{out}} \in T_2}\{d^{k^{\text{out}}}_{zj}\}$ each time, this algorithm also performs $n^2$ comparisons and thus is not an improvement over the original version of our algorithm. In practice, however, the number of comparisons is usually smaller than $n^2$, and hence this algorithm can prove to be empirically effective (see also Sect. 4). Algorithm 4 shows the pseudocode of the algorithm.
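The three steps above can be sketched in Python as follows. This is an illustrative reading of the prose and not a reproduction of Algorithm 4. It assumes that $T_1$ is the set of nodes with an edge into $z$ and $T_2$ the set of nodes with an edge out of $z$, that $d^{k}_{iz} = d_{ik} + w_{kz}$ and $d^{k}_{zj} = w_{zk} + d_{kj}$, and that all weights are non-negative. The function name `add_node_update` and the dictionaries `in_edges` and `out_edges` are hypothetical. For clarity the sketch re-sorts rows and columns on every call instead of maintaining the sorted order across insertions.

```python
import bisect
import math


def add_node_update(dist, in_edges, out_edges):
    """Add a new node z to the n x n distance matrix `dist` (list of lists).

    in_edges:  {k: weight of edge k -> z}   (assumed to describe T_1)
    out_edges: {k: weight of edge z -> k}   (assumed to describe T_2)
    The matrix is extended in place to (n+1) x (n+1); the function returns
    the number of final comparisons that were actually performed.
    """
    n = len(dist)
    # Shortest distances i -> z and z -> j through the attachment points.
    to_z = [min((dist[i][k] + w for k, w in in_edges.items()),
                default=math.inf) for i in range(n)]
    from_z = [min((w + dist[k][j] for k, w in out_edges.items()),
                  default=math.inf) for j in range(n)]

    # Step 1: per row, select entries with d_ij > to_z[i].
    row_hits = set()
    for i in range(n):
        order = sorted(range(n), key=lambda j: dist[i][j])
        keys = [dist[i][j] for j in order]
        start = bisect.bisect_right(keys, to_z[i])
        row_hits.update((i, j) for j in order[start:])

    # Step 2: per column, select entries with d_ij > from_z[j] and keep
    # only those that were also selected by the row pass (the overlap).
    candidates = []
    for j in range(n):
        order = sorted(range(n), key=lambda i: dist[i][j])
        keys = [dist[i][j] for i in order]
        start = bisect.bisect_right(keys, from_z[j])
        candidates.extend((i, j) for i in order[start:] if (i, j) in row_hits)

    # Step 3: final comparison on the overlapping entries only.
    comparisons = 0
    for i, j in candidates:
        comparisons += 1
        via_z = to_z[i] + from_z[j]
        if via_z < dist[i][j]:
            dist[i][j] = via_z

    # Append the column and row of the new node z (index n).
    for i in range(n):
        dist[i].append(to_z[i])
    dist.append(from_z + [0.0])
    return comparisons
```

On a directed path 0 → 1 → 2 with unit weights, attaching a node with an edge from node 2 and an edge to node 0 triggers only three of the nine possible comparisons in this sketch.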

1.2 Snijders’ stochastic actor-based model for network dynamics

This section briefly describes Snijders’ stochastic actor-based model for network dynamics (Snijders et al. 2010), which was used to generate a set of graphs for testing the two betweenness algorithms.

The input to such a model is a set of longitudinally observed networks, which are assumed to be outcomes of a Markov process. These models are called “actor-based” because actors in a network are assumed to control their outgoing ties. Given an opportunity, an actor can form a tie, delete one of its existing ties, or do nothing. The opportunity to make such a change is given to an actor probabilistically, and it may depend on the network position of the actor (e.g., centrality) and on actor covariates (e.g., age and sex). The probabilities of an actor’s choices depend on the so-called objective function, which expresses how likely it is for the actor to change her/his network in a particular way. On average, each actor “tries to” move in the direction of higher values of her/his objective function, subject to the constraints of the current network structure, the changes made by other actors in the network, and random influence. The objective function is assumed to be a linear combination of effects:

$$f_{i}(\beta ,x) = \sum _{k}{\beta _{k}s_{ki}(x)}. \tag{13}$$

Here, $f_{i}(\beta , x)$ is the value of the objective function for actor $i$, depending on the state $x$ of the network. The functions $s_{ki}(x)$ are effects, chosen based on subject-matter knowledge, that correspond to the tendencies of the actors. The weights $\beta _{k}$ are the statistical parameters. The effects represent aspects of the network as viewed from the point of view of actor $i$. If $\beta _{k}$ equals 0, the corresponding effect plays no role in the network dynamics; if $\beta _{k}$ is positive, there is a higher probability of moving in directions where the corresponding effect is higher, and the converse is true if $\beta _{k}$ is negative.

In the model used to generate graphs for testing the two betweenness algorithms, we consider outdegree, reciprocity and transitive triplets as the effects. Hence, the objective function of actor $i$ in our model is

$$f_{i}(\beta ,x) = \beta _{1}s_{1i}(x) + \beta _{2}s_{2i}(x) + \beta _{3}s_{3i}(x). \tag{14}$$

Here, $s_{1i}(x) = \sum _{j}{x_{ij}}$, the outdegree effect; $s_{2i}(x) = \sum _{j}{x_{ij}x_{ji}}$, the reciprocity effect; and $s_{3i}(x) = \sum _{j,h}{x_{ih}x_{ij}x_{jh}}$, the transitive triplets effect.
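As a small worked illustration, the three effects and the objective function of Eq. (14) can be evaluated directly from a 0/1 adjacency matrix. The sketch below assumes that $x$ is stored as a list of lists with no self-loops; the function names are hypothetical and are not taken from the RSiena software.

```python
def outdegree(x, i):
    """s_1i(x): number of outgoing ties of actor i."""
    return sum(x[i])


def reciprocity(x, i):
    """s_2i(x): number of ties of actor i that are reciprocated."""
    return sum(x[i][j] * x[j][i] for j in range(len(x)))


def transitive_triplets(x, i):
    """s_3i(x): number of ordered pairs (j, h) with i -> j, j -> h, i -> h."""
    n = len(x)
    return sum(x[i][h] * x[i][j] * x[j][h]
               for j in range(n) for h in range(n))


def objective(x, i, beta):
    """f_i(beta, x) for beta = (beta_1, beta_2, beta_3), as in Eq. (14)."""
    b1, b2, b3 = beta
    return (b1 * outdegree(x, i)
            + b2 * reciprocity(x, i)
            + b3 * transitive_triplets(x, i))


# Example network with ties 0->1, 1->0, 0->2 and 1->2.
x = [[0, 1, 1],
     [1, 0, 1],
     [0, 0, 0]]
f0 = objective(x, 0, (-2.0, 2.0, 0.3))
```

For actor 0 in this example the outdegree effect is 2 and the reciprocity and transitive triplets effects are both 1, so with $\beta = (-2, 2, 0.3)$ the objective function evaluates to approximately $-1.7$.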

The parameters of such a model are estimated by a maximum likelihood method. Once the parameters are estimated, we can simulate different networks that have the same effects with the same strengths. To generate the graphs, an R script mentioned in Ripley et al. (2011) is used, and it is assumed that the model parameters have already been estimated. Sixty different graphs are generated, with sizes ranging from 100 to 1000 nodes and outdegree weights ($\beta _1$) ranging from −2.8 to −1.8, keeping all other parameter values fixed (rate = 2, reciprocity parameter ($\beta _2$) = 2, transitive triplets parameter ($\beta _3$) = 0.3). These values follow the guidance given in the script. Note that the outdegree weights are negative. This parameter is generally negative in large networks, which hinders link formation between nodes that are only remotely connected to each other. From each generated graph, a node with a total degree (outgoing plus incoming) of approximately 10 is removed together with its attached edges; it is then treated as the increment.
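The last step, splitting a generated graph into a base graph and a single-node increment, might look like the following Python sketch. The representation (a dictionary mapping every node to the set of its successors, with no self-loops), the function name `split_increment`, and the choice of the node whose total degree is closest to the target are assumptions for illustration; the selection rule used in the experiments is not specified beyond "approximately 10".

```python
def split_increment(succ, target_degree=10):
    """Remove one node whose total degree is closest to `target_degree`.

    succ: {node: set of successor nodes}; every node must appear as a key.
    Returns (remaining graph, removed node z, in-neighbours of z,
    out-neighbours of z).
    """
    # Count incoming ties for every node.
    indegree = {v: 0 for v in succ}
    for outs in succ.values():
        for v in outs:
            indegree[v] += 1

    # Pick the node whose in-degree plus out-degree is nearest the target.
    z = min(succ, key=lambda v: abs(len(succ[v]) + indegree[v] - target_degree))

    out_neighbours = set(succ[z])
    in_neighbours = {u for u, outs in succ.items() if z in outs}

    # Drop z and every edge attached to it.
    remaining = {u: {v for v in outs if v != z}
                 for u, outs in succ.items() if u != z}
    return remaining, z, in_neighbours, out_neighbours
```

The removed node and its two neighbour sets can then be fed back as a node addition, so that an incremental update and a full recomputation are run on the same final graph.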

About this article

Cite this article

Khopkar, S.S., Nagi, R., Nikolaev, A.G. et al. Efficient algorithms for incremental all pairs shortest paths, closeness and betweenness in social network analysis. Soc. Netw. Anal. Min. 4, 220 (2014). https://doi.org/10.1007/s13278-014-0220-6

Received: 10 March 2014
Revised: 20 June 2014
Accepted: 10 July 2014
Published: 03 August 2014
DOI: https://doi.org/10.1007/s13278-014-0220-6
