Efficient algorithms for incremental all pairs shortest paths, closeness and betweenness in social network analysis

Abstract

One of the biggest challenges in today’s social network analysis (SNA) is handling dynamic data. Real-world social networks evolve with time, so their graph representations must be updated dynamically as edges and nodes are added or deleted. Consequently, a researcher is often interested in fast recomputation of important SNA metrics for a network, but recomputing these metrics from scratch is expensive. Dynamic algorithms offer a solution to this problem. Calculating the closeness and betweenness centrality metrics requires all pairs shortest paths (APSP); thus, to compute these SNA metrics dynamically, APSP must also be computed dynamically. This paper presents fast incremental updating algorithms, along with time complexity results, for APSP, closeness centrality and betweenness centrality, considering two distinct cases: edge addition and node addition. The following time complexity results are presented: (1) the incremental APSP algorithm runs in \(O(n^2)\) time (\(\Omega (n^2)\) is the theoretical lower bound of the APSP problem), (2) the incremental closeness algorithm runs in \(O(n^2)\) time, and (3) the incremental betweenness algorithm runs in \(O(nm + n^2\, {\mathrm{log}} \, n)\) time. Here, \(m\) is the number of edges and \(n\) is the number of nodes in the network. Though the time complexity of the presented incremental betweenness algorithm is no better than that of its static counterpart (Brandes, J Math Sociol 25(2):163–177, 2001), the experimental comparisons demonstrate that the former performs better than the latter. All the presented methods are applicable to weighted, directed or undirected graphs with non-negative real-valued edge weights. An alternate version of the incremental APSP algorithm is presented in the Appendix, where it is demonstrated that this version works better on large graphs.
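As an illustration of the edge-addition case, the following is a minimal sketch of a standard \(O(n^2)\) distance-matrix update, not necessarily the algorithm of this paper. It assumes a directed graph with non-negative weights, a dense matrix `dist` with zeros on the diagonal and `math.inf` for unreachable pairs, and the hypothetical function names `add_edge_update` and `closeness`. The closeness formula used, \((n-1)\) divided by the sum of distances from a node, is one common definition and is also an assumption.

```python
import math

def add_edge_update(dist, u, v, w):
    """Update an APSP distance matrix in place after inserting edge u -> v of weight w.

    Assumes non-negative weights and dist[k][k] == 0 for every node k.
    """
    n = len(dist)
    if w >= dist[u][v]:
        return dist  # the new edge cannot shorten any path
    for i in range(n):
        d_iu = dist[i][u]
        if d_iu == math.inf:
            continue  # i cannot reach u, so no path from i uses the new edge
        for j in range(n):
            # Candidate path: i -> ... -> u, the new edge, then v -> ... -> j
            cand = d_iu + w + dist[v][j]
            if cand < dist[i][j]:
                dist[i][j] = cand
    return dist

def closeness(dist, i):
    """Closeness of node i as (n - 1) / sum of distances from i (assumed definition)."""
    total = sum(dist[i])
    return (len(dist) - 1) / total if total > 0 else 0.0

# Path graph 0 -> 1 -> 2 with unit weights, then add edge 2 -> 0.
INF = math.inf
d = [[0, 1, 2], [INF, 0, 1], [INF, INF, 0]]
add_edge_update(d, 2, 0, 1)
```

Under the stated assumptions the update is safe to perform in place, because row `v` and column `u` of the matrix are never changed by the insertion of edge `u -> v`.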

References

  1. Akoglu L, Faloutsos C (2009) RTG: a recursive realistic graph generator using random typing. Mach Learn Knowl Discov Databases 13–28

  2. Ausiello G, Italiano G, Nanni U (1991) Incremental algorithms for minimal length paths. J Algorithms 12(4):615–638

  3. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  4. Bavelas A (1948) A mathematical model for group structures. Hum Organ 7(3):16–30

  5. Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177

  6. Burt R (1995) Structural holes: the social structure of competition. Harvard University Press, USA

  7. Demetrescu C, Italiano G (2001) Fully dynamic all pairs shortest paths with real edge weights. In: Proceedings of the 42nd IEEE symposium on foundations of computer science, IEEE, New York, pp 260–267

  8. Demetrescu C, Italiano G (2004) A new approach to dynamic all pairs shortest paths. J ACM (JACM) 51(6):968–992

  9. Dutot A, Guinand F, Olivier D, Pigné Y et al (2007) Graphstream: a tool for bridging the gap between complex systems and dynamic graphs. In: Emergent properties in natural and artificial complex systems. Satellite conference within the 4th European conference on complex systems (ECCS’2007)

  10. Eppstein D, Galil Z, Italiano G, Nissenzweig A (1997) Sparsification—a technique for speeding up dynamic graph algorithms. J ACM (JACM) 44(5):669–696

  11. Freeman L (1977) A set of measures of centrality based on betweenness. Sociometry 35–41

  12. Freeman L (1979) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239

  13. Green O, McColl R, Bader DA (2012) A fast algorithm for streaming betweenness centrality. In: Proceedings of the 2012 international conference on privacy, security, risk and trust (PASSAT), 2012 international conference on social computing (SocialCom), IEEE, New York, pp 11–20

  14. Henzinger M, Klein P, Rao S, Subramanian S (1997) Faster shortest-path algorithms for planar graphs. J Comput Syst Sci 55(1):3–23

  15. King V (2002) Fully dynamic algorithms for maintaining all-pairs shortest paths and transitive closure in digraphs. In: Proceedings of the 40th annual symposium on foundations of computer science, 1999, IEEE, New York, pp 81–89

  16. Kleinberg J (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632

  17. Knoke D, Yang S (2008) Social network analysis. Sage Publications Inc., New York

  18. Laumann E, Pappi F (1976) Networks of collective action: a perspective on community influence systems. Academic Press, New York

  19. Lee MJ, Lee J, Park JY, Choi RH, Chung CW (2012) Qube: a quick algorithm for updating betweenness centrality. In: Proceedings of the 21st international conference on World Wide Web. ACM, New York, pp 351–360

  20. Lin C, Chang R (1991) On the dynamic shortest path problem. J Inform Process 13(4):470–476

  21. Loubal P (1967) A network evaluation procedure. Highw Res Board 205:96–109

  22. Granovetter MS (1973) The strength of weak ties. Am J Sociol 78(6):1360–1380

  23. Marlow C (2009) Maintained relationships on facebook. Retriev Febr 15:2010

  24. Murchland J (1967) The effect of increasing or decreasing the length of a single arc on all shortest distances in a graph. Technical report. London Business School, Transport Network Theory Unit

  25. Newman M (2005) A measure of betweenness centrality based on random walks. Soc Netw 27(1):39–54

  26. Opsahl T, Agneessens F, Skvoretz J (2010) Node centrality in weighted networks: generalizing degree and shortest paths. Soc Netw 32(3):245–251

  27. Ramalingam G, Reps T (1996a) An incremental algorithm for a generalization of the shortest-path problem. J Algorithms 21(2):267–305

  28. Ramalingam G, Reps T (1996b) On the computational complexity of dynamic graph problems. Theor Comput Sci 158(1–2):233–277

  29. Ripley RM, Snijders TA, Preciado P (2011) Manual for rsiena. University of Oxford, Department of Statistics, Nuffield College

  30. Rodionov V (1968) The parametric problem of shortest distances. USSR Comput Math Math Phys 8(5):336–343

  31. Snijders TA, Van de Bunt GG, Steglich CE (2010) Introduction to stochastic actor-based models for network dynamics. Soc Netw 32(1):44–60

  32. Thorup M (2004) Fully-dynamic all-pairs shortest paths: faster and allowing negative cycles. Algorithm theory-SWAT 2004, pp 384–396

  33. Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440–442

  34. Westbrook J, Tarjan R (1992) Maintaining bridge-connected and biconnected components on-line. Algorithmica 7(1):433–464

Acknowledgments

This work was supported by the National Science Foundation via grant number ICES-1216082. This support is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Sushant S. Khopkar.

Appendix

The modified incremental APSP algorithm

This section presents a modified version of the incremental APSP algorithm. In the last step of the incremental algorithm for node addition, the shortest paths passing through the new node \(z\) are compared with all \(n^2\) shortest paths in the original network. The modified version tries to avoid some of these comparisons, with the aim of improving the empirical performance of the algorithm.
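For reference, the unmodified node-addition step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is directed with non-negative weights, `dist` is a dense \(n \times n\) matrix with `math.inf` for unreachable pairs, the new node \(z\) receives index \(n\), and the names `add_node_update`, `in_edges` and `out_edges` are hypothetical.

```python
import math

def add_node_update(dist, in_edges, out_edges):
    """Return the (n+1) x (n+1) distance matrix after adding node z with index n.

    in_edges maps k -> weight of edge k -> z; out_edges maps k -> weight of edge z -> k.
    """
    n = len(dist)
    # Shortest distance from every old node i to z, through some in-neighbour k of z.
    to_z = [min((dist[i][k] + w for k, w in in_edges.items()), default=math.inf)
            for i in range(n)]
    # Shortest distance from z to every old node j, through some out-neighbour k of z.
    from_z = [min((w + dist[k][j] for k, w in out_edges.items()), default=math.inf)
              for j in range(n)]
    new = [row[:] + [to_z[i]] for i, row in enumerate(dist)]
    new.append(from_z + [0.0])
    # Last step: compare the best path through z with each of the n^2 old distances.
    for i in range(n):
        for j in range(n):
            cand = to_z[i] + from_z[j]
            if cand < new[i][j]:
                new[i][j] = cand
    return new

# Edge 0 -> 1 of weight 5; new node 2 with edges 0 -> 2 and 2 -> 1 of weight 1.
result = add_node_update([[0, 5], [math.inf, 0]], {0: 1}, {1: 1})
```

The double loop at the end is the set of \(n^2\) comparisons that the modified version aims to reduce.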

A new shortest path distance between a pair of nodes \((i,j)\) that passes through the newly added node \(z\) has two parts: \(\min\nolimits_{k^{\text{ in }} \in T_1}\{d^{k^{\text{ in }}}_{iz}\}\) and \(\min\nolimits_{k^{\text{ out }} \in T_2}\{d^{k^{\text{ out }}}_{zj}\}\). If either of these two parts is greater than the original shortest path distance \(d_{ij}\), i.e., if \(\min\nolimits_{k^{\text{ in }} \in T_1}\{d^{k^{\text{ in }}}_{iz}\} > d_{ij}\) or \(\min\nolimits_{k^{\text{ out }} \in T_2}\{d^{k^{\text{ out }}}_{zj}\} > d_{ij}\), then no path through \(z\) can be shorter than \(d_{ij}\), because edge weights are non-negative and the sum of the two parts is therefore at least as large as either part. Further computation for this pair can then be avoided. This idea is the foundation of the modified version of the incremental algorithm. For this algorithm, we sort all rows and columns of the original distance matrix, which takes \(O(n^2{\mathrm{log}}n)\) time. However, this is a one-time investment, made on the original distance matrix when the first new node is added. For each subsequent node addition, inserting a new shortest path distance into the sorted distance matrix takes only \(O({\mathrm{log}}n)\) time, so the method saves computational effort in the long run.

  1. Sort shortest path distances starting from each node: Let \(d_{i\cdot }\) be the set of shortest path distances that start from node \(i\) and end at any other node; \(d_{i\cdot }\) is a single row of the distance matrix. Sort this row in descending order of path length. Starting from the longest path, remove from this row all elements that satisfy the condition \(d_{i\cdot } >\min\nolimits_{k^{\text{ in }} \in T_1}\{d^{k^{\text{ in }}}_{iz}\}\) and store them in a new matrix. Follow this procedure for all nodes \(i\).

  2. Sort shortest path distances ending at each node: Similarly, let \(d_{\cdot j}\) be the set of shortest path distances ending at node \(j\); \(d_{\cdot j}\) is a single column of the distance matrix. Working from a copy of the distance matrix, sort this column in descending order of path length. Starting from the longest path, remove from this column all elements that satisfy the condition \(d_{\cdot j} > \min\nolimits_{k^{\text{ out }} \in T_2}\{d^{k^{\text{ out }}}_{zj}\}\) and store them in the same new matrix. Repeat this procedure for all columns.

  3. Update shortest path distances: In the new matrix, the old shortest path distances selected by both the row and the column operations occupy the same element of the matrix. Select only those overlapping elements for the final comparison, which checks whether \(\min\nolimits_{k^{\text{ in }} \in T_1}\{d^{k^{\text{ in }}}_{iz}\} + \min\nolimits_{k^{\text{ out }} \in T_2}\{d^{k^{\text{ out }}}_{zj}\} < d_{ij}\). Elements selected by only one of the two operations, or by neither, do not qualify for this comparison. In this way, empirical performance can be further improved.

In the worst case, when all \(d_{ij}\) are greater than both \(\min\nolimits_{k^{\text{ in }} \in T_1}\{d^{k^{\text{ in }}}_{iz}\}\) and \(\min\nolimits_{k^{\text{ out }} \in T_2}\{d^{k^{\text{ out }}}_{zj}\}\) each time, this algorithm also performs \(n^2\) comparisons and thus is not an improvement over the original version of our algorithm. However, the number of comparisons is usually smaller than \(n^2\), and hence this algorithm can prove to be empirically effective (see also Sect. 4). Algorithm 4 below shows the pseudocode of the algorithm.

Algorithm 4 Pseudocode of the modified incremental APSP algorithm (figure not reproduced here)
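The pruning idea of the modified algorithm can be sketched as follows. This is an illustrative simplification, not a transcription of Algorithm 4: it walks each row in descending order and applies the column test directly, rather than building the separate overlap matrix of steps 1 to 3, and it does not maintain the sorted order after an update. The names `sorted_rows` and `pruned_update` are hypothetical; `to_z[i]` and `from_z[j]` stand for the two minima over \(T_1\) and \(T_2\), assumed to be computed beforehand.

```python
import math

def sorted_rows(dist):
    """One-time O(n^2 log n) step: column indices of each row, longest distance first."""
    return [sorted(range(len(row)), key=row.__getitem__, reverse=True) for row in dist]

def pruned_update(dist, order, to_z, from_z):
    """Update dist in place for paths through z; return the number of full comparisons."""
    comparisons = 0
    for i, row in enumerate(dist):
        for j in order[i]:
            if row[j] <= to_z[i]:
                break  # all remaining entries of this row are at most to_z[i]
            if row[j] <= from_z[j]:
                continue  # the column condition fails, so this pair is skipped
            comparisons += 1
            cand = to_z[i] + from_z[j]
            if cand < row[j]:
                row[j] = cand
    return comparisons

# Edge 0 -> 1 of weight 5; new node z reachable from 0 (cost 1) and reaching 1 (cost 1).
d = [[0, 5], [math.inf, 0]]
count = pruned_update(d, sorted_rows(d), [1, math.inf], [math.inf, 1])
```

In this small example only one of the four pairs reaches the final comparison. A fuller implementation would also keep the sorted structure up to date between node additions, as the text describes.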

Snijders’ stochastic actor-based model for network dynamics

This section briefly describes Snijders’ stochastic actor-based model for network dynamics, which has been used to generate a set of graphs to test the two betweenness algorithms.

A set of longitudinal observed networks is an input to such a model. They are assumed to be outcomes of a Markov process. These models are “actor-based” models because it is assumed that actors in a network control their outgoing ties. Given an opportunity, an actor can form a tie, delete one of its existing ties, or do nothing. An opportunity to make such a change is probabilistically given to an actor. This opportunity may depend on the network position of the actors (e.g., centrality) and on actor covariates (e.g., age and sex). The probabilities of an actor’s choices depend on the so-called objective function. The objective function expresses how likely it is for the actor to change her/his network in a particular way. On average, each actor “tries to” move into a direction of higher values of her/his objective function, subject to constraints of the current network structure, the changes made by other actors in the network, and subject to random influence. The objective function is assumed to be a linear combination of effects.

$$\begin{aligned} f_{i}(\beta ,x) = \sum _{k}{\beta _{k}s_{ki}(x)}. \end{aligned}$$
(13)

Here, \(f_{i}(\beta , x)\) is the value of the objective function for actor \(i\) depending on the state \(x\) of the network. The functions \(s_{ki}(x)\) are effects, chosen based on subject-matter knowledge, that correspond to the tendencies of the actors. The weights \(\beta _{k}\) are the statistical parameters. The effects represent aspects of the network as viewed from the point of view of actor \(i\). If \(\beta _{k}\) equals 0, the corresponding effect plays no role in the network dynamics; if \(\beta _{k}\) is positive, then there will be a higher probability of moving into directions where the corresponding effect is higher, and the converse is true if \(\beta _{k}\) is negative.

In the model that is used to generate graphs for testing the two betweenness algorithms, we have considered outdegree, reciprocity and transitive triplets as the effects. Hence, the objective function of actor \(i\) in our model is
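To make the role of the weights concrete, the sketch below turns objective function values into choice probabilities. The text only states that the probabilities depend on the objective function; the multinomial-logit (softmax) form used here is the one commonly associated with such models (see Snijders et al. 2010) and should be read as an assumption. The names `objective_value` and `choice_probabilities` are hypothetical, and each candidate change is represented simply by the list of effect values it would produce.

```python
import math

def objective_value(beta, effects):
    """Linear combination sum_k beta_k * s_k for one candidate network state."""
    return sum(b * s for b, s in zip(beta, effects))

def choice_probabilities(beta, candidate_effects):
    """Softmax over the objective values of the candidate changes (assumed logit form)."""
    scores = [objective_value(beta, e) for e in candidate_effects]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# One effect with a positive weight: the candidate with the higher effect is preferred.
p_pos = choice_probabilities([1.0], [[0.0], [1.0]])
# A zero weight on the second effect: that effect plays no role in the choice.
p_zero = choice_probabilities([1.0, 0.0], [[0.0, 5.0], [0.0, 9.0]])
```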

$$\begin{aligned} f_{i}(\beta ,x) = \beta _{1}s_{1i}(x) + \beta _{2}s_{2i}(x) + \beta _{3}s_{3i}(x). \end{aligned}$$
(14)

Here, \(s_{1i}(x) = \sum _{j}{x_{ij}}\), the outdegree effect; \(s_{2i}(x) = \sum _{j}{x_{ij}x_{ji}}\), the reciprocity effect; and \(s_{3i}(x) = \sum _{j,h}{x_{ih}x_{ij}x_{jh}}\), the transitive triplets effect.
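The three effects and the resulting objective function can be computed directly from an adjacency matrix. The following is a minimal sketch that assumes a binary matrix `x` with a zero diagonal, stored as a list of lists; the function names `effects_for_actor` and `objective_for_actor` are hypothetical.

```python
def effects_for_actor(x, i):
    """Return (outdegree, reciprocity, transitive triplets) for actor i."""
    n = len(x)
    outdegree = sum(x[i][j] for j in range(n))
    reciprocity = sum(x[i][j] * x[j][i] for j in range(n))
    transitive = sum(x[i][h] * x[i][j] * x[j][h]
                     for j in range(n) for h in range(n))
    return outdegree, reciprocity, transitive

def objective_for_actor(x, i, beta):
    """Objective function of Eq. (14): beta_1*s_1i + beta_2*s_2i + beta_3*s_3i."""
    return sum(b * s for b, s in zip(beta, effects_for_actor(x, i)))

# Actor 0 has ties to 1 and 2, actor 1 reciprocates and also has a tie to 2.
x = [[0, 1, 1],
     [1, 0, 1],
     [0, 0, 0]]
s = effects_for_actor(x, 0)
f = objective_for_actor(x, 0, (-2.0, 2.0, 0.3))
```

For actor 0 in this example the outdegree is 2, one tie is reciprocated, and the triplet \(0 \rightarrow 1 \rightarrow 2\) with \(0 \rightarrow 2\) is the single transitive triplet.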

The parameters of such a model are estimated by a maximum likelihood method. Once the parameters are estimated, we can simulate different networks that have the same effects with the same strengths. For the purpose of generating graphs, an R script mentioned in Ripley et al. (2011) is used. It is assumed that the model parameters are already estimated. Sixty different graphs are generated, with the number of nodes ranging from 100 to 1000 and the outdegree weight \((\beta _1)\) ranging from −2.8 to −1.8, keeping all other parameter values fixed (rate = 2, reciprocity parameter \((\beta _2) = 2\), transitive triplets parameter \((\beta _3) = 0.3\)). These values are chosen following the guidance given in the script. Note that the outdegree weights are negative. This parameter generally takes a negative value in large networks to hinder link formation between nodes that are only remotely connected to each other. A node with a total degree (outgoing plus incoming) of approximately 10 is removed from each generated graph together with its attached edges; it is then treated as an increment.
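The last step, removing one node and treating it as the increment, can be sketched as below. This is an illustration under assumptions, not the script used in the experiments: the graph is a dictionary mapping every node to the set of its out-neighbours, there are no self-loops, and the node whose total degree is closest to the target is chosen. The name `split_increment` is hypothetical.

```python
def split_increment(adj, target_degree=10):
    """Remove the node whose total degree is closest to target_degree.

    Returns (reduced_graph, z, in_neighbours_of_z, out_neighbours_of_z).
    """
    indeg = {v: 0 for v in adj}
    for nbrs in adj.values():
        for v in nbrs:
            indeg[v] += 1
    # Total degree = outgoing plus incoming ties.
    z = min(adj, key=lambda v: abs(len(adj[v]) + indeg[v] - target_degree))
    out_nbrs = set(adj[z])
    in_nbrs = {u for u, nbrs in adj.items() if z in nbrs}
    reduced = {u: nbrs - {z} for u, nbrs in adj.items() if u != z}
    return reduced, z, in_nbrs, out_nbrs

# Small example with a target degree of 4 instead of 10.
g = {0: {1, 2}, 1: {0}, 2: set(), 3: {0}}
reduced, z, in_nbrs, out_nbrs = split_increment(g, target_degree=4)
```

The reduced graph serves as the original network, and the removed node with its incoming and outgoing edges is then added back as the node-addition increment.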

About this article

Cite this article

Khopkar, S.S., Nagi, R., Nikolaev, A.G. et al. Efficient algorithms for incremental all pairs shortest paths, closeness and betweenness in social network analysis. Soc. Netw. Anal. Min. 4, 220 (2014). https://doi.org/10.1007/s13278-014-0220-6

Keywords

  • Incremental graph algorithms
  • Centrality metrics
  • APSP
  • Social network analysis