Distance labeling: on parallelism, compression, and ordering

Abstract

Distance labeling approaches are widely adopted to speed up the online processing of shortest-distance queries. Constructing the distance labeling, however, can be computationally expensive, especially on big graphs. For a major category of large graphs, small-world networks, the state-of-the-art approach is pruned landmark labeling (\(\mathsf {PLL}\)). \(\mathsf {PLL}\) prunes distance labels based on a node order and constructs the pruned labels directly by performing breadth-first searches in that node order. Both the pruning technique and the index construction are inherently sequential, which hinders \(\mathsf {PLL}\) from being parallelized. This becomes an urgent issue on massive small-world networks, whose index can hardly be constructed by a single thread within a reasonable time. This paper first scales distance labeling on small-world networks by proposing a parallel shortest-distance labeling (\(\mathsf {PSL}\)) scheme. \(\mathsf {PSL}\) converts \(\mathsf {PLL}\)'s node-order dependency into a shortest-distance dependency, which leads to a propagation-based parallel labeling that runs in D rounds, where D denotes the diameter of the graph. To scale up \(\mathsf {PSL}\) further, it is critical to reduce the index size. This paper proposes effective index compression techniques based on both graph properties and label properties; it also explores best practices for using a betweenness-based node order to reduce the index size. The proposed efficient betweenness estimation for graph nodes may be of independent interest to graph practitioners. Extensive experiments verify the efficiency of our approach on billion-scale graphs, near-linear speedup in a multi-core environment, and up to \(94\%\) reduction in the index size.
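To make the contrast between a node-order dependency and a shortest-distance dependency concrete, the sketch below places a sequential PLL-style construction (one pruned BFS per root, in node order) next to a round-based propagation in the spirit of PSL, where round d only reads labels of distance d - 1, so every node can be processed independently within a round. This is a minimal sketch under stated assumptions, not the authors' implementation: the graph is unweighted and undirected and given as an adjacency dict, `order[0]` is the highest-ranked node, the function names `query`, `pll`, and `psl_rounds` are hypothetical, and none of the paper's compression or betweenness-ordering techniques are included. On the small example, the two constructions produce identical labels.

```python
from collections import deque

# Assumptions: unweighted, undirected graph as {node: set(neighbors)};
# labels as {node: {hub: distance}}; order[0] is the highest-ranked node.


def query(labels, s, t):
    """2-hop query: minimum of d(s, h) + d(h, t) over hubs h common to both labels."""
    ls, lt = labels[s], labels[t]
    if len(ls) > len(lt):
        ls, lt = lt, ls
    return min((d + lt[h] for h, d in ls.items() if h in lt), default=float("inf"))


def pll(adj, order):
    """Sequential PLL-style construction: one pruned BFS per root, in node order."""
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        frontier = deque([root])
        while frontier:
            v = frontier.popleft()
            d = dist[v]
            if query(labels, root, v) <= d:
                continue  # pruned: higher-ranked hubs already cover (root, v)
            labels[v][root] = d
            for w in adj[v]:
                if w not in dist:
                    dist[w] = d + 1
                    frontier.append(w)
    return labels


def psl_rounds(adj, order):
    """Round-based construction: round d derives distance-d labels from distance-(d-1) labels."""
    rank = {v: i for i, v in enumerate(order)}
    labels = {v: {v: 0} for v in adj}
    fresh = {v: {v: 0} for v in adj}  # labels added in the previous round
    d = 0
    while any(fresh.values()):
        d += 1
        new = {v: {} for v in adj}
        for v in adj:  # nodes are independent within a round (parallelizable)
            candidates = {
                u
                for w in adj[v]
                for u in fresh[w]
                if rank[u] < rank[v] and u not in labels[v]
            }
            for u in candidates:
                # prune using labels from earlier rounds only (all distances < d)
                if query(labels, u, v) > d:
                    new[v][u] = d
        for v in adj:  # commit the round only after every node has been processed
            labels[v].update(new[v])
        fresh = new
    return labels


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 5), (4, 5), (5, 6), (6, 7), (3, 7)]
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))  # degree order, for illustration
    assert pll(adj, order) == psl_rounds(adj, order)
    print(query(pll(adj, order), 0, 7))  # 3
```

The loop over `adj` inside each round of `psl_rounds` is where a parallel implementation would split work across threads; here it runs sequentially, and new labels are buffered in `new` so that no node reads labels produced in the same round.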

Figs. 1–21 (images not reproduced)

Notes

  1. In many labeling approaches, labels are pruned implicitly: a label is not generated at all if pruning it is guaranteed to be safe.

  2. Centrality can be defined in terms of degree, closeness, or betweenness [31].

  3. http://networkrepository.com/index.php.

  4. http://law.di.unimi.it.

  5. For convenience of presentation, we replace an edge (u, v) of length 2 with two unit-weight edges (u, w) and (w, v), where w is a new node interpolated between u and v.

  6. http://networkrepository.com/index.php.

  7. http://snap.stanford.edu/data/.

  8. http://law.di.unimi.it.

  9. http://konect.uni-koblenz.de/.

  10. We chose ABRA for two reasons. First, as pointed out in [38], ABRA outperforms the method of [37]. Second, ABRA can be terminated at any time during execution, which allows a fair comparison with our method. The ABRA source code we use is the same as that used in the benchmark study [6], and it is parallelized with OpenMP.
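Note 5 replaces an edge of length 2 with two unit edges through a new node. The sketch below generalizes that idea, as an assumption beyond what the note states, to any positive integer length k by interpolating k - 1 new nodes, and checks that BFS on the subdivided graph reproduces Dijkstra distances between the original nodes. The function names and the `("mid", i)` tags used for interpolated nodes are hypothetical.

```python
import heapq
from collections import deque
from itertools import count


def subdivide(weighted_edges):
    """Replace each undirected edge (u, v, k), k a positive integer, by a path of k unit edges."""
    fresh_ids = count()
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for u, v, k in weighted_edges:
        prev = u
        for _ in range(k - 1):
            mid = ("mid", next(fresh_ids))  # interpolated node; tag avoids clashes with real ids
            link(prev, mid)
            prev = mid
        link(prev, v)
    return adj


def bfs(adj, s):
    """Unit-weight single-source distances."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def dijkstra(weighted_edges, s):
    """Reference weighted distances on the original graph."""
    graph = {}
    for u, v, k in weighted_edges:
        graph.setdefault(u, []).append((v, k))
        graph.setdefault(v, []).append((u, k))
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, k in graph[x]:
            if d + k < dist.get(y, float("inf")):
                dist[y] = d + k
                heapq.heappush(heap, (d + k, y))
    return dist


edges = [(0, 1, 2), (1, 2, 1), (0, 2, 4), (2, 3, 2)]
unit = bfs(subdivide(edges), 0)
print({v: unit[v] for v in range(4)})  # {0: 0, 1: 2, 2: 3, 3: 5}
```

The transformation keeps unit-weight, BFS-based labeling applicable, at the cost of k - 1 extra nodes per edge, so it is most attractive when edge lengths are small integers.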

References

  1. Abboud, A., Grandoni, F., Williams, V.V.: Subcubic equivalences between graph centrality problems, APSP and diameter. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1681–1697. SIAM (2014)

  2. Abraham, I., Delling, D., Goldberg, A.V., Werneck, R.F.: Hierarchical hub labelings for shortest paths. In: European Symposium on Algorithms, pp. 24–35. Springer (2012)

  3. Akiba, T., Iwata, Y., Kawarabayashi, K., Kawata, Y.: Fast shortest-path distance queries on road networks by pruned highway labeling. In: 2014 Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 147–154. SIAM (2014)

  4. Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 349–360. ACM (2013)

  5. Akiba, T., Sommer, C., Kawarabayashi, K.: Shortest-path queries for complex networks: exploiting low tree-width outside the core. In: Proceedings of the 15th International Conference on Extending Database Technology, pp. 144–155. ACM (2012)

  6. AlGhamdi, Z., Jamour, F., Skiadopoulos, S., Kalnis, P.: A benchmark for betweenness centrality approximation algorithms on large graphs. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management, pp. 1–12 (2017)

  7. Bader, D.A., Kintali, S., Madduri, K., Mihail, M.: Approximating betweenness centrality. In: International Workshop on Algorithms and Models for the Web-Graph, pp. 124–137. Springer (2007)

  8. Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM Press (2011)

  9. Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601. ACM Press, Manhattan, USA (2004)

  10. Borassi, M., Crescenzi, P., Habib, M.: Into the square: on the complexity of some quadratic-time solvable problems. Electron. Notes Theor. Comput. Sci. 322, 51–67 (2016)

  11. Borassi, M., Natale, E.: KADABRA is an adaptive algorithm for betweenness via random approximation. J. Exp. Algorithmics (JEA) 24(1), 1–35 (2019)

  12. Borgatti, S.P., Everett, M.G.: A graph-theoretic perspective on centrality. Soc. Netw. 28(4), 466–484 (2006)

  13. Brandes, U.: A faster algorithm for betweenness centrality. J. Math. Sociol. 25(2), 163–177 (2001)

  14. Brandes, U.: On variants of shortest-path betweenness centrality and their generic computation. Soc. Netw. 30(2), 136–145 (2008)

  15. Chen, W., Sommer, C., Teng, S.-H., Wang, Y.: A compact routing scheme and approximate distance oracle for power-law graphs. ACM Trans. Algorithms (TALG) 9(1), 4 (2012)

  16. Coffman, T., Greenblatt, S., Marcus, S.: Graph-based technologies for intelligence analysis. Commun. ACM 47(3), 45–47 (2004)

  17. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 937–946. Society for Industrial and Applied Mathematics (2002)

  18. Dolev, S., Elovici, Y., Puzis, R.: Routing betweenness centrality. J. ACM (JACM) 57(4), 1–27 (2010)

  19. Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry 40, 35–41 (1977)

  20. Fu, A.W.C., Wu, H., Cheng, J., Wong, R.C.W.: IS-Label: an independent-set based labeling scheme for point-to-point distance querying. Proc. VLDB Endow. 6(6), 457–468 (2013)

  21. Guimera, R., Mossa, S., Turtschi, A., Amaral, L.A.N.: The worldwide air transportation network: anomalous centrality, community structure, and cities' global roles. Proc. Natl. Acad. Sci. 102(22), 7794–7799 (2005)

  22. Hayashi, T., Akiba, T., Kawarabayashi, K.: Fully dynamic shortest-path distance query acceleration on massive networks. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1533–1542. ACM (2016)

  23. Hoeffding, W.: Probability inequalities for sums of bounded random variables. In: The Collected Works of Wassily Hoeffding, pp. 409–426. Springer (1994)

  24. Jacob, R., Koschützki, D., Lehmann, K.A., Peeters, L., Tenfelde-Podehl, D.: Algorithms for centrality indices. In: Network Analysis, pp. 62–82. Springer (2005)

  25. Jeong, H., Mason, S.P., Barabási, A.-L., Oltvai, Z.N.: Lethality and centrality in protein networks. Nature 411(6833), 41–42 (2001)

  26. Jiang, M., Fu, A.W.C., Wong, R.C.W., Xu, Y.: Hop doubling label indexing for point-to-point distance querying on scale-free networks. Proc. VLDB Endow. 7(12), 1203–1214 (2014)

  27. Kunegis, J.: KONECT: the Koblenz network collection. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1343–1350. ACM (2013)

  28. Leskovec, J., Krevl, A.: SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June (2014)

  29. Li, J., Wang, X., Deng, K., Yang, X., Sellis, T., Yu, J.X.: Most influential community search over large social networks. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 871–882. IEEE (2017)

  30. Li, W., Qiao, M., Qin, L., Zhang, Y., Chang, L., Lin, X.: Scaling distance labeling on small-world networks. In: Proceedings of the 2019 International Conference on Management of Data, pp. 1060–1077 (2019)

  31. Li, Y., Leong Hou, U., Yiu, M.L., Kou, N.M., et al.: An experimental study on hub labeling based shortest path algorithms. Proc. VLDB Endow. 11(4), 445–457 (2017)

  32. Liljeros, F., Edling, C.R., Amaral, L.A., Stanley, H.E., Åberg, Y.: The web of human sexual contacts. Nature 411(6840), 907–908 (2001)

  33. Ouyang, D., Qin, L., Chang, L., Lin, X., Zhang, Y., Zhu, Q.: When hierarchy meets 2-hop-labeling: efficient shortest distance queries on road networks. In: Proceedings of the 2018 International Conference on Management of Data, pp. 709–724. ACM (2018)

  34. Pfeffer, J., Carley, K.M.: k-centralities: local approximations of global measures based on shortest paths. In: Proceedings of the 21st International Conference on World Wide Web, pp. 1043–1050 (2012)

  35. Potamias, M., Bonchi, F., Castillo, C., Gionis, A.: Fast shortest path distance estimation in large networks. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 867–876. ACM (2009)

  36. Qiao, M., Cheng, H., Chang, L., Yu, J.X.: Approximate shortest distance computing: a query-dependent local landmark scheme. IEEE Trans. Knowl. Data Eng. 26(1), 55–68 (2014)

  37. Riondato, M., Kornaropoulos, E.M.: Fast approximation of betweenness centrality through sampling. Data Min. Knowl. Disc. 30(2), 438–475 (2016)

  38. Riondato, M., Upfal, E.: ABRA: approximating betweenness centrality in static and dynamic graphs with Rademacher averages. ACM Trans. Knowl. Disc. Data (TKDD) 12(5), 1–38 (2018)

  39. Rossi, R., Ahmed, N.: The network data repository with interactive graph analytics and visualization. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)

  40. Rozenshtein, P., Anagnostopoulos, A., Gionis, A., Tatti, N.: Event detection in activity networks. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1176–1185. ACM (2014)

  41. Shen, C.Y., Huang, L.H., Yang, D.N., Shuai, H.H., Lee, W.C., Chen, M.S.: On finding socially tenuous groups for online social networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 415–424. ACM (2017)

  42. Then, M., Kaufmann, M., Chirigati, F., Hoang-Vu, T.-A., Pham, K., Kemper, A., Neumann, T., Vo, H.T.: The more the merrier: efficient multi-source graph traversal. Proc. VLDB Endow. 8(4), 449–460 (2014)

  43. Travers, J., Milgram, S.: The small world problem. Psychol. Today 1(1), 61–67 (1967)

  44. Tretyakov, K., Armas-Cervantes, A., García-Bañuelos, L., Vilo, J., Dumas, M.: Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1785–1794. ACM (2011)

  45. Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 440 (1998)

  46. Wei, F.: TEDI: efficient shortest path query answering on graphs. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 99–110. ACM (2010)

  47. Wu, L., Xiao, X., Deng, D., Cong, G., Zhu, A.D., Zhou, S.: Shortest path and distance queries on road networks: an experimental evaluation. Proc. VLDB Endow. 5(5), 406–417 (2012)

Acknowledgements

Miao Qiao is supported by Marsden Fund UOA1732, Royal Society of New Zealand and Catalyst: Strategic Fund 3721519 from Government Funding, Ministry of Business Innovation and Employment. Lu Qin is supported by ARC FT200100787 and DP210101347. Ying Zhang is supported by ARC DP180103096 and FT170100128. Lijun Chang is supported by ARC DP160101513 and FT180100256. Xuemin Lin is supported by NSFC61232006, 2018YFB1003504, ARC DP200101338, DP180103096, and DP170101628.

Author information

Correspondence to Miao Qiao.

Appendices

A Proof of Lemma 1

By the triangle inequality, for any node \(u \in V\), \(\mathrm{dist}(s,u) + \mathrm{dist}(u,t) \ge \mathrm{dist}(s,t)\); hence \(\min _{v\in C(s) \cap C(t)}(\mathrm{dist}(s,v) + \mathrm{dist}(v,t)) \ge \mathrm{dist}(s,t)\). For a node \(u'\) on a shortest path from s to t, \(\mathrm{dist}(s,t) = \mathrm{dist}(s,u') + \mathrm{dist}(u',t)\). Since \(C(s) \cap C(t)\) contains such a node \(u'\), the minimum is at most \(\mathrm{dist}(s,u') + \mathrm{dist}(u',t) = \mathrm{dist}(s,t)\). Combining the two bounds, \(\min _{v\in C(s) \cap C(t)}(\mathrm{dist}(s,v) + \mathrm{dist}(v,t)) = \mathrm{dist}(s,t)\).
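The argument uses only the triangle inequality and the existence of one common hub on a shortest path, so it can be exercised numerically. The sketch below assumes an unweighted, undirected graph; `check_lemma1` and its sampling scheme are hypothetical illustrations, not part of the paper. For random pairs (s, t) it builds a common hub set from a few arbitrary nodes of the same component plus one node on a shortest s-t path, and checks that the minimum equals dist(s, t).

```python
import random
from collections import deque


def bfs(adj, s):
    """Unit-weight single-source distances."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def check_lemma1(adj, trials=500, seed=0):
    """Randomized check: a common hub set containing a node on a shortest s-t path
    yields min over hubs v of dist(s, v) + dist(v, t) == dist(s, t)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    dist = {s: bfs(adj, s) for s in nodes}
    for _ in range(trials):
        s, t = rng.choice(nodes), rng.choice(nodes)
        if t not in dist[s]:
            continue  # s and t lie in different components
        component = [v for v in nodes if v in dist[s]]
        on_path = [u for u in component if dist[s][u] + dist[u][t] == dist[s][t]]
        hubs = set(rng.sample(component, min(3, len(component))))  # arbitrary extra hubs
        hubs.add(rng.choice(on_path))  # plus one node on a shortest s-t path
        best = min(dist[s][v] + dist[v][t] for v in hubs)
        if best != dist[s][t]:
            return False
    return True


adj = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}  # 6-cycle
adj[0].add(3)  # plus a chord
adj[3].add(0)
print(check_lemma1(adj))  # True
```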

B Extending \(\mathsf {PSL}\) to directed graphs

For directed graphs, each node \(v \in V\) is associated with two sets of hub nodes: \(C_{{\mathsf {IN}}}(v)\), whose nodes can reach v, and \(C_{{\mathsf {OUT}}}(v)\), whose nodes can be reached from v. Pairing each hub with its distance yields two labels for v: \(L_{{\mathsf {IN}}}(v) = \{(u,\mathrm{dist}(u,v)) \mid u \in C_{{\mathsf {IN}}}(v)\}\) and \(L_{{\mathsf {OUT}}}(v) = \{(u,\mathrm{dist}(v,u)) \mid u \in C_{{\mathsf {OUT}}}(v)\}\). To compute the labels \(L_{{\mathsf {OUT}}}(v)\), we run \(\mathsf {PSL}\) on G; to compute \(L_{{\mathsf {IN}}}(v)\), we reverse the direction of every edge in G and run \(\mathsf {PSL}\) on the reversed graph. A distance query q(s, t) is answered by \(\mathrm{Query}(s,t,L)\), defined as follows.

$$\begin{aligned} \mathrm{Query}(s,t,L) = \mathrm{min}_{u \in C_{{\mathsf {OUT}}}(s) \cap C_{{\mathsf {IN}}}(t)} (\mathrm{dist}(s,u) + \mathrm{dist}(u,t)). \end{aligned}$$
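A minimal sketch of the directed scheme follows, under simplifying assumptions: the graph is unweighted, `order[0]` is the highest-ranked node, and the hypothetical `propagate` routine performs rank-restricted, round-by-round label propagation without any of PSL's distance-based pruning. Its labels are therefore larger than PSL's, and some entries may store a path length above the true distance. Running it on G (each node pulls hubs from its successors) gives the out-labels; running it on the reversed graph gives the in-labels, mirroring the text; `query` then evaluates the equation above. The answer is still exact for this sketch: for a reachable pair, the highest-ranked node on a shortest s-t path reaches both labels at its exact distance, and every other common hub yields an upper bound.

```python
from collections import defaultdict


def propagate(succ, order):
    """Rank-restricted propagation in which each node v pulls hubs from its successors.
    Returns {v: {hub: d}}, where d is the length of some v -> hub path (d >= dist(v, hub)).
    Unlike PSL, no distance-based pruning is applied (simplifying assumption)."""
    rank = {v: i for i, v in enumerate(order)}
    labels = {v: {v: 0} for v in order}
    fresh = {v: {v: 0} for v in order}  # labels added in the previous round
    d = 0
    while any(fresh.values()):
        d += 1
        new = {v: {} for v in order}
        for v in order:
            for w in succ.get(v, ()):
                for u in fresh[w]:
                    if rank[u] < rank[v] and u not in labels[v]:
                        new[v][u] = d
        for v in order:
            labels[v].update(new[v])
        fresh = new
    return labels


def directed_labels(edges, order):
    """L_OUT from propagation on G, L_IN from propagation on the reversed graph."""
    succ, pred = defaultdict(list), defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
    return propagate(succ, order), propagate(pred, order)


def query(l_out, l_in, s, t):
    """Query(s, t, L): minimum of dist(s, u) + dist(u, t) over u in C_OUT(s) and C_IN(t)."""
    common = l_out[s].keys() & l_in[t].keys()
    return min((l_out[s][u] + l_in[t][u] for u in common), default=float("inf"))


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (1, 4)]
l_out, l_in = directed_labels(edges, order=[2, 1, 0, 3, 4])
print(query(l_out, l_in, 0, 3), query(l_out, l_in, 3, 0))  # 3 inf
```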

About this article

Cite this article

Li, W., Qiao, M., Qin, L. et al. Distance labeling: on parallelism, compression, and ordering. The VLDB Journal (2021). https://doi.org/10.1007/s00778-021-00694-1

Keywords

  • Shortest distance
  • 2-Hop labeling
  • Betweenness
  • Parallelism
  • Compression
  • Ordering