Efficient distributed reachability querying of massive temporal graphs

Abstract

Reachability computation is a fundamental graph operation with a wide range of applications. Nevertheless, little work has been done on efficient reachability queries over temporal graphs, which are used extensively to model time-varying networks such as communication networks, social networks, and transportation schedule networks. Moreover, real-world temporal networks are becoming increasingly large and may be distributed across multiple data centers. This motivates our study of efficient reachability queries on distributed temporal graphs. We propose an efficient index, called Temporal Vertex Labeling (TVL), which is a labeling scheme for distributed temporal graphs. We also present algorithms that exploit TVL to support efficient distributed reachability querying over temporal graphs in Pregel-like systems. The algorithms use several optimizations that rest on non-trivial lemmas. Extensive experiments on massive real and synthetic temporal graphs provide detailed insight into the efficiency and scalability of the proposed methods, covering both index construction and query processing. Compared with state-of-the-art methods, the TVL-based query algorithms achieve up to an order of magnitude speedup with lower index construction overhead.
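
To make the notion of a temporal reachability query concrete, the following is a minimal single-machine sketch. It is not the TVL index or the distributed algorithms proposed in the paper, which the abstract does not detail. The edge model `(source, target, depart, arrive)`, the function name `temporal_reachable`, and the query interval `[t_start, t_end]` are assumptions made for illustration only. The sketch also assumes that every edge has a strictly positive duration, which is what makes a single scan in departure order sufficient.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical edge model: (source, target, depart, arrive) with arrive > depart.
Edge = Tuple[Hashable, Hashable, int, int]


def temporal_reachable(
    edges: Iterable[Edge],
    source: Hashable,
    target: Hashable,
    t_start: int,
    t_end: int,
) -> bool:
    """Return True if target can be reached from source within [t_start, t_end].

    A path is valid only if each edge departs no earlier than the arrival
    time of the previous edge, so edge order in time matters.
    """
    if source == target:
        return True

    # earliest[v] = earliest known arrival time at v; the source is
    # available from the start of the query interval.
    earliest: Dict[Hashable, int] = {source: t_start}

    # Scanning edges by departure time is enough when durations are
    # strictly positive: any arrival usable by an edge departing at time d
    # comes from an edge that departed strictly before d.
    for u, v, depart, arrive in sorted(edges, key=lambda e: e[2]):
        if arrive <= depart:
            raise ValueError("this sketch assumes strictly positive durations")
        if u not in earliest or depart < earliest[u] or arrive > t_end:
            continue  # edge cannot be taken within the query interval
        if arrive < earliest.get(v, float("inf")):
            earliest[v] = arrive

    return target in earliest


# A small schedule-like example: a -> b -> c is time-respecting,
# but c -> d departs before c can be reached.
example_edges = [("a", "b", 1, 2), ("b", "c", 3, 4), ("c", "d", 2, 3)]
print(temporal_reachable(example_edges, "a", "c", 0, 10))
print(temporal_reachable(example_edges, "a", "d", 0, 10))
```

In the example, `d` is reachable from `a` if timestamps are ignored, but not under the time-respecting rule. This is the difference between temporal and static reachability. An index-based method such as the one the abstract describes aims to answer such queries without scanning all edges.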

Notes

  1. Giraph is available at http://giraph.apache.org/.

  2. Hama is available at http://hama.apache.org/.

  3. KONECT is available at konect.uni-koblenz.de/.

  4. GTgraph is available at http://www.cse.psu.edu/~kxm85/software/GTgraph/.

  5. GTimer is available at http://www.cse.cuhk.edu.hk/systems/graph/Gtimer/index.html.

  6. Code of Grail is available at https://code.google.com/archive/p/grail/.

  7. Austin is available at https://code.google.com/archive/p/googletransitdatafeed/wikis/PublicFeeds.wiki.

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003, the NSFC under Grant No. 61972338, the NSFC-Zhejiang Joint Fund under Grant No. U1609217, the ZJU-Hikvision Joint Project, and the National Research Foundation, Prime Minister’s Office, Singapore under its International Research Centres in Singapore Funding Initiative. Yunjun Gao is the corresponding author of the work.

Author information

Correspondence to Yunjun Gao.

Cite this article

Zhang, T., Gao, Y., Chen, L. et al. Efficient distributed reachability querying of massive temporal graphs. The VLDB Journal 28, 871–896 (2019). https://doi.org/10.1007/s00778-019-00572-x

Keywords

  • Graph
  • Reachability
  • Distributed processing
  • Query processing
  • Algorithm