
The VLDB Journal, Volume 27, Issue 1, pp 1–26

Reachability querying: an independent permutation labeling approach

  • Hao Wei
  • Jeffrey Xu Yu
  • Can Lu
  • Ruoming Jin
Regular Paper

Abstract

Reachability query is a fundamental graph operation that asks whether one vertex can reach another in a large directed graph G with n vertices and m edges, and it has been extensively studied. In the literature, all existing approaches compute a label for every vertex in G through offline index construction, and the time to answer reachability queries online depends on the quality of these labels. The three main costs are the index construction time, the index size, and the query time. Some up-to-date approaches answer reachability queries efficiently but spend nonlinear time constructing an index. Others construct an index in linear time and space but may need to perform a depth-first search over G at query time, which costs \(O(n + m)\). In this paper, we discuss a new randomized labeling approach, named IP label, that answers reachability queries with a probability guarantee, where the randomness comes from independent permutations. We also propose two additional labels to further enhance query processing. In addition, to deal with dynamic graphs, we discuss label maintenance and give efficient maintenance algorithms for the proposed labels. We conduct extensive experimental studies comparing our approach with up-to-date approaches on 19 large real datasets used in existing work, as well as on synthetic datasets. The results confirm the efficiency and scalability of our approach on static graphs, and our maintenance algorithms are about one order of magnitude faster than existing ones on dynamic graphs.
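The abstract describes the IP label only at a high level. The sketch below is one plausible reading of the idea, not the authors' exact definition. It assumes that the label of a vertex u stores the k smallest values, under a random permutation π, of the set Out(u) of vertices reachable from u (a bottom-k sketch), and that G has already been condensed into a DAG. The underlying observation is that if u reaches v, then Out(v) ⊆ Out(u). So if some value x in the label of v is no larger than the k-th value in the label of u (or the label of u has fewer than k entries) but x is missing from the label of u, then u provably cannot reach v. When this test is inconclusive, the query falls back to a label-pruned depth-first search, which is \(O(n + m)\) in the worst case. The function names, the parameter `k`, and the use of out-labels only are illustrative assumptions. The two additional labels mentioned in the abstract are not modeled.

```python
import heapq
import random
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # DAG as adjacency list: vertex -> successors


def reverse_topological_order(g: Graph) -> List[int]:
    """Kahn's algorithm; returns sinks first so successors are labeled before u."""
    indeg = {v: 0 for v in g}
    for v in g:
        for w in g[v]:
            indeg[w] += 1
    queue = deque(v for v in g if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in g[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(g):
        raise ValueError("graph has a cycle; condense SCCs into a DAG first")
    return order[::-1]


def build_ip_out_labels(g: Graph, k: int, seed: int = 0) -> Dict[int, List[int]]:
    """Hypothetical label: the k smallest permuted ranks in Out(u), sorted ascending."""
    vertices = list(g)
    random.Random(seed).shuffle(vertices)  # random permutation pi
    pi = {v: rank for rank, v in enumerate(vertices)}
    label: Dict[int, List[int]] = {}
    for u in reverse_topological_order(g):
        # bottom-k of a union equals bottom-k of the union of bottom-k sketches
        candidates = {pi[u]}
        for w in g[u]:
            candidates.update(label[w])
        label[u] = heapq.nsmallest(k, candidates)
    return label


def certainly_unreachable(lu: List[int], lv: List[int], k: int) -> bool:
    """True only if the labels prove Out(v) is not a subset of Out(u)."""
    lu_set = set(lu)
    # every value of Out(u) up to this bound is guaranteed to appear in lu
    bound = lu[-1] if len(lu) == k else float("inf")
    return any(x <= bound and x not in lu_set for x in lv)


def reachable(g: Graph, labels: Dict[int, List[int]], k: int, u: int, v: int) -> bool:
    """Label filter first, then a DFS that skips vertices proven unable to reach v."""
    if u == v:
        return True
    if certainly_unreachable(labels[u], labels[v], k):
        return False
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for w in g[x]:
            if w == v:
                return True
            if w not in seen and not certainly_unreachable(labels[w], labels[v], k):
                seen.add(w)
                stack.append(w)
    return False
```

The filter never rejects a truly reachable pair, so answers are always exact; the randomness only affects how often the DFS fallback is needed. Larger `k` makes the filter more selective at the cost of a larger index.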

Keywords

Reachability query · Randomized labeling · Label maintenance

Notes

Acknowledgements

The work was supported by the grants of the Research Grants Council of the Hong Kong SAR, China, No. 14209314 and 14221716.


Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Chinese University of Hong Kong, Hong Kong, China
  2. Kent State University, Kent, USA
