Answering reachability and K-reach queries on large graphs with label constraints

Abstract

This paper examines label-constrained reachability (LCR) and label-constrained K-reach (LCKR) queries, which are fundamental to a wide variety of applications on directed edge-labeled graphs. While reachability and K-reach queries have been extensively researched, LCR and LCKR queries are much more challenging because the number of potential label-constraint sets is exponential in the number of labels. We note that existing techniques for LCR queries build only a partial index, so their worst-case query time can be comparable to that of an online breadth-first search (BFS). This paper proposes a new label-constrained 2-hop indexing method with novel pruning rules and ordering strategies. We show that the worst-case query time can be bounded by the number of in-out index entries. Extensive experiments demonstrate that the proposed methods substantially outperform the state-of-the-art approach in terms of query response time (up to 5 orders of magnitude speedup), index size, and index construction time. More precisely, the method can answer LCR queries on billion-scale networks within microseconds on a single machine. We also formally define the LCKR query problem and discuss critical applications that motivate it. To tackle the difficulties presented by combined label and hop constraints, we propose an efficient search method based on upper and lower bounds. With these techniques, extensive experiments on synthetic and real-world networks demonstrate that our algorithm outperforms the baseline by about three to four orders of magnitude while maintaining competitive indexing time and size.
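
For orientation, the online BFS baseline mentioned above can be sketched as follows. This is only a minimal illustration of the query semantics, not the indexing method proposed in the paper. The function name `lcr_bfs`, the adjacency-list layout, the example graph, and the optional hop bound `k` (used here to mimic an LCKR query) are all assumptions made for this sketch.

```python
from collections import deque


def lcr_bfs(adj, s, t, allowed, k=None):
    """Online BFS baseline (illustrative sketch, not the paper's index).

    adj     : dict mapping a vertex to a list of (neighbor, edge_label) pairs
    allowed : set of edge labels that a path may use (the label constraint)
    k       : optional hop bound; None means plain LCR, an int mimics LCKR
    Returns True if t is reachable from s under the constraints.
    """
    if s == t:
        return True
    visited = {s}
    frontier = deque([(s, 0)])  # (vertex, hops from s)
    while frontier:
        u, hops = frontier.popleft()
        if k is not None and hops == k:
            continue  # hop budget exhausted, do not expand further
        for v, label in adj.get(u, ()):
            if label not in allowed or v in visited:
                continue  # edge violates the label constraint, or v was seen
            if v == t:
                return True
            visited.add(v)
            frontier.append((v, hops + 1))
    return False


# Hypothetical toy graph: a -> b -> d via "friend", a -> c via "follows",
# c -> d via "likes".
adj = {
    "a": [("b", "friend"), ("c", "follows")],
    "b": [("d", "friend")],
    "c": [("d", "likes")],
    "d": [],
}

print(lcr_bfs(adj, "a", "d", {"friend"}))        # path a -> b -> d qualifies
print(lcr_bfs(adj, "a", "d", {"follows"}))       # c -> d needs "likes"
print(lcr_bfs(adj, "a", "d", {"friend"}, k=1))   # d is two hops away
```

Each query of this kind may touch every vertex and edge of the graph, which is the cost that an index-based approach aims to avoid.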

Notes

  1. Otherwise, the system will be overwhelmed by false alarms raised by the above path analysis.

  2. To avoid possible ambiguity between “labeling index” and “label-constrained”, we use “2-hop index technique” to refer to the “2-hop labeling index technique” used in the literature.

  3. Our index construction process for each vertex will return a minimal label path tree.

  4. This is the default setting for LI+ in [41].

  5. Find the minimum distance between \(v_k\) and u in the index \(L_{k-1}\) with label constraints P[u].labels.

  6. The value of w is set based on initial experiments.

  7. P2H+ uses degree order, which is the same as LI+, for the sake of a fair comparison. The effect of vertex order is compared in Table 8.

References

  1. Abraham, I., Delling, D., Goldberg, A.V., Werneck, R.F.: Hierarchical hub labelings for shortest paths. In European Symposium on Algorithms, pages 24–35. Springer (2012)

  2. Akiba, T., Iwata, Y., Kawarabayashi, K.-i., Kawata, Y.: Fast shortest-path distance queries on road networks by pruned highway labeling. In 2014 Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pages 147–154. SIAM (2014)

  3. Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 349–360. ACM (2013)

  4. Barrett, C., Jacob, R., Marathe, M.: Formal-language-constrained path problems. SIAM J. Comput. 30(3), 809–837 (2000)

  5. Bonchi, F., Gionis, A., Gullo, F., Ukkonen, A.: Distance oracles in edge-labeled graphs. In EDBT, pages 547–558 (2014)

  6. Chen, M., Gu, Y., Bao, Y., Yu, G.: Label and distance-constraint reachability queries in uncertain graphs. In International Conference on Database Systems for Advanced Applications, pages 188–202. Springer (2014)

  7. Chen, X., Lai, L., Qin, L., Lin, X., Liu, B.: A framework to quantify approximate simulation on graph data. In 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 1308–1319. IEEE (2021)

  8. Cheng, J., Huang, S., Wu, H., Fu, A.W.-C.: Tf-label: a topological-folding labeling scheme for reachability querying in a large graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 193–204. ACM (2013)

  9. Cheng, J., Shang, Z., Cheng, H., Wang, H., Yu, J.X.: K-reach: who is in your small world. Proceedings of the VLDB Endowment 5(11), 1292–1303 (2012)

  10. Cheng, J., Yu, J.X.: On-line exact shortest distance query processing. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pages 481–492. ACM (2009)

  11. Cheng, J., Yu, J.X., Lin, X., Wang, H., Yu, P.S.: Fast computation of reachability labeling for large graphs. In International Conference on Extending Database Technology, pages 961–979. Springer (2006)

  12. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. SIAM J. Comput. 32(5), 1338–1355 (2003)

  13. Fang, Y., Cheng, R., Li, X., Luo, S., Hu, J.: Effective community search over large spatial graphs. PVLDB 10(6), 709–720 (2017)

  14. Fang, Y., Cheng, R., Luo, S., Hu, J.: Effective community search for large attributed graphs. PVLDB 9(12), 1233–1244 (2016)

  15. Fang, Y., Cheng, R., Luo, S., Hu, J., Huang, K.: C-explorer: browsing communities in large graphs. PVLDB 10(12), 1885–1888 (2017)

  16. Fang, Y., Huang, X., Qin, L., Zhang, Y., Zhang, W., Cheng, R., Lin, X.: A survey of community search over big graphs. VLDB J. 29(1), 353–392 (2020)

  17. Fang, Y., Yang, Y., Zhang, W., Lin, X., Cao, X.: Effective and efficient community search over large heterogeneous information networks. PVLDB 13(6), 854–857 (2020)

  18. Fang, Y., Yu, K., Cheng, R., Lakshmanan, L.V., Lin, X.: Efficient algorithms for densest subgraph discovery. PVLDB 12(11), 1719–1732 (2019)

  19. Hassan, M.S., Aref, W.G., Aly, A.M.: Graph indexing for shortest-path finding over dynamic sub-graphs. In Proceedings of the 2016 International Conference on Management of Data, pages 1183–1197. ACM (2016)

  20. Hu, J., Cheng, R., Chang, K.C.-C., Sankar, A., Fang, Y., Lam, B.Y.: Discovering maximal motif cliques in large heterogeneous information networks. In International Conference on Data Engineering (ICDE), pages 746–757. IEEE (2019)

  21. Jin, R., Hong, H., Wang, H., Ruan, N., Xiang, Y.: Computing label-constraint reachability in graph databases. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 123–134. ACM (2010)

  22. Jin, R., Wang, G.: Simple, fast, and scalable reachability oracle. Proceedings of the VLDB Endowment 6(14), 1978–1989 (2013)

  23. Jin, X., Yang, Z., Lin, X., Yang, S., Qin, L., Peng, Y.: Fast: Fpga-based subgraph matching on massive graphs. arXiv preprint arXiv:2102.10768 (2021)

  24. Klodt, P., Weikum, G., Bedathur, S., Seufert, S.: Indexing strategies for constrained shortest paths over large social networks. Universität des Saarlandes (2011)

  25. Koschmieder, A., Leser, U.: Regular path queries on large graphs. In International Conference on Scientific and Statistical Database Management, pages 177–194. Springer (2012)

  26. Kunegis, J.: Konect: the koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, pages 1343–1350. ACM (2013)

  27. Lai, Z., Peng, Y., Yang, S., Lin, X., Zhang, W.: Pefp: Efficient k-hop constrained s-t simple path enumeration on fpga. In ICDE. IEEE (2021)

  28. Leskovec, J.: Snap: Stanford large network dataset collection (2016)

  29. Leskovec, J., Sosic, R.: Snap: A general purpose network analysis and graph mining library in c++ (2014)

  30. Li, Y., Yiu, M.L., Kou, N.M., et al.: An experimental study on hub labeling based shortest path algorithms. Proceedings of the VLDB Endowment 11(4), 445–457 (2017)

  31. Li, Z., Fang, Y., Qin, L., Cheng, J., Cheng, R., Lui, J.C.: Walking in the cloud: parallel simrank at scale. PVLDB 9(1), 24–35 (2015)

  32. Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V., Zhang, W., Lin, X.: Efficient algorithms for densest subgraph discovery on large directed graphs. In ACM SIGMOD, pages 1051–1066 (2020)

  33. Peng, Y., Lin, X., Zhang, Y., Zhang, W., Qin, L., Zhou, J.: Efficient hop-constrained s-t simple path enumeration. The VLDB Journal, pages 1–24 (2021)

  34. Peng, Y., Zhang, Y., Lin, X., Qin, L., Zhang, W.: Answering billion-scale label-constrained reachability queries within microsecond. Proceedings of the VLDB Endowment 13(6), 812–825 (2020)

  35. Peng, Y., Zhang, Y., Lin, X., Zhang, W., Qin, L., Zhou, J.: Hop-constrained s-t simple path enumeration: Towards bridging theory and practice. Proceedings of the VLDB Endowment 13(4), 463–476 (2019)

  36. Peng, Y., Zhang, Y., Lin, X., Zhang, W., Qin, L., Zhou, J.: Towards bridging theory and practice: hop-constrained st simple path enumeration. Proc. VLDB Endow. 13(4), 463–476 (2019)

  37. Peng, Y., Zhang, Y., Zhang, W., Lin, X., Qin, L.: Efficient probabilistic k-core computation on uncertain graphs. In 2018 IEEE 34th International Conference on Data Engineering (ICDE), pages 1192–1203. IEEE (2018)

  38. Peng, Y., Zhao, W., Zhang, W., Lin, X., Zhang, Y.: Dlq: A system for label-constrained reachability queries on dynamic graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2021)

  39. Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., Zhou, J.: Real-time constrained cycle detection in large dynamic graphs. Proc. VLDB Endow. 11(12), 1876–1888 (2018)

  40. Qiu, X., Cen, W., Qian, Z., Peng, Y., Zhang, Y., Lin, X., Zhou, J.: Real-time constrained cycle detection in large dynamic graphs. PVLDB 11(12), 1876–1888 (2018)

  41. Valstar, L.D., Fletcher, G.H., Yoshida, Y.: Landmark indexing for evaluation of label-constrained reachability queries. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 345–358. ACM (2017)

  42. van Rest, O., Hong, S., Kim, J., Meng, X., Chafi, H.: Pgql: a property graph query language. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, page 7. ACM (2016)

  43. Wadhwa, S., Prasad, A., Ranu, S., Bagchi, A., Bedathur, S.: Efficiently answering regular simple path queries on large labeled networks. Age 3, v7 (2019)

  44. Wood, P.T.: Query languages for graph databases. ACM SIGMOD Rec. 41(1), 50–60 (2012)

  45. Yuan, Y., Lian, X., Wang, G., Ma, Y., Wang, Y.: Constrained shortest path query in a large time-dependent graph. Proc. VLDB Endow. 12(10), 1058–1070 (2019)

  46. Yue, D., Wu, X., Wang, Y., Li, Y., Chu, C.-H.: A review of data mining-based financial fraud detection research. In 2007 International Conference on Wireless Communications, Networking and Mobile Computing, pages 5519–5522. IEEE (2007)

  47. Zhang, X., Özsu, M.T.: Correlation constraint shortest path over large multi-relation graphs. Proc. VLDB Endow. 12(5), 488–501 (2019)

  48. Zou, L., Xu, K., Yu, J.X., Chen, L., Xiao, Y., Zhao, D.: Efficient processing of label-constraint reachability queries in large graphs. Inf. Syst. 40, 47–66 (2014)

Acknowledgements

Xuemin Lin is supported by ARC DP180103096 and DP200101338. Ying Zhang is supported by ARC FT170100128 and ARC DP210101393. Wenjie Zhang is supported by ARC DP200101116 and DP180103096. Lu Qin is supported by ARC FT200100787 and DP210101347.

Author information

Corresponding author

Correspondence to You Peng.

About this article

Cite this article

Peng, Y., Lin, X., Zhang, Y. et al. Answering reachability and K-reach queries on large graphs with label constraints. The VLDB Journal (2021). https://doi.org/10.1007/s00778-021-00695-0

Keywords

  • Label Constraints
  • Reachability
  • Graph Theory