GRAIL: a scalable index for reachability queries in very large graphs

  • Regular Paper
  • The VLDB Journal

Abstract

Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing methods for reachability trade off indexing time and space against query-time performance. However, the biggest limitation of existing methods is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and its query time ranges from constant to linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.
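The abstract names randomized interval labeling but gives no algorithmic detail, so the following is only a minimal Python sketch of one way such an index can work, not the authors' C++ implementation. It assumes the input is a DAG given as a dictionary that maps every node to its list of successors, and that each labeling assigns a node the interval [lowest post-order rank among its descendants, its own post-order rank] under a randomly ordered depth-first traversal. The names build_labels, contains and reachable, the parameter num_labelings, and the toy graph are all hypothetical.

```python
import random


def build_labels(graph, num_labelings=2, seed=0):
    """Return {node: [(low, rank), ...]} with one interval per random traversal.

    Assumes `graph` is a DAG and every node appears as a key.
    Uses recursion, so very deep graphs would need an iterative traversal.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    has_parent = {v for succs in graph.values() for v in succs}
    roots = [u for u in nodes if u not in has_parent]
    labels = {u: [] for u in nodes}

    for _ in range(num_labelings):
        rank, low = {}, {}
        counter = 0

        def visit(u):
            nonlocal counter
            children = list(graph[u])
            rng.shuffle(children)          # the randomized part of the labeling
            lowest = None
            for c in children:
                if c not in rank:
                    visit(c)
                # Take the child's low value even if it was visited earlier.
                lowest = low[c] if lowest is None else min(lowest, low[c])
            counter += 1
            rank[u] = counter              # post-order rank
            low[u] = counter if lowest is None else min(lowest, counter)

        order = roots[:]
        rng.shuffle(order)
        for r in order:
            if r not in rank:
                visit(r)
        for u in nodes:
            labels[u].append((low[u], rank[u]))
    return labels


def contains(outer, inner):
    """True if every interval of `inner` lies inside the matching one of `outer`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def reachable(graph, labels, u, v):
    """Answer 'can u reach v?' using the labels to prune a depth-first search."""
    if u == v:
        return True
    if not contains(labels[u], labels[v]):
        return False                       # constant-time negative answer
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for c in graph[x]:
            if c == v:
                return True
            # Only descend into children whose intervals still contain v's.
            if c not in seen and contains(labels[c], labels[v]):
                seen.add(c)
                stack.append(c)
    return False


dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
labels = build_labels(dag)
print(reachable(dag, labels, "a", "d"))  # True
print(reachable(dag, labels, "b", "c"))  # False
```

In this sketch, interval containment is only a necessary condition for reachability: a failed containment test rejects the query immediately, while a successful one falls back to a pruned search. That is consistent with the abstract's statement that query time ranges from constant to linear, and each labeling costs one traversal of the graph.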


Author information

Corresponding author

Correspondence to Hilmi Yıldırım.

Additional information

This work was supported in part by NSF Grants EMT-0829835 and CNS-0103708, and by NIH Grant 1R01EB0080161-01A1.

About this article

Cite this article

Yıldırım, H., Chaoji, V. & Zaki, M.J. GRAIL: a scalable index for reachability queries in very large graphs. The VLDB Journal 21, 509–534 (2012). https://doi.org/10.1007/s00778-011-0256-4

  • DOI: https://doi.org/10.1007/s00778-011-0256-4
