The VLDB Journal, Volume 21, Issue 4, pp 509–534

GRAIL: a scalable index for reachability queries in very large graphs

  • Hilmi Yıldırım
  • Vineet Chaoji
  • Mohammed J. Zaki
Regular Paper

Abstract

Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing methods for reachability trade off indexing time and space against query-time performance. However, the biggest limitation of existing methods is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and its query time ranges from constant to linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.
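To make the idea of randomized interval labeling concrete, the following is a minimal Python sketch of one common way to realize it. It is not the authors' C++ reference implementation. It assumes the input is a directed acyclic graph (cycles would need to be condensed into single nodes first), and the names `build_labels`, `reachable`, `contains`, `d`, and `seed` are hypothetical, chosen only for this illustration. Each of the `d` random post-order traversals gives every node an interval; if the target's interval is not contained in the source's interval in some traversal, the query is answered negatively in constant time, and otherwise the sketch falls back to a pruned depth-first search.

```python
import random


def build_labels(graph, d=2, seed=0):
    """Build d randomized interval labelings for a DAG (illustrative only).

    graph: dict mapping each node to a list of its successors.
    Returns a list of d dicts, each mapping node -> (low, rank).
    """
    rng = random.Random(seed)
    has_parent = {v for succs in graph.values() for v in succs}
    roots = [u for u in graph if u not in has_parent]
    labelings = []
    for _ in range(d):
        label = {}
        rank = 0

        def visit(u):
            # Recursive for brevity; very deep graphs would need an
            # explicit stack instead.
            nonlocal rank
            if u in label:
                return
            children = list(graph.get(u, ()))
            rng.shuffle(children)  # random child order per traversal
            lows = []
            for c in children:
                visit(c)
                lows.append(label[c][0])
            rank += 1  # post-order rank of u
            label[u] = (min(lows + [rank]), rank)

        order = roots[:]
        rng.shuffle(order)  # random root order per traversal
        for r in order:
            visit(r)
        labelings.append(label)
    return labelings


def contains(outer, inner):
    """True if interval inner lies within interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reachable(graph, labelings, u, v):
    """Exact reachability test using the labels only for pruning."""
    if u == v:
        return True
    # Non-containment in any labeling proves u cannot reach v.
    if not all(contains(lab[u], lab[v]) for lab in labelings):
        return False
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for c in graph.get(x, ()):
            if c == v:
                return True
            if c not in seen and all(
                contains(lab[c], lab[v]) for lab in labelings
            ):
                seen.add(c)
                stack.append(c)
    return False


# Small example DAG (made up for this sketch).
example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
example_labels = build_labels(example, d=2, seed=0)
a_reaches_d = reachable(example, example_labels, "a", "d")  # True
d_reaches_a = reachable(example, example_labels, "d", "a")  # False
```

In this sketch, building the labels takes one traversal per labeling, so the index cost grows linearly with the graph for a fixed `d`. Interval containment alone can give false positives, which is why the fallback search is needed for queries that are not rejected immediately.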

Keywords

Graph query processing · Scalable graph indexing · Reachability queries

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Hilmi Yıldırım (1)
  • Vineet Chaoji (2)
  • Mohammed J. Zaki (1)
  1. Rensselaer Polytechnic Institute, Troy, USA
  2. Yahoo! Labs, Bangalore, India
